Diamond steps not working, Missing input files for rule run_diamond_blastx: #208

Open
aureliendejode opened this issue Mar 7, 2024 · 6 comments

Comments

@aureliendejode

Hello,
I am running the pipeline version v4.3.4 on a slurm cluster.
The first steps work fine (windowmasker, minimap, chunk_stats, busco and cov_stats).
I get an error for the diamond steps.
Here are the logs.

diamond

more diamond/run_sub_pipeline.log
Creating specified working directory /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Stichodactyla_haddonii/blobtoolkit/blob_work_dir/SHADDVT3/diamond.
Building DAG of jobs...
MissingInputException in rule run_diamond_blastx in file /share/apps/miniconda3/py311_23.5.2-0/installed/lib/python3.11/site-packages/blobtoolkit-snakefiles/rules/run_diamond_blastx.smk, line 1:
Missing input files for rule run_diamond_blastx:
output: SHADDVT3_asm_bp_hap1_p_ctg.diamond.reference_proteomes.out.raw
wildcards: assembly=SHADDVT3_asm_bp_hap1_p_ctg
affected files:
/share/apps/genetic-databases/uniprot/reference_proteomes.dmnd

diamond_blastp

2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Stichodactyla_haddonii/blobtoolkit/blob_work_dir/SHADDVT3/diamond_blastp.
Building DAG of jobs...
MissingInputException in rule run_diamond_blastp in file /share/apps/miniconda3/py311_23.5.2-0/installed/lib/python3.11/site-packages/blobtoolkit-snakefiles/rules/run_diamond_blastp.smk, line 1:
Missing input files for rule run_diamond_blastp:
output: SHADDVT3_asm_bp_hap1_p_ctg.diamond.busco_genes.out
wildcards: assembly=SHADDVT3_asm_bp_hap1_p_ctg
affected files:
/share/apps/genetic-databases/uniprot/reference_proteomes.dmnd

Both look like the same error to me. Searching online, I found this type of error reported for an older version of the pipeline (blobtoolkit/pipeline#9).

So I am wondering if this could be due to the way I specified the taxrule in my config file because I used that config file with an older version of the pipeline.

Here is my config file:

assembly:
accession: SHADDVT3
file: /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Stichodactyla_haddonii/blobtoolkit/SHADDVT3_asm_bp_hap1_p_ctg.fasta.gz
level: contig
prefix: SHADDVT3_asm_bp_hap1_p_ctg
span: 459760080
busco:
download_dir: /share/apps/genetic-databases/busco
basal_lineages:
- eukaryota_odb10
- bacteria_odb10
- archaea_odb10
lineages:

  • metazoa_odb10
  • eukaryota_odb10
  • bacteria_odb10
  • archaea_odb10
    reads:
    single:
  • file: /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Stichodactyla_haddonii/blobtoolkit/XBTUA20231220R84050PL40740011D01bc2031bc2031_hifireads.fastq.gz
    prefix: XBTUA20231220R84050PL40740011D01bc2031bc2031
    platform: PACBIO_SMRT
    settings:
    blast_chunk: 100000
    blast_max_chunks: 10
    blast_min_length: 1000
    blast_overlap: 0
    stats_chunk: 1000
    stats_windows:
  • 0.1
  • 0.01
  • 100000
  • 1000000
    taxdump: /share/apps/genetic-databases/taxdump
    tmp: /tmp
    similarity:
    blastn:
    name: nt
    path: /share/apps/genetic-databases
    defaults:
    evalue: 1.0e-10
    import_evalue: 1.0e-25
    max_target_seqs: 10
    taxrule: bestsumorder
    diamond_blastp:
    import_max_target_seqs: 100000
    name: reference_proteomes
    path: /share/apps/genetic-databases/uniprot
    taxrule: blastp=buscogenes
    diamond_blastx:
    name: reference_proteomes
    path: /share/apps/genetic-databases/uniprot
    taxon:
    name: Stichodactyla haddoni

version: 2

Any idea what is going on?

Thanks for your help

@rjchallis
Contributor

Hi, I'll take a look at this to try to work out what is happening. Unfortunately the formatting of your config file has been lost. Could you link it as a file or paste it inside a code block so I can see how it looks?

@aureliendejode
Author

Hope this works

assembly:
  accession: SHADDVT3
  file: /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Stichodactyla_haddonii/blobtoolkit/SHADDVT3_asm_bp_hap1_p_ctg.fasta.gz
  level: contig
  prefix: SHADDVT3_asm_bp_hap1_p_ctg
  span: 459760080
busco:
  download_dir: /share/apps/genetic-databases/busco 
  basal_lineages:
    - eukaryota_odb10
    - bacteria_odb10
    - archaea_odb10
  lineages:
   - metazoa_odb10
   - eukaryota_odb10
   - bacteria_odb10
   - archaea_odb10
reads:
  single:
   - file: /grps2/bmtitus/analysis/Comparative_Genomic/Genome_assemblies/Stichodactyla_haddonii/blobtoolkit/XBTUA20231220R84050PL40740011D01bc2031bc2031_hifireads.fastq.gz
     prefix: XBTUA20231220R84050PL40740011D01bc2031bc2031
     platform: PACBIO_SMRT
settings:
  blast_chunk: 100000
  blast_max_chunks: 10
  blast_min_length: 1000
  blast_overlap: 0
  stats_chunk: 1000
  stats_windows:
  - 0.1
  - 0.01
  - 100000
  - 1000000
  taxdump: /share/apps/genetic-databases/taxdump
  tmp: /tmp
similarity:
  blastn:
    name: nt
    path: /share/apps/genetic-databases
  defaults:
    evalue: 1.0e-10
    import_evalue: 1.0e-25
    max_target_seqs: 10
    taxrule: bestsumorder
  diamond_blastp:
    import_max_target_seqs: 100000
    name: reference_proteomes
    path: /share/apps/genetic-databases/uniprot
    taxrule: blastp=buscogenes
  diamond_blastx:
    name: reference_proteomes
    path: /share/apps/genetic-databases/uniprot
taxon:
  name: Stichodactyla haddoni

version: 2

@aureliendejode
Author

Is the formatting OK now? I pasted the config file in a code block.

@rjchallis
Contributor

Hi yes - the formatting is clearer now. Thanks for bumping this and sorry this has taken me a while to get onto - I'll take a look soon!

@rjchallis
Contributor

From the error message, it looks like reference_proteomes.dmnd is not in /share/apps/genetic-databases/uniprot. Check whether this file exists and, if it does, where it is located. Once the path in the config file points to it, you should be able to run the diamond steps.
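
For anyone hitting the same error, here is a minimal sketch of that check in Python. It assumes, based only on the path shown in the log above, that each diamond step looks for `<path>/<name>.dmnd`. The helper names `expected_diamond_db` and `missing_diamond_dbs` are hypothetical and not part of the pipeline, and the `similarity` dict is copied by hand from the config in this issue rather than parsed from the YAML file.

```python
from pathlib import Path


def expected_diamond_db(similarity: dict, step: str) -> Path:
    """Build <path>/<name>.dmnd for one similarity step.

    Assumption: this layout is inferred from the "affected files" entry
    in the error log, not taken from the pipeline source.
    """
    entry = similarity[step]
    return Path(entry["path"]) / f"{entry['name']}.dmnd"


def missing_diamond_dbs(similarity: dict) -> dict:
    """Return {step: expected file} for every diamond step whose file is absent."""
    missing = {}
    for step in ("diamond_blastx", "diamond_blastp"):
        if step in similarity:
            db = expected_diamond_db(similarity, step)
            if not db.is_file():
                missing[step] = db
    return missing


# Values copied from the `similarity` section of the config in this issue.
similarity = {
    "diamond_blastp": {
        "name": "reference_proteomes",
        "path": "/share/apps/genetic-databases/uniprot",
    },
    "diamond_blastx": {
        "name": "reference_proteomes",
        "path": "/share/apps/genetic-databases/uniprot",
    },
}

for step, db in missing_diamond_dbs(similarity).items():
    print(f"{step}: expected database not found at {db}")
```

If a step is reported, either the database was never built in that directory or it lives elsewhere, in which case `path` and `name` in the config need to match its actual location.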

@aureliendejode
Author

Thanks for this.
Yes it looks like it is missing. I think some of the formatting commands for this did not go through.
I'll ask the cluster's IT team to fix that and then retest.
