
Pacbio header line format error #27

Closed
JonEilers opened this issue Sep 29, 2021 · 16 comments

Labels: help wanted (Extra attention is needed)

@JonEilers

Hi,
I am getting a PacBio FASTA header format error and I was wondering what format it is looking for. Here is a link to the terminal output.

The PacBio FASTA headers look like this: >pacbio_SRR6282347.1.1 1 length=6524

There is a second error message I am not sure about either. The log file shows a segmentation fault (core dump):

/bin/bash: line 5: 208846 Segmentation fault      (core dumped) datander '-T70' -s126 -l500 -e0.7 Ajap_genome.2
@a-ludi (Owner) commented Sep 29, 2021

Hi,

TL;DR: Change reads_type in snakemake.yml to something other than PACBIO_SMRT, e.g. PACBIO_SRR. (see #1 for an explanation)
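In snakemake.yml this is a one-line change:

reads_type: PACBIO_SRR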

The header format of PacBio reads is >{smrt_cell}/{well}/{hq_begin}_{hq_end} RQ={qual} where

  • {smrt_cell} is an alpha-numeric ID of the SMRT cell that produced the read,
  • {well} is a numeric ID of the well in the SMRT cell where the read happened,
  • {hq_begin} is the position of the first base with "high quality",
  • {hq_end} is the position of the last high-quality base and
  • {qual} is a fraction between 0 and 1, the read quality estimate.
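For illustration, a header in that format could look like this (values made up):

>m54033_180215_152222/4194376/919_11679 RQ=0.85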

That being said, your headers suggest you do not have all of this info, and DENTIST does not actually require it. So the easiest option is to ignore all this by changing reads_type in snakemake.yml (see above).

Cheers!

@a-ludi self-assigned this Sep 29, 2021
@a-ludi added the help wanted label Sep 29, 2021
@JonEilers (Author)

Thanks! I changed the read type and reran it. Got a new error: Fasta line is too long

INFO:    Converting SIF file to temporary sandbox...
File SRR6282347.fasta, Line 6: Fasta line is too long (> 9998 chars)
INFO:    Cleaning up image...
[Wed Sep 29 08:38:30 2021]
Error in rule reads2db:
    jobid: 6
    output: /home/jon/Working_Files/dentist/SRR6282347.dam, /home/jon/Working_Files/dentist/.SRR6282347.bps, /home/jon/Working_Files/dentist/.SRR6282347.hdr, /home/jon/Working_Files/dentist/.SRR6282347.idx
    shell:
        fasta2DAM /home/jon/Working_Files/dentist/SRR6282347.dam SRR6282347.fasta && DBsplit -x1000 -a -s200 /home/jon/Working_Files/dentist/SRR6282347.dam
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

@a-ludi (Owner) commented Sep 29, 2021

You can fix this by running the FASTA through fold:

mv SRR6282347.fasta SRR6282347.fasta~
fold -w1000 SRR6282347.fasta~ > SRR6282347.fasta
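
To double-check the result, you could print the longest line length afterwards (this check is just a suggestion, not part of the workflow):

awk '{ if (length($0) > max) max = length($0) } END { print max }' SRR6282347.fasta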

Thanks for reporting these issues. I will improve the workflow to take care of these things by itself.

@JonEilers (Author)

That worked perfectly. Have one more error message for you.

Error in rule tandem_alignment_block:
    jobid: 16
    output: /home/jon/Working_Files/dentist/TAN.Ajap_genome.1.las
    log: /home/jon/Working_Files/dentist/logs/tandem-alignment.Ajap_genome.1.log (check log file(s) for error message)
    shell:
        
            {
                cd /home/jon/Working_Files/dentist
                datander '-T70' -s126 -l500 -e0.7 Ajap_genome.1
            } &> /home/jon/Working_Files/dentist/logs/tandem-alignment.Ajap_genome.1.log
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

log file has this: /bin/bash: line 5: 241121 Segmentation fault (core dumped) datander '-T70' -s126 -l500 -e0.7 Ajap_genome.1

@a-ludi (Owner) commented Sep 30, 2021

It looks like you have configured max_threads: 70 in snakemake.yml. Just reduce it a bit, say to <=32. This setting controls how many threads a single process may use. There is not much benefit in having many threads per process because the speedup does not scale linearly with the number of threads.

Instead, if you are running Snakemake on a single, big machine, you can tell it with --cores how many threads it is allowed to use at any time. It will take care of launching as many jobs as possible so the cores stay utilized.
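
For example (values illustrative; adjust to your machine):

# in snakemake.yml: cap the threads any single process may use
#   max_threads: 16
# then let Snakemake schedule jobs across all available cores:
snakemake --cores 70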

I hope this answers your question. Maybe I should rename max_threads to threads_per_process or something. What do you think?

@JonEilers (Author)

Hmm, sounds like a good idea. Maybe add a sentence in the README about using <=32 cores?
Have another error :D

Error in rule mask_tandem:
    jobid: 14
    output: /home/jon/Working_Files/dentist/.Ajap_genome.tan.anno, /home/jon/Working_Files/dentist/.Ajap_genome.tan.data
    log: /home/jon/Working_Files/dentist/logs/mask-tandem.Ajap_genome.log (check log file(s) for error message)
    shell:
        Catrack -v / tan &> /home/jon/Working_Files/dentist/logs/mask-tandem.Ajap_genome.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

log file has Catrack: Cannot open /.db for 'r'

@a-ludi (Owner) commented Sep 30, 2021

I have no idea what happened there. It looks a bit like a bug in Snakemake. Have you tried simply starting the workflow once more?

@JonEilers (Author)

You called it correctly. I cleaned the directory out and restarted Snakemake and it worked. At least for a while.

Error in rule process:
    jobid: 1238
    output: /home/jon/Working_Files/dentist/insertions/batch.79.db
    log: /home/jon/Working_Files/dentist/logs/process.79.log (check log file(s) for error message)
    shell:
        dentist process --config=dentist.json  --threads=4 --auxiliary-threads=6 --mask=dentist-self-H,tan-H,dentist-reads-H --batch=3950..4000 /home/jon/Working_Files/dentist/Ajap_genome.dam /home/jon/Working_Files/dentist/SRR6282347.dam /home/jon/Working_Files/dentist/pile-ups.db /home/jon/Working_Files/dentist/insertions/batch.79.db 2> /home/jon/Working_Files/dentist/logs/process.79.log

Log file contents

Error: darg.ArgParseError@darg/source/darg.d(1281): Expected a value for positional argument '<out:insertions>'
----------------
??:? pure dentist.commandline.OptionsFor!(11).OptionsFor darg.parseArgs!(dentist.commandline.OptionsFor!(11).OptionsFor).parseArgs(const(immutable(char)[][]), darg.Config) [0x55b9154c4f66]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(11).runCommand(in immutable(char)[][]) [0x55b9154ba276]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55b915449b41]
??:? _Dmain [0x55b9152d8704]

@a-ludi (Owner) commented Oct 4, 2021

Again, I would suggest just rerunning snakemake – do not clean up the directory. Snakemake keeps track of what is left to be done.
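
That is, simply invoke it again with your usual options (the cores value here is illustrative); Snakemake detects finished outputs and skips them:

snakemake --cores 32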

@JonEilers (Author)

Gotcha. I reran snakemake without cleaning up the directory and it gives the same error message, and the log contains the same info, just a different jobid/batch.

@a-ludi (Owner) commented Oct 5, 2021

I am a bit puzzled because the "missing argument" is actually present as far as I can tell.

Can you tell me which version of DENTIST you are using? Please verify with one of the commands below:

# if you are using pre-compiled binaries:
./bin/dentist --version

# if you are using singularity:
singularity run docker://aludi/dentist:stable dentist --version

Expected output for Singularity:

INFO:    Using cached SIF image
dentist v1.0.2-1-gd85a86f (commit d85a86fda8da241b0de3d3b8d3b02cf9e3405302)

Copyright © 2018, Arne Ludwig <arne.ludwig@posteo.de>

Subject to the terms of the MIT license, as written in the included LICENSE file

@JonEilers (Author) commented Oct 5, 2021

(singularity) jon@jon-PowerEdge-R910:~/Working_Files/dentist$ singularity run docker://aludi/dentist:stable dentist --version
INFO:    Using cached SIF image
INFO:    Converting SIF file to temporary sandbox...
dentist v1.0.2-1-gd85a86f (commit d85a86fda8da241b0de3d3b8d3b02cf9e3405302)

Copyright © 2018, Arne Ludwig <arne.ludwig@posteo.de>

Subject to the terms of the MIT license, as written in the included LICENSE file
INFO:    Cleaning up image...

If it's useful to know, below are the versions of singularity and snakemake that are installed in the conda environment. Here is a list of everything installed into the conda environment.

  • singularity 3.7.1 hca90b9e_0 conda-forge
  • snakemake 6.8.1 hdfd78af_0 bioconda

@a-ludi (Owner) commented Oct 6, 2021

Thanks, but I still do not understand the problem. It should be working just fine. 😆

Could you try running the command manually with:

singularity run docker://aludi/dentist:stable dentist process --config=dentist.json  --threads=4 --auxiliary-threads=6 --mask=dentist-self-H,tan-H,dentist-reads-H --batch=3950..4000 /home/jon/Working_Files/dentist/Ajap_genome.dam /home/jon/Working_Files/dentist/SRR6282347.dam /home/jon/Working_Files/dentist/pile-ups.db /home/jon/Working_Files/dentist/insertions/batch.79.db 2> /home/jon/Working_Files/dentist/logs/process.79.log

@JonEilers (Author) commented Oct 6, 2021

Same results. This may be a silly question, but I was looking at the command and noticed that the process-pile-ups command wants five positional arguments while the command provided by the pipeline only has four. It's missing the <ignored> argument? Would that cause this error?

singularity run docker://aludi/dentist:stable \
	dentist process \
		--config=dentist.json  \
		--threads=4 \
		--auxiliary-threads=6 \
		--mask=dentist-self-H,tan-H,dentist-reads-H \
		--batch=3950..4000 \
		/home/jon/Working_Files/dentist/Ajap_genome.dam \
		/home/jon/Working_Files/dentist/SRR6282347.dam \
		/home/jon/Working_Files/dentist/pile-ups.db \
		/home/jon/Working_Files/dentist/insertions/batch.25.db 2> /home/jon/Working_Files/dentist/logs/process.25.log

vs

INFO:    Using cached SIF image
INFO:    Converting SIF file to temporary sandbox...
Error: darg.ArgParseError@darg/source/darg.d(1281): Expected a value for positional argument '<out:insertions>'
----------------
??:? pure dentist.commandline.OptionsFor!(11).OptionsFor darg.parseArgs!(dentist.commandline.OptionsFor!(11).OptionsFor).parseArgs(const(immutable(char)[][]), darg.Config) [0x55a81770df66]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(11).runCommand(in immutable(char)[][]) [0x55a817703276]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55a817692b41]
??:? _Dmain [0x55a817521704]

Usage: dentist process-pile-ups [--allow-single-reads]
                                [--auxiliary-threads=num-threads]
                                [--bad-fraction=<frac>]
                                [--batch=<idx-spec>[,<idx-spec>...]]
                                [--config=<config-json>]
                                [--daccord=<daccord-option>[,<daccord-option>...]]
                                [--daligner-consensus=<daligner-option>[,<daligner-option>...]]
                                [--daligner-reads-vs-reads=<daligner-option>[,<daligner-option>...]]
                                [--daligner-self=<daligner-option>...]
                                [--datander-ref=<datander-option>[,<datander-option>...]]
                                [--dust-reads=<dust-option>[,<dust-option>...]]
                                [--help] [--keep-temp]
                                [--mask=<name>[,<name>...]]
                                [--max-chain-gap=<bps>] [--max-indel=<bps>]
                                [--max-relative-overlap=<fraction>]
                                [--min-anchor-length=<uint>]
                                [--min-reads-per-pile-up=<ulong>]
                                [--min-relative-score=<fraction>]
                                [--min-score=<int>] [--only=<OnlyFlag>]
                                [--proper-alignment-allowance=num] [--quiet]
                                [--revert=<option>[,<option>...]]
                                [--threads=<uint>] [--tmpdir=<string>] [--usage]
                                [--verbose] <in:reference> <in:reads> <ignored>
                                <in:pile-ups> <out:insertions>
INFO:    Cleaning up image...

@a-ludi (Owner) commented Oct 7, 2021

Well spotted! This means that the workflow (Snakefile) and DENTIST are somewhat out of sync. I would suggest updating everything to the latest version:

  1. Update your Snakefile to the latest version.
  2. Rename max_threads to threads_per_process in snakemake.yml (see 546bdbf).
  3. Make sure that DENTIST v2.0.0 is being used by specifying dentist_container: "docker://aludi/dentist:v2.0.0" in snakemake.yml.
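
After these changes the relevant lines in snakemake.yml would look something like this (the thread count is an illustrative value):

threads_per_process: 16
dentist_container: "docker://aludi/dentist:v2.0.0"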

Then retry by launching snakemake once more.

@JonEilers (Author)

I am guessing the error below has to do with rate limits on pulling from Docker Hub and not with DENTIST itself?

InputFunctionException in line 1776 of /home/jon/Working_Files/dentist/Snakefile:
Error:
  Exception: failed to get alignment commands: FATAL:   Unable to handle docker://aludi/dentist:v2.0.0 uri: failed to get checksum for docker://aludi/dentist:v2.0.0: Error reading manifest v2.0.0 in docker.io/aludi/dentist: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Wildcards:
  block_reads=26
Traceback:
  File "/home/jon/Working_Files/dentist/Snakefile", line 243, in secondary_expand
  File "/home/jon/Working_Files/dentist/Snakefile", line 1793, in <lambda>
  File "/home/jon/Working_Files/dentist/Snakefile", line 421, in generate_options_for
