Step 3 gets to 85% and crashes. #437

Open
Pgjhmb opened this issue Feb 14, 2021 · 9 comments

Comments

@Pgjhmb

Pgjhmb commented Feb 14, 2021

Hello, I am running ipyrad on ezRAD data to assemble consensus sequences within a single genus. The species are believed to be highly divergent. I have been running a denovo assembly (no reference) on my samples, but each time (this is the third) it has crashed after ~9 days, at about the 85% point of step 3. Very frustrating.

Below is my params file. The cpu memory shouldn't be an issue.

I have 21 samples, each with between 2.5 million and 10 million reads, 300-750 bp long.

I would love any insight. I have lost a whole month to crashed runs.

HALLLLP!
Thanks.
P

------- ipyrad params file (v.0.9.62)-------------------------------------------
Tridacna-1_26_21 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/data/paolo/Tridacna_mt_phylogeny/Paired_Reads ## [1] [project_dir]: Project dir (made in curdir if not present)
## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
## [3] [barcodes_path]: Location of barcodes file
/data/paolo/Tridacna_mt_phylogeny/Paired_Reads/*.fastq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)
## [6] [reference_sequence]: Location of reference sequence file
rad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
GATC, ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, l ## [27] [output_formats]: Output formats (see docs)
## [28] [pop_assign_file]: Path to population assignment file
## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3

@isaacovercast
Collaborator

Hello P,
What makes you think memory isn't an issue? The number of reads per sample and the locus length will both contribute to consuming a TON of RAM during step 3. How much RAM do you have, and how many cores? Also, what is the exact text of the error message?
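
For anyone answering the RAM and core-count question, a minimal sketch in Python follows. It is not part of ipyrad. It assumes a Linux server (the `os.sysconf` names used here are not available on every platform), and the helper name `machine_resources` is hypothetical.

```python
import os


def machine_resources():
    """Return (cpu_count, total_ram_gib) for the current machine."""
    cores = os.cpu_count()
    # Assumption: SC_PAGE_SIZE and SC_PHYS_PAGES are defined, as on Linux.
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    total_gib = page_size * phys_pages / 1024 ** 3
    return cores, total_gib


if __name__ == "__main__":
    cores, ram = machine_resources()
    print(f"cores: {cores}, total RAM: {ram:.1f} GiB")
```

This reports installed memory, not what is free while other jobs are running, so it only answers the first half of the question.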

@Pgjhmb
Author

Pgjhmb commented Feb 14, 2021 via email

@isaacovercast
Collaborator

Hey Paolo,
Can you show me the error message from when it crashes?
-isaac

@isaacovercast
Collaborator

Long reads can also cause massive problems during clustering and alignment, because the distal ends of R1 and R2 can have very high error rates. You might consider looking at the fastqc results for a couple of your samples and using the trim_reads parameter in step 2 to trim off the regions with very low base quality.
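
If fastqc is not at hand, the same idea can be roughed out in a few lines of Python. This is a sketch under stated assumptions, not ipyrad or fastqc code: it assumes plain 4-line FASTQ records (no wrapped sequences), a Phred offset of 33 (as in param [10] above), and a Q20 cutoff (the same threshold param [9] uses for low-quality calls). The function names are hypothetical.

```python
import gzip


def mean_quality_by_position(fastq_path, offset=33, max_reads=100_000):
    """Mean Phred score at each read position over the first max_reads reads."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    sums, counts = [], []
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 != 3:
                # Assumption: 4-line records, so quality is every 4th line.
                continue
            qual = line.rstrip("\r\n")
            if len(qual) > len(sums):
                # Grow the accumulators to fit the longest read seen so far.
                extra = len(qual) - len(sums)
                sums.extend([0] * extra)
                counts.extend([0] * extra)
            for pos, ch in enumerate(qual):
                sums[pos] += ord(ch) - offset
                counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


def first_low_quality_position(means, threshold=20):
    """0-based index of the first position with mean quality below threshold."""
    for pos, q in enumerate(means):
        if q < threshold:
            return pos
    return None
```

Running this on the R1 and R2 files of a couple of samples gives a rough position where mean quality falls off. That position is only a starting point for choosing trim_reads values; the ipyrad docs describe how the four fields of that parameter are interpreted.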

@Pgjhmb
Author

Pgjhmb commented Feb 14, 2021 via email

@Pgjhmb
Author

Pgjhmb commented Feb 15, 2021 via email

@isaacovercast
Collaborator

Wait, when you say "it has crashed" you mean the server crashes? I thought you were talking about the ipyrad process. If the server is crashing that is a hardware issue my friend.

Did you consider this: "You might consider looking at the results of fastqc for a couple of your samples and using the trim_reads parameter in step 2 to trim off the regions with very low base quality."

@Pgjhmb
Author

Pgjhmb commented Feb 16, 2021 via email

@isaacovercast
Collaborator

Regarding my previous message: Wait, when you say "it has crashed" you mean the server crashes? I thought you were talking about the ipyrad process. If the server is crashing that is a hardware issue my friend. <- Was the original crash a hardware failure?

If you want help you need to provide more specific information: which samples did you retain for the single-library analysis, what parameters did you use in your params file, which step did the assembly crash in, and what was the full output of the assembly run, including the complete error message?

When you say you tried to trim the reads beforehand, how did you do this? What were the exact settings you used for trimming? Did you run ipyrad after trimming, and if so, what happened and how was it different from the pre-trimming behavior?

This message is self-explanatory: "Message: datatype + assembly_method combo not currently supported."

This is evolving away from a very specific error in ipyrad to more of a help thread, which is more appropriate for the gitter channel: https://gitter.im/dereneaton/ipyrad
