Step 3 gets to 85% and crashes. #437

Pgjhmb · 2021-02-14T19:38:24Z

Hello, I am running ipyrad on ezRAD data to assemble consensus sequences within a single genus. The species are believed to be deeply divergent. I have been running denovo without a reference on each of my samples, but each time (three times now) it has crashed after ~9 days, reaching the 85% point before doing so. Very frustrating.

Below is my params file. The cpu memory shouldn't be an issue.

I have 21 samples, each with between 2.5 million and 10 million reads, 300-750 bp long.

I would love any insight. I have lost a whole month to crashed runs.

HALLLLP!
Thanks.
P

------- ipyrad params file (v.0.9.62)-------------------------------------------
Tridacna-1_26_21 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/data/paolo/Tridacna_mt_phylogeny/Paired_Reads ## [1] [project_dir]: Project dir (made in curdir if not present)
## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
## [3] [barcodes_path]: Location of barcodes file
/data/paolo/Tridacna_mt_phylogeny/Paired_Reads/*.fastq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)
## [6] [reference_sequence]: Location of reference sequence file
rad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
GATC, ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, l ## [27] [output_formats]: Output formats (see docs)
## [28] [pop_assign_file]: Path to population assignment file
## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3

isaacovercast · 2021-02-14T20:16:25Z

Hello P,
What makes you think the memory isn't an issue? The number of reads per sample and the locus length will both contribute to consume a TON of RAM during step 3. How much ram do you have and how many cores? Also, what is the exact text of the error message?

Pgjhmb · 2021-02-14T20:21:24Z

Hello, thanks for getting back to me. We have 256 GB of RAM and htop does not indicate we are going into swap; we are not even using half the RAM. We have 32 cores, and I allocated 19 of them. The server has crashed all 3 times and I am starting to think it is not a coincidence. Thanks for any insight you can provide. I am trying to get some assemblies. I would also be interested in hearing about any other programs you recommend. I might set up individual runs with the species-specific reference. Paolo

isaacovercast · 2021-02-14T20:24:38Z

Hey Paolo,
Can you show me the error message from when it crashes?
-isaac

isaacovercast · 2021-02-14T20:26:03Z

The long reads can also cause massive problems during clustering and alignment, given that the distal ends of R1 and R2 can have very high error rates. You might consider looking at the results of fastqc for a couple of your samples and using the trim_reads parameter in step 2 to trim off the regions with very low base quality.
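
As a rough complement to inspecting fastqc plots by eye, the trimming idea above can be sketched in a few lines of Python. This is a minimal sketch, not part of ipyrad: the function names, the Q20 cutoff, and the 100,000-read sample size are all assumptions chosen for illustration, and it assumes plain four-line FASTQ records with a Phred offset of 33 (as in the params file).

```python
import gzip


def mean_quality_by_position(fastq_path, offset=33, max_reads=100_000):
    """Return the mean Phred score at each read position.

    Assumes four-line FASTQ records (no wrapped sequence lines).
    Only the first `max_reads` reads are sampled to keep this quick.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    totals, counts = [], []
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 3:  # the quality string is the 4th line of each record
                continue
            if i // 4 >= max_reads:
                break
            qual = line.rstrip("\n")
            if len(qual) > len(totals):  # grow the tallies for longer reads
                extra = len(qual) - len(totals)
                totals.extend([0] * extra)
                counts.extend([0] * extra)
            for pos, ch in enumerate(qual):
                totals[pos] += ord(ch) - offset
                counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def suggest_trim_length(means, min_q=20):
    """Return how many leading positions to keep.

    This is the index of the first position whose mean quality drops
    below `min_q` (a hypothetical cutoff), or the full length if none does.
    """
    for pos, q in enumerate(means):
        if q < min_q:
            return pos
    return len(means)
```

Running this on one R1 and one R2 file gives a position beyond which mean quality falls below the cutoff. How that position maps onto the four trim_reads values (R1>, <R1, R2>, <R2) should be checked against the ipyrad docs rather than taken from this sketch.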

Pgjhmb · 2021-02-14T20:37:45Z

Hey Isaac, the internet connection has gone down at the lab (HIMB), so I will send it once I have access. Thanks

Pgjhmb · 2021-02-15T07:10:14Z

Hey Isaac, I can't confirm whether it failed due to the script or the power supply. We have UPSs for this very reason, and had them recently replaced. What are the file outputs after step 3? I have the following folders: Tridacna-tmpalign, Tridacna_clust_0.85, and Tridacna_edits. There is no log file and no error message: the server crashes while the run is inside a GNU screen session, so the error log is never displayed. Is there a way to specify in the params file that a log file and an error file should be created? Thanks, Paolo
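
One way to keep a record that survives a lost screen session, independent of anything ipyrad itself provides, is to wrap the run in a small script that writes every output line to disk as it arrives. The following is a minimal sketch; `run_and_log` is a hypothetical helper name, not an ipyrad function.

```python
import os
import subprocess
from datetime import datetime


def run_and_log(cmd, log_path):
    """Run `cmd` and append its combined stdout/stderr to `log_path`.

    Each line is timestamped, flushed, and fsync'ed immediately, so the
    log on disk is as current as possible if the machine goes down.
    Returns the exit code of the command.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error output into the same stream
            text=True,
            bufsize=1,  # line-buffered reads from the child process
        )
        for line in proc.stdout:
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp} {line}")
            log.flush()
            os.fsync(log.fileno())  # force the line out of the OS cache
        return proc.wait()
```

A call might look like `run_and_log(["ipyrad", "-p", "params-Tridacna-1_26_21.txt", "-s", "3", "-c", "19"], "step3.log")`, where the params file name is an assumption based on the assembly name above. If the server itself goes down, the last timestamp in the log at least shows how far the run got and when it stopped.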

isaacovercast · 2021-02-15T09:35:40Z

Wait, when you say "it has crashed" you mean the server crashes? I thought you were talking about the ipyrad process. If the server is crashing that is a hardware issue my friend.

Did you consider this: "You might consider looking at the results of fastqc for a couple of your samples and using the trim_reads parameter in step 2 to trim off the regions with very low base quality."

Pgjhmb · 2021-02-16T02:06:42Z

Hey Isaac, I tried taking a subset of the samples and using just 1 library, with the same parameters as listed before, and I get this error:

Encountered an Error. Message: IPyradError: [bwa_index] Pack FASTA... [gzread] Is a directory

Then when I tried the denovo+reference assembly method, I got this error:

Encountered an Error. Message: datatype + assembly_method combo not currently supported.

I have also tried trimming the reads beforehand with BBduk, based on a quality score. Any help would be appreciated, Paolo
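
The `[gzread] Is a directory` text reads like what an indexer reports when it is handed a directory where it expected a FASTA file. That reading is an assumption and is not confirmed anywhere in this thread. Under that assumption, a quick sanity check of the path entries in a params file could look like the sketch below; `read_params` and `check_paths` are hypothetical helpers, and the parsing relies only on the `value ## [index] [name]: description` layout visible in the params file posted above.

```python
import glob
import os


def read_params(path):
    """Parse 'value ## [index] [name]: description' lines into {name: value}."""
    params = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if "##" not in line:  # skips the dashed header line
                continue
            value, _, comment = line.partition("##")
            # comment looks like ' [6] [reference_sequence]: Location of ...'
            parts = comment.split("]")
            if len(parts) < 3:
                continue
            name = parts[1].strip().lstrip("[")
            params[name] = value.strip()
    return params


def check_paths(params):
    """Return a list of human-readable problems with path-like parameters."""
    problems = []
    ref = params.get("reference_sequence", "")
    if ref and os.path.isdir(ref):
        problems.append(f"reference_sequence is a directory, not a file: {ref}")
    elif ref and not os.path.isfile(ref):
        problems.append(f"reference_sequence does not exist: {ref}")
    fastqs = params.get("sorted_fastq_path", "")
    if fastqs and not glob.glob(fastqs):  # the value may be a glob like *.fastq
        problems.append(f"sorted_fastq_path matches no files: {fastqs}")
    return problems
```

Printing `check_paths(read_params("params-subset.txt"))` (file name hypothetical) before starting a run would flag a reference_sequence entry that points at a folder rather than at a single FASTA file.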

isaacovercast · 2021-02-16T09:42:01Z

Regarding my previous message ("Wait, when you say "it has crashed" you mean the server crashes? I thought you were talking about the ipyrad process. If the server is crashing that is a hardware issue my friend."): was the original crash a hardware failure?

If you want help you need to provide more specific information: which samples you retained for the single-library analysis, which parameters you used in your params file, which step the assembly crashed in, and all of the output from the assembly run, including the full error message.

When you say you tried to trim the reads beforehand: how did you do this, what were the exact settings you used for trimming, did you run ipyrad after trimming, and if so, what happened and how was it different from the pre-trimming behavior?

This message is self-explanatory: "Message: datatype + assembly_method combo not currently supported."

This is evolving away from a very specific error in ipyrad to more of a help thread, which is more appropriate for the gitter channel: https://gitter.im/dereneaton/ipyrad
