Step 3 gets to 85% and crashes. #437
Hello,
Thanks for getting back to me,
We have 256 GB of RAM, and htop does not indicate that we are going into swap;
we are not even using half of the RAM. We have 32 cores, and I allocated 19 of
them. The server has crashed all three times, and I am starting to think it is
not a coincidence.
Thanks for any insight you can provide. I am trying to get some assemblies, and
I would be interested in hearing about any other programs you recommend. I might
set up individual runs with the specific species reference.
Paolo
On Sun, Feb 14, 2021 at 10:16 AM Isaac Overcast ***@***.***> wrote:
Hello P,
What makes you think the memory isn't an issue? The number of reads per
sample and the locus length both contribute to consuming a TON of RAM
during step 3. How much RAM do you have, and how many cores? Also, what is
the exact text of the error message?
--
*Paolo Marra-Biggs*
*University of Hawaii at Mānoa*
*Marine Biology MSc. Student*
*ToBo Lab*
*Phone: 1 (831) 254-3780*
Hey Isaac,
The internet connection has gone down at the lab (HIMB), so I will send it
once I have access.
Thanks
On Sun, Feb 14, 2021 at 10:26 AM Isaac Overcast ***@***.***> wrote:
The long reads can also cause massive problems during clustering and
alignment, given that the distal ends of R1 and R2 can have very high
error rates. You might consider looking at the FastQC results for a
couple of your samples and using the trim_reads parameter in step 2 to
trim off the regions with very low base quality.
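A minimal sketch of the kind of per-position quality check suggested above, for seeing where the low-quality read ends begin before choosing trim_reads values. This is not FastQC and not part of ipyrad; the helper names are hypothetical, and it assumes strict 4-line FASTQ records (plain or gzipped) with the Phred offset of 33 set in param [10]. It averages quality at each position over the first reads of one file (R1 or R2) and reports the first position whose mean falls below Q20. How that position maps onto the four trim_reads values (R1>, <R1, R2>, <R2) should be checked against the ipyrad documentation.

```python
import gzip
from typing import Iterable, List, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def mean_quality_by_position(
    lines: Iterable[str], offset: int = 33, max_reads: int = 100_000
) -> List[float]:
    """Mean Phred score at each read position over the first max_reads records."""
    totals: List[int] = []
    counts: List[int] = []
    for i, line in enumerate(lines):
        if i // 4 >= max_reads:
            break
        if i % 4 != 3:  # the quality string is the 4th line of each 4-line record
            continue
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


def first_low_quality_position(means: List[float], threshold: float = 20.0) -> int:
    """0-based index of the first position with mean quality below threshold.

    Returns len(means) if every position passes.
    """
    for pos, mean_q in enumerate(means):
        if mean_q < threshold:
            return pos
    return len(means)
```

For a hypothetical file, `with open_fastq("sample_R1.fastq.gz") as fh: print(first_low_quality_position(mean_quality_by_position(fh)))`. Sampling only the first reads keeps this fast, but it may miss quality patterns that appear later in a sequencing run.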
Hey Isaac,
I can't confirm whether it failed because of the script or the power supply. We
have UPSs for this very reason and recently had them replaced.
What are the file outputs after step 3?
I have the following folders: Tridacna-tmpalign, Tridacna_clust_0.85, and
Tridacna_edits.
There is no log file or error message: the server crashes while the job is
running inside a GNU screen session, so the error log is never displayed. Is
there a way to specify in the params file that a log file and an error file
should be created?
Thanks,
Paolo
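Independent of any params-file setting, one way to keep a run's output when the screen session disappears is to launch it from a small wrapper that sends stdout and stderr straight to files on disk. A minimal sketch: `run_logged` is a hypothetical helper, and the command in the usage note assumes ipyrad's `-p` (params file), `-s` (steps), and `-c` (cores) flags with an illustrative params file name.

```python
import os
import subprocess
from typing import List


def run_logged(cmd: List[str], stdout_path: str, stderr_path: str) -> int:
    """Run cmd with stdout and stderr redirected to separate log files.

    Returns the command's exit code.
    """
    # Ask Python-based child processes not to block-buffer their output,
    # so the logs stay as current as possible if the machine goes down.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        proc = subprocess.run(cmd, stdout=out, stderr=err, env=env)
    return proc.returncode
```

For example: `run_logged(["ipyrad", "-p", "params-Tridacna-1_26_21.txt", "-s", "3", "-c", "19"], "step3.out.log", "step3.err.log")`. The logs survive a closed screen session; after a hard server crash, whatever was still buffered at that instant may be missing.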
Wait, when you say "it has crashed," do you mean the server crashes? I thought you were talking about the ipyrad process. If the server is crashing, that is a hardware issue, my friend. Did you consider this: "You might consider looking at the results of FastQC for a couple of your samples and using the trim_reads parameter in step 2 to trim off the regions with very low base quality."
Hey Isaac,
I tried running a subset of the samples, using just one library and the
same parameters as before, and I got this error:
Encountered an Error.
Message: IPyradError: [bwa_index] Pack FASTA... [gzread] Is a directory
Then, when I tried the denovo+reference assembly method, I got this
error:
Encountered an Error.
Message: datatype + assembly_method combo not currently supported.
I have also tried trimming the reads beforehand with BBDuk, based on
quality score.
Any help would be appreciated,
Paolo
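One possible reading of the first error: `[gzread] Is a directory` suggests bwa was handed a directory where it expected a FASTA file, which could happen if a reference path in the params file ([6] reference_sequence or [29] reference_as_filter) points at a folder. This is an inference from the message, not a confirmed diagnosis. A minimal pre-flight check, using a hypothetical helper name:

```python
import os


def check_reference_path(path: str) -> str:
    """Return path if it names a readable regular file; raise otherwise.

    Hypothetical pre-flight check: bwa index expects a FASTA file, so a
    directory (or a missing/unreadable path) is rejected before any run starts.
    """
    if not path:
        raise ValueError("reference path is empty")
    if os.path.isdir(path):
        raise IsADirectoryError(f"{path} is a directory, not a FASTA file")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} does not exist")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"{path} is not readable")
    return path
```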
Regarding my previous message ("Wait, when you say 'it has crashed,' do you mean the server crashes? I thought you were talking about the ipyrad process. If the server is crashing, that is a hardware issue, my friend."): was the original crash a hardware failure? If you want help, you need to provide more specific information: which samples you retained for the single-library analysis, what parameters you used in your params file, which step the assembly crashed in, all of the output from the assembly run, and the full text of the error message. When you say you tried to trim the reads beforehand, how did you do this? What exact settings did you use for trimming? Did you run ipyrad after trimming, and if so, what happened, and how was it different from the pre-trimming behavior? This message is self-explanatory: "Message: datatype + assembly_method combo not currently supported." This is evolving away from a very specific error in ipyrad into more of a general help thread, which is more appropriate for the Gitter channel: https://gitter.im/dereneaton/ipyrad
Hello, I am running ipyrad on ezRAD data to assemble consensus sequences within a single genus. The species are believed to be deeply divergent. I have been running a denovo assembly (no reference) on all of my samples, but every time (three times now) it has crashed after ~9 days, at about the 85% point of step 3. Very frustrating.
Below is my params file. Memory shouldn't be an issue.
I have 21 samples, each with between 2.5 million and 10 million reads that are 300-750 bp long.
I would love any insight. I have spent a whole month on runs that crashed...
HALLLLP!
Thanks.
P
------- ipyrad params file (v.0.9.62)-------------------------------------------
Tridacna-1_26_21 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/data/paolo/Tridacna_mt_phylogeny/Paired_Reads ## [1] [project_dir]: Project dir (made in curdir if not present)
## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
## [3] [barcodes_path]: Location of barcodes file
/data/paolo/Tridacna_mt_phylogeny/Paired_Reads/*.fastq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)
## [6] [reference_sequence]: Location of reference sequence file
rad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
GATC, ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, l ## [27] [output_formats]: Output formats (see docs)
## [28] [pop_assign_file]: Path to population assignment file
## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3
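When troubleshooting, it can help to read the params file programmatically, for instance to confirm that [6] reference_sequence is empty for a denovo run or to check the trim_reads values. A minimal sketch assuming the `value ## [index] [name]: description` layout shown above; `parse_params` and `parse_int_list` are hypothetical helpers, not part of ipyrad's API.

```python
import re
from typing import Dict, List

# Matches "value   ## [index] [name]: description"; the value may be empty.
PARAM_LINE = re.compile(r"^(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>\w+)\]")


def parse_params(text: str) -> Dict[str, str]:
    """Map each parameter name to its raw value string ("" when left blank)."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:  # header and other non-parameter lines are skipped
            params[match.group("name")] = match.group("value").strip()
    return params


def parse_int_list(value: str) -> List[int]:
    """Turn a comma-separated value such as "0, 0, 0, 0" into integers."""
    return [int(item) for item in value.replace(",", " ").split()]
```

For the file above, `parse_params(text)["reference_sequence"]` returns `""` and `parse_int_list(parse_params(text)["trim_reads"])` returns `[0, 0, 0, 0]`, i.e. no read trimming is currently configured.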