Crashed building contigs #50
The "Ran out of memory while applying" error comes from the ckalloc function in the file standardPregraph/check.c. The ckalloc function takes a single parameter (size) of type "unsigned long long", so I don't think ckalloc itself is the problem; instead, some code elsewhere calling ckalloc is likely the culprit, but I don't have a stack trace from you. Would it be possible for you to trace down which caller is responsible? The problem could probably be solved as easily as changing the type of the variable that stores the size of memory to be allocated to 64-bit.
Just quickly looking at the check.c code, I can see that ckrealloc calls ckalloc with ‘new_size’, which is a size_t; if that is defined as an unsigned int even on 64-bit systems, maybe that’s the issue, as ckalloc is expecting an unsigned long long. Will have a poke at it some more tomorrow.
Shane
I did two checks to be sure. size_t is indeed 8 bytes, just like unsigned long long, so that's not the problem. It turns out the issue is twofold. The error is displaying a signed long long (%lld) at line 132 of check.c, so changing that to an unsigned long long (%llu) reveals the true number, which is 18446744060029749504 bytes, in other words about 18446744 TB, so my pathetic 6TB isn't going to do it. I've obviously got a lot of input data here, but my suspicion is the use of 250bp for max_rd_len, because I've previously run this assembly with just 150bp max_rd_len and about 20% less input data, and it completed within 2TB of RAM. I'm running again now with max_rd_len set back to 150bp and I'll see what happens. It is clearly loading the data more quickly: I only started it yesterday and it has already loaded half the data.
I'm still a bit confused about how the problem could be caused by changing %lld to %llu, since it's a cast, not a pointer dereference, but please let me know if you want to propose a fix to the code. And please let me know how your new run with max_rd_len set to 150bp goes. Thank you.
I don’t think there’s a cast problem, as size_t and unsigned long long are the same size. If it were me, I would explicitly specify unsigned long long throughout rather than using size_t. The print statement giving a negative number is a minor bug which would be unlikely to be seen by anyone who isn’t dumb enough to set max_rd_len to 250. My current run with 150bp is moving about 4x faster, so it should hit the point where it previously crashed within a few days, and then we’ll see.
I was wondering whether switching to sparse_pregraph might allow me to use the 250bp reads in their entirety, or if the contig building step is still going to try to allocate a massive amount of RAM?
Shane
Regarding sparse_pregraph: its memory efficiency depends very much on the complexity of the genome you are assembling. At this point I strongly suggest you stick to pregraph. If it doesn't work with 150bp either, I would suggest using Megahit to create the contigs first. Megahit uses about 4x less memory than SOAPdenovo, and the contigs can then be further assembled into scaffolds using the finalfusion module in SOAPdenovo.
The genome is highly repetitive (around 80%) but also very large. I previously assembled it using a subset of the data, with 150bp PE reads plus the jumping libraries, but the result was quite fragmented, producing 31 million scaffolds. I had less memory at the time, and with more I thought I could be more ambitious, but I think I'll dial it back to closer to the successful run and build up from there. I'll have a look at Megahit and finalfusion if this current run doesn't get past the contigs. Thanks!
With max_rd_len set to 150 and nothing else changed, the pregraph step completed fine and contig building is now running without issue.
Thank you very much, this solved my problem.
I set max_rd_len to 150, but I still have this problem with K=29, yet no error with K=127. What could the problem be?
I'm trying to do a very large de novo assembly on a machine with 6TB of RAM. The input data is a mix of 150 and 250bp PE and MP reads, and the pregraph step used 4TB of RAM but completed without issue. Once it moved to the contig step, it immediately crashed with the following error:

```
Parameters: contig -g Prad_HiSeq -M 1 -R -s Prad_HiSeq.config -p 72
There are 859994916 kmer(s) in vertex file.
There are 1659176686 edge(s) in edge file.
Ran out of memory while applying -13679802112bytes
There may be errors as follows:
```
The negative value looks like a signed variable overflowing to me. I believe we have enough RAM.