STARlong terminating after throwing an instance of 'std::out_of_range' #85

Closed
lucventurini opened this issue Nov 12, 2015 · 7 comments

@lucventurini

Hi,
Unfortunately I have found another bug, again on wheat. This particular crash happens with the new wheat genome assembly (see here: http://www.tgac.ac.uk/news/241/68/The-Genome-Analysis-Centre-announces-an-important-milestone-in-wheat-research/).

Command line:

--runMode alignReads --outSAMtype BAM Unsorted --readNameSeparator space --genomeDir /tgac/workarea/users/venturil/Private/Wheat/NewRelease/Reference/ --runThreadN 16 --readFilesIn reads.fa
Starting program: /tgac/software/testing/star/2.5.0a/x86_64/bin/STARlong --runMode alignReads --outSAMtype BAM Unsorted --readNameSeparator space --genomeDir /tgac/workarea/users/venturil/Private/Wheat/NewRelease/Reference/ --runThreadN 16 --readFilesIn reads.fa

Detailed stack trace obtained by running said command from inside GDB:

terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 735943)

Program received signal SIGABRT, Aborted.
0x00002aaaabd0f8a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00002aaaabd0f8a5 in raise () from /lib64/libc.so.6
#1 0x00002aaaabd11085 in abort () from /lib64/libc.so.6
#2 0x00002aaaab15e115 in __gnu_cxx::__verbose_terminate_handler () at ../../../../gcc-4.9.1/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00002aaaab15c176 in __cxxabiv1::__terminate (handler=) at ../../../../gcc-4.9.1/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x00002aaaab15c1c1 in std::terminate () at ../../../../gcc-4.9.1/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00002aaaab15c3d8 in __cxxabiv1::__cxa_throw (obj=0x70e860, tinfo=0x2aaaab3f3670, dest=0x2aaaab175750 std::out_of_range::~out_of_range()) at ../../../../gcc-4.9.1/libstdc++-v3/libsupc++/eh_throw.cc:87
#6 0x00002aaaab1b768f in std::__throw_out_of_range_fmt (__fmt=) at ../../../../../gcc-4.9.1/libstdc++-v3/src/c++11/functexcept.cc:101
#7 0x0000000000443d02 in Junction::outputStream(std::ostream&, Parameters*) ()
#8 0x0000000000445446 in outputSJ(ReadAlignChunk*, Parameters) ()
#9 0x0000000000413124 in main ()
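
A note on the number in the exception message: 18446744073709551615 is 2^64 - 1, which is what -1 becomes when it is converted to a 64-bit unsigned index. The sketch below only reproduces that arithmetic with a bounds-checked std::vector::at call on a vector of the same size as in the message. The helper names are hypothetical and this is not STAR code; it does not show what Junction::outputStream actually does, only that the reported index is consistent with a -1 or "not found" value being used as a vector index.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from STAR): returns true when a bounds-checked
// lookup with the given signed index throws std::out_of_range.
bool lookupThrows(const std::vector<int>& table, long long signedIndex) {
    // Converting a negative signed value to size_t wraps modulo 2^64 on a
    // 64-bit platform, so -1 becomes 18446744073709551615.
    std::size_t index = static_cast<std::size_t>(signedIndex);
    try {
        (void)table.at(index);  // at() checks index < size() and throws otherwise
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// The value of -1 reinterpreted as an unsigned 64-bit integer.
std::uint64_t wrappedMinusOne() {
    return static_cast<std::uint64_t>(-1LL);
}
```

Calling lookupThrows on a vector of 735943 elements with index -1 takes the same std::out_of_range path as in the trace above, while index 0 does not.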

I attach a ZIP file with the Log
Debug_wheat.txt

Unfortunately, the whole genome itself and its indices are far less portable than the 3B chromosome, and the core dump is ~70GB.

@lucventurini
Author

To be clear, this also happens with the released statically compiled binaries, not just those compiled on our machine. With those, the output from gdb is terser:

(gdb) run --runMode alignReads --outSAMtype BAM Unsorted --outSAMattributes NH HI NM MD AS XS jM jI --readNameSeparator space --outFilterMultimapScoreRange 1 --outFilterMismatchNoverLmax 0.05 --scoreGapNoncan -20 --scoreGapGCAG -4 --scoreGapATAC -8 --scoreDelOpen -1 --scoreDelBase -1 --scoreInsOpen -1 --scoreInsBase -1 --alignEndsType Local --seedSearchStartLmax 50 --seedPerReadNmax 100000 --seedPerWindowNmax 1000 --alignTranscriptsPerReadNmax 100000 --alignTranscriptsPerWindowNmax 10000 --genomeDir /tgac/workarea/users/venturil/Private/Wheat/NewRelease/Reference/ --runThreadN 49 --readFilesIn /tgac/workarea/users/venturil/Private/Wheat/Reads/PacBio/FirstBatch/IsoSeq/ReadsOfInsert/A02_1/data/isoseq_flnc.fasta

Starting program: /tgac/software/testing/star/2.5.0a/src/STAR-STAR_2.5.0a/bin/Linux_x86_64_static/STARlong --runMode alignReads --outSAMtype BAM Unsorted --outSAMattributes NH HI NM MD AS XS jM jI --readNameSeparator space --outFilterMultimapScoreRange 1 --outFilterMismatchNoverLmax 0.05 --scoreGapNoncan -20 --scoreGapGCAG -4 --scoreGapATAC -8 --scoreDelOpen -1 --scoreDelBase -1 --scoreInsOpen -1 --scoreInsBase -1 --alignEndsType Local --seedSearchStartLmax 50 --seedPerReadNmax 100000 --seedPerWindowNmax 1000 --alignTranscriptsPerReadNmax 100000 --alignTranscriptsPerWindowNmax 10000 --genomeDir /tgac/workarea/users/venturil/Private/Wheat/NewRelease/Reference/ --runThreadN 49 --readFilesIn /tgac/workarea/users/venturil/Private/Wheat/Reads/PacBio/FirstBatch/IsoSeq/ReadsOfInsert/A02_1/data/isoseq_flnc.fasta

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Nov 13 13:35:32 ..... Started STAR run
Nov 13 13:35:32 ..... Loading genome
Nov 13 13:39:07 ..... Started mapping
[New Thread 0x2b01871ff700 (LWP 585953)]

[.. threads starting .. ]

[Thread 0x2b0189410700 (LWP 585970) exited]

[ .. threads exiting .. ]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b018ce2d700 (LWP 585999)]
0x0000000000430158 in ReadAlign::multMapSelect() ()

@lucventurini
Author

Also to be clear, the problem seems to be triggered by the size of the database. If I test STAR on 4 non-overlapping subsections that together cover all of the data, i.e. the 3 subgenomes and the unassigned fraction, STAR performs as expected on each of them.

@lucventurini
Author

One thing: I just rechecked the logs and realized that there might be something amiss with the indices. A previous version of this genome that we were using led STAR to create a Suffix Array with 12,935,398,406 (~13 billion) indices, whereas the current version contains 25,315,452,282 (~25 billion).

Might this be at the root of the error?
I am trying to reindex using the parameter "--genomeSAsparseD 2", which should halve the number of indices (if I understand it correctly).
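
As a rough check of that expectation, the sketch below computes the approximate number of Suffix Array entries for a given sparsity value. It assumes that the entry count scales as 1/D, i.e. that only every D-th position is indexed; the function name is hypothetical and the real count produced by STAR may differ slightly. Under that assumption, the current 25,315,452,282 entries would drop to about 12.7 billion with D = 2 and about 8.4 billion with D = 3.

```cpp
#include <cstdint>

// Hypothetical helper (not from STAR): approximate number of suffix array
// entries when only every sparseD-th position is indexed.
// Assumption: the entry count scales as 1 / sparseD.
std::uint64_t sparseEntries(std::uint64_t denseEntries, std::uint64_t sparseD) {
    // Ceiling division, so a partial final stride still counts as one entry.
    return (denseEntries + sparseD - 1) / sparseD;
}
```

With D = 2 the estimate lands close to the ~13 billion entries of the previous genome version.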

@lucventurini
Author

Submitted a pull request dealing with this bug.

@alexdobin
Owner

Hi Luca,

there is something strange about the genome index size, could you please send me the Log.out file of the genome generation run?

Cheers
Alex

@lucventurini
Author

Sure thing, here it is (in gzip format). Incidentally, I think I will try your new commits for STARlong, as the pull request I submitted seems to work only for Illumina reads.

Update: I tested the new commits; unfortunately, the bug is still present for PacBio reads.

Cheers

Log.out.gz.txt

@alexdobin
Owner

Hi Luca,

The Log.out from the genome generation step uses the default --genomeSAsparseD; however, the mapping was done with a genome that had --genomeSAsparseD 3. It also seems that the genome generation was done with one of the "2.5.0a_alpha" patches.
Could you please generate the genome and map with the latest patch from the master (please note which one you used) and send me both Log.out files?
It seems that this genome is not publicly available yet, which makes it harder to debug.

Cheers
Alex
