Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault when running STARsolo SmartSeq #1040

Closed
wir963 opened this issue Sep 23, 2020 · 38 comments
Closed

Segmentation Fault when running STARsolo SmartSeq #1040

wir963 opened this issue Sep 23, 2020 · 38 comments
Labels
issue: code Likely to be an issue with STAR code

Comments

@wir963
Copy link

wir963 commented Sep 23, 2020

Hey Alex,

I'm trying to transition from running STAR individually on Smart-seq2 data to running STARsolo --soloType SmartSeq. Unfortunately, I'm getting a segmentation fault. I would be surprised if the error is due to the genome or the fastq files as I have run them through STAR (not STARsolo) previously. I'm using STAR 2.7.6a installed from bioconda. See below for the output and let me know if you have any questions. The error occurs ~10-15 minutes after "started mapping" is logged.

Sep 23 09:18:26 ..... started STAR run
Sep 23 09:18:27 ..... loading genome
Sep 23 09:18:54 ..... processing annotations GTF
Sep 23 09:19:16 ..... inserting junctions into the genome indices
Sep 23 09:23:23 ..... started mapping
/usr/bin/bash: line 1:  9315 Segmentation fault      STAR --runThreadN 16 --soloType SmartSeq --readFilesManifest data/BC08-manifest.tsv --soloUMIdedup Exact --soloStrand Unstranded --genomeDir 'output/star-index' --outSAMunmapped Within --readFilesCommand zcat --sjdbGTFfile raw/genome/gencode.v34.chr_patch_hapl_scaff_RNAspike.gtf --sjdbOverhang 99

Thanks, Welles

@wir963
Copy link
Author

wir963 commented Sep 23, 2020

Quick update - I looked at the Log.out file and found the following at the end of the file

Starting to map file # 12
mate 1:   FASTQ/trimmed/BC08-SRR5023576_1.fastq.gz
mate 2:   FASTQ/trimmed/BC08-SRR5023576_2.fastq.gz

There are 22 total files so it must have crashed in the middle of the mapping step

@wir963
Copy link
Author

wir963 commented Sep 23, 2020

@alexdobin another update for you. When I exclude the above sample, STARsolo completes successfully. When I re-add this sample, it fails with the same segmentation fault error. I checked and I was able to process the exact same set of fastq files using STAR in regular mode so I would be surprised if there is a problem with the files but I could check. I'm including the first four lines from each of the fastq files if that helps and I'm happy to share the entire files as they're publicly available.

FASTQ/trimmed/BC08-SRR5023576_1.fastq.gz

@SRR5023576.1001 1001 length=100
GGGTAGATTTGTATGAAGAGATGGGAAAATATGACTATCATATACAAATTTAAAAAAGCAGATTGTGGCCAGGCGCAGTGGCTCATGCCTGTAATCCTAG
+SRR5023576.1001 1001 length=100
GAGGGGGGGIGGGGGGIGGIGGGGGGGIGIGGG.GGGGGIIIIGGIGIIIIIGIGIIIGIGGGIGGIIGGGIIIIIIAAGGGGGGIGGGGGGGGGGGGIG

FASTQ/trimmed/BC08-SRR5023576_1.fastq.gz

@SRR5023576.1001 1001 length=100
TTGCCATGTTGGCCAGGCTGGTCTCTAACTCCTGACCTCAGGTGATCCACTCACCTTAGCCTCCCAAAGTGCTAGGATTACAGGCATGAGCCACTGCGCC
+SRR5023576.1001 1001 length=100
AGAGGGGGIGIGIIIGGGIIIAGGIGIIIGGGGGGIGGGIIGIIGGGGIGGGIIIIIIGGIGGGGGGGGGGGIIGIIAGGGGIGGGIGIIIGGGGIGGGG

@alexdobin
Copy link
Owner

Hi Welles,

does it fail when you run just this file alone in the --readFilesManifest?
I will get this file from SRA and try to map it.

Cheers
Alex

@alexdobin alexdobin added the issue: code Likely to be an issue with STAR code label Sep 26, 2020
@wir963
Copy link
Author

wir963 commented Sep 28, 2020

Hey @alexdobin,

I'm in the middle of regenerating all of the files so I'll confirm in a few minutes hopefully but yes, it did cause an error when I ran the file alone in the --readFilesManifest.

Best, Welles

@wir963
Copy link
Author

wir963 commented Sep 28, 2020

Hey @alexdobin,

I can confirm that I can run STARsolo for all samples belonging to patient BC08 except for SRR5023576. When I create a manifest file containing just SRR5023576, I get the above error. Moreover, I think the bug is specific to STARsolo because I can run the exact same code removing the STARsolo specific commands using a manifest file containing just SRR5023576, it works.

STARsolo code that fails for SRR5023576

STAR 
--runThreadN {threads} 
--soloType SmartSeq
--readFilesManifest {input[1]} 
--soloUMIdedup Exact --soloStrand Unstranded 
--genomeDir '{input[0]}' 
--outSAMunmapped Within 
--readFilesCommand zcat 
--outSAMtype BAM Unsorted 
--outFileNamePrefix '{params.odir}' 

STAR code that works for SRR5023576

STAR 
--runThreadN {threads} 
--readFilesManifest {input[1]} 
--genomeDir '{input[0]}' 
--outSAMunmapped Within 
--readFilesCommand zcat 
--outSAMtype BAM Unsorted 
--outFileNamePrefix '{params.odir}' 

Best, Welles

@alexdobin
Copy link
Owner

Hi Welles,

can you try to run with the same sample in the manifest.tsv, i.e.

SRR5023576_1.fastq      SRR5023576_2.fastq      SRR5023576
SRR5023576_1.fastq      SRR5023576_2.fastq      SRR5023576

I have tried it and did not get any errors.
If it still seg-faults, please send me the Log.out file and links to the annotation FASTA and GTF.

Cheers
Alex

@wir963
Copy link
Author

wir963 commented Sep 30, 2020

Hey Alex,

Hmmm that's weird. I am using trimmed reads so perhaps that is the cause. I'll re-run with the untrimmed reads for that sample and let you know what happens. Thanks again for your support.

Best, Welles

@wir963
Copy link
Author

wir963 commented Oct 27, 2020

Hey Alex,

Sorry for the delay in getting back to you. I just retried using the freshly downloaded reads from SRA and I still get a Segmentation fault. I repeated this using SRR5023575 and it worked so I don't believe it is an obvious error on my part.

I am currently using the reference human genome (HG38.p13 gencode release 34) (not the primary assembly because I am more interested in filtering out all human reads than accurately quantifying the reads) concatenated with RNA spike-in sequences RNA_SPIKE_1, RNA_SPIKE_4 and RNA_SPIKE_7 (which I downloaded via https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5466/E-MTAB-5466.additional.1.zip and filtered for the three sequences of interest and hand-crafted a gtf file). On a side note, does STAR care about whether the letters for fasta sequences are capitalized or lower case?

Here's the log file - Log.out.txt. Let me know how I can be of help. I'm happy to re-run it on my end using a different genome if that's easier - just let me know.

Best, Welles

@alexdobin
Copy link
Owner

Hi Welles,

thanks!
I have mapped to the "ALL" genome and annotations, with the added spike-ins - still no seg-fault.
One last thing that is different is the GTF for spike-ins, could you please send me that file.

The lower-case letter in FASTA are no problem, they are still converted to proper bases.

Cheers
Alex

@wir963
Copy link
Author

wir963 commented Nov 2, 2020

Hey Alex,

Thanks for doing that. I'm attaching the GTF file for spike-ins
spike_in_sequences.gtf.txt. I only put it as a txt file so that GitHub would support it. Let me know if you can reproduce this issue using this. Otherwise, I'm going to be very confused.

Best, Welles

@alexdobin
Copy link
Owner

Hi Welles,

thanks for the gtf, I generated the genome with it, and still cannot reproduce the seg-fault.
Moreover, I run it through valgrind, and it did not find any problems.

Could you investigate it a bit further?

  1. Try to run it with the static executable STAR-2.7.6a/bin/Linux_x86_64_static/STAR
  2. If it still seg-faults, can you check md5 sums of the genome files and SRR fastqs, as well as STAR executable, to make sure that we are running on exactly the same data. I am attaching the md5 sum files.
    md5all.zip

Cheers
Alex

@wir963
Copy link
Author

wir963 commented Nov 6, 2020

Hey Alex,

I ran it with the static executable and got the same error. All of the md5 sums that you sent over matched except for the md5sum for genomeParameters.txt (you provided 003d29a3e3d40306a809e684854d2a07 but I got 1bb861713ae7fc7af2d3574e9d82767f). I'm guessing this means that it may be due to a difference in the generation of the STAR genome so here's the code that I used to generate the genome.

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir output/star-index --genomeFastaFiles <human + spikein fasta_file> --sjdbGTFfile <human + spikein GTF file>

Am I making a mistake above? Did you use different code? The above was generated using the bioconda installation of star and I am happy to regenerate the index from the static binary if you don't see an obvious difference.

Thanks again for all of your help on this.

Best, Welles

@alexdobin
Copy link
Owner

Hi Welles,

thanks for the checks!
The genomeParameters files can differ (e.g. because you concatenated FASTAs and I listed them separately) - but the actual indexes will still be the same. Could you post your genomeParameters.txt file?

If you are willing to investigate this further, we will need to figure out which reads cause the problem. Could you run the same example with --readMapNumber 1000 to see if the seg-fault occurs for a very small number of reads?

Thanks!
Alex

@wir963
Copy link
Author

wir963 commented Nov 13, 2020

Hey Alex,

Yeah, glad to help (and very curious).
Here is the file - genomeParameters.txt. I ran with --readMapNumber 1000 and it worked (no seg-fault). Should I keep doing a binary search for the read causing this error?

Best, Welles

@wir963
Copy link
Author

wir963 commented Nov 13, 2020

Hey Alex,

I believe the error is in read 12,662 as --readMapNumber 12661 works but I get a seg-fault for --readMapNumber 12662. Given that you are not getting the error, how should we debug?

Best, Welles

@wir963
Copy link
Author

wir963 commented Nov 13, 2020

Hey Alex,

Here is line 12,662 from SRR5023576_1.fastq

@SRR5023576.12662 12662 length=100
GTATTAAGTATTACTCTATTATTCTGGAATTTAACAATGAATAAGCACATATTTTAAAGGCCACACCTACTAAGAAACAGATACAAACTGTAATTTCTAA
+SRR5023576.12662 12662 length=100
GGGGGIIIGGIIIIIIIIIIGGIGIIIIIIIIIIIIIIIIGIIIIGIIIIIIIIIGGGGGIIGIIIIGGIIIIIIIIIIIGGIIIIIIIIGGGIGIIIIG

Here is line 12,662 from SRR5023576_2.fastq

@SRR5023576.12662 12662 length=100
GGCACACACCACCACATCCAGCTTATTTTTGTATTTTTAGTAGAAACGGGGTTTCACCATGTTGGCCGGGCTGGCCTCGAACTCCTGACCCCAGGTGATC
+SRR5023576.12662 12662 length=100
GAAGGIIIIGGIIIGIIIGIIIIIIIIIGIGGGIIIIIGIIGIGGGGGGGGIGGGIIAGGGIIGGIIIIAGIIGGGGIGGGIGGGGGGGGG<GGGGGIGG

If I delete these lines from the fastq file and run STAR with --readMapNumber 12662 (which previously failed), then it works. However, if I run STAR with --readMapNumber 20000, then I get a segmentation fault. Any suggestions?

Best, Welles

@alexdobin
Copy link
Owner

alexdobin commented Nov 13, 2020

Hi Welles,

thanks a lot for these tests, I think we are zeroing on a problem. Could do you a couple of more things?

  1. Map just this problematic read alone - does it seg-fault?
  2. Compile the code without optimization, with
make clean
make gdb

from inside the source/ directory

Cheers
Alex

@wir963
Copy link
Author

wir963 commented Nov 13, 2020

Hey Alex,

  1. When I map this "problematic" read alone, it does not seg-fault.
  2. I compiled the code per your instructions and ran STAR-2.7.6a/source/STAR with --readMapNumber 20000 and it did seg-fault.

Best, Welles

@alexdobin
Copy link
Owner

Thanks! Now, can you run the debugger with

gdb --args /path/to/STAR-2.7.6a/source/STAR <all options>

and at the prompt type r - it will start running and it will stop with seg-fault. bt will tell us where the error happens.

@wir963
Copy link
Author

wir963 commented Nov 13, 2020

Hey Alex,

Here is the output of bt

#0  0x000000000042d2fe in alignToTranscript (aG=..., trS1=689108255, exN1=1, exSE1=0x2ab1f9499df0, 
    exLenCum1=0x2ab1f9ece700, distTrEnds=...) at Transcriptome_classifyAlign.cpp:47
#1  0x000000000042da56 in Transcriptome::classifyAlign (this=0x3e64cb0, alignG=0x2ab209690dd0, nAlignG=2, 
    readAnnot=...) at Transcriptome_classifyAlign.cpp:202
#2  0x000000000049b36f in ReadAlign::outputAlignments (this=0x2ab20968e010)
    at ReadAlign_outputAlignments.cpp:128
#3  0x00000000004e49e4 in ReadAlign::oneRead (this=0x2ab20968e010) at ReadAlign_oneRead.cpp:101
#4  0x00000000004d9a40 in ReadAlignChunk::mapChunk (this=0x3e64a00) at ReadAlignChunk_mapChunk.cpp:25
#5  0x00000000004d8be0 in ReadAlignChunk::processChunks (this=0x3e64a00)
    at ReadAlignChunk_processChunks.cpp:243
#6  0x000000000051460b in ThreadControl::threadRAprocessChunks (RAchunk=0x3e64a00) at ThreadControl.h:22
#7  0x00002aaaab931e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00002aaaabc44bad in clone () from /lib64/libc.so.6

Best, Welles

@alexdobin
Copy link
Owner

Thanks, Welles!
Sorry, I have to ask you to debug remotely...
Could you do

p ex1

and then

fr 1
p tr1
p trExI[tr1]

@wir963
Copy link
Author

wir963 commented Nov 13, 2020

Haha this is a new experience but sure happy to help. If everything is identical, do you know why you can't reproduce it? Is it OS specific?

p ex1 returns $1 = 2147483647

fr 1 returns

#1  0x000000000042da56 in Transcriptome::classifyAlign (this=0x3e64cb0, alignG=0x2ab209690dd0, nAlignG=2, 
    readAnnot=...) at Transcriptome_classifyAlign.cpp:202
202	            int aStatus=alignToTranscript(aG, trS[tr1], trExN[tr1], exSE+2*trExI[tr1], exLenCum+trExI[tr1], distTrEnds);

p tr1 returns $2 = 50817

p trExI[tr1] returns $3 = 323004

Best, Welles

@alexdobin
Copy link
Owner

Most likely this is a bug in the code that depends on uninitialized memory. This is one of the main insecurities of C/C++: if a variable is not explicitly initialized, its value can be anything. When you run it, this random value causes the seg-fault.
One way to look for these errors is to use valgrind tool, which actually checks all memory accesses. However, it did not report any errors, which is a puzzle.

Many thanks for these tests! ex1=2147483647 is what causes the seg-fault, so we pretty much localized the problem.
I think I have enough info now to dig into the read that causes the problem - this run included this read, right?

@wir963
Copy link
Author

wir963 commented Nov 13, 2020

Okay, good to know - thanks for the explanation. Yep, this run used the original fastq files.

Do you roughly know how long this fix might take?

@alexdobin
Copy link
Owner

Hope to figure it out over the weekend. If you waiting for just this one file, I can send you the mapping results.

@wir963
Copy link
Author

wir963 commented Nov 16, 2020

Hey Alex,

I'm hoping to run STARsolo on a large number of datasets (and I've actually run into seg faults with multiple different datasets) so no worries about sending the mapping results. Please do let me know when you've got a fix though.

Best, Welles

alexdobin added a commit that referenced this issue Nov 19, 2020
…oType SmartSeq runs. STAR_2.7.6a_patch_2020-11-16.
@alexdobin
Copy link
Owner

Hi Welles,

I believe I have fixed the bug, please try the latest patch on GitHub master.
Many thanks for helping to debug this problem!

Cheers
Alex

@wir963
Copy link
Author

wir963 commented Nov 19, 2020

Hey Alex,

Using the master branch worked for this example! Thanks for your work debugging this error! I'll run it on a couple more datasets over the next few days/weeks and I'll let you know if I run into any more issues but I'll go ahead and close it out now!

Best, Welles

@wir963 wir963 closed this as completed Nov 19, 2020
@eltonjrv
Copy link

Dear Alex or any other from the STAR team,
I've just experienced the same seg-fault issue as above using the latest version (2.7.10a).
If I remove --soloType SmartSeq, it rans well.
I wonder if the same bug reappeared and needs to be re-fixed.
Thanks in advance,
Best,
Elton

@alexdobin
Copy link
Owner

Hi Elton,

please try this patch which fixed some of the possible seg-faults:
https://github.com/alexdobin/STAR/releases/tag/2.7.10a_alpha_220818

@eltonjrv
Copy link

Hi Alex,
Thanks for pointing me out to this patch.
However, although taking longer in the "..... started mapping" stage, I'm still getting the same seg-fault (core dumped) error.
I downloaded the tar.gz source from the github address above, and compiled it with "make STAR", which appeared to be successful.
First I went straight to the STAR binary provided there, but got dependencies' error.
Hope it is solvable.
Thanks again,
Elton

@alexdobin
Copy link
Owner

Hi Elton,

this may point to another bug.
Could you please send me the Log.out file?
I will try to reproduce the problem.

@eltonjrv
Copy link

eltonjrv commented Nov 1, 2022

Hi Alex,
Here we go with a Log.out example. The same is happening to all 1,430 SmartSeq fastq files that need to analyze
SRR7666632-vsGRCh38.d1.vd1_Log.out.txt
I hope you can find the bug.
Thanks a lot,
Elton

@alexdobin
Copy link
Owner

Hi Elton,

the SmartSeq options works best with the --readFilesManifest readMani.tsv with the readMani.tsv containing the list of fastq files:

read1 \t read2 \t SampleID

where SampleID is a unique identifier for each sample, which is akin to a cell barcode.

Alternatively, you can add this option to your command line, which will supply sample ID without the manifest file:
--outSAMattrRGline ID:Sample1

@eltonjrv
Copy link

eltonjrv commented Nov 4, 2022

Hey Alex,
Thanks a lot!
It seems to be runnning smoothly with the manifest file.
Had a problem of "not enough memory" while BAM sorting. I've just put to rerun on a higher memory node from my cluster.
I'll let you know if other issue arises, but don't think it will.
Cheers,
Elton

@jenkshehmeyer
Copy link

Hello Alex,

I am getting a segmentation fault, but with --soloType CB_samTagOut (paired fastq files for mapping and a third fastq containing the cell barcode). The reads align fine if I use --readMapNumber up to a certain read number, or if I remove all STARsolo-specific parameters, so I'm wondering if the issue could be related to what is reported here.

I'm using 2.7.11b.

Thank you so much for your help,
Best,
Jenks

@alexdobin
Copy link
Owner

Hi Jenks,

it's probably a different issue, the one above was resolved. Please try to find the read that causes the seg-fault and send it to me.

@jenkshehmeyer
Copy link

Hi Alex,

Turns out, the quality string length for the barcode read for the problematic read number is incorrect. So that would explain why the alignment is failing.

However, STAR only provides the appropriate error message ("quality string length is not equal to sequence length") if I use --readMapNumber so that this problematic read is the very last read; otherwise I get a segmentation fault.

Thank you for your help with this, I bet it will run fine once I fix the fastq file.

Best,
Jenks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
issue: code Likely to be an issue with STAR code
Projects
None yet
Development

No branches or pull requests

4 participants