Segmentation Fault when running STARsolo SmartSeq #1040
Comments
Quick update - I looked at the
There are 22 total files so it must have crashed in the middle of the mapping step |
@alexdobin another update for you. When I exclude the above sample, STARsolo completes successfully. When I re-add this sample, it fails with the same segmentation fault error. I checked and I was able to process the exact same set of fastq files using STAR in regular mode so I would be surprised if there is a problem with the files but I could check. I'm including the first four lines from each of the fastq files if that helps and I'm happy to share the entire files as they're publicly available. FASTQ/trimmed/BC08-SRR5023576_1.fastq.gz
FASTQ/trimmed/BC08-SRR5023576_1.fastq.gz
|
Hi Welles, does it fail when you run just this file alone in the Cheers |
Hey @alexdobin, I'm in the middle of regenerating all of the files so I'll confirm in a few minutes hopefully but yes, it did cause an error when I ran the file alone in the Best, Welles |
Hey @alexdobin, I can confirm that I can run STARsolo for all samples belonging to patient BC08 except for SRR5023576. When I create a manifest file containing just SRR5023576, I get the above error. Moreover, I think the bug is specific to STARsolo: when I run the exact same code without the STARsolo-specific options, using a manifest file containing just SRR5023576, it works. STARsolo code that fails for SRR5023576
STAR code that works for SRR5023576
Best, Welles |
Hi Welles, can you try to run with the same sample in the manifest.tsv, i.e.
I have tried it and did not get any errors. Cheers |
Hey Alex, Hmmm that's weird. I am using trimmed reads so perhaps that is the cause. I'll re-run with the untrimmed reads for that sample and let you know what happens. Thanks again for your support. Best, Welles |
Hey Alex, Sorry for the delay in getting back to you. I just retried using the freshly downloaded reads from SRA and I still get a Segmentation fault. I repeated this using SRR5023575 and it worked so I don't believe it is an obvious error on my part. I am currently using the reference human genome (HG38.p13 gencode release 34) (not the primary assembly because I am more interested in filtering out all human reads than accurately quantifying the reads) concatenated with RNA spike-in sequences RNA_SPIKE_1, RNA_SPIKE_4 and RNA_SPIKE_7 (which I downloaded via https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5466/E-MTAB-5466.additional.1.zip and filtered for the three sequences of interest and hand-crafted a gtf file). On a side note, does STAR care about whether the letters for fasta sequences are capitalized or lower case? Here's the log file - Log.out.txt. Let me know how I can be of help. I'm happy to re-run it on my end using a different genome if that's easier - just let me know. Best, Welles |
Hi Welles, thanks! Lower-case letters in FASTA are no problem; they are still converted to the proper bases. Cheers |
Hey Alex, Thanks for doing that. I'm attaching the GTF file for spike-ins Best, Welles |
Hi Welles, thanks for the gtf, I generated the genome with it, and still cannot reproduce the seg-fault. Could you investigate it a bit further?
Cheers |
Hey Alex, I ran it with the static executable and got the same error. All of the md5 sums that you sent over matched except for the md5sum for genomeParameters.txt (you provided
Am I making a mistake above? Did you use different code? The above was generated using the bioconda installation of star and I am happy to regenerate the index from the static binary if you don't see an obvious difference. Thanks again for all of your help on this. Best, Welles |
Hi Welles, thanks for the checks! If you are willing to investigate this further, we will need to figure out which reads cause the problem. Could you run the same example with --readMapNumber 1000 to see if the seg-fault occurs for a very small number of reads? Thanks! |
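For readers following along, the search for the offending read can be automated as a bisection over the value passed to --readMapNumber. The sketch below is only an illustration under stated assumptions: `first_failing_read` and the `crashes` callback are hypothetical names, the crash is assumed to be deterministic, and it is assumed that once the bad read is included every larger read count also crashes.

```python
from typing import Callable


def first_failing_read(total_reads: int, crashes: Callable[[int], bool]) -> int:
    """Return the smallest N for which mapping the first N reads crashes.

    `crashes(n)` is a hypothetical callback supplied by the caller: it should
    map only the first n reads and return True if that run seg-faults.
    Assumes crashes(total_reads) is True and the crash is deterministic.
    """
    lo, hi = 1, total_reads  # invariant: crashes(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid  # the offending read is among the first `mid` reads
        else:
            lo = mid + 1  # the first `mid` reads map cleanly; look later
    return lo
```

In practice `crashes` could wrap a `subprocess.run` call to the aligner with `--readMapNumber n` and report a negative return code, which on POSIX means the process was killed by a signal. That wrapper is left out here because the exact command line depends on the run.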
Hey Alex, Yeah, glad to help (and very curious). Best, Welles |
Hey Alex, I believe the error is in read 12,662 as Best, Welles |
Hey Alex, Here is line 12,662 from SRR5023576_1.fastq
Here is line 12,662 from SRR5023576_2.fastq
If I delete these lines from the fastq file and run STAR with Best, Welles |
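For anyone repeating this step: assuming standard four-line FASTQ records, read N occupies lines 4N-3 through 4N of the file, so a record can be pulled out by its read number. This is a minimal sketch; `fastq_record` is a hypothetical helper and not part of STAR.

```python
import gzip
from itertools import islice
from typing import List


def fastq_record(path: str, n: int) -> List[str]:
    """Return the four lines of the n-th read (1-based) of a FASTQ file.

    Assumes every record is exactly four lines (no wrapped sequences).
    Returns an empty list if the file has fewer than n records.
    """
    opener = gzip.open if path.endswith(".gz") else open
    start = 4 * (n - 1)  # 0-based index of the record's header line
    with opener(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, start, start + 4)]
```

Calling it with the same read number on both mate files gives the pair to share in a bug report.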
Hi Welles, thanks a lot for these tests, I think we are zeroing in on the problem. Could you do a couple more things?
from inside the source/ directory Cheers |
Hey Alex,
Best, Welles |
Thanks! Now, can you run the debugger with
and at the prompt type |
Hey Alex, Here is the output of bt
Best, Welles |
Thanks, Welles!
and then
|
Haha this is a new experience but sure happy to help. If everything is identical, do you know why you can't reproduce it? Is it OS specific?
Best, Welles |
Most likely this is a bug in the code that depends on uninitialized memory. This is one of the main insecurities of C/C++: if a variable is not explicitly initialized, its value can be anything. When you run it, this random value causes the seg-fault. Many thanks for these tests! ex1=2147483647 is what causes the seg-fault, so we pretty much localized the problem. |
Okay, good to know - thanks for the explanation. Yep, this run used the original fastq files. Do you roughly know how long this fix might take? |
Hope to figure it out over the weekend. If you are waiting for just this one file, I can send you the mapping results. |
Hey Alex, I'm hoping to run STARsolo on a large number of datasets (and I've actually run into seg faults with multiple different datasets) so no worries about sending the mapping results. Please do let me know when you've got a fix though. Best, Welles |
…oType SmartSeq runs. STAR_2.7.6a_patch_2020-11-16.
Hi Welles, I believe I have fixed the bug, please try the latest patch on GitHub master. Cheers |
Hey Alex, Using the master branch worked for this example! Thanks for your work debugging this error! I'll run it on a couple more datasets over the next few days/weeks and I'll let you know if I run into any more issues but I'll go ahead and close it out now! Best, Welles |
Dear Alex or any other from the STAR team, |
Hi Elton, please try this patch which fixed some of the possible seg-faults: |
Hi Alex, |
Hi Elton, this may point to another bug. |
Hi Alex, |
Hi Elton, the SmartSeq option works best with the
where SampleID is a unique identifier for each sample, which is akin to a cell barcode. Alternatively, you can add this option to your command line, which will supply sample ID without the manifest file: |
Hey Alex, |
Hello Alex, I am getting a segmentation fault, but with --soloType CB_samTagOut (paired fastq files for mapping and a third fastq containing the cell barcode). The reads align fine if I use --readMapNumber up to a certain read number, or if I remove all STARsolo-specific parameters, so I'm wondering if the issue could be related to what is reported here. I'm using 2.7.11b. Thank you so much for your help, |
Hi Jenks, it's probably a different issue, the one above was resolved. Please try to find the read that causes the seg-fault and send it to me. |
Hi Alex, Turns out, the quality string length for the barcode read for the problematic read number is incorrect. So that would explain why the alignment is failing. However, STAR only provides the appropriate error message ("quality string length is not equal to sequence length") if I use --readMapNumber so that this problematic read is the very last read; otherwise I get a segmentation fault. Thank you for your help with this, I bet it will run fine once I fix the fastq file. Best, |
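A pre-flight check along these lines can catch such a record before mapping. The sketch below assumes plain four-line FASTQ records; `bad_fastq_records` is a hypothetical helper name, and the checks are a simplified subset of what a full validator would do.

```python
from typing import Iterable, Iterator, List, Tuple


def bad_fastq_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (read_number, reason) for each malformed four-line FASTQ record."""
    record: List[str] = []
    read_number = 0
    for raw in lines:
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue  # keep collecting until a full record is buffered
        read_number += 1
        header, seq, plus, qual = record
        record = []
        if not header.startswith("@"):
            yield read_number, "header does not start with '@'"
        elif not plus.startswith("+"):
            yield read_number, "separator line does not start with '+'"
        elif len(seq) != len(qual):
            yield read_number, (
                f"sequence length {len(seq)} != quality length {len(qual)}"
            )
    if record:
        # Leftover lines mean the last record is incomplete.
        yield read_number + 1, "truncated record at end of file"
```

The function accepts any iterable of lines, so an open file handle (for example from `gzip.open(path, "rt")`) can be passed directly, and the reported read number can be compared with the value at which --readMapNumber starts to fail.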
Hey Alex,
I'm trying to transition from running STAR individually on Smart-seq2 data to running STARsolo --soloType SmartSeq. Unfortunately, I'm getting a segmentation fault. I would be surprised if the error is due to the genome or the fastq files as I have run them through STAR (not STARsolo) previously. I'm using STAR 2.7.6a installed from bioconda. See below for the output and let me know if you have any questions. The error occurs ~10-15 minutes after "started mapping" is logged.
Thanks, Welles