
Segmentation Fault on specific fastqs when running with --soloType SmartSeq #966

Closed
KrisDavie opened this issue Jul 9, 2020 · 5 comments

@KrisDavie

Hey @alexdobin,

The new --soloType SmartSeq option is going to help me a lot in some projects I'm working on, so thanks very much for implementing it. In my first attempts at getting it running, though, I've hit some issues.

I have fastq files from this GEO entry; however, when I pass the full manifest to STAR, I get a segmentation fault after the first couple of fastqs are processed (see Log.segfault.out).

STAR \
--soloType SmartSeq \
--readFilesManifest ./STARsolo_Manifest_Segfault.tsv \
--genomeDir /staging/leuven/res_00001/genomes/mus_musculus/GRCm38_iGenomes/indexes/STAR/2.7.5a \
--soloUMIdedup Exact \
--soloStrand Unstranded \
--soloFeatures Gene \
--soloCellFilter None

Jul 09 15:00:03 ..... started STAR run
Jul 09 15:00:04 ..... loading genome
Jul 09 15:00:56 ..... started mapping
[1]    9253 segmentation fault (core dumped)  STAR --soloType SmartSeq --readFilesManifest ./STARsolo_Manifest_Segfault.tsv
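
For reference, the manifest is a plain tab-separated file with the three columns STARsolo expects for paired-end SmartSeq data: read1 file, read2 file, cell ID. The entries below are illustrative placeholders, not the actual manifest contents:

00.RAW/VISp_Sim1-Cre_KJ18__Cell_1_1.fastq	00.RAW/VISp_Sim1-Cre_KJ18__Cell_1_2.fastq	Cell_1
00.RAW/VISp_Sim1-Cre_KJ18__Cell_10_1.fastq	00.RAW/VISp_Sim1-Cre_KJ18__Cell_10_2.fastq	Cell_10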

Running with just the first of these files (duplicated to allow solo to run) actually works (see Log.working.out):

STAR \
--soloType SmartSeq \
--readFilesManifest ./STARsolo_Manifest_Working.tsv \
--genomeDir /staging/leuven/res_00001/genomes/mus_musculus/GRCm38_iGenomes/indexes/STAR/2.7.5a \
--soloUMIdedup Exact \
--soloStrand Unstranded \
--soloFeatures Gene \
--soloCellFilter None
Jul 09 15:10:38 ..... started STAR run
Jul 09 15:10:38 ..... loading genome
Jul 09 15:11:06 ..... started mapping
Jul 09 15:18:48 ..... finished mapping
Jul 09 15:18:48 ..... started Solo counting
Jul 09 15:18:56 ..... finished Solo counting
Jul 09 15:18:56 ..... finished successfully

But then running in the same way with just the second file also causes a segfault (see Log.10_only.out):

STAR \
--soloType SmartSeq \
--readFilesManifest ./STARsolo_Manifest_Segfault_10_Only.tsv \
--genomeDir /staging/leuven/res_00001/genomes/mus_musculus/GRCm38_iGenomes/indexes/STAR/2.7.5a \
--soloUMIdedup Exact \
--soloStrand Unstranded \
--soloFeatures Gene \
--soloCellFilter None
Jul 09 15:20:42 ..... started STAR run
Jul 09 15:20:42 ..... loading genome
Jul 09 15:21:56 ..... started mapping
[1]    19060 segmentation fault (core dumped)  STAR --soloType SmartSeq --readFilesManifest  --genomeDir  --soloUMIdedup

Logs.zip
Manifests.zip

At first glance, I can't see anything different between these files, and both map successfully if I just use STAR in regular mode:

STAR \
--genomeDir /staging/leuven/res_00001/genomes/mus_musculus/GRCm38_iGenomes/indexes/STAR/2.7.5a \
--readFilesIn 00.RAW/VISp_Sim1-Cre_KJ18__Cell_10_1.fastq 00.RAW/VISp_Sim1-Cre_KJ18__Cell_10_2.fastq \
--quantMode GeneCounts
Jul 09 15:32:43 ..... started STAR run
Jul 09 15:32:43 ..... loading genome
Jul 09 15:33:23 ..... started mapping
Jul 09 15:35:23 ..... finished mapping
Jul 09 15:35:23 ..... finished successfully

They were also both downloaded at the same time in the same manner with the following code block:

# Map each SRX accession to a sample name prefix (assumed definition;
# the original script sets this up elsewhere)
declare -A samples=( [SRX4213501]="VISp_Sim1-Cre_KJ18" )

srx=SRX4213501
# Fetch the run info table for this experiment from SRA
wget "http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=${srx}" -O ${srx}_info.csv
count=0
# The first CSV column is the SRR run accession
for srr in $(tail -n+2 ${srx}_info.csv | cut -f1 -d',' | sort -u); do
    (( count++ ))
    # -S splits mates into separate files; dump to a per-cell prefix
    fasterq-dump -S -o ${samples[${srx}]}__Cell_${count} ${srr} -t /dev/shm/vsc30922/ -e 12
done

For reference, Cell_1 from the logs is SRR7332334 and Cell_10 is SRR7332343.

The behaviour is the same with the statically linked binary from the git repo and with binaries compiled manually (I tested 2.7.5a and master at 1552aa0).
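
For completeness, the manual builds were done roughly like this (a sketch; the checkout target is either the 2.7.5a release tag or the master commit above):

git clone https://github.com/alexdobin/STAR.git
cd STAR/source
git checkout 1552aa0   # or the 2.7.5a release tag
make STAR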

The machine this is running on is as follows:
CPU: 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, 18 cores, 1 thread per core
Memory: 755 GB RAM
Storage: GPFS volume with >10 TB of free space
OS: CentOS 7

For extra information, this is also happening with another dataset, although that is private data.

Any insight you can provide here would be very helpful, and if you have anything you'd like me to test, please just let me know.

Cheers,

Kris

@alexdobin alexdobin added the bug label Jul 16, 2020
@alexdobin
Owner

Hi Kris,

thanks for the detailed bug report. Unfortunately, I was not able to reproduce the problem. I mapped SRR7332343 to the GRCm38 mouse genome with GENCODE M24 annotations, with 3 copies of the same file in the manifest - I think this is what you did in the "10_only" case. Valgrind also did not find any problems. Could you please send me the links to the genome fasta and GTF that you used? That's the only thing that differs in my tests now. Also, could you try to map with this fastq listed in the manifest only once, as in the sketch below?
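
For the single-listing test, the manifest would contain just one line in the same three-column layout (placeholder paths):

00.RAW/VISp_Sim1-Cre_KJ18__Cell_10_1.fastq	00.RAW/VISp_Sim1-Cre_KJ18__Cell_10_2.fastq	Cell_10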

Thanks!
Alex

@ghuls

ghuls commented Aug 1, 2020

I think it is the same issue that we had in #558, which is fixed by 9a5bb6a.

@KrisDavie
Author

Hey @alexdobin,

Sorry this ended up not being reproducible!

As @ghuls mentioned, this was likely the same as the linked issue. Our cluster has just gone down for maintenance, but once it is back up next week, I'll check the latest version/master and get back to you.

If in the meantime you want to test the index, I used the Illumina iGenomes (hg38) files (http://igenomes.illumina.com.s3-website-us-east-1.amazonaws.com/Homo_sapiens/UCSC/hg38/Homo_sapiens_UCSC_hg38.tar.gz) and built an index using the genome.fa and gene.gtf contained within it.
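
The index build itself was along these lines (a sketch; the thread count is an assumption and other options were left at defaults):

STAR \
--runMode genomeGenerate \
--genomeDir ./STAR_index \
--genomeFastaFiles genome.fa \
--sjdbGTFfile gene.gtf \
--runThreadN 12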

Many thanks for taking a look,

Kris

@KrisDavie
Author

Hey @alexdobin,

Our cluster came back up, and I can confirm that with the same command and files as before, this seems to be fixed in 2.7.5b.

Many thanks, this feature is going to simplify some of my pipelines for sure!

Kris

@alexdobin
Owner

Hi Kris,

great, thanks for confirming it!

Cheers
Alex
