STARsolo Segfault when Sorting BAM #902

Closed
ColeWunderlich opened this issue May 12, 2020 · 6 comments
Labels: bug, resolved (problem or issue that has been resolved)

Comments

@ColeWunderlich

Hello Alex,

When running STARsolo on a single sample of 10x v2 3' data, I get a segfault at the beginning of the BAM sorting step. This happens both when I run the full sample and when I subset it to 10,000 reads.

I get this error with both 2.7.3a and a fresh build of commit 49dc707.

The Error

May 06 22:54:58 ..... started STAR run
May 06 22:54:58 ..... loading genome
May 06 22:55:37 ..... started mapping
May 06 23:23:23 ..... finished mapping
May 06 23:23:25 ..... started Solo counting
May 06 23:25:25 ..... finished Solo counting
May 06 23:25:25 ..... started sorting BAM
8576 Segmentation fault      (core dumped) STAR --parametersFiles star_10x_mapping_params_noMulti.conf --runThreadN 16 --readFilesIn <path>/451LUBR3_S14_L004_R2_001.fastq.gz <path>/451LUBR3_S14_L004_R1_001.fastq.gz --outFileNamePrefix STAR_out/
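The progress messages above localize the crash: the last stage that logged "started" without a matching "finished" is where STAR died. A minimal Python sketch for triaging logs like this one follows. It assumes only the `..... started X` / `..... finished X` message format visible above; the function name `last_unfinished_stage` is hypothetical and not part of STAR.

```python
import re
from typing import Iterable, List, Optional

# Matches progress lines such as "May 06 23:25:25 ..... started sorting BAM".
# The five-dot prefix excludes the three-dot detail lines ("... Finished ...").
STAGE_RE = re.compile(r"\.\.\.\.\. (started|finished) (.+?)\s*$")


def last_unfinished_stage(lines: Iterable[str]) -> Optional[str]:
    """Return the most recently started stage that never logged 'finished'."""
    open_stages: List[str] = []
    for line in lines:
        m = STAGE_RE.search(line)
        if not m:
            continue  # e.g. "..... loading genome" or the shell's segfault line
        event, stage = m.groups()
        if event == "started":
            open_stages.append(stage)
        elif stage in open_stages:
            open_stages.remove(stage)
    return open_stages[-1] if open_stages else None
```

Fed the seven progress lines from the error output above, it reports "sorting BAM". "STAR run" also stays open, but it was started earlier, so the most recent open stage is the one returned.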

Log File (end)

Joined thread # 14
Joined thread # 15
May 06 23:23:25 ..... started Solo counting
May 06 23:23:25 ... Starting Solo post-map for GeneFull
May 06 23:23:25 ... Finished allocating arrays for Solo 0.485777 GB
May 06 23:24:33 ... Finished reading reads from Solo files nCB=306716, nReadPerCBmax=58738, nMatch=64810423
May 06 23:25:07 ... Finished collapsing UMIs
May 06 23:25:24 ... Starting Solo post-map for SJ
May 06 23:25:24 ... Finished allocating arrays for Solo 0.00175602 GB
May 06 23:25:24 ... Finished reading reads from Solo files nCB=10433, nReadPerCBmax=506, nMatch=234302
May 06 23:25:25 ... Finished collapsing UMIs
May 06 23:25:25 ..... finished Solo counting
May 06 23:25:25 ..... started sorting BAM
Max memory needed for sorting = 820896435

Parameters Used

Sorting command

--outSAMtype BAM Unsorted SortedByCoordinate

Other Params

##### Final user re-defined parameters-----------------:
parametersFiles                   star_10x_mapping_params_noMulti.conf   
runMode                           alignReads
runThreadN                        16
genomeDir                         <path>/referenceGenome/gencodeV34/STAR_index_10x/
readFilesIn                       <path>/451LUBR3_S14_L004_R2_001.fastq.gz   <path>/451LUBR3_S14_L004_R1_001.fastq.gz   
readFilesCommand                  'zcat'   
outFileNamePrefix                 STAR_out/
outSAMtype                        BAM   Unsorted   SortedByCoordinate   
outSAMattributes                  NH   HI   AS   nM   CR   CY   UR   UY   CB   UB   GX   GN   sS   sQ   sM   
outFilterType                     BySJout
outFilterMismatchNmax             999
outFilterMismatchNoverReadLmax    0.04
alignIntronMin                    20
alignIntronMax                    1000000
alignMatesGapMax                  1000000
alignSJoverhangMin                8
alignSJDBoverhangMin              1
soloType                          CB_UMI_Simple
soloCBwhitelist                   <path>/10xBC_whitelists/github/737K-august-2016.txt   
soloFeatures                      GeneFull   SJ   
soloUMIdedup                      1MM_All   
soloCBmatchWLtype                 1MM_multi

-------------------------------
##### Final effective command line:
STAR   --runMode alignReads   --runThreadN 16   --genomeDir <path>/referenceGenome/gencodeV34/STAR_index_10x/   --readFilesIn <path>/451LUBR3_S14_L004_R2_001.fastq.gz   <path>/tmp/451LUBR3_S14_L004_R1_001.fastq.gz      --readFilesCommand 'zcat'      --outFileNamePrefix STAR_out/   --outSAMtype BAM   Unsorted   SortedByCoordinate      --outSAMattributes NH   HI   AS   nM   CR   CY   UR   UY   CB   UB   GX   GN   sS   sQ   sM      --outFilterType BySJout   --outFilterMismatchNmax 999   --outFilterMismatchNoverReadLmax 0.04   --alignIntronMin 20   --alignIntronMax 1000000   --alignMatesGapMax 1000000   --alignSJoverhangMin 8   --alignSJDBoverhangMin 1   --soloType CB_UMI_Simple   --soloCBwhitelist <path>/10xBC_whitelists/github/737K-august-2016.txt      --soloFeatures GeneFull   SJ      --soloUMIdedup 1MM_All      --soloCBmatchWLtype 1MM_multi
----------------------------------------
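To reproduce a run like this for a bug report, the "Final user re-defined parameters" block can be turned back into an argument list. The following is a minimal Python sketch, not a STAR utility; `parse_user_params` and `to_argv` are hypothetical names. It assumes the block ends at the first blank or dashed line. It also assumes, as the effective command line above suggests, that parameters read from `parametersFiles` are already listed individually, so that entry is dropped. Values are copied verbatim, including the quotes around `'zcat'`.

```python
from typing import Dict, Iterable, List

HEADER = "##### Final user re-defined parameters"


def parse_user_params(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect 'name value1 value2 ...' rows that follow the Log.out header."""
    params: Dict[str, List[str]] = {}
    in_block = False
    for line in lines:
        if line.startswith(HEADER):
            in_block = True
            continue
        if not in_block:
            continue
        fields = line.split()
        if not fields or fields[0].startswith("---"):
            break  # a blank line or the dashed separator ends the block
        params[fields[0]] = fields[1:]
    return params


def to_argv(params: Dict[str, List[str]]) -> List[str]:
    """Rebuild a STAR argument list, skipping parametersFiles (assumed already expanded)."""
    argv = ["STAR"]
    for name, values in params.items():
        if name == "parametersFiles":
            continue
        argv += ["--" + name, *values]
    return argv
```

Passing a list (rather than a single shell string) to `subprocess.run` avoids re-quoting multi-value options such as `outSAMtype BAM Unsorted SortedByCoordinate`.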

alexdobin added the "issue: code" label (Likely to be an issue with STAR code) on May 13, 2020
@alexdobin
Owner

Hi Cole,

could you send me the 10,000 reads that cause this seg-fault?

Thanks!
Alex

alexdobin reopened this on May 13, 2020
@ColeWunderlich
Author

Hey Alex,

Sorry for the delayed response. I made a mistake in my original post: it was 1,000 reads, not 10,000.

I am subsetting with --readMapNumber, which I believe just takes the first N reads from each file. I have attached the first 1,000 reads from each of my FASTQs below.

Also, I forgot to mention that I am aligning against the latest GENCODE v34 primary GTF and FASTA.

Cole

R1_1st_1K_reads.fastq.gz
R2_1st_1K_reads.fastq.gz
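Small attachments like these can also be cut directly from the gzipped inputs instead of going through STAR. The following is a minimal Python sketch; the helper name `head_fastq_gz` is hypothetical. It assumes standard 4-line FASTQ records (no wrapped sequence lines) and that R1 and R2 list reads in the same order, so taking the same N from each file keeps mates paired.

```python
import gzip
from itertools import islice


def head_fastq_gz(src: str, dst: str, n_reads: int) -> int:
    """Copy the first n_reads 4-line FASTQ records from gzipped src to gzipped dst."""
    written = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        while written < n_reads:
            record = list(islice(fin, 4))  # header, sequence, '+', qualities
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            fout.writelines(record)
            written += 1
    return written
```

Usage, with the elided paths left as in the report: `head_fastq_gz("<path>/451LUBR3_S14_L004_R1_001.fastq.gz", "R1_1st_1K_reads.fastq.gz", 1000)`, and the same call for R2. The return value is the number of records actually written, which is smaller than requested if the input runs out.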

alexdobin added the "bug" label and removed the "issue: code" label (Likely to be an issue with STAR code) on May 20, 2020
alexdobin added a commit that referenced this issue May 20, 2020
@alexdobin
Owner

Hi Cole,

this was a bug; thanks for reporting it. Please try the latest patch from the GitHub master branch. It now works fine on your small test example.

Cheers
Alex

@ColeWunderlich
Author

Hey Alex,

I rebuilt STARsolo, re-ran it on the full sample, and everything ran fine. Thanks for the bug fix!

Also, I noticed that part of the fix involved removing code that was hard-coded to use Gene only. Does this mean error correction will now be done with either Gene or GeneFull? And when both Gene and GeneFull are specified, is error correction done for both, or does one take precedence?

Thanks,
Cole

@alexdobin
Owner

Hi Cole,

this is a good observation. If only GeneFull is present, it will be used for error correction (this had to be fixed). If both Gene and GeneFull are present, Gene will be used for error correction.

Cheers
Alex
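The precedence rule Alex describes can be written down as a tiny function. This is only an illustration of the stated behavior, not STAR's source. The name `error_correction_feature` is hypothetical, and returning `None` when neither Gene nor GeneFull is requested is an assumption, since this thread does not say what happens in that case.

```python
from typing import Optional, Sequence


def error_correction_feature(solo_features: Sequence[str]) -> Optional[str]:
    """Pick the feature used for error correction: Gene takes precedence over GeneFull."""
    if "Gene" in solo_features:
        return "Gene"  # Gene wins whenever it is requested, even alongside GeneFull
    if "GeneFull" in solo_features:
        return "GeneFull"  # the GeneFull-only case this fix addressed
    return None  # assumption: neither gene-level feature was requested
```

For the configuration in this report (`--soloFeatures GeneFull SJ`), the rule selects GeneFull.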

alexdobin added the "resolved" label (problem or issue that has been resolved) on May 26, 2020
@ColeWunderlich
Author

Thanks Alex!
