STARSolo CR/UR scrambled in TranscriptomeSAM #1167

Open
mvanins opened this issue Mar 8, 2021 · 2 comments

mvanins commented Mar 8, 2021

Hi,

When running STARSolo with --quantMode TranscriptomeSAM and --outSAMattributes CR UR, the CR and UR tags in the transcriptome BAM do not match those in the genome BAM. I tested versions 2.7.5a, 2.7.6a, 2.7.7a, and 2.7.8a: the problem occurs in 2.7.7a and 2.7.8a but not in 2.7.5a or 2.7.6a. The incorrect tag sequences appear to be taken from the read immediately preceding the correct read in the FASTQ.
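To look for this mismatch across many reads rather than one at a time, the CR/UR tags can be compared per read name between the two BAMs. The following is only a sketch under stated assumptions: it operates on SAM text lines (for example the output of `samtools view`), it keeps the genome tags in memory, and the helper names `sam_tags` and `find_tag_mismatches` are hypothetical, not part of STAR or samtools.

```python
def sam_tags(line, wanted=("CR", "UR")):
    """Return (read_name, {tag: value}) for one tab-separated SAM line."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, _type, value = field.split(":", 2)
        if tag in wanted:
            tags[tag] = value
    return fields[0], tags


def find_tag_mismatches(genome_lines, transcriptome_lines):
    """Yield (read_name, genome_tags, transcriptome_tags) where CR/UR differ."""
    genome = {}
    for line in genome_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        name, tags = sam_tags(line)
        genome[name] = tags
    for line in transcriptome_lines:
        if line.startswith("@"):
            continue
        name, tags = sam_tags(line)
        # Only reads present in both files can be compared.
        if name in genome and genome[name] != tags:
            yield name, genome[name], tags
```

For a large run, it would probably be more practical to apply this to a subset of reads, since every genome read name is held in a dictionary.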

To double-check, I added the CB and UMI sequences to the read name in the UMI-tools style. In the genome BAM, CR, UR, CB, and UB all match the read-name sequences, giving the correct tags (CR:Z:TGTTACCGTC UR:Z:AAAACCAAAT):

$ samtools view SoloTest_Aligned.sortedByCoord.out.bam | grep NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT
NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT        0       chr12   6534855 255     7M1632N28M      *       0       0       TCAACGGATTTGGTCGTATTGGGCGCCTGGTCACC     EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE     NH:i:1  HI:i:1  AS:i:34 nM:i:0  NM:i:0  MD:Z:35 jM:B:c,21       jI:B:i,6534862,6536493  CR:Z:TGTTACCGTC UR:Z:AAAACCAAAT GX:Z:ENSG00000111640.15 GN:Z:GAPDH  CY:Z:AAAAAEEEEE UY:Z:AAAAAEEEEE sS:Z:TGTTACCGTCAAAACCAAAT       sQ:Z:AAAAAEEEEEAAAAAEEEEE       sM:i:0  CB:Z:TGTTACCGTC UB:Z:AAAACCAAAT

In the transcriptome BAM they do not (CR:Z:AGTTCTTCCG UR:Z:GCCGAATCTT):

$ samtools view SoloTest_Aligned.toTranscriptome.out.bam | grep NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT
NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT        256     ENST00000229239.10      99      0       35M     *       0       0       TCAACGGATTTGGTCGTATTGGGCGCCTGGTCACC     EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE     NH:i:7  HI:i:7  CR:Z:AGTTCTTCCG UR:Z:GCCGAATCTT GX:Z:ENSG00000111640.15 GN:Z:GAPDH      CY:Z:AAAAAEEEEE UY:Z:AAAAAEEEEE sS:Z:AGTTCTTCCGGCCGAATCTT   sQ:Z:AAAAAEEEEEAAAAAEEEEE       sM:i:0  CB:Z:AGTTCTTCCG
NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT        256     ENST00000396856.5       84      0       35M     *       0       0       TCAACGGATTTGGTCGTATTGGGCGCCTGGTCACC     EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE     NH:i:7  HI:i:6  CR:Z:AGTTCTTCCG UR:Z:GCCGAATCTT GX:Z:ENSG00000111640.15 GN:Z:GAPDH      CY:Z:AAAAAEEEEE UY:Z:AAAAAEEEEE sS:Z:AGTTCTTCCGGCCGA
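The read-name comparison above can also be done programmatically. This is a minimal sketch, assuming read names end in `_<barcode>_<umi>` as in the examples here and that the input is a tab-separated SAM line; `name_vs_tags` is a hypothetical helper name.

```python
def name_vs_tags(line):
    """Compare the _<barcode>_<umi> read-name suffix with the CR/UR tags.

    Returns (expected, observed), each a (barcode, umi) tuple. `expected`
    comes from the read name, `observed` from the CR and UR tags (None if
    a tag is absent).
    """
    fields = line.rstrip("\n").split("\t")
    # Read name is assumed to look like <id>_<barcode>_<umi>.
    _read_id, barcode, umi = fields[0].rsplit("_", 2)
    tags = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return (barcode, umi), (tags.get("CR"), tags.get("UR"))
```

A read is affected when the two returned tuples differ, which is the case for the transcriptome alignments shown above and not for the genome alignment.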

Searching for this read in the input files shows that the FASTQ files are properly synced and that the incorrect CR/UR come from the previous read in the FASTQ:

$ zcat SoloTest_R1.trimmed.fastq.gz | grep -A3 -B4 -n NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT
147812821-@NS500414:754:HGN7LBGXH:2:23311:21169:14594_AGTTCTTCCG_GCCGAATCTT 1:N:0:ATCACG
147812822-AGCAGAGTGGCGCAGCGGAAGCGTGCTGGGC
147812823-+
147812824-EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
147812825:@NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT 1:N:0:ATCACG
147812826-TCAACGGATTTGGTCGTATTGGGCGCCTGGTCACC
147812827-+
147812828-EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
$ zcat SoloTest_R2.trimmed.fastq.gz | grep -A3 -B4 -n NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT
147812821-@NS500414:754:HGN7LBGXH:2:23311:21169:14594_AGTTCTTCCG_GCCGAATCTT 2:N:0:ATCACG
147812822-AGTTCTTCCGGCCGAATCTT
147812823-+
147812824-AAAAAEEEEEAAAAAEEEEE
147812825:@NS500414:754:HGN7LBGXH:2:23311:17111:14594_TGTTACCGTC_AAAACCAAAT 2:N:0:ATCACG
147812826-TGTTACCGTCAAAACCAAAT
147812827-+
147812828-AAAAAEEEEEAAAAAEEEEE
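The "previous read" diagnosis can be expressed as an offset search in the barcode FASTQ. The sketch below assumes a small excerpt of plain FASTQ lines (four lines per record, without the `grep -n` prefixes) and that the observed value is the full barcode-read sequence, as reported in the sS tag. The names `read_fastq` and `barcode_offset` are hypothetical.

```python
def read_fastq(lines):
    """Yield (name, sequence) from an iterable of FASTQ lines (4 per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = next(it)
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header[1:].split()[0], seq.strip()


def barcode_offset(barcode_fastq_lines, read_name, observed, window=5):
    """Return the record offset whose sequence equals `observed`.

    0 means the read's own record, -1 the previous record, and so on.
    Only records within `window` of `read_name` are searched; None is
    returned if no record in that range matches.
    """
    records = list(read_fastq(barcode_fastq_lines))
    names = [name for name, _ in records]
    idx = names.index(read_name)  # raises ValueError if the read is absent
    for off in sorted(range(-window, window + 1), key=abs):
        j = idx + off
        if 0 <= j < len(records) and records[j][1] == observed:
            return off
    return None
```

For the read above, searching the R2 excerpt for the sS value from the transcriptome BAM (AGTTCTTCCGGCCGAATCTT) gives an offset of -1, while the sS value from the genome BAM (TGTTACCGTCAAAACCAAAT) gives 0.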

Edit: add 2.7.6a, fix version typos

alexdobin added the bug label Mar 8, 2021
alexdobin added a commit that referenced this issue Mar 8, 2021
…ambled in TranscriptomeSAM file Aligned.toTranscriptome.out.bam. This bug appeared in 2.7.7a. Fixed a bug causing seg-faults with --clipAdapterType CellRanger4 option.
alexdobin (Owner) commented

Hi Michael,

Thanks a lot for reporting this bug.
Please try the patch I just pushed to the GitHub master branch; it should fix the issue.
I will make a 2.7.8b release in a couple of days.

Cheers
Alex

mvanins commented Mar 10, 2021

Hi Alex,

Indeed, this issue is fixed in 2.7.8a_2021-03-08.

Thanks,
Mike
