STARsolo: Only one read in the BAM file contains a filtered barcode #954
Comments
Let's first figure out why there are only zeros in Barcodes.stats. The Log.out at the end shows that barcodes were detected. What is the content of Solo.out/Gene/Features.stats and Solo.out/Gene/Summary.csv?
Both seem to be well populated (Features: https://termbin.com/76zg, Summary: https://termbin.com/42hb). In the Summary, the Reads With Valid Barcodes value is suspicious, but I presume it means 100% and not literally a single read with a valid barcode.
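A quick way to spot this symptom is to read the fraction straight out of `Summary.csv`. The sketch below assumes the file holds one `metric,value` pair per line and that the metric is named `Reads With Valid Barcodes`; both are assumptions about the STARsolo output, and the example values are made up to mirror this thread.

```python
import csv
import io


def read_solo_summary(text: str) -> dict:
    """Parse STARsolo Summary.csv text into {metric: value} (layout assumed)."""
    summary = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:
            summary[row[0].strip()] = row[1].strip()
    return summary


def valid_barcode_fraction(summary: dict, key: str = "Reads With Valid Barcodes") -> float:
    """Return the valid-barcode metric as a float; the key name is an assumption."""
    return float(summary[key])


# Made-up example mirroring the situation in this thread.
example = "Number of Reads,100000\nReads With Valid Barcodes,1\n"
fraction = valid_barcode_fraction(read_solo_summary(example))
if fraction >= 1.0:
    print("Every read has a 'valid' barcode: was a whitelist supplied?")
```

A value of exactly 1 is the red flag here: with a real whitelist, some fraction of reads normally fails barcode matching.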
Right, this is 100%: it means that all the barcodes match the whitelist, i.e. all were error-corrected, which should not happen. This explains why most lines in Barcodes.stats are zero, but nExactMatch should not be zero; it should contain the total number of reads. Can you try to cut ~100k lines from the original BAM and go through the whole procedure again: convert with bam2fastq and then run STARsolo? If it still has this problem, can you share this small BAM so that I can reproduce the problem? Thanks!
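A minimal sketch of the subsetting step with pysam, assuming the first N records of the BAM are representative enough for a reproduction. `subset_bam` and `head_records` are hypothetical helper names; the header is copied via `template=` because bamtofastq may rely on the original Cell Ranger header lines.

```python
import itertools


def head_records(records, n):
    """Return an iterator over at most the first n records."""
    return itertools.islice(records, n)


def subset_bam(src_path: str, dst_path: str, n: int = 100_000) -> int:
    """Copy the first n records of src_path into dst_path; needs pysam."""
    import pysam  # imported here so head_records() works without pysam installed

    written = 0
    with pysam.AlignmentFile(src_path, "rb") as src, \
            pysam.AlignmentFile(dst_path, "wb", template=src) as dst:
        for read in head_records(src.fetch(until_eof=True), n):
            dst.write(read)
            written += 1
    return written


# Usage (hypothetical file names):
# subset_bam("possorted_genome_bam.bam", "subset_100k.bam")
```

Taking a prefix rather than a random sample keeps the sketch simple; on a coordinate-sorted BAM it covers only the start of the first chromosome, which is usually fine for reproducing a barcode-handling problem.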
I did the procedure again with only 100k reads:
Log.final.out: https://termbin.com/riue
Somehow I have lost 47080 reads?
Never mind. I checked the aligned file from STAR and it had more than 100k reads, so the "duplicated" reads are reads that were mapped to several places. I tried to grep a read from the original file in the new file and couldn't get a match, so possibly some read names (the last two numbers) can change? Or the … Also, apparently some reads do not have a CB tag?
but:
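The two observations above (more records than input reads, and reads without a CB tag) can be checked by tallying records against distinct read names. This sketch assumes STAR's `NH` tag (number of loci) is present and that reads without a corrected barcode simply lack the `CB` tag; `summarize` and `rows_from_bam` are hypothetical names.

```python
from collections import Counter


def summarize(rows):
    """rows: iterable of (query_name, tags, is_secondary), tags being a dict."""
    stats = Counter()
    names = set()
    for name, tags, is_secondary in rows:
        stats["records"] += 1
        names.add(name)
        if is_secondary:
            stats["secondary"] += 1
        # NH > 1 means STAR aligned this read to several loci
        if tags.get("NH", 1) > 1:
            stats["multimapped_records"] += 1
        if "CB" not in tags:
            stats["missing_CB"] += 1
    stats["distinct_read_names"] = len(names)
    return stats


def rows_from_bam(path):
    """Adapt a BAM file to summarize()'s input; needs pysam."""
    import pysam  # imported here so summarize() has no dependencies

    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            yield read.query_name, dict(read.get_tags()), read.is_secondary
```

If `records` exceeds `distinct_read_names` by roughly the number of secondary records, the extra lines are multi-mapper alignments rather than lost or duplicated reads.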
Continuing the exploration: Barcodes.stats is exactly the same, all zeros. Features: https://termbin.com/8wu4
I don't have a whitelist. I presumed that, since I am working on already preprocessed data, I don't need one. Regarding
And the first few lines in
Does anything look suspicious? By the way, thanks again, Alex, for helping me here.
Aah, I missed that you were running it without a whitelist. I think this explains all the "weird" observations.
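For reference, a sketch of a STARsolo invocation that passes a real whitelist instead of `None`. It only assembles and prints the command. The paths are hypothetical, the 16 bp barcode and 12 bp UMI lengths assume 10x v3 chemistry (v2 uses a 10 bp UMI), and the attribute list is one reasonable choice rather than the one used in this thread.

```python
import shlex


def starsolo_command(genome_dir, r2, r1, whitelist, threads=8, umi_len=12):
    """Assemble (but do not run) a STARsolo command for 10x-style reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # cDNA read first, barcode read second, as the manual describes
        "--readFilesIn", r2, r1,
        "--readFilesCommand", "zcat",  # assumes gzipped FASTQ input
        "--soloType", "CB_UMI_Simple",
        # a real whitelist rather than None, so barcodes are matched and corrected
        "--soloCBwhitelist", whitelist,
        "--soloCBlen", "16",
        "--soloUMIlen", str(umi_len),  # 12 for 10x v3, 10 for v2 (assumption: v3)
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # CB/UB are only written to the BAM when requested explicitly
        "--outSAMattributes", "NH", "HI", "nM", "AS", "CR", "UR", "CB", "UB",
    ]


print(shlex.join(starsolo_command(
    "star_index/", "sample_R2.fastq.gz", "sample_R1.fastq.gz", "whitelist.txt")))
```

Building the arguments as a list keeps each flag and value separate, so the same list can be handed to `subprocess.run` without shell quoting issues.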
I can confirm that this fixed the problem. It is strange that the None value is allowed but things break with it.
Hi Jiří, does it break, or does it run to completion? Cheers
It runs to completion. What I meant by "break" is that (as the title says) only a single read contained the filtered barcode, which was quite surprising. Even without barcode correction, I would expect more reads, or fewer filtered barcodes. But the number of filtered barcodes remains relatively (if not completely) constant and similar (if not identical) to the number of filtered barcodes obtained from 10x.
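To compare the filtered barcodes with the 10x ones directly, it helps to normalise both lists first: Cell Ranger appends a GEM-group suffix such as `-1`, while STARsolo writes bare barcodes. A sketch, with hypothetical function names and file paths:

```python
def load_barcodes(lines):
    """Normalise barcodes: drop blank lines and a trailing suffix like '-1'."""
    out = set()
    for line in lines:
        bc = line.strip()
        if bc:
            out.add(bc.split("-", 1)[0])
    return out


def compare(cellranger_lines, starsolo_lines):
    """Sizes of both barcode sets, their overlap, and the Jaccard index."""
    cr, ss = load_barcodes(cellranger_lines), load_barcodes(starsolo_lines)
    shared = cr & ss
    union = cr | ss
    return {
        "cellranger": len(cr),
        "starsolo": len(ss),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Usage (hypothetical paths; Cell Ranger v3 gzips its barcodes.tsv):
# import gzip
# with gzip.open("filtered_feature_bc_matrix/barcodes.tsv.gz", "rt") as a, \
#         open("Solo.out/Gene/filtered/barcodes.tsv") as b:
#     print(compare(a, b))
```

Similar set sizes alone do not show agreement; the `shared` count tells whether the same cells were called.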
You are right: for filtered barcodes, even without the whitelist, you should see many reads in the BAM file. Thanks!
I can't replicate the problem anymore after the problem with
I am remapping a BAM file created by 10x cell_ranger (which uses STAR internally).
I have converted the `BAM` into `fastq` files with 10X `bamtofastq`. From these files, R1 seems to contain a barcode, R2 seems to contain a read, and I1 contains god knows what. I have fed these into `STARsolo` (in the order `R2 R1`, as described in the manual) and got `Aligned.sortedByCoord.out.bam`, as well as filtered barcodes in `Solo.out/Gene/filtered/barcodes.tsv`, which should, if I understand everything correctly, contain barcodes (CB) for well-represented cells with a lot of reads.
I then used `pysam` to check this by running:
which should search the BAM file and find every read containing a particular barcode. Running this with the first barcode in `filtered/barcodes.tsv`, I got... 1. A single read contained this barcode. I am completely baffled by this.
Also, the number of barcodes changes from 160 000 (original BAM) to some 260 000 (new BAM).
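The original pysam snippet did not survive on this page, so the following is not it; it is a sketch of the same check under stated assumptions. It assumes the BAM carries corrected barcodes in the `CB` tag (STARsolo writes `CB`/`UB` only when they are requested via `--outSAMattributes`), and the file paths in the usage comment are hypothetical. Tallying every CB value in one pass also gives the number of distinct barcodes, which could be used for the 160 000 vs 260 000 comparison; note that multi-mapped reads are counted once per record.

```python
from collections import Counter


def first_barcode(lines):
    """Return the first barcode from barcodes.tsv-style lines."""
    for line in lines:
        if line.strip():
            return line.strip().split("\t")[0]
    raise ValueError("no barcodes found")


def distinct_barcodes(counts):
    """Number of distinct barcodes, ignoring reads without a CB tag (None)."""
    return sum(1 for cb in counts if cb is not None)


def cb_counts_from_bam(path, tag="CB"):
    """Tally tag values over all records; needs pysam."""
    import pysam  # imported here so the helpers above work without pysam

    counts = Counter()
    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            counts[read.get_tag(tag) if read.has_tag(tag) else None] += 1
    return counts


# Usage (hypothetical paths):
# with open("Solo.out/Gene/filtered/barcodes.tsv") as fh:
#     bc = first_barcode(fh)
# counts = cb_counts_from_bam("Aligned.sortedByCoord.out.bam")
# print(bc, counts[bc], distinct_barcodes(counts))
```

Running the same tally with `tag="CR"` (raw, uncorrected barcode) on either BAM shows how much of the difference comes from barcode correction.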
I am also confused by `Barcodes.stats`, where I got all zeros: https://termbin.com/6gpts
Both of these suggest that maybe I didn't do something right?
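When reading `Barcodes.stats`, the value to watch is `nExactMatch`; as discussed in the thread, it should not be zero for a run with a whitelist. A sketch of that check, assuming each line of the file is a whitespace-separated name/value pair (function names are hypothetical):

```python
def parse_barcode_stats(text):
    """Parse whitespace-separated 'name value' lines (assumed Barcodes.stats layout)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def check_exact_matches(stats):
    """Flag the symptom discussed here: nExactMatch of zero."""
    n = stats.get("nExactMatch", 0)
    if n == 0:
        return "nExactMatch is 0: possibly no whitelist was used"
    return f"nExactMatch = {n}"
```

Lines that do not match the two-column pattern are skipped rather than raising, so the parser tolerates blank lines or a header if the actual file has one.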
Log files:
Log.out: https://termbin.com/xlgt
Log.final.out: https://termbin.com/l1e9
Thank you.