Failure to parse cell barcodes from bam files #5

Closed
kevingmonahan opened this issue Apr 2, 2021 · 6 comments

@kevingmonahan

scTE fails to find cell barcode information in BAM files I generated using the Cell Ranger pipeline:

$ scTE -i possorted_genome_bam.bam -o out_rep2 -x /software/scTE/mm10.exclusive.idx --hdf5 True -CB CB -UMI UB
  DEBUG   : Creating converter from 7 to 5
  DEBUG   : Creating converter from 5 to 7
  DEBUG   : Creating converter from 7 to 5
  DEBUG   : Creating converter from 5 to 7
  INFO    : Parameter list:
  Sample = out_rep2
  Reference annotation index = /software/scTE/mm10.exclusive.idx
  Minimum number of genes required = 200
  Minimum number of counts required = None
  Number of threads = 1
  
  INFO    : Loading the genome annotation index... 2021-04-02 18:14:28
  INFO    : Loaded '/software/scTE/mm10.exclusive.idx' binary file with 3900779 items
  ['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '3', '4', '5', '6', '7', '8', '9', 'M', 'X', 'Y']
  INFO    : Finished loading the genome annotation index... 2021-04-02 18:15:01
  
  INFO    : Processing BAM/SAM files ...2021-04-02 18:15:01
  ERROR   : The input file possorted_genome_bam.bam has no cell barcodes information, plese make sure the aligner have add the cell barcode key, or set CB to False

The BAM file does have CB and UB tags:

$ samtools view possorted_genome_bam.bam | head
	A00521:52:HHVH7DMXX:1:2126:5737:27273   16      chr1    3000239 255     91M     *       0       0       TTTCATCCAGGTTTTCCTGGTTTTTTTTTAGTATAGCCTTTCATAGTAGAATCTGATGATGTTTTTGATATCCTCATGTTCTGTTGTTATG     FFFF:FFFFFFFF:FFFFFFFFFFFFF,FFFFF,FFFFF:FFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF      NH:i:1  HI:i:1  AS:i:85 nM:i:2  RG:Z:Rachel_control_rep2:0:1:HHVH7DMXX:1        RE:A:I  xf:i:0  CR:Z:ATTATCCCAGTATGCT   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:ATTATCCCAGTATGCT-1 UR:Z:AGGTCCACTT UY:Z:FFFFFFFFFF UB:Z:AGGTCCACTT
	A00521:52:HHVH7DMXX:1:2126:5936:27398   16      chr1    3000239 255     91M     *       0       0       TTTCATCCAGGTTTTCCTGGTTTTTTTTTAGTATAGCCTTTCATAGTAGAATCTGATGATGTTTTTGATATCCTCATGTTCTGTTGTTATG     FF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:1  HI:i:1  AS:i:85 nM:i:2  RG:Z:Rachel_control_rep2:0:1:HHVH7DMXX:1        RE:A:I  xf:i:0  CR:Z:ATTATCCCAGTATGCT   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:ATTATCCCAGTATGCT-1 UR:Z:AGGTCCACTT UY:Z:FFFFFFFFFF UB:Z:AGGTCCACTT
	A00521:52:HHVH7DMXX:1:1470:6668:19617   16      chr1    3000373 255     91M     *       0       0       TATGCCCTCTAGTTAGTCTGGCTAAGGGTTTATCTATCTTGTTGACTTTCTCAAAGAACCAGCTACTAGTTTGGTTGATTCTTTGAATATT     FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:1  HI:i:1  AS:i:87 nM:i:1  RG:Z:Rachel_control_rep2:0:1:HHVH7DMXX:1        RE:A:I  xf:i:0  CR:Z:ACTTACTCACAGGCCT   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:ACTTACTCACAGGCCT-1 UR:Z:TGGTGTTGGT UY:Z:FFFFFFFFFF UB:Z:TGGTGTTGGT
	A00521:52:HHVH7DMXX:2:1410:7952:4460    16      chr1    3009349 1       1S65M25S        *       0       0       GTTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTGAACTAACCCATGTACTCTGCGTTGATACCAC     FFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:3  HI:i:1  AS:i:59 nM:i:2  ts:i:25 RG:Z:Rachel_control_rep2:0:1:HHVH7DMXX:2        RE:A:I  xf:i:0  CR:Z:TAGACCATCCGATATG   CY:Z:FFF:FFFFFFFFFFFF   CB:Z:TAGACCATCCGATATG-1 UR:Z:AACTATAACG UY:Z:FFFFFFFFFF UB:Z:AACTATAACG
@jphe
Contributor

jphe commented Apr 3, 2021

Can you check whether the BAM file contains reads with no CB tag, like this: jphe/scTE#7 (comment)?

If you set -CB CB while some reads have no CB:Z tag, scTE will report this error.
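
One way to run that check is to count the alignment records that lack a `CB:Z:` tag. The following is only a minimal sketch: the function name `count_missing_cb` is made up for illustration, and it assumes SAM text on stdin with optional tags starting at field 12, as in the SAM layout shown above.

```shell
# Hypothetical helper (not part of scTE or samtools): count alignment
# records without a CB:Z: tag. Reads SAM text on stdin.
count_missing_cb() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines
        {
            found = 0
            for (i = 12; i <= NF; i++)         # optional tags start at field 12
                if ($i ~ /^CB:Z:/) { found = 1; break }
            if (!found) missing++
            total++
        }
        END { printf "%d of %d reads lack a CB:Z: tag\n", missing, total }'
}
```

It could be used as `samtools view possorted_genome_bam.bam | count_missing_cb`. Note that looking only at the first few reads with `head` can miss untagged reads that appear later in the file, so scanning the whole file is safer, if slower.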

@kevingmonahan
Author

Yes, there were reads without CB:Z tags. After removing those reads, everything ran fine. Thanks!

@lnrlz

lnrlz commented Jan 30, 2023

> Yes, there were reads without CB:Z tags. After removing those reads, everything ran fine. Thanks!

Hi kevingmonahan, may I ask how you filtered out the reads without barcodes? Thanks a lot!

@kevingmonahan
Author

I filtered out those lines using samtools and awk.

For example:

samtools view possorted_genome_bam.bam -h | awk '/^@/ || /CB:/' | samtools view -h -b > possorted_genome_bam.clean.bam
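
A slightly stricter variant of the same idea, as a sketch only: the pattern `/CB:/` matches anywhere on the line, so in principle it could also match a base-quality string that happens to contain the characters `CB:`. The function below (the name `keep_cb_reads` is hypothetical) only accepts `CB:Z:` at the start of an optional tag field, assuming tab-separated SAM text with tags starting at field 12.

```shell
# Hypothetical helper: keep SAM header lines plus alignment records that
# carry a CB:Z: tag. Reads SAM text on stdin, writes SAM text on stdout.
keep_cb_reads() {
    awk -F '\t' '
        /^@/ { print; next }                   # header lines pass through
        {
            for (i = 12; i <= NF; i++)         # optional tags start at field 12
                if ($i ~ /^CB:Z:/) { print; next }
        }'
}
```

It would slot into the same pipeline, for example `samtools view -h possorted_genome_bam.bam | keep_cb_reads | samtools view -b -o possorted_genome_bam.clean.bam -`. Newer samtools releases also document a `-d`/`--tag` option for `samtools view` that selects reads by tag; if your version has it, that may do the same job without awk.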

@lnrlz

lnrlz commented Jan 30, 2023

> I filtered out those lines using samtools and awk.
>
> For example:
>
> samtools view possorted_genome_bam.bam -h | awk '/^@/ || /CB:/' | samtools view -h -b > possorted_genome_bam.clean.bam

I appreciate it very much! I tried the same command but got an error (see below):
samtools view possorted_genome_bam.bam -h | awk '/^@/ || /CB:/' |samtools view -h -b > possorted_genome_CB_clean.bam

[E::hts_hopen] Failed to open file possorted_genome_bam.bam
[E::hts_open_format] Failed to open file possorted_genome_bam.bam
samtools view: failed to open "possorted_genome_bam.bam" for reading: Exec format error

samtools cannot open the BAM file, which was produced by 10x Cell Ranger cloud analysis.
Do you have any idea what might be causing this?
Thanks again!

@kevingmonahan
Author

kevingmonahan commented Jan 30, 2023 via email
