{ts} debug missing tag #281

Merged
2 commits merged on Oct 12, 2018

TomSmithCGAT
Member

This enables reads missing a UMI or CB tag to be skipped rather than raising an error (see #276).

As part of this, I've also moved the UMI/CB tag check so that it runs before the bundle-yielding check, as is already the case for all the other checks that cause reads to be skipped.
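The PR gives consistency with the other skip checks as the reason for the reordering. The following is a simplified, hypothetical model of a position-sorted bundling loop, not UMI-tools' actual code. The names bundle_by_position and get_ub_cb, the dict-based reads, and the "pos"/"tags"/"unmapped" keys are all assumptions made for illustration. It shows both behaviours described above: a read missing a tag is skipped instead of raising, and because every skip check runs first, a discarded read never flushes the bundle being built.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def bundle_by_position(
    reads: Iterable[Dict],
    get_barcodes: Callable[[Dict], Tuple[str, str]],
) -> Iterator[Tuple[int, List[Tuple[str, str]]]]:
    """Yield (position, [(cell, umi), ...]) for position-sorted reads.

    Hypothetical model: every check that can skip a read runs before the
    "has the position moved on?" check, so a skipped read never flushes
    the bundle being built.
    """
    current_pos = None
    bundle: List[Tuple[str, str]] = []
    for read in reads:
        # 1. Checks that cause a read to be skipped.
        if read.get("unmapped", False):
            continue
        try:
            umi, cell = get_barcodes(read)
        except KeyError:
            continue  # missing UMI and/or cell tag: skip instead of crashing
        # 2. Only a read that survived the skips may flush the previous bundle.
        if current_pos is not None and read["pos"] != current_pos:
            yield current_pos, bundle
            bundle = []
        current_pos = read["pos"]
        bundle.append((cell, umi))
    if current_pos is not None:
        yield current_pos, bundle


def get_ub_cb(read: Dict) -> Tuple[str, str]:
    """Return (UMI, cell barcode); raises KeyError if either tag is absent."""
    return read["tags"]["UB"], read["tags"]["CB"]


reads = [
    {"pos": 100, "tags": {"UB": "AAA", "CB": "C1"}},
    {"pos": 200, "tags": {"UB": "CCC"}},  # no CB tag: skipped
    {"pos": 300, "tags": {"UB": "GGG", "CB": "C2"}},
]
print(list(bundle_by_position(reads, get_ub_cb)))
# [(100, [('C1', 'AAA')]), (300, [('C2', 'GGG')])]
```

In this model, moving the tag check after the position check would let the skipped read at position 200 flush the first bundle and then leave an empty bundle for position 200 to be emitted.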

@TomSmithCGAT
Member Author

@IanSudbery - nudge nudge

TomSmithCGAT merged commit 9784ce6 into master on Oct 12, 2018
TomSmithCGAT deleted the {TS}-DebugMissingTag branch on October 12, 2018 07:47

Wenjuan-ZHU commented Dec 4, 2019

Hi @TomSmithCGAT,

Today I downloaded UMI-tools version 1.0.0 and tried to use "umi_tools dedup" on BAM files from cellranger (10x Genomics data). However, I still get this error: KeyError: "tag 'CB' not present".
I noticed that you had already fixed this issue, so I don't know why it still doesn't work. Could you please check again whether this bug has been fixed?
Many thanks.

Best regards,
Wenjuan

@Wenjuan-ZHU

For example, NH:i:1 HI:i:1 AS:i:69 nM:i:9 RE:A:I xf:i:0 li:i:0 CR:Z:CCGTAGACACTGTGTA CY:Z:FFFFFFFFFFFFFFFF UR:Z:TCAGGAACCG UY:Z:FFFFFFFFFF UB:Z:TCAGGAACCG RG:Z:1823_BA24:0:1:HCC5FDMXX:1

@TomSmithCGAT
Member Author

Hi Wenjuan,

By default, UMI-tools expects the cell barcode to be provided in the 'CB' tag. If another tag holds the cell barcode, you can add the option --cell-tag=CELL_TAG; in your example above, CELL_TAG appears to be CR.
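To illustrate how the choice of tag decides which reads raise an error, here is a minimal sketch of a barcode getter parameterised by tag names, playing the roles of --umi-tag and --cell-tag. StubRead and make_barcode_getter are hypothetical; StubRead only imitates the get_tag method of pysam's AlignedSegment, including the KeyError message reported in this thread. The read uses the CR and UB values from the example above.

```python
class StubRead:
    """Hypothetical stand-in for pysam.AlignedSegment: only get_tag is modelled."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def get_tag(self, tag):
        # Same KeyError message as the one reported in this issue
        if tag not in self._tags:
            raise KeyError("tag '%s' not present" % tag)
        return self._tags[tag]


def make_barcode_getter(umi_tag, cell_tag="CB"):
    """Return a function mapping a read to (umi, cell) using the given tags.

    cell_tag plays the role of --cell-tag (CB by default, as described above);
    umi_tag plays the role of --umi-tag.
    """
    def get_barcodes(read):
        return read.get_tag(umi_tag), read.get_tag(cell_tag)
    return get_barcodes


read = StubRead({"CR": "CCGTAGACACTGTGTA", "UB": "TCAGGAACCG"})

# Equivalent of --umi-tag=UB --cell-tag=CR: both tags exist, so this succeeds.
print(make_barcode_getter("UB", cell_tag="CR")(read))
# ('TCAGGAACCG', 'CCGTAGACACTGTGTA')

# The default cell tag CB is absent on this read, giving the error from this issue.
try:
    make_barcode_getter("UB")(read)
except KeyError as err:
    print(repr(err))
# KeyError("tag 'CB' not present")
```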

@Wenjuan-ZHU

Hi @TomSmithCGAT,
Yes, I know. Here is my command:
umi_tools dedup --per-cell --output-stats=sample --stdin=test.bam --stdout=test.dedup.bam --extract-umi-method=tag --umi-tag=UB --cell-tag=CB

However, in BAM files from the standard cellranger pipeline, most reads have the CB tag, but some do not, like this one:
NH:i:1 HI:i:1 AS:i:69 nM:i:9 RE:A:I xf:i:0 li:i:0 CR:Z:CCGTAGACACTGTGTA CY:Z:FFFFFFFFFFFFFFFF UR:Z:TCAGGAACCG UY:Z:FFFFFFFFFF UB:Z:TCAGGAACCG RG:Z:1823_BA24:0:1:HCC5FDMXX:1

Possibly that is why umi_tools raises KeyError: "tag 'CB' not present".

Best regards,
Wenjuan
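A quick way to check the situation described above, where most reads carry CB but some do not, is to count the reads lacking each tag. This sketch assumes reads expose a has_tag(tag) method, as pysam's AlignedSegment does; StubRead and count_missing_tags are hypothetical names, and the stub lets the example run without a BAM file.

```python
from collections import Counter


class StubRead:
    """Hypothetical stand-in for pysam.AlignedSegment; only has_tag is modelled."""

    def __init__(self, *tags):
        self._tags = set(tags)

    def has_tag(self, tag):
        return tag in self._tags


def count_missing_tags(reads, tags=("UB", "CB")):
    """Return (number of reads seen, Counter of how many reads lack each tag)."""
    total = 0
    missing = Counter()
    for read in reads:
        total += 1
        for tag in tags:
            if not read.has_tag(tag):
                missing[tag] += 1
    return total, missing


reads = [StubRead("UB", "CB", "CR"), StubRead("UB", "CR"), StubRead("UB", "CB", "CR")]
print(count_missing_tags(reads))
# (3, Counter({'CB': 1}))
```

With pysam installed, the same function could be given the reads from iterating over pysam.AlignmentFile("test.bam", "rb"); a non-zero count for CB would confirm that some reads in the file lack the tag.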

@TomSmithCGAT
Member Author

Ah, I see. Sorry, I misunderstood the problem. I'll take another look at this now.

@TomSmithCGAT
Member Author

I can't see any obvious reason why this would happen. This should be caught here:

try:
    umi, cell = self.barcode_getter(read)
except KeyError:
    error_msg = "Read skipped, missing umi and/or cell tag"
    if self.read_events[error_msg] == 0:
        # pysam renamed .tostring -> to_string in 0.14
        # .tostring requires access to the parent AlignmentFile
        try:
            formatted_read = read.to_string()
        except AttributeError:
            formatted_read = read.query_name
        U.warn("At least one read is missing UMI and/or "
               "cell tag(s): %s" % formatted_read)
    self.read_events[error_msg] += 1
    continue

Would you mind posting the full error message here, so I can see where the KeyError originates, and confirming your version number (umi_tools --version)?

@Wenjuan-ZHU

Please see the attached file, which includes the error message and a real example BAM file:
test.zip

You may want to use this small BAM file to test the software.

Many thanks.

@TomSmithCGAT
Member Author

Ah, right, got it! The problem is that the KeyError originates from the parsing of the BAM to generate the null distribution for the stats (code below) rather than the parsing for deduplication (code above). I'll add a patch for this now.

for read in self.inbam:
    if read.is_unmapped:
        continue
    if read.is_read2:
        continue
    self.umis[self.barcode_getter(read)[0]] += 1
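In this loop the barcode getter is called without the try/except used in the deduplication loop, so any read missing a required tag aborts the run. Here is a minimal sketch of one way to make it tolerant, reusing the same skip-and-count approach as the deduplication code. It is not necessarily the patch that was actually merged, and StubRead, count_umis_for_null and get_ub_cb are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StubRead:
    """Hypothetical stand-in for pysam.AlignedSegment (flags plus get_tag only)."""
    tags: dict = field(default_factory=dict)
    is_unmapped: bool = False
    is_read2: bool = False

    def get_tag(self, tag):
        # Mirrors the KeyError message reported in this issue
        if tag not in self.tags:
            raise KeyError("tag '%s' not present" % tag)
        return self.tags[tag]


def count_umis_for_null(reads, barcode_getter):
    """Count UMIs for the stats null distribution, skipping reads missing tags.

    Returns (Counter of UMI -> reads, number of reads skipped for missing tags).
    """
    umis = Counter()
    skipped = 0
    for read in reads:
        if read.is_unmapped or read.is_read2:
            continue
        try:
            # In this issue's --per-cell setup the getter also reads the cell
            # tag, so a missing CB raises KeyError even though only the UMI
            # is used here.
            umi = barcode_getter(read)[0]
        except KeyError:
            skipped += 1
            continue
        umis[umi] += 1
    return umis, skipped


def get_ub_cb(read):
    return read.get_tag("UB"), read.get_tag("CB")


reads = [
    StubRead({"UB": "AAA", "CB": "C1"}),
    StubRead({"UB": "AAA"}),                             # no CB: skipped, not fatal
    StubRead({"UB": "CCC", "CB": "C2"}, is_read2=True),  # read 2: ignored as before
    StubRead({"UB": "GGG", "CB": "C1"}),
]
print(count_umis_for_null(reads, get_ub_cb))
# (Counter({'AAA': 1, 'GGG': 1}), 1)
```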

@TomSmithCGAT
Member Author

Hi Wenjuan,

Could you test the version of UMI-tools on the {TS}-DebugCellTag branch? It works with the test BAM, so it should be fine, but I just want to be sure before pushing the changes to master.

Thanks,

Tom

@Wenjuan-ZHU

Hi Tom,

After testing it, this version of UMI-tools works very well. Thank you so much for your help.

Wenjuan

@TomSmithCGAT
Member Author

Great. The branch has been merged into master. Thanks for bringing this to our attention and for providing me with the example BAM.
