Skip to content

Handle KeyError caused by missing "CB" tags when calculating cell metrics#56

Merged
samanehsan merged 4 commits intomasterfrom
se-handle-reads-with-no-cb
Jan 3, 2019
Merged

Handle KeyError caused by missing "CB" tags when calculating cell metrics#56
samanehsan merged 4 commits intomasterfrom
se-handle-reads-with-no-cb

Conversation

@samanehsan
Copy link
Copy Markdown
Contributor

@samanehsan samanehsan commented Jan 2, 2019

Purpose

Records with no "CB" tag will fail in the CellMetrics.parse_extra_fields method in the latest version (v0.3.1) of sctools. Due to this issue, this version cannot be used by the CalculateCellMetrics task of the Optimus pipeline.

Fixes HumanCellAtlas/secondary-analysis#455

Zenhub ticket: https://app.zenhub.com/workspaces/dcp-backlogs-5ac7bcf9465cb172b77760d9/issues/humancellatlas/secondary-analysis/455


Changes

  • Check that the record has a "CB" tag before checking that it is equivalent to the "CR" tag to avoid getting a KeyError.
  • Add test to verify perfect_cell_barcodes

Review Instructions

  • No instructions.

PR Checklist

  • This PR added or updated tests.
  • This PR updated docstrings or documentation.
  • This PR applied Python style guidelines, specifically followed NumPy style docstrings for this repo.
  • This PR considered generalizability beyond our own use case.

Follow-up Discussions

  • Should the number of records without a "CB" tag be stored as another variable?
  • The test data for records with missing CB tags was pulled from an Optimus run using pbmc8k data -- is there something else we could/should use instead?

If a SAM record does not have a 'CB'
tag, do not include it in the count of
perfect_cell_barcodes.
Copy link
Copy Markdown
Member

@ambrosejcarr ambrosejcarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Made some minor suggestions.


self.perfect_cell_barcodes += (
record.get_tag(consts.RAW_CELL_BARCODE_TAG_KEY) == record.get_tag(consts.CELL_BARCODE_TAG_KEY))
try:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion: break up line 459 into two lines, only cover the get_tag call that should trigger the key error with the try: except to ensure we don't also catch all the reads in case we receive data whose tags don't match our other constant for some reason.

# set the input files
_gene_sorted_bam = _data_dir + '/small-gene-sorted.bam'
_cell_sorted_bam = _data_dir + '/small-cell-sorted.bam'
_cell_sorted_bam_missing_cell_barcodes = _data_dir + '/cell-sorted-missing-cb.bam'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would use os.path.join() instead of adding the slash for a bit more generality. Also line 41 below.

"""test the sctools cell metrics CLI invocation"""
return_call = TenXV2.calculate_cell_metrics(
args=['-i', _cell_sorted_bam, '-o', _test_dir + '/gene_metrics.csv'])
args=['-i', _cell_sorted_bam, '-o', _test_dir + '/cell_metrics.csv'])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also os.path.join

def test_metrics_number_perfect_barcodes(metrics, expected_value):
"""Test that each metric correctly identifies the number of perfect barcodes where CB == CR"""
def test_metrics_number_perfect_molecule_barcodes(metrics, expected_value):
"""Test that each metric correctly identifies the number of perfect molecule barcodes where UB == UR"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good catch.

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #56 into master will increase coverage by 0.42%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
+ Coverage   95.67%   96.09%   +0.42%     
==========================================
  Files          27       27              
  Lines        2473     2486      +13     
==========================================
+ Hits         2366     2389      +23     
+ Misses        107       97      -10
Impacted Files Coverage Δ
src/sctools/test/test_metrics.py 100% <100%> (ø) ⬆️
src/sctools/metrics/aggregator.py 96.05% <100%> (+4.77%) ⬆️
src/sctools/metrics/writer.py 93.33% <0%> (+10%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 941ae54...2afc7ea. Read the comment docs.

@samanehsan
Copy link
Copy Markdown
Contributor Author

samanehsan commented Jan 2, 2019

Thanks for the review @ambrosejcarr! I started breaking out the cell barcode comparison so that record.get_tag(consts.CELL_BARCODE_TAG_KEY) was only in the try/except but then I realized I could use the record.has_tag function instead to check for that tag. Let me know what you think!

@samanehsan samanehsan merged commit 3b1199d into master Jan 3, 2019
@rexwangcc rexwangcc deleted the se-handle-reads-with-no-cb branch January 30, 2019 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants