Handle KeyError caused by missing "CB" tags when calculating cell metrics #56
samanehsan merged 4 commits into master
If a SAM record does not have a 'CB' tag, do not include it in the count of perfect_cell_barcodes.
ambrosejcarr
left a comment
Looks good. Made some minor suggestions.
src/sctools/metrics/aggregator.py
Outdated
| self.perfect_cell_barcodes += ( | ||
| record.get_tag(consts.RAW_CELL_BARCODE_TAG_KEY) == record.get_tag(consts.CELL_BARCODE_TAG_KEY)) | ||
| try: |
Minor suggestion: break line 459 into two lines and wrap only the get_tag call that should trigger the KeyError in the try/except. That way we don't also silently skip every read if we receive data whose tags don't match our other constant for some reason.
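A minimal sketch of what that narrower try/except could look like. It is not the code that was merged: FakeRecord is a hypothetical stand-in for a SAM record whose get_tag raises KeyError on a missing tag, count_perfect_cell_barcodes is an illustrative helper, and the tag values "CR" and "CB" are assumed values for the two constants.

```python
# Assumed values for consts.RAW_CELL_BARCODE_TAG_KEY and consts.CELL_BARCODE_TAG_KEY.
RAW_CELL_BARCODE_TAG_KEY = "CR"
CELL_BARCODE_TAG_KEY = "CB"


class FakeRecord:
    """Hypothetical stand-in for a SAM record; get_tag raises KeyError if the tag is absent."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def get_tag(self, key):
        return self._tags[key]


def count_perfect_cell_barcodes(records):
    """Count records whose corrected barcode (CB) equals the raw barcode (CR)."""
    perfect_cell_barcodes = 0
    for record in records:
        # Not wrapped: a missing raw barcode tag is unexpected and should still raise.
        raw_barcode = record.get_tag(RAW_CELL_BARCODE_TAG_KEY)
        try:
            # Only this lookup is expected to fail, when the record has no CB tag.
            cell_barcode = record.get_tag(CELL_BARCODE_TAG_KEY)
        except KeyError:
            continue  # no CB tag: leave the record out of the count
        perfect_cell_barcodes += raw_barcode == cell_barcode
    return perfect_cell_barcodes
```

Keeping the raw barcode lookup outside the try block means input with unexpected tag names fails loudly instead of producing a count of zero.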
src/sctools/test/test_metrics.py
Outdated
| # set the input files | ||
| _gene_sorted_bam = _data_dir + '/small-gene-sorted.bam' | ||
| _cell_sorted_bam = _data_dir + '/small-cell-sorted.bam' | ||
| _cell_sorted_bam_missing_cell_barcodes = _data_dir + '/cell-sorted-missing-cb.bam' |
I would use os.path.join() instead of adding the slash, for a bit more generality. The same applies to line 41 below.
src/sctools/test/test_metrics.py
Outdated
| """test the sctools cell metrics CLI invocation""" | ||
| return_call = TenXV2.calculate_cell_metrics( | ||
| args=['-i', _cell_sorted_bam, '-o', _test_dir + '/gene_metrics.csv']) | ||
| args=['-i', _cell_sorted_bam, '-o', _test_dir + '/cell_metrics.csv']) |
| def test_metrics_number_perfect_barcodes(metrics, expected_value): | ||
| """Test that each metric correctly identifies the number of perfect barcodes where CB == CR""" | ||
| def test_metrics_number_perfect_molecule_barcodes(metrics, expected_value): | ||
| """Test that each metric correctly identifies the number of perfect molecule barcodes where UB == UR""" |
Codecov Report
@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
+ Coverage   95.67%   96.09%   +0.42%
==========================================
  Files          27       27
  Lines        2473     2486      +13
==========================================
+ Hits         2366     2389      +23
+ Misses        107       97      -10
Thanks for the review @ambrosejcarr! I started breaking out the cell barcode comparison so that |
Purpose
Records with no "CB" tag will fail in the CellMetrics.parse_extra_fields method in the latest version (v0.3.1) of sctools. Due to this issue, this version cannot be used by the CalculateCellMetrics task of the Optimus pipeline.
Fixes HumanCellAtlas/secondary-analysis#455
Zenhub ticket: https://app.zenhub.com/workspaces/dcp-backlogs-5ac7bcf9465cb172b77760d9/issues/humancellatlas/secondary-analysis/455
Changes
perfect_cell_barcodes
Review Instructions
PR Checklist
Follow-up Discussions
pbmc8k data -- is there something else we could/should use instead?