Fixes to variant qc evaluation #288

jkgoodrich · 2021-01-28T23:25:52Z

First pass at a PR to fix the variant QC evaluation bugs found by Grace and to revert to ranking and then binning instead of using approx quantile.

jkgoodrich · 2021-01-28T23:41:17Z

The current PR replaces the approx quantile functionality, but I am also open to adding it back as an option, with a warning about unequal binning when there are duplicate boundaries.
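To illustrate the unequal-binning concern, here is a minimal pure-Python sketch, not the gnomad_methods implementation. The helper names `quantile_bins` and `rank_bins` are hypothetical, and exact boundaries stand in for approximate ones to keep the example small. With heavily tied scores, several boundaries collapse onto the same value, so boundary-based bins come out uneven while rank-based bins stay equal.

```python
from bisect import bisect_right
from collections import Counter


def quantile_bins(scores, n_bins):
    """Assign bins by comparing each score to quantile boundaries."""
    ordered = sorted(scores)
    n = len(ordered)
    # Exact boundaries for simplicity; an approximate method would estimate these.
    boundaries = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return [bisect_right(boundaries, s) for s in scores]


def rank_bins(scores, n_bins):
    """Assign bins from a zero-based rank, so ties are split by position."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = (n_bins * rank) // n
    return bins


scores = [0.1] * 6 + [0.5, 0.9]  # six tied scores produce duplicate boundaries
quantile_counts = Counter(quantile_bins(scores, 4))
rank_counts = Counter(rank_bins(scores, 4))

print(dict(quantile_counts))  # two bins are empty, one holds six variants
print(dict(rank_counts))      # four bins of two variants each
```

The trade-off is that rank-based binning can place variants with identical scores in different bins, which is the price of equal bin sizes.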

gtiao

Overall this is much easier and simpler to read -- thanks for putting this in so quickly!

gtiao · 2021-01-29T17:09:03Z

gnomad/variant_qc/evaluation.py

+                    (
+                        n_bins
+                        * (
+                            bin_ht[f"{bin_id}_rank"]


There are actually 101 bins the way the code is written, because the largest rank (say it's 1002 in a given grouping) will give a value of 1 * n_bins = 100 (because the rank 1002 will be divided by the total number of variants in the grouping, 1002). Basically, you're starting a new bin for the single final variant in each grouping.

I think if you want to only have 100 bins, you'll need to subtract 1 from the {bin_id}_rank in line 121 before dividing by hl.float64(bin_ht.bin_group_variant_counts[bin_id]) on line 122. This will handle the other end (the first variant in the grouping) fine, because hl.floor() will still return zero for the first variant.
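The arithmetic in this comment can be checked with a small pure-Python sketch. The function `bin_index` is a hypothetical stand-in for the floor(n_bins * rank / count) expression under discussion, not the actual Hail code. It compares a rank that runs from 1 to n_variants with one that runs from 0 to n_variants - 1.

```python
import math


def bin_index(rank, n_variants, n_bins=100):
    """Hypothetical stand-in for floor(n_bins * rank / n_variants)."""
    return math.floor(n_bins * rank / n_variants)


n_variants = 1002  # the grouping size used as an example in the comment

# Rank running from 0 to n_variants - 1.
zero_based = {bin_index(r, n_variants) for r in range(n_variants)}

# Rank running from 1 to n_variants.
one_based = {bin_index(r, n_variants) for r in range(1, n_variants + 1)}

print(len(zero_based), max(zero_based))
print(len(one_based), max(one_based))
```

Under the one-based convention the last variant alone lands in an extra bin with index n_bins; under the zero-based convention the largest index is n_bins - 1.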

I'm not sure I agree; the rank actually runs from 0 to (n_variants - 1).

Notebook test:
cleanup_new_binning_function.html.zip

Can you show what the {bin_id}_rank is in the notebook, and not just the final bin assignment? I agree that if the rank is 0 to (n_variants - 1) then the code as written is fine (basically the ranking already has 1 subtracted off it). It's just that I assumed hl.scan.count_where(bin_ht[f"_filter_{bin_id}"]) would be an actual count, which starts at 1 and not 0.

I thought it was too at first, but I got the wrong binning when I tested, so I changed it.

Thanks! I wonder if the behavior changes when it's a scan -- I know the sum mechanism for hl.scan() doesn't count the current row, so maybe that's what's happening here.

In any case, the code is working as intended, so no changes required.
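The behavior discussed in this thread can be mimicked without Hail. The sketch below is a pure-Python emulation, assuming (as the thread suggests) that a scan at each row aggregates only the rows before it. The function `exclusive_count_where` is a hypothetical name, not a Hail call.

```python
def exclusive_count_where(flags):
    """Emulate a scan that counts matching rows strictly before each row."""
    counts = []
    seen = 0
    for flag in flags:
        counts.append(seen)  # record the count before including this row
        seen += int(flag)
    return counts


passes_filter = [True, True, False, True]
ranks = exclusive_count_where(passes_filter)
print(ranks)
```

Because the current row is never included, the first matching row gets rank 0, so the rank already has 1 subtracted off relative to an ordinary count.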

gtiao

Adjustments all look good, just want to understand how the ranking code works

gtiao · 2021-01-29T21:04:43Z

jkgoodrich added 5 commits January 28, 2021 16:15

Address bugs Grace identified in variant QC evaluation and use rank/bin method instead of approx quantile for binning

e009098

Merge branch 'master' of https://github.com/broadinstitute/gnomad_methods into fixes_to_variant_qc_evaluation

86695b5

Address bugs Grace identified in variant QC evaluation and use rank/bin method instead of approx quantile for binning

f8cc45d

Address bugs Grace identified in variant QC evaluation and use rank/bin method instead of approx quantile for binning

b55183d

remove unused import

dd73e24

jkgoodrich requested a review from gtiao January 28, 2021 23:25

jkgoodrich added 2 commits January 28, 2021 16:37

Reformat with black

60fd8cc

docstring fix

0e14ab8

gtiao suggested changes Jan 29, 2021

jkgoodrich added 2 commits January 29, 2021 11:27

Address review comments

2c331fe

Add to changelog

c17b5d2

jkgoodrich requested a review from gtiao January 29, 2021 18:31

Realized there were a few uses of quantile in the docs and comments still

29631af

gtiao reviewed Jan 29, 2021

gtiao approved these changes Jan 29, 2021

gtiao merged commit 8b5bf71 into master Jan 29, 2021

gtiao deleted the fixes_to_variant_qc_evaluation branch January 29, 2021 21:56
