Added functionality to compute coverage stats from sparse MT (needed e.g. for browser) #173
Conversation
ch-kr left a comment
looks good to me, just a few comments
context = None
for contig in contigs:
    _context = hl.utils.range_table(ref.contig_length(contig), n_partitions=int(ref.contig_length(contig) / 5000000))
why did you choose 5000000 for this?
no specific reason really... it seemed reasonable given that for each row there's very little data. But happy to reconsider if you have another suggestion!
makes sense. what about adding an n_partitions arg that is set to 50 by default, since len(chromosome 1) / 5000000 should be ~50? no strong feelings about this though.
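For concreteness, a minimal sketch of what that suggested change could look like (the function name, signature, and default value here are assumptions for illustration, not code from this PR):

import hail as hl

def build_context_ht(contigs, reference_genome='GRCh38', n_partitions=50):
    # Hypothetical sketch: expose n_partitions (default 50, roughly
    # len(chr1) / 5,000,000) instead of hard-coding 5 Mb per partition.
    ref = hl.get_reference(reference_genome)
    context = None
    for contig in contigs:
        # One range table per contig with a caller-controlled partition count.
        _context = hl.utils.range_table(
            ref.contig_length(contig), n_partitions=n_partitions
        ).annotate(contig=contig)
        context = _context if context is None else context.union(_context)
    return context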
col_key_fields = list(mt.col_key)
t = mt._localize_entries('__entries', '__cols')
t = t.join(reference_ht.annotate(_in_ref=True), how='outer')
t = t.annotate(__entries=hl.or_else(t.__entries, hl.range(n_samples).map(lambda x: hl.null(t.__entries.dtype.element_type))))
why is this annotate necessary?
So, what _localize_entries does is take the MatrixTable entries and represent them as a row-indexed array field; _unlocalize_entries does the opposite.
After the outer join, __entries at rows that appear in reference_ht but not in mt is NA. This is fine in the Table representation of t, but the next line calls t._unlocalize_entries('__entries', '__cols', col_key_fields), which spreads the __entries array back out into MatrixTable entries. In the MatrixTable representation an individual entry can be NA, but the entries array itself cannot, so __entries must be an array with the same length as the number of samples.
For this reason, we first annotate __entries to be an array of the right length with NA for each element.
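To make the localize / unlocalize round trip concrete, here is a minimal sketch of the pattern described above. It assumes a sparse MatrixTable mt and a locus-keyed reference_ht already exist; it mirrors the snippet under review but is illustrative rather than the PR's exact code.

import hail as hl

n_samples = mt.count_cols()
col_key_fields = list(mt.col_key)

# Localize: entries become a row field '__entries' (one array element per
# sample) and the columns become a global field '__cols'.
t = mt._localize_entries('__entries', '__cols')

# Outer join against the reference sites; rows present only in reference_ht
# get a missing '__entries' array.
t = t.join(reference_ht.annotate(_in_ref=True), how='outer')

# An individual entry may be NA, but the per-row entries array itself may not
# be, so replace a missing array with n_samples missing entries.
t = t.annotate(
    __entries=hl.or_else(
        t.__entries,
        hl.range(n_samples).map(lambda x: hl.null(t.__entries.dtype.element_type)),
    )
)

# Unlocalize: spread '__entries' back out into MatrixTable entries.
mt_with_ref = t._unlocalize_entries('__entries', '__cols', col_key_fields)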
thanks!
Back to you @ch-kr
ch-kr left a comment
looks good to me, I added a note about a potential change for n_partitions but don't have a strong preference for or against the change
Branch force-pushed from c6ac709 to 8ba026a.