Skip to content

Conversation

@ch-kr
Copy link
Contributor

@ch-kr ch-kr commented Jul 22, 2020

Added constants and some functions that are used during MT/HT>VCF export

@ch-kr ch-kr requested a review from jkgoodrich July 23, 2020 21:08
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your hard work on this KC! I have some suggested changes and comments.

return header_hist_dict


def sample_sum_check(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we want to check on the internal HT sanity checks right? I think it should also be moved somewhere else like the other function I mentioned above, but again not positive where to put it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this could go into an assessment/sanity_checks.py type file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this should move to keep this script VCF focused and assessment/sanity_checks.py sounds good to me for this and the one above it

@ch-kr ch-kr requested a review from jkgoodrich July 29, 2020 13:57
Copy link
Contributor

@gtiao gtiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The examples in the docstrings are great -- having examples is clarifying in a way that describing things often is not!

]

if faf:
female_metrics.extend([x for x in female_metrics if "faf" in x])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this line: it looks like you're extending the existing list female_metrics by faf elements that are already in the list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

faf elements aren't in the list -- female_metrics first pulls all annotations that contain _female and then corrects that list to keep only elements with both _female and AC, AN, or nhomalt. the faf annotations (faf95_female) don't contain AC, AN, or nhomalt, so they aren't kept in that step.

that said, why do we set AC, AN, and nhomalt to 0 but leave AF? I'm going to change this to just be female_metrics = [x for x in metrics if "_female" in x] if that works with you and @jkgoodrich

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, interesting that AF is not set to 0. I will defer to @gtiao for this one, there may have been motivation for this that I am not aware of.

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more smaller changes requested, but it's looking really good. Thank you @ch-kr!

return header_hist_dict


def sample_sum_check(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this should move to keep this script VCF focused and assessment/sanity_checks.py sounds good to me for this and the one above it

ht_orig.select(*display_fields).show()


def make_filters_sanity_check_expr(ht: hl.Table) -> Dict[str, hl.expr.Expression]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, let's make an assessment/sanity_checks.py

]

if faf:
female_metrics.extend([x for x in female_metrics if "faf" in x])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, interesting that AF is not set to 0. I will defer to @gtiao for this one, there may have been motivation for this that I am not aware of.

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all of this @ch-kr! I only had one small comment (that has more to do with UKBB than the common code actually). Then I think there might have been a comment/s where @gtiao was tagged for opinion, but otherwise this looks good to me.

Copy link
Contributor

@gtiao gtiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! So excited to have all this beautiful, shiny, clean code merged into the common repo!

combo_fields = combo.split("_")
group_dict = dict(zip(group_types, combo_fields))

for_combo = make_combo_header_text("for", group_dict, prefix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea for streamlining the code a bit!

f"{prefix}hom": "|".join(
map(lambda x: f"{x:.1f}", ht.head(1).age_hist_hom.collect()[0].bin_edges)
),
f"{prefix}{call_type}": "|".join(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nice consolidation of code

Displays the number of rows in the Table that match the `cond_expr` and fail to be the desired condition (`check_description`).
If the number of rows that match the `cond_expr` is 0, then the Table passes that check; otherwise, it fails.
.. note::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this note -- it's probably something easily missed when setting up the generic field check, so it's great to have this additional reminder

@gtiao gtiao merged commit d15c4d1 into master Aug 4, 2020
@gtiao gtiao deleted the vcf_export branch August 4, 2020 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants