-
Notifications
You must be signed in to change notification settings - Fork 31
Repo reorganization #207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Repo reorganization #207
Conversation
nawatts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 based on the spreadsheet.
For the __init__.py files, we might consider importing the submodules themselves instead of everything in them to avoid having so many things all in the same namespace. Or using __all__ to define what's exported.
https://docs.python.org/3/tutorial/modules.html#importing-from-a-package
- removed field_utils.py (functions were moved elsewhere) - moved vcf-related functionalities to vcf.py (I envision this will become the place for all VCF needs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a bunch @mike-w-wilson for getting this going, looks great!
A few things:
- I committed a few changes myself -- please take a look at that last commit
- I'd like to move pop colors to
ancestry.py; thoughts? Alsoancestry.pyvspopulations.py? - How about moving
random_forest.pretty_print_runsto the gnomad_qc repo? Seems very specialized and possibly not used in other projects. @jkgoodrich, did you use it for UKBB ? - should we rename
rf.pytorandom_forest.py? it'll break some code and I'm unsure it really makes a difference But happy to adapt if consensus is that it's desirable.
mike-w-wilson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Just a couple comments but this looks good to me. The moves from field_utils to annotations make sense.
- Putting pop colors in ancestry is a good idea and seems like it would be used. I don't have strong feelings on
ancestry.pyvs.populations.py. I think we discussed this in one meeting but I'll let @jkgoodrich and @gtiao weigh in. - Fine with me if it is okay with Julia.
- I went back and forth on this. I spelt it out for readability in the sphinx docs but went against my own reasoning in keeping
ld.pythe same....so again no strong feelings here. I guess most people using this package will know what rf stands for.
…utput to expression in/output
|
Regarding @nawatts 's comment on |
|
I like importing the submodules too |
… functionality hadn't found a broader use and only adds 2 lines of code, best to inline it.
jkgoodrich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thank you Mike so much for doing this. I pretty much just confirmed that the functions were moved where expected, imports updated (thanks for redoing the import organization!) and looked at Laurent’s changes.
-
Laurent, thanks for moving/removing the field_utils I feel like we weren’t sure about them and this makes more sense.
-
For ancestry.py vs. populations.py. I still lean towards ancestry for some reason, but I am not sure why since we refer to everything in there with pops or populations so that might be better.
-
I do currently use random_forest.pretty_print_runs in UKBB actually.
-
I personally like random_forest.py better, but fine with either.
| ) | ||
|
|
||
|
|
||
| def bi_allelic_expr(t: Union[hl.Table, hl.MatrixTable]) -> hl.expr.BooleanExpression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, this makes sense to move here from field utils, thanks for moving
| logger = logging.getLogger(__name__) | ||
| logger.setLevel(logging.INFO) | ||
|
|
||
| ANNOTATIONS_HISTS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a little odd here since it ends up being more specific to the given QC pipeline (granted not for all) @lfrancioli what do you think?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm fine with having this here as in my experience these values are good starting points at least and the different metrics are pretty standard. Totally open to suggestions though -- did you have another module in mind?
I think adding a comment about this is would be nice (but I can see this being outside the scope of this PR and part of another effort to document constants)
gnomad/utils/file_utils.py
Outdated
| v.dtype in {hl.tstr, hl.tint32, hl.tfloat32, hl.tint64, hl.tfloat64, hl.tbool}}) | ||
|
|
||
|
|
||
| def ht_to_vcf_mt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess it does make since to move this to a utils.vcf because it is specific to writing to a vcf and since we are planning to add more to the vcf module at somepoint. Good catch
gnomad/utils/vcf.py
Outdated
|
|
||
| # Create an MT with no cols so that we acn export to VCF | ||
| info_mt = info_ht.to_matrix_table_row_major(columns=['s'], entry_field_name='s') | ||
| return info_mt.filter_cols(False) No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like github is complaining about no newline at end of file
|
Thanks for weighing in @jkgoodrich ! A couple of things based on your comments: So I guess what I'm saying is that the current state of things is good with me ;) |
|
@laurent, one last thing before I run Black and merge! I've moved pop_colors to ancestry. I left all pops and subpops in the dictionary and kept the lower case pop names. I thought someone had mentioned switching to uppercase, which would match the pop_names dictionary, but I could not find it in slack..I'm fine either way but do think matching the pop_names keys would make sense, though I know this goes against history. |
|
This is awesome @mike-w-wilson So I think we should go all lowercase. Also, could you update the names with the more complete ones (and double check the colors match) those here: Note that the Amish isn't in the R code as it's new for v3, so needs to be added. If you see anything else that's interesting to have w.r.t. pops in these constants please transfer over too! |
|
@laurent, added the complete pop names and pop colors but removed any unused pops with clashes. |
lfrancioli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done!
faa5345 to
9cb2fa2
Compare
Summary
This PR is pretty massive but since it is a reorganization of the entire repo, I couldn't think of another way to do it. This organization was drafted and previously discussed here:
https://docs.google.com/spreadsheets/d/1pVpru0acCCr1AvSRyf89X-LxPKhwK-mYCBtZvICl9fQ/edit#gid=0
The changes include import statements, loggers, and where functions live. Six functions have also been removed as they are built into hail or unnecessary for other reasons. They are:
gnomad.utils.generic.reverse_complement_bases(bases)
gnomad.utils.generic.get_sample_data(mt, …)
gnomad.utils.generic.phase_by_transmission(…)
gnomad.utils.generic.phase_trio_matrix_by_transmission(tm)
gnomad.utils.generic.explode_trio_matrix(tm)
gnomad.utils.generic.sample_pcs_uniformly(…)
gnomad.utils.liftover.main(args)
Once this is approved, I can run Black.