Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial subset script for v3 #48
Initial subset script for v3 #48
Changes from 12 commits
bfe6bda
7cc11b4
48655d8
3dec95b
84dd46d
7d90cc1
2b18a06
bfd58a6
2544004
887351b
43f388e
b7e3c79
c413b93
ef72c96
d8124de
d9e48e7
73ade62
85f326f
9e3aa37
255ad04
5a38e05
3681815
399c682
b0aa68a
1e83259
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allele-specifc -> Allele-specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't say this anymore, but there are more below. Must be copy paste error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is meant by
including ref, RAW format
? Might help to elaborate. Also what is meant by (informative)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, definition was taken from https://github.com/broadinstitute/gatk/blob/master/src/test/resources/org/broadinstitute/hellbender/tools/walkers/GnarlyGenotyper/twoSampleASDB.vcf so I'm not entirely sure what informative actually means here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently @ch-kr has this in the common code
"Description": "Depth over variant genotypes (does not include depth of reference samples)"
I think it is fine to keep as is for now for this callset (gnomad v3.0 currently looks like there is no description), but we should decide on the best explanation and be consistent within our callsets.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- gnomad_methods should be the truth data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave this based off of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above about (informative)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add newline after
"""
and another before:param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this comment a TODO or add the warning in now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add newlane after
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this
20
a parameter and add20
as a defaultThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add this comment higher up and let's make it a TODO so we don't forget to do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename
release_only
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I wonder if it is better to just hardcode sparse as True because we know that
get_gnomad_v3_mt
is sparse and this script isn't currently generalized to take other datasets. You can add a Note explaining that knowledge of sparse vs dense this is very important to the downstream steps and we can modify when we generalize this more.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment for why this is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a warning or error if pop exists, but is not in
GENOME_POPS
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added choices to the pop arg which errors out when the specified pop is not in GENOME_POPS as soon as the script is first started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is great, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change
add_meta
toexport_meta
because add_meta sounds like it will somehow add this to the VCF exportThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, I think you can remove the sparse argument because you already predefine using
get_gnomad_v3_mt
and this script is still specific to gnomAD v3.0 (though I do know for future subsets we will want to generalize this more)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this matters at all, but just a note that
hl.experimental.sparse_split_multi
doesn't filter out*
alleles. I'm not sure if that will be a problem or not, but it was something we ran into in UKBBThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if these were originally filtered out but I havent had issues with the two previous subsets I ran
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment about why this step is needed (just because it is specific to a sparse dataset)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only be for a sparse MT (or densified sparse) so if you decide to keep the sparse argument, this needs to be under the if sparse block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above to specify what the
post-filtering
refers toThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also same comment as above, will you need to specify
Number
for any of these?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added in the adj and raw callstats for subset as well as updated the header dict to include number where necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally just a question for my knowledge. Why import here rather than at the top of the script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To not crowd the namespace with unnecessary variables/methods but I'm not sure it's really necessary since this is not yet generalized for wider use. Im going to move it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify this should be a text file (I am assuming). I got confused at first and expected it to be a path to a HT