Fan et al. 2022 #97
rule get_hg37_coords:
    input:
        bed38="marker38.bed",
        chain="hg38ToHg19.over.chain.gz",
    output:
        bed37="marker37.bed",
        bedunmapped="marker-unmapped.bed",
    shell:
        """
        liftOver {input} {output}
        [ ! -s {output.bedunmapped} ]
        """
Many of the SNPs/InDels don't have RSIDs, so positions for GRCh37 and GRCh38 must be provided explicitly. The authors provide only GRCh38 positions, so I whipped out UCSC's 38 --> 37 liftOver chain file for that conversion. Yay.
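For anyone debugging a failed run: the `[ ! -s ... ]` test in the rule only says whether the unmapped file is non-empty, not how many records were dropped. Below is a minimal Python sketch of a more informative check. The helper name `count_unmapped` is hypothetical and not part of this PR, and it assumes the unmapped file lists each failed BED record alongside reason lines that start with `#`.

```python
import os


def count_unmapped(path):
    """Count BED records in a liftOver unmapped file.

    Assumption: lines starting with '#' are reason annotations rather than
    records. A missing file is treated as zero unmapped records, which
    mirrors how `[ ! -s file ]` succeeds for a nonexistent file.
    """
    if not os.path.exists(path):
        return 0
    with open(path) as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


if __name__ == "__main__":
    n = count_unmapped("marker-unmapped.bed")
    if n > 0:
        raise SystemExit(f"{n} marker position(s) failed to lift over to GRCh37")
```

Note that this variant is slightly more lenient than the shell test: a file containing only `#` lines would pass here but fail `[ ! -s ... ]`.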
oh fun!!
data.sort_values('Marker').to_csv(output[0], sep='\t', index=False)
data.sort_values(['Marker', 'VariantIndex']).to_csv(output[0], sep='\t', index=False)
Made a slight change here...
mh06PK-24844 2 G GA
mh06FHL-001 2 C CATT
mh06FHL-001 7 A AC
mh06FHL-002 0 AT A,GT
mh06FHL-002 14 TG T
mh06FHL-002 25 C CA
mh06FHL-002 26 A AG
mh06PK-24844 1 C CT
mh06PK-24844 2 G GA
...to enforce a consistent order here.
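A small sketch of why the second sort key matters. When sorting on `Marker` alone with the default sort kind, pandas does not guarantee the relative order of rows that share a marker, so the output file can change between runs or versions. The toy rows below are illustrative only, reusing marker names from the table above.

```python
import pandas as pd

# Toy rows, deliberately out of order; only the two sort columns are shown.
data = pd.DataFrame({
    "Marker": ["mh06PK-24844", "mh06FHL-002", "mh06PK-24844", "mh06FHL-002"],
    "VariantIndex": [2, 14, 1, 0],
})

# Sorting on both keys fully determines row order whenever each
# (Marker, VariantIndex) pair is unique.
ordered = data.sort_values(["Marker", "VariantIndex"]).reset_index(drop=True)
print(ordered.to_csv(sep="\t", index=False))
```

With both keys, rows for `mh06FHL-002` come first (indices 0, 14), followed by `mh06PK-24844` (indices 1, 2).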
This looks like a Monday morning job... :)
No rush!
On second thought, please hold off on this. There's another massive PR I'd like to resolve first...
Ok this is ready for review @rnmitchell!
Looks good!
This branch adds a new marker collection from Fan et al. 2022. All 22 of these markers include SNPs with no RSIDs, so population frequency estimates based on 1000 Genomes haplotypes cannot be determined.
Closes #93. Closes #95.