Kelly's super cool java program for comparing variant sets #21
pull in julie's latest changes
off by one error in writing normalize variants
…ilable in maven central repo
don't check allele match when evaluating SVs at exact breakpoint
clearer error message for variant pos collisions
minor fix to normalization slide display
pull in julie's latest changes
pull in julie's latest changes
@@ -0,0 +1,5 @@
eclipse.preferences.version=1
calldiff/.settings, .project and .classpath should probably all be git-ignored
I dunno. If you are using Eclipse, then having these already present makes life easier. If you aren't using Eclipse, then they won't hurt you....
Hi Julie,
Per our previous discussion, here is the code I've written to diff callsets. Please feel free to comment on any of the code that doesn't make sense. Here is a list of known deltas against your Python code:
(1) This code is able to consume variant calls from both a VCF file and the GA4GH variants API.
(2) This code doesn't do any left-normalization or pre-rescue evaluation. (Per our discussion last Friday, there are cases where this code fails to identify equivalent variation because it doesn't normalize.)
(3) This code explores all subsets of nonoverlapping variants during rescue and chooses the pair that maximizes the sum of cardinalities of the subsets of calls it rescues.
(4) This code generates all possible haplotypes during rescue (consequently, it also knows how to take advantage of the genotype information in conjunction with multiple alts)
(5) This code exploits phaseset information to reduce the set of possible haplotypes it needs to generate.
(6) This code doesn't distinguish SVs from indels like the Python code does (per our discussion on Friday, we acknowledge this is definitely something that needs to be done).
(7) This code only reports precision/recall - it doesn't produce any annotated VCF output.
(8) This algorithm is symmetric with respect to its inputs.
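
To make items (4) and (5) concrete, here is a rough sketch of how haplotype pairs could be enumerated from diploid genotypes, and how phasing shrinks that enumeration. It is not the code in this PR: `HaplotypeSketch`, `Call` and `haplotypePairs` are hypothetical names, and it assumes calls are sorted by position, non-overlapping, and that every phased call belongs to the same phase set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HaplotypeSketch {

    /** Hypothetical simplified diploid call; alleles.get(0) is the reference allele. */
    static final class Call {
        final int pos;               // 0-based offset into the reference window
        final List<String> alleles;  // reference allele followed by the alt alleles
        final int gt0, gt1;          // allele indices taken from the genotype
        final boolean phased;        // true if the gt0|gt1 order is fixed by a phase set

        Call(int pos, List<String> alleles, int gt0, int gt1, boolean phased) {
            this.pos = pos;
            this.alleles = alleles;
            this.gt0 = gt0;
            this.gt1 = gt1;
            this.phased = phased;
        }
    }

    /** Enumerates the distinct ordered haplotype pairs consistent with the calls. */
    static Set<List<String>> haplotypePairs(String reference, List<Call> calls) {
        Set<List<String>> pairs = new LinkedHashSet<>();
        int n = calls.size();
        expand(reference, calls, 0, new int[n], new int[n], pairs);
        return pairs;
    }

    private static void expand(String ref, List<Call> calls, int i,
                               int[] first, int[] second, Set<List<String>> out) {
        if (i == calls.size()) {
            out.add(new ArrayList<>(Arrays.asList(
                    build(ref, calls, first), build(ref, calls, second))));
            return;
        }
        Call c = calls.get(i);
        first[i] = c.gt0;
        second[i] = c.gt1;
        expand(ref, calls, i + 1, first, second, out);
        // An unphased heterozygous call could sit on either chromosome,
        // so the swapped orientation is explored as well.
        if (!c.phased && c.gt0 != c.gt1) {
            first[i] = c.gt1;
            second[i] = c.gt0;
            expand(ref, calls, i + 1, first, second, out);
        }
    }

    /** Splices the chosen allele of each call into the reference window. */
    private static String build(String ref, List<Call> calls, int[] chosen) {
        StringBuilder sb = new StringBuilder();
        int cursor = 0;
        for (int i = 0; i < calls.size(); i++) {
            Call c = calls.get(i);
            sb.append(ref, cursor, c.pos);               // untouched reference bases
            sb.append(c.alleles.get(chosen[i]));         // selected allele
            cursor = c.pos + c.alleles.get(0).length();  // skip the replaced ref bases
        }
        sb.append(ref, cursor, ref.length());
        return sb.toString();
    }
}
```

On the reference window `ACGT`, two unphased heterozygous calls (C>T at offset 1 with genotype 0/1, and T>G,A at offset 3 with genotype 1/2) yield four ordered haplotype pairs. Marking both calls as phased leaves only the pair `ACGG` / `ATGA`, which is the reduction item (5) refers to.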