HaplotypeCaller consult for Laura #4561

Open
sooheelee opened this issue Mar 22, 2018 · 4 comments

@sooheelee
Contributor

commented Mar 22, 2018

For @ldgauthier, upon her return from the Montreal workshop. We would like to know if this is expected behavior from HaplotypeCaller.

Researcher has uploaded read-level data to our FTP site. I've recapitulated their results with GATK4 in a dsde-docs issue ticket at https://github.com/broadinstitute/dsde-docs/issues/3008. Data may be private so please follow up in the dsde-docs repo.

@ldgauthier

Contributor

commented Mar 27, 2018

We were still making changes to the assembly in versions 3.2 and 3.3. For example: broadinstitute/gsa-unstable#582. Nothing is popping out at me as being the breaking change after 3.3, though.

I think the problem is that there are too many haplotypes in that region. There are at least 8 plausible variants, which makes for ~256 (2^8) haplotypes. We pick the "best" 128 to evaluate likelihoods against. Here it seems that the ones we're choosing as the best don't include the SNP. But actually it's not even in the graph.
image
(The 280 ref vs 211 split is the het SNP at 89,100,730 so the missing variant should be split out of the big reference string above but it's not)
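
As a rough illustration of the haplotype arithmetic above (8 biallelic sites give 2^8 = 256 combinations, of which only the "best" 128 are kept), here is a minimal Python sketch. The scoring function, the site index, and all names are made up for illustration; this is not how HaplotypeCaller actually ranks haplotypes, and it only covers the "too many haplotypes" idea, not the case where the variant never makes it into the graph.

```python
from itertools import product


def enumerate_haplotypes(n_sites):
    """Every ref/alt combination over n_sites biallelic sites (0 = ref, 1 = alt)."""
    return list(product((0, 1), repeat=n_sites))


def keep_best(haplotypes, score, max_kept):
    """Keep the max_kept highest-scoring haplotypes; the rest are never evaluated."""
    return sorted(haplotypes, key=score, reverse=True)[:max_kept]


N_SITES = 8      # "at least 8 plausible variants" from the comment above
MAX_KEPT = 128   # the "best 128" cap from the comment above
SNP_SITE = 3     # hypothetical index standing in for the missing SNP


def toy_score(hap):
    # Assumption: a made-up score that penalises the alt allele at SNP_SITE,
    # standing in for a weakly supported variant.
    return -hap[SNP_SITE]


haplotypes = enumerate_haplotypes(N_SITES)
kept = keep_best(haplotypes, toy_score, MAX_KEPT)

# True only if at least one kept haplotype carries the SNP's alt allele.
snp_survives = any(hap[SNP_SITE] == 1 for hap in kept)
print(len(haplotypes), len(kept), snp_survives)
```

With this toy score, every haplotype carrying the SNP ranks in the bottom half, so none of them survive the cut even though the SNP was among the candidates.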

The raw graph has the variant on a dangling head (I highlighted the base in the middle path in the figure of the raw_readthreading_graph), but it must not be merged back in properly.
image

I wonder if that PR above was the one that changed things. Maybe @vruano will take a look?

@ldgauthier ldgauthier assigned vruano and unassigned ldgauthier Mar 27, 2018

@chandrans

Contributor

commented Mar 28, 2018

Thank you Laura and Valentin for looking into this. @ldgauthier @vruano

@vruano

Contributor

commented Apr 3, 2018

@ldgauthier do you still have the full image for the raw graph around? Is it possible for you to post it without making it blow up the screen? (I guess there might be a markdown option to choose the display size.)

One thing that stops dangling heads/tails from being merged is furcations from the point where they merge into the reference path. So, for example, if the middle chain containing the SNP and the right chain merge with each other first, before merging into the left/red reference chain, then that would prevent the merging of either of the two non-reference branches.
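
To make the condition above concrete, here is a toy Python sketch of that check on a plain edge list. It is not GATK's read-threading graph code; the function name, the vertex labels, and the exact rule (any branch or join before reaching a reference vertex blocks the merge) are assumptions made for illustration.

```python
def blocked_by_furcation(edges, reference, dangling_source):
    """Walk a dangling head towards the reference path.

    Returns True if the chain branches, is joined by another non-reference
    chain, or dead-ends before it reaches a reference vertex.
    """
    successors = {}
    in_degree = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        in_degree[dst] = in_degree.get(dst, 0) + 1

    seen = set()
    vertex = dangling_source
    while vertex not in reference:
        if vertex in seen:
            return True  # cycle off the reference path; treat as blocked
        seen.add(vertex)
        nexts = successors.get(vertex, [])
        if len(nexts) != 1 or in_degree.get(vertex, 0) > 1:
            return True  # furcation (or dead end) before the reference
        vertex = nexts[0]
    return False


# Hypothetical labels: R* = reference chain, M* = middle chain with the SNP,
# X* = right chain, J = vertex where the two non-reference chains join.
reference = {"R1", "R2", "R3"}
ref_edges = [("R1", "R2"), ("R2", "R3")]

# Middle and right chains merge at J before reaching the reference at R2.
merged_first = ref_edges + [
    ("M1", "M2"), ("M2", "J"),
    ("X1", "X2"), ("X2", "J"),
    ("J", "R2"),
]

# Middle chain alone, joining the reference directly.
direct = ref_edges + [("M1", "M2"), ("M2", "R2")]

print(blocked_by_furcation(merged_first, reference, "M1"))
print(blocked_by_furcation(direct, reference, "M1"))
```

In the first graph both non-reference chains are reported as blocked, which matches the example of neither branch being merged; in the second, the lone middle chain reaches the reference with no furcation.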

@ldgauthier

Contributor

commented Apr 4, 2018

Yep, that looks like exactly what happens.
image
(If that's not enough context, I can email you the .dot files -- github won't let me attach them)

I've never understood what it means when the whole branch has weight 1/1 with dashed arrow, like for the rightmost path. Is that just to show that that will get pruned eventually?
