Updated analysis: snv-callers #275
Comments
Note that this was previously also discussed in #30 (comment) |
Hi all, Then I build the mnps using this tool: Then I take those results and run the consensus caller on them: These are all tied together, using this workflow script: I haven't had a chance to detail this out in the README, playing catch up with that! |
Thanks @migbro! That is very close to what I was going to be doing, so that is really helpful. The reason I am asking is really for my second question: if your code is already working well, should we just replace (or supplement) the strelka MAF file in the data download with the one you generated that includes MNVs? That would definitely be the least effort at our end, especially since you have already done most (all?) of the work! Then we would just rerun to make updated consensus files. |
Hmmm, I'll have to get back to you on that one. What I do see, before reannotation, is that the consensus caller seems to keep the genotype and depth information from the first caller of the consensus for each call. So, the format ends up varying a bit, but you'll see something like:
as an example snippet from the benchmarking set. So, for strelka2-supported mnps, since trying to recalculate DP and GT for strelka2 would be tricky, I use the information from whatever the first base pair of that mnp was. Hopefully this makes sense. I can follow up maybe this weekend or Monday on what a final vcf/maf would look like, but I imagine it's basically the same with an annotation. I don't think VEP would recalculate depth, but maybe it does! |
I guess I am asking about before the final consensus, as I think we want to keep the callers separate for now. It sounds like you are saying you use the first bp of the MNP in the strelka VCF, which I think is probably a good choice. Is that right? What happens with the consensus is always going to be a challenge to settle on. |
Hi @jashapiro, yes, the first base pair, if and when an mnp is built with strelka2, is used to inform read depth and genotype information. As for the final file, the good news is that the vcf2maf script that we use from MSKCC, in addition to annotating with VEP, also seems to "standardize" the formatting of the AD and DP fields in the annotated vcf. |
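For anyone following along, here is a minimal sketch of the idea described above: adjacent SNVs are joined into one MNP, and the merged call keeps the genotype and depth of its first base. This is not the actual tool's code. The `Call` record and `merge_adjacent_snvs` function are hypothetical names, and the sketch assumes that adjacency alone is enough to merge, which ignores phasing and any other checks a real implementation might apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified variant record (not a real VCF parser type)."""
    chrom: str
    pos: int   # 1-based position of the first reference base
    ref: str
    alt: str
    gt: str    # genotype string, e.g. "0/1"
    dp: int    # read depth


def merge_adjacent_snvs(calls):
    """Join runs of directly adjacent calls into single MNP records.

    The merged record keeps GT and DP from the first base of the run.
    Assumption: adjacency is the only merge criterion used here.
    """
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.pos)):
        prev = merged[-1] if merged else None
        if prev and prev.chrom == call.chrom and prev.pos + len(prev.ref) == call.pos:
            # Extend the previous record; GT and DP stay those of the first base.
            merged[-1] = Call(prev.chrom, prev.pos, prev.ref + call.ref,
                              prev.alt + call.alt, prev.gt, prev.dp)
        else:
            merged.append(call)
    return merged
```

With two adjacent SNVs at positions 100 and 101, the result is a single two-base record at position 100 carrying the GT and DP of the call at 100.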
Closed by #279 or should we keep this open for some reason @jashapiro ? |
Mostly closed. The only remaining question is whether to reconstitute MNVs. I would probably close this with #293, and we can file a new issue if MNV reconstruction is desired later. |
Sounds good! |
Are we able to close this issue since the MNV part of the snv caller analysis has been implemented? Are there any un-addressed aspects of this issue? |
Looks like closed with #293 |
What analysis module should be updated and why?
snv-callers: The generation of the SNV consensus
Why should the module be updated?
The current SNV consensus strategy will lose some multinucleotide mutations due to differences between strelka2 and other callers. As noted in https://www.biorxiv.org/content/biorxiv/early/2019/04/30/623702.full.pdf, strelka does not make any multinucleotide (MNV) calls, instead calling as strings of SNVs, whereas other callers make DNP, TNP and ONP calls for di-, tri- and quad+ nucleotide variants, respectively.
What changes need to be made? Please provide enough detail for another participant to make the update.
There seem to be two main options:
1. scripts/01-setup_db.py. This may be a bit simpler to implement at the outset, but does lose haplotype information.
2. scripts/02-merge_callers.R, then write a second query to pull just the MNVs from callers that create them, split those up, and look for confirmation in the strelka data. This will make the consensus creation code a bit uglier, but may end up quicker overall, and can also be made to preserve haplotypes more easily.
I will be pursuing option 2 to start.
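As a rough illustration of the "split those up, and look for confirmation in the strelka data" step, here is a minimal sketch in Python. It is not the module's code: the `Variant` tuple, `split_mnv`, and `confirmed_by_snvs` are hypothetical names, and an in-memory set of SNV calls stands in for whatever database query the real scripts would use.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical, simplified variant key."""
    chrom: str
    pos: int   # 1-based position of the first reference base
    ref: str
    alt: str


def split_mnv(mnv):
    """Split an equal-length MNV (DNP/TNP/ONP) into per-base SNVs.

    Positions where ref and alt agree are skipped, since they are not SNVs.
    """
    if len(mnv.ref) != len(mnv.alt):
        raise ValueError("not a simple MNV: ref and alt lengths differ")
    return [
        Variant(mnv.chrom, mnv.pos + i, r, a)
        for i, (r, a) in enumerate(zip(mnv.ref, mnv.alt))
        if r != a
    ]


def confirmed_by_snvs(mnv, snv_calls):
    """Return True if every component SNV of the MNV is present in snv_calls.

    Assumption: requiring all component SNVs is the confirmation rule;
    a real consensus might use a different threshold.
    """
    parts = split_mnv(mnv)
    return bool(parts) and all(p in snv_calls for p in parts)
```

Keeping the original MNV record alongside its split parts, rather than replacing it, would be one way to preserve the haplotype information mentioned above.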
What input data should be used? Which data were used in the version being updated?
v10, when it arrives
When do you expect the revised analysis will be completed?
Early the week of 11/18
Who will complete the updated analysis?
@jashapiro, with assistance as needed from @cansavvy