Add genotyping script #234
This is a philosophical difference between the approach taken by GRIDSS and that of the other two callers (manta and lumpy). GRIDSS itself is a breakpoint (/single breakend) caller. Critically, it makes no assumptions about the ploidy of the sample. Genotyping an SV requires that a) the ploidy is known, and b) genotyping the SV under that ploidy assumption is actually meaningful. Except for a subset of simple events, the SV itself results in aneuploidy. When there is aneuploidy, genotyping under a diploid assumption does not make sense, which is why GRIDSS chooses not to do it. That said, GRIDSS reports the most accurate variant allele fraction of any of the short read variant callers. You can genotype SVs by using the
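As a simplified illustration of why the ploidy has to be known before genotyping: if a locus has `ploidy` copies and k of them carry the variant, the expected VAF of a pure sample is k / ploidy. The sketch below uses a hypothetical helper, `expected_vafs`, and ignores purity, sequencing error and mapping bias; it only lists those expected values to show that the familiar 0/1 and 1/1 labels stop describing the possible states once the locus is not diploid.

```python
from fractions import Fraction


def expected_vafs(ploidy: int) -> list[Fraction]:
    """Expected VAFs for 0..ploidy variant copies at a single locus.

    Simplifying assumption: a pure sample in which every copy of the locus
    is either reference or variant, so k variant copies give k / ploidy.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return [Fraction(k, ploidy) for k in range(ploidy + 1)]


# Diploid locus: the usual 0, 1/2, 1.
print([str(f) for f in expected_vafs(2)])  # ['0', '1/2', '1']
# Triploid locus: four possible states, none of them at 1/2.
print([str(f) for f in expected_vafs(3)])  # ['0', '1/3', '2/3', '1']
```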
Would a small R script that genotyped SVs under diploid assumptions be a generally useful utility?
Thanks, I think I've got it. But do you have any suggestions for the VAF threshold to use when assigning SV genotypes? For example, when VAF < 0.5, the genotype is 0/1; when VAF > 0.5, it is 1/1. And could you explain how to determine the VAF threshold?
I think the script would be very useful and necessary, because we could compare the genotypes of GRIDSS SVs with the genotypes of the same SVs reported by other SV callers (e.g., manta, lumpy). We could also use the genotypes in further analyses.
How would you like the genotype of non-diploid regions to be defined?
For example:
A cut-off at VAF=0.5 is definitely incorrect, as that's right in the middle of the heterozygous distribution (at least for indel events).
The threshold to use will come straight out of your model. You should check that it makes sense empirically and, if not, you'll need to tweak your error model.
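To make "the threshold comes out of your model" concrete, here is a minimal sketch of the usual SNV-style diploid model applied to SV fragment counts. It assumes independent fragments and a single hypothetical per-fragment error rate (`error`); it ignores mapping bias against the variant allele, purity and ploidy, so it is only meaningful for the simple diploid cases discussed in this thread. `genotype_log_likelihoods` and `call_genotype` are illustrative names, not part of GRIDSS.

```python
import math


def genotype_log_likelihoods(ref: int, alt: int, error: float = 0.01) -> dict[str, float]:
    """Diploid genotype log-likelihoods from reference/variant fragment counts.

    Each fragment is treated as an independent draw that supports the variant
    with probability p, where p depends on the genotype:
        0/0 -> error, 0/1 -> 0.5, 1/1 -> 1 - error.
    `error` is a hypothetical uniform per-fragment error rate. The binomial
    coefficient is identical for every genotype, so it is dropped.
    """
    if not 0.0 < error < 0.5:
        raise ValueError("error must be in (0, 0.5)")
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        gt: alt * math.log(p) + ref * math.log(1.0 - p)
        for gt, p in p_alt.items()
    }


def call_genotype(ref: int, alt: int, error: float = 0.01) -> str:
    """Return the maximum-likelihood diploid genotype."""
    ll = genotype_log_likelihoods(ref, alt, error)
    return max(ll, key=ll.get)


# Calls for every possible variant count at a depth of 30 fragments.
depth = 30
calls = [call_genotype(depth - alt, alt) for alt in range(depth + 1)]
print(calls)
```

With `error=0.01` and 30 fragments this calls 0/0 for 0 to 4 variant fragments, 0/1 for 5 to 25 and 1/1 for 26 to 30, i.e. a 1/1 boundary near VAF 0.87. At 10 fragments, 1/1 needs at least 9 variant fragments (VAF 0.9), so the implied VAF cut-off moves with depth; this is one reason a single fixed VAF threshold does not work.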
I think we can ignore the non-diploid regions and genotype only the SVs in diploid regions. If necessary, we can genotype the SVs in non-diploid regions separately (e.g., SVs in the mitochondrion).
Thanks for the detailed explanation. I entirely agree with applying the SNV genotype model to SVs; I was thinking about this yesterday as well. VAF=0.5 was just an example to illustrate what I meant; it is definitely incorrect.
I understand in general, but writing a script to get the genotypes is a little difficult for me. If possible, could you provide a general script for genotyping SVs? Thank you.
What about a purity-adjusted VAF instead of a genotype for your use case? I guess that it could simply be calculated as
Thanks for the good suggestions. I have tried running the latest version, 'purple-2.33.jar', several times, but I think the purple pipeline is not suitable for my analysis: it only supports human data, and all my samples are sheep. Do you have any other suggestions?
Coming back to this again: the key quantities are the number of fragments supporting the reference and the number of fragments supporting the variant. For simple deletions we can port the SNV logic over, but it becomes more complicated for other events. Take a duplication-like event ( Options for implementing are:
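A minimal sketch of the first step for a simple deletion: turning per-sample support counts into a VAF that an SNV-style model can consume. It assumes the counts come from GRIDSS's VF (fragments supporting the variant), REF and REFPAIR (reads and read pairs supporting the reference) fields; check the field definitions in your own VCF header, and note that summing REF and REFPAIR mixes reads with read pairs. `breakpoint_vaf` is a hypothetical helper, and it does not attempt duplication-like events, which need more care.

```python
from typing import Optional


def breakpoint_vaf(record: dict[str, int]) -> Optional[float]:
    """VAF of one breakpoint in one sample, or None if there is no support.

    Assumption: `record` maps the GRIDSS field names VF, REF and REFPAIR
    to per-sample integer counts; missing fields are treated as 0.
    """
    variant = record.get("VF", 0)
    reference = record.get("REF", 0) + record.get("REFPAIR", 0)
    total = variant + reference
    if total == 0:
        return None  # no informative fragments: VAF is undefined
    return variant / total


print(breakpoint_vaf({"VF": 12, "REF": 10, "REFPAIR": 2}))  # 0.5
```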
Hi there, has this not been followed up? It would be beyond useful to have something like this available...
Unfortunately, this has not progressed in GRIDSS itself. Thus far, our efforts have focused on complex event reconstruction (e.g. breakage-fusion-bridge) and derivative chromosome reconstruction. Since this also incorporates CNA data, it is done by LINX (https://www.biorxiv.org/content/10.1101/781013v1), which is designed for human tumour data.
A breakpoint causes a change in copy number at the break junction, so the approach of "only genotype the SVs in diploid regions" doesn't work, since at least one side of the break junction is not going to be diploid. The only exception is when there is a compensatory breakpoint at the same position (e.g. an inversion). To me, this request is similar to the ones leading to I know it's not the answer you're looking for, but GRIDSS does include sufficient annotation at each breakpoint that a simple genotyping model can be applied to the simple cases (the most trivial being REF=0 -> homozygous).
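The trivial case mentioned above (no reference support means homozygous) could look like the following sketch. `trivial_genotype` and its `min_variant` threshold are hypothetical, not GRIDSS parameters; anything the rule cannot decide is left as the VCF missing genotype `./.` rather than forced into 0/1.

```python
def trivial_genotype(ref_support: int, variant_support: int, min_variant: int = 3) -> str:
    """Genotype only the unambiguous case and leave everything else missing.

    `min_variant` is a hypothetical minimum number of variant-supporting
    fragments, so that a call backed by one or two fragments is not
    declared homozygous just because no reference fragments were seen.
    """
    if ref_support == 0 and variant_support >= min_variant:
        return "1/1"  # no reference support at all: homozygous variant
    return "./."  # not decidable by this rule: leave ungenotyped


print(trivial_genotype(0, 15))  # 1/1
print(trivial_genotype(8, 15))  # ./.
print(trivial_genotype(0, 1))   # ./.
```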
If there's no CN change, it's probably not a deletion, so a caller shouldn't call and genotype it as such. VCFv4.4 will include the
Hello, thanks very much for your previous answers. I have another small question. Why can't GRIDSS generate SV genotypes (i.e., 0/1, 1/1), when some other SV callers, such as manta and lumpy, can? I wonder if there are any scripts that can genotype the SVs generated by GRIDSS. I know SVTyper (https://github.com/hall-lab/svtyper/) can genotype SVs (the ungenotyped SVs and a BAM file must be provided).
Thank you,
Best wishes.