Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minigraph-cactus VCF has overlapping variants after left align #1460

Open
Han-Cao opened this issue Aug 9, 2024 · 4 comments
Open

Minigraph-cactus VCF has overlapping variants after left align #1460

Han-Cao opened this issue Aug 9, 2024 · 4 comments

Comments

@Han-Cao
Copy link

Han-Cao commented Aug 9, 2024

Hi,

In current MC pipeline, the default output VCF (i.e., after vcfbub and bcftools norm) may have overlapping variants due to left align.

For example:

In mc.raw.vcf.gz, there is no overlap between the below 2 variants

chr1    702357  >31615734>31615730      AGA     AAA,A   60      .      AT=>31615734>31615732>31615731>31615730,>31615734<31615733>31615731>31615730,>31615734>31615730
chr1    702369  >31615730>31615728      A       AA      60      .      AT=>31615730>31615728,>31615730<31615729>31615728

After running bcftools norm -f GRCh38.fa, the output variants are overlapping:

chr1    702356  >31615734>31615730      AAG     AAA,A   60      .      AT=>31615734>31615732>31615731>31615730,>31615734<31615733>31615731>31615730,>31615734>31615730
chr1    702358  >31615730>31615728      G       GA      60      .      AT=>31615730>31615728,>31615730<31615729>31615728

Do you think these 2 records should be merged into one multiallelic record? Moreover, as overlapping variants are not accepted by some tools (e.g., pangenie), this could be an issue for downstream analysis.

@glennhickey
Copy link
Collaborator

I see what you mean. Is there a bcftools command (or other tool) that can do this?

@Han-Cao
Copy link
Author

Han-Cao commented Aug 16, 2024

I tried bcftools norm -m +any, but it cannot merge these records (v1.20). I am not sure whether other tool can do it.

What I currently do is skipping bcftools norm -f to generate a non-overlapping multiallelic VCF (like old cactus) for tools require non-overlapping input.

@minglibio
Copy link

Same issue here @glennhickey . The variants are even in the same position.

1       62173   >4733>4736      G       A       60      .       AC=3;AF=0.3;AN=10;AT=>4733>4734>4736,>4733<4735>4736;NS=8;LV=0  GT      0|.     0|.     .|.     1|.     .       .|.     .|.     .|.     .|.     0|0     .|.     1|1     .|.     0       .       0       0|.
1       62173   >4736>4767      GTGGTGGCACAGCCGTGATCACAGACTCAGGGTGATGTGGGTCCCCATGGTGGCACAGCCGTGACCATGACCTCAGGGTGACGTGGGTCCCCA   G,GTGGTGGCACAGCCGTGACCATGACCTCAGGGTGCTATGGGTCCCCATGGTGACACCACCACGACAACCACGGCCTCAGGGTGACGTGGGTCCCCA      60      .       AC=7,2;AF=0.7,0.2;AN=10;AT=>4736>4737>4738>4740>4741>4743>4744>4746>4747>4749>4750>4752>4753>4755>4756>4758>4760>4761>4763>4764>4766>4767,>4736>4767,>4736>4737<4739>4740<4742>4743<4745>4746<4748>4749<4751>4752<4754>4755<4757>4758<4759>4760<4762>4763<4765>4766>4767;NS=8;LV=0    GT      0|.           1|.     .|.     1|.     .       .|.     .|.     .|.     .|.     1|1     .|.     2|2     .|.     1       .       1       1|.

@glennhickey
Copy link
Collaborator

There's a tool in vcflib, vcfcreatemulti that seems to be able to merge overlapping variants. It's included in the Cactus docker release (but not binary release). I'd be curious to know how that works out. In the meantime, I'm leaning towards disabling the normalizer by default in the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants