Not detecting mutations which can be seen in IGV #2

Open
mbdabrowska1 opened this issue Apr 29, 2024 · 1 comment

Comments

@mbdabrowska1

Hi, I have an issue with ClusterV sometimes not detecting mutations that can be seen in IGV and are present at a relatively high allele frequency. In the following examples you can see that the mutation is clearly visible in IGV, but in the ClusterV report it doesn't seem to be called:

BARCODE19

RT:V106I mutation expected (GTA -> ATA at nucleotide 2411, NC_001802.1 reference)

Screenshot from 2024-04-29 10-06-07

And the corresponding report:
Screenshot from 2024-04-29 10-20-09

The coverage around the region isn't very high. Could this be causing the issue? I feel like it should still be seen in consensus unless I'm misunderstanding how Flye assembles the fragment.
Screenshot from 2024-04-29 10-10-18

BARCODE25

RT:E138A mutation expected (GAG -> GCG at nucleotide 2508, NC_001802.1 reference)

Screenshot from 2024-04-29 10-15-26

Corresponding report:
Screenshot from 2024-04-29 10-20-45

And coverage:
Screenshot from 2024-04-29 10-16-12

BARCODE43 - separate run

RT:V106I mutation expected (GTA -> ATA at nucleotide 2411, NC_001802.1 reference):

Screenshot from 2024-04-29 10-18-54

Report:
Screenshot from 2024-04-29 10-21-17

Coverage:
Screenshot from 2024-04-29 10-19-12
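
As a quick sanity check on the expected mutations listed above, the codon changes can be translated with the standard genetic code. This is only a minimal sketch: the helper names `translate_codon` and `describe_change` are made up for illustration, and it does not verify the nucleotide coordinates against NC_001802.1.

```python
# Standard genetic code, indexed by base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon):
    """Translate one DNA codon into its one-letter amino acid code."""
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def describe_change(ref_codon, alt_codon, residue):
    """Format a codon change as an amino acid substitution, e.g. V106I."""
    return f"{translate_codon(ref_codon)}{residue}{translate_codon(alt_codon)}"


# Codon changes and residue numbers taken from the examples in this issue.
for ref, alt, residue in [("GTA", "ATA", 106), ("GAG", "GCG", 138)]:
    print(ref, "->", alt, "=", describe_change(ref, alt, residue))
# GTA -> ATA = V106I
# GAG -> GCG = E138A
```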

Any help with this would be greatly appreciated! Please let me know if you require the original files as I'm happy to share those via email.

@sujunhao
Collaborator

sujunhao commented May 14, 2024

Hi,
Variants that are well supported in the reads but missing from the output can have several causes.

The two most likely are: (1) the variant caller missed the variant. ClusterV uses a Clair-Ensemble model trained on Guppy5 data. (2) The reads carrying the variant were filtered out. ClusterV filters reads with large indels out of the original BAM, and this step may have removed the reads carrying the variants you mention. The filtered file is [YOUR INPUT FILE NAME]_f.bam.

For issue (2), adjusting the filtering setting with --indel_l may solve the problem.
For issue (1), we have tested ClusterV extensively to avoid this situation. However, with ONT data from a different chemistry or basecaller, the problem may still occur. In that case, we need time and effort to evaluate and further adjust our variant calling model.
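
To get a rough idea of whether cause (2) applies, one option is to look at the longest indel in the reads that support the variant. The sketch below parses CIGAR strings from SAM-format lines (for example, the text output of `samtools view` on the original BAM) and flags reads whose longest insertion or deletion exceeds a threshold. The function names, the example reads, and the threshold value are all hypothetical, and the comparison is only an assumption about how the filter behaves; check ClusterV's own help text for the real default and semantics of --indel_l.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_indel_length(cigar):
    """Return the length of the longest insertion (I) or deletion (D) in a CIGAR string."""
    lengths = (int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID")
    return max(lengths, default=0)


def reads_exceeding(sam_lines, threshold):
    """Yield (read_name, longest_indel) for reads whose longest indel exceeds threshold.

    Assumption: the filter drops a read when its longest indel is above the
    threshold. This is an illustration, not ClusterV's actual implementation.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, cigar = fields[0], fields[5]  # CIGAR is the 6th SAM column
        longest = max_indel_length(cigar)
        if longest > threshold:
            yield name, longest


# Made-up SAM records for illustration only.
example = [
    "read_a\t0\tref\t100\t60\t150M\t*\t0\t0\t*\t*",
    "read_b\t0\tref\t100\t60\t10S40M2I30M75D20M\t*\t0\t0\t*\t*",
]
for name, longest in reads_exceeding(example, threshold=50):
    print(name, "longest indel:", longest)
```

If many of the reads that show the variant in IGV are flagged this way, comparing their presence in the original BAM and in the `_f.bam` file would confirm whether the filter is responsible.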

If adjusting the filtering does not solve the problem, could you please share your files with me so I can test further on my side? You can send them to my email, jhsu@connect.hku.hk, if needed.

Regards,
JH
