Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a issue in Ivar - mpileup #14

Closed
Alema91 opened this issue Jan 17, 2022 · 4 comments
Closed

Create a issue in Ivar - mpileup #14

Alema91 opened this issue Jan 17, 2022 · 4 comments
Assignees

Comments

@Alema91
Copy link

Alema91 commented Jan 17, 2022

No description provided.

@Alema91 Alema91 self-assigned this Jan 17, 2022
@Alema91
Copy link
Author

Alema91 commented Jan 20, 2022

In progress. Tomorrow, I would like to move forward in git and advance in this issue.

@Alema91
Copy link
Author

Alema91 commented Feb 7, 2022

Describe the bug

We have observed some strange variants detection in ivar variants for SARS-CoV-2 genomes analysis. Some of them are found in paired-end overlapping positions, giving rise to the question on how does ivar deal with overlapping variants.

Here is a variant example called with VarScan from a private genome:

CHROM POS REF ALT GENE EFFECT DP REF_DP ALT_DP AF sample software lineage
NC_045512.2 25793 G A ORF3a missense_variant 11 0 11 1,000 217066 VarScan B.1.1.7

In this example, this variant was not called by ivar variants and therefore it could not be found in the corresponding vcf file. Moreover, we continued to look into the reasons why ivar variants was not saving this variant. Thus, we decided to check the mpileup and we found the following:

mpileupissue_img1
Image 1. IGV variant rendering.

issuempileup_img2
Image 2. Mpileup file.

As a result, we detected that, the variant is not passing the quality control of each pb (image 2). In addition to true low quality bases, mpileup and ivar variants are considering overlapping segments as low quality bases too. This is explained by #1146 Valeriuo comment.

To sum up, ivar variants is calling variants differently when -x param is used or not.

To reproduce

# Disable overlap
samtools mpileup \\
        -a \\
        --count-orphans \\
        --no-BAQ \\
        --ignore-overlaps \\
        -x \\
        --max-depth  \\
        --fasta-ref fasta \\
        --min-BQ $params.min_base_qual \\
        bam | ivar variants -q 0.25 -t 0.75 -m 20 -r fasta features -p sample

# Enable overlap
samtools mpileup \\
        -a \\
        --count-orphans \\
        --no-BAQ \\
        --ignore-overlaps \\
        --max-depth  \\
        --fasta-ref fasta \\
        --min-BQ $params.min_base_qual \\
        bam | ivar variants -q 0.25 -t 0.75 -m 20 -r fasta features -p sample

Expected behavior

We might suggest that ivar variants and mpileup overestimate and infraestimate the quality of overlapping reads. This issue may be related to #97, #100, #104, #113.

@Alema91
Copy link
Author

Alema91 commented Apr 4, 2022

Issue done #125

@Alema91 Alema91 closed this as completed Apr 4, 2022
@svarona
Copy link
Member

svarona commented Apr 5, 2022

Describe the bug

We have observed that when using pair-end reads, if R1 and R2 from same pair overlap in a position, quality assignment in that position for R2 is decreased to minimum quality value in phred33 (!), while R1 is increased to maximum quality in phred64 i, as you can see in the image below:

issuempileup_img2
Image 1. Mpileup file.

This behaviour is explained by #1146 Valeriuo comment.

This creates problems in downstream analyses because we are mixing phred33 and phred64 qualities. In our case we are seeing failures when calling for variants with ivar variants for SARS-CoV-2 genomes analysis, because ivar is not considering phred64 nucleotides, decreasing position's depth.

To reproduce

# Enable read-pair overlap detection
samtools mpileup \\
        -a \\
        --count-orphans \\
        --no-BAQ \\
        --max-depth  \\
        --fasta-ref fasta \\
        --min-BQ $params.min_base_qual \\
        bam | ivar variants -q 0.25 -t 0.75 -m 20 -r fasta features -p sample

Temporal workaround

# Disable read-pair overlap detection
samtools mpileup \\
        -a \\
        --count-orphans \\
        --no-BAQ \\
        --ignore-overlaps \\
        --max-depth  \\
        --fasta-ref fasta \\
        --min-BQ $params.min_base_qual \\
        bam | ivar variants -q 0.25 -t 0.75 -m 20 -r fasta features -p sample

Expected behavior

We suggest that mpileup should assign the highest or lowest quality in the same phred encoding.

This issue may be related to #97, #100, #104, #113.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants