Pangolin lineage calling despite "N" at defining SNP position #492

MantaRay87 · 2022-10-17T13:01:43Z

Not even sure if pangolin itself is the right repository to mention this issue/question or if it has been explained before.

While using some of our Illumina sequence data to compare it with our ONT sequences I noticed the following:

With pangolin-data v1.14 the consensus sequences were called as BA.5.2, but with the new pangolin-data v1.15.1 they are called as BA.5.2.28. The lineage-defining SNP of BA.5.2.28 is ORF1a:T1437I (from pango-designation issue #1133). However, the consensus sequences in question are missing an amplicon in that stretch of the genome, so they contain "N" at that position. How and why does pangolin assign this lineage anyway? How can I be sure it is correct?

We have the same issue with the ONT data, where lineage BA.5.2.21 is called after the update, but here too the sequences have "N" at this position.
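
For anyone who wants to confirm the masking independently of pangolin, here is a minimal sketch (not part of pangolin; the helper names are made up for illustration). It assumes the consensus is in Wuhan-Hu-1 (MN908947.3) coordinates with no insertions or deletions upstream of the site, which is typical for reference-based consensus pipelines but worth checking. ORF1a starts at nucleotide 266, so codon 1437 covers positions 4574 to 4576.

```python
ORF1A_START = 266  # 1-based start of ORF1a in the Wuhan-Hu-1 reference (MN908947.3)


def codon_coords(aa_pos, orf_start=ORF1A_START):
    """Return the 1-based (first, last) nucleotide positions of a codon."""
    first = orf_start + 3 * (aa_pos - 1)
    return first, first + 2


def read_fasta(text):
    """Parse FASTA text into {record_id: uppercase_sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {rec_id: "".join(parts) for rec_id, parts in records.items()}


def is_ambiguous_at(seq, aa_pos):
    """True if the codon is missing or contains anything other than A/C/G/T.

    ASSUMPTION: seq is in reference coordinates (no upstream indels).
    """
    first, last = codon_coords(aa_pos)
    codon = seq[first - 1:last]
    return len(codon) < 3 or any(base not in "ACGT" for base in codon)
```

Usage would look something like `is_ambiguous_at(read_fasta(open("consensus.fasta").read())["sample1"], 1437)`, where the file name and record ID are placeholders.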

AngieHinrichs · 2022-10-17T20:14:29Z

Thanks for reporting this @MantaRay87. I think I might know the cause but it would help to have some more info:

Can you include a few complete lines of output? Assuming you're using the default analysis mode (UShER), the comment column indicates whether there were multiple equally optimally parsimonious placements (EPPs) in the lineage tree.
Can you share an example sequence or a few IDs if the sequences are in public repositories?

When a sequence has an 'N' or other IUPAC ambiguous base, UShER will impute the value based on matches at other positions. When there are multiple EPPs, AFAIK UShER picks the node with the most descendants, which would make it pick BA.5.2 when it matches both BA.5.2 and BA.5.2.28 because of the N at 4575 -- but when adding UShER mode to pangolin, I added a bit of logic to override UShER's favorite when there was a plurality of matches in one lineage. Unfortunately I have seen some cases in which one lineage gets a plurality of matches only because of Ns matching multiple branches within the lineage -- so that "voting" might be harming more than helping, given the amount of amplicon dropout that is common now. Without looking at complete output for your sequences (or ideally, running pangolin on the sequences myself), I can't be sure that's causing the odd assignments, but that's my best guess.
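
To make the effect concrete, here is a rough sketch of the kind of plurality override described above. It is an illustration only, not pangolin's actual code: the function name, the inputs, and the tie handling are assumptions, and the placement counts in the usage note are made up.

```python
from collections import Counter


def assign_with_voting(usher_pick, placement_lineages):
    """Hypothetical plurality override of UShER's preferred placement.

    usher_pick: lineage of the placement UShER itself prefers.
    placement_lineages: one lineage label per equally parsimonious placement.
    """
    counts = Counter(placement_lineages)
    if not counts:
        return usher_pick
    top_votes = max(counts.values())
    leaders = [lineage for lineage, votes in counts.items() if votes == top_votes]
    # Override only when exactly one lineage holds the largest share of placements
    # (ASSUMPTION: ties fall back to UShER's own pick).
    if len(leaders) == 1:
        return leaders[0]
    return usher_pick
```

With an N at the defining site, a sequence can match several branches inside the sublineage equally well, so a call such as `assign_with_voting("BA.5.2", ["BA.5.2", "BA.5.2.28", "BA.5.2.28", "BA.5.2.28"])` returns the sublineage even though no read supports its defining mutation.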

MantaRay87 · 2022-10-18T06:41:36Z

Hey @AngieHinrichs thanks for the fast reply. I sure can:

Pango_output.csv

The corresponding GISAID IDs can be found here:
GISAID_IDs.csv

The ONT sequences in which all 4 samples are called as BA.5.2.21 are not public. If you also need output and sequences here let me know :)

AngieHinrichs · 2022-10-18T16:54:48Z

Perfect, thanks. Yes, it is what I guessed: UShER picks BA.5.2 for all four of those sequences, but since there are more matches within the BA.5.2.28 branch due to the Ns, the "voting" picks BA.5.2.28. I probably should remove the "voting" from pangolin. I will try to find time soon to see how many sequences' assignments will change, and evaluate whether it looks like an improvement overall.

In the meantime, the conflict and comment columns of the lineage_report.csv output might be useful for flagging these. conflict is the proportion of placements in a lineage other than the lineage that won the "voting" and was assigned. For the four example sequences, it is either 0.25 (when comment column has Usher placements: BA.5.2(1/4) BA.5.2.28(3/4)) or 0.4 (when comment has Usher placements: BA.5.2(2/5) BA.5.2.28(3/5)).
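
As a rough sketch of how those two columns relate, the placement counts can be parsed out of the comment text and the conflict value recomputed from them. The helper names are hypothetical, the regular expression assumes the `LINEAGE(count/total)` format quoted above, and the header name of the comment column is left as a parameter because it should be checked against your own lineage_report.csv.

```python
import re

# Matches tokens such as "BA.5.2.28(3/4)" in the comment text.
PLACEMENT_RE = re.compile(r"(\S+?)\((\d+)/(\d+)\)")


def parse_placements(comment):
    """Return {lineage: (count, total)} parsed from a placements comment."""
    return {
        match.group(1): (int(match.group(2)), int(match.group(3)))
        for match in PLACEMENT_RE.finditer(comment)
    }


def conflict_for(assigned, comment):
    """Proportion of placements outside the assigned lineage, or None if absent."""
    placements = parse_placements(comment)
    if assigned not in placements:
        return None
    count, total = placements[assigned]
    return (total - count) / total


def flag_rows(rows, comment_key, lineage_key="lineage"):
    """Yield rows (dicts, e.g. from csv.DictReader) with more than one placed lineage.

    ASSUMPTION: comment_key and lineage_key match the report's header names.
    """
    for row in rows:
        if len(parse_placements(row.get(comment_key, ""))) > 1:
            yield row
```

For the two comment strings quoted above, `conflict_for("BA.5.2.28", ...)` gives 0.25 and 0.4 respectively, matching the reported values.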

AngieHinrichs mentioned this issue Jan 16, 2023

Pangolin v4.2 stuck on "Using UShER as inference engine." #500

Closed

AngieHinrichs added a commit to AngieHinrichs/pangolin that referenced this issue May 11, 2023

Remove max_count/max_lineage 'voting' logic from usher_parsing -- refs …

974e5e1

…cov-lineages#492.

AngieHinrichs mentioned this issue May 11, 2023

Remove max_count/max_lineage 'voting' logic from usher_parsing #521

Merged

AngieHinrichs added a commit that referenced this issue May 15, 2023

Remove max_count/max_lineage 'voting' logic from usher_parsing -- refs …

09e78b1

…#492. (#521)
