-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential Problem assigning B.1.623 and B.1.621 #115
Comments
Hi, We are also seeing all of our B.1.623 sequences with B.1.621 mutations listed above clustering together and growing rapidly over time. These "B.1.623" are all branching off of the same ancestral node in B.1.621 and are genetically unrelated to the canonical B.1.623. I think this might be an emerging sublineage of B.1.621, suggesting B.1.621.1 for the sublineage. Jade Wang |
Hello, all. I have also experienced this issue in our sequencing results. List |
@rmcolq In the mean time. This is the phylogenetic tree we put together. The leaf names are a substring of GISAID virus names if the genomes make it onto GISAID. The "Assigned_Pangolin_Lineage" indicates the lineage to which the genome was assigned (B.1.621 or B.1.623). As you can see a subset genomes assigned to B.1.623 (blue) are a sublineage of B.1.621 (yellow with red branch). This sublineage are genomically similar and they all have S:E484K, S:N501Y, as well as other mutations Dr. Endler pointed out. The other B.1.623 on the other hand do not all have these mutations and are genetically distinct from the B.1.621 clade. |
Dear Rachel @rmcolq , |
Dear Rachel @rmcolq , Attached a list of 48 GISAID IDs and virus names (tsv with first 4 columns from GISAID) of our sequences that were assigned B.1.623 and should be B.1.621. They are all from a cluster in Vorarlberg so not much diversity there. All the best, |
Hi, According to cov-glue, they have: I will double checking our depth of coverage and attempt to resubmit. Is there another way for us to send you the fasta files in the mean time? Thank you, |
@chrisruis I took an initial look at this, but it really isn't a simple case and I don't have the experience to resolve it. It looks like when the ids provided are uploaded to https://hgwdev-angie.gi.ucsc.edu/cgi-bin/hgPhyloPlace they mostly cluster together on the big tree, but the nearby lineages are not nicely coloured by pangolin lineage - probably haven't sampled enough of the diversity in the region in pango-designations to date. There are several subtrees, the clearest of which is this: |
@rmcolq it looks the default subtree size of 50 was used -- it's really meant for uploading only a handful of sequences and seeing the names of sequences that they're most closely related to. For bigger tree questions like this, I would use a much larger subtree size like 1000 (the max is currently 5000) -- it gives a lot more context, and you can then click on branches of interest to zoom in and see the leaf labels. Regarding the appearance of B.1 and B.1.429 calls in the branch that's mostly B.1.621 -- I usually interpret that as either pangoLEARN being affected by noise / new mutations that have appeared in other lineages, or perhaps lots of Ns (or contamination, new mutations etc) leading to multiple equally parsimonious placements for that sequence. UShER picked the placement that you see, but that might not be the only plausible placement for that sequence. Unfortunately there's no way to tell about Ns aside from examining the original sequence at this point. (If I had tons of time, I would find a way to get more of that info into the metadata...) One more thing: https://hgwdev-angie.gi.ucsc.edu/cgi-bin/hgPhyloPlace is unstable -- it's my sandbox and may be completely broken when I'm working on something new. https://genome.ucsc.edu/cgi-bin/hgPhyloPlace is the official location, although its tree & metadata may lag by a day (or sometimes a few days) and its code lags by ~one week to one month relative to the more-stable-than-hgwdev-angie test server hgwdev.gi.ucsc.edu. :) |
I pasted in both Héctor's and Lukas's sample IDs with a subtree size of 1000 so you can see more B.1.621 sequences (purple) around the cluster that is very consistently being misidentified as B.1.623 (shades of blue): https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-115.json?c=pango_lineage |
That looks much clearer! Thanks @AngieHinrichs. It appears that there are 2 extra ORF1a mutations (N2147D and T2952I) on the branch leading to the cluster, but otherwise it clearly fits well within the B.1.621 lineage. I don't see evidence of these new samples being corresponding to an introduction to a new place. Are you all happy with a lineage definition of B.1.621 for your samples (in which case I will pick some to add to the designations list)? |
I've added as many of these as I could to a newly designated lineage B.1.621.1, released in pango-designation v1.2.29. This will filter into the training over the next week. |
Copied over from cov-lineages/pangolin#253
Dear Pangolin team,
When assigning lineages to patient samples with pangolin I stumbled over a cluster that looks like B.1.621 but was assigned B.1.623.
The mutations we find definitely fit the B.1.621 in GISAID and not B.1.623, so either the lineage definition was changed or something else went wrong. I uploaded the sequences to USHER (https://hgwdev-angie.gi.ucsc.edu/cgi-bin/hgPhyloPlace) and got a strange result there too. Usher classified the sequence as B.1.621, but the closest sequences where B.1.623 (I guess classified by pangolin in GISAID).
An example of the AA changes our sequences have :
ORF1ab:T1055A; ORF1ab:T1538I; ORF1ab:N2147D; ORF1ab:T2952I; ORF1ab:T3255I; ORF1ab:Q3729R; ORF1ab:P4715L; ORF1ab:P5743S; ORF1ab:P5815S; ORF1ab:A6532V; ORF1ab:S6713L; S:T95I; S:Y144S; S:R346K; S:E484K; S:N501Y; S:D614G; S:P681H; S:D950N; ORF3a:Q57H; ORF3a:V256fs; ORF3a:V259L; ORF8:T11K; ORF8:P38S; ORF8:S67F; N:T205I
Of these the following fit B.1.621 in https://outbreak.info/situation-reports?pango=B.1.621
ORF1ab:T1055A; ORF1ab:T1538I; ORF1ab:T3255I; ORF1ab:Q3729R; ORF1ab:P4715L; ORF1ab:P5743S; S:T95I; S:Y144S; S:R346K; S:E484K; S:N501Y; S:D614G; S:P681H; S:D950N; ORF3a:Q57H; ORF8:T11K; ORF8:P38S; ORF8:S67F; N:T205I
and the following B.1.623 in https://outbreak.info/situation-reports?pango=B.1.623
ORF1ab:P4715L; S:N501Y; S:D614G; S:P681H; ORF3a:Q57H; with 9 AA changes missing
Attached please find a fasta with 10 of the sequences.
I use:
pangolin==3.0.3
lineages-2020-05-19-2
pangoLEARN-2021-06-15
scorpio-0.3
constellations-0.0.5
pango-designation v1.2.13
All the best and thanks again fro the great tools you provide,
Sorry, just checked on GISAID (17/06/2021, 1:24:19 PM), and one of the closest sequences according to USHER, EPI_ISL_2220456, is also classified as B.1.623 (version 3.1.1, lineages version 2021-06-15), although the mutations eg:
Spike D614G, Spike D950N, Spike E484K, Spike ins145N, Spike N501Y, Spike P681H, Spike R346K, Spike T95I, Spike Y144T, Spike Y145S
would rather fit B.1.621 #62
than B.1.623 #25
Lukas
b.1.623_run_994_10.fasta.gz
Dr. Lukas Endler
Senior Postdoc
CeMM
Research Center for Molecular Medicine
of the Austrian Academy of Sciences
Lazarettgasse 14, AKH BT 25.3
1090 Vienna, Austria
Phone +43-1/40160-70055
lendler@cemm.oeaw.ac.at
www.cemm.at
The text was updated successfully, but these errors were encountered: