Sub-lineages and new designations within P.1 #231

Closed
andersonbrito opened this issue Sep 28, 2021 · 5 comments

@andersonbrito

by Anderson Brito

Description: Sub-lineages and new designations within P.1, and some issues related to existing sub-lineages.

Sub-lineage of: P.1

Proposed lineage name: P.1.13
Earliest sequence: 2021-03-30
Most recent sequence: 2021-06-28
Countries circulating: USA

Proposed lineage name: P.1.14
Earliest sequence: 2021-02-15
Most recent sequence: 2021-09-15
Countries circulating: Brazil, Canada, Argentina, USA, Japan

Proposed lineage name: P.1.15
Earliest sequence: 2021-05-10
Most recent sequence: 2021-09-06
Countries circulating: Chile, Argentina, Brazil, Colombia

Proposed lineage name: P.1.16
Earliest sequence: 2021-03-03
Most recent sequence: 2021-08-03
Countries circulating: Belgium, Netherlands

Proposed lineage name: P.1.17
Earliest sequence: 2021-03-24
Most recent sequence: 2021-07-30
Countries circulating: Mexico, USA, Germany

Proposed lineage name: P.1.18
Earliest sequence: 2021-03-16
Most recent sequence: 2021-09-06
Countries circulating: Brazil, Paraguay, Chile, Spain

Genomes: Click here to download a TSV file listing all lineages and genomes. If you drag-and-drop that file onto this nextstrain build, you will see a new Colour by option named 'new_lineages', which will highlight all the new lineages described above (with a suffix '_new').
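For readers who want to build a similar overlay file, here is a minimal sketch of writing a two-column TSV with a `new_lineages` column and the `_new` suffix described above. The function name `build_metadata_tsv` is hypothetical, and the `strain` header is an assumption based on how Auspice's drag-and-drop metadata is generally understood to work (first column named `strain` or `name`, matching the tip names in the tree); it is not taken from the file linked in this issue.

```python
import csv
import io


def build_metadata_tsv(assignments, column="new_lineages", suffix="_new"):
    """Return TSV text mapping each strain name to a suffixed lineage label.

    `assignments` maps strain name -> proposed lineage (e.g. "P.1.15").
    The "strain" header is an assumption about what the viewer expects.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["strain", column])
    # Sort for a stable, diff-friendly file.
    for strain, lineage in sorted(assignments.items()):
        writer.writerow([strain, lineage + suffix])
    return buf.getvalue()


if __name__ == "__main__":
    # One sequence named later in this thread as part of the P.1.15 proposal.
    print(build_metadata_tsv({"Brazil/MG-HLAGYN-1853476/2021": "P.1.15"}))
```

Strain names in the TSV have to match the tip names in the tree exactly for the colouring to apply.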

Suggestion for updating existing lineages: In that TSV file you will also see all the P.1 sublineages, from P.1.1 to P.1.11. Note in the screenshots below that those lineages are currently found in many parts of the 'Gamma' clade.

Possible need for withdrawal: P.1.5 is shown as a nested clade within the P.1.4 clade. Maybe P.1.5 could be withdrawn, being merged within P.1.4, or renamed P.1.4.1 (see here).

Evidence
[Four screenshots dated 2021-09-27; images not preserved in this text.]

@AngieHinrichs
Member

FWIW, from placing the first 3 _new lineage sequences on the UCSC/UShER tree...

P.1.13_new: These cluster well on a branch with 1071 sequences almost entirely from the USA (P.1 + C1912T + G10610T): https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-231.13.json?c=pango_lineage_usher&label=nuc%20mutations:G10610T

P.1.14_new: These do not cluster well -- they are spread across the P.1 branch of the tree that has a total of 87633 sequences in the 2021-09-27 tree. With a subtree size of 5000, the sequences still fall into more than a dozen different subtrees.

P.1.15_new: Except for Brazil/MG-HLAGYN-1853476/2021 which is off by itself in a separate subtree of 5000, and Colombia/SAN-INS-VG-3947/2021 and Argentina/INEI105451/2021 which are not included in the tree because they have too many equally parsimonious placements (lots of Ns), the remaining 13 sequences are on a branch of 3103 sequences (P.1 + G11291A + A2596G + C18129T) that are mostly Chile followed by Argentina and a smattering of other countries: https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-231.15.json?label=nuc%20mutations:C18129T

@AngieHinrichs
Member

P.1.16_new: These cluster well on a branch with 2618 sequences mostly from Belgium & Netherlands (P.1 + G11296T + A25746G): https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-231.16.json?label=nuc%20mutations:A25746G

Also, there's a pretty clear introduction & spread in Netherlands up at the top when you enable coloring by country: https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-231.16.json?c=country&label=nuc%20mutations:A25746G

@andersonbrito
Author

Thanks for providing this comprehensive view of those clades, @AngieHinrichs.

Concerning the existing lineages, is it possible to recalibrate the designations to fix the misassigned genomes from P.1.1 to P.1.11 (Figures above)?

@AngieHinrichs
Member

P.1.17_new: These 26 samples are spread across a branch with 7188 sequences, so the branch doesn't fit in the max subtree size of 5000, but this subtree has 16 of the 26 and gives a pretty good idea: https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-231.17.json?c=country
The top 10 countries are

   3822 USA
   1751 Mexico
    743 Luxembourg
    389 Germany
    115 Spain
     71 France
     56 England
     46 Canada
     44 Belgium
     22 Argentina
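A tally like the one above can be approximated from sequence names alone. The sketch below assumes names follow the `Country/ID/Year` pattern seen in this thread (optionally with an `hCoV-19/` prefix); the function name `top_countries` and the example names are hypothetical, and this is not necessarily how the counts above were produced.

```python
from collections import Counter


def top_countries(names, n=10, prefix="hCoV-19/"):
    """Count sequences per country, taking the first '/'-separated field.

    Assumes names look like 'Country/ID/Year'; names without a '/'
    are skipped because no country can be read from them.
    """
    counts = Counter()
    for name in names:
        if name.startswith(prefix):
            name = name[len(prefix):]
        if "/" not in name:
            continue
        counts[name.split("/", 1)[0]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sequence names, for illustration only.
    example = [
        "USA/EXAMPLE-1/2021",
        "USA/EXAMPLE-2/2021",
        "hCoV-19/Mexico/EXAMPLE-3/2021",
    ]
    for country, count in top_countries(example):
        print(f"{count:7d} {country}")
```

Reading the country from the name is only a convenience; curated metadata is more reliable where it is available.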

chrisruis pushed a commit that referenced this issue Sep 30, 2021
chrisruis pushed a commit that referenced this issue Oct 4, 2021
@chrisruis
Collaborator

Thanks @andersonbrito. We've added lineages P.1.13-P.1.17 and have taken this opportunity to update the designations throughout the P.1 clade.

We've started P.1.13 to coincide with the introduction(s) into the USA so have started this lineage on a branch with G10610T (Orf1ab:V3449F).

We've started P.1.14 to coincide with the introduction(s) into Canada so have started this lineage on a branch with G9105A (Orf1ab:S2947N).

We've started P.1.15 to coincide with the introduction(s) into Chile so have started this lineage on a branch with C18129T (synonymous).

We've started P.1.16 to coincide with the introduction(s) into Belgium/other European countries so have started this lineage on a branch with A25746G (Orf3a:I118M).

We've started P.1.17 to coincide with the introduction(s) into the USA/Mexico so have started this lineage on a branch with C346T (synonymous). Within P.1.17, there is a subclade in Luxembourg that we've designated P.1.17.1 to start on a branch with C21707T (S:H49Y).

It looks like the proposed P.1.18 sequences are quite broadly spread through P.1 and we couldn't see a clear lineage in there so we haven't designated this one.

We've also updated each of the previously designated P.1 sublineages by designating all sequences to a sublineage if they are in the corresponding clade in the latest UShER tree and have <5% ambiguity.

We've also updated the designations for P.1 to include all sequences in the P.1 clade in the latest UShER tree that have <5% ambiguity and are not in one of the P.1 sublineages.
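The two rules above (clade membership plus a <5% ambiguity cut-off, with P.1 itself taking whatever is not already in a sublineage) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used for the release: ambiguity is assumed here to mean the fraction of characters other than A, C, G or T, and the names `ambiguity_fraction` and `designate` are hypothetical.

```python
UNAMBIGUOUS = frozenset("ACGT")


def ambiguity_fraction(seq):
    """Fraction of characters that are not A, C, G or T.

    Assumption: N, other IUPAC codes and gaps all count as ambiguous.
    An empty sequence is treated as fully ambiguous.
    """
    if not seq:
        return 1.0
    ambiguous = sum(1 for base in seq.upper() if base not in UNAMBIGUOUS)
    return ambiguous / len(seq)


def designate(clade_members, sequences, sublineage_members=frozenset(),
              max_ambiguity=0.05):
    """Return sorted names that are in the clade, are not already in a
    sublineage, and have an ambiguity fraction strictly below the cut-off.

    `clade_members` would come from the tree; `sequences` maps name -> sequence.
    """
    return sorted(
        name
        for name in clade_members
        if name not in sublineage_members
        and name in sequences
        and ambiguity_fraction(sequences[name]) < max_ambiguity
    )
```

For a sublineage, `sublineage_members` is left empty; for the parent P.1, it would hold every name already designated to a P.1 sublineage.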

It looks like P.1.4 and P.1.5 are separate clades in the designations but are mixed in the assignments. They are both defined by the same mutation, T23599G (S:N679K), which is potentially why they are being misassigned. Hopefully the updated designations will solve this.
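To make the P.1.4/P.1.5 point concrete, the sketch below indexes lineages by the single nucleotide mutation named for each in this thread and reports any mutation shared by more than one lineage. It is a deliberate simplification for illustration: real assignment does not work from a one-mutation lookup, and the helper names are hypothetical.

```python
from collections import defaultdict

# Nucleotide mutations as stated in this thread for each lineage's branch.
DEFINING_MUTATION = {
    "P.1.4": "T23599G",
    "P.1.5": "T23599G",
    "P.1.13": "G10610T",
    "P.1.14": "G9105A",
    "P.1.15": "C18129T",
    "P.1.16": "A25746G",
    "P.1.17": "C346T",
    "P.1.17.1": "C21707T",
}


def lineages_by_mutation(defining):
    """Invert a lineage -> mutation mapping into mutation -> [lineages]."""
    index = defaultdict(list)
    for lineage, mutation in defining.items():
        index[mutation].append(lineage)
    return dict(index)


def shared_mutations(defining):
    """Return only the mutations that point at more than one lineage."""
    return {
        mutation: sorted(lineages)
        for mutation, lineages in lineages_by_mutation(defining).items()
        if len(lineages) > 1
    }


if __name__ == "__main__":
    print(shared_mutations(DEFINING_MUTATION))
```

With the values above, only T23599G maps to two lineages, which is consistent with the mixing between P.1.4 and P.1.5 described here.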

The number of designated sequences within each of these lineages is: P.1 - 30788, P.1.1 - 2647, P.1.2 - 600, P.1.3 - 29, P.1.4 - 326, P.1.5 - 12, P.1.6 - 211, P.1.7 - 1529, P.1.8 - 173, P.1.9 - 159, P.1.10 - 2498, P.1.10.1 - 83, P.1.10.2 - 24, P.1.11 - 35, P.1.12 - 559, P.1.13 - 930, P.1.14 - 15461, P.1.15 - 2755, P.1.16 - 2196, P.1.17 - 2997, P.1.17.1 - 890.

These changes are in v1.2.85
