Proposal for lineage within B.1.324 with N501Y, P681H and others #25

Closed
rambaut opened this issue Mar 3, 2021 · 7 comments

@rambaut
Contributor

rambaut commented Mar 3, 2021

Distinct lineage within B.1.325 with some VOC-like mutations:

image

List of genomes:
B.1.X_2020-03-03.taxa.txt

ORF1ab: T2952I
        T3202M
        H3580Q
        P4197S
        S4398L
G14505A

spike:  S494P
        N501Y
        P681H
        E1111K
        
ORF8: 27925-27959 deletion (35 nucleotides) from residue 10

A28218T

N:        M234I
@rambaut added the "proposed" (Proposal for a new lineage) and "urgent" (Proposal flagged for urgent review; provide explanation) labels Mar 3, 2021
@rambaut
Contributor Author

rambaut commented Mar 3, 2021

Presumed lineage designation B.1.324.1

@rambaut added this to the B.1.324.1 milestone Mar 3, 2021
@AngieHinrichs
Member

> Distinct lineage within B.1.325 with some VOC-like mutations:

> Presumed lineage designation B.1.324.1

I was curious about which one was the typo (although the issue title is a good tiebreaker) so I ran pangolin on the sequences in B.1.X_2020-03-03.taxa.txt and got this tally of assignments:

      6 B.1
     21 B.1.165
      7 B.1.324
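A tally like the one above can be reproduced from pangolin's CSV report. The following is a minimal sketch, assuming the report has a `lineage` column; the rows and taxon names below are placeholders for illustration, not the actual sequences from the taxa file.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a pangolin report; taxon names are placeholders.
REPORT = """taxon,lineage
sample_1,B.1
sample_2,B.1.165
sample_3,B.1.165
sample_4,B.1.324
"""

def tally_lineages(report_file):
    """Count how many sequences were assigned to each lineage."""
    reader = csv.DictReader(report_file)
    return Counter(row["lineage"] for row in reader)

counts = tally_lineages(io.StringIO(REPORT))
for lineage, n in sorted(counts.items()):
    # Same layout as `sort | uniq -c`: count first, then the lineage name.
    print(f"{n:7d} {lineage}")
```

For a real report, pass an open file handle instead of the `io.StringIO` wrapper.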

So I was curious about B.1.165 and B.1.324. I extracted the sequences with those lineages in pangoLEARN/pangoLEARN/data/lineages.metadata.csv, combined them with the sequences in B.1.X_2020-03-03.taxa.txt, and uploaded them to my development server for the UShER web interface, which has a more recent GISAID tree and a more recent version of UShER etc. than the main site. Here's a Nextstrain view of how those sequences clustered on our tree (uploaded seqs in red, plus 1000 randomly selected sequences from across the tree for context):

https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/B.1.165_324_X_2020-03-03.json

The red cluster with the longest branch is the proposed new B.1.324.1, and in our tree, the sequence of nucleotide mutations leading to that cluster is as follows:

C241T > C14408T > A23403G > C3037T > G25563T > C1059T > G23012A > T23042C > C11005A, G14559A, A23063T, C23604A > A6851C, T8704C, C9120T, C9870T, C12854T, C13458T, G24893A, A28272T, G28975A

The next red cluster down is B.1.165 with this sequence of mutations:

C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C20233T

And finally B.1.324 with this:

C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C7528T

I'm guessing it's C7528T that is suggesting a link between the new lineage and B.1.324, since many of the new lineage sequences have C7528T -- but not all of them do. Here is the Nextstrain view, filtered to highlight samples with genotype 7528T:

https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/B.1.165_324_X_2020-03-03.json?gt=nuc.7528T&label=nuc%20mutations:C1059T

If you trust UShER's parsimony-based inferences, it looks like C7528T happened independently in B.1.324 and the new lineage, and a lot more recently in the new lineage than in B.1.324.

I was not able to identify any mutations in common (after C241T > C14408T > A23403G > C3037T > G25563T > C1059T) between the B.1.165 samples and the new lineage samples. So, never mind about B.1.165.
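The comparison of mutation paths can be sketched in a few lines. This is an illustrative helper, not part of UShER or matUtils: it parses the three paths quoted above (steps separated by `>`, mutations on the same branch separated by commas), finds the leading steps two paths share, and checks whether anything after that point overlaps. The function names are made up for this example.

```python
NEW_LINEAGE = (
    "C241T > C14408T > A23403G > C3037T > G25563T > C1059T > G23012A > T23042C > "
    "C11005A, G14559A, A23063T, C23604A > "
    "A6851C, T8704C, C9120T, C9870T, C12854T, C13458T, G24893A, A28272T, G28975A"
)
B_1_165 = "C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C20233T"
B_1_324 = "C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C7528T"

def parse_path(path):
    """Split 'A > B, C > D' into a list of steps, each a set of mutations."""
    return [frozenset(m.strip() for m in step.split(",")) for step in path.split(">")]

def shared_prefix(a, b):
    """Return the leading steps that two parsed paths have in common."""
    prefix = []
    for step_a, step_b in zip(a, b):
        if step_a != step_b:
            break
        prefix.append(step_a)
    return prefix

def flatten(steps):
    """Collapse a list of steps into a single set of mutations."""
    return set().union(*steps)

new = parse_path(NEW_LINEAGE)
results = {}
for name, text in [("B.1.165", B_1_165), ("B.1.324", B_1_324)]:
    other = parse_path(text)
    n = len(shared_prefix(new, other))
    # Mutations both paths acquire after they diverge from each other.
    common_after = flatten(new[n:]) & flatten(other[n:])
    results[name] = (n, common_after)
    print(name, "shared steps:", n, "common after split:", sorted(common_after))
```

Note that this only compares the branch paths leading to each cluster. Mutations that arise within a cluster, such as C7528T in part of the new lineage, do not appear on those paths.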

But I think there's some evidence that the new lineage is not descended from B.1.324. Beyond the fact that only a subset of the new lineage sequences have C7528T, there are also at least two sequences (e.g. USA/MA-JLL-D18/2020|EPI_ISL_593478|2020-04-28) that have C241T > C14408T > A23403G > C3037T > G25563T > C1059T > G23012A > T23042C in common with the long branch to the new lineage.

I can share the GISAID protobuf file privately if you're interested in running usher with it locally. Or please feel free to upload a few hundred sequences at a time to my development server and let me know if you have any problems or feature requests. It does take a few minutes to upload a few hundred sequences, align them to the reference, run UShER and extract Nextstrain JSON, but UShER+Nextstrain = pretty awesome I think. :) We plan to add Nextstrain/Auspice JSON output to the matUtils program in the UShER package soon.

@AngieHinrichs
Member

Are there any plans to designate a lineage for this? B.1.X? I see that some of the sequences were explicitly assigned to B.1 in the recent hackathon updates to lineages.csv:

Aruba/AW-RIVM-11402/2021,B.1
England/MILK-F9DB71/2021,B.1
England/MILK-F9DBDB/2021,B.1
Aruba/AW-RIVM-10521/2021,B.1
USA/NY-NYCPHL-002149/2021,B.1
USA/NY-PRL-2020_1229_00H03/2020,B.1
USA/NY-PRL-2021_0205_06H10/2021,B.1
USA/NY-PRL-2021_0125_00G09/2021,B.1
USA/NY-PRL-2021_0205_06C02/2021,B.1
USA/NY-PRL-2021_0202_00A05/2021,B.1
USA/CO-CDPHE-2100232198/2021,B.1

@AngieHinrichs
Member

UCSC recently started sequencing genomes in Santa Cruz county, California, and found 8 cases from this lineage. It has expanded quite a bit in New York and Florida recently, and has appeared in at least 20 US states and 6 countries (USA, Aruba, UK, Netherlands, France, Mexico). While investigating this, we found that unfortunately the deletion of 35 bases in ORF8 appears to be masked, as Ns or even reference sequence, in many submitted sequences (perhaps to circumvent automated rejection of the sequences by repository submission pipelines due to the frameshift deletion).
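One way to check how a given genome represents the ORF8 deletion is to look at positions 27925-27959 directly. The sketch below is a rough illustration under stated assumptions: the sequence is already aligned to the reference with no insertions (so string index = position - 1), gaps are written as `-`, and the reference length of 29903 is that of Wuhan-Hu-1. The function names and toy sequences are hypothetical. It also confirms that a 35-nucleotide deletion is not a multiple of 3, hence the frameshift.

```python
DEL_START, DEL_END = 27925, 27959  # 1-based, inclusive, as given in the proposal
GENOME_LEN = 29903                 # assumed reference length (Wuhan-Hu-1)

def deletion_length(start=DEL_START, end=DEL_END):
    """Number of nucleotides in an inclusive coordinate range."""
    return end - start + 1

def classify_deletion_region(aligned_seq, start=DEL_START, end=DEL_END):
    """Label the deletion region of a reference-aligned sequence."""
    region = aligned_seq[start - 1:end].upper()
    if len(region) < deletion_length(start, end):
        return "truncated"
    if set(region) <= {"-"}:
        return "deleted"          # deletion reported as alignment gaps
    if set(region) <= {"N", "-"}:
        return "masked"           # deletion hidden behind Ns
    return "bases_present"        # real bases, possibly filled in from the reference

# Toy genomes (all 'A') that differ only in how the region is represented.
backbone = "A" * GENOME_LEN
deleted = backbone[:DEL_START - 1] + "-" * deletion_length() + backbone[DEL_END:]
masked = backbone[:DEL_START - 1] + "N" * deletion_length() + backbone[DEL_END:]

print(deletion_length(), "nt; frameshift:", deletion_length() % 3 != 0)
for label, seq in [("deleted", deleted), ("masked", masked), ("filled", backbone)]:
    print(label, "->", classify_deletion_region(seq))
```

A sequence labelled `bases_present` cannot be distinguished from a true non-deleted genome by this check alone; that would require comparing against the reference bases and, ideally, the raw reads.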

We submitted a manuscript to bioRxiv describing the as yet undesignated lineage, which we're calling B.1.x pending lineage assignment: https://www.biorxiv.org/content/10.1101/2021.04.05.438352v1

Complicating matters a bit, Public Health England used the name "B.1.324.1" to describe the UK samples in this lineage in "SARS-CoV-2 variants of concern and variants under investigation in England / Technical briefing 7", although this lineage does not appear to have descended from B.1.324. PHE also uses the name cornstalk-handprint, which in turn lists VUI-21MAR-01 and VUI202103/01 as alternate names.

An official Pango designation of the form B.1.X (with a number in place of X) would help bring some order to the naming confusion. I don't think it should be B.1.324.1 because this lineage seems to have evolved independently of B.1.324, but we'll be glad to use whatever you assign.

Thanks,
@AngieHinrichs @russcd @bpt26

@AngieHinrichs
Member

To show how the lineage has grown recently, here is an interactive Nextstrain view of 185 sequences in the lineage that our manuscript describes (S:S494P, S:N501Y, S:P681H, N:M234I): https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-25.json?label=nuc%20mutations:C11005A,G28975A

The three sequences in red (ME-HETL-J2066, -J1882, -J1776) are from https://github.com/tewhey-lab/ncov-ME/tree/master/fasta (shared by Ryan Tewhey; they were rejected by repositories due to the deletion, requiring manual confirmation); the remaining sequences are from GISAID and/or GenBank & COG-UK. Clicking on a branch zooms in, and when zoomed in sufficiently, you can see full names and dates (many in March). Near the bottom of the page, there is a "Download Data" link with the option to save the view as a Newick or Nexus tree.
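Selecting sequences by the four amino-acid changes named above (S:S494P, S:N501Y, S:P681H, N:M234I) amounts to a subset test. The following is only a sketch of that idea, assuming each sample's mutations are already available as `gene:change` strings; the sample names and mutation lists are invented for illustration and are not the 185 sequences in the view.

```python
# The four changes used above to describe the lineage.
DEFINING = {"S:S494P", "S:N501Y", "S:P681H", "N:M234I"}

def has_defining_mutations(sample_mutations, defining=DEFINING):
    """True if the sample carries every defining mutation."""
    return defining <= set(sample_mutations)

# Hypothetical samples; names and mutation calls are placeholders.
samples = {
    "toy_sample_A": ["S:S494P", "S:N501Y", "S:P681H", "S:D614G", "N:M234I"],
    "toy_sample_B": ["S:N501Y", "S:D614G"],
}

matches = [name for name, muts in samples.items() if has_defining_mutations(muts)]
print(matches)
```

A strict subset test like this will miss genomes with low coverage or masked sites at any of the four positions, so a real screen might tolerate missing calls rather than require all four.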

@aineniamh removed their assignment Apr 28, 2021
@amniewiadomska

Hello. Just wondering if this was resolved. I've noticed over 170 genomes in April assigned as B.1 with the following mutations.
ORF1ab:
NSP2_T85I, NSP4_T439M, NSP6_H11Q, NSP4_T189I, NSP9_P57S, NSP12_P323L

Spike:
S494P, N501Y, P681H, D614G, K854N*, E1111K

* most but not all have this one.

Other: NS3_Q57H, N_M234I

Anna.

@chrisruis
Collaborator

@rambaut @AngieHinrichs @amniewiadomska thanks for all your comments on this. We've now designated this as lineage B.1.623 in 1.2.3
