Proposal for lineage within B.1.324 with N501Y, P681H and others #25
Comments
Presumed lineage designation B.1.324.1 |
I was curious about which one was the typo (although the issue title is a good tiebreaker) so I ran pangolin on the sequences in B.1.X_2020-03-03.taxa.txt and got this tally of assignments:
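A tally like this can be reproduced from pangolin's CSV report. The sketch below assumes the report has a `lineage` column, as in pangolin's `lineage_report.csv`; the function name `tally_lineages` and the sample rows are hypothetical and are not the actual assignments for this issue.

```python
import csv
import io
from collections import Counter


def tally_lineages(report_csv_text):
    """Count how many sequences were assigned to each lineage.

    Assumes a CSV with a header row that includes a 'lineage' column.
    """
    reader = csv.DictReader(io.StringIO(report_csv_text))
    return Counter(row["lineage"] for row in reader)


# Illustrative rows only (hypothetical, not the real data from this issue).
example_report = """taxon,lineage
seq1,B.1
seq2,B.1.324
seq3,B.1
"""

for lineage, count in tally_lineages(example_report).most_common():
    print(lineage, count)
```

For a real run, read the report file's text and pass it to the function instead of `example_report`.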
So I was curious about B.1.165 and B.1.324. I extracted the sequences with those lineages in pangoLEARN/pangoLEARN/data/lineages.metadata.csv, combined those with the sequences in B.1.X_2020-03-03.taxa.txt, and uploaded them to my development server for the UShER web interface, which has a more recent GISAID tree and more recent version of UShER etc. than the main site. Here's a Nextstrain view of how those sequences clustered on our tree (uploaded seqs in red plus 1000 randomly selected sequences from across the tree for context): https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/B.1.165_324_X_2020-03-03.json

The red cluster with the longest branch is the proposed new B.1.324.1, and in our tree, the sequence of nucleotide mutations leading to that cluster is as follows:

C241T > C14408T > A23403G > C3037T > G25563T > C1059T > G23012A > T23042C > C11005A, G14559A, A23063T, C23604A > A6851C, T8704C, C9120T, C9870T, C12854T, C13458T, G24893A, A28272T, G28975A

The next red cluster down is B.1.165 with this sequence of mutations:

C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C20233T

And finally B.1.324 with this:

C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C7528T

I'm guessing it's C7528T that is suggesting a link between the new lineage and B.1.324, since many of the new lineage sequences have C7528T -- but not all of them do. Here is the Nextstrain view, filtered to highlight samples with genotype 7528T:

If you trust UShER's parsimony-based inferences, it looks like C7528T happened independently in B.1.324 and the new lineage, and a lot more recently in the new lineage than in B.1.324.

I was not able to identify any mutations in common (after C241T > C14408T > A23403G > C3037T > G25563T > C1059T) between the B.1.165 samples and the new lineage samples. So, never mind about B.1.165. But I think there's some evidence that the new lineage is not descended from B.1.324.
Beyond the fact that only a subset of the new lineage sequences have C7528T, there are also at least two sequences (e.g. USA/MA-JLL-D18/2020|EPI_ISL_593478|2020-04-28) that have C241T > C14408T > A23403G > C3037T > G25563T > C1059T > G23012A > T23042C in common with the long branch to the new lineage.

I can share the GISAID protobuf file privately if you're interested in running usher with it locally. Or please feel free to upload a few hundred sequences at a time to my development server and let me know if you have any problems or feature requests. It does take a few minutes to upload a few hundred sequences, align them to the reference, run UShER and extract Nextstrain JSON, but UShER+Nextstrain = pretty awesome I think. :)

We plan to add Nextstrain/Auspice JSON output to the matUtils program in the UShER package soon. |
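The path comparison in this comment can be checked mechanically. The following is a minimal sketch, not UShER's method: it only parses the three mutation paths quoted above (steps separated by `>`, mutations on the same branch separated by commas) and reports the shared backbone. The helper names `parse_path`, `common_prefix` and `all_mutations` are hypothetical.

```python
def parse_path(path):
    """Split 'A > B, C > D' into a list of steps, each a frozenset of mutations."""
    return [
        frozenset(m.strip() for m in step.split(","))
        for step in path.split(">")
    ]


def common_prefix(a, b):
    """Return the leading steps that two parsed paths share exactly."""
    prefix = []
    for step_a, step_b in zip(a, b):
        if step_a != step_b:
            break
        prefix.append(step_a)
    return prefix


def all_mutations(steps):
    """Flatten a parsed path into one set of mutations."""
    return set().union(*steps)


# Paths copied from the comment above.
BACKBONE = "C241T > C14408T > A23403G > C3037T > G25563T > C1059T"
new_lineage = parse_path(
    BACKBONE + " > G23012A > T23042C > C11005A, G14559A, A23063T, C23604A"
    " > A6851C, T8704C, C9120T, C9870T, C12854T, C13458T, G24893A, A28272T, G28975A"
)
b_1_165 = parse_path(BACKBONE + " > C20233T")
b_1_324 = parse_path(BACKBONE + " > C7528T")

for name, other in [("B.1.165", b_1_165), ("B.1.324", b_1_324)]:
    shared_steps = common_prefix(new_lineage, other)
    # Mutations both paths carry beyond the shared leading steps.
    extra = (all_mutations(new_lineage) & all_mutations(other)) - all_mutations(shared_steps)
    print(name, "shared steps:", len(shared_steps), "other shared mutations:", sorted(extra))
```

On these paths, the new lineage shares only the six-step backbone with either B.1.165 or B.1.324, and C7528T is not on the branch leading to the new lineage cluster, which matches the reading that it arose later within a subset of the new lineage.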
Are there any plans to designate a lineage for this? B.1.X? I see that some of the sequences were explicitly assigned to B.1. in the recent hackathon updates to lineages.csv:
|
UCSC recently started sequencing genomes in Santa Cruz county, California, and found 8 cases from this lineage. It has expanded quite a bit in New York and Florida recently, and has appeared in at least 20 US states and 6 countries (USA, Aruba, UK, Netherlands, France, Mexico).

While investigating this, we found that unfortunately the deletion of 35 bases in ORF8 appears to be masked, as Ns or even reference sequence, in many submitted sequences (perhaps to circumvent automated rejection of the sequences by repository submission pipelines due to the frameshift deletion).

We submitted a manuscript to bioRxiv describing the as yet undesignated lineage, which we're calling B.1.x pending lineage assignment: https://www.biorxiv.org/content/10.1101/2021.04.05.438352v1

Complicating matters a bit, Public Health England used the name "B.1.324.1" to describe the UK samples in this lineage in SARS-CoV-2 variants of concern and variants under investigation in England / Technical briefing 7 although this lineage does not appear to have descended from B.1.324. PHE also uses the name cornstalk-handprint which in turn lists VUI-21MAR-01 and VUI202103/01 as alternate names.

An official Pango B.1.X (but not X) designation would help bring some order to the naming confusion. I don't think it should be B.1.324.1 because this lineage seems to have evolved independently of B.1.324, but we'll be glad to use whatever you assign.

Thanks, |
To show how the lineage has grown recently, here is an interactive Nextstrain view of 185 sequences in the lineage that our manuscript describes (S:S494P, S:N501Y, S:P681H, N:M234I): https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-25.json?label=nuc%20mutations:C11005A,G28975A The three sequences in red (ME-HETL-J2066, -J1882, -J1776) are from https://github.com/tewhey-lab/ncov-ME/tree/master/fasta (shared by Ryan Tewhey; they were rejected by repositories due to the deletion, requiring manual confirmation); the remaining sequences are from GISAID and/or GenBank & COG-UK. Clicking on a branch zooms in, and when zoomed in sufficiently, you can see full names and dates (many in March). Near the bottom of the page, there is a "Download Data" link with the option to save the view as a Newick or Nexus tree. |
Hello. Just wondering if this was resolved. I've noticed over 170 genomes in April assigned as B.1 with the following mutations. Spike:
Other: NS3_Q57H, N_M234I Anna. |
@rambaut @AngieHinrichs @amniewiadomska thanks for all your comments on this. We've now designated this as lineage B.1.623 in 1.2.3 |
Distinct lineage within B.1.325 with some VOC-like mutations:
List of genomes:
B.1.X_2020-03-03.taxa.txt