CM.2 Sublineage (BA.2.3.20 + ORF1a:S2103F) with S:F486S (8 seq) #1356

Closed
ryhisner opened this issue Nov 21, 2022 · 3 comments
Labels
BA.2.3.20/CM proposed sublineage of BA.2.3.20 designated

@ryhisner

ryhisner commented Nov 21, 2022

Description

Sub-lineage of: CM.2
Earliest sequence: 2022-11-03, Australia (EPI_ISL_15736785)
Most recent sequence: 2022-11-12, Indonesia, Bali (EPI_ISL_15826751, EPI_ISL_15826753, EPI_ISL_15826758)
Countries circulating: Indonesia (5), Australia (3)
Number of Sequences: 8
GISAID Query: Spike_E484R, Spike_F486S, NSP3_S1285F
CovSpectrum Query: Nextcladepangolineage:CM.2* & S:F486S & C1519T
Substitutions on top of CM.2:
Spike: F486S
Nucleotide: G16089A, T23019C

USHER Tree
Usher includes a false mutation in every BA.2.3.20 sequence that is uploaded (S:A484G / C23013G), so that should be ignored in the tree below. The Canadian sequence included by Usher in this lineage is almost certainly unrelated: it has the synonymous nucleotide mutation C1519T, found in none of the others, and it lacks the synonymous nucleotide mutation G16089A, which is found in all other members of this lineage.

https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/CM.2%20%2B%20F486S%20-%20subtreeAuspice1_genome_387d2_ad53d0.json


Evidence
This appears to be one of six separate lineages of BA.2.3.20 to have independently evolved S:F486S. The others are:

• BA.2.3.20 + R346T + F486S – 3 seq, two USA (Florida), one Australia (Sydney). It’s possible this one is related to CM.7, but I doubt it. It does not have the double-nucleotide mutation of CM.7.
• CM.7 – Very oddly has two nucleotide mutations at S:486: T23019C and T23020C, the latter synonymous
• CM.8.1 – also has G446S (7 seq)
• Isolated singlet from Australia, collected on September 18 – EPI_ISL_15258516
• Isolated singlet from Canada, collected on October 12 – EPI_ISL_15587023
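As a rough check on the codon arithmetic behind this list (a sketch, not part of the original proposal): the snippet below enumerates which single-nucleotide changes in a codon produce a given amino acid. It assumes the spike CDS starts at nucleotide 21563 of NC_045512.2 and that the reference codon for S:486 is TTT; both values are assumptions stated here, not taken from the thread, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SPIKE_START = 21563  # assumption: first base of the spike CDS in NC_045512.2


def codon_start(aa_pos):
    """1-based genome coordinate of the first base of a spike codon."""
    return SPIKE_START + 3 * (aa_pos - 1)


def single_nt_routes(codon, aa_pos, target_aa):
    """List single-nucleotide changes that turn `codon` into `target_aa`."""
    start = codon_start(aa_pos)
    routes = []
    for offset, old in enumerate(codon):
        for new in BASES:
            if new == old:
                continue
            mutated = codon[:offset] + new + codon[offset + 1:]
            if CODON_TABLE[mutated] == target_aa:
                routes.append(f"{old}{start + offset}{new}")
    return routes


# Assumed reference codon for S:486 (Phe).
print(single_nt_routes("TTT", 486, "S"))  # ['T23019C']
# After T23019C the codon is TCT; any third-position change keeps Ser.
print(single_nt_routes("TCT", 486, "S"))
```

Under those assumptions, T23019C is the only single-nucleotide route from F to S at this position, and T23020C on top of it leaves the amino acid unchanged, which matches the description of CM.7 above.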

With all sequences in this lineage appearing on or after November 3, it appears to be the fastest growing of all the BA.2.3.20 lineages with S:F486S.

Genomes

EPI_ISL_15736785, EPI_ISL_15754395, EPI_ISL_15754427, EPI_ISL_15755993, EPI_ISL_15820471, EPI_ISL_15826751, EPI_ISL_15826753, EPI_ISL_15826758
@ryhisner ryhisner changed the title from "CM.2 Sublineage (BA.2.3.20 + ORF1a:S2103F) with S:F486S (9 seq)" to "CM.2 Sublineage (BA.2.3.20 + ORF1a:S2103F) with S:F486S (8 seq)" Nov 21, 2022
@corneliusroemer corneliusroemer added the BA.2.3.20/CM proposed sublineage of BA.2.3.20 label Nov 22, 2022
@corneliusroemer corneliusroemer added this to the CM.2.1 milestone Nov 22, 2022
@AngieHinrichs
Member

> Usher includes a false mutation in every BA.2.3.20 sequence that is uploaded (S:A484G / C23013G)

When I download the sequences, align to the reference (NC_045512.2) using https://github.com/roblanf/sarscov2phylo/blob/master/scripts/global_profile_alignment.sh (which uses mafft), and extract base 23013, I do see a G for all 9 sequences on the highlighted branch. I also see A>G for all when I align and look for differences with minimap2 --cs ... | paftools.js call -L 1000 - | grep $'\t'23013. What method is telling you that they don't have 23013G?
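A minimal sketch of the "extract base 23013" step, assuming each sequence is already aligned to NC_045512.2 as a pair of equal-length gapped strings. The function name and the toy strings are hypothetical and are not taken from the sarscov2phylo scripts.

```python
def base_at_reference_position(aligned_ref, aligned_query, ref_pos):
    """Return the query base aligned to 1-based reference coordinate ref_pos.

    Gaps ('-') in the reference are insertions in the query and do not
    advance the reference coordinate.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    coord = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            coord += 1
            if coord == ref_pos:
                return query_char.upper()
    raise ValueError(f"reference position {ref_pos} is beyond the alignment")


# Toy alignment (not real data): the query has one inserted base after
# reference position 3, so reference position 5 sits in alignment column 6.
ref = "ACG-TAC"
qry = "ACGTTGC"
print(base_at_reference_position(ref, qry, 5))  # G
```

For real data the same call would use ref_pos=23013; a returned "-" would mean the sequence has a deletion at that position.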

Omicron has A23013C, and in these sequences 23013 is then mutated again from C to G. That might be muddling things.
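To illustrate how the same base can be reported two ways, here is a small sketch that applies the two successive changes at 23013 to the codon for S:484. It assumes the spike CDS starts at 21563 in NC_045512.2 and that the reference codon for S:484 is GAA; those values and the helper name are assumptions, not something stated in this thread.

```python
SPIKE_START = 21563  # assumption: first base of the spike CDS in NC_045512.2
CODONS = {"GAA": "E", "GCA": "A", "GGA": "G"}  # only the codons needed here


def apply_mutation(codon, codon_start, mutation):
    """Apply a mutation such as 'A23013C' to a codon starting at codon_start."""
    old, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    offset = pos - codon_start
    if not 0 <= offset < 3:
        raise ValueError(f"{mutation} is outside this codon")
    if codon[offset] != old:
        raise ValueError(f"{mutation} expects {old}, found {codon[offset]}")
    return codon[:offset] + new + codon[offset + 1:]


start_484 = SPIKE_START + 3 * (484 - 1)  # 23012

codon = "GAA"  # assumed reference codon for S:484
history = [(codon, CODONS[codon])]
for mut in ("A23013C", "C23013G"):  # Omicron change, then the later change
    codon = apply_mutation(codon, start_484, mut)
    history.append((codon, CODONS[codon]))

print(history)  # [('GAA', 'E'), ('GCA', 'A'), ('GGA', 'G')]
# Called directly against the reference, the same end state is A23013G:
print(apply_mutation("GAA", start_484, "A23013G"))  # GGA
```

Under these assumptions, C23013G on an Omicron background and A23013G relative to the reference describe the same base, and taken in isolation it reads as S:A484G. The sketch does not model any other change in the codon, such as whatever underlies the Spike_E484R term in the GISAID query.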

@ryhisner
Author

ryhisner commented Dec 7, 2022

Sorry, @AngieHinrichs, I just saw this reply. It's not just this particular lineage in which this happens. Usher always reports a private mutation of C23013G for every single BA.2.3.20 sequence that I upload. As you say, every BA.2.3.20 actually does have C23013G, so I was wrong to describe it as a false mutation. It's a real mutation, so I suppose the problem is that Usher assumes BA.2.3.20 sequences don't have C23013G unless you've just uploaded them.

It's all confusing to me, but I can't tell you how much I appreciate all the effort you put into making these trees right. We'd all be lost without you!

@AngieHinrichs
Member

OMG @ryhisner I believe you've turned up a bug in the process that makes the public tree, that causes it to drop a mutation under very narrow conditions that are met for BA.2.3.20. I'm in the process of writing it up and will file an issue in the yatisht/usher repo and hopefully get that fixed soon. Thanks so much for reporting that!

It only affects the public tree, not the full tree that I spend most of my time on. @joshuailevy and anyone else who uses the public tree, heads up, BA.2.3.20's path should have C23013G but it currently doesn't!

In the meantime, try selecting the full tree including GISAID sequences (with ~13 million sequences now) instead of the public tree (with about half as many). (You can also paste or upload a list of EPI_ISL IDs or names without IDs instead of FASTA sequences; that is still a little slow on our main site, but some speedups for that mode are being tested on dev.usher.bio, with caveat that that's less stable than usher.bio.)