BA.2.10 sublineage with S:W64R, 141-144del, 243-244del, G446S, F486P, R493Q, S494P, P1143L (15 seq as of 2022-08-07, India) #898

Closed
silcn opened this issue Aug 1, 2022 · 42 comments · Fixed by #927
Labels
designated recommended Recommended for designation by pango team member
Milestone

Comments

@silcn

silcn commented Aug 1, 2022

Proposal for a sublineage of BA.2.10
Earliest sequence: 2022-06-24 (India-Maharashtra)
Countries detected: India (5 from Maharashtra, 1 from Karnataka)

Mutations on top of BA.2.10:
S:W64R, 141-144del, 243-244del(*), G446S, F486P, R493Q (reversion), S494P, P1143L
ORF1b:G662S
ORF7a:A105V
nuc:G4255A
(*) this is called as 242-243del by Nextclade, but it's clear from the nucleotides that it's really 243-244del. Both have the same effect (LAL -> L)
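A minimal sketch of why the two deletion calls are equivalent at the protein level. It assumes only what is stated above, namely that Spike residues 242-244 read L-A-L; the function name and the toy dictionary are illustrative and not part of any tool.

```python
def delete_residues(residues, start, end):
    """Return the residues left after deleting positions start..end (inclusive)."""
    return "".join(aa for pos, aa in sorted(residues.items())
                   if not start <= pos <= end)

# Spike residues 242-244 as described in the note above (LAL).
window = {242: "L", 243: "A", 244: "L"}

as_called = delete_residues(window, 242, 243)    # the 242-243del call
as_proposed = delete_residues(window, 243, 244)  # the 243-244del call

print(as_called, as_proposed)
```

Either call leaves a single leucine, so the distinction only matters at the nucleotide level.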

Sequences from this lineage were spotted independently by myself, @ryhisner and @zach-hensel. I'm proposing it due to its large number of Spike mutations, now that it's been detected in more than one Indian state. Lots of convergent evolution going on here, e.g. the ubiquitous R493Q reversion, ORF1b:G662S and S:G446S shared with BA.2.75, the 2-nucleotide mutation S:F486P shared with Constellation 1 from #844, S:243-244del and S:S494P shared with the Delta/BA.2 recombinant spotted by @c19850727 (#895)...

India_F486P

https://nextstrain.org/fetch/github.com/silcn/subtreeAuspice1/raw/main/auspice/subtreeAuspice1_genome_363bb_7e5310.json?branchLabel=Spike%20mutations&c=gt-S_64,446&label=nuc%20mutations:G15451A

Sequences:
EPI_ISL_13929780
EPI_ISL_14056750
EPI_ISL_14056762
EPI_ISL_14149647
EPI_ISL_14154126
EPI_ISL_14162192

Cov-spectrum query: https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=%5B6-of%3A+4255A%2C+ORF1b%3A662S%2C+S%3A64R%2C+S%3A446S%2C+S%3A486P%2C+S%3A494P%2C+S%3A1143L%2C+ORF7a%3A105V%5D&
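For readers who want to see what that link encodes, here is a rough sketch that decodes the `variantQuery` parameter and applies an "at least N of these mutations" test to a set of calls. The reading of `6-of` as "at least 6" is my assumption about the query syntax, and `parse_n_of`, `matches` and the toy sample are hypothetical helpers, not cov-spectrum code.

```python
from urllib.parse import urlparse, parse_qs

URL = ("https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants"
       "?variantQuery=%5B6-of%3A+4255A%2C+ORF1b%3A662S%2C+S%3A64R%2C+S%3A446S"
       "%2C+S%3A486P%2C+S%3A494P%2C+S%3A1143L%2C+ORF7a%3A105V%5D&")

# Decode the percent-encoded query string from the link above.
query = parse_qs(urlparse(URL).query)["variantQuery"][0]

def parse_n_of(q):
    """Split '[6-of: a, b, c]' into (6, ['a', 'b', 'c'])."""
    inner = q.strip()[1:-1]              # drop the surrounding brackets
    head, rest = inner.split(":", 1)     # '6-of' and the mutation list
    n = int(head.split("-")[0])
    return n, [term.strip() for term in rest.split(",")]

def matches(sample_mutations, n, terms):
    """True if at least n of the listed mutations are present (assumed semantics)."""
    return sum(term in sample_mutations for term in terms) >= n

n, terms = parse_n_of(query)

# Hypothetical sample with two of the eight sites missing (e.g. covered by NNNs).
toy_sample = {"4255A", "ORF1b:662S", "S:64R", "S:446S", "S:1143L", "ORF7a:105V"}
print(query)
print(matches(toy_sample, n, terms))
```

Allowing two misses out of eight is what lets the query tolerate dropouts at a couple of sites.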

@thomasppeacock thomasppeacock added the monitor currently too small, watch for future developments label Aug 2, 2022
@silcn
Author

silcn commented Aug 3, 2022

4 new sequences from Telangana, India:
EPI_ISL_14215421
EPI_ISL_14215485
EPI_ISL_14215519
EPI_ISL_14215522

@silcn silcn changed the title BA.2.10 sublineage with S:W64R, 141-144del, 243-244del, G446S, F486P, R493Q, S494P, P1143L (6 seq, India) BA.2.10 sublineage with S:W64R, 141-144del, 243-244del, G446S, F486P, R493Q, S494P, P1143L (10 seq as of 2022-08-03, India) Aug 3, 2022
@thomasppeacock

Sequence just uploaded from Odisha as well: EPI_ISL_14228411

This looks like another BA.2.75-like lineage (long branch length, mostly mutations in Spike, probably qualifies as a second-generation variant). I've stuck a monitoring label on it for now, but I think this should be prioritised for designation if we get many more sequences.

@shay671

shay671 commented Aug 5, 2022

Hey @thomasppeacock
This sample looks like another jump over the initial one. Nextclade QC seems legit.


https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice6_genome_31a9d_d08a20.json?c=pango_lineage_usher&label=nuc%20mutations:G23040A,C24990T

@c19850727

c19850727 commented Aug 5, 2022

@shay671 it seems to me that it might have some QC issue?
My first impression is that the chance for both ORF1a:3675/3677 and S:144 having rare mutations must be very low.
And also, S:L24S (often happens for S:24del) and S:A27S are vanilla BA.2 mutations, not sure why they end up in the "mutation on branch" tooltip...

@silcn
Author

silcn commented Aug 5, 2022

@shay671 @c19850727 yeah, it's a clear QC issue with deletions, seems to be very common in sequences from Odisha.

"Deletions replaced with NNNs" is reasonably common and Usher knows how to deal with that, but something weirder is happening here: small sections of the genome adjacent to the deletion get duplicated, e.g. where the deletion at ORF1a:3675-3677 should be, there is instead a repeat of ORF1a:3672-3674, creating a little cluster of "mutations".

And instead of T---------CA in S:24-27, Odisha samples usually have TCANNNNNNTCA, which is where the extra "mutations" at S:24 and S:27 are coming from.
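As a rough illustration of how the second pattern could be flagged automatically, the sketch below looks for a codon-sized triplet that is repeated on both sides of a run of Ns, as in TCANNNNNNTCA. The function name and the toy strings are hypothetical. It does not cover the ORF1a case described above, where the repeat replaces the deletion without any Ns.

```python
import re

# A triplet, then one or more Ns, then the same triplet again.
FLANKING_REPEAT = re.compile(r"([ACGT]{3})(N+)\1")

def find_flanking_repeats(seq):
    """Return (start, triplet, n_run_length) for each triplet-Ns-triplet hit."""
    return [(m.start(), m.group(1), len(m.group(2)))
            for m in FLANKING_REPEAT.finditer(seq.upper())]

# Toy strings built around the pattern quoted above, not real genome sequence.
suspicious = "ACGT" + "TCANNNNNNTCA" + "GGA"
plain_masking = "ACGT" + "NNNNNNNNN" + "CAGGA"   # deletion simply replaced by Ns

print(find_flanking_repeats(suspicious))
print(find_flanking_repeats(plain_masking))
```

A hit is only a hint that the neighbouring "mutations" deserve a second look; short repeats flanking genuine N runs can occur by chance.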

@thomasppeacock

Sequence from Assam: EPI_ISL_14290857. That's 5 different Indian states now. I'm going to recommend this, as it is clearly widespread, I think it is going to keep growing, and it has some pretty interesting properties.

@thomasppeacock thomasppeacock added the recommended Recommended for designation by pango team member label Aug 6, 2022
@FedeGueli
Contributor

FedeGueli commented Aug 6, 2022

I can confirm, after monitoring it from the start via cov-spectrum, that the early growth rate of this one is apparently comparable to BA.2.75 and BA.2.38.3.X. It's hard to say whether it is more or less, but I think a fast designation of this one is required. Also, BA.2.75 was designated with just 12-18 sequences, thanks to the great early spot by @silcn.

@silcn
Author

silcn commented Aug 7, 2022

3 more from Maharashtra: EPI_ISL_14291303, EPI_ISL_14291304, EPI_ISL_14291417

@shay671

shay671 commented Aug 7, 2022

Fantastic work (as always) @silcn.

Each state in India should be considered as a different country; it's like we found samples in 5 European countries with clear saltation and super converging mutations.
It's a 2nd generation variant. There are so many like this in 1-2 samples. But this is just the 3rd one (including B.1.637.1) to transmit in the community successfully. From my POV, this should be worth designation.

By the way, @silcn, I would love to work together on the analysis (I'm part of an Israeli team running an algorithm for saltation tracking that might integrate with your methods). Contact me through Twitter.

@cvejris

cvejris commented Aug 7, 2022

This is a very interesting lineage. The second only among Omicrons which picked up ORF1b:G662S. The first was - guess what - BA.2.75. The only pre-Omicron VOC with fixed ORF1b:G662S was - guess what - Delta.
This mutation (G671S in RdRp/NSP12) was already studied and appears as fitness-enhancing: https://doi.org/10.1101/2022.01.07.475295
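Since the same change shows up in this thread under two numbering schemes (ORF1b:G662S and NSP12 G671S), a small conversion sketch may help. The offset is derived from that quoted pair; treating it as constant across NSP12 is an assumption, and `orf1b_to_nsp12` is a hypothetical helper.

```python
# Offset taken from the pair quoted above: ORF1b 662 <-> NSP12 671.
NSP12_OFFSET = 671 - 662

def orf1b_to_nsp12(mutation):
    """Convert e.g. 'ORF1b:G662S' to 'NSP12_G671S' (assumes the site lies in NSP12)."""
    gene, change = mutation.split(":")
    if gene != "ORF1b":
        raise ValueError(f"expected an ORF1b mutation, got {mutation!r}")
    ref, pos, alt = change[0], int(change[1:-1]), change[-1]
    return f"NSP12_{ref}{pos + NSP12_OFFSET}{alt}"

print(orf1b_to_nsp12("ORF1b:G662S"))
```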

@silcn
Author

silcn commented Aug 7, 2022

@shay671 I don't have a twitter account and don't intend to get one. My methods are rather more manual than yours - you keep doing your thing. If a 2nd gen variant arises in Europe or the USA you're much more likely to spot it than I am.

I see there is already a nickname floating around for this lineage. This is likely a futile request, but please could people hold off on this sort of thing until we have a clearer idea of this lineage's significance? I agree that e.g. BA.5 should have had its own name, but giving out nicknames too easily risks a "boy who cried wolf" effect.

The same goes for estimates of the growth advantage based on a very small number of sequences from a country known to have large and inconsistent geographical biases in its coverage. I remember the good old days when BA.2.75 had a 900% advantage over BA.5.

@silcn silcn changed the title BA.2.10 sublineage with S:W64R, 141-144del, 243-244del, G446S, F486P, R493Q, S494P, P1143L (10 seq as of 2022-08-03, India) BA.2.10 sublineage with S:W64R, 141-144del, 243-244del, G446S, F486P, R493Q, S494P, P1143L (15 seq as of 2022-08-07, India) Aug 7, 2022
@shay671

shay671 commented Aug 7, 2022

You are right, now is still an early time to get a nickname for this variant.
But, BA.2.75 does require a nickname, just for the sake of it being in the heart of the news reports. This is precisely the reason the WHO started this method in the first place, not for a scientific way to classify variants that are 100% going to be the sole dominant of all cases, but rather to fight the urge many news reporters have to nickname variants based on their geographic origin and by doing so to stigmatize entire nations. I'm not 100% okay with the Centauros name, but the other option is that news reporters call it "the Indian Omicron." Is it better?

@silcn
Author

silcn commented Aug 7, 2022

Yes, I agree that BA.2.75 is deserving of a name. I have more reservations about the way it ended up getting its current nickname than about the nickname itself. I'm also not sure BA.2.75 would have got nearly as much attention in the media in early July if it hadn't got a "cool-sounding" nickname. In that sense, maybe it's a good thing it got nicknamed as soon as it did, because it helped draw more attention to it.

On the other hand, I'm worried that if this becomes a trend then people might start giving names to every small cluster of sequences with a bunch of spike mutations, many of which won't end up spreading much - and if the media pick up on the names and start reporting on these too, maybe with a way-too-early growth advantage chart for good measure, then it'll become harder for readers to tell when something is actually worrying.

@shay671

shay671 commented Aug 7, 2022

@cvejris
The mutation in NSP12b is super important. It's found in B.1.617.2 and also in B.1.628 (later called XB) and, of course, in BA.2.75. This convergence is as interesting as the RBD convergence and, in some respects, even more critical.
But when looking at the Usher build, it seems that this mutation was acquired at an earlier stage, which may point to partly stepwise evolution.
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_71c5_ead110.json?c=pango_lineage_usher&label=nuc%20mutations:T21752C


@corneliusroemer
Contributor

Does someone have a good GISAID query for this? I can only catch 12 out of 15 with the following: NSP12_G671S,Spike_P1143L

@silcn
Author

silcn commented Aug 8, 2022

@corneliusroemer unfortunately every defining mutation is covered by NNNs in at least one sequence, so I don't think you can hit all 15. "Spike_G446S, Spike_F486P, NSP13_R392C" gets 14 of them; NSP13_R392C is only there to exclude BA.1 sequences. But I don't expect that'll work too well in the long term as it'll miss sequences with RBD dropouts.

@corneliusroemer
Contributor

So do you have a combination of 2 or 3 that are 100% specific (at least within last 2 months) and also as sensitive as possible (in aggregate)?

I worry that the Spike RBD often drops out in Indian sequences. We probably miss at least 10% this way; Spike_P1143L seems more robust? I guess one can use both NSP12_G671S,Spike_P1143L and Spike_G446S, Spike_F486P, NSP13_R392C?
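To make the "use both queries" idea concrete, here is a small sketch that takes the union of two AND-queries over a toy table of mutation calls. The two mutation sets are the ones named in this thread; the sample names and their calls are invented for illustration, and a dropout is modelled simply as the mutation being absent from a sample's set.

```python
QUERY_A = {"NSP12_G671S", "Spike_P1143L"}
QUERY_B = {"Spike_G446S", "Spike_F486P", "NSP13_R392C"}

# Hypothetical samples; a site covered by NNNs simply has no call recorded.
samples = {
    "full_coverage":   QUERY_A | QUERY_B,
    "rbd_dropout":     {"NSP12_G671S", "Spike_P1143L", "NSP13_R392C"},
    "p1143_dropout":   {"NSP12_G671S", "Spike_G446S", "Spike_F486P", "NSP13_R392C"},
    "unrelated_g446s": {"Spike_G446S"},
}

def hits(query, table):
    """Names of samples carrying every mutation in the query."""
    return {name for name, muts in table.items() if query <= muts}

hits_a = hits(QUERY_A, samples)
hits_b = hits(QUERY_B, samples)
combined = hits_a | hits_b   # a sample counts if either query catches it

print(sorted(hits_a), sorted(hits_b), sorted(combined))
```

Each query on its own misses one of the dropout cases, while the union recovers both without picking up the unrelated sample.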

@silcn
Author

silcn commented Aug 8, 2022

I don't feel like spending too much time coming up with the best 100% specific query when the answer will probably change within a week :)

Agree that Spike_P1143L is likely to be most robust - the one sequence that missed it has a 3-nucleotide NNN at that codon only. Not very convenient that there's hardly anything to work with outside the spike.

@FedeGueli
Contributor

FedeGueli commented Aug 10, 2022

Another sequence popped up:

EPI_ISL_14355872

from Mumbai

Edit: it seems to have a rare S:C432R

@silcn
Author

silcn commented Aug 10, 2022

@FedeGueli good spot. That S:C432R almost certainly isn't real - it's right next to a block of NNNs and Indian sequences frequently have erroneous mutations close to NNNs.

Also EPI_ISL_14357352 from Odisha.

@thomasppeacock

Sequence from Singapore just uploaded as well: EPI_ISL_14358912

@corneliusroemer
Contributor

Alright, I'll make a PR for this considering it keeps popping up in new states and countries.

corneliusroemer added a commit that referenced this issue Aug 10, 2022
Added new lineage BA.2.10.4 from #898 with 7 new sequence designations
@corneliusroemer corneliusroemer added this to the BA.2.10.4 milestone Aug 10, 2022
@alantsangmb

alantsangmb commented Aug 17, 2022

Might I confirm this BA.2.10.4 lineage is already accepted and designated?
Thank you.

@corneliusroemer
Contributor

Yes indeed - it's accepted and designated but cannot be assigned by pangolin or Nextclade yet (needs data update)

@alantsangmb

@corneliusroemer Thank you so much.

@silcn
Author

silcn commented Aug 23, 2022

This keeps popping up across India, though the only non-Indian sequences are the sequence from a traveller in Singapore that's already been mentioned, and a sequence from Hong Kong that might also be from a traveller. At least 28 sequences total.

@ghost

ghost commented Aug 24, 2022

I did an analysis with @shay671 to try and understand how many sequences of BA.2.10.4 there are worldwide and the geographic spread of this variant.
We found 26 sequences (1 Hong Kong, 1 Singapore, 24 India).

All sequences found by my query were recognized in UShER as BA.2.10.4 except for this one: EPI_ISL_14478719, even though it has 7/9 defining mutations (not including deletions). I think it is BA.2.10.4 but, because of bad quality, it was placed in a different part of the tree in UShER.

All sequences were recognized as BA.2.10.4 in Nextclade but none in pango.

here are the full results:
BA.2.10.4 sequences.xlsx

@silcn
Author

silcn commented Aug 24, 2022

@talyashe 27 including EPI_ISL_14478719, I take it? The 28th I had in mind was EPI_ISL_14291304.

@ghost

ghost commented Aug 25, 2022

@silcn about EPI_ISL_14291304: in my opinion this sequence is of very bad quality, so you can't tell for sure.
It only has 3 out of the 10 mutations that define BA.2.10.4 over its predecessor; all the other positions aren't sequenced. Of the 3 mutations it does have, one is also in BA.1. Even though the mutations T23018C, T23019C are relatively rare, I'm still not sure it's enough to determine it's definitely BA.2.10.4. I also checked for other mutations that are relatively prevalent within BA.2.10.4 (even though there aren't many) and this sample doesn't have them.
Either way, it is from Maharashtra and we already know of sequences there, so I don't think it changes our understanding of the spread very much whether it is BA.2.10.4 or not.
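The distinction being drawn here, between a site that lacks the mutation and a site that was never sequenced, can be sketched as a three-way tally. The list below is taken from the substitutions in the opening post rather than the ten-mutation list used in this comment, and the sample calls are hypothetical, not the real calls for any EPI_ISL record.

```python
# Illustrative subset of defining mutations, taken from the opening post.
DEFINING = ["nuc:G4255A", "S:W64R", "S:G446S", "S:F486P", "S:S494P",
            "S:P1143L", "ORF1b:G662S", "ORF7a:A105V"]

def tally(calls, defining=DEFINING):
    """Count defining sites that are mutant, reference, or not sequenced.

    `calls` maps a mutation label to "mut", "ref" or "N"; a label that is
    missing from `calls` is treated as not sequenced.
    """
    counts = {"mut": 0, "ref": 0, "N": 0}
    for label in defining:
        counts[calls.get(label, "N")] += 1
    return counts

# Hypothetical low-coverage sample: three sites support the lineage, the rest are Ns.
low_coverage = {"nuc:G4255A": "mut", "S:G446S": "mut", "S:F486P": "mut"}

print(tally(low_coverage))
```

A sample with zero "ref" calls is not contradicted by anything, but with most sites in the "N" column the supporting evidence stays thin, which is the point made above.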

@FedeGueli
Contributor

5 new sequences: France (1), USA (Virginia, 2), India (1 Telangana, 1 Karnataka), all collected in August:

EPI_ISL_14708679
EPI_ISL_14722574
EPI_ISL_14724310
EPI_ISL_14737517
EPI_ISL_14737692

@FedeGueli
Contributor

New tree: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_41771_fd0270.json?branchLabel=aa%20mutations&c=pango_lineage_usher&label=nuc%20mutations:G4255A,T23018C,T23019C

The main branch seems to be the one with Orf3a:G100C. If it continues to grow, it will probably be a good candidate for a sublineage proposal.

@FedeGueli
Contributor

One more sequence from California, and it is part of the Orf3a:100C clade.

@FedeGueli
Contributor

One more sequence from Karnataka :EPI_ISL_14810256

@FedeGueli
Contributor

4 seqs from MH, India, collected between July and early August

@FedeGueli
Contributor

One more from Pune, MH, India

@FedeGueli
Contributor

FedeGueli commented Sep 11, 2022

Another new one from MH. 38 samples for the GISAID query Spike_F486P, NSP13_R392C.

@FedeGueli
Contributor

5 from India (plus 4 from the UK spotted yesterday by @thomasppeacock)

@FedeGueli
Contributor

4 new sequences from India

@FedeGueli
Contributor

1 new sequence from Pennsylvania (fourth sample from the United States, from three different states)

@FedeGueli
Contributor

1 more sequence from England

@FedeGueli
Contributor

1 more sequence from England; total is 54

@FedeGueli
Contributor

59 as of today (5 new from India)

8 participants