2nd-Generation BA.2 Saltation Lineage, >30 spike mutations (3 seq, 2 countries, Aug 14) #2183

Closed
ryhisner opened this issue Aug 14, 2023 · 104 comments
Assignees
Labels
BA.2 designated Saltation Appears on long branch length with no intermediates
Milestone

Comments

@ryhisner

ryhisner commented Aug 14, 2023

Description
Sub-lineage of: BA.2
Earliest sequence: 2023-7-24, Denmark
Most recent sequence: 2023-7-31; Denmark & Israel
Countries circulating: Denmark (2), Israel
Number of Sequences: 3
GISAID AA Query: Spike_E484K, V445H
GISAID Nucleotide Query: T22032C, C22033A, A22034G
CovSpectrum Query: T22032C & C22033A & A22034G
Substitutions/Deletions/Insertions on top of BA.2:
Spike: ins16_MPLF (ins21608_TCATGCCGCTGT), R21T, S50L, ∆69-70, V127F, ∆Y144, F157S, R158G, ∆N211, L212I, L216F, H245N, A264D, I332V, D339H, K356T, R403K, V445H, G446S, N450D, L452W (2-nuc), N460K, N481K, ∆V483, A484K (2-nuc), F486P, E554K (Denmark seq only), A570V, P621S, I670V (Israel seq only), H681R, S939F, P1143L
N: Q229K
M: D3H, T30A, A104V
ORF1a: A211D, V1056L, N2526S, A2710T, V3593F, T4175I
Nucleotide: C897A, G3431T, A7842G, C8293T, G8393A, G11042T, A12160G, C12789T, T13339C, T15756A, A18492G, ins21608TCATGCCGCTGT, C21711T, G21941T, T22032C, C22208T, A22034G, C22295A, C22353A, A22556G, G22770A, G22895C, T22896A, G22898A, A22910G, C22916T, ∆23009-23011, G23012A, C23013A, T23018C, T23019C, C23271T, C23423T, A23604G, C24378T, C24990T, C25207T, A26529C, A26610G, C26681T, C26833T, C28958A
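For anyone who wants to turn the CovSpectrum query above into a shareable link, here is a minimal sketch. It assumes the URL pattern of the CovSpectrum links posted in this thread (path plus a `variantQuery` parameter); the helper name `covspectrum_url` is made up for illustration, and whether the site keeps honoring this URL format is an assumption.

```python
from urllib.parse import quote_plus

# Base path copied from the CovSpectrum links posted in this thread
# (assumption: the site still accepts this URL layout).
BASE = "https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants"


def covspectrum_url(query: str) -> str:
    """Build an explore link for a variant query (hypothetical helper)."""
    # quote_plus encodes ':' as %3A, '&' as %26 and spaces as '+'.
    return f"{BASE}?variantQuery={quote_plus(query)}&"


url = covspectrum_url("T22032C & C22033A & A22034G")
print(url)
```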

USHER Tree (for what it's worth)
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/2nd-Gen_BA.2.json?c=gt-nuc_897&label=id:node_5341437
image

​​Evidence
One day after the first sequence in this lineage was uploaded from Israel (Sunday 13-Aug), two sequences were uploaded from Denmark (Monday 14-Aug), and one of them has a collection date a week earlier than the Israel sequence. This one's already gone international and is likely circulating in a country with little genetic surveillance. The only question at this point is whether this will be a situation like BS.1.1 or BA.2.83, where a hugely divergent, 2nd-generation lineage spreads but never has a large impact, or whether this will be closer to a BA.1-type situation.

Genomes

Genomes EPI_ISL_18096761, EPI_ISL_18097315, EPI_ISL_18097345
@ryhisner ryhisner changed the title 2nd-Generation BA.2 Saltation Lineage, ~31 spike mutations (3 seq, 2 countries, Aug 14) 2nd-Generation BA.2 Saltation Lineage, >30 spike mutations (3 seq, 2 countries, Aug 14) Aug 14, 2023
@FedeGueli
Contributor

FedeGueli commented Aug 14, 2023

Alternative non spike nuc query: A7842G, C8293T, G8393A

C897A, G3431T, A7842G, G8393A is another query by @HynnSpylor

@shay671

shay671 commented Aug 14, 2023

Folks, regarding the Israeli sample:
It's a patient. She had contact with 2 people she lives with, who were infected before her (a week or so).
None of the 3 has an immunocompromised background. They are not chronic patients.

@FedeGueli

This comment was marked as resolved.

@aviczhl2
Contributor

aviczhl2 commented Aug 14, 2023

alternative discussion sars-cov-2-variants/lineage-proposals#606

@aviczhl2

This comment was marked as resolved.

@FedeGueli

This comment was marked as resolved.

@FedeGueli

This comment was marked as resolved.

@silcn

silcn commented Aug 14, 2023

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

There are a few shared mutations with BA.1 as I think has been alluded to on Twitter. The ones in Spike probably arose independently even if this did come from the Omicron source, but G8393A = ORF1a:A2710T is fairly rare outside BA.1 and might be suggestive of recombination.

(note: given that it's unlikely a recombinative origin can ever be proved, and the evidence for this coming from the Omicron source is much weaker than for BA.4 and BA.5, I would suggest this gets the next available BA.2.x designation rather than BA.6)

@FedeGueli
Contributor

FedeGueli commented Aug 14, 2023

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

There are a few shared mutations with BA.1 as I think has been alluded to on Twitter. The ones in Spike probably arose independently even if this did come from the Omicron source, but G8393A = ORF1a:A2710T is fairly rare outside BA.1 and might be suggestive of recombination.

Yeah, the 9866C branch of BA.2 was common only in SA. @corneliusroemer proposed a bunch of sublineages of it, and at least one got designated, as I recall. If I remember correctly, they were successfully exported only to Germany
(and Germany was where XAK emerged, another complex BA.1/BA.2 recombinant, and also BA.5.9, the S:R346I branch of BA.5 stemming from the polytomy, so likely South African too).

I am going to check RKI seqs on Open CovSpectrum

@FedeGueli
Contributor

G8393A, G11042T, A12160G, C12789T, T13339C, T15756A, A18492G, ins21608TCATGCCGCTGT, C21711T, G21941T, T22032C, C22208T, A22034G, C22295A, C22353A, A22556G, G22770A, G22895C, T22896A, G22898A, A22910G, C22916T, ∆23009-23011, G23012A, C23013A, T23018C, T23019C, C23271T, C23423T, A23604G, C24378T, C24990T, C25207T, A26529C, A26610G, C26681T, C26833T, C28958A

A12160G is just a reversion of BA.4/5's G12160A, so I think it is not real and is due to misrooting by Usher.

@silcn

silcn commented Aug 14, 2023

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

BA.2-without-9866T had a branch with C26681T which reached 5-10% of BA.2-without-9866T in South Africa in early 2022. This could be a descendant of that branch, in which case it likely wouldn't be from the Omicron source.

@silcn

silcn commented Aug 14, 2023

Ah, this could be it, there's a small branch with S:939F within the C26681T branch.
939F

It's not a direct descendant of any of these sequences but I reckon the correct placement has it descending from the base of this branch.

@aviczhl2

This comment was marked as resolved.

@NkRMnZr

This comment was marked as resolved.

@AKruschke

Any thoughts on G446S and F486P?

Might it be a recombinant of BA.2.75 + XBB.1.5 + BA.2?

@victorlin

This comment was marked as resolved.

@silcn

silcn commented Aug 14, 2023

Might it be a recombinant of BA.2.75 + XBB.1.5 + BA.2?

More likely just a lot of convergent evolution, many of the RBD mutations are things we've seen before, just not all together. The mutations at 481-484 are the really new part.

@ryhisner

This comment was marked as resolved.

@krosa1910

Note: S:H245N also appears in BA.2.3.20* (all of them), is it very common in all saltation?

@oobb45729

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

There are a few shared mutations with BA.1 as I think has been alluded to on Twitter. The ones in Spike probably arose independently even if this did come from the Omicron source, but G8393A = ORF1a:A2710T is fairly rare outside BA.1 and might be suggestive of recombination.

(note: given that it's unlikely a recombinative origin can ever be proved, and the evidence for this coming from the Omicron source is much weaker than for BA.4 and BA.5, I would suggest this gets the next available BA.2.x designation rather than BA.6)

Could it be a reversion? ORF1a:L3201 might be an important residue. ORF1a:L3201P is in both Iota and Lambda.

@jasondorjeshort

jasondorjeshort commented Aug 15, 2023

Note: S:H245N also appears in BA.2.3.20* (all of them), is it very common in all saltation?

483- and 245N were in #1692 (a few dozen sequences from Ukraine in February). 245N has also been seen in at least BQ.1.1.48 as a point mutation. Actually most of the RBD and NTD mutations (or different ones at the same positions, ~35 in total compared to ~12 outside the S1) have been in some previous interesting variant - it's extremely improbable.

I've never commented on either of these github projects before (been following since the BA.1 issue), but I do believe this thread has been shared on social media and could (depending on resharing and spread over the upcoming days) get some attention from the public. Best to be ready for that.

@oobb45729

I want to highlight the mutation S:S50L here. As of today, a search for S:S50L gives fewer than 1000 results on covSPECTRUM. For a C-to-U mutation, that's not much.
https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=S%3AS50L&
However, a large percentage of them are chronic singlets! There are some mutations that seem to occur more often in chronic sequences than in non-chronic sequences, and S:S50L is one of the most extreme cases. Until now, we have very rarely seen clusters with S:S50L detected in multiple people.

I wondered whether there are mutations that like to occur with S:S50L together. I found potential candidates, S:P621X.

1
I kind of expected P621S to show up, since P621S is also often associated with chronic sequences, but P621L/R/T are much rarer mutations, with fewer than 600 results combined on covSPECTRUM as of today. P621T has only 48 results, and 8 of them are with S50L (an AY.103 sublineage, which is one of the rare cases where an S50L lineage spread to multiple people).

I think there might be something special about the S50L+P621X combination.

@oobb45729

S:P1143L (and P1143S) are also often associated with chronic sequences.

@corneliusroemer
Contributor

Thanks for opening this issue @ryhisner and for the productive discussion, everyone. In particular, thanks for getting some extra epidemiological info @shay671!

It would be great if we could keep discussion here in this issue limited to new sequences and phylogenetics (putative parent lineages, discussion of recombinant nature or not) with exception for extra epidemiological info like Shay shared about patient history.

Discussion of the putative function of individual mutations is off topic here and better placed in sars-cov-2-variants/lineage-proposals#606 or in other issues in that repo.

I'll try to moderate this issue a little as it may get significant attention/readership. If I hide comments as off-topic, this is just to keep the most salient information easy to locate. For broader discussion please use sars-cov-2-variants/lineage-proposals#606 or open another issue there - for example if you have questions that aren't directly related to this lineage (e.g. how to save Usher trees for longer than 2 days).

@corneliusroemer corneliusroemer added urgent Proposal flagged for urgent review (provide explanation) BA.2 Saltation Appears on long branch length with no intermediates Recommend if grows An interesting lineage that should be prioritised for designation if it continues to grow at all labels Aug 15, 2023
@corneliusroemer
Contributor

corneliusroemer commented Aug 18, 2023

@trilisser Yes, @AngieHinrichs is working on a pangolin-data release so BA.2.86 should also be called by pangolin over the next few days.

Nextclade now calls it also using the main dataset (not just master).

@FedeGueli It would be good to stick to verifiable data, e.g. if we don't have published SGTF data let's not speculate about it (or do so in the other repo) to keep it clean here. It would be great if you could edit previous posts rather than making lots of one sentence comments as these clog up people's email inboxes and notifications.

@FedeGueli
Contributor

+1 England: EPI_ISL_18111770 hCoV-19/England/GSTT-230817LSBC55/2023

@trilisser Yes, @AngieHinrichs is working on a pangolin-data release so BA.2.86 should also be called by pangolin over the next few days.

Nextclade now calls it also using the main dataset (not just master).

@FedeGueli It would be good to stick to verifiable data, e.g. if we don't have published SGTF data let's not speculate about it (or do so in the other repo) to keep it clean here.

I'm not speculating; I asked and that is what they told me.

@ryhisner
Author

S:R21T is in all the sequences. The only sequences that don't have it are the two from Denmark, both of which have no coverage in that part of spike.

@ryhisner
Author

ryhisner commented Aug 18, 2023

I've tried to put together a representation of the tree with only the real differences on each branch. Some of the branch lengths are incorrect, but I believe the list of real mutational differences is real, assuming there aren't mutations missing from any sequences that aren't covered by NNN's.

image

@FedeGueli
Contributor

FedeGueli commented Aug 18, 2023

I've tried to put together a representation of the tree with only the real differences on each branch. Some of the branch lengths are incorrect, but I believe the list of real mutational differences is real, assuming there aren't mutations missing from any sequences that aren't covered by NNN's.

Great work. The most recent sample of each branch is only 4 mutations away from a common ancestor.

@silcn

silcn commented Aug 18, 2023

Reports that a third sequence has been found in Denmark: https://en.ssi.dk/news/news/2023/three-cases-of-ba-2-86-have-been-detected-in-denmark

No sign of it yet on GISAID, Denmark is only uploading on Mondays so it will presumably be in the next batch unless it gets uploaded early like the England sequence was.

@corneliusroemer
Contributor

corneliusroemer commented Aug 18, 2023

Here's a Nextstrain tree which shows the same result as @ryhisner's annotated Usher tree: https://nextstrain.org/groups/neherlab/ncov/BA.2.86

image

I'll try to keep it updated daily as new sequences are uploaded over the next few days.

Just to confirm, the English sequence doesn't change the common ancestor sequence I shared yesterday.

@Over-There-Is
Contributor

Over-There-Is commented Aug 18, 2023

There are too many Ns around S:144 in the Britain sequence, so I think the S:-144Y undeletion is an artefact, and so is the N:-33G of the Danish sequence.
2023-08-19
And this should be 23009..23011 (3 bp).

@corneliusroemer
Contributor

corneliusroemer commented Aug 19, 2023

The third Danish sequence has been uploaded together with a batch of 29 other Danish sequences with collection dates in the past month (note: I've been told that Danish dates are all rounded to Mondays for privacy reasons - I don't know whether rounded up, down or to the nearest Monday):
hCoV-19/Denmark/DCGC-647694/2023|EPI_ISL_18114953|2023-08-07

A very rough frequency estimate would hence put BA.2.86 at 1-5% in Denmark at the end of July.

Sequence fits squarely into the Danish/English lineage and is clean (including the insertion now), with extra synonymous C7528T from the Danish/English polytomy.
image
https://nextstrain.org/groups/neherlab/ncov/BA.2.86

The (un)deletions mentioned by @Over-There-Is in the Nextstrain tree should be ignored. Unfortunately there's no such concept as an unknown deletion. I'm now masking all gaps to prevent these artefacts. Thanks for the suggestion!

@silcn

silcn commented Aug 19, 2023

New Danish sample doesn't have the NNNs in spike, and has one additional silent mutation C7528T relative to the first two.

So far in Denmark:
1/49 with "collection date" 24/07
1/45 with "collection date" 31/07
1/16 with "collection date" 07/08

Consistent with a rise in frequency but also not exactly suggestive of a BA.1-like explosion.
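To make the sample-size caveat concrete, here is a rough sketch that puts 95% Wilson score intervals on the three Danish ratios above. The function name `wilson_interval` is just for illustration, and it treats each week's batch as a simple random sample, which is an assumption since the sampling strategy is not known.

```python
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for k successes out of n (z=1.96 for ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# BA.2.86 / all Danish sequences per "collection date", as listed above.
counts = {"24/07": (1, 49), "31/07": (1, 45), "07/08": (1, 16)}

for week, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    print(f"{week}: {k}/{n} = {k / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With a single positive per week the intervals are wide and overlap each other, so these counts alone cannot distinguish a flat frequency from a rising one.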

@FedeGueli
Contributor

New Danish sample doesn't have the NNNs in spike, and has one additional silent mutation C7528T relative to the first two.

So far in Denmark: 1/49 with "collection date" 24/07 1/45 with "collection date" 31/07 1/16 with "collection date" 07/08

Consistent with a rise in frequency but also not exactly suggestive of a BA.1-like explosion.

Going from a rough 2% to 6% in a week is still a lot, but I'm unsure how confident we can be in these data without knowing the exact sampling strategy. If sampling is focused on hospitalized patients, the numbers could vary a lot if severity changes, and they could also be skewed toward elderly people and so not represent real prevalence in the population. If it is less severe, for example, it could be more prevalent than what we are seeing.

@silcn

silcn commented Aug 19, 2023

Going from a rough 2% to 6% in a week is still a lot, but I'm unsure how confident we can be in these data without knowing the exact sampling strategy. If sampling is focused on hospitalized patients, the numbers could vary a lot if severity changes, and they could also be skewed toward elderly people and so not represent real prevalence in the population. If it is less severe, for example, it could be more prevalent than what we are seeing.

Yeah, and the sample sizes are too small to draw any firm conclusions of course.
For reference, BA.5 in Denmark went 0.2%, 0.5%, 1.3%, 3.9%, 14.2%, 27.1% in six successive weeks in April-May 2022 and I wouldn't be surprised if this turns out to be similar.
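For a sense of what that BA.5 trajectory means as a growth rate, here is a small sketch that fits a straight line to the log-odds of the six weekly shares quoted above. It assumes simple logistic growth against a constant background and uses an unweighted least-squares fit; the names `logit` and `ls_slope` are illustrative.

```python
from math import log

# BA.5 share in Denmark over six successive weeks (April-May 2022), from above.
weekly_share = [0.002, 0.005, 0.013, 0.039, 0.142, 0.271]


def logit(p: float) -> float:
    """Log-odds of a proportion."""
    return log(p / (1 - p))


def ls_slope(ys: list[float]) -> float:
    """Unweighted least-squares slope of ys against week index 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


slope = ls_slope([logit(p) for p in weekly_share])  # log-odds units per week
doubling_days = 7 * log(2) / slope                  # doubling time of the odds

print(f"logistic growth rate: {slope:.2f} per week")
print(f"odds doubling time: {doubling_days:.1f} days")
```

The slope comes out at roughly one log-odds unit per week. Note that this is the doubling time of the odds (variant versus everything else), not of case numbers.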

@MCB6

MCB6 commented Aug 20, 2023

Some useful negative results may be the absence of BA.2.86 in the 71 new samples from Israel collected around August 10th from a different facility, 5 of those samples remained unassigned in GISAID but don't have the new strain's mutations in any case.
Also Austria reported lack of BA.2.86's mutations in 40 wastewater samples

@corneliusroemer
Contributor

corneliusroemer commented Aug 21, 2023

Number of BA.2.86 out of all samples collected in calendar weeks starting on Monday per continent and also globally

Date of analysis: 2023-08-21, 10:20am UTC, GISAID data queried via the GISAID web interface

| week starting | global | europe | north america | south america | asia | africa | oceania |
|---|---|---|---|---|---|---|---|
| 2023-07-17 | 0 of 8400 | 0 of 1462 | 0 of 3260 | 0 of 54 | 0 of 3312 | 0 of 15 | 0 of 297 |
| 2023-07-24 | 1 of 7242 | 1 of 1384 | 0 of 3286 | 0 of 25 | 0 of 2480 | 0 of 3 | 0 of 246 |
| 2023-07-31 | 3 of 3790 | 1 of 984 | 1 of 1748 | 0 of 6 | 1 of 874 | 0 of 8 | 0 of 170 |
| 2023-08-07 | 2 of 888 | 2 of 293 | 0 of 276 | 0 of 5 | 0 of 285 | 0 of 11 | 0 of 18 |
| 2023-08-14 | 0 of 17 | 0 of 8 | 0 of 0 | 0 of 0 | 0 of 9 | 0 of 0 | 0 of 0 |

Edit: there was a typo in 2023-07-17/Europe - I first had a 1 there due to line copying, but it is 0

Note: @theosanderson remarked that "It seems likely that the UK sequence was only deposited this early because of its genotype, so it's probably better to think of the 2023-08-07 numbers as 1 rather than 2". The UK sequence had an extremely low collection-submission delay of only ~5 days.

@theosanderson
Contributor

theosanderson commented Aug 21, 2023

(It seems likely that the UK sequence was only deposited this early because of its genotype, so it's probably better to think of the 2023-08-07 numbers as 1 rather than 2)

@HynnSpylor
Contributor

One more seq detected in US International Airport, with travel history in Japan (EPI_ISL_18121060)

@Over-There-Is
Contributor

One more seq detected in US International Airport, with travel history in Japan (EPI_ISL_18121060)

With extra C222T, C1960T, T12775C, G22200Trev(S:G213Vrev, artefact?) and without S:Ins16MPLF(artefact?)
It belongs to neither the Danish branch nor the Israeli branch.

Danish branch: A6183G(Orf1a:K1973R), C12815T (Denmark 3, UK 1)
Israeli branch: C2173T, A23570G(S:I670V), C28153T(Orf8:T87I), G29000A(N:G243S) (Israel 1, US 1)
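The branch assignment above can be sketched as a simple marker lookup. This is only an illustration: the marker sets are the ones listed in this comment, the function name `assign_branch` is made up, and matching on any single marker (to tolerate coverage gaps) is a design assumption rather than an established rule.

```python
# Branch-defining mutations as listed in this comment.
BRANCH_MARKERS = {
    "Danish": {"A6183G", "C12815T"},
    "Israeli": {"C2173T", "A23570G", "C28153T", "G29000A"},
}


def assign_branch(mutations: set[str]) -> str:
    """Return the branch whose markers appear in the sequence, if exactly one."""
    # Assumption: one shared marker is enough, since low-coverage
    # sequences may be missing some of the defining sites.
    hits = [name for name, markers in BRANCH_MARKERS.items() if markers & mutations]
    if not hits:
        return "unassigned"
    return hits[0] if len(hits) == 1 else "ambiguous"


# The airport sequence's private mutations, none of which are branch markers.
airport = {"C222T", "C1960T", "T12775C"}
print(assign_branch(airport))
```

A sequence carrying markers from both sets comes back as "ambiguous", which would be worth a manual look for contamination or recombination.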

@silcn

silcn commented Aug 22, 2023

The Michigan Department of Health, in their press release, claimed that the CDC had informed them 7 sequences had been detected worldwide. Possible the CDC were already aware of this Virginia ex-Japan sequence.

@ryhisner
Author

One more seq detected in US International Airport, with travel history in Japan (EPI_ISL_18121060)

With extra C222T, C1960T, T12775C, G22200Trev(S:G213Vrev, artefact?) and without S:Ins16MPLF(artefact?) It belongs to neither the Danish branch nor the Israeli branch.

Danish branch: A6183G(Orf1a:K1973R), C12815T (Denmark 3, UK 1) Israeli branch: C2173T, A23570G(S:I670V), C28153T(Orf8:T87I), G29000A(N:G243S) (Israel 1, US 1)

There are NNN's in the S:211-212 area, so S:G213E is undoubtedly there, and I'm sure the insert is there as well. This is a Gingko Bozoworks sequence, so the quality is not good. I have no idea why the CDC doesn't do the travel sequences. CDC sequences are always top-notch; Ginkgo sequences never are.

@silcn

silcn commented Aug 22, 2023

The two new South Africa sequences pointed out by @emily-smith1 in the open discussion thread (EPI_ISL_18125249 and EPI_ISL_18125259) both branch off directly from @corneliusroemer's common ancestor sequence - they don't share any extra mutations with each other or the other 7 sequences. One has 3 mutations from the common ancestor, the other 7.

The collection dates are 07/24 and 07/28. South Africa has uploaded 22 sequences with collection date 07/24 onwards, the most recent being 08/10.
The 07/24 sequence is from Gauteng. There are 7 sequences from Gauteng with collection date 07/24 onwards, with the most recent being 07/27.
The 07/28 sequence is from Mpumalanga, which borders Gauteng. It is the most recent sequence from Mpumalanga, with the previous one being 07/11.

Food for thought for country of origin speculations: Mpumalanga was one of the two provinces where BA.2-9866T+26681T+S:939F was found in Feb 2022.

I will happily admit I was probably wrong earlier about the Denmark and Israel/Michigan sequences being from two distinct transmissions from the source that didn't have time to pick up any mutations yet.

@NkRMnZr

NkRMnZr commented Aug 24, 2023

Just realized that S:F157S in BA.2.86 is also 2-nuc change, here's a list of them:

  • S:F157S: T22032C, C22033A
  • S:G339H: G22577C, G22578A (S:G339D/G22578A on B.1.1.529 basal)
  • S:V445H: G22895C, T22896A
  • S:L452W: C22916T, T22917G
  • S:E484K: G23012A plus C23013A (a reversion of A23013C, since it's S:E484A on B.1.1.529 basal)
  • S:F486P: T23018C, T23019C
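The list above can be sanity-checked with a few lines of code: map each nucleotide position to its spike codon, apply the two changes, and translate. This is a sketch under stated assumptions. The starting codons are inferred from the substitutions and amino acids named in the list (not read from a reference file), and for codon 484 the BA.2 codon GCA (S:E484A) is used as the starting point. Spike is taken to start at position 21563 in Wuhan-Hu-1 coordinates.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SPIKE_START = 21563  # first nucleotide of the S gene (Wuhan-Hu-1 coordinates)


def codon_site(pos: int) -> tuple[int, int]:
    """Return (spike codon number, 0-based index within the codon)."""
    offset = pos - SPIKE_START
    return offset // 3 + 1, offset % 3


def apply_changes(start_codon: str, codon_number: int, changes: list[str]) -> str:
    """Apply substitutions like 'T22032C' to a codon, checking site and base."""
    codon = list(start_codon)
    for change in changes:
        ref, pos, alt = change[0], int(change[1:-1]), change[-1]
        number, index = codon_site(pos)
        assert number == codon_number, f"{change} is not in codon {codon_number}"
        assert codon[index] == ref, f"{change} does not match {''.join(codon)}"
        codon[index] = alt
    return "".join(codon)


# (codon number, assumed starting codon, changes from the list above)
CASES = [
    (157, "TTC", ["T22032C", "C22033A"]),
    (339, "GGT", ["G22577C", "G22578A"]),
    (445, "GTT", ["G22895C", "T22896A"]),
    (452, "CTG", ["C22916T", "T22917G"]),
    (484, "GCA", ["G23012A", "C23013A"]),  # starting from BA.2's E484A codon
    (486, "TTT", ["T23018C", "T23019C"]),
]

results = {}
for number, start, changes in CASES:
    end = apply_changes(start, number, changes)
    results[number] = CODON_TABLE[start] + str(number) + CODON_TABLE[end]
    print(f"S:{results[number]}: {start} -> {end} via {', '.join(changes)}")
```

The built-in assertions fail loudly if a listed position does not fall in the named codon or if the reference base does not match, so the sketch doubles as a consistency check on the coordinates.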

@silcn

silcn commented Aug 24, 2023

49 new sequences from Denmark, no BA.2.86. Updated ratios:
1/52 with "collection date" 24/07
1/62 with "collection date" 31/07
1/43 with "collection date" 07/08

@unrulyturnip

update from denmark: low levels of BA.2.86 found in wastewater, one more confirmed case.
https://twitter.com/SSI_dk/status/1695027179711533235?s=20

@ryhisner
Author

ryhisner commented Aug 27, 2023

update from denmark: low levels of BA.2.86 found in wastewater, one more confirmed case. https://twitter.com/SSI_dk/status/1695027179711533235?s=20

EDIT: I almost forgot to thank @JosetteSchoenma for first calling my attention to the presence of T4579A in this new sequence!

There is one fascinating aspect to the most recent Denmark sequence: the synonymous mutations A4576T and T4579A.

T->A and A->T mutations are rare, as seen in the figure below from @jbloom. https://jbloomlab.github.io/SARS2-mut-spectrum/rates-by-clade.html
image

It's therefore surprising to see that T4579A has occurred in over 45,000 sequences and A4576T in over 10,000. Even more remarkable is how often the two have appeared together. A4576T is in over 23% of sequences with T4579A, while T4579A occurs in over 98% of sequences with A4576T.

image

I first noticed this peculiar mutational combination because it was in BA.5.2.23, a lineage that competed surprisingly well considering its relative lack of immune-evasion spike mutations, and when I looked more closely, it became clear that the co-occurrence of these mutations is TRS-related. (TRS = transcription regulatory sequence)

Below is a diagram I made using Nextclade that shows wild-type nucleotide sequence (top row), nuc sequence with A4576T + T4579A (middle row), and the TRS-L from the beginning of the SARS-CoV-2 genome (bottom row).

image

The similarity is clear. As far as I know, BA.5.2.23 is the only designated lineage to have A4576T + T4579A as defining mutations, but less than 20% of sequences with A4576T + T4579A have been BA.5.2.23, so this is a very homoplasic mutational combo. Below is the lineage distribution of the 10,000+ sequences with A4576T + T4579A along with a graph of its prevalence throughout the pandemic, all via Cov-Spectrum.
https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nextcladeQcSnpClustersScoreTo=22&variantQuery=A4576T+%26+T4579A&

image

Further support for the idea that this mutational pair is TRS-related comes from the dozens of sequences with much more extended homology for the TRS-L, involving the four additional mutations A4571T, A4572G, C4573T, A4574T. About half of these sequences (most from India or Thailand) appear likely to be artifacts, but the others appear genuine.
image

@cvejris

cvejris commented Aug 27, 2023

@ryhisner Good point! The silent T4579A appeared twice within XBB during diversification, which is conspicuous given the rareness of this transversion. Both clusters diverged further, meaning that the mutation might be linked to their success.
image

@FedeGueli
Contributor

FedeGueli commented Aug 27, 2023

Thanks @ryhisner. T4579A alone is defining for FL.2 and is present in all FE.1.1.

Noticed by @aviczhl2 here: sars-cov-2-variants/lineage-proposals#606 (comment)

@cvejris

cvejris commented Aug 27, 2023

@ryhisner Any idea how a new transcription start at the beginning of Orf1a might contribute to virus fitness? Enhancement of transcription of (most) NSPs? Or would this new transcript be produced in the opposite direction (antisense)?

@FedeGueli
Contributor

Not sure if this is relevant, but while doing an alignment tonight I noticed the BA.2.86 silent nucleotide mutation near the end of spike, C25207T (S:Y1215Y). Interestingly, there was a small XBB.1.34.1 cluster with it back in March 2023 (including one sample from Sudan intercepted by GBW in VA). Up to this point that is not so remarkable: there were 12K+ sequences with it associated with C25000T (32K without). But XBB.1.34 has S:P681R as a defining mutation, and XBB.1.34.1 also has S:E554K as defining. The cluster with the Sudanese sample lacks S:A570V, S:P621S and S:S939F, but it also has no additional silent or non-synonymous mutation between S:E554K and S:Y1215Y. It has a silent mutation at Orf3a:8 that is absent in BA.2.86, so IF any recombination happened, it would have been between S:F486P and Orf3a:7. But with three additional non-synonymous mutations present in BA.2.86 and absent in this cluster, there is no smoking gun here.
Data:
Sudan (GBW VA) sequence: EPI_ISL_17358238
XBB.1.34.1 cluster: https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_1a3a5_10eab0.json?label=id:node_6891921
Screenshot 2023-09-01 at 00 29 00
https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_1a3a5_10eab0.json?c=gt-nuc_25207&label=id:node_6891921
Sudanese XBB.1.34.1 Usher:
Screenshot 2023-09-01 at 00 30 32
Sudanese XBB.1.34.1 Nextclade:
Screenshot 2023-09-01 at 00 30 57
Alignment with BA.2.86 (UK seq):
Screenshot 2023-09-01 at 00 34 48

cc @silcn @corneliusroemer @thomasppeacock @ryhisner could you check please?

@shay671

shay671 commented Sep 1, 2023

From an epidemiological perspective, I think it would be beneficial to designate the European branch. Of course there is probably no expected advantage for it, but being able to trace it comparatively while other samples stem directly from BA.2.86 is very important, I think.
What do you say, folks?

@corneliusroemer
Contributor

corneliusroemer commented Dec 1, 2023

@ryhisner I think your TRS homology was misaligned. This is the corrected (shifted by 1) figure. I've also added codon boundaries so one can see that the mutations are both in the 3rd codon position and both synonymous. I've also marked the remaining mismatches with red boxes and labelled the rows:
image

If this was about TRS-L/B homology, the remaining mismatch is a GAT -> GAA which is nonsynonymous Asp -> Glu, and might hence be selected against. However in codon 1441, CTA -> CTT is synonymous. Both mismatches in codon 1442 are non-synonymous.

4588T has not been seen much:
image

I wonder what the mechanistic effect of this would be: does it act like a TRS-B, causing production of a truncated ORF1ab missing nsp1 and nsp2 and most of nsp3? Or as a secondary TRS-L, either guiding to the real TRS-L or acting as a drop in if that one can't be found? Any ideas @theosanderson @thomasppeacock?

4588T might be slightly selected against on its own, as are all the other homology-increasing nt mutations, per https://raw.githubusercontent.com/jbloomlab/SARS2-mut-fitness/main/results_public_2023-10-01/nt_fitness/nt_fitness.csv

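| nt_site | nt | fitness | expected_count | increases homology |
|---|---|---|---|---|
| 4585 | A | -0.77213 | 15.733 | yes |
| 4585 | C | 0.64053 | 73.545 | |
| 4585 | G | 0.35844 | 9.6321 | |
| 4585 | T | 0 | 98.91 | wt |
| 4588 | A | 0 | 113.76 | wt |
| 4588 | C | -0.88286 | 10.38 | |
| 4588 | G | -0.91652 | 80.769 | |
| 4588 | T | -1.4357 | 22.613 | yes |
| 4589 | A | 0 | 113.76 | wt |
| 4589 | C | 1.6473 | 10.38 | |
| 4589 | G | -0.88622 | 80.769 | |
| 4589 | T | -3.8335 | 22.613 | yes |
| 4591 | A | -0.64696 | 15.733 | yes |
| 4591 | C | 0.55125 | 73.545 | |
| 4591 | G | -0.81163 | 9.6321 | |
| 4591 | T | 0 | 98.91 | wt |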
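As a quick check of the "selected against" reading, here is a small sketch that pulls out the homology-increasing rows from the table above and confirms the sign of their fitness estimates. The rows are copied inline from this comment rather than fetched from the CSV, and the `flag` column name is made up for the sketch.

```python
import csv
import io

# Rows copied from the table above; "flag" is "yes" (increases homology),
# "wt" (wild-type base) or empty.
TABLE = """nt_site,nt,fitness,expected_count,flag
4585,A,-0.77213,15.733,yes
4585,C,0.64053,73.545,
4585,G,0.35844,9.6321,
4585,T,0,98.91,wt
4588,A,0,113.76,wt
4588,C,-0.88286,10.38,
4588,G,-0.91652,80.769,
4588,T,-1.4357,22.613,yes
4589,A,0,113.76,wt
4589,C,1.6473,10.38,
4589,G,-0.88622,80.769,
4589,T,-3.8335,22.613,yes
4591,A,-0.64696,15.733,yes
4591,C,0.55125,73.545,
4591,G,-0.81163,9.6321,
4591,T,0,98.91,wt
"""

rows = list(csv.DictReader(io.StringIO(TABLE)))

# Keep only the mutations that would extend the TRS-L homology.
homology = {
    f"{row['nt_site']}{row['nt']}": float(row["fitness"])
    for row in rows
    if row["flag"] == "yes"
}

for mutation, fitness in homology.items():
    print(f"{mutation}: fitness {fitness:+.3f}")

all_negative = all(fitness < 0 for fitness in homology.values())
print("all homology-increasing mutations have negative fitness:", all_negative)
```

All four homology-increasing changes have negative estimates here, with 4589T the most strongly negative, though several of the expected counts are small, so the individual estimates are presumably noisy.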
