Potential sequences that should be included in B.1.617 #49

Closed
aineniamh opened this issue Apr 14, 2021 · 16 comments

Labels
correction Highlight an error in the description or definition

@aineniamh
Member

Potential need for inclusion in designation of B.1.617

Flagging this for follow up

From Bijaya Dhakal via Gunter Bach:

Viruses with a similar mutation profile (E484Q/L452R) were classified as B.1.617. However, the sequences below, which carry the same double mutation, are still classified as B.1.596.

EPI_ISL_1415164
EPI_ISL_1415165
EPI_ISL_1415172
EPI_ISL_1415181
EPI_ISL_1415203
EPI_ISL_1415233
EPI_ISL_1415276
EPI_ISL_1415277
EPI_ISL_1415278
EPI_ISL_1415286
EPI_ISL_1415317
EPI_ISL_1415318
EPI_ISL_1415319
EPI_ISL_1415356
EPI_ISL_1415357
EPI_ISL_1415358
EPI_ISL_1415386
EPI_ISL_1415387
EPI_ISL_1454202
@aineniamh aineniamh added the correction Highlight an error in the description or definition label Apr 14, 2021
@aineniamh
Member Author

Update to B.1.617 designation

Screenshot 2021-04-21 at 14 04 08

Figure 1: ML phylogeny built using iqtree (command: iqtree -s sequences.aln.fasta -nt 4 -blmin 0.0000000001 -m GTR+G -bb 1000). The alignment includes all B.1.617 sequences on GISAID as of 2021-04-20, as well as the sequences listed above in this issue (currently assigned B.1.596). An outgroup lineage A sequence (Wuhan/WH04/2020) and the basal B.1 sequence from Italy (EPI_ISL_420563) are also included in the tree for context.

Current lineage assignments (pangoLEARN 2021-04-14) can be found here: lineage_report.csv

Proposed changes to designations based on this tree:

B.1.txt
B.1.617.txt
And the above IDs are to be designated B.1.617 as well.

Total number of changes:

B.1.617 605 seqs
B.1 34 seqs

Treefile found here: sequences.aln.fasta.tree.zip

Full csv of lineage designations including update: lineages.csv

@aineniamh aineniamh added this to the B.1.617 milestone Apr 21, 2021
@aineniamh
Member Author

B.1.617.sublineages.csv
Revised to include three sublineages of B.1.617, each corresponding to one of the clusters shown in the tree above.
snp_plot
SNP plot from snipit showing the series of mutations in a selected sequence from each cluster.
Final designation count:

B.1       40
B.1.617   1 
B.1.617.1 468 (purple)
B.1.617.2 112 (green)
B.1.617.3 23 (yellow)


@oroak

oroak commented Apr 22, 2021

Hey @aineniamh, a public health officer contacted us last night about hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 being designated B.1.617 in the latest pango updates. This sample is in the B.1.txt file above, listed to be changed to B.1.617. However, it clearly looks to me to be B.1.575, which was the original designation. Its spike mutations are S:S494P, S:D614G, S:P681H, S:T716I. This change is also affecting new samples, also B.1.575, that we are ready to release. The NYPHL also noted a similar non-E484Q/L452R sample in the SPHERES Slack.

@aineniamh
Member Author

Hi @oroak, I think you've misunderstood the files above. That sequence is being designated B.1, not B.1.617. Previously, only ~60 sequences were designated B.1.617; however, on GISAID over 600 sequences were assigned to that lineage by pangolin.

I've taken the sequences that were being assigned B.1.617 by pangolin and given them designations. The sequences in the B.1.txt file are getting designated B.1. hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 would have only been assigned a lineage based on pangolin's 'best guess', and looking into these assignments made it clear it was not B.1.617.

@aineniamh
Member Author

If you have a list of these sequences you believe should be B.1.575 I'm happy to investigate and add them to the official designations!

@oroak

oroak commented Apr 23, 2021

@aineniamh Yes, it looks like I misinterpreted "and IDs above" to include the files as well. We haven't yet released the other B.1.575 samples that are being misassigned (by pangoLEARN 2021-04-14). This information is now part of mandatory public health reporting for Oregon, and we want to make sure things are "correct" before they are reported. I do believe hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 should be designated B.1.575 (and not B.1), based on the prior pangoLEARN from early April and the mutation information here: https://outbreak.info/situation-reports?pango=B.1.575. This is a screenshot from a build I ran last week: Screen Shot 2021-04-23 at 8 04 09 AM

@oroak

oroak commented Apr 26, 2021

@aineniamh I do believe hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 should be designated B.1.575. I quickly spot-checked the first few US samples listed in the B.1.txt file, and based on their mutations they appear to be B.1.575. Could the B.1 annotation have been broadly mistaken for this group?
hCoV-19/USA/NJ-CDC-LC0035972/2021|EPI_ISL_1609515|2021-03-27
hCoV-19/USA/NJ-CDC-LC0036132/2021|EPI_ISL_1609321|2021-03-25
hCoV-19/USA/NY-NYULH1032/2021|EPI_ISL_1423413|2021-03-10 (missing S:S494P)
hCoV-19/USA/NY-PRL-03_08_00K18/2021|EPI_ISL_1258673|2021-03-07
hCoV-19/USA/NY-PRL-2021_03_15_01J21/2021|EPI_ISL_1306945|2021-03-13

@aineniamh aineniamh self-assigned this Apr 27, 2021
@aineniamh
Member Author

@oroak I've amended the designations for B.1.575 now in issue #64. Thanks for your patience. The new pangoLEARN model will be trained today, so within the next day or so these changes will be included within pangolin assignments.

@nehajha21

Hey @aineniamh, I am from India and work on genome sequencing. For some of our samples, pangolin assigns B.1.617.1 or B.1.617 even though they appear to be B.1.617.2: they do not have the E484Q mutation and do have T478K, which is not found in the B.1.617.1 lineage.

@aineniamh
Member Author

aineniamh commented May 17, 2021

Hi @nehajha21, thanks for messaging and bringing this to our attention! Are your sequences on GISAID? If you could provide us with a list I'll make sure to get this fixed!

@aineniamh aineniamh reopened this May 17, 2021
@nehajha21

nehajha21 commented May 18, 2021

@aineniamh The samples are not on GISAID, and I can't share the sequences or data with you, but I have taken screenshots that I can share.
Screenshot 2021-05-18 at 12 43 37 PM

Screenshot 2021-05-18 at 12 45 02 PM

Screenshot 2021-05-18 at 12 42 43 PM

@nehajha21

@aineniamh The amino acid mutation E484Q corresponds to the nucleotide mutation G23012C, and T478K to C22995A.
So, in the screenshots above, sample 1 shows no change in the nucleotide base at position 23012, while at position 22995 the base changes from C to A.
Since B.1.617.2 does not have the E484Q mutation and does have T478K, to my knowledge the sample above should be assigned B.1.617.2.
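The position check described above (reading the base at a reference position and comparing it with the expected change) can be sketched in Python. This is a minimal sketch, not part of pangolin or scorpio. It assumes the sample has already been aligned to the Wuhan-Hu-1 reference, so that string index i holds reference position i + 1 (insertions removed, deletions written as '-'); the names MARKERS and call_markers are hypothetical, and only the two substitutions quoted in this comment are encoded.

```python
from __future__ import annotations

# Hypothetical marker table built from the substitutions quoted above.
# Each entry: (reference base, 1-based reference position, alternate base).
MARKERS = {
    "S:E484Q": ("G", 23012, "C"),
    "S:T478K": ("C", 22995, "A"),
}


def call_markers(aligned_seq: str) -> dict[str, str]:
    """Classify the base at each marker position of a reference-aligned sequence.

    Assumes `aligned_seq` uses reference coordinates: string index i holds
    reference position i + 1, insertions are removed and deletions are '-'.
    """
    calls = {}
    for name, (ref, pos, alt) in MARKERS.items():
        if pos > len(aligned_seq):
            raise ValueError(f"sequence too short to cover position {pos}")
        base = aligned_seq[pos - 1].upper()  # 1-based position -> 0-based index
        if base == alt:
            calls[name] = "present"
        elif base == ref:
            calls[name] = "absent"
        elif base in ("N", "-"):
            calls[name] = "no coverage"
        else:
            calls[name] = f"other base ({base})"
    return calls


# Toy example: an all-N sequence of reference length (29,903 nt) with
# G at 23012 (no E484Q change) and A at 22995 (T478K change).
toy = ["N"] * 29903
toy[23012 - 1] = "G"
toy[22995 - 1] = "A"
print(call_markers("".join(toy)))  # {'S:E484Q': 'absent', 'S:T478K': 'present'}
```

For the toy sequence, call_markers reports E484Q as absent and T478K as present, the same pattern described for sample 1. Reporting "no coverage" separately matters because an N at a marker site is not evidence that the mutation is absent.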

@aineniamh
Member Author

aineniamh commented May 18, 2021

Hi @nehajha21, sorry, I meant to ask whether the sequences were on GISAID; I've corrected the message above. As you can see from other issues on the repo, providing a list of GISAID names allows us to designate the sequences, and those designations feed into the model trained from a GISAID download.

An example of the information to supply is here: #1
Crucially, we need GISAID sequence names or GISAID IDs to add them to the designations.

@nehajha21

@aineniamh These samples have not been uploaded to GISAID, so I can't share them here. I work in a sequencing lab in India; we receive samples from all over India on a daily basis and have to report results back to the senders. For the analysis, I use pangolin to get the lineage of every sample, so it is really confusing for us to call and report them as B.1.617.1 even though they appear to be B.1.617.2.

@aineniamh
Member Author

pangolin can only assign lineages based on the known diversity that we have access to when training the model. It's very difficult for us to fix this on our end if the sequences haven't been shared on GISAID. If they were uploaded, you wouldn't necessarily need to supply any metadata with them, if that's an issue, but GISAID is where we can access the sequences and feed them into our assignment model.

Alternatively, there is a tool, scorpio, developed by @rmcolq and @benjamincjackson, that assigns explicitly based on SNPs rather than using pangolin's machine learning approach. The tool is very new and can be found here: https://github.com/cov-lineages/scorpio.
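To illustrate the difference between assigning from explicit SNP rules and assigning from a trained model, here is a toy rule-based classifier in Python. It is a hypothetical sketch and does not reproduce scorpio's constellation files, thresholds, or command-line interface. The rules use only spike mutations discussed in this thread (L452R and E484Q as the B.1.617 double mutation, and T478K as reported for B.1.617.2 but not B.1.617.1), plus the fact that B.1.617.2 also carries L452R; the names RULES and classify are made up for illustration.

```python
from __future__ import annotations

# Hypothetical, deliberately simplified rules; NOT scorpio's constellations.
# Each rule: (label, mutations that must be present, mutations that must be absent)
RULES = [
    ("B.1.617.2-like", {"S:L452R", "S:T478K"}, {"S:E484Q"}),
    ("B.1.617.1-like", {"S:L452R", "S:E484Q"}, {"S:T478K"}),
]


def classify(mutations: set[str]) -> str:
    """Return the label of the first rule the mutation set satisfies."""
    for label, required, forbidden in RULES:
        # every required mutation must be observed,
        # and none of the forbidden mutations may be observed
        if required <= mutations and not (forbidden & mutations):
            return label
    return "unassigned"


print(classify({"S:L452R", "S:T478K", "S:D614G"}))  # B.1.617.2-like
print(classify({"S:L452R", "S:E484Q", "S:D614G"}))  # B.1.617.1-like
print(classify({"S:D614G"}))                        # unassigned
```

Because these rules inspect only three sites, they cannot separate sublineages that share those sites, and a sample with missing coverage at a required site simply falls through to "unassigned". A practical SNP-based tool would need many more defining mutations and some tolerance for missing data.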

As an aside though, if you know your sequences should be B.1.617.2 there is no reason for you to not call your sequences B.1.617.2, even if pangolin assigns otherwise.

@nehajha21

Hey @aineniamh, I will send you the FASTA sequence of the above-mentioned sample. Could you please check why pangolin assigns B.1.617.1 to it even though it appears to be B.1.617.2?
