designation from hash being over-ridden by scorpio (resulting in "None" assignment) #305
Comments
I have some additional related information:
With this same software combination, more than 30,000 genomes are also de-classified from B.1.526, along with almost 10,000 B.1.1.7s.
I have some more details about the behavior of these 'None' calls. I simulated incomplete sequence data by replacing nucleotides with Ns (in 100 bp chunks). Using a subset of these designated sequences, I got the correct lineage assignment about 10% of the time once 10-20% of the genome had been replaced. This appears to be because the masking reduces the 'ref allele' count for scorpio.
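For anyone who wants to try a similar masking experiment, here is a minimal sketch of one way to do it. It is not the script used above: the function name `mask_chunks`, the choice of non-overlapping aligned chunks, and the fixed random seed are all assumptions made for illustration.

```python
import random


def mask_chunks(seq, fraction, chunk=100, seed=0):
    """Return `seq` with about `fraction` of its bases replaced by N.

    Masking is done in whole, non-overlapping `chunk`-bp blocks
    (an assumption; the original experiment only states "100bp chunks").
    Any trailing partial block at the end of the sequence is left untouched.
    """
    rng = random.Random(seed)                 # fixed seed so runs are repeatable
    n_chunks = len(seq) // chunk              # number of whole blocks available
    n_mask = round(n_chunks * fraction)       # how many blocks to mask
    chosen = rng.sample(range(n_chunks), n_mask)

    bases = list(seq)
    for i in chosen:
        bases[i * chunk:(i + 1) * chunk] = "N" * chunk
    return "".join(bases)


if __name__ == "__main__":
    toy = "ACGT" * 250                        # 1000 bp toy sequence, not real data
    masked = mask_chunks(toy, 0.2)
    print(masked.count("N"), "of", len(masked), "bases masked")
```

The masked sequences would then be written back to FASTA and run through pangolin as usual.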
Thanks for this - I'll do some investigating! It is now expected behaviour that the pangolin lineage assignment is None where scorpio does not support the lineage assignment; we were finding too many false positives, particularly with the highly prevalent Delta assignments. However, it was not expected that some lineages would be de-assigned at this level.
Since cB.1.623 is not a VOC/VUI, for this case we have decided to remove the constellation definition. This should be resolved by constellations release v0.0.15. For the remaining lineages, I've seen ~200 B.1.1.7 designated sequences which don't pass the threshold. We plan to remove them from the designation list, as we are happy that they have intermediate mutation types.
I am running pangolin with the following versions:
version: PLEARN-v1.2.56
pangolin_version: 3.1.11
pango_version: v1.2.56
scorpio_version: 0.3.12
constellations: v0.0.13
I ran pangolin on sequences for the taxa listed in the lineage file (pango-designation-1.2.56/lineages.csv), using sequences downloaded from GISAID. Among these sequences, I found 986 that had 'None' in the lineage call and 'PANGO-v1.2.56' in the version field; 983 of these had the phrase 'not supported by scorpio' in their 'note' column. Most of these (675) were designated as lineage B.1.623, but several other lineages were affected too.
Is this the intended behavior, or should designation be prioritized over scorpio?
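In case it helps with reproducing the counts, the filtering described above can be sketched roughly as follows. This is an illustrative sketch, not the exact script used: it assumes the pangolin output is a CSV with columns named `lineage`, `version`, and `note`, and the function names and the file name `lineage_report.csv` are hypothetical.

```python
import csv


def load_rows(path):
    """Read a pangolin output CSV into a list of dicts (one per sequence)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def overridden_designations(rows, version="PANGO-v1.2.56"):
    """Rows whose lineage is 'None' even though the version field shows
    that the call came from the designation hash."""
    return [
        row for row in rows
        if row["lineage"] == "None" and row["version"] == version
    ]


def count_scorpio_notes(rows, phrase="not supported by scorpio"):
    """Number of rows whose 'note' column contains the scorpio phrase."""
    return sum(phrase in row.get("note", "") for row in rows)


if __name__ == "__main__":
    rows = load_rows("lineage_report.csv")    # hypothetical file name
    hits = overridden_designations(rows)
    print(len(hits), "None calls;", count_scorpio_notes(hits), "mention scorpio")
```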
I can't attach a file, so here are 25 as an example:
England/CAMC-105EB5A/2021
Cambodia/126518/2020
Cambodia/126519/2020
Cambodia/126516/2020
Cambodia/126521/2020
Wales/PHWC-4C721A/2021
England/QEUH-C31DA9/2020
Wales/PHWC-4B4EA3/2021
Latvia/232/2021
England/ALDP-118537F/2021
England/QEUH-CCCB30/2020
England/QEUH-CD0F1F/2020
Spain/CT-HUVH-Bellvitge55844/2021
Turkey/HSGM-1398/2021
Turkey/HSGM-1401/2021
Turkey/HSGM-1327/2021
Turkey/HSGM-1346/2021
Turkey/HSGM-1239/2021
Spain/CT-HUVH-86062/2021
Spain/CT-HUVH-23463/2021
USA/UT-UPHL-2102916408/2021
USA/TX-CDC-9MNN-8884/2021
France/HDF-IPP01172/2021
India/GJ-GBRC-452/2020
Ireland/D-NVRL-20IRL12095/2020