designation from hash being over-ridden by scorpio (resulting in "None" assignment) #305

Closed
aretchless opened this issue Aug 25, 2021 · 5 comments

Comments

@aretchless

aretchless commented Aug 25, 2021

I am running pangolin with the following versions:
version: PLEARN-v1.2.56
pangolin_version: 3.1.11
pango_version: v1.2.56
scorpio_version: 0.3.12
constellations: v0.0.13

I ran pangolin on sequences for the taxa listed in the lineage file (pango-designation-1.2.56/lineages.csv), using sequences downloaded from GISAID. Among these sequences, I found 986 that had 'None' in the lineage call and 'PANGO-v1.2.56' in the version field; 983 of these had the phrase 'not supported by scorpio' in their 'note' column. Most of these (675) were designated as lineage B.1.623, but several other lineages were affected too.
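For anyone wanting to reproduce this kind of tally on their own report, here is a minimal sketch. It assumes the pangolin output is a CSV with `lineage` and `note` columns (the column names mentioned above); the function name and the three example rows are made up for illustration and are not real pangolin output.

```python
import csv
import io

def summarize_none_calls(report_text, phrase="not supported by scorpio"):
    """Return (rows with lineage 'None', how many of those mention `phrase` in note)."""
    none_rows = 0
    phrase_rows = 0
    for row in csv.DictReader(io.StringIO(report_text)):
        if row["lineage"] != "None":
            continue
        none_rows += 1
        # The note field may be empty or missing, so fall back to "".
        if phrase in (row.get("note") or ""):
            phrase_rows += 1
    return none_rows, phrase_rows

# Hypothetical report text: taxon names and note wording are placeholders.
example = """taxon,lineage,version,note
seqA,None,PANGO-v1.2.56,placeholder note: not supported by scorpio
seqB,B.1.1.7,PLEARN-v1.2.56,
seqC,None,PLEARN-v1.2.56,
"""

print(summarize_none_calls(example))  # (2, 1)
```

With a real report, the file contents would be read from disk and passed in as `report_text`.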

Is this the intended behavior, or should designation be prioritized over scorpio?

I can't attach a file, so here are 25 as an example:
England/CAMC-105EB5A/2021
Cambodia/126518/2020
Cambodia/126519/2020
Cambodia/126516/2020
Cambodia/126521/2020
Wales/PHWC-4C721A/2021
England/QEUH-C31DA9/2020
Wales/PHWC-4B4EA3/2021
Latvia/232/2021
England/ALDP-118537F/2021
England/QEUH-CCCB30/2020
England/QEUH-CD0F1F/2020
Spain/CT-HUVH-Bellvitge55844/2021
Turkey/HSGM-1398/2021
Turkey/HSGM-1401/2021
Turkey/HSGM-1327/2021
Turkey/HSGM-1346/2021
Turkey/HSGM-1239/2021
Spain/CT-HUVH-86062/2021
Spain/CT-HUVH-23463/2021
USA/UT-UPHL-2102916408/2021
USA/TX-CDC-9MNN-8884/2021
France/HDF-IPP01172/2021
India/GJ-GBRC-452/2020
Ireland/D-NVRL-20IRL12095/2020

@aretchless
Author

I have some additional related information:

  1. For B.1.623, I count 739 designated sequences, so scorpio is nullifying the assignment for the vast majority of them (675).
  2. When run with the '--usher' option, the designation hash is used (it is not nullified by scorpio).
  3. Tangentially, the option '--skip-designation-hash' seems to have no effect when run with '--usher'.
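The per-lineage count in point 1 can be sketched as follows. This is an illustration only: it assumes two plain dictionaries, one mapping each taxon to its designated lineage (as in lineages.csv) and one mapping each taxon to the lineage pangolin reported. The function name and the example taxa are hypothetical.

```python
from collections import Counter

def nullified_by_lineage(designated, called):
    """For each designated lineage, return (nullified, total, fraction).

    designated: dict of taxon -> designated lineage
    called:     dict of taxon -> lineage reported by pangolin ("None" if nullified)
    """
    totals = Counter(designated.values())
    nullified = Counter(
        lineage
        for taxon, lineage in designated.items()
        if called.get(taxon) == "None"
    )
    return {
        lineage: (nullified[lineage], total, nullified[lineage] / total)
        for lineage, total in totals.items()
    }

# Made-up taxa, for illustration only.
designated = {"seqA": "B.1.623", "seqB": "B.1.623", "seqC": "B.1.1.7"}
called = {"seqA": "None", "seqB": "B.1.623", "seqC": "B.1.1.7"}
print(nullified_by_lineage(designated, called))

# The B.1.623 figures quoted above: 675 of 739 designated sequences.
print(f"{675 / 739:.1%}")  # 91.3%
```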

@donutbrew

There are also more than 30,000 genomes de-classified from B.1.526 and almost 10,000 from B.1.1.7 using this same software combination.

@aretchless
Author

I have some more details about the behavior of these 'None' calls. I simulated incomplete sequence data by replacing nucleotides with Ns (in 100bp chunks). Using a subset of these designated sequences, I actually got the correct lineage assignment about 10% of the time when I had replaced 10-20% of the genome. This appears to be due to reducing the 'ref allele' count for scorpio.
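A rough sketch of this kind of masking is below. It is not the exact script used: it assumes the 100 bp blocks are chosen at random on a fixed grid, and the function name, parameters, and toy sequence are all hypothetical.

```python
import random

def mask_chunks(seq, fraction, chunk=100, seed=0):
    """Replace roughly `fraction` of `seq` with N, in whole `chunk`-bp blocks."""
    n_chunks = len(seq) // chunk
    n_mask = round(n_chunks * fraction)
    rng = random.Random(seed)  # seeded so the masking is reproducible
    # Pick distinct block indices so masked blocks never overlap.
    chosen = rng.sample(range(n_chunks), n_mask)
    masked = list(seq)
    for i in chosen:
        masked[i * chunk:(i + 1) * chunk] = "N" * chunk
    return "".join(masked)

# Toy 1000 bp sequence; a real run would use a genome read from FASTA.
toy = "ACGT" * 250
masked = mask_chunks(toy, fraction=0.2)
print(masked.count("N"))  # 200
```

Running pangolin on the masked and unmasked versions of the same designated sequences would then show how often the call changes from 'None' to the designated lineage.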

@rmcolq
Contributor

rmcolq commented Aug 25, 2021

Thanks for this - I'll do some investigating! It is now expected behaviour that the pangolin lineage assignment is None where scorpio does not support the lineage assignment; we were finding too many false positives, particularly with the highly prevalent Delta assignments. However, it was not expected that some lineages would be de-assigned at this level.

@rmcolq
Contributor

rmcolq commented Aug 26, 2021

Since cB.1.623 is not a VOC/VUI, for this case we have decided to remove the constellation definition. Should be resolved by constellations release v0.0.15.

For the remaining lineages, I've seen ~200 B.1.1.7 designated sequences which don't pass the threshold. We plan to remove them from the designation list as we are happy that they have intermediate mutation types.

rmcolq closed this as completed Aug 30, 2021