skani updates #44

Merged
merged 8 commits into from
Mar 28, 2024
@dpark01 (Member) commented Mar 28, 2024

This PR introduces two changes.

First, this changes all skani invocations to use -m 15 -c 10 as the new default parameters for the marker k-mer compression factor (-m) and the k-mer subsampling rate (-c). This is down from our previous defaults of -m 30 -c 20, which were already lower than the skani-recommended settings for viruses (-m 200 -c 30), which are in turn lower than skani's bacteria-oriented defaults (-m 1000 -c 125). The new defaults were determined through empirical testing on diverse viral taxa: -m 15 -c 10 succeeds in clustering (finding non-zero ANI between) all of rhinovirus and enterovirus together, as well as all of Lassa virus, whereas our old defaults only partially succeeded on these taxa.
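For illustration, a minimal sketch of how a wrapper might assemble a `skani dist` command line with the new defaults. The function name `build_skani_dist_cmd` and the file names are hypothetical and are not taken from this PR's code; the sketch only builds the argument list and does not run skani.

```python
import shlex


def build_skani_dist_cmd(query_fasta, ref_fastas, out_tsv, m=15, c=10):
    """Assemble an argv list for `skani dist` (hypothetical helper).

    m -> value passed to -m (marker k-mer compression factor)
    c -> value passed to -c (k-mer subsampling rate)
    """
    return [
        "skani", "dist",
        "-q", query_fasta,
        "-r", *ref_fastas,
        "-o", out_tsv,
        "-m", str(m),
        "-c", str(c),
    ]


# Placeholder file names, for illustration only.
cmd = build_skani_dist_cmd("sample.fasta", ["refA.fasta", "refB.fasta"], "out.tsv")
print(shlex.join(cmd))
```

Keeping -m and -c as keyword arguments with defaults makes it easy to override them per call when testing other taxa.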

Second, this PR reformats skani TSV output so that rows are written in descending order of ANI * Total_bases_covered, instead of the default order of descending ANI. Previously, our reference selection code could favor reference genomes with higher ANI over only short stretches of sequence (which happened with some rhino/enteros); it now favors longer matches of high identity. This metric (the product of ANI and match length) is inspired by ReferenceSeeker (publication, GitHub). Unit tests were added to confirm the sort order.
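The re-sorting described above can be sketched roughly as follows. This is not the code from this PR: the helper name `sort_skani_rows` and the example rows are made up, and it assumes only that the TSV has a header row with `ANI` and `Total_bases_covered` columns.

```python
import csv
import io


def sort_skani_rows(tsv_text):
    """Return (header, rows) with rows sorted by ANI * Total_bases_covered, descending."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    rows.sort(
        key=lambda r: float(r["ANI"]) * float(r["Total_bases_covered"]),
        reverse=True,
    )
    return reader.fieldnames, rows


# Made-up example: a short, very high-ANI hit versus a long, slightly lower-ANI hit.
example = (
    "Ref_file\tANI\tTotal_bases_covered\n"
    "short_hit.fasta\t99.0\t500\n"
    "long_hit.fasta\t95.0\t7000\n"
)
header, rows = sort_skani_rows(example)
for row in rows:
    print(row["Ref_file"], float(row["ANI"]) * float(row["Total_bases_covered"]))
```

Sorting by ANI alone would put `short_hit.fasta` first; with the product as the key, the longer match outranks it.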

@dpark01 marked this pull request as ready for review March 28, 2024 21:27
@dpark01 merged commit 6178d2e into master Mar 28, 2024
7 checks passed