Downloading schemes
Before MiST can perform allele calling, you need a scheme containing reference alleles and (optionally) sequence type (ST) profiles. MiST provides helper scripts to download schemes from common sources.
📌 Disclaimer: Make sure to cite the corresponding resources when using these schemes in your research.
The general command to download a scheme is:
mist download --downloader {SOURCE} --url {URL} --output scheme_out
Options:
--url TEXT URL to download from. [required]
-o, --output PATH Output directory [default: mist_download]
-p, --include-profiles Download the profiles
-d, --downloader [cgmlstorg|enterobase|bigsdb|bigsdb_auth]
Downloader [required]
--dir-tokens PATH Directory with access tokens [default:
/scratch/temp/bebog/.bigsdb_tokens]
--key-name TEXT Key name [default: PubMLST]
--site TEXT Site [default: PubMLST]
--debug Enable debug mode
--log PATH Save log to this file
--help Show this message and exit.
The currently supported values for the --site option are PubMLST and Pasteur.
The value passed to --key-name should correspond to the value in the access_tokens file in the token directory.
EnteroBase is a web platform for analyzing bacterial genomic data. MiST downloads schemes directly from the EnteroBase FTP site.
- Available schemes: EnteroBase schemes
mist download \
--downloader enterobase \
--url https://enterobase.warwick.ac.uk/schemes/Escherichia.Achtman7GeneMLST/ \
--output enterobase_ecoli_mlst_achtman \
--include-profiles
- EnteroBase: https://enterobase.warwick.ac.uk/cite
BIGSdb is software for storing and analyzing sequence data for bacterial isolates. It powers PubMLST.org and Institut Pasteur BIGSdb.
The URL of a scheme can be determined by searching for it on this page (sequence databases have the _seqdef suffix).
For example:
https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef
From this page, you can navigate to the available schemes by following the /schemes endpoint.
https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes
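The URL pattern in the two examples above can be sketched as a small shell helper. This is only an illustration: `bigsdb_scheme_url` is a hypothetical function, not part of MiST, and it assumes the scheme URL always has the form `{base}/db/{database}/schemes/{id}`, as in the PubMLST example.

```shell
# Hypothetical helper (not part of MiST): build a BIGSdb scheme URL.
# Assumes the REST layout shown above: {base}/db/{database}/schemes/{id}.
bigsdb_scheme_url() {
    database="$1"                          # e.g. pubmlst_abaumannii_seqdef
    scheme_id="$2"                         # id listed under the /schemes endpoint
    base="${3:-https://rest.pubmlst.org}"  # REST base URL; defaults to PubMLST
    printf '%s/db/%s/schemes/%s\n' "${base%/}" "$database" "$scheme_id"
}

# Prints the scheme URL for scheme 1 of the A. baumannii sequence database.
bigsdb_scheme_url pubmlst_abaumannii_seqdef 1
```

The printed value is what you would pass to `--url`.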
mist download \
--downloader bigsdb \
--url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
--output pubmlst_abaumannii_mlst \
--include-profiles
- PubMLST.org: https://pubmed.ncbi.nlm.nih.gov/30345391/.
- BIGSdb Institut Pasteur: See References section for each species on BIGSdb IP Paris.
BIGSdb instances require authentication to access the most recent data. For authenticated downloads, MiST relies on BIGSdb_downloader, which is bundled with MiST.
- Follow the credential setup guide.
- Register for database access via My Account > Database registrations.
- Ensure the following files exist in your .bigsdb_tokens directory:
  - access_tokens
  - client_credentials
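Before running the authenticated downloader, it may help to confirm that both files are in place. The sketch below is a hypothetical pre-flight check, not part of MiST; it only tests that the two file names listed above exist in the directory you pass in and does not validate their contents.

```shell
# Hypothetical pre-flight check (not part of MiST): verify that the token
# directory contains the two files listed above.
check_bigsdb_tokens() {
    token_dir="$1"
    missing=0
    for name in access_tokens client_credentials; do
        if [ ! -f "$token_dir/$name" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $token_dir/$name" >&2
            missing=1
        fi
    done
    # Exit status 0 if both files exist, 1 otherwise.
    return "$missing"
}
```

For example, `check_bigsdb_tokens /path/to/.bigsdb_tokens` returns a non-zero status and names the missing file if either one is absent.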
mist download \
--downloader bigsdb_auth \
--url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
--dir-tokens /path/to/.bigsdb_tokens \
--key-name PubMLST \
--site PubMLST
- subprocess.TimeoutExpired: You likely do not have access. Register with the database first.
- Compare allele counts with and without authentication to confirm access.
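One way to compare allele counts is to download the same scheme twice, once with `bigsdb` and once with `bigsdb_auth`, into separate output directories, and then count the FASTA records in each. The sketch below is a hypothetical helper, not part of MiST. It assumes the downloaded alleles are stored as uncompressed FASTA files with a `.fasta` extension somewhere under the output directory; adjust the file pattern if your output is laid out differently.

```shell
# Hypothetical helper (not part of MiST): count FASTA records under a directory.
# Assumption: alleles are stored as uncompressed *.fasta files.
count_alleles() {
    # Each FASTA record starts with a '>' header line; grep -c counts them.
    # '|| true' keeps the exit status at 0 when the count is zero.
    find "$1" -type f -name '*.fasta' -exec cat {} + | grep -c '^>' || true
}

# Print the counts for an unauthenticated and an authenticated download.
compare_allele_counts() {
    anon="$(count_alleles "$1")"
    auth="$(count_alleles "$2")"
    echo "without authentication: $anon"
    echo "with authentication: $auth"
    echo "difference: $((auth - anon))"
}
```

Calling `compare_allele_counts dir_without_auth dir_with_auth` prints both counts. A higher count for the authenticated download suggests that access to the more recent data is working; equal counts do not by themselves prove that access is missing.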
- PubMLST.org: https://pubmed.ncbi.nlm.nih.gov/30345391/.
- BIGSdb Institut Pasteur: See References section for each species on BIGSdb IP Paris.
cgMLST.org provides curated cgMLST schemes.
The available schemes and corresponding URLs can be found here.
mist download \
--downloader cgmlstorg \
--url https://www.cgmlst.org/ncs/schema/Banthracis5268/ \
--output cgmlstorg_banthracis \
--include-profiles
Each scheme lists its citation on cgMLST.org under the “citation” box.
Next step: Indexing schemes.