Downloading schemes
Before MiST can perform allele calling, you need a scheme containing reference alleles and (optionally) sequence type (ST) profiles. MiST provides helper scripts to download schemes from common sources.
📌 Disclaimer: Make sure to cite the corresponding resources when using these schemes in your research.
The general command to download a scheme is:
mist_download --downloader {SOURCE} --url {URL} --output scheme_out
options:
-h, --help show this help message and exit
-u URL, --url URL URL of the scheme
-o OUTPUT, --output OUTPUT
Output directory
-p, --include-profiles
Download the profiles
-d {cgmlstorg,enterobase,bigsdb,bisdb_auth}, --downloader {cgmlstorg,enterobase,bigsdb,bisdb_auth}
Downloader
-k DIR_TOKENS, --dir-tokens DIR_TOKENS
Directory with access tokens
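As a rough illustration of how these options combine, the sketch below defines a hypothetical shell helper, download_schemes (not part of MiST), that reads "downloader URL output" triples and prints the matching mist_download command for each. It uses only flags listed in the help text above. The DRY_RUN variable is an assumption made for this example; the helper only calls mist_download when DRY_RUN is set to something other than 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of MiST): read "downloader url output"
# triples from stdin and print (or run) one mist_download call per line.
download_schemes() {
  while read -r downloader url output; do
    # Skip blank lines in the input.
    [ -z "$downloader" ] && continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: only show the command that would be executed.
      echo "mist_download --downloader $downloader --url $url --output $output --include-profiles"
    else
      mist_download --downloader "$downloader" --url "$url" \
        --output "$output" --include-profiles
    fi
  done
}

# Dry run with two of the schemes used as examples in this document.
download_schemes <<'EOF'
enterobase https://enterobase.warwick.ac.uk/schemes/Escherichia.Achtman7GeneMLST/ enterobase_ecoli_mlst_achtman
cgmlstorg https://www.cgmlst.org/ncs/schema/Banthracis5268/ cgmlstorg_banthracis
EOF
```

Printing the commands first makes it easy to check the URLs and output directories before anything is downloaded.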
EnteroBase is a web platform for analyzing bacterial genomic data. MiST downloads schemes directly from the EnteroBase FTP site.
- Available schemes: EnteroBase schemes
mist_download \
--downloader enterobase \
--url https://enterobase.warwick.ac.uk/schemes/Escherichia.Achtman7GeneMLST/ \
--output enterobase_ecoli_mlst_achtman \
--include-profiles
Citation: https://enterobase.warwick.ac.uk/cite
BIGSdb is software for storing and analyzing sequence data for bacterial isolates. It powers PubMLST.org and Institut Pasteur BIGSdb.
The URL of a scheme can be determined by searching for it on this page (sequence databases have the _seqdef suffix).
For example:
https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef
From this page, you can navigate to the available schemes by following the /schemes endpoint.
https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes
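The URL pattern above can be sketched as a small shell function. The helper name bigsdb_scheme_url is hypothetical, and the base URL is hard-coded to the PubMLST REST address used in this document; other BIGSdb instances would need a different base.

```shell
#!/bin/sh
# Hypothetical helper: compose a scheme URL from a sequence database name
# and a scheme number, following the pattern shown above.
bigsdb_scheme_url() {
  base="https://rest.pubmlst.org/db"   # assumption: PubMLST instance
  printf '%s/%s/schemes/%s\n' "$base" "$1" "$2"
}

bigsdb_scheme_url pubmlst_abaumannii_seqdef 1
# prints https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1
```

The resulting URL is what is passed to --url. The /schemes listing itself can be inspected in a browser or with any HTTP client to find the scheme number.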
mist_download \
--downloader bigsdb \
--url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
--output pubmlst_abaumannii_mlst \
--include-profiles
Citation:
- PubMLST.org: https://pubmed.ncbi.nlm.nih.gov/30345391/.
- BIGSdb Institut Pasteur: See References section for each species on BIGSdb IP Paris.
BIGSdb instances require authentication to access the most recent data. MiST relies on the BIGSdb_downloader (bundled in MiST).
- Follow the credential setup guide.
- Register for database access via My Account > Database registrations.
- Ensure the following files exist in your .bigsdb_tokens directory:
  - access_tokens
  - client_credentials
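Before starting an authenticated download, the last step can be checked with a short pre-flight script. This is only a sketch: check_bigsdb_tokens is a hypothetical helper, not part of MiST or BIGSdb_downloader, and it only tests that the two entries exist, not that the tokens are valid.

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify that the token directory contains
# the two entries listed above. Returns non-zero if any is missing.
check_bigsdb_tokens() {
  dir=$1
  missing=0
  for name in access_tokens client_credentials; do
    if [ ! -e "$dir/$name" ]; then
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (path is a placeholder):
# check_bigsdb_tokens /path/to/.bigsdb_tokens && echo "token files found"
```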
mist_download \
--downloader bigsdb_auth \
--url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
--token_dir /path/to/.bigsdb_tokens \
--key-name PubMLST \
--site PubMLST
- If the download fails with subprocess.TimeoutExpired, you likely do not have access. Register with the database first.
- Compare allele counts with and without authentication to confirm access.
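One way to compare allele counts is to count FASTA records in the two output directories. The sketch below assumes the downloaded alleles are stored as plain-text FASTA files somewhere under the output directory, which this document does not confirm. The helper name count_alleles and the directory names in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records (lines starting with ">") in all
# files under a directory. Assumes uncompressed FASTA files; adjust if the
# scheme layout differs.
count_alleles() {
  # grep exits non-zero when nothing matches, so the count is taken from wc.
  grep -rh '^>' "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Example (directory names are placeholders):
# echo "public:        $(count_alleles pubmlst_abaumannii_mlst)"
# echo "authenticated: $(count_alleles pubmlst_abaumannii_mlst_auth)"
```

A higher count for the authenticated download would suggest that access to the more recent data is working.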
Citation:
- PubMLST.org: https://pubmed.ncbi.nlm.nih.gov/30345391/.
- BIGSdb Institut Pasteur: See References section for each species on BIGSdb IP Paris.
cgMLST.org provides curated cgMLST schemes.
The available schemes and corresponding URLs can be found here.
mist_download \
--downloader cgmlstorg \
--url https://www.cgmlst.org/ncs/schema/Banthracis5268/ \
--output cgmlstorg_banthracis \
--include-profiles
Each scheme lists its citation on cgMLST.org under the “citation” box.
Next step: Indexing schemes.