
Downloading schemes

BertBog edited this page Sep 4, 2025 · 4 revisions

Before MiST can perform allele calling, you need a scheme containing reference alleles and (optionally) sequence type (ST) profiles. MiST provides helper scripts to download schemes from common sources.

⚠️ Important: After downloading, schemes must still be indexed before they can be used.

📌 Disclaimer: Make sure to cite the corresponding resources when using these schemes in your research.

Basic usage

The general command to download a scheme is:

mist_download --downloader {SOURCE} --url {URL} --output scheme_out

Options

options:
  -h, --help            show this help message and exit
  -u URL, --url URL     URL of the scheme
  -o OUTPUT, --output OUTPUT
                        Output directory
  -p, --include-profiles
                        Download the profiles
  -d {cgmlstorg,enterobase,bigsdb,bisdb_auth}, --downloader {cgmlstorg,enterobase,bigsdb,bisdb_auth}
                        Downloader
  -k DIR_TOKENS, --dir-tokens DIR_TOKENS
                        Directory with access tokens
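To fetch several schemes in one run, you can wrap the documented flags in a small shell function. The sketch below is only an illustration. `download_scheme`, `download_all` and the `DRY_RUN` variable are hypothetical names and are not part of MiST. Only the `--downloader`, `--url`, `--output` and `--include-profiles` options come from the help output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around mist_download (not part of MiST).
# Only documented flags are used. Set DRY_RUN=1 to print each command
# instead of executing it.

# Download one scheme, including its profiles.
download_scheme() {
  local downloader="$1" url="$2" output="$3"
  local -a cmd=(mist_download --downloader "$downloader" --url "$url")
  cmd+=(--output "$output" --include-profiles)
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"   # print the command as one space-separated line
  else
    "${cmd[@]}"
  fi
}

# Read "downloader url output_dir" lines from stdin and download each
# scheme; blank lines and lines starting with '#' are skipped.
download_all() {
  local downloader url output
  while read -r downloader url output; do
    [[ -z "$downloader" || "$downloader" == \#* ]] && continue
    download_scheme "$downloader" "$url" "$output"
  done
}
```

Suppose each line of a hypothetical `schemes.txt` file holds `downloader url output_dir`. Running `DRY_RUN=1 download_all < schemes.txt` then prints the commands for review, and `download_all < schemes.txt` executes them.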

EnteroBase

EnteroBase is a web platform for analyzing bacterial genomic data. MiST downloads schemes directly from the EnteroBase FTP site.

Example command

mist_download \
  --downloader enterobase \
  --url https://enterobase.warwick.ac.uk/schemes/Escherichia.Achtman7GeneMLST/ \
  --output enterobase_ecoli_mlst_achtman \
  --include-profiles 
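The EnteroBase URL in this example follows a simple pattern: the scheme name is appended to https://enterobase.warwick.ac.uk/schemes/. The helper below is a hypothetical convenience function that is not part of MiST. It assumes this pattern also holds for the scheme you want, so check the resulting URL against the EnteroBase site before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiST): build an EnteroBase scheme URL
# from a scheme name, assuming the pattern used in the example above:
#   https://enterobase.warwick.ac.uk/schemes/<scheme_name>/
enterobase_scheme_url() {
  local scheme_name="${1%/}"   # drop one trailing slash, if present
  printf 'https://enterobase.warwick.ac.uk/schemes/%s/\n' "$scheme_name"
}
```

For example, `--url "$(enterobase_scheme_url Escherichia.Achtman7GeneMLST)"` reproduces the URL of the command above.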

How to cite

https://enterobase.warwick.ac.uk/cite


BIGSdb (PubMLST, IP Paris) - without authentication

BIGSdb is software for storing and analyzing sequence data for bacterial isolates. It powers PubMLST.org and Institut Pasteur BIGSdb.

⚠️ Both PubMLST.org and BIGSdb IPP require authentication to access the most recently submitted data. See "BIGSdb (PubMLST, IP Paris) - with authentication" below.

Downloading a scheme

Determining the scheme URL

To determine the URL of a scheme, search for its database in the REST API of the BIGSdb instance (for PubMLST: https://rest.pubmlst.org). Sequence definition databases have the _seqdef suffix.

For example:

https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef

From this page, follow the /schemes endpoint to list the available schemes. Each scheme has a numeric ID, which is appended to this endpoint to form the scheme URL (for example, /schemes/1).

https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes
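The navigation steps above amount to building a URL of the form <REST root>/db/<database>/schemes/<scheme ID>. The hypothetical helper below is not part of MiST. It composes such a URL and rejects database names that lack the _seqdef suffix. The PubMLST REST root https://rest.pubmlst.org is assumed as the default. For other BIGSdb instances, you must supply the root yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiST): build the URL of one scheme in a
# BIGSdb sequence definition database, following the pattern
#   <REST root>/db/<database>/schemes/<scheme_id>
# The REST root defaults to PubMLST; pass a third argument for other instances.
bigsdb_scheme_url() {
  local database="$1" scheme_id="$2" root="${3:-https://rest.pubmlst.org}"
  # Only sequence definition databases (suffix _seqdef) hold schemes
  if [[ "$database" != *_seqdef ]]; then
    echo "error: '$database' is not a _seqdef database" >&2
    return 1
  fi
  # Strip one trailing slash from the root to avoid a double slash
  printf '%s/db/%s/schemes/%s\n' "${root%/}" "$database" "$scheme_id"
}
```

For example, `bigsdb_scheme_url pubmlst_abaumannii_seqdef 1` yields the URL used in the command below. You still need to look up the scheme ID itself by opening the /schemes endpoint.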

Example command

mist_download \
  --downloader bigsdb \
  --url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
  --output pubmlst_abaumannii_mlst \
  --include-profiles 

How to cite


BIGSdb (PubMLST, IP Paris) - with authentication

BIGSdb instances require authentication to access the most recent data. For authenticated downloads, MiST uses the BIGSdb_downloader tool, which is bundled with MiST.

1. Setting up credentials

  • Follow the credential setup guide
  • Register for database access via My Account > Database registrations.
  • Ensure the following files exist in your .bigsdb_tokens directory:
      • access_tokens
      • client_credentials
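Before you run the authenticated download, you can confirm that both token files are in place. The function below is a hypothetical pre-flight check and is not part of MiST or BIGSdb_downloader. Treating an empty file as missing is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MiST): verify that a token
# directory contains non-empty access_tokens and client_credentials files.
# Returns 0 if both are present, 1 otherwise (and lists what is missing).
check_token_dir() {
  local dir="$1" missing=0 f
  for f in access_tokens client_credentials; do
    if [[ ! -s "$dir/$f" ]]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Pass it the same directory you give to mist_download, for example `check_token_dir /path/to/.bigsdb_tokens && echo ok`.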

2. Example command

mist_download \
  --downloader bigsdb_auth \
  --url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
  --token_dir /path/to/.bigsdb_tokens \
  --key-name PubMLST \
  --site PubMLST \
  --output pubmlst_abaumannii_mlst_auth

Troubleshooting

  • subprocess.TimeoutExpired: You likely do not have access. Register with the database first.
  • Compare allele counts with and without authentication to confirm access.
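One way to compare allele counts is to count the FASTA headers in each downloaded scheme directory. This sketch makes two assumptions. First, it presumes that alleles are stored as FASTA files with a .fasta, .fa or .fna extension, so adjust the patterns to match your output. Second, `count_alleles` is a hypothetical helper and is not part of MiST.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiST): count FASTA records (lines
# starting with '>') across all FASTA files in a scheme directory.
# Assumes .fasta/.fa/.fna extensions; adjust the patterns if needed.
count_alleles() {
  local dir="$1"
  find "$dir" -type f \( -name '*.fasta' -o -name '*.fa' -o -name '*.fna' \) \
    -exec cat {} + | grep -c '^>'
}
```

For example, run `count_alleles scheme_public` and `count_alleles scheme_auth` (placeholder directory names). A higher count for the authenticated download suggests that access to the recent data is working.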

How to cite


cgMLST.org

cgMLST.org provides curated cgMLST schemes.

The available schemes and their URLs are listed on the cgMLST.org website.

Example command

mist_download \
  --downloader cgmlstorg \
  --url https://www.cgmlst.org/ncs/schema/Banthracis5268/ \
  --output cgmlstorg_banthracis \
  --include-profiles

How to cite

Each scheme lists its citation on cgMLST.org under the “citation” box.


Next step: Indexing schemes.
