Home

MiST is a rapid, accurate and flexible (core-genome) multi-locus sequence typing (MLST) allele caller.

Getting started

1. Installation

Start by installing the tool and its dependencies using the installation instructions.

To check the installation:

mist_query --version

2. Downloading a scheme

MiST does not include built-in (cg)MLST schemes. However, schemes can be downloaded from widely used resources such as PubMLST.org, EnteroBase (https://enterobase.warwick.ac.uk/) or cgMLST.org. When using these schemes, make sure to cite the corresponding source in your research.

Instructions on how to download the schemes are provided here.

3. Creating the index

Next, create an index from the locus FASTA files and, optionally, a profiles TSV file:

mist_index --fasta abcZ.fasta adk.fasta aroE.fasta fumC.fasta gdh.fasta -o mlst_neisseria

For details, see Indexing schemes.
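The optional profiles TSV links combinations of allele numbers to sequence types (STs). The sketch below shows what a profile lookup might look like. It assumes a PubMLST-style layout (an `ST` column followed by one column per locus) and scores profiles by the number of loci whose allele numbers match. The table values, the helper names `load_profiles`/`best_profile` and the scoring rule are all hypothetical and do not describe MiST's internal logic.

```python
import csv
import io

# Toy profiles table (not real ST definitions), PubMLST-style layout assumed:
# an ST column followed by one column per locus.
PROFILES_TSV = (
    "ST\tabcZ\tadk\taroE\tfumC\tgdh\n"
    "101\t1\t3\t1\t1\t1\n"
    "102\t2\t3\t4\t3\t8\n"
)


def load_profiles(text):
    """Parse a profiles TSV into {ST: {locus: allele}} (hypothetical helper)."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        st = row.pop("ST")
        profiles[st] = row
    return profiles


def best_profile(calls, profiles):
    """Return (ST, matching loci) for the profile sharing the most allele calls.

    Assumption: a plain count of identical allele numbers per locus.
    """
    def matches(st):
        return sum(1 for locus, allele in profiles[st].items()
                   if calls.get(locus) == allele)

    st = max(profiles, key=matches)
    return st, matches(st)


calls = {"abcZ": "2", "adk": "3", "aroE": "4", "fumC": "3", "gdh": "8"}
print(best_profile(calls, load_profiles(PROFILES_TSV)))  # ('102', 5)
```

A count of matching loci is the simplest possible score; real tools may also treat missing or novel loci specially.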

4. Querying the scheme

Once the scheme is indexed, you can query assemblies (FASTA format) against it. Minimal example:

mist_query --db mlst_neisseria --fasta input_contigs.fasta

With TSV output, an output directory for the JSON and log files, and 8 threads:

mist_query --db mlst_neisseria \
           --fasta input_contigs.fasta \
           --out-tsv results.tsv \
           --out-dir results/ \
           --threads 8

For details, see Running MiST.
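To type many assemblies, one option is to call `mist_query` once per file from a script. This sketch reuses only the flags shown above (`--db`, `--fasta`, `--out-dir`, `--threads`); the input layout (`*.fasta` files in one directory), the one-output-directory-per-sample convention and the helper names `build_command`/`run_all` are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def build_command(db, fasta, out_dir, threads=8):
    """Assemble a mist_query call using only the flags documented above."""
    return [
        "mist_query",
        "--db", str(db),
        "--fasta", str(fasta),
        "--out-dir", str(out_dir),
        "--threads", str(threads),
    ]


def run_all(db, assembly_dir, results_root, threads=8):
    """Run mist_query once per *.fasta file, each into its own output directory."""
    for fasta in sorted(Path(assembly_dir).glob("*.fasta")):
        out_dir = Path(results_root) / fasta.stem
        # Create the directory up front in case mist_query does not (assumption).
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True stops the loop if any run exits with an error.
        subprocess.run(build_command(db, fasta, out_dir, threads), check=True)
```

For example, `run_all("mlst_neisseria", "assemblies/", "results/")` would write the results for `assemblies/sample1.fasta` to `results/sample1/`.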

5. Understanding outputs

MiST produces results in JSON by default, with optional TSV and additional files.

Typical output directory:

results/
├── mist.json              # Main JSON output
├── results.tsv            # (optional) tabular results
├── mist.log               # verbose log file
├── minimap2_parsed.tsv    # (optional) alignments
└── novel_alleles/         # novel allele FASTAs (if detected)

JSON: contains allele calls, best profile match, and metadata
TSV: simple tabular format (locus, allele, is_novel)
Logs: useful for debugging and reproducibility
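The TSV output can be post-processed with standard tools. This sketch reads it into a dictionary and lists the loci flagged as novel. It assumes a header row with columns named exactly `locus`, `allele` and `is_novel`, and boolean values written as text (e.g. `True`/`False`); check these against a real `results.tsv` before relying on it. The example input is a toy string, not actual MiST output.

```python
import csv
import io


def read_calls(tsv_text):
    """Parse MiST-style TSV output into {locus: (allele, is_novel)}.

    Assumptions: a header row with columns locus, allele and is_novel,
    and booleans written as text such as "True"/"False" or "1"/"0".
    """
    calls = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        is_novel = row["is_novel"].strip().lower() in {"true", "1", "yes"}
        calls[row["locus"]] = (row["allele"], is_novel)
    return calls


# Toy example, not real MiST output.
example = "locus\tallele\tis_novel\nabcZ\t2\tFalse\nadk\t3\tFalse\naroE\t\tTrue\n"
calls = read_calls(example)
novel = [locus for locus, (_, is_new) in calls.items() if is_new]
print(novel)  # ['aroE']
```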

For details, see Running MiST or follow the Tutorial.

Graphical overview

Database construction

Each FASTA file containing allele sequences for a locus is first clustered by sequence identity using CD-HIT. Sequences of different lengths are forced into separate clusters, regardless of identity. In the example shown, the resulting clusters are labelled C1, C2, and C3.
Alleles with frameshifts relative to other cluster members are detected using nucmer and split into separate clusters (e.g., C1 is split into C1a and C1b).
One representative per cluster (typing allele) is retained in the final FASTA file.
FASTA files for all loci are combined, and a Minimap2 index is built.
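The clustering step can be pictured with a simplified, greedy stand-in for CD-HIT. Sequences are processed longest first; each one joins the first cluster whose representative has the same length and at least a given identity, otherwise it starts a new cluster whose first member becomes the representative. This is only a sketch: it is not CD-HIT itself, it measures identity position by position (valid here because only equal-length sequences are compared), it omits the nucmer frameshift split, and the 0.9 threshold, allele IDs and function names are assumptions rather than MiST's settings.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(alleles, threshold=0.9):
    """Greedy clustering in the spirit of CD-HIT, restricted to equal lengths.

    alleles: dict of allele ID -> sequence. Returns a list of clusters, each a
    list of IDs whose first element is the cluster representative.
    """
    clusters = []  # each entry: (representative sequence, [member IDs])
    # Longest sequences first; the sort is stable, so ties keep input order.
    for allele_id, seq in sorted(alleles.items(), key=lambda kv: -len(kv[1])):
        for rep_seq, members in clusters:
            if len(rep_seq) == len(seq) and identity(rep_seq, seq) >= threshold:
                members.append(allele_id)
                break
        else:
            # No compatible cluster found: this allele represents a new one.
            clusters.append((seq, [allele_id]))
    return [members for _, members in clusters]


# Toy alleles (hypothetical IDs and sequences).
alleles = {
    "abcZ_1": "ATGACCGATT",
    "abcZ_2": "ATGACCGATC",   # one substitution relative to abcZ_1
    "abcZ_3": "ATGACCGATTA",  # different length, so always a separate cluster
    "abcZ_4": "TTTTTTTTTT",   # same length as abcZ_1 but low identity
}
clusters = greedy_cluster(alleles)
print(clusters)  # [['abcZ_3'], ['abcZ_1', 'abcZ_2'], ['abcZ_4']]
```

Keeping one representative per cluster (`[c[0] for c in clusters]`) is what reduces the set of sequences that has to be indexed and aligned against.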

Allele calling

Input contigs are aligned to the combined typing alleles using Minimap2.
Corresponding sequences are extracted based on their location in the input contigs.
Extracted sequences are hashed and compared against a database of pre-computed allele hashes.
If one or more exact matches are found, they are reported; otherwise, the best-matching allele is identified.
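The extraction and hash lookup can be sketched as follows. The sketch assumes a hit is given as contig name, 0-based end-exclusive coordinates and strand; it uses SHA-256 of the uppercase sequence as the hash and, when there is no exact hit, falls back to the equal-length allele with the fewest mismatches. MiST's actual hash function, coordinate conventions and best-match criterion may differ, and all names and sequences here are hypothetical.

```python
import hashlib

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(contigs, contig, start, end, strand):
    """Slice a hit from a contig (0-based, end-exclusive); reverse-complement on '-'."""
    seq = contigs[contig][start:end]
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq.upper()


def seq_hash(seq):
    """Hash a sequence (SHA-256 is an assumption, not necessarily MiST's choice)."""
    return hashlib.sha256(seq.upper().encode()).hexdigest()


def call_allele(seq, allele_hashes, alleles):
    """Return (allele IDs, exact match?) for an extracted sequence."""
    hits = allele_hashes.get(seq_hash(seq), [])
    if hits:
        return list(hits), True
    # Fallback (assumption): closest same-length allele by mismatch count.
    candidates = [(sum(a != b for a, b in zip(seq, ref)), aid)
                  for aid, ref in alleles.items() if len(ref) == len(seq)]
    if not candidates:
        return [], False
    return [min(candidates)[1]], False


# Pre-compute hashes for the known alleles of one toy locus.
alleles = {"adk_1": "ATGGCA", "adk_2": "ATGGCT"}
allele_hashes = {}
for aid, s in alleles.items():
    allele_hashes.setdefault(seq_hash(s), []).append(aid)

# adk_1 sits on the reverse strand of this toy contig (positions 2-8).
contigs = {"contig_1": "GGTGCCATAA"}
seq = extract(contigs, "contig_1", 2, 8, "-")
print(call_allele(seq, allele_hashes, alleles))  # (['adk_1'], True)
```

Hashing turns the exact-match test into a single dictionary lookup, so the slower similarity search is only needed for sequences that match no known allele.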
