Home
MiST is a rapid, accurate and flexible (core-genome) multi-locus sequence typing (MLST) allele caller.
Start by installing the tool and its dependencies using the installation instructions.
To check the installation:
mist --version
MiST does not include built-in (cg)MLST schemes. However, schemes can be downloaded from widely used resources such as PubMLST.org, EnteroBase (https://enterobase.warwick.ac.uk/) or cgMLST.org. When using these schemes, make sure to cite the corresponding source in your research.
Instructions on how to download the schemes are provided here.
Afterwards, an index has to be created, which requires locus FASTA files and optionally a profiles TSV file.
mist index abcZ.fasta adk.fasta aroE.fasta fumC.fasta gdh.fasta -o mlst_neisseria
For details, see Indexing schemes.
Once the scheme is indexed, you can query assemblies (FASTA format) against it. Minimal example:
mist call --db mlst_neisseria --fasta input_contigs.fasta
With TSV output, JSON, log, and intermediate Minimap2 output:
mist call --db mlst_neisseria \
--fasta input_contigs.fasta \
--out-tsv results/results.tsv \
--out-dir results/ \
--log results/log.txt \
--keep-minimap2 \
--threads 8
For details, see Running MiST.
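When several assemblies need to be typed against the same scheme, the call can be assembled programmatically. The following is a minimal Python sketch that only uses the flags shown on this page; the helper name `build_call_command`, the sample file names, and the one-directory-per-sample layout are illustrative assumptions, not part of MiST.

```python
import shlex
from pathlib import Path


def build_call_command(db, fasta, out_dir, threads=1):
    """Build the argument list for one `mist call` run (hypothetical helper)."""
    out_dir = Path(out_dir)
    return [
        "mist", "call",
        "--db", str(db),
        "--fasta", str(fasta),
        "--out-tsv", str(out_dir / "results.tsv"),
        "--out-dir", str(out_dir),
        "--log", str(out_dir / "log.txt"),
        "--threads", str(threads),
    ]


# Hypothetical input files; replace with real assemblies.
for fasta in ["sample_A.fasta", "sample_B.fasta"]:
    sample = Path(fasta).stem
    cmd = build_call_command("mlst_neisseria", fasta, Path("results") / sample, threads=8)
    print(shlex.join(cmd))
    # To actually execute the command:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with unusual file names.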
MiST produces results in JSON by default, with optional TSV and additional files.
Typical output directory:
results/
├── mist.json # Main JSON output
├── results.tsv # (optional) tabular results
├── mist.log # (optional) log file
├── minimap2_parsed.tsv # (optional) alignments
└── novel_alleles/ # novel allele FASTAs (if detected)
- JSON: contains allele calls, best profile match, and metadata
- TSV: simple tabular format (locus, allele, is_novel)
- Logs: useful for debugging and traceability
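As an illustration of working with the TSV output, the sketch below reads the three columns listed above and collects the loci flagged as novel. It assumes that the file has a header row with exactly these column names and that `is_novel` is written as a true/false-like string; both are assumptions to check against a real results file. The example rows are made up.

```python
import csv
import io


def read_calls(handle):
    """Return one dict per locus from a tab-separated results file."""
    reader = csv.DictReader(handle, delimiter="\t")
    calls = []
    for row in reader:
        calls.append({
            "locus": row["locus"],
            "allele": row["allele"],
            # Assumption: novelty is encoded as a true/false-like string.
            "is_novel": row["is_novel"].strip().lower() in {"true", "1", "yes"},
        })
    return calls


# Made-up content for illustration; real files are written via --out-tsv.
example = (
    "locus\tallele\tis_novel\n"
    "abcZ\t2\tFalse\n"
    "adk\t3\tFalse\n"
    "aroE\tn1\tTrue\n"
)
calls = read_calls(io.StringIO(example))
novel_loci = [c["locus"] for c in calls if c["is_novel"]]
print(novel_loci)  # ['aroE']
```

For a real run, replace the `io.StringIO` object with `open("results/results.tsv", newline="")`.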
For details, see Running MiST or follow the Tutorial.
- Each FASTA file containing allele sequences for a locus is first clustered by sequence identity using CD-HIT. Sequences of different lengths are forced into separate clusters, regardless of identity. The resulting clusters are labelled sequentially (e.g., C1, C2, and C3).
- Alleles with frameshifts relative to other cluster members are detected using nucmer and split into separate clusters (e.g., C1 is split into C1a and C1b).
- One representative per cluster (typing allele) is retained in the final FASTA file.
- FASTA files for all loci are combined, and a Minimap2 index is built.
- Input contigs are aligned to the combined typing alleles using Minimap2.
- Corresponding sequences are extracted based on their location in the input contigs.
- Extracted sequences are hashed and compared against a database of pre-computed allele hashes.
- If one or more exact matches are found, they are reported; otherwise, the best-matching allele is identified.
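The last two steps (hash lookup with a fallback to the closest allele) can be sketched in a few lines of Python. This is an illustration of the idea rather than MiST's implementation: the hash function (SHA-256), the similarity measure (`difflib.SequenceMatcher`), the function names, and the toy sequences are all assumptions made for the example.

```python
import hashlib
from difflib import SequenceMatcher


def seq_hash(seq):
    """Hash a nucleotide sequence after normalising case (SHA-256 is an assumption)."""
    return hashlib.sha256(seq.upper().encode("ascii")).hexdigest()


def build_hash_db(alleles):
    """Map sequence hash -> allele ids; several ids may share one sequence."""
    db = {}
    for allele_id, seq in alleles.items():
        db.setdefault(seq_hash(seq), []).append(allele_id)
    return db


def call_allele(seq, alleles, hash_db):
    """Report exact matches if any, otherwise the most similar known allele."""
    exact = hash_db.get(seq_hash(seq))
    if exact:
        return {"exact": True, "alleles": sorted(exact)}

    # Stand-in similarity score; a real caller would use alignment results.
    def similarity(allele_id):
        ref = alleles[allele_id].upper()
        return SequenceMatcher(None, seq.upper(), ref, autojunk=False).ratio()

    return {"exact": False, "alleles": [max(alleles, key=similarity)]}


# Toy allele database for one locus (sequences are invented).
alleles = {
    "abcZ_1": "ATGGCTAAATTT",
    "abcZ_2": "ATGGCTAAGTTT",
    "abcZ_3": "ATGCCCAAATTT",
}
hash_db = build_hash_db(alleles)

exact_call = call_allele("atggctaagttt", alleles, hash_db)
novel_call = call_allele("ATGGCTAAGTTC", alleles, hash_db)
print(exact_call)  # {'exact': True, 'alleles': ['abcZ_2']}
print(novel_call)  # {'exact': False, 'alleles': ['abcZ_2']}
```

Because the hashes are pre-computed, an exact match costs a single dictionary lookup regardless of how many alleles the scheme contains; the more expensive comparison is only needed when no exact match exists.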