Home
MiST is a rapid, accurate and flexible (core-genome) multi-locus sequence typing (MLST) allele caller.
Start by installing the tool and its dependencies using the installation instructions.
To check the installation:
mist_query --version
MiST does not include built-in (cg)MLST schemes. However, schemes can be downloaded from widely used resources such as PubMLST.org, EnteroBase (https://enterobase.warwick.ac.uk/) or cgMLST.org. When using these schemes, make sure to cite the corresponding source in your research.
Instructions on how to download the schemes are provided here.
Next, create an index for the scheme. Indexing requires the locus FASTA files and, optionally, a profiles TSV file:
mist_index --fasta abcZ.fasta adk.fasta aroE.fasta fumC.fasta gdh.fasta -o mlst_neisseria
For details, see Indexing schemes.
Once the scheme is indexed, you can query assemblies (FASTA format) against it. Minimal example:
mist_query --db mlst_neisseria --fasta input_contigs.fasta
With TSV output, JSON, and logs:
mist_query --db mlst_neisseria \
--fasta input_contigs.fasta \
--out-tsv results.tsv \
--out-dir results/ \
--threads 8
For details, see Running MiST.
MiST produces results in JSON by default, with optional TSV and additional files.
Typical output directory:
results/
├── mist.json # Main JSON output
├── results.tsv # (optional) tabular results
├── mist.log # verbose log file
├── minimap2_parsed.tsv # (optional) alignments
└── novel_alleles/ # novel allele FASTAs (if detected)
- JSON: contains allele calls, best profile match, and metadata
- TSV: simple tabular format (locus, allele, is_novel)
- Logs: useful for debugging and reproducibility
For details, see Running MiST or follow the Tutorial.
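As an illustration of working with the TSV output described above, the following Python sketch reads a table with the columns locus, allele, and is_novel and lists the loci with novel alleles. The exact header names, the encoding of the is_novel values, and the helper name `read_allele_calls` are assumptions made for this example; check them against a real results.tsv before relying on it.

```python
import csv
import io


def read_allele_calls(handle):
    """Return a list of (locus, allele, is_novel) tuples from a TSV handle.

    Assumes a header row with the columns 'locus', 'allele' and 'is_novel'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    calls = []
    for row in reader:
        # Assumption: novelty is written as a true/false-like string.
        is_novel = row["is_novel"].strip().lower() in {"true", "1", "yes"}
        calls.append((row["locus"], row["allele"], is_novel))
    return calls


# Made-up example content, not real MiST output.
example = (
    "locus\tallele\tis_novel\n"
    "abcZ\t2\tFalse\n"
    "adk\t3\tFalse\n"
    "aroE\tn1\tTrue\n"
)
calls = read_allele_calls(io.StringIO(example))
novel_loci = [locus for locus, _, is_novel in calls if is_novel]
print(novel_loci)
```

To read an actual file, pass an open file handle instead of the `io.StringIO` object, for example `open("results/results.tsv", newline="")`.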
- Each FASTA file containing allele sequences for a locus is first clustered by sequence identity using CD-HIT. Sequences of different lengths are forced into separate clusters, regardless of identity. The resulting clusters are labelled C1, C2, and C3.
- Alleles with frameshifts relative to other cluster members are detected using nucmer and split into separate clusters (e.g., C1 is split into C1a and C1b).
- One representative per cluster (typing allele) is retained in the final FASTA file.
- FASTA files for all loci are combined, and a Minimap2 index is built.
- Input contigs are aligned to the combined typing alleles using Minimap2.
- Corresponding sequences are extracted based on their location in the input contigs.
- Extracted sequences are hashed and compared against a database of pre-computed allele hashes.
- If one or more exact matches are found, they are reported; otherwise, the best-matching allele is identified.
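The hash-based lookup in the last two steps can be sketched in Python as follows. This is a minimal illustration, not MiST's implementation: the choice of SHA-256, the upper-casing of sequences, and the function names `hash_sequence`, `build_hash_db`, and `call_allele` are all assumptions made for the example. The fallback to the best-matching allele is not shown; an empty result only signals that it would be needed.

```python
import hashlib


def hash_sequence(seq):
    # Assumption: SHA-256 over the upper-cased sequence; MiST may hash differently.
    return hashlib.sha256(seq.upper().encode("ascii")).hexdigest()


def build_hash_db(alleles):
    """Map each sequence hash to the allele ids sharing that sequence.

    `alleles` is a dict of allele id -> nucleotide sequence.
    """
    db = {}
    for allele_id, seq in alleles.items():
        db.setdefault(hash_sequence(seq), []).append(allele_id)
    return db


def call_allele(extracted_seq, hash_db):
    """Return all exact-match allele ids, or an empty list if there are none."""
    return hash_db.get(hash_sequence(extracted_seq), [])


# Toy alleles for illustration only.
alleles = {"abcZ_1": "ATGACC", "abcZ_2": "ATGACT"}
hash_db = build_hash_db(alleles)

exact = call_allele("atgact", hash_db)    # same sequence as abcZ_2
missing = call_allele("ATGAAA", hash_db)  # no exact match in the database
print(exact, missing)
```

Because the hashes are computed once at indexing time, an exact-match lookup per extracted sequence is a single dictionary access, independent of how many alleles the locus has.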