
Running MiST

BertBog edited this page Sep 4, 2025 · 5 revisions

MiST is a command-line tool used for allele calling and (cg)MLST profiling. It compares input assemblies or contigs in FASTA format to an indexed (cg)MLST database and reports the best-matching alleles and profiles, as well as potential novel alleles.

Example usage

The only required options are --db and --fasta.

Minimal example

mist_query --db neisseria/mlst --fasta input_contigs.fasta

Extended example (with multiple options and 8 threads)

mist_query \
  --db neisseria/mlst \
  --fasta input_contigs.fasta \
  --out-tsv alleles.tsv \
  --out-dir results \
  --threads 8
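For batch runs, the same invocation can be assembled from a script. The sketch below is a hypothetical wrapper, not part of MiST: the helper names `build_mist_command` and `run_mist` are made up for illustration, and only options listed on this page are used. It assumes `mist_query` is on the PATH and gives each sample its own JSON file so the default mist.json is not overwritten.

```python
import subprocess
from pathlib import Path


def build_mist_command(db, fasta, out_json=None, out_tsv=None, out_dir=None, threads=None):
    """Assemble a mist_query argument list using only documented options."""
    cmd = ["mist_query", "--db", str(db), "--fasta", str(fasta)]
    if out_json is not None:
        cmd += ["--out-json", str(out_json)]
    if out_tsv is not None:
        cmd += ["--out-tsv", str(out_tsv)]
    if out_dir is not None:
        cmd += ["--out-dir", str(out_dir)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def run_mist(fasta_paths, db, threads=1):
    """Run mist_query once per FASTA file (hypothetical helper)."""
    for fasta in fasta_paths:
        sample = Path(fasta).stem
        cmd = build_mist_command(
            db,
            fasta,
            out_json=f"{sample}.json",
            out_tsv=f"{sample}.tsv",
            out_dir=f"{sample}_results",
            threads=threads,
        )
        # check=True raises CalledProcessError if mist_query exits non-zero.
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with unusual file names.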

Options

The following options are available:

options:
  -h, --help            show this help message and exit
  -f FASTA, --fasta FASTA
                        Input FASTA path (required)
  -d DB, --db DB        Database path (required)
  -o OUT_JSON, --out-json OUT_JSON
                        JSON output file (default: mist.json)
  --out-tsv OUT_TSV     TSV output file
  --out-dir OUT_DIR     Output directory
  --export-novel        Create FASTA files for (potential) novel alleles
  --keep-minimap2       Store the minimap2 output
  -t THREADS, --threads THREADS
                        Nb. of threads to use (default: 1)
  --min-id-novel MIN_ID_NOVEL
                        Minimum % identity for novel alleles
  -m {all,first,longest}, --multi {all,first,longest} 
                        Strategy to handle multiple perfect hits (default: longest)
  --loci LOCI           Limit the detection to these loci (mainly used for debugging)
  --version             Print version and exit

Output files

JSON output

By default, the output is written in JSON format to mist.json. Use --out-json to choose a different path.

The JSON output contains three main sections:

  • alleles: dictionary with allele calls for each locus
  • profile: best matching (cg)ST profile and metadata (including % of matching loci)
  • metadata: analysis metadata (timestamp, tool version, etc.)
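A minimal sketch of reading this structure in Python follows. It assumes that `alleles` is keyed by locus name and that each value is a per-locus dictionary with an `allele_str` entry; the helper name `summarize_alleles` is hypothetical. The `profile` and `metadata` sections are left empty in the stand-in data because their exact keys are not listed on this page.

```python
import json


def summarize_alleles(result):
    """Map each locus to its allele string (assumed layout, see lead-in)."""
    return {locus: call["allele_str"] for locus, call in result["alleles"].items()}


# Small stand-in for the contents of mist.json.
example = json.loads("""
{
  "alleles": {
    "pdhC": {"allele_str": "3", "allele_results": [], "tags": []},
    "gdh": {"allele_str": "n462f8f", "allele_results": [], "tags": ["NOVEL"]}
  },
  "profile": {},
  "metadata": {}
}
""")

calls = summarize_alleles(example)
print(calls)
```

For a real run, replace the stand-in with `json.load(open("mist.json"))` or the path given to --out-json.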

TSV output

If the --out-tsv option is set, an additional TSV file is generated with the following columns:

  • locus: target locus
  • allele: detected allele
  • is_novel: whether the allele is novel (boolean)
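The TSV file can be loaded with the standard `csv` module. This sketch assumes the file starts with a header row that uses the column names above, and it accepts several spellings of the boolean because the exact serialization of `is_novel` is not stated here. The helper name `read_allele_tsv` and the sample rows are illustrative only.

```python
import csv
import io


def read_allele_tsv(handle):
    """Parse the allele TSV into a list of dictionaries."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "locus": row["locus"],
            "allele": row["allele"],
            # Tolerant boolean parsing; the real file may use a different spelling.
            "is_novel": row["is_novel"].strip().lower() in {"true", "1", "yes"},
        })
    return rows


# Illustrative content, not real MiST output.
sample = "locus\tallele\tis_novel\npdhC\t3\tFalse\ngdh\tn462f8f\tTrue\n"
rows = read_allele_tsv(io.StringIO(sample))
novel_loci = [r["locus"] for r in rows if r["is_novel"]]
print(novel_loci)
```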

Additional outputs

If the --out-dir option is set, additional output files are stored in this directory.

results/
├── mist.log              # verbose logging and executed commands
├── minimap2_parsed.tsv   # parsed Minimap2 alignments (if --keep-minimap2 is set)
└── novel_alleles/        # FASTA files of novel alleles (if --export-novel is set)
    ├── gdh_n462f8f.fasta # The name corresponds to the locus name followed by the sequence hash
    └── ...

JSON output format

Example output for a perfect hit of the pdhC locus:

{
  "allele_str": "3",
  "allele_results": [
    {
      "allele": "3",
      "alignment": {
        "seq_id": "gi|77358697|ref|NC_003112.2|",
        "start": 1360856,
        "end": 1361335,
        "strand": "+"
      },
      "sequence": null,
      "closest_alleles": null
    }
  ],
  "tags": []
}

The dictionary contains the following entries:

  • allele_str: detected allele as a string
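  • allele_results: list of alignments for the locus; each entry gives the allele, its alignment (seq_id, start, end, strand), and the sequence and closest_alleles fields, which are null for a perfect hit as in the example above
  • tags: additional tags to denote special cases or missing alleles. An overview of the available tags is provided in the table below.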

Tags

Tag     Description
ABSENT  The locus is likely absent, as no seed alignment was found.
EDGE    The detected allele is located at the end of a contig and is therefore incomplete.
INDEL   The locus is present, but the allele length does not match any known sequence in the database.
NOVEL   A potential novel allele has been detected that is not present in the current database.
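Tags make it easy to pull out the loci that need a closer look. The sketch below groups loci by tag, assuming the per-locus dictionaries are keyed by locus name under `alleles`; the helper name `loci_by_tag` and the sample data are hypothetical, and only the `tags` field is used.

```python
from collections import defaultdict


def loci_by_tag(alleles):
    """Group locus names by the tags attached to their allele call."""
    grouped = defaultdict(list)
    for locus, call in alleles.items():
        for tag in call.get("tags", []):
            grouped[tag].append(locus)
    return dict(grouped)


# Illustrative data; only the "tags" entry of each call is shown.
alleles = {
    "pdhC": {"tags": []},
    "gdh": {"tags": ["NOVEL"]},
    "abcZ": {"tags": ["ABSENT"]},
}

flagged = loci_by_tag(alleles)
print(flagged)
```

Loci without tags do not appear in the result, so an empty dictionary means that no locus was flagged.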

Novel alleles

If a novel allele is detected, the corresponding sequence is included in the JSON output and, if --export-novel is set, also written to a FASTA file.

📌 Disclaimer: It is strongly recommended to submit valid novel alleles to the underlying databases.

Example (sequence truncated for clarity):

{
  "allele_str": "n462f8f",
  "allele_results": [
    {
      "allele": "n462f8f",
      "alignment": {
        "seq_id": "gi|77358697|ref|NC_003112.2|",
        "start": 1419413,
        "end": 1419913,
        "strand": "-"
      },
      "sequence": "ATGTTCGAGCCGCTGTGGAACAATAA...",
      "closest_alleles": [
        "gdh_5",
        "gdh_67"
      ]
    }
  ],
  "tags": [
    "NOVEL"
  ]
}
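When --export-novel was not set, the novel sequences can still be recovered from the JSON output. The following sketch collects them into FASTA text, assuming the per-locus dictionaries are keyed by locus name. The helper names are hypothetical, and the header format `<locus>_<allele>` simply mirrors the file naming shown under "Additional outputs"; it is not necessarily the header MiST writes itself.

```python
def novel_fasta_records(alleles):
    """Yield (header, sequence) pairs for calls tagged NOVEL."""
    for locus, call in alleles.items():
        if "NOVEL" not in call.get("tags", []):
            continue
        for hit in call.get("allele_results", []):
            if hit.get("sequence"):
                yield f"{locus}_{hit['allele']}", hit["sequence"]


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Stand-in data based on the example above (sequence truncated there as well).
alleles = {
    "gdh": {
        "allele_str": "n462f8f",
        "allele_results": [
            {"allele": "n462f8f", "sequence": "ATGTTCGAGCCGCTGTGGAACAATAA"}
        ],
        "tags": ["NOVEL"],
    },
    "pdhC": {
        "allele_str": "3",
        "allele_results": [{"allele": "3", "sequence": None}],
        "tags": [],
    },
}

fasta_text = to_fasta(novel_fasta_records(alleles))
print(fasta_text, end="")
```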
