Tutorial

Bert Bogaerts edited this page Sep 24, 2025 · 3 revisions

This tutorial demonstrates how to download, index, and query the PubMLST MLST scheme for Acinetobacter baumannii using MiST.

1. Download the scheme

Download the A. baumannii MLST scheme from PubMLST.org.

Note: You can retrieve the download URLs of available schemes with the mist list command (see Listing available schemes).

mist download \
  --downloader bigsdb \
  --url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
  --output mlst \
  --include-profiles

After completion, the mlst directory will contain:

fasta_list.txt
Oxf_cpn60.fasta
Oxf_gdhB.fasta
Oxf_gltA.fasta
Oxf_gpi.fasta
Oxf_gyrB.fasta
Oxf_recA.fasta
Oxf_rpoD.fasta
profiles.tsv
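
As a quick sanity check on the download, the number of alleles per locus can be counted directly from the FASTA files. The following is a minimal Python sketch, not part of MiST. It assumes only that each allele is stored as one FASTA record (a header line starting with `>`) and that the files are in the mlst directory used above; the function names are illustrative.

```python
from pathlib import Path


def count_fasta_records(fasta_text: str) -> int:
    """Count sequences in FASTA-formatted text by counting header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def summarize_scheme(scheme_dir: str) -> dict[str, int]:
    """Map each locus (FASTA file name without extension) to its record count."""
    directory = Path(scheme_dir)
    if not directory.is_dir():
        # Nothing downloaded yet: report an empty summary instead of failing.
        return {}
    return {
        path.stem: count_fasta_records(path.read_text())
        for path in sorted(directory.glob("*.fasta"))
    }


if __name__ == "__main__":
    # "mlst" is the --output directory passed to the download command.
    for locus, n_alleles in summarize_scheme("mlst").items():
        print(f"{locus}\t{n_alleles}")
```

A locus with zero records, or fewer than seven FASTA files in total, would suggest an incomplete download.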

2. Create the index

Build an index from the downloaded scheme:

mist index \
  mlst/*.fasta \
  --profiles mlst/profiles.tsv \
  --output mlst_idx

The mlst_idx directory should now contain:

├── loci_repr.fasta
├── loci_repr.fasta.mni
├── loci.txt
├── Oxf_cpn60/
├── Oxf_gdhB/
├── Oxf_gltA/
├── Oxf_gpi/
├── Oxf_gyrB/
├── Oxf_recA/
├── Oxf_rpoD/
└── profiles.tsv

3. Call alleles

Download an A. baumannii genome from ENA/NCBI (or use your own FASTA file):

curl -L -o GCA_900020545.1.fasta \
  "https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_900020545.1?download=true&gzip=false"

Call the alleles:

mist call \
  --db mlst_idx/ \
  --fasta GCA_900020545.1.fasta \
  --out-json results.json \
  --out-tsv results.tsv \
  -t 4

4. Inspect the results

During the run, MiST logs the number of detected loci and the assigned ST:

2025-XX-XX 00:00:00 -      mist_query -    INFO - Detected 7/7 loci (100.00%), including 0 (potential) novel alleles
2025-XX-XX 00:00:00 -      mist_query -    INFO - Matching ST: 1567 (100.00% match)
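
When MiST runs inside a larger pipeline, these two log messages can be turned into a small summary record. The sketch below is a hedged Python example that assumes the log text has already been captured into a string (how it is captured depends on your setup) and that the message wording matches the two lines shown above. The regular expressions and the function name are illustrative, not part of MiST.

```python
import re

# Patterns derived from the two example log lines; adjust them if the wording differs.
DETECTED_RE = re.compile(r"Detected (\d+)/(\d+) loci")
ST_RE = re.compile(r"Matching ST: (\S+) \(([\d.]+)% match\)")


def parse_log(log_text: str) -> dict:
    """Extract the locus counts and the assigned ST from captured log text."""
    summary = {}
    for line in log_text.splitlines():
        detected = DETECTED_RE.search(line)
        if detected:
            summary["detected"] = int(detected.group(1))
            summary["total"] = int(detected.group(2))
            continue
        st = ST_RE.search(line)
        if st:
            summary["st"] = st.group(1)
            summary["pct_match"] = float(st.group(2))
    return summary


if __name__ == "__main__":
    example = (
        "2025-XX-XX 00:00:00 - mist_query - INFO - Detected 7/7 loci (100.00%), "
        "including 0 (potential) novel alleles\n"
        "2025-XX-XX 00:00:00 - mist_query - INFO - Matching ST: 1567 (100.00% match)\n"
    )
    print(parse_log(example))
```

For anything beyond a quick check, the JSON and TSV outputs are a more robust source than log text.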

JSON output

The results.json file contains detailed information about allele calls, alignments, and the assigned sequence type. Example (truncated):

{
  "alleles": {
    "...": {},
    "Oxf_recA": {
      "allele_str": "11",
      "allele_results": [
        {
          "allele": "11",
          "alignment": {
            "seq_id": "ENA|FITR01000016|FITR01000016.1",
            "start": 114660,
            "end": 115030,
            "strand": "-"
          },
          "sequence": null,
          "closest_alleles": null
        }
      ],
      "tags": []
    },
    "Oxf_rpoD": {
      "allele_str": "5",
      "allele_results": [
        {
          "allele": "5",
          "alignment": {
            "seq_id": "ENA|FITR01000063|FITR01000063.1",
            "start": 24250,
            "end": 24762,
            "strand": "+"
          },
          "sequence": null,
          "closest_alleles": null
        }
      ],
      "tags": []
    }
  },
  "profile": {
    "name": "1567",
    "metadata": [
      [
        "ST",
        "1567"
      ],
      [
        "clonal_complex",
        "n/a"
      ],
      [
        "species",
        "Acinetobacter baumannii"
      ]
    ],
    "alleles": {
      "Oxf_gltA": "10",
      "...": "..."
    },
    "pct_match": 100.0
  },
  "metadata": {
    "timestamp": "2025-XX-XXT00:00:00",
    "tool_version": "0.0.1"
  }
}
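
The structure above can be reduced to a per-locus summary with a few lines of Python. This is a minimal sketch that relies only on the keys visible in the example (`alleles`, `allele_str`, `allele_results`, `alignment`, `profile`, `name`, `pct_match`). It treats a missing or null `profile` as "no ST assigned", which is an assumption rather than documented behaviour. The embedded sample is a trimmed copy of the example, and the function names are illustrative.

```python
import json

# Trimmed copy of the example above; a real results.json lists every locus.
SAMPLE = """
{
  "alleles": {
    "Oxf_recA": {
      "allele_str": "11",
      "allele_results": [
        {"allele": "11",
         "alignment": {"seq_id": "ENA|FITR01000016|FITR01000016.1",
                       "start": 114660, "end": 115030, "strand": "-"}}
      ],
      "tags": []
    }
  },
  "profile": {"name": "1567", "pct_match": 100.0}
}
"""


def summarize_calls(result: dict) -> dict:
    """Return the allele call per locus plus the matched profile, if any."""
    profile = result.get("profile") or {}  # assumption: may be absent or null
    return {
        "alleles": {
            locus: call["allele_str"] for locus, call in result["alleles"].items()
        },
        "st": profile.get("name"),
        "pct_match": profile.get("pct_match"),
    }


def hit_locations(result: dict) -> list[str]:
    """Format each reported alignment as 'locus: contig:start-end (strand)'."""
    lines = []
    for locus, call in result["alleles"].items():
        for hit in call.get("allele_results") or []:
            aln = hit.get("alignment")
            if aln:
                lines.append(
                    f"{locus}: {aln['seq_id']}:{aln['start']}-{aln['end']} ({aln['strand']})"
                )
    return lines


if __name__ == "__main__":
    result = json.loads(SAMPLE)
    print(summarize_calls(result))
    print("\n".join(hit_locations(result)))
```

To run it on real output, replace `json.loads(SAMPLE)` with `json.load(handle)` on an opened results.json.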

TSV output

For a simplified view, the results.tsv file lists the allele calls per locus:

locus           allele  is_novel
Oxf_cpn60       4       False
Oxf_gdhB        182     False
Oxf_gltA        10      False
Oxf_gpi         100     False
Oxf_gyrB        12      False
Oxf_recA        11      False
Oxf_rpoD        5       False
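
The table above can be loaded with Python's standard `csv` module, for example to flag loci with novel alleles. This sketch assumes the file is tab-separated with the three column names shown, and that `is_novel` is written as the literal strings `True` or `False`. The sample rows are copied from the table and the function name is illustrative.

```python
import csv
import io

# Two rows copied from the table above, with tab-separated columns.
SAMPLE_TSV = (
    "locus\tallele\tis_novel\n"
    "Oxf_cpn60\t4\tFalse\n"
    "Oxf_gdhB\t182\tFalse\n"
)


def read_calls(handle) -> dict[str, dict]:
    """Map each locus to {'allele': str, 'is_novel': bool}."""
    calls = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        calls[row["locus"]] = {
            "allele": row["allele"],  # kept as text; calls need not be numeric
            "is_novel": row["is_novel"] == "True",
        }
    return calls


if __name__ == "__main__":
    calls = read_calls(io.StringIO(SAMPLE_TSV))
    novel = [locus for locus, call in calls.items() if call["is_novel"]]
    print(f"{len(calls)} loci read, {len(novel)} with novel alleles")
```

For a real run, pass a file handle instead: `with open("results.tsv", newline="") as handle: calls = read_calls(handle)`.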
