Tutorial

BertBog edited this page Sep 4, 2025 · 3 revisions

This tutorial shows how to download, index, and query the PubMLST MLST scheme for Acinetobacter baumannii using MiST.

1. Download the scheme

Download the A. baumannii MLST scheme from PubMLST.org. The URL below points to scheme 1, the Oxford scheme, whose loci carry the Oxf_ prefix:

mist_download \
  --downloader bigsdb \
  --url https://rest.pubmlst.org/db/pubmlst_abaumannii_seqdef/schemes/1 \
  --output mlst \
  --include-profiles

After completion, the mlst directory will contain:

fasta_list.txt
mist.log
Oxf_cpn60.fasta
Oxf_gdhB.fasta
Oxf_gltA.fasta
Oxf_gpi.fasta
Oxf_gyrB.fasta
Oxf_recA.fasta
Oxf_rpoD.fasta
profiles.tsv
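
As a quick sanity check on the download, the number of alleles per locus can be counted. The sketch below assumes that each allele is stored as one FASTA record (one '>' header line) in the per-locus files listed above. count_alleles is a hypothetical helper name, not part of MiST.

```shell
# Hypothetical helper (not part of MiST): print "<locus><TAB><allele count>"
# for every .fasta file in a scheme directory.
# Assumes one FASTA record (one '>' header line) per allele.
count_alleles() {
  dir="$1"
  for f in "$dir"/*.fasta; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
    printf '%s\t%s\n' "$(basename "$f" .fasta)" "$(grep -c '^>' "$f")"
  done
}
```

Running count_alleles mlst after the download should print one line per locus file. A locus reported with 0 alleles would suggest an incomplete download.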

2. Create the index

Build an index from the downloaded scheme:

mist_index \
  --fasta mlst/*.fasta \
  -p mlst/profiles.tsv \
  --output mlst_idx

The mlst_idx directory should now contain:

loci_repr.fasta
loci_repr.fasta.mni
loci.txt
Oxf_cpn60
Oxf_gdhB
Oxf_gltA
Oxf_gpi
Oxf_gyrB
Oxf_recA
Oxf_rpoD
profiles.tsv

3. Query the scheme

Download an A. baumannii genome assembly (or use your own FASTA file). The example below fetches assembly GCA_900020545.1 from ENA:

curl -L -o GCA_900020545.1.fasta \
  "https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_900020545.1?download=true&gzip=false"
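
A failed download can leave an empty file or an error page under the expected file name, so it may be worth checking the file before querying. The sketch below tests only that the file is non-empty and starts with '>'. It is a rough plausibility check, not a full FASTA validation, and looks_like_fasta is a hypothetical helper name, not part of MiST.

```shell
# Hypothetical helper (not part of MiST): succeed only if the file is
# non-empty and its first character is '>', as expected for a FASTA file.
looks_like_fasta() {
  [ -s "$1" ] && [ "$(head -c 1 "$1")" = ">" ]
}
```

For example, looks_like_fasta GCA_900020545.1.fasta && grep -c '^>' GCA_900020545.1.fasta prints the number of sequences in the assembly if the check passes.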

Run the query:

mist_query \
  --db mlst_idx/ \
  --fasta GCA_900020545.1.fasta \
  --out-json results.json \
  --out-tsv results.tsv \
  -t 4

4. Inspect the results

During the run, MiST logs the number of detected loci and the assigned ST:

2025-XX-XX 00:00:00 -      mist_query -    INFO - Detected 7/7 loci (100.00%), including 0 (potential) novel alleles
2025-XX-XX 00:00:00 -      mist_query -    INFO - Matching ST: 1567 (100.00% match)

JSON output

The results.json file contains detailed information about allele calls, alignments, and the assigned sequence type. Example (truncated):

{
  "alleles": {
    "...": {},
    "Oxf_recA": {
      "allele_str": "11",
      "allele_results": [
        {
          "allele": "11",
          "alignment": {
            "seq_id": "ENA|FITR01000016|FITR01000016.1",
            "start": 114660,
            "end": 115030,
            "strand": "-"
          },
          "sequence": null,
          "closest_alleles": null
        }
      ],
      "tags": []
    },
    "Oxf_rpoD": {
      "allele_str": "5",
      "allele_results": [
        {
          "allele": "5",
          "alignment": {
            "seq_id": "ENA|FITR01000063|FITR01000063.1",
            "start": 24250,
            "end": 24762,
            "strand": "+"
          },
          "sequence": null,
          "closest_alleles": null
        }
      ],
      "tags": []
    }
  },
  "profile": {
    "name": "1567",
    "metadata": [
      [
        "ST",
        "1567"
      ],
      [
        "clonal_complex",
        "n/a"
      ],
      [
        "species",
        "Acinetobacter baumannii"
      ]
    ],
    "alleles": {
      "Oxf_gltA": "10",
      "...": "..."
    },
    "pct_match": 100.0
  },
  "metadata": {
    "timestamp": "2025-XX-XXT00:00:00",
    "tool_version": "0.0.1"
  }
}
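
The fields shown in the example can be extracted on the command line. The sketch below assumes that the third-party tool jq is installed and that results.json follows the structure of the example above. The function names are hypothetical and not part of MiST.

```shell
# Hypothetical helpers (not part of MiST); both require jq.

# Print the assigned ST (the "name" field of the "profile" object).
assigned_st() {
  jq -r '.profile.name' "$1"
}

# Print "<locus><TAB><allele_str>" for every locus under "alleles".
allele_calls() {
  jq -r '.alleles | to_entries[] | [.key, .value.allele_str] | @tsv' "$1"
}
```

With the example above, assigned_st results.json would print 1567, and allele_calls results.json would print one line per locus.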

TSV output

For a simplified view, the results.tsv file lists the allele calls per locus:

locus      allele  is_novel
Oxf_cpn60  4       False
Oxf_gdhB   182     False
Oxf_gltA   10      False
Oxf_gpi    100     False
Oxf_gyrB   12      False
Oxf_recA   11      False
Oxf_rpoD   5       False
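
The TSV output is convenient for scripted checks, such as listing loci with a potential novel allele. The sketch below assumes that the file on disk is tab-separated (as the .tsv extension suggests), has a header row, and stores True or False in the third column, as in the example above. novel_loci is a hypothetical helper name, not part of MiST.

```shell
# Hypothetical helper (not part of MiST): list loci flagged as novel.
# Assumes a tab-separated file with a header row and "True"/"False"
# in the third (is_novel) column.
novel_loci() {
  awk -F'\t' 'NR > 1 && $3 == "True" { print $1 }' "$1"
}
```

For the example above, novel_loci results.tsv prints nothing, because every locus has is_novel set to False.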
