LINcodes

Bioinformatics Platform (Belgian Institute of Public Health) edited this page Sep 26, 2025 · 2 revisions

This tutorial describes how to retrieve partial LINcodes with MiST.

For more information on LINcodes, please refer to the official documentation.

1. Download the scheme

The Klebsiella LINcodes are defined using the scgMLST629_S cgMLST scheme, which is hosted on the Institut Pasteur BIGSdb instance.

To download the scheme, first identify its URL:

mist list --downloader bigsdb --host pasteur  # List the available sub-databases on the BIGSdb instance
mist list --downloader bigsdb --host pasteur --db pubmlst_klebsiella_seqdef  # List Klebsiella schemes

Afterwards, retrieve the scgMLST629_S scheme and its profiles:

mist download \
  --downloader bigsdb \
  --url https://bigsdb.pasteur.fr/api/db/pubmlst_klebsiella_seqdef/schemes/18 \
  --output kleb_scgmlst_s \
  --include-profiles

⚠️ Note: Downloading profiles can take a long time because of the large number of entries (this is a limitation of BIGSdb, not of MiST).

💡 Tip: The example above downloads without authentication. To obtain the latest version of the scheme, use the bigsdb_auth downloader. See Downloading schemes for details.


2. Create the MiST index

Once the scheme is downloaded, create a MiST index:

mist index \
  --fasta-list kleb_scgmlst_s/fasta_list.txt \
  --profiles kleb_scgmlst_s/profiles.tsv \
  --output kleb_scgmlst_s-index \
  --threads 8

3. Call alleles

Download a Klebsiella genome in FASTA format (or use your own):

curl -L -o GCA_048969535.fasta \
  "https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_048969535.1?download=true&gzip=false"

Call the alleles with MiST:

mist call \
  --fasta GCA_048969535.fasta \
  --db kleb_scgmlst_s-index \
  --out-json GCA_048969535.json
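
To call alleles for several genomes, the same command can be generated once per FASTA file. The sketch below is only an illustration: it uses the three `mist call` flags shown above and nothing else, and the function names and the `genomes` and `mist_out` directory names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_call_command(fasta: Path, index_dir: Path, out_dir: Path) -> List[str]:
    """Build the argument list for one `mist call` run.

    Only the flags shown in this tutorial are used
    (--fasta, --db, --out-json).
    """
    out_json = out_dir / f"{fasta.stem}.json"
    return [
        "mist", "call",
        "--fasta", str(fasta),
        "--db", str(index_dir),
        "--out-json", str(out_json),
    ]


def run_batch(genomes_dir: Path, index_dir: Path, out_dir: Path) -> None:
    """Run `mist call` on every *.fasta file in genomes_dir (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for fasta in sorted(genomes_dir.glob("*.fasta")):
        # check=True stops the batch as soon as one call fails.
        subprocess.run(build_call_command(fasta, index_dir, out_dir), check=True)
```

For example, `run_batch(Path("genomes"), Path("kleb_scgmlst_s-index"), Path("mist_out"))` would write one JSON file per genome into `mist_out`.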

The log will display the best-matching scgST:

2025-09-26 11:30:24,271 -      mistcaller -    INFO - Matching ST: 4362 (99.21% match)
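
If the log is captured to a file, the matching ST and its match percentage can be pulled out of this line with a regular expression. This is a minimal sketch that assumes the message keeps the exact wording shown above; the function name is hypothetical, and the JSON output is likely a more robust source for the same information.

```python
import re
from typing import Optional, Tuple

# Matches the message part of the log line, e.g.
# "Matching ST: 4362 (99.21% match)"
MATCH_RE = re.compile(r"Matching ST:\s*(\S+)\s*\((\d+(?:\.\d+)?)% match\)")


def parse_matching_st(log_line: str) -> Optional[Tuple[str, float]]:
    """Return (ST, match percentage) from a log line, or None if absent."""
    found = MATCH_RE.search(log_line)
    if found is None:
        return None
    # The ST is kept as a string because it is an identifier, not a quantity.
    return found.group(1), float(found.group(2))
```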

4. Extract partial LINcodes

Use the provided helper script to extract partial LINcodes from the MiST JSON output:

python mist_to_partial_lin.py GCA_048969535.json

Example output:

Best matching: scgST-4362
Number of matches: 624/629
LINcode for scgMST-4362: 0_0_369_0_0_0_0_35_0_0
Partial LINcode for input strain: 0_0_369_0_0_0_*_*_*

The MiST JSON output also contains the phylogroup, sublineage, and clonal group of the best-matching scgST.
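
The idea behind a partial LINcode can be sketched as follows. Each LINcode position corresponds to an allelic-mismatch threshold, and the thresholds decrease from left to right. A strain that differs from the best-matching profile by a given number of alleles can inherit only the positions whose threshold is at least that number; the remaining positions are masked with `*`. The code below is an illustration under those assumptions and is not the implementation of `mist_to_partial_lin.py`. The function name is hypothetical and the thresholds in the usage example are made up; the real thresholds are defined in the official LINcode documentation.

```python
from typing import Sequence


def partial_lincode(reference_lincode: str, mismatches: int,
                    thresholds: Sequence[int]) -> str:
    """Mask the LINcode positions that cannot be inferred from the reference.

    thresholds[i] is the maximum number of allelic mismatches allowed for
    position i to be shared; values are expected in decreasing order.
    """
    positions = reference_lincode.split("_")
    if len(positions) != len(thresholds):
        raise ValueError("LINcode and thresholds must have the same length")
    if mismatches < 0:
        raise ValueError("mismatches must be non-negative")

    result = []
    masked = False
    for value, threshold in zip(positions, thresholds):
        # Once one position is out of reach, all later positions are too.
        if mismatches > threshold:
            masked = True
        result.append("*" if masked else value)
    return "_".join(result)


# Usage with made-up thresholds and a made-up four-position code:
print(partial_lincode("1_4_2_7", 5, (100, 10, 2, 0)))  # prints 1_4_*_*
```

Masking everything after the first out-of-reach position keeps the result a proper prefix, which is the form of the partial LINcode in the example output.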

Disclaimer

When using LINcodes, please cite the original publication.
