Phylogenies

BertBog edited this page Sep 4, 2025 · 2 revisions

This page explains how to use the mlst_phylogeny script to build (cg)MLST-based phylogenies from allele call files.

Overview

mlst_phylogeny takes allele call files (TSV or JSON) as input, filters datasets and loci according to the --min-perc-loci and --min-perc-samples thresholds, and generates:

  • A filtered allele matrix (allele_matrix.tsv)
  • A pairwise distance matrix (distances.tsv)

These outputs can be processed and visualized in tools such as GrapeTree to construct minimum spanning trees.

Input Files

The script accepts allele call files in TSV or JSON format generated by MiST.

Note: At least three datasets are required.

Basic Usage

mlst_phylogeny -i sample1.tsv sample2.tsv sample3.tsv

or with JSON files:

mlst_phylogeny -j sample1.json sample2.json sample3.json

You can mix TSV and JSON inputs:

mlst_phylogeny -i sample1.tsv -j sample2.json sample3.json
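
When there are many allele call files, typing them out by hand becomes error-prone. The following is a minimal sketch that assembles the argument list from a directory, assuming only the -i and -j options shown above. The function name build_command and the directory name calls are hypothetical and not part of MiST.

```python
from pathlib import Path


def build_command(input_dir: str) -> list[str]:
    """Build an mlst_phylogeny argument list from the files in a directory."""
    directory = Path(input_dir)
    tsv_files = sorted(str(p) for p in directory.glob("*.tsv"))
    json_files = sorted(str(p) for p in directory.glob("*.json"))

    # This page states that at least three datasets are required.
    if len(tsv_files) + len(json_files) < 3:
        raise ValueError("mlst_phylogeny needs at least three allele call files")

    command = ["mlst_phylogeny"]
    if tsv_files:
        command += ["-i", *tsv_files]
    if json_files:
        command += ["-j", *json_files]
    return command
```

The result can be passed to subprocess.run(build_command("calls"), check=True). Keep the allele call files in their own directory, so that output files such as allele_matrix.tsv are not picked up as inputs on a later run.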

Command-line Options

usage: mlst_phylogeny [-h] [-i TSV [TSV ...]] [-j JSON [JSON ...]] [-o OUT_MATRIX] [-d OUT_DISTS] [-l MIN_PERC_LOCI] [-s MIN_PERC_SAMPLES]

options:
  -h, --help            show this help message and exit
  -i TSV [TSV ...], --tsv TSV [TSV ...]
  -j JSON [JSON ...], --json JSON [JSON ...]
  -o OUT_MATRIX, --out-matrix OUT_MATRIX
                        Filtered allele matrix (TSV)
  -d OUT_DISTS, --out-dists OUT_DISTS
                        Pairwise distance matrix (TSV)
  -l MIN_PERC_LOCI, --min-perc-loci MIN_PERC_LOCI
                        Minimum percentage of loci that should be present in a dataset
  -s MIN_PERC_SAMPLES, --min-perc-samples MIN_PERC_SAMPLES
                        Minimum percentage of datasets where loci should be present
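
To illustrate what the two percentage thresholds mean, here is a rough Python sketch of this kind of filtering. It is not the script's actual implementation. It assumes that a missing call is written as "-", that datasets are filtered before loci, and that a value equal to the threshold passes. The name filter_matrix is hypothetical.

```python
MISSING = "-"  # assumed marker for a missing allele call


def filter_matrix(profiles, min_perc_loci, min_perc_samples):
    """Filter a {dataset: {locus: allele}} mapping with two percentage thresholds."""
    loci = sorted({locus for calls in profiles.values() for locus in calls})

    def perc_called(pairs):
        """Percentage of (dataset, locus) pairs that have an allele call."""
        pairs = list(pairs)
        if not pairs:
            return 0.0
        called = sum(profiles[d].get(l, MISSING) != MISSING for d, l in pairs)
        return 100 * called / len(pairs)

    # Step 1 (assumed order): keep datasets in which enough loci are called.
    datasets = [
        d for d in profiles
        if perc_called((d, l) for l in loci) >= min_perc_loci
    ]
    # Step 2: keep loci that are called in enough of the remaining datasets.
    kept_loci = [
        l for l in loci
        if perc_called((d, l) for d in datasets) >= min_perc_samples
    ]
    return {d: {l: profiles[d].get(l, MISSING) for l in kept_loci} for d in datasets}
```

For example, with three datasets where one is missing a call at one of three loci, min_perc_loci=80 would drop that dataset (only about 67% of its loci are called), while min_perc_loci=60 with min_perc_samples=100 would keep all datasets but drop the incompletely called locus.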

Output Files

Filtered Allele Matrix (allele_matrix.tsv)

A matrix of allele calls after filtering datasets and loci.

Example:

ID        SAUR0001 SAUR0002 SAUR0003
sample1   1        2        1
sample2   1        -        2
sample3   1        2        1

Pairwise Distance Matrix (distances.tsv)

A symmetric matrix of allelic distances between datasets.

Example (illustrative values; not computed from the allele matrix example above):

ID       sample1  sample2  sample3
sample1  0        2        1
sample2  2        0        1
sample3  1        1        0
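
An allelic distance is typically the number of loci at which two profiles carry different alleles. The sketch below shows one common way to compute it. It assumes that loci with a missing call ("-") in either profile are ignored, which may differ from how mlst_phylogeny treats missing data. The function names and the isolate and locus names in the usage note are hypothetical.

```python
from itertools import combinations

MISSING = "-"  # assumed marker for a missing allele call


def allelic_distance(a, b):
    """Count loci where both profiles have a call and the calls differ."""
    return sum(
        1
        for locus in a.keys() & b.keys()
        if MISSING not in (a[locus], b[locus]) and a[locus] != b[locus]
    )


def distance_matrix(profiles):
    """Return a symmetric {name: {name: distance}} mapping."""
    names = list(profiles)
    dists = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        dists[n][m] = dists[m][n] = allelic_distance(profiles[n], profiles[m])
    return dists
```

Under this assumption, two hypothetical profiles isolateA = {"locus1": "1", "locus2": "2"} and isolateB = {"locus1": "1", "locus2": "-"} have distance 0, because the only differing locus has a missing call.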

Building the Phylogeny

You can construct a phylogeny using GrapeTree:

grapetree --profile allele_matrix.tsv --method MSTreeV2

Note that GrapeTree is not included in the installation, but it can be installed with pip:

pip install grapetree
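
For a quick look at the structure of distances.tsv without GrapeTree, a classic minimum spanning tree can be derived from the distance matrix directly. The sketch below uses Prim's algorithm. It is not GrapeTree's MSTreeV2 method, so the resulting tree may differ. It assumes the file is tab-separated with an ID column followed by one column per dataset, as in the example above. The function names are hypothetical.

```python
import csv


def load_distances(path):
    """Read a square distance matrix from a tab-separated file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)[1:]  # skip the "ID" column label
        return {
            row[0]: dict(zip(header, map(float, row[1:])))
            for row in reader
            if row
        }


def minimum_spanning_tree(dists):
    """Return MST edges as (from, to, distance) using Prim's algorithm."""
    names = list(dists)
    if not names:
        return []
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest edge that connects the tree to a new dataset.
        u, v = min(
            ((a, b) for a in in_tree for b in names if b not in in_tree),
            key=lambda pair: dists[pair[0]][pair[1]],
        )
        edges.append((u, v, dists[u][v]))
        in_tree.append(v)
    return edges
```

Calling minimum_spanning_tree(load_distances("distances.tsv")) returns one edge per dataset beyond the first, which is enough to sketch the tree by hand or pass to a plotting library.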
