Phylogenies
This page explains how to use the mlst_phylogeny script to build (cg)MLST-based phylogenies from allele call files.
mlst_phylogeny takes as input allele call files (TSV or JSON), filters loci and datasets according to quality thresholds, and generates:
- A filtered allele matrix (allele_matrix.tsv)
- A pairwise distance matrix (distances.tsv)
These outputs can be processed and visualized in tools such as GrapeTree to construct minimum spanning trees.
The script accepts allele call files in TSV or JSON format generated by MiST.
Note: At least three datasets are required.
mlst_phylogeny -i sample1.tsv sample2.tsv sample3.tsv
Or with JSON files:
mlst_phylogeny -j sample1.json sample2.json sample3.json
You can mix TSV and JSON inputs:
mlst_phylogeny -i sample1.tsv -j sample2.json sample3.json
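When the input files come from a script rather than the shell, the same command line can be assembled in Python. The following is a minimal sketch: `build_command` is a hypothetical helper (not part of mlst_phylogeny) that sorts paths by extension into the `-i` and `-j` groups and enforces the three-dataset minimum before anything is run. The default output names are taken from the file names mentioned on this page.

```python
from pathlib import Path


def build_command(paths, out_matrix="allele_matrix.tsv", out_dists="distances.tsv"):
    """Build an mlst_phylogeny argument list from a mixed list of input files.

    Hypothetical helper: files are assigned to -i or -j by extension,
    and paths with any other extension are ignored.
    """
    tsv = [str(p) for p in paths if Path(p).suffix.lower() == ".tsv"]
    json_files = [str(p) for p in paths if Path(p).suffix.lower() == ".json"]
    if len(tsv) + len(json_files) < 3:
        raise ValueError("at least three datasets are required")
    cmd = ["mlst_phylogeny"]
    if tsv:
        cmd += ["-i", *tsv]
    if json_files:
        cmd += ["-j", *json_files]
    cmd += ["-o", out_matrix, "-d", out_dists]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.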
usage: mlst_phylogeny [-h] [-i TSV [TSV ...]] [-j JSON [JSON ...]]
                      [-o OUT_MATRIX] [-d OUT_DISTS] [-l MIN_PERC_LOCI]
                      [-s MIN_PERC_SAMPLES]

options:
  -h, --help            show this help message and exit
  -i TSV [TSV ...], --tsv TSV [TSV ...]
  -j JSON [JSON ...], --json JSON [JSON ...]
  -o OUT_MATRIX, --out-matrix OUT_MATRIX
                        Filtered allele matrix (TSV)
  -d OUT_DISTS, --out-dists OUT_DISTS
                        Pairwise distance matrix (TSV)
  -l MIN_PERC_LOCI, --min-perc-loci MIN_PERC_LOCI
                        Minimum percentage of loci that should be present in a dataset
  -s MIN_PERC_SAMPLES, --min-perc-samples MIN_PERC_SAMPLES
                        Minimum percentage of datasets where loci should be present
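To make the two thresholds concrete, here is a small Python sketch of what this kind of filtering could look like. It is an illustration only, not the script's actual code: the function name `filter_calls`, the use of "-" for a missing call, and the order of the two steps (datasets first, then loci) are all assumptions.

```python
MISSING = "-"  # assumption: a missing allele call is written as "-"


def filter_calls(calls, min_perc_loci, min_perc_samples):
    """Filter {dataset: {locus: allele}} by two percentage thresholds.

    Hypothetical helper; the order of the two steps is an assumption.
    """
    loci = sorted({locus for row in calls.values() for locus in row})

    def present(dataset, locus):
        return calls[dataset].get(locus, MISSING) != MISSING

    # Step 1 (assumed to run first): keep datasets with enough called loci.
    datasets = [
        d for d in calls
        if loci and 100 * sum(present(d, l) for l in loci) / len(loci) >= min_perc_loci
    ]
    # Step 2: keep loci that are called in enough of the remaining datasets.
    kept_loci = [
        l for l in loci
        if datasets
        and 100 * sum(present(d, l) for d in datasets) / len(datasets) >= min_perc_samples
    ]
    return {d: {l: calls[d].get(l, MISSING) for l in kept_loci} for d in datasets}
```

With three datasets and three loci, a dataset that is missing one locus has 66.7% of loci present, so it would survive `min_perc_loci=60` but be dropped at `min_perc_loci=90`.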
allele_matrix.tsv: a matrix of allele calls after filtering datasets and loci.
Example:
ID SAUR0001 SAUR0002 SAUR0003
sample1 1 2 1
sample2 1 - 2
sample3 1 2 1
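For downstream checks, the allele matrix can be loaded with Python's standard `csv` module. This sketch assumes the layout of the example (tab-separated, one header row, dataset ID in the first column, "-" for a missing call). The helper names `read_allele_matrix` and `missing_per_dataset` are made up for illustration.

```python
import csv
import io


def read_allele_matrix(handle):
    """Parse a tab-separated allele matrix into {dataset: {locus: allele}}."""
    reader = csv.reader(handle, delimiter="\t")
    loci = next(reader)[1:]  # the first column holds the dataset ID
    return {row[0]: dict(zip(loci, row[1:])) for row in reader if row}


def missing_per_dataset(matrix, missing="-"):
    """Count missing calls per dataset (assumes "-" marks a missing call)."""
    return {d: sum(a == missing for a in row.values()) for d, row in matrix.items()}


# In-memory copy of the example matrix, with tabs as separators.
example = (
    "ID\tSAUR0001\tSAUR0002\tSAUR0003\n"
    "sample1\t1\t2\t1\n"
    "sample2\t1\t-\t2\n"
    "sample3\t1\t2\t1\n"
)
matrix = read_allele_matrix(io.StringIO(example))
print(missing_per_dataset(matrix))  # {'sample1': 0, 'sample2': 1, 'sample3': 0}
```

For a real file, pass `open("allele_matrix.tsv", newline="")` instead of the `io.StringIO` object.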
distances.tsv: a symmetric matrix of allelic distances between datasets.
Example:
ID sample1 sample2 sample3
sample1 0 2 1
sample2 2 0 1
sample3 1 1 0
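An allelic distance is commonly computed as the number of loci at which two profiles carry different alleles. The sketch below shows that idea in Python. Whether mlst_phylogeny skips or counts loci with a missing call is not stated here, so the `ignore_missing` switch, the "-" marker, and both function names are assumptions made for illustration.

```python
from itertools import combinations

MISSING = "-"  # assumption: a missing allele call is written as "-"


def allelic_distance(a, b, ignore_missing=True):
    """Count shared loci at which two {locus: allele} profiles differ."""
    dist = 0
    for locus in a.keys() & b.keys():  # loci found in only one profile are skipped
        x, y = a[locus], b[locus]
        if ignore_missing and MISSING in (x, y):
            continue  # assumed behaviour: a missing call adds no distance
        if x != y:
            dist += 1
    return dist


def distance_matrix(profiles, ignore_missing=True):
    """Return a symmetric {name: {name: distance}} mapping with a zero diagonal."""
    names = list(profiles)
    dists = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = allelic_distance(profiles[n], profiles[m], ignore_missing)
        dists[n][m] = dists[m][n] = d
    return dists
```

Computing each pair once and mirroring the value keeps the result symmetric by construction.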
You can construct a phylogeny using GrapeTree:
grapetree --profile allele_matrix.tsv --method MSTreeV2
Note that GrapeTree is not included in the installation, but it can be installed with pip:
pip install grapetree