PyPopART is a pure Python implementation of PopART (Population Analysis with Reticulate Trees) for constructing and visualizing haplotype networks from DNA sequence data.
- Multiple Network Algorithms: MST, MSN, TCS (Statistical Parsimony), Median-Joining (MJN), Parsimony Network (PN), and Tight Span Walker (TSW)
- Distance Metrics: Hamming, Jukes-Cantor, Kimura 2-parameter, Tamura-Nei
- Comprehensive Analysis: Network statistics, topology analysis, population genetics measures
- Rich Visualization: Static (matplotlib) and interactive (Dash Cytoscape) network plots
- Flexible I/O: Support for FASTA, NEXUS, PHYLIP, GenBank formats
- Command-Line Interface: Easy-to-use CLI for all operations
- Web-based GUI: Interactive Dash application for network construction and visualization
- Python API: Programmatic access for custom workflows
git clone https://github.com/adamtaranto/pypopart.git
cd pypopart
pip install -e ".[dev]"

- Python 3.9 or higher
- Dependencies: biopython, click, matplotlib, networkx, numpy, pandas, plotly, scipy, scikit-learn, numba
PyPopART provides two main interfaces:
1. Command-Line Interface (CLI) - For scripting and batch processing:
pypopart --help
2. Web-based GUI - For interactive analysis:
pypopart-gui
# Opens web interface at http://localhost:8050

pypopart load sequences.fasta
Output:
Loading sequences from sequences.fasta...
✓ Loaded 50 sequences
Alignment length: 500 bp
Alignment Statistics:
Sequences: 50
Length: 500 bp
Variable sites: 25
Parsimony informative: 15
GC content: 48.5%
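The summary values above can be reproduced by hand for a small alignment. The following is a minimal sketch and not PyPopART's own implementation: `alignment_stats` is a hypothetical helper, and it assumes gaps and ambiguity codes are simply ignored and that a site counts as parsimony informative when at least two states each occur at least twice.

```python
from collections import Counter


def alignment_stats(seqs):
    """Return simple summary statistics for equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    variable = 0
    informative = 0
    for i in range(length):
        # Count unambiguous bases in this column (assumption: gaps/N ignored).
        counts = Counter(s[i].upper() for s in seqs if s[i].upper() in "ACGT")
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each seen at least twice.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1

    bases = "".join(seqs).upper()
    acgt = sum(bases.count(b) for b in "ACGT")
    gc = 100.0 * (bases.count("G") + bases.count("C")) / acgt if acgt else 0.0
    return {
        "sequences": len(seqs),
        "length": length,
        "variable_sites": variable,
        "parsimony_informative": informative,
        "gc_content": gc,
    }


stats = alignment_stats(["ACGTAC", "ACGTTC", "ATGTTC", "ATGAAC"])
print(stats)
```

In this toy alignment, three columns vary, but only two of them split the sequences into groups of at least two, so only those two are parsimony informative.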
# Median-Joining Network (default)
pypopart network sequences.fasta -o network.graphml
# Statistical Parsimony (TCS)
pypopart network sequences.fasta -a tcs -o network.graphml
# With custom distance metric
pypopart network sequences.fasta -a mjn -d k2p -o network.graphml

pypopart analyze network.graphml --stats
Output:
Loading network from network.graphml...
✓ Loaded network with 12 nodes
=== Network Statistics ===
nodes: 12
edges: 15
diameter: 4
avg_degree: 2.5
clustering_coefficient: 0.3214
reticulation_index: 0.25
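Some of these values follow directly from the node and edge counts. For an undirected network the average degree is 2 x edges / nodes, which for the output above gives 2 x 15 / 12 = 2.5. The sketch below is illustrative only: `network_stats` is a hypothetical helper, it assumes a connected network, and it measures diameter in edge hops rather than mutational steps. It does not attempt the clustering coefficient or reticulation index.

```python
from collections import deque


def network_stats(edges):
    """Basic statistics for an undirected, connected network of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def eccentricity(start):
        # Breadth-first search: the largest hop count reached from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return max(dist.values())

    n_nodes = len(adj)
    n_edges = len({frozenset(e) for e in edges})  # ignore duplicate edges
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "avg_degree": 2 * n_edges / n_nodes,
        "diameter": max(eccentricity(n) for n in adj),
    }


toy_edges = [("H1", "H2"), ("H2", "H3"), ("H3", "H4"), ("H2", "H4")]
toy_stats = network_stats(toy_edges)
print(toy_stats)
```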
# Static plot (PNG/PDF/SVG)
pypopart visualize network.graphml -o network.png --layout spring
# Interactive HTML
pypopart visualize network.graphml -o network.html --interactive

# List available algorithms
pypopart info --list-algorithms
# Output:
# Available Network Construction Algorithms:
# mst - Minimum Spanning Tree
# msn - Minimum Spanning Network
# tcs - Statistical Parsimony (TCS)
# mjn - Median-Joining Network
# pn - Parsimony Network (consensus from multiple trees)
# tsw - Tight Span Walker (metric-preserving network)
# List distance metrics
pypopart info --list-distances
# List supported formats
pypopart info --list-formats

Launch the interactive Dash application:
# Start GUI on default port 8050
pypopart-gui
# Start on custom port
pypopart-gui --port 8080
# Enable debug mode
pypopart-gui --debug

Once started, open your browser to http://localhost:8050 and follow the workflow:
- Upload Data: Load sequence alignment (FASTA, NEXUS, or PHYLIP) and optional metadata (CSV)
- Configure Algorithm: Choose network algorithm (MST, MSN, TCS, MJN, PN, TSW) and parameters
- Compute Network: Build the haplotype network
- Customize Layout: Adjust node positions, sizes, spacing, and layout algorithms
- Export Results: Download network (GraphML, GML, JSON) or images (PNG, SVG)
Features:
- Interactive network visualization with zoom and pan
- Drag-and-drop node repositioning
- Population-based coloring (pie charts for mixed nodes)
- Search and highlight specific haplotypes
- Real-time statistics and haplotype summary
- Multiple layout algorithms (Spring, Hierarchical, Kamada-Kawai, etc.)
from pypopart.io import load_alignment
from pypopart.core.distance import DistanceCalculator
from pypopart.core.condensation import condense_alignment
from pypopart.algorithms import MJNAlgorithm
from pypopart.visualization import StaticVisualizer
# Load sequences
alignment = load_alignment('sequences.fasta')
# Calculate distances
calculator = DistanceCalculator(method='k2p')
dist_matrix = calculator.calculate_matrix(alignment)
# Identify unique haplotypes
haplotypes, freq_map = condense_alignment(alignment)
# Construct Median-Joining Network
mjn = MJNAlgorithm(epsilon=0)
network = mjn.construct_network(haplotypes, dist_matrix)
# Visualize
viz = StaticVisualizer(network)
viz.plot(layout_algorithm='spring', output_file='network.png')

Minimum Spanning Tree (MST): Creates a tree connecting all haplotypes with minimum total distance.
pypopart network sequences.fasta -a mst -o network.graphml
Use when: You want the simplest possible network structure without reticulation.
Properties: Always produces a tree (no cycles), guaranteed minimum total edge weight.
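To illustrate the minimum spanning tree idea, here is a minimal sketch using Kruskal's algorithm on a small distance matrix. This is a generic textbook approach and may differ from how PyPopART builds its MST; `minimum_spanning_tree` and the toy data are hypothetical.

```python
def minimum_spanning_tree(names, dist):
    """Kruskal's algorithm on a symmetric distance matrix (list of lists)."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent pointers to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs (i < j), cheapest first.
    pairs = sorted(
        (dist[i][j], i, j)
        for i in range(len(names))
        for j in range(i + 1, len(names))
    )
    tree = []
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # joins two components, so it cannot close a cycle
            parent[ri] = rj
            tree.append((names[i], names[j], d))
    return tree


names = ["H1", "H2", "H3", "H4"]
dist = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
mst = minimum_spanning_tree(names, dist)
print(mst)
```

A tree on n haplotypes always has n - 1 edges, which is why the result contains no cycles.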
Minimum Spanning Network (MSN): Extends MST by adding alternative connections at equal distance.
pypopart network sequences.fasta -a msn -o network.graphml
Use when: You want to show alternative evolutionary pathways at the same genetic distance.
Properties: Includes all edges tied for minimum distance, may contain reticulations.
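The tie-handling can be sketched by processing distances level by level: at each distance, every edge that links two components (as they stood before that level) is kept, and components are merged only afterwards. This is one common formulation of a minimum spanning network, not necessarily PyPopART's exact procedure; the function name and data are hypothetical, and exact ties are assumed to come from integer (for example Hamming) distances.

```python
from itertools import groupby


def minimum_spanning_network(names, dist):
    """Keep all edges tied for joining components at each distance level."""
    n = len(names)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    network = []
    for d, group in groupby(pairs, key=lambda p: p[0]):
        # Edges at this distance that link two currently separate components.
        kept = [(i, j) for _, i, j in group if find(i) != find(j)]
        for i, j in kept:
            network.append((names[i], names[j], d))
        # Merge only after all ties at this level have been collected.
        for i, j in kept:
            parent[find(i)] = find(j)
    return network


names = ["H1", "H2", "H3", "H4"]
dist = [
    [0, 1, 1, 3],
    [1, 0, 1, 3],
    [1, 1, 0, 2],
    [3, 3, 2, 0],
]
network = minimum_spanning_network(names, dist)
print(network)
```

Here H1, H2 and H3 are all one step apart, so all three edges are retained and form a cycle, whereas a spanning tree would keep only two of them.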
TCS (Statistical Parsimony): Connects haplotypes within a parsimony probability limit (default 95%).
pypopart network sequences.fasta -a tcs -p 0.95 -o network.graphml
Use when: You want statistically justified connections based on parsimony.
Properties: Uses connection limits based on parsimony probability, good for intraspecific data.
Median-Joining Network (MJN): Infers ancestral/median sequences and creates a reticulate network.
pypopart network sequences.fasta -a mjn -e 0 -o network.graphml
Use when: You want to infer ancestral haplotypes and show complex evolutionary relationships.
Properties:
- Infers median vectors (ancestral nodes)
- Epsilon parameter controls complexity (0 = maximum simplification)
- Handles reticulation and homoplasy
- Good for closely related sequences
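The core operation behind median vectors can be shown on three aligned sequences: the median takes the majority state at each site. The sketch below covers only that single step, not the full median-joining procedure; `median_vector` is a hypothetical helper, and returning `None` when a site has three different states is a simplification chosen for this example.

```python
from collections import Counter


def median_vector(a, b, c):
    """Per-site majority consensus of three aligned sequences.

    Returns None when some site has three different states, because the
    median is then not unique (a simplification for this sketch).
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("sequences must have equal length")
    out = []
    for states in zip(a, b, c):
        state, count = Counter(states).most_common(1)[0]
        if count < 2:
            return None
        out.append(state)
    return "".join(out)


median = median_vector("AAC", "ATT", "GTC")
print(median)
```

In this example the median differs from each of the three inputs at exactly one site, so it would act as an unsampled intermediate node connecting them.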
Parsimony Network (PN): Creates a consensus network by sampling edges from multiple random parsimony trees.
pypopart network sequences.fasta -a pn -o network.graphml
Use when: You want a consensus approach that captures phylogenetic uncertainty across multiple tree topologies.
Properties:
- Samples 100 random parsimony trees by default
- Includes edges that appear frequently across trees
- Can represent reticulation where multiple edges have similar frequencies
- Automatically infers median vertices for multi-mutation edges
- Good for datasets with phylogenetic uncertainty
Tight Span Walker (TSW): Constructs networks using the tight span of the distance matrix, preserving all metric properties.
pypopart network sequences.fasta -a tsw -o network.graphml
Use when: You need accurate metric-preserving networks for complex evolutionary relationships with reticulation.
Properties:
- Preserves all metric properties of the distance matrix
- Automatically infers ancestral/median sequences
- Best for small to medium datasets (n < 100)
- Computationally intensive but highly accurate
- Handles reticulation and complex evolutionary patterns
- hamming: Simple count of differences (fastest)
- jc: Jukes-Cantor correction for multiple substitutions
- k2p: Kimura 2-parameter (transitions vs transversions)
- tamura_nei: Accounts for GC content and transition/transversion bias
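For reference, the first three metrics can be written out from their standard formulas. This is a minimal sketch and independent of PyPopART's `DistanceCalculator`: the function names are hypothetical, and the code assumes uppercase A/C/G/T sequences of equal length with no gaps. The corrected distances are undefined (a math domain error) once sequences are too divergent.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def hamming(a, b):
    """Number of sites at which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def jukes_cantor(a, b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), p = proportion differing."""
    p = hamming(a, b) / len(a)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(a, b):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) rates."""
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P = transitions / len(a)
    Q = transversions / len(a)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


seq_a = "ACGTACGTAC"
seq_b = "GCGTACGTAA"  # one transition (A>G) and one transversion (C>A)
print(hamming(seq_a, seq_b), jukes_cantor(seq_a, seq_b), kimura_2p(seq_a, seq_b))
```

Both corrected distances come out slightly larger than the raw proportion of differing sites (0.2), reflecting the allowance for multiple substitutions at the same site.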
pypopart network sequences.fasta -d k2p -o network.graphml

Input formats:
- FASTA (.fasta, .fa, .fna)
- NEXUS (.nexus, .nex) - including traits/metadata
- PHYLIP (.phy, .phylip)
- GenBank (.gb, .gbk)

Output formats:
- GraphML (.graphml) - Recommended, preserves all attributes
- GML (.gml)
- JSON (.json)
- NEXUS (.nexus, .nex)
- PNG (.png) - Raster image
- SVG (.svg) - Vector, web-friendly
# Load alignment with population metadata
pypopart load sequences.fasta -m metadata.csv
# Visualize colored by population
pypopart visualize network.graphml -o network.png --color-by population

Metadata CSV format:
id,population,latitude,longitude,color,notes
Hap1,PopA,,,,
Hap2,PopA,,,,
Hap3,PopB,,,,
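A metadata file in this layout can be checked before use with Python's standard `csv` module. The sketch below groups haplotype IDs by population, using rows like those shown above; `haplotypes_by_population` is a hypothetical helper and not part of the PyPopART API.

```python
import csv
import io
from collections import defaultdict


def haplotypes_by_population(csv_text):
    """Group the `id` column by the `population` column of a metadata CSV."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        groups[row["population"]].append(row["id"])
    return dict(groups)


example = """id,population,latitude,longitude,color,notes
Hap1,PopA,,,,
Hap2,PopA,,,,
Hap3,PopB,,,,
"""
groups = haplotypes_by_population(example)
print(groups)
```

To read from disk instead, pass the contents of the file, for example `haplotypes_by_population(open("metadata.csv").read())`.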
# Comprehensive statistics
pypopart analyze network.graphml --stats --topology --popgen -o results.json

pypopart analyze network.graphml --topology
Identifies:
- Connected components
- Star-like patterns
- Central/hub nodes
- Potential ancestral nodes
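As a rough illustration of what such a topology scan involves, the sketch below finds connected components, high-degree hub nodes, and hubs surrounded by leaves (a star-like pattern) in an edge list. The criteria are assumptions made for this example: `topology_summary` and its `hub_degree` threshold are hypothetical and do not describe PyPopART's actual rules. Nodes with no edges are not represented.

```python
def topology_summary(edges, hub_degree=3):
    """Find components, hubs, and star centers in an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Connected components via iterative depth-first search.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)

    # Hubs: nodes with at least `hub_degree` neighbors (assumed threshold).
    hubs = sorted(n for n, nbrs in adj.items() if len(nbrs) >= hub_degree)
    # Star centers: hubs with at least `hub_degree` leaf (degree-1) neighbors.
    stars = [
        h for h in hubs
        if sum(len(adj[n]) == 1 for n in adj[h]) >= hub_degree
    ]
    return {"components": components, "hubs": hubs, "star_centers": stars}


summary = topology_summary(
    [("H1", "H2"), ("H1", "H3"), ("H1", "H4"), ("H5", "H6")]
)
print(summary)
```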
# Circular layout with labels
pypopart visualize network.graphml -o network.pdf \
--layout circular --show-labels --width 1200 --height 1200
# Radial layout, interactive
pypopart visualize network.graphml -o network.html \
--layout radial --interactive

Example data and Jupyter notebooks can be found in the examples/ directory:
- 01_basic_workflow.ipynb - Complete workflow from sequences to network
- 02_algorithm_comparison.ipynb - Comparing different network algorithms
- 03_visualization_options.ipynb - Customizing network plots
Full documentation will be available at https://pypopart.readthedocs.io (coming soon).
Topics covered:
- Installation and setup
- Detailed API reference
- Algorithm descriptions and parameters
- Visualization customization
- Population genetics measures
- File format specifications
- Troubleshooting guide
If you use PyPopART in your research, please cite the original PopART paper as well as this repository:
Leigh, J.W., Bryant, D. and Nakagawa, S., 2015. POPART: full-feature software for haplotype network construction. Methods in Ecology & Evolution, 6(9).
Taranto, A. (2025). PyPopART: Pure Python implementation of haplotype network analysis.
GitHub repository: https://github.com/adamtaranto/pypopart
PyPopART implements algorithms from the following publications:
- Minimum Spanning Tree/Network: Excoffier, L. & Smouse, P. E. (1994). Using allele frequencies and geographic subdivision to reconstruct gene trees within a species: molecular variance parsimony. Genetics, 136(1), 343-359.
- TCS (Statistical Parsimony): Clement, M., Posada, D., & Crandall, K. A. (2000). TCS: a computer program to estimate gene genealogies. Molecular Ecology, 9(10), 1657-1659.
- Median-Joining Network: Bandelt, H. J., Forster, P., & Röhl, A. (1999). Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution, 16(1), 37-48.
- Parsimony Network: Excoffier, L. & Smouse, P. E. (1994). Using allele frequencies and geographic subdivision to reconstruct gene trees within a species: molecular variance parsimony. Genetics, 136(1), 343-359.
- Tight Span Walker: Dress, A. W., Huber, K. T., Koolen, J., Moulton, V., & Spillner, A. (2012). Basic Phylogenetic Combinatorics. Cambridge University Press.
PyPopART is licensed under the GNU General Public License v3.0 or later. See LICENSE for details.
git clone https://github.com/adamtaranto/pypopart.git
cd pypopart
pip install -e ".[dev]"
pre-commit install

- PopART - Original PopART software
PyPopART is a Python port of the original PopART software developed by Jessica Leigh.
- Author: Adam Taranto
- GitHub: @adamtaranto
- Issues: GitHub Issues