The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered.
SPP_DCJ
is Integer Linear Program-based algorithm to solve the SPP for natural genomes, i.e., genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model that includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments.
SPP_DCJ
is an extension of DING.
- Python 3 libraries
- Snakemake
The following steps show howto run SPP_DCJ
with Gurobi.
-
SPP_DCJ
requires (i) a given phylogeny and (ii) a table with candidate adjacencies for all genomes corresponding to nodes of the given phylogeny. The candidate adjacency table has the following columns:#Species Gene_1 Ext_1 Species Gene_2 Ext_2 Weight
-
Generate ILP (
-a
and-b
are parameters of the objective function, set here to 0.5 and 0.25, respectively):scripts/spp_dcj.py -m experiments/example/Example1.idmap -a 0.5 -b 0.25 experiments/example/SpeciesTree.txt experiments/example/AdjacenciesExample1.txt > experiments/example/Example1.ilp 2> experiments/example/Example1.spp_dcj.log
-
Run ILP
gurobi_cl ResultFile=experiments/example/Example.sol experiments/example/Example1.ilp > experiments/example/Example1.gurobi.log
-
Extract adjacencies from solution
scripts/sol2adjacencies.py experiments/example/Example1.sol experiments/example/Example1.idmap > experiments/example/PredictedAdjacenciesExample1.txt
-
Visualize candidate and predicted adjacencies
scripts/visualize_genomes.py -i experiments/example/PredictedAdjacenciesExample1.txt experiments/example/AdjacenciesExample1.txt > experiments/example/PredictedAdjacenciesExample1.pdf
An the code above in included in the example in experiments/example
.