Mitochondrial Ontogenetic Phylogeny Estimation
Mope is a program for making inferences about the dynamics of mitochondrial heteroplasmy during different life stages, using approaches from population genetics and phylogenetics. See the preprint or the paper.
Mope is written in Python and requires Python 2.7+ or Python 3.1+. The following Python modules are required:
- numpy
- scipy
- pandas
- h5py for HDF5 data processing
- emcee for ensemble MCMC machinery
- future for Python 2/3 compatibility
- lru-dict for fast caching
All of these, except for emcee, future, and lru-dict, are included with the Anaconda Python distribution.
To install these dependencies, you can use pip:
# pip install -U numpy scipy pandas h5py emcee future lru-dict
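To check that the dependencies are importable after running the pip command above, something like the following sketch can be used. It is not part of mope; the function name `missing_modules` is hypothetical, and the sketch assumes Python 3.4+ (for `importlib.util.find_spec`). Note that the lru-dict package is imported under the name `lru`.

```python
import importlib.util

# Import names for the packages in the pip command above.
# The lru-dict package is imported as "lru".
REQUIRED_MODULES = ["numpy", "scipy", "pandas", "h5py", "emcee", "future", "lru"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only confirms that each module can be located; it does not check versions or the compiled GSL and Cython requirements described next.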
Mope also requires Cython, the Python development headers and libraries, and the GNU Scientific Library development libraries. Many systems will already have Cython. On Ubuntu, the remaining libraries can be installed with the following command:
# apt-get install python-dev libgsl0-dev
To install the most up-to-date version of mope, clone this repository and use Python and distutils:
# git clone https://github.com/ammodramus/mope
# cd mope/
# python setup.py install
This may require superuser privileges; installing in your home directory avoids this.
Mope can also be installed using pip; however, the version on PyPI may not be updated as frequently. To install with pip:
# pip install mope
Again, this command may require superuser privileges on some systems; in this case, use the command
# pip install --user mope
A successful installation will install the mope library for use in Python and an executable script called mope, which can be used to run data analysis and inference, perform simulations, and run a number of other utilities. This executable script should be in the user's PATH after installation.
To obtain example files and scripts, clone this repository rather than installing with pip. Mope is not supported on Windows.
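One quick way to confirm that the mope script ended up on the PATH is sketched below. This is an illustration using only the Python 3 standard library; the helper name `find_executable` is hypothetical and not part of mope.

```python
import shutil


def find_executable(name="mope"):
    """Return the full path of an executable found on the PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("mope was not found on the PATH")
    else:
        print("mope found at " + path)
```

If the script is not found after a `--user` installation, the per-user script directory may need to be added to the PATH.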
Obtaining allele frequency transition files
Likelihood calculations with mope require precomputed allele frequency transition distributions. Mope can download these automatically:
Transition distributions can also be downloaded as a file. Note that if you are downloading this file programmatically (e.g., using wget or curl), you will need to rename the downloaded file, due to limitations of our file-hosting service.
To generate allele frequency transition distributions locally, run
This command will generate many commands to be run in parallel so that the transition distributions can be generated more efficiently.
For usage, try
# mope run --help
Data file
For inference with mope, data can be provided in one of two formats: allele frequency data or allele count data. In each case, the data take the form of a tab-delimited table.
Required columns are the data columns, which have the names of the different tissues in the ontogenetic phylogeny (corresponding to the leaf nodes of the phylogeny), and any age columns for ages corresponding to ontogenetic phylogeny components that accumulate drift and mutation with time.
For allele frequency data, the data columns contain the allele frequencies. For allele count data, the data columns contain the counts of the focal heteroplasmic allele, and additional (required) coverage columns contain the total coverage. Count columns must follow a naming pattern in which x is the name of a data column. See examples/data/ for an example dataset in each format.
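As a rough illustration of the allele frequency format only, the sketch below writes and reads back a small tab-delimited table using the Python 3 standard library. The column `child_blood`, the helper names, and all numeric values are made up for illustration; real column names must match the leaf nodes and age variables of the tree file, and the datasets in examples/data/ remain the authoritative reference for the layout.

```python
import csv
import io

# Hypothetical tissue (leaf node) columns plus one age column.
COLUMNS = ["mother_blood", "child_blood", "mother_age"]

# Made-up values, for illustration only: allele frequencies and an age.
ROWS = [
    {"mother_blood": 0.12, "child_blood": 0.30, "mother_age": 28.5},
    {"mother_blood": 0.05, "child_blood": 0.00, "mother_age": 34.0},
]


def write_frequency_table(rows, columns, handle):
    """Write rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_table(handle):
    """Read a tab-delimited table into a list of dicts (values as strings)."""
    return list(csv.DictReader(handle, delimiter="\t"))


buffer = io.StringIO()
write_frequency_table(ROWS, COLUMNS, buffer)
table_text = buffer.getvalue()
```

Writing to an `io.StringIO` keeps the sketch free of file-system side effects; passing an open file handle instead would produce a file on disk.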
Ontogenetic tree file
Ontogenetic trees are specified in a modified NEWICK format. Each node requires a unique name, and a length. Only alphanumeric characters and underscores are allowed in node names.
Node lengths specify the name of the parameter pair (i.e., the genetic drift and mutation parameters) associated with the branch. Optionally, this parameter name may be multiplied by an age variable, indicating that this parameter is to be interpreted as a branch length that depends on some age. (Note that this age name must be a variable in the data file -- see Data file.)
It is also possible to specify that the genetic drift for a certain parameter is to be modeled as a bottleneck. This is done by appending ^ to the parameter name.
These three ways of specifying a node are demonstrated here for a node named mother_blood:
mother_blood:blo  # simple genetic drift, no dependence on age
mother_blood:blo*mother_age  # rate of accumulation of drift, with mother_age
mother_blood:blo^  # mother_blood is a bottleneck
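To make the three node forms concrete, here is a minimal sketch of how a single `name:length` specification could be split into its parts. It is not mope's own parser: `NodeSpec` and `parse_node` are hypothetical names, and the sketch assumes that the parameter name comes before `*` and that the age and bottleneck forms are not combined.

```python
import re
from collections import namedtuple

# Hypothetical container for the parts of one node specification.
NodeSpec = namedtuple("NodeSpec", ["name", "parameter", "age", "bottleneck"])

# name:parameter, optionally followed by either *age or ^.
# Names are restricted to alphanumeric characters and underscores.
_NODE_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9_]+):(?P<param>[A-Za-z0-9_]+)"
    r"(?:\*(?P<age>[A-Za-z0-9_]+)|(?P<bottleneck>\^))?$"
)


def parse_node(spec):
    """Split a single 'name:length' node specification into its parts."""
    match = _NODE_RE.match(spec.strip())
    if match is None:
        raise ValueError("not a valid node specification: %r" % spec)
    return NodeSpec(
        name=match.group("name"),
        parameter=match.group("param"),
        age=match.group("age"),  # None when there is no age dependence
        bottleneck=match.group("bottleneck") is not None,
    )


if __name__ == "__main__":
    for text in ("mother_blood:blo", "mother_blood:blo*mother_age", "mother_blood:blo^"):
        print(parse_node(text))
```

The sketch handles one node at a time; parsing a full tree would additionally require handling the nested parentheses of the NEWICK format.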
For a complete example, here is the ontogenetic phylogeny used in the original study.
Parameters file (simulations only)
The parameters file specifies simulation parameters. It is a
whitespace-delimited table of parameter names (first column, must match tree
file) and their values (second column). See
examples/params/ for examples.
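A whitespace-delimited table of this kind can be read in a few lines. The sketch below is an illustration rather than mope's own reader: `parse_params` is a hypothetical helper, and it assumes that there is no header line and that every value is numeric.

```python
def parse_params(text):
    """Parse 'name value' lines into a dict mapping names to floats."""
    params = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 columns, found %d" % (line_number, len(fields))
            )
        name, value = fields
        params[name] = float(value)
    return params


if __name__ == "__main__":
    # Made-up value for the 'blo' parameter used in the tree examples.
    print(parse_params("blo    0.05\n"))
```

The parameter names returned here would still need to be checked against the names used in the tree file.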