Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time



First: Clone this repository

git clone
cd PhylogicNDT

Then either :

Manual Install

Install python 2.7, R (optional) and required packages For debian:

apt-get install python-pip build-essential python-dev r-base r-base-dev git graphviz libgraphviz-dev

Install setuptools and wheel

pip install setuptools wheel

Install required packages

pip install -r req

Install scipy, matplotlib, and pandas (these versions are recommended)

pip install pandas==0.19.2 scipy==1.0.0 matplotlib==2.0.0
pip install -e git+ (for faster compute)

Docker Install

Install docker from

docker build --tag phylogicndt . 

Using the Package

./ --help

If running from the docker, first run:

docker run -i -t phylogicndt
cd phylogicndt


To run clustering on the provided sample input data:

To specify inputs:

./ Cluster -i Patient_ID  -s Sample1_id:Sample1_maf:Sample1_CN_seg:Sample1_Purity:Sample1_Timepoint -s Sample2_id:Sample2_maf:Sample2_CN_seg:Sample2_Purity:Sample2_Timepoint ... SampleN_info 

alternatively - provide a tsv sample_information_file (.sif)

with headers: sample_id maf_fn seg_fn purity timepoint

./ Cluster -i Patient_ID  -sif Patient.sif

the .maf should contain pre-computed raw ccf histograms based on mutations alt/ref count (Absolute annotated mafs or .Rdata files are also supported) if the ccf histograms are absent - the --maf_input_type flag must be set to calc_ccf and sample purity must be provided. Also local copy number must be attached to each mutation in the maf with columns named local_cn_a1 and local_cn_a2

CN_seg is optional to annotate copy-number information on the trees

To specify number of iterations:

./ Cluster -ni 1000

Acknowledgment: Clustering Module is partially inspired (primary 1D clustering) by earlier work of Carter & Getz (Landau D, Carter S , Stojanov P et al. Cell 152, 714–726, 2013)

BuildTree (and GrowthKinetics)

The GrowthKinetics module fully incorporates the BuildTree libraries, so when rates are desired, there is no need to run both.

  • The -w flag should provide a measure of tumor burden, with one value per input sample maf in clustering. When ommited, stable tumor burden is assumed.
  • The -t flag should provide relative time for spacing the samples. When omitted, equal spacing is assumed.

Just BuildTree

./ BuildTree -i Indiv_ID -sif Patient.sif  -m mutation_ccf_file -c cluster_ccf_file 


./ GrowthKinetics -i Indiv_ID -sif Patient.sif -ab cell_population_abundance_mcmc_trace -w 10 10 10 10 10 -t 1 2 3 4 5 

Run Cluster together with BuildTree

./ Cluster -i Patient_ID  -sif Patient.sif -rb


SinglePatientTiming requires a maf input and a seg file input for each sample. The maf file should be the output of PhylogicNDT Clustering module. The seg file should have the following columns:

Chromosome  Start   End A1.Seg.CN   A2.Seg.CN

To run SinglePatientTiming:

./ Timing -i Indiv_ID -sif Patient.sif


LeagueModel requires an input of comparison tables. The comparison tables should be the output of SinglePatientTiming ending in ".comp.tsv"

To run LeagueModel:

./ LeagueModel -cohort Cohort -comps comp1 comp2 ... compN

Alternatively, one can use a single aggregated table. The table should have the following columns:

sample  event1  event2  p_event1_win    p_event2_win    unknown

To run with the aggregated table:

./ LeagueModel -cohort Cohort -comparison_cn comps


A simulation module is provided for convenience.

./ PhylogicSim --help

Command to visualize all the options and help.

./ PhylogicSim 

Run the simulation with the default paramters.

./ PhylogicSim -i MySimulation

Specify a prefix for all the output files

./ PhylogicSim -i MySimulation -ns 7

Specify the number of samples you want to simulate.

./ PhylogicSim -i MySimulation -nodes 5

Specify the number of distinct clones present in your samples. Minimum 2 (The first clone is always the clonal clone)

./ PhylogicSim -i MySimulation -nodes 5 -seg /Example_SegFile.txt

Specify a segment file with copy number values to sample from. See the "Example_SegFile.txt" for a format example. If no file is specified, a build-in CN profile is used, based on the hg19 contigs.

./ PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt

Force the ccf values of each cluster on each sample, instead of generating a new random phylogeny from scratch. If -clust_file is specified, the -ns and -nodes flags are ignored an instead replaced with the values from the Clust_File. Each line of the tsv file represents a sample, with each tab separated value the ccf of a cluster. The last value of each line must always be -1 to account for the artifact cluster.

./ PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt -a 0.3

Specify the proportion of mutations that are artifactual (Random af unrelated to mutation/CN). Can be combined with a clust_file.

./ PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt -pfile /Example_PurityFile.txt

TSV file to specify the purity of each sample individualy (Otherwise, the purity is specified for all the samples using the -p flag.). Each line represents a sample. The file can optionally contain an extra three columns with the alpha, beta and N values for the coverage betabinomial for each sample (Otherwise, those values are set for all samples using the -ap, -b and -nb flags respectively).