# Initialize RAxML with a custom tree

Mamie Wang 2020/06/12

The purpose of this notebook is to modify the RAxML code so that it accepts a user-provided starting tree. We want to test whether RAxML initialized with a tree from the STR (spectral tree reconstruction) method performs better than RAxML without initialization.

The RAxML manual documents a `-t` option for specifying a user starting tree (https://cme.h-its.org/exelixis/resource/download/NewManual.pdf).
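
As a minimal sketch of how `-t` fits into a RAxML command line, the helper below assembles an argument list and appends `-t` only when a starting tree is supplied. The function name `build_raxml_command`, the file names, the executable name, and the default model and seed are illustrative assumptions and are not part of this notebook's wrapper; `-s`, `-n`, `-m`, `-p`, `-T`, and `-t` are standard RAxML options.

```python
import shlex


def build_raxml_command(alignment, run_name, start_tree=None,
                        model="GTRGAMMA", seed=12345, threads=2,
                        binary="raxmlHPC-PTHREADS"):
    """Assemble a RAxML argument list (hypothetical helper, for illustration only).

    `-t` is added only when a starting tree file is given, so the same
    function covers both the default and the initialized run.
    """
    cmd = [binary, "-s", alignment, "-n", run_name, "-m", model,
           "-p", str(seed), "-T", str(threads)]
    if start_tree is not None:
        cmd += ["-t", start_tree]
    return cmd


# Hypothetical file names; nothing is executed here, the command is only printed.
cmd = build_raxml_command("alignment.phy", "str_init", start_tree="start.tre")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the run, assuming the executable is on the `PATH`.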

We first want to make sure that RAxML runs with the given input. We will:
- Simulate a binary tree with 512 leaves (`m = 512` in the code below)
- Reconstruct the tree using the deep spectral method with neighbor joining (NJ)
- Use the reconstructed tree as the starting tree for RAxML

In [1]:
import sys, os

sys.path.append(os.path.join(os.path.dirname(sys.path[0]),'../spectral-tree-inference/spectraltree'))

import numpy as np
import utils
import generation
import reconstruct_tree
import dendropy
import scipy
import time
from itertools import product
import matplotlib.pyplot as plt

from dendropy.model.discrete import simulate_discrete_chars, Jc69, Hky85
from dendropy.calculate.treecompare import symmetric_difference
import character_matrix

In [2]:
import os, sys

# https://stackoverflow.com/questions/8391411/suppress-calls-to-print-python
class HiddenPrints:
    def __enter__(self):
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, 'w')

    def __exit__(self, exc_type, exc_val, exc_tb):
        sys.stdout.close()
        sys.stdout = self._original_stdout
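
A similar effect can be obtained with the standard library's `contextlib.redirect_stdout`. The sketch below is an alternative to the class above, not the notebook's own code; the name `hidden_prints` is an assumption. Like `HiddenPrints`, it only silences output written through Python's `sys.stdout`.

```python
import contextlib
import io
import os


@contextlib.contextmanager
def hidden_prints():
    """Silence print() inside the block; stdout is restored on exit, even after an error."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stdout(devnull):
        yield


# Demonstration: capture stdout in a buffer to show which prints get through.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("visible before")
    with hidden_prints():
        print("suppressed")
    print("visible after")
captured = buf.getvalue()
```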

## Simulate sequences from a perfect binary tree

In [9]:
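m = 512          # number of leaves in the simulated tree
threshold = 128  # passed as `threshhold` to the deep spectral reconstruction below
n = 50
binary_tree = utils.balanced_binary(m, edge_length = 0.5)
# Simulate sequences of length 100 under the HKY85 model
data_HKY = simulate_discrete_chars(100, binary_tree, Hky85(kappa = 1), mutation_rate=0.1)
# Collect the simulated characters into a (taxa x sites) array of symbols
ch_list = list()
for t in data_HKY.taxon_namespace:
    ch_list.append([x.symbol for x in data_HKY[t]])
ch_arr = np.array(ch_list)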

In [10]:
spectral_method = reconstruct_tree.SpectralTreeReconstruction(reconstruct_tree.NeighborJoining, reconstruct_tree.HKY_similarity_matrix)
with HiddenPrints():
    start_time = time.time()
    tree_rec = spectral_method.deep_spectral_tree_reonstruction(ch_arr, reconstruct_tree.HKY_similarity_matrix_missing_data, 
                                                            taxon_namespace = binary_tree.taxon_namespace, 
                                                            threshhold = threshold, min_split = 5)
    runtime = time.time() - start_time

RF,F1 = reconstruct_tree.compare_trees(tree_rec, binary_tree)

In [11]:
print("--- %s seconds ---" % runtime)
print("RF = ",RF)
print("F1% = ",F1) 

--- 22.228666067123413 seconds ---
RF =  134
F1% =  93.44422700587083
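
For readers unfamiliar with the RF (Robinson-Foulds) metric: it counts the bipartitions that appear in one tree but not in the other. The toy sketch below illustrates this on two five-taxon trees written as sets of splits. It is not the implementation behind `reconstruct_tree.compare_trees`; the helper names and the split representation are assumptions made for illustration.

```python
def canonical_split(side, taxa):
    """Represent a bipartition by the side that does not contain the reference taxon."""
    side = frozenset(side)
    ref = min(taxa)  # fixed reference taxon so that each split has one canonical form
    return frozenset(taxa) - side if ref in side else side


def rf_distance(splits_a, splits_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


taxa = {"A", "B", "C", "D", "E"}
# Non-trivial splits only; leaf splits are shared by all trees on the same taxa.
# Tree 1: ((A,B),C,(D,E))   Tree 2: ((A,C),B,(D,E))
tree1 = {canonical_split(s, taxa) for s in [{"A", "B"}, {"D", "E"}]}
tree2 = {canonical_split(s, taxa) for s in [{"A", "C"}, {"D", "E"}]}

print(rf_distance(tree1, tree2))  # 2: the splits AB|CDE and AC|BDE are each unique to one tree
```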


Save the reconstructed tree as a Newick file.

In [12]:
tree_rec.write(path="/gpfs/ysm/project/kleinstein/mw957/repos/spec_tree/data/binary_128_STR_NJ.tre", schema="newick")

## Run RAxML without an initial tree

In [13]:
raxml_HKY = reconstruct_tree.RAxML()
start_time = time.time()
tree_rec = raxml_HKY(data_HKY, raxml_args="-T 2 --HKY85 -c 1")  
runtime = time.time() - start_time
RF,F1 = reconstruct_tree.compare_trees(tree_rec, binary_tree)

print("--- %s seconds ---" % runtime)
print("RF = ",RF)
print("F1% = ",F1) 

--- 112.97217106819153 seconds ---
RF =  44
F1% =  97.84735812133071


## Run RAxML with the STR tree as the starting tree

In [14]:
raxml_HKY = reconstruct_tree.RAxML()
start_time = time.time()
tree_rec = raxml_HKY(data_HKY, raxml_args="-T 2 --HKY85 -c 1 -t /gpfs/ysm/project/kleinstein/mw957/repos/spec_tree/data/binary_128_STR_NJ.tre")  
runtime = time.time() - start_time
RF,F1 = reconstruct_tree.compare_trees(tree_rec, binary_tree)

print("--- %s seconds ---" % runtime)
print("RF = ",RF)
print("F1% = ",F1) 

--- 111.95083498954773 seconds ---
RF =  38
F1% =  98.14090019569471
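
The RF and F1 values reported in this notebook are tied together by simple arithmetic. If both trees have the same number of bipartitions B, then FP = FN = RF/2 and F1 = 2TP / (2TP + FP + FN) reduces to 1 - RF / (2B). The sketch below checks this against the three printed results, assuming B = 2m - 2 = 1022 for m = 512. That value of B is inferred from the reported numbers and is not taken from the `compare_trees` source, so treat it as an assumption.

```python
def f1_percent_from_rf(rf, n_bipartitions):
    """F1 (in percent) implied by an RF distance when both trees have n_bipartitions splits."""
    false_each = rf / 2                      # FP = FN under the equal-size assumption
    true_pos = n_bipartitions - false_each   # splits shared by both trees
    return 100 * 2 * true_pos / (2 * true_pos + 2 * false_each)


m = 512
B = 2 * m - 2  # assumed number of bipartitions per tree (inferred, see above)

# (RF, F1%) pairs copied from the outputs printed in this notebook
reported = {
    "STR + NJ": (134, 93.44422700587083),
    "RAxML, no starting tree": (44, 97.84735812133071),
    "RAxML, STR starting tree": (38, 98.14090019569471),
}

for name, (rf, f1) in reported.items():
    print(f"{name}: RF = {rf}, implied F1% = {f1_percent_from_rf(rf, B):.4f}, reported F1% = {f1:.4f}")
```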
