# Species delimitation in Malagasy Canarium using iBPP

This notebook is an empirical application of iBPP to species delimitation using GBS data assembled in ipyrad. We use the ipyrad utility function `loci2bpp` to programmatically set up a range of tests and deploy them in parallel. 

### Information about this notebook
This is a Jupyter notebook, and all code in it is Python. You should be able to download and execute it to reproduce all of our results. This notebook, along with other notebooks and data files, is hosted on GitHub: https://github.com/sarahfederman/Canarium-GBS/

### Import Python libraries

In [18]:
import ipyrad as ip
import ipyparallel as ipp
import pandas as pd
import random
import socket
import ete3
import sys
import os

## print versions
print("ipyrad v.{}".format(ip.__version__))

ipyrad v.0.5.13


### Create a directory to store results files in

In [19]:
WDIR = "./analysis_bpp"
if not os.path.exists(WDIR):
    os.mkdir(WDIR)

### Setup an ipyparallel cluster connection

In [38]:
## open a view to the client
ipyclient = ipp.Client()
lbview = ipyclient.load_balanced_view()

## report how many engines (cores) are connected on each host
hosts = ipyclient[:].apply_sync(socket.gethostname)
for hostname in set(hosts):
    print("host compute node: [{} cores] on {}"\
          .format(hosts.count(hostname), hostname))
    
#ipyclient.close()

host compute node: [20 cores] on c13n12.farnam.hpc.yale.internal


### The input data

In [39]:
## download the .loci file (replace dropbox link with zenodo link) and save its path
#! curl -LkO https://dl.dropboxusercontent.com/u/2538935/CanEnd_min20.loci
LOCI = "./CanEnd_min20.loci"

In [40]:
## make a mapping dictionary grouping samples into 'species'
IMAP6 = {
    "A": ['SF172', 'SF175', 'SF328', 'SF200', 'SF209', 'D14528', 'SF276', 'SF286', 'D13052'],
    "B": ['D13101', 'D13103', 'D14482', 'D14483'],
    "C": ['D14504', 'D14505', 'D14506'],
    "D": ['D14477', 'D14478', 'D14480', 'D14485', 'D14501', 'D14513'], 
    "E": ['D13090', 'D12950'],
    "F": ['D13097', 'SF155', 'D13063', 'D12963', 'SF160', 'SF327',
          'SF224', 'SF228', '5573', 'SF153', 'SF164', 'D13075', 'SF197'], 
    }


## make a dictionary with min values to filter loci to those with N samples per species.
MINMAP6 = {
    "A": 8, 
    "B": 4, 
    "C": 3,
    "D": 5, 
    "E": 2, 
    "F": 8,
}


## Species tree hypothesis ('guide tree') based on raxml & bucky results
TREE6 = "((((D,C),B),(E,F)),A);"
print(ete3.Tree(TREE6))


            /-D
         /-|
      /-|   \-C
     |  |
   /-|   \-B
  |  |
  |  |   /-E
--|   \-|
  |      \-F
  |
   \-A


### Make a function to call bpp/ibpp
We will submit a large range of jobs to our parallel cluster. First we will infer a species tree with bpp, and then we will add traits and test delimitation hypotheses with ibpp. To track the progress of the parallel processes, we will store their async result objects in dictionaries keyed by control file path (`tree_asyncs` and `delim_asyncs` below). 

In [42]:
## a function to call i/bpp
def bpp(ctlfile):
    """ 
    This assumes you installed bpp & ibpp in ~/local/bin/ following the 
    installation instructions in the ipyrad bpp tutorial. 
    """
    import subprocess
    import os
    if ".ibpp" in ctlfile:
        cmd = [os.path.expanduser("~/local/bin/ibpp"), ctlfile]
    else:
        cmd = [os.path.expanduser("~/local/bin/bpp"), ctlfile]
    subprocess.check_output(cmd)
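
The bookkeeping pattern used in this notebook (submit a job, keep its async object in a dictionary, ask it later whether it finished) can be tried without a cluster. The sketch below is only an illustration: it swaps in the standard library's `concurrent.futures` for ipyparallel, and `fake_job`, `job_status`, and the run names are hypothetical stand-ins rather than anything bpp or ipyrad provides.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_job(name, fail=False):
    """Stand-in for a bpp/ibpp run; optionally raises to simulate a crash."""
    if fail:
        raise RuntimeError("simulated failure in " + name)
    return name


def job_status(future):
    """Summarize a future as 'still running', 'failed: ...' or 'finished'."""
    if not future.done():
        return "still running"
    if future.exception() is not None:
        return "failed: {}".format(future.exception())
    return "finished"


# submit jobs and remember their futures by (hypothetical) ctl file name
with ThreadPoolExecutor(max_workers=2) as pool:
    asyncs = {
        "run-a.bpp.ctl.txt": pool.submit(fake_job, "run-a"),
        "run-b.ibpp.ctl.txt": pool.submit(fake_job, "run-b", True),
    }

# leaving the 'with' block waits for all jobs, so both are done here
statuses = {name: job_status(future) for name, future in asyncs.items()}
for name in sorted(statuses):
    print("{:<22} -- {}".format(name, statuses[name]))
```

Keying the dictionary by control file path is convenient because the matching output file names can be derived from the same string.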
    

### Infer species tree 
We want to infer a species tree (infer_sptree=1; infer_delimit=0). To check that the MCMC mixes adequately, we run the analysis from three different starting trees and with two different values for the theta prior. We also repeat each combination with three different random seeds. 

In [43]:
TREES = ["((((D,C),B),(E,F)),A);", 
         "((((D,B),C),(E,F)),A);", 
         "((((B,C),D),(E,F)),A);"]

## tree search for the best species tree. 
## Iterate over different starting trees.
## Repeat x3 reps.
ctls = []
for tidx, tree in enumerate(TREES):
    for theta in [200, 2000]:
        for rep in range(3):
            ## build input files
            name = "tree-{}-theta-{}-tau-2000-rep-{}".format(tidx, theta, rep)
            ctl = ip.file_conversion.loci2bpp(name, 
                                              locifile=LOCI,
                                              imap=IMAP6,      
                                              minmap=MINMAP6,
                                              guidetree=tree,
                                              wdir=WDIR,
                                              infer_sptree=1,
                                              infer_delimit=0,
                                              maxloci=10000,
                                              nsample=100000,
                                              burnin=10000,
                                              sampfreq=2,
                                              thetaprior=(2, theta),
                                              tauprior=(2, 2000, 1),
                                              seed=random.randint(1,1e6),
                                              finetune=(300.0, 0.0002, 0.0001, 
                                                        0.0001, 0.2, 1e-05, 0.1, 0.1),
                                              )
            ## store the ctl filename
            ctls.append(ctl)
        

new files created (1088 loci, 6 species, 37 samples)
  tree-0-theta-200-tau-2000-rep-0.bpp.seq.txt
  tree-0-theta-200-tau-2000-rep-0.bpp.imap.txt
  tree-0-theta-200-tau-2000-rep-0.bpp.ctl.txt
new files created (1088 loci, 6 species, 37 samples)
  tree-0-theta-200-tau-2000-rep-1.bpp.seq.txt
  tree-0-theta-200-tau-2000-rep-1.bpp.imap.txt
  tree-0-theta-200-tau-2000-rep-1.bpp.ctl.txt
new files created (1088 loci, 6 species, 37 samples)
  tree-0-theta-200-tau-2000-rep-2.bpp.seq.txt
  tree-0-theta-200-tau-2000-rep-2.bpp.imap.txt
  tree-0-theta-200-tau-2000-rep-2.bpp.ctl.txt
new files created (1088 loci, 6 species, 37 samples)
  tree-0-theta-2000-tau-2000-rep-0.bpp.seq.txt
  tree-0-theta-2000-tau-2000-rep-0.bpp.imap.txt
  tree-0-theta-2000-tau-2000-rep-0.bpp.ctl.txt
new files created (1088 loci, 6 species, 37 samples)
  tree-0-theta-2000-tau-2000-rep-1.bpp.seq.txt
  tree-0-theta-2000-tau-2000-rep-1.bpp.imap.txt
  tree-0-theta-2000-tau-2000-rep-1.bpp.ctl.txt
new files created (1088 loci, 6 sp
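
The design above amounts to 3 starting trees x 2 theta priors x 3 replicates = 18 runs. As a quick cross-check, the sketch below enumerates the run names with `itertools.product` and computes the mean of each theta prior. It assumes the gamma prior is parameterized as G(alpha, beta) with mean alpha/beta, as described in the BPP documentation; confirm this against the version you run.

```python
import itertools

n_trees = 3
theta_betas = [200, 2000]
n_reps = 3

# same naming scheme as the loop above
names = [
    "tree-{}-theta-{}-tau-2000-rep-{}".format(tidx, theta, rep)
    for tidx, theta, rep in itertools.product(range(n_trees), theta_betas, range(n_reps))
]

# mean of a gamma(alpha=2, beta) prior, assuming mean = alpha / beta
alpha = 2.0
prior_means = {beta: alpha / beta for beta in theta_betas}

print(len(names), names[0], names[-1])
print(prior_means)
```

Under that assumption the two theta priors have means of 0.01 and 0.001, so the runs compare priors that differ by an order of magnitude.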

In [44]:
## a dictionary to store results
tree_asyncs = {}

## submit jobs to the cluster
for job in ctls:
    tree_asyncs[job] = lbview.apply(bpp, job)
    sys.stderr.write("job submitted [{}]\n".format(job))

job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-0-theta-200-tau-2000-rep-0.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-0-theta-200-tau-2000-rep-1.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-0-theta-200-tau-2000-rep-2.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-0-theta-2000-tau-2000-rep-0.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-0-theta-2000-tau-2000-rep-1.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-0-theta-2000-tau-2000-rep-2.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-1-theta-200-tau-2000-rep-0.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-1-theta-200-tau-2000-rep-1.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/tree-1-theta-200-tau-2000-rep-2.bpp.ctl.txt]
job submitted [/ysm-gpfs/home/de24

### Track progress of jobs

In [51]:
## check whether each has finished or failed
alljobs = dict(tree_asyncs.items())
for jid, job in enumerate(alljobs):
    ## get shorter name for job
    jobname = job.split("/")[-1]
    
    ## print done or not
    if alljobs[job].ready():
        if alljobs[job].successful():
            print("{:<3}{:<30} -- finished".format(jid, jobname))
        else:
            print("{:<3}{:<30} -- failed: {}".format(jid, jobname, alljobs[job].exception()))
    else:
        print("{:<3}{:<30} -- still running".format(jid, jobname))

0  tree-1-theta-2000-tau-2000-rep-1.bpp.ctl.txt -- finished
1  tree-1-theta-2000-tau-2000-rep-0.bpp.ctl.txt -- finished
2  tree-2-theta-200-tau-2000-rep-2.bpp.ctl.txt -- finished
3  tree-0-theta-200-tau-2000-rep-0.bpp.ctl.txt -- finished
4  tree-2-theta-200-tau-2000-rep-0.bpp.ctl.txt -- finished
5  tree-1-theta-200-tau-2000-rep-0.bpp.ctl.txt -- finished
6  tree-1-theta-200-tau-2000-rep-2.bpp.ctl.txt -- finished
7  tree-2-theta-2000-tau-2000-rep-2.bpp.ctl.txt -- finished
8  tree-0-theta-200-tau-2000-rep-2.bpp.ctl.txt -- finished
9  tree-2-theta-200-tau-2000-rep-1.bpp.ctl.txt -- finished
10 tree-1-theta-2000-tau-2000-rep-2.bpp.ctl.txt -- finished
11 tree-2-theta-2000-tau-2000-rep-0.bpp.ctl.txt -- finished
12 tree-0-theta-200-tau-2000-rep-1.bpp.ctl.txt -- finished
13 tree-1-theta-200-tau-2000-rep-1.bpp.ctl.txt -- finished
14 tree-2-theta-2000-tau-2000-rep-1.bpp.ctl.txt -- finished
15 tree-0-theta-2000-tau-2000-rep-0.bpp.ctl.txt -- finished
16 tree-0-theta-2000-tau-2000-rep-1.bpp.ctl.txt -

### Summarize results

In [66]:
mcmc = tree_asyncs.keys()
mcmc = [i.replace(".ctl.txt", ".out.txt") for i in mcmc]
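
Because these runs infer the species tree, one natural summary is how often each topology was sampled across replicates. The sketch below is a hedged illustration only: it assumes each sampled tree is available as a Newick string, possibly annotated with `:` branch lengths or `#` values, which is an assumption about the file format that should be checked against your own output. The helper `canonical_topology` and the example strings are hypothetical.

```python
import re
from collections import Counter


def canonical_topology(newick):
    """Strip annotations and sort children so equal topologies compare equal."""
    text = re.sub(r"\s+", "", newick).rstrip(";")
    # drop ':0.01' style branch lengths and '#0.002' style annotations
    text = re.sub(r"[:#][0-9.eE+-]+", "", text)

    def parse(pos):
        if text[pos] != "(":
            # leaf: read the name up to the next structural character
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return text[start:pos], pos
        children = []
        pos += 1
        while True:
            child, pos = parse(pos)
            children.append(child)
            if text[pos] == ",":
                pos += 1
            elif text[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError("malformed tree: {}".format(newick))
        # skip an internal node label, if present
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return "(" + ",".join(sorted(children)) + ")", pos

    topology, _ = parse(0)
    return topology + ";"


# hypothetical sampled trees in three different notations
samples = [
    "((A #0.01, B #0.02) #0.03, C #0.01);",
    "(C:0.3,(B:0.1,A:0.2):0.05);",
    "((A,C),B);",
]
counts = Counter(canonical_topology(tree) for tree in samples)
for topology, count in counts.most_common():
    print("{:<14} {:.2f}".format(topology, count / float(len(samples))))
```

Comparing these frequencies between replicates started from different trees is one way to judge whether the chains converged on the same topology.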





## Infer species delimitation
For species delimitation we start with our largest hypothesis based on clustering analysis, which is six species. This tests whether nodes on the six-taxon tree should be collapsed into fewer species. We use the topology supported in the species tree analysis above as our fixed species tree. We run this analysis both with and without traits, and test it over a range of starting values and prior values. We performed initial runs to find good 'finetuning' parameters that maximize the efficiency of chain mixing. 

In [69]:
## Trait data (csv) from (https://zenodo.../CanEnd_trait2.csv")
TRAITS = pd.read_csv("./CanEnd_traits.csv", na_values="", index_col=0)
TRAITS.head(10)

## select a subset of traits for this test
subt = TRAITS[['acumen_length', 'basal_L_widest_point', 'lateral_lft_W',
               'basal_petiolule','stip_scar_length', 'X2o_vein_pairs']]
subt

| Indiv | acumen_length | basal_L_widest_point | lateral_lft_W | basal_petiolule | stip_scar_length | X2o_vein_pairs |
|---|---|---|---|---|---|---|
| SF175 | 5.82 | 29.32 | 36.79 | 8.98 | 2.3 | 10.67 |
| SF328 | 7.57 | 26.59 | 33.04 | 5.4 | 1.39 | 10.5 |
| SF200 | 1.59 | 23.89 | 27.35 | 4.63 | 1.78 | 10.0 |
| SF209 | 2.99 | 19.32 | 30.8 | 4.94 | 2.04 | 10.67 |
| D14528 | 7.5 | 17.39 | 33.91 | 9.12 | 2.99 | 10.33 |
| SF276 | 6.54 | 26.16 | 37.79 | 8.35 | 2.64 | 8.0 |
| SF286 | 6.49 | 25.38 | 52.27 | 11.73 | 2.75 | 8.5 |
| D14504 | 7.64 | 17.71 | 48.74 | 5.32 | 2.82 | 16.67 |
| D14505 | 11.66 | 29.61 | 54.47 | 5.67 | 3.0 | 17.5 |
| D14506 | 14.81 | 29.22 | 70.34 | 13.51 | 3.5 | 20.33 |
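
When sequence and trait data are combined, it is worth knowing which mapped samples have no trait measurements. The sketch below shows one way to list them per species. It is an illustration only: `samples_missing_traits` is a hypothetical helper, and the sample names and trait index are made up rather than taken from the real data.

```python
def samples_missing_traits(imap, trait_individuals):
    """Map each species to its samples that do not appear in the trait table."""
    have = set(trait_individuals)
    missing = {}
    for species, samples in imap.items():
        absent = [sample for sample in samples if sample not in have]
        if absent:
            missing[species] = absent
    return missing


# made-up example: species 'Y' has one sample without trait measurements
toy_imap = {"X": ["ind1", "ind2"], "Y": ["ind3", "ind4"]}
toy_trait_index = ["ind1", "ind2", "ind3"]

missing = samples_missing_traits(toy_imap, toy_trait_index)
print(missing)
```

With the real data, the second argument could be the index of the traits table (for example `list(subt.index)`).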


In [71]:
## this is to show an example ctl file (our actual runs are below)
## This and variants of it were used to find optimal finetune params.
ctl1 = ip.file_conversion.loci2bpp("ibpp-test", 
                                   locifile=LOCI, 
                                   traits_df=subt,
                                   imap=IMAP6, 
                                   guidetree=TREE6, 
                                   minmap=MINMAP6,
                                   wdir=WDIR,
                                   infer_sptree=0,
                                   infer_delimit=1,
                                   delimit_alg=(1, 2, 1),
                                   maxloci=300,  
                                   nsample=1000,
                                   burnin=100,
                                   sampfreq=2,
                                   thetaprior=(2, 2000),
                                   tauprior=(2, 2000, 1),
                                   finetune=(300.0, 0.0002, 0.0001, 0.0001, 0.2, 0.0001, 0.1, 0.1),
                                   seed=random.randint(1,1e6),
                                   verbose=1,
                                   )

ctl file
--------
seed = 502852
seqfile = /ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/ibpp-test.ibpp.seq.txt
Imapfile = /ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/ibpp-test.ibpp.imap.txt
mcmcfile = /ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/ibpp-test.ibpp.mcmc.txt
outfile = /ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/ibpp-test.ibpp.out.txt
traitfile = /ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/ibpp-test.ibpp.traits.txt
nloci = 300
cleandata = 0
speciesdelimitation = 1 1 2 1
ntraits = 6
nindT = 36
usetraitdata = 1
useseqdata = 1
nu0 = 0
kappa0 = 0
species&tree = 6 A B C D E F
                 9 4 3 6 2 13
                 ((((D,C),B),(E,F)),A);
thetaprior = 2 2000
tauprior = 2 2000 1
finetune = 1: 300.0 0.0002 0.0001 0.0001 0.2 0.0001 0.1 0.1
print = 1 0 0 0
burnin = 100
sampfreq = 2
nsample = 1000
--------

new files created (300 loci, 6 species, 37 samples)
  ibpp-test.ibpp.seq.txt
  ibpp-test.ibpp.imap.txt
  ibpp-test.ibpp.ctl.txt
  ibpp-test.ibpp.traits.txt
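
With many control files on disk, it is handy to read their settings back into Python, for example to confirm that two runs differ only in the seed. The sketch below is a minimal, hypothetical parser for the simple `key = value` lines shown above. It deliberately ignores continuation lines, such as the second and third lines of the `species&tree` entry, and is not part of ipyrad or bpp.

```python
def parse_ctl(text):
    """Collect 'key = value' lines from a control file into a dictionary of strings."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            # separators and continuation lines carry no key
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


# excerpt of the control file printed above
example = """\
seed = 502852
nloci = 300
speciesdelimitation = 1 1 2 1
species&tree = 6 A B C D E F
                 9 4 3 6 2 13
                 ((((D,C),B),(E,F)),A);
thetaprior = 2 2000
tauprior = 2 2000 1
finetune = 1: 300.0 0.0002 0.0001 0.0001 0.2 0.0001 0.1 0.1
"""

params = parse_ctl(example)
for key in sorted(params):
    print("{:<20} {}".format(key, params[key]))
```

Values are kept as strings because several entries hold more than one number.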


### Set up many additional species delimitation tests
We are interested in how well both the sequence data and the trait data can delimit species in Canarium. We will set up a range of tests to look at different settings for the priors, for different species delimitation algorithms, and for different types of data. We will start with a six-taxon tree and allow the species delimitation algorithm to collapse nodes on the tree to test hypotheses of 1-6 species. 

In [72]:
## set up a couple tests to perform over different delimitation algorithms
DELIMIT_TESTS = [
    (0, 2),
    (0, 5),
    (1, 1.0, 2.0),
    (1, 2.0, 1.0), 
]
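
Each tuple in `DELIMIT_TESTS` names a reversible-jump algorithm followed by its fine-tune values: algorithm 0 takes one value (epsilon) and algorithm 1 takes two (alpha and m), following the BPP documentation. Judging from the example control file above, where `delimit_alg=(1, 2, 1)` produced `speciesdelimitation = 1 1 2 1`, the tuple is written after a leading 1 that switches delimitation on. The sketch below reproduces that line as an illustration; `delimitation_line` is a hypothetical helper and not how `loci2bpp` is necessarily implemented.

```python
def delimitation_line(delimit_alg):
    """Format a 'speciesdelimitation' entry with delimitation switched on."""
    # number of fine-tune values expected after the algorithm id
    expected = {0: 1, 1: 2}
    algorithm = delimit_alg[0]
    if algorithm not in expected:
        raise ValueError("unknown algorithm: {}".format(algorithm))
    if len(delimit_alg) - 1 != expected[algorithm]:
        raise ValueError("algorithm {} takes {} value(s)".format(
            algorithm, expected[algorithm]))
    return "speciesdelimitation = 1 " + " ".join(str(value) for value in delimit_alg)


tests = [(0, 2), (0, 5), (1, 1.0, 2.0), (1, 2.0, 1.0)]
lines = [delimitation_line(test) for test in tests]
for line in lines:
    print(line)
```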

In [73]:
## iterate over combinations for a total of 16 tests (4 x 2 x 2). 
## (1) 4 delimitation algorithm combinations (DELIMIT_TESTS)
## (2) 0/1 with or without traits 
## (3) 2 independent replicates from different random seeds

ctls = []
for tdx, delim in enumerate(DELIMIT_TESTS):
    for usetraits in [0, 1]:
        for rep in range(2):
        
            ## make a name for this test
            rname = "delim-alg-{}-tr-{}-rep-{}".format(tdx, usetraits, rep)
        
            ## make input files and get ctl path
            ctl = ip.file_conversion.loci2bpp(rname, LOCI, IMAP6, TREE6, 
                                          wdir=WDIR,
                                          traits_df=TRAITS,
                                          minmap=MINMAP6, 
                                          infer_delimit=1,
                                          infer_sptree=0,
                                          delimit_alg=delim,
                                          maxloci=10000,  
                                          nsample=200000,
                                          burnin=20000,
                                          sampfreq=2,
                                          thetaprior=(2, 2000),
                                          tauprior=(2, 2000, 1),
                                          usetraitdata=usetraits,
                                          seed=random.randint(0, 1e9),
                                          finetune=(300.0, 0.0002, 0.0001, 0.0001, 0.2, 0.0001, 0.1, 0.1),
                                          )
            ## store ctl filenames
            ctls.append(ctl)

new files created (1088 loci, 6 species, 37 samples)
  delim-alg-0-tr-0-rep-0.ibpp.seq.txt
  delim-alg-0-tr-0-rep-0.ibpp.imap.txt
  delim-alg-0-tr-0-rep-0.ibpp.ctl.txt
  delim-alg-0-tr-0-rep-0.ibpp.traits.txt
new files created (1088 loci, 6 species, 37 samples)
  delim-alg-0-tr-0-rep-1.ibpp.seq.txt
  delim-alg-0-tr-0-rep-1.ibpp.imap.txt
  delim-alg-0-tr-0-rep-1.ibpp.ctl.txt
  delim-alg-0-tr-0-rep-1.ibpp.traits.txt
new files created (1088 loci, 6 species, 37 samples)
  delim-alg-0-tr-1-rep-0.ibpp.seq.txt
  delim-alg-0-tr-1-rep-0.ibpp.imap.txt
  delim-alg-0-tr-1-rep-0.ibpp.ctl.txt
  delim-alg-0-tr-1-rep-0.ibpp.traits.txt
new files created (1088 loci, 6 species, 37 samples)
  delim-alg-0-tr-1-rep-1.ibpp.seq.txt
  delim-alg-0-tr-1-rep-1.ibpp.imap.txt
  delim-alg-0-tr-1-rep-1.ibpp.ctl.txt
  delim-alg-0-tr-1-rep-1.ibpp.traits.txt
new files created (1088 loci, 6 species, 37 samples)
  delim-alg-1-tr-0-rep-0.ibpp.seq.txt
  delim-alg-1-tr-0-rep-0.ibpp.imap.txt
  delim-alg-1-tr-0-rep-0.ibpp.ctl.

In [74]:
## store async results
delim_asyncs = {}

## send jobs to run on cluster
for job in ctls:
    delim_asyncs[job] = lbview.apply(bpp, job)
    sys.stderr.write("job submitted [{}]\n".format(job))

job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-0-tr-0-rep-0.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-0-tr-0-rep-1.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-0-tr-1-rep-0.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-0-tr-1-rep-1.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-1-tr-0-rep-0.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-1-tr-0-rep-1.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-1-tr-1-rep-0.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-1-tr-1-rep-1.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-2-tr-0-rep-0.ibpp.ctl.txt]
job submitted [/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/delim-alg-2-tr-0-rep-1.ibpp.ctl.txt]
job submit

### Track progress

In [75]:
## check success/failure of all jobs
alljobs = dict(delim_asyncs.items())

## check whether each has finished or failed
for jid, job in enumerate(sorted(alljobs)):
    ## get shorter name for job
    jobname = job.split("/")[-1]
    
    ## print done or not
    if alljobs[job].ready():
        if alljobs[job].successful():
            print("{:<3}{:<30} -- finished".format(jid, jobname))
        else:
            print("{:<3}{:<30} -- failed: {}".format(jid, jobname, alljobs[job].exception()))
    else:
        print("{:<3}{:<30} -- still running".format(jid, jobname))

0  delim-alg-0-tr-0-rep-0.ibpp.ctl.txt -- still running
1  delim-alg-0-tr-0-rep-1.ibpp.ctl.txt -- still running
2  delim-alg-0-tr-1-rep-0.ibpp.ctl.txt -- still running
3  delim-alg-0-tr-1-rep-1.ibpp.ctl.txt -- still running
4  delim-alg-1-tr-0-rep-0.ibpp.ctl.txt -- still running
5  delim-alg-1-tr-0-rep-1.ibpp.ctl.txt -- still running
6  delim-alg-1-tr-1-rep-0.ibpp.ctl.txt -- still running
7  delim-alg-1-tr-1-rep-1.ibpp.ctl.txt -- still running
8  delim-alg-2-tr-0-rep-0.ibpp.ctl.txt -- still running
9  delim-alg-2-tr-0-rep-1.ibpp.ctl.txt -- still running
10 delim-alg-2-tr-1-rep-0.ibpp.ctl.txt -- still running
11 delim-alg-2-tr-1-rep-1.ibpp.ctl.txt -- still running
12 delim-alg-3-tr-0-rep-0.ibpp.ctl.txt -- still running
13 delim-alg-3-tr-0-rep-1.ibpp.ctl.txt -- still running
14 delim-alg-3-tr-1-rep-0.ibpp.ctl.txt -- still running
15 delim-alg-3-tr-1-rep-1.ibpp.ctl.txt -- still running


In [33]:
ll -thr analysis_bpp/*.mcmc.txt

-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-0-theta-200-rep-1.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-0-theta-20-rep-1.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-1-theta-20-rep-0.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-1-theta-20-rep-2.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-1-theta-20-rep-1.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-0-theta-200-rep-2.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-2-theta-20-rep-0.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-2-theta-20-rep-2.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-1-theta-200-rep-0.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-2-theta-200-rep-1.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-1-theta-200-rep-2.bpp.mcmc.txt
-rw-r--r-- 1 de243 0 Dec  5 02:40 analysis_bpp/tree-2-theta-200-rep-0.bpp.mcmc.txt
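
A listing like the one above shows at a glance which runs have not yet written any samples (size 0). The same check can be scripted. The sketch below is a small illustration using only the standard library: `empty_files` is a hypothetical helper, and the demonstration writes two made-up files to a temporary directory instead of touching real results.

```python
import glob
import os
import tempfile


def empty_files(pattern):
    """Return the sorted paths matching pattern whose size is zero bytes."""
    return sorted(path for path in glob.glob(pattern) if os.path.getsize(path) == 0)


# demonstration in a throwaway directory with hypothetical file names
demo_dir = tempfile.mkdtemp()
demo_files = [("run-a.bpp.mcmc.txt", ""), ("run-b.bpp.mcmc.txt", "0.01\t0.02\n")]
for name, content in demo_files:
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(content)

still_empty = [
    os.path.basename(path)
    for path in empty_files(os.path.join(demo_dir, "*.mcmc.txt"))
]
print(still_empty)
```

In this notebook the pattern would be something like `os.path.join(WDIR, "*.mcmc.txt")`.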

### Parse results (out.txt) files

In [None]:
## Let's read in the '.bpp.out.txt' results files for each test
median_dict = {}
ess_dict = {}

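for test, job in enumerate(sorted(tree_asyncs)):
    ## replace .ctl.txt with .out.txt
    outname = job.replace(".ctl.", ".out.")
    
    ## parse theta and tau priors from the file name (not the full path,
    ## because directory names may themselves contain hyphens)
    fields = os.path.basename(job).split("-")
    theta = fields[2:4]
    tau = fields[4:6]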
    
    ## read the file and parse out results
    with open(outname, 'r') as infile:
        lines = infile.readlines()
    
    ## compare the parameter summaries reported under the different priors.
    ## both priors are gamma(2, beta), assumed here to have mean 2/beta
    means = [2./float(theta[1]), 2./float(tau[1])]
    for line in lines:
        if "theta_1" in line and "theta_2" in line:
            index = ["theta mean", "tau mean"] + line.split()
            
        if "median" in line:
            values = means + line.split()[1:]
            median_dict[test] = pd.Series(data=values, index=index)
            
        if "ESS*" in line:
            values = means + line.split()[1:]
            ess_dict[test] = pd.Series(data=values, index=index)

## make results into a dataframe and print. It appears that the prior has a large effect on theta5 (AB)
medians = pd.DataFrame(data=median_dict)
ess = pd.DataFrame(data=ess_dict)

## look at median values
medians.T
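
Recovering the prior settings from a job name by splitting on hyphens is fragile when the directory path itself contains hyphens, as the paths in this notebook do. A regular expression applied to the file name alone is one alternative. The sketch below is a hedged illustration: `parse_jobname` is a hypothetical helper that assumes the `tree-{}-theta-{}-tau-{}-rep-{}` naming scheme used for the species tree runs above.

```python
import os
import re

# matches names such as 'tree-0-theta-200-tau-2000-rep-1'
JOB_PATTERN = re.compile(
    r"tree-(?P<tree>\d+)-theta-(?P<theta>\d+)-tau-(?P<tau>\d+)-rep-(?P<rep>\d+)"
)


def parse_jobname(path):
    """Return the tree index, prior values and replicate encoded in a job file name."""
    match = JOB_PATTERN.search(os.path.basename(path))
    if match is None:
        raise ValueError("unrecognized job name: {}".format(path))
    return {key: int(value) for key, value in match.groupdict().items()}


example = ("/ysm-gpfs/home/de243/Canarium-GBS/analysis_bpp/"
           "tree-0-theta-200-tau-2000-rep-1.bpp.ctl.txt")
params = parse_jobname(example)
print(params)
```

Named groups keep the parsing independent of how many hyphens appear elsewhere in the path.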