# Experiment 1a
### Cedric Chauve, 11/12/2018

## Introduction

In this experiment (script *exp1a.sh*) we counted the number of histories for the following data:
- species tree size (number of leaves) from 3 to 32,
- for each species tree size *k*, we considered 100 trees:
    - the first one (index 0) is the caterpillar,
    - if *k* is a power of 2, the second tree (index 1) is the complete binary tree,
    - the remaining trees are random,
- the history size (number of leaves) ranges from 1 to 50,
- for each species tree, we considered 25 random rankings.
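
The two deterministic tree families above can be written as Newick strings. This is a minimal sketch that assumes leaves labeled s1..sk. That naming is hypothetical, since the labels used by *exp1a.sh* are not shown here. `caterpillar` and `complete_binary` are illustrative helpers, not functions from the script.

```python
def caterpillar(k: int) -> str:
    """Newick string of the caterpillar tree on leaves s1..sk (hypothetical labels)."""
    newick = "s1"
    # Each new leaf is attached as a sibling of the whole tree built so far.
    for i in range(2, k + 1):
        newick = f"({newick},s{i})"
    return newick + ";"


def complete_binary(k: int) -> str:
    """Newick string of the complete binary tree on k = 2**d leaves."""
    if k < 1 or (k & (k - 1)) != 0:
        raise ValueError("k must be a power of 2")
    nodes = [f"s{i}" for i in range(1, k + 1)]
    # Pair adjacent subtrees level by level until a single root remains.
    while len(nodes) > 1:
        nodes = [f"({nodes[i]},{nodes[i + 1]})" for i in range(0, len(nodes), 2)]
    return nodes[0] + ";"


print(caterpillar(4))      # (((s1,s2),s3),s4);
print(complete_binary(4))  # ((s1,s2),(s3,s4));
```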

We record the results for species trees of a given size *k* in the file *results/exp1a_k*. Each non-comment row of the result file has the following tab-separated format:
- species tree size
- species tree index
- ranking type (U for unranked, R for ranked)
- evolutionary model (DL or DLT)
- if unranked, Newick string describing the tree, otherwise ranking of internal nodes
- number of histories for each history size, separated by spaces.
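
A row in this format can be read with a small parser. This is a minimal sketch. It assumes, as the notebook's reader does with `row[3]` and `row[5]`, that an evolutionary-model column (DL or DLT) sits between the ranking type and the tree/ranking column, so the counts are in the sixth column. `ResultRow` and `parse_row` are hypothetical names. The sample line is made up for illustration and has only 4 counts instead of 50.

```python
from typing import List, NamedTuple, Optional


class ResultRow(NamedTuple):
    s_size: int
    tree_index: int
    ranking_type: str      # 'U' (unranked) or 'R' (ranked)
    evol_model: str        # 'DL' or 'DLT' (assumed column position)
    tree_or_ranking: str   # Newick string if unranked, else ranking of internal nodes
    counts: List[int]      # counts[n - 1] = number of histories of size n


def parse_row(line: str) -> Optional[ResultRow]:
    """Parse one tab-separated result line; return None for comment or blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    return ResultRow(
        s_size=int(fields[0]),
        tree_index=int(fields[1]),
        ranking_type=fields[2],
        evol_model=fields[3],
        tree_or_ranking=fields[4],
        counts=[int(c) for c in fields[5].split()],
    )


row = parse_row("3\t0\tU\tDL\t((s1,s2),s3);\t3 19 159 1565\n")
print(row.evol_model, row.counts[:3])  # DL [3, 19, 159]
```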

For each configuration, we count the number of histories both in a model with duplications and losses only (DL) and in a model that also allows transfers (DLT).

```python
import csv
import pandas as pd
import numpy as np
```

```python
# Parameters

# Species tree sizes (number of leaves); only a subset of the 3..32 range is loaded here
S_SIZE_MIN = 3
S_SIZE_MAX = 4
S_SIZES    = list(range(S_SIZE_MIN, S_SIZE_MAX + 1))

# Number of species trees per size
NB_S_TREES    = 100
S_TREES_INDEX = list(range(NB_S_TREES))

# History sizes (number of leaves)
H_SIZE_MIN = 1
H_SIZE_MAX = 50
H_SIZES    = list(range(H_SIZE_MIN, H_SIZE_MAX + 1))

# Number of rankings per species tree
NB_RANKINGS    = 25
RANKINGS_INDEX = list(range(NB_RANKINGS))

# Evolutionary models analysed: (ranking type, model), unranked trees only
EVOL_MODELS = [('U', 'DL'), ('U', 'DLT')]
```

```python
# Format: RESULTS[evol_model][s][n][tree_index] is
# the number of histories of size n for tree tree_index of size s in model evol_model
RESULTS = {x: {s: {n: {t: 0 for t in S_TREES_INDEX} for n in H_SIZES} for s in S_SIZES}
           for x in EVOL_MODELS}

for s in S_SIZES:
    with open(f'../results/exp1a_{s}', 'r') as f:
        reader = csv.reader(f, delimiter='\t')
        for row in reader:
            # Skip blank lines and comment lines
            if not row or row[0].startswith('#'):
                continue
            model = (row[2], row[3])
            # Skip configurations not analysed here (e.g. ranked trees),
            # which would otherwise raise a KeyError
            if model not in RESULTS:
                continue
            t_ind  = int(row[1])
            counts = row[5].split()
            for n in H_SIZES:
                RESULTS[model][s][n][t_ind] = int(counts[n - 1])

# One row per (model, species tree size, history size), one column per tree index
RESULTS_frame = pd.DataFrame.from_dict({(m, s, n): RESULTS[m][s][n]
                                        for m in RESULTS
                                        for s in RESULTS[m]
                                        for n in RESULTS[m][s]},
                                       orient='index')
```
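
With `orient='index'`, `pd.DataFrame.from_dict` turns each tuple key into a row label, giving a `MultiIndex` with one level per tuple element. Each inner dict becomes a row whose columns are the tree indices. The small sketch below shows this flattening. It simplifies the model key to the string `'DL'` and, for two trees, uses the size-3 DL counts for history sizes 1 and 2 reported in the results table.

```python
import pandas as pd

# Nested dict shaped like RESULTS: model -> s -> n -> tree index -> count.
nested = {"DL": {3: {1: {0: 3, 1: 3}, 2: {0: 19, 1: 19}}}}

# Flatten the three outer levels into tuple keys; each inner dict becomes a row.
flat = {(m, s, n): nested[m][s][n]
        for m in nested
        for s in nested[m]
        for n in nested[m][s]}

frame = pd.DataFrame.from_dict(flat, orient="index")
print(frame.index.nlevels)                # 3
print(frame.loc[("DL", 3, 2)].tolist())   # [19, 19]
```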

```python
# Analysis 1: average, standard deviation, min and max of the number of histories,
# per model, species tree size and history size (over the species trees of that size)
STATS1 = {x: {s: {n: {} for n in H_SIZES} for s in S_SIZES} for x in EVOL_MODELS}

for x in EVOL_MODELS:
    for s in S_SIZES:
        for n in H_SIZES:
            data = np.array([RESULTS[x][s][n][t] for t in S_TREES_INDEX])
            # np.std is the population standard deviation (ddof=0);
            # argmin/argmax are tree indices because S_TREES_INDEX starts at 0
            STATS1[x][s][n] = {'avg': np.mean(data), 'std': np.std(data),
                               'min': np.min(data), 'max': np.max(data),
                               'argmin': np.argmin(data), 'argmax': np.argmax(data)}

STATS1_frame = pd.DataFrame.from_dict({(m, s, n): STATS1[m][s][n]
                                       for m in STATS1
                                       for s in STATS1[m]
                                       for n in STATS1[m][s]},
                                      orient='index')
```
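
The nested loops above can also be written as row-wise reductions over a wide frame with one row per (model, s, n) and one column per tree index. This sketch uses a toy frame built from the size-3 DL counts for history sizes 1 and 2 with three trees, and it assumes the counts are stored in a numeric dtype. Counts too large for 64-bit integers may end up with `object` dtype and would likely need converting, for example to float, first. `ddof=0` is passed because `np.std` computes the population standard deviation, whereas pandas' `std` defaults to the sample one.

```python
import pandas as pd

# Toy wide frame: one row per (model, s, n), one column per tree index.
frame = pd.DataFrame(
    [[3, 3, 3], [19, 19, 19]],
    index=pd.MultiIndex.from_tuples([("DL", 3, 1), ("DL", 3, 2)]),
    columns=[0, 1, 2],
)

stats = pd.DataFrame({
    "avg": frame.mean(axis=1),
    "std": frame.std(axis=1, ddof=0),   # ddof=0 matches np.std's default
    "min": frame.min(axis=1),
    "max": frame.max(axis=1),
    "argmin": frame.idxmin(axis=1),     # column label, i.e. tree index (first on ties)
    "argmax": frame.idxmax(axis=1),
})
print(stats)
```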

```python
STATS1_frame
```

| Model | Species tree size | History size | avg | std | min | max | argmin | argmax |
|---|---|---|---|---|---|---|---|---|
| (U, DL) | 3 | 1 | 3.000000e+00 | 0.000000e+00 | 3 | 3 | 0 | 0 |
| (U, DL) | 3 | 2 | 1.900000e+01 | 0.000000e+00 | 19 | 19 | 0 | 0 |
| (U, DL) | 3 | 3 | 1.590000e+02 | 0.000000e+00 | 159 | 159 | 0 | 0 |
| (U, DL) | 3 | 4 | 1.565000e+03 | 0.000000e+00 | 1565 | 1565 | 0 | 0 |
| (U, DL) | 3 | 5 | 1.702200e+04 | 0.000000e+00 | 17022 | 17022 | 0 | 0 |
| (U, DL) | 3 | 6 | 1.979280e+05 | 0.000000e+00 | 197928 | 197928 | 0 | 0 |
| (U, DL) | 3 | 7 | 2.413494e+06 | 0.000000e+00 | 2413494 | 2413494 | 0 | 0 |
| (U, DL) | 3 | 8 | 3.049009e+07 | 0.000000e+00 | 30490089 | 30490089 | 0 | 0 |
| (U, DL) | 3 | 9 | 3.958281e+08 | 0.000000e+00 | 395828145 | 395828145 | 0 | 0 |
| (U, DL) | 3 | 10 | 5.250494e+09 | 0.000000e+00 | 5250493688 | 5250493688 | 0 | 0 |

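For species tree size 3, every history size in the table has a standard deviation of 0 and argmin/argmax equal to 0. One plausible explanation assumes that the number of unranked histories depends only on the shape of the species tree. There is a single rooted binary tree shape with 3 leaves, so all 100 trees would behave like the caterpillar. With 4 leaves there are only two shapes, the caterpillar and the complete binary tree. The sketch below counts these shapes, which are the Wedderburn-Etherington numbers. `n_shapes` is a hypothetical helper, not part of the experiment's code.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def n_shapes(n: int) -> int:
    """Number of unlabeled rooted binary tree shapes with n leaves
    (Wedderburn-Etherington numbers)."""
    if n == 1:
        return 1
    total = 0
    # Split the n leaves between two subtrees of different sizes i < n - i.
    for i in range(1, (n + 1) // 2):
        total += n_shapes(i) * n_shapes(n - i)
    # Two subtrees of equal size: unordered pair of shapes, repetition allowed.
    if n % 2 == 0:
        w = n_shapes(n // 2)
        total += w * (w + 1) // 2
    return total


print([n_shapes(k) for k in range(3, 9)])  # [1, 2, 3, 6, 11, 23]
```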