# Experiment 1a
### Cedric Chauve, 11/12/2018

## Introduction

In this experiment (script *exp1a.sh*) we counted the number of histories for the following data:
- species tree size (number of leaves) from 3 to 32,
- for each species tree size, we considered 100 trees:
    - the first one (index 0) is the caterpillar,
    - if the size is a power of 2, the second tree (index 1) is the complete binary tree,
    - the remaining trees are random,
- the history size (number of leaves) ranges from 1 to 50,
- for each species tree we considered 25 random rankings.
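
The two fixed tree shapes in the list above can be illustrated with a minimal sketch. The helper names and the leaf labels 1..k are hypothetical and are not taken from *exp1a.sh*; the sketch only shows what "caterpillar" and "complete binary tree" mean for a given number of leaves.

```python
def is_power_of_two(k):
    """True when k is a positive power of two (1, 2, 4, 8, ...)."""
    return k > 0 and (k & (k - 1)) == 0

def caterpillar_newick(k):
    """Newick string of the caterpillar on k leaves (hypothetical labels 1..k)."""
    tree = "1"
    for leaf in range(2, k + 1):
        # Each new leaf is attached above the tree built so far
        tree = "(" + tree + "," + str(leaf) + ")"
    return tree + ";"

def complete_newick(k):
    """Newick string of the complete binary tree on k leaves (k a power of two)."""
    if not is_power_of_two(k):
        raise ValueError("k must be a power of two")
    level = [str(i) for i in range(1, k + 1)]
    while len(level) > 1:
        # Pair up neighbouring subtrees until a single root remains
        level = ["(" + level[i] + "," + level[i + 1] + ")"
                 for i in range(0, len(level), 2)]
    return level[0] + ";"

print(caterpillar_newick(4))  # (((1,2),3),4);
print(complete_newick(4))     # ((1,2),(3,4));
# Sizes in the experiment range that have a complete binary tree
print([k for k in range(3, 33) if is_power_of_two(k)])  # [4, 8, 16, 32]
```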

We record the results for species trees of a given size *k* in the file *results/exp1a_k*. Each non-comment row of the result file has the following tab-separated fields:
- species tree size,
- species tree index,
- ranking type (U for unranked, R for ranked),
- evolutionary model (DL or DLT),
- if unranked, Newick string describing the tree, otherwise ranking of internal nodes,
- number of histories of each size, separated by spaces.

For each configuration, we count the number of histories in two models: one that allows only DL histories, and one that also allows DLT histories.
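
As a minimal sketch of the row format, the snippet below parses one sample row. The sample row is hypothetical (the Newick string, the leaf labels and the counts are illustrative), and the six-field layout is an assumption inferred from the column indices used by the reading cell of this notebook (`row[2]`, `row[3]` for the model, `row[5]` for the counts).

```python
import csv
import io

# Hypothetical sample file content: one comment row and one data row.
SAMPLE = (
    "# size\tindex\tranking\tmodel\ttree\tcounts\n"
    "4\t0\tU\tDL\t(((1,2),3),4);\t4 39 495\n"
)

def parse_row(row):
    """Turn one tab-split row into a dict with typed fields."""
    return {
        "size": int(row[0]),
        "index": int(row[1]),
        "model": (row[2], row[3]),       # (ranking type, evolutionary model)
        "tree_or_ranking": row[4],
        "counts": [int(c) for c in row[5].split()],
    }

rows = [
    parse_row(r)
    for r in csv.reader(io.StringIO(SAMPLE), delimiter="\t")
    if r and not r[0].startswith("#")    # skip blank lines and comment rows
]
print(rows[0]["model"], rows[0]["counts"])  # ('U', 'DL') [4, 39, 495]
```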

In [37]:
import csv
import pandas as pd
import numpy as np

In [27]:
# Parameters

# Species tree
# (the experiment covers sizes 3 to 32; this run only loads sizes 3 and 4)
S_SIZE_MIN = 3
S_SIZE_MAX = 4
S_SIZES    = list(range(S_SIZE_MIN,S_SIZE_MAX+1))

# Number of species trees
NB_S_TREES    = 100
S_TREES_INDEX = list(range(NB_S_TREES))

# History size
H_SIZE_MIN = 1
H_SIZE_MAX = 50
H_SIZES    = list(range(H_SIZE_MIN,H_SIZE_MAX+1))

# Number of rankings
NB_RANKINGS    = 25
RANKINGS_INDEX = list(range(NB_RANKINGS))

# Evolutionary models
EVOL_MODELS = [('U','DL'),('U','DLT')]

In [32]:
# Reading results
# Format: RESULTS[evol_model][s][tree_index][n] is
# the number of histories of size n for tree tree_index of size s in model evol_model
RESULTS = {x:{s:{t:{} for t in S_TREES_INDEX}  for s in S_SIZES} for x in EVOL_MODELS}

for s in S_SIZES:
    with open('../results/exp1a_'+str(s), 'r') as f:
        reader = csv.reader(f,delimiter='\t')
        for row in reader:
            # Skip blank lines and comment rows
            if not row or row[0].startswith('#'):
                continue
            model = (row[2],row[3])
            # Skip rows for models not analysed here (e.g. ranked trees)
            if model not in RESULTS:
                continue
            t_ind = int(row[1])
            counts = row[5].split()
            RESULTS[model][s][t_ind] = {n:int(counts[n-1]) for n in H_SIZES}
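
Since pandas is imported, the nested dictionary can also be flattened into a long-format table, which may be easier to filter and aggregate. This is only a sketch under stated assumptions: `toy_results` is a hypothetical stand-in with the same nesting as `RESULTS`, its counts are illustrative, and the column names are made up.

```python
import pandas as pd

# Toy stand-in for RESULTS[model][s][tree_index][n]; the counts are illustrative.
toy_results = {
    ("U", "DL"): {4: {0: {1: 4, 2: 39}, 1: {1: 4, 2: 34}}},
}

# One record per (model, species tree size, tree index, history size)
records = [
    {"ranking": model[0], "model": model[1], "s": s, "tree": t, "n": n, "count": c}
    for model, by_size in toy_results.items()
    for s, by_tree in by_size.items()
    for t, by_n in by_tree.items()
    for n, c in by_n.items()
]
df = pd.DataFrame.from_records(records)

# Summary over trees, per model, species tree size and history size
summary = df.groupby(["model", "s", "n"])["count"].agg(["mean", "min", "max"])
print(summary)
```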

In [68]:
# Analysis 1: average, standard deviation, min and max of the number of histories per model, for a given species tree size
EVOL_MODELS_U = [('U','DL'),('U','DLT')]
AN_NBHISTORIES = {x:{n:{}  for n in H_SIZES} for x in EVOL_MODELS_U}
s = S_SIZE_MAX

for model in EVOL_MODELS_U:
    for n in H_SIZES:
        # Number of histories of size n for each species tree of size s
        data = np.array([RESULTS[model][s][t_ind][n] for t_ind in S_TREES_INDEX])
        # argmin and argmax are species tree indices
        AN_NBHISTORIES[model][n] = {'avg':np.mean(data), 'std':np.std(data), 'min':np.min(data), 'max':np.max(data), 'argmin':np.argmin(data), 'argmax':np.argmax(data)}
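
To make the summary keys (`avg`, `std`, `min`, `max`, `argmin`, `argmax`) concrete, here is a minimal sketch on toy data. The five counts are hypothetical. Two NumPy defaults matter when reading the results: `np.std` computes the population standard deviation (`ddof=0`), and `np.argmin` / `np.argmax` return the first index on ties, so a history size where all trees have the same count reports index 0 for both.

```python
import numpy as np

# Toy counts for five hypothetical trees at one history size.
data = np.array([39, 34, 39, 38, 39])

stats = {
    "avg": np.mean(data),
    "std": np.std(data),                 # population standard deviation (ddof=0)
    "std_sample": np.std(data, ddof=1),  # sample standard deviation, for comparison
    "min": np.min(data),
    "max": np.max(data),
    "argmin": np.argmin(data),           # index of the first minimum
    "argmax": np.argmax(data),           # index of the first maximum
}
print(stats["argmin"], stats["argmax"])  # 1 0

# With identical counts, both indices fall back to 0
ties = np.array([4, 4, 4])
print(np.argmin(ties), np.argmax(ties))  # 0 0
```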


In [67]:
AN_NBHISTORIES[('U', 'DL')]

{1: {'avg': 4.0, 'std': 0.0, 'min': 4, 'max': 4, 'argmin': 0, 'argmax': 0},
 2: {'avg': 37.9,
  'std': 2.071231517720798,
  'min': 34,
  'max': 39,
  'argmin': 1,
  'argmax': 0},
 3: {'avg': 467.06,
  'std': 52.60928055010827,
  'min': 368,
  'max': 495,
  'argmin': 1,
  'argmax': 0},
 4: {'avg': 6674.0,
  'std': 1056.328074037607,
  'min': 4685,
  'max': 7235,
  'argmin': 1,
  'argmax': 0},
 5: {'avg': 104547.86,
  'std': 20251.25904136333,
  'min': 66416,
  'max': 115303,
  'argmin': 1,
  'argmax': 0},
 6: {'avg': 1742975.94,
  'std': 387536.94463054277,
  'min': 1013268,
  'max': 1948791,
  'argmin': 1,
  'argmax': 0},
 7: {'avg': 30397567.26,
  'std': 7497740.862445384,
  'min': 16279788,
  'max': 34379505,
  'argmin': 1,
  'argmax': 0},
 8: {'avg': 548564460.78,
  'std': 147094533.92890534,
  'min': 271594611,
  'max': 626684162,
  'argmin': 1,
  'argmax': 0},
 9: {'avg': 10168580504.54,
  'std': 2925102714.572874,
  'min': 4660794200,
  'max': 11722058693,
  'argmin': 1,
  'argma