# Compilation of test data for performance tests

This notebook records the parameters for Wright-Fisher simulations used to generate our test data sets, as well as commands for running inference algorithms on the test data and compiling the results. In this work we have considered **four** scenarios intended to explore different regimes of computational and evolutionary complexity:

1. small simple
2. medium simple
3. small complex
4. medium complex

Parameters for each of these scenarios are given below in **Section 1**.

We analyzed the resulting trajectories with **XX** different algorithms:

1. **Marginal Path Likelihood (MPL)** [[code](https://github.com/bartonlab/MPL)] [paper] 
2. **MPL without mutation**
3. **Single Locus (SL)** (MPL without covariance)
4. **SL without mutation**
5. CLEAR [[code](https://github.com/airanmehr/CLEAR)] [[paper](http://www.genetics.org/content/206/2/1011)]
6. EandR-timeseries [[code](https://github.com/terhorst/EandR-timeseries)] [[paper](http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005069)]
7. poolSeq [[code](https://github.com/ThomasTaus/poolSeq)] [[paper](https://academic.oup.com/mbe/article/doi/10.1093/molbev/msx225/4086114)]
8. ...
9. ...
10. ...

Methods in **bold** are described in the present work. For other methods we have included a link to the code repository (if available) and to the corresponding paper. Scripts to run and compile output from each of these methods are collected in **Section 2**.

### Import required libraries

In [1]:
# Full library list and version numbers

print('This notebook was prepared using:')

import sys
print('python version %s' % sys.version)

import numpy as np
print('numpy version %s' % np.__version__)

import pandas as pd
print('pandas version %s' % pd.__version__)

import sklearn as sk
from sklearn.metrics import roc_auc_score
print('scikit-learn version %s' % sk.__version__)

This notebook was prepared using:
python version 3.6.3 (default, Oct  4 2017, 06:09:15) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
numpy version 1.13.3
pandas version 0.21.0
scikit-learn version 0.19.1


# Section 1. Generation of test data through Wright-Fisher simulations

Wright-Fisher simulations are performed using `wfsim/Wright-Fisher.py`. The output of these simulations is saved for processing. The code below creates multiple job files for running many simulations in parallel on a computer cluster.

In [2]:
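# GLOBAL VARIABLES
# Per-scenario parameters, keyed by test name:
#   N      = population size, L = sequence length, MU = mutation rate
#   T0, T  = first and last generation of the subsampled test trajectory
#   NB, ND = number of beneficial / deleterious mutations
#   SB, SD = selection coefficient for beneficial / deleterious mutations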

TESTS   = [   'small_simple',      'medium_simple',      'small_complex',    'medium_complex']
N_VALS  = dict(small_simple=  1000, medium_simple=  1000, small_complex=1000, medium_complex=1000)
L_VALS  = dict(small_simple=    10, medium_simple=    50, small_complex=  10, medium_complex=  50)
T0_VALS = dict(small_simple=     0, medium_simple=     0, small_complex=  10, medium_complex=  10)
T_VALS  = dict(small_simple=   150, medium_simple=  1000, small_complex=  70, medium_complex= 310)
MU_VALS = dict(small_simple=  5e-4, medium_simple=  1e-4, small_complex=5e-4, medium_complex=1e-4)
NB_VALS = dict(small_simple=     4, medium_simple=    10, small_complex=   4, medium_complex=  10)
ND_VALS = dict(small_simple=     4, medium_simple=    10, small_complex=   4, medium_complex=  10)
SB_VALS = dict(small_simple= 0.025, medium_simple= 0.025, small_complex= 0.1, medium_complex= 0.1)
SD_VALS = dict(small_simple=-0.025, medium_simple=-0.025, small_complex=-0.1, medium_complex=-0.1)

N_TRIALS     = 100   # number of independent trials to run for each test set
COMP_NS_VALS = [100] # number of sequence samples to collect per time point 
COMP_DT_VALS = [ 10] # time between sampling events (in discrete generations)

In [3]:
pbs_str = """#!/bin/bash\n#PBS -m abe\n#PBS -M jpbarton\n#PBS -k oe\n#PBS -j oe\n#PBS -l nodes=1:ppn=1\n"""

# SMALL SIMPLE, 100x runs of
# 
#     N   = 10^3         (population size)
#     T   = 10^4         (total number of generations to simulate)
#     mu  = 5 x 10^{-4}  (mutation rate)
#     L   = 10           (sequence length) 
#     n_b = 4            (number of beneficial mutations)
#     n_d = 4            (number of deleterious mutations)
#     s_b =  0.025       (selection coefficient for beneficial mutations)
#     s_d = -0.025       (selection coefficient for deleterious mutations)

test     = 'small_simple'
job_pars = {'-T'   : int(1.0e4),
            '-N'   : N_VALS[test],
            '-L'   : L_VALS[test],
            '--mu' : MU_VALS[test],
            '--nB' : NB_VALS[test],
            '--fB' : SB_VALS[test],
            '--nD' : ND_VALS[test],
            '--fD' : SD_VALS[test] }
job_sub = open('wfsim/jobs/run_wfsim_'+test+'.sh', 'w')
for t in range(N_TRIALS):
    trial_str = 'wfsim_'+test+'_%d' % t
    job_sub.write('qsub -q verylong %s > /dev/null\n' % ('wfsim/jobs/'+trial_str+'.pbs'))
    with open('wfsim/jobs/'+trial_str+'.pbs', 'w') as f:
        f.write(pbs_str)
        f.write('python wfsim/Wright-Fisher.py -o wfsim/data/%s ' % trial_str)
        f.write('%s\n' % (' '.join([k + ' ' + str(v) for k, v in job_pars.items()])))
job_sub.close()


# MEDIUM SIMPLE, 100x runs of
# 
#     N   = 10^3         (population size)
#     T   = 10^4         (total number of generations to simulate)
#     mu  = 1 x 10^{-4}  (mutation rate)
#     L   = 50           (sequence length) 
#     n_b = 10           (number of beneficial mutations)
#     n_d = 10           (number of deleterious mutations)
#     s_b =  0.025       (selection coefficient for beneficial mutations)
#     s_d = -0.025       (selection coefficient for deleterious mutations)

test     = 'medium_simple'
job_pars = {'-T'   : int(1.0e4),
            '-N'   : N_VALS[test],
            '-L'   : L_VALS[test],
            '--mu' : MU_VALS[test],
            '--nB' : NB_VALS[test],
            '--fB' : SB_VALS[test],
            '--nD' : ND_VALS[test],
            '--fD' : SD_VALS[test] }
job_sub = open('wfsim/jobs/run_wfsim_'+test+'.sh', 'w')
for t in range(N_TRIALS):
    trial_str = 'wfsim_'+test+'_%d' % t
    job_sub.write('qsub -q verylong %s > /dev/null\n' % ('wfsim/jobs/'+trial_str+'.pbs'))
    with open('wfsim/jobs/'+trial_str+'.pbs', 'w') as f:
        f.write(pbs_str)
        f.write('python wfsim/Wright-Fisher.py -o wfsim/data/%s ' % trial_str)
        f.write('%s\n' % (' '.join([k + ' ' + str(v) for k, v in job_pars.items()])))
job_sub.close()


# SMALL COMPLEX, 100x runs of
# 
#     N   = 10^3         (population size)
#     T   = 10^4         (total number of generations to simulate)
#     mu  = 5 x 10^{-4}  (mutation rate)
#     L   = 10           (sequence length) 
#     n_b = 4            (number of beneficial mutations)
#     n_d = 4            (number of deleterious mutations)
#     s_b =  0.100       (selection coefficient for beneficial mutations)
#     s_d = -0.100       (selection coefficient for deleterious mutations)
#
# For these simulations the starting population is evenly split between
# 3 collections of sequences with randomly chosen mutations (probability
# of mutation is 50% at each site independent of other sites, 
# see Wright-Fisher.py for details)

test     = 'small_complex'
job_pars = {'-T'   : int(1.0e4),
            '-N'   : N_VALS[test],
            '-L'   : L_VALS[test],
            '--mu' : MU_VALS[test],
            '--nB' : NB_VALS[test],
            '--fB' : SB_VALS[test],
            '--nD' : ND_VALS[test],
            '--fD' : SD_VALS[test],
            '--random' : 3 }
job_sub   = open('wfsim/jobs/run_wfsim_'+test+'.sh', 'w')
for t in range(N_TRIALS):
    trial_str = 'wfsim_'+test+'_%d' % t
    job_sub.write('qsub -q verylong %s > /dev/null\n' % ('wfsim/jobs/'+trial_str+'.pbs'))
    with open('wfsim/jobs/'+trial_str+'.pbs', 'w') as f:
        f.write(pbs_str)
        f.write('python wfsim/Wright-Fisher.py -o wfsim/data/%s ' % trial_str)
        f.write('%s\n' % (' '.join([k + ' ' + str(v) for k, v in job_pars.items()])))
job_sub.close()


# MEDIUM COMPLEX, 100x runs of
# 
#     N   = 10^3         (population size)
#     T   = 10^4         (total number of generations to simulate)
#     mu  = 1 x 10^{-4}  (mutation rate)
#     L   = 50           (sequence length) 
#     n_b = 10           (number of beneficial mutations)
#     n_d = 10           (number of deleterious mutations)
#     s_b =  0.100       (selection coefficient for beneficial mutations)
#     s_d = -0.100       (selection coefficient for deleterious mutations)
#
# For these simulations the starting population is evenly split between
# 5 collections of sequences with randomly chosen mutations (probability
# of mutation is 50% at each site independent of other sites, 
# see Wright-Fisher.py for details)

test     = 'medium_complex'
job_pars = {'-T'   : int(1.0e4),
            '-N'   : N_VALS[test],
            '-L'   : L_VALS[test],
            '--mu' : MU_VALS[test],
            '--nB' : NB_VALS[test],
            '--fB' : SB_VALS[test],
            '--nD' : ND_VALS[test],
            '--fD' : SD_VALS[test],
            '--random' : 5 }
job_sub   = open('wfsim/jobs/run_wfsim_'+test+'.sh', 'w')
for t in range(N_TRIALS):
    trial_str = 'wfsim_'+test+'_%d' % t
    job_sub.write('qsub -q verylong %s > /dev/null\n' % ('wfsim/jobs/'+trial_str+'.pbs'))
    with open('wfsim/jobs/'+trial_str+'.pbs', 'w') as f:
        f.write(pbs_str)
        f.write('python wfsim/Wright-Fisher.py -o wfsim/data/%s ' % trial_str)
        f.write('%s\n' % (' '.join([k + ' ' + str(v) for k, v in job_pars.items()])))
job_sub.close()
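
The four blocks above assemble each simulation command the same way: a fixed prefix with the output path, followed by the flag/value pairs in `job_pars`. As a minimal sketch (the helper name `build_command` is hypothetical and not part of the original notebook), the command written for the first `small_simple` trial can be reproduced like this:

```python
def build_command(trial_str, job_pars):
    """Render one Wright-Fisher.py call from a dict of flag -> value pairs."""
    # Same join as in the cells above: "flag value" pairs separated by spaces
    args = ' '.join(k + ' ' + str(v) for k, v in job_pars.items())
    return 'python wfsim/Wright-Fisher.py -o wfsim/data/%s %s' % (trial_str, args)

# Parameter values copied from the small_simple scenario
job_pars = {'-T': int(1.0e4), '-N': 1000, '-L': 10, '--mu': 5e-4,
            '--nB': 4, '--fB': 0.025, '--nD': 4, '--fD': -0.025}

cmd = build_command('wfsim_small_simple_0', job_pars)
print(cmd)
```

Because the flags are emitted in dictionary order, this relies on insertion-ordered dictionaries (guaranteed from Python 3.7, and the behavior of CPython 3.6).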

Once the Wright-Fisher trajectories have been generated, we subsample them with `wfsim/py2c.py` to create our test trajectories. For comparison between inference methods we take 100 sequences per sample, with samples taken every 10 generations (`COMP_NS_VALS` and `COMP_DT_VALS` above). The starting and ending generations of these test trajectories are:

1. small simple -- start 0, end 150
2. medium simple -- start 0, end 1000
3. small complex -- start 10, end 70
4. medium complex -- start 10, end 310

The code below produces four shell scripts `expand_small_simple.sh`, `expand_medium_simple.sh`, `expand_small_complex.sh`, and `expand_medium_complex.sh`, which can be run to extract the trajectories from the compressed output of `wfsim/Wright-Fisher.py`.

In [4]:
# Extract sub-trajectories from full samples

for t in TESTS:
    job_sub = open('expand_%s.sh' % t, 'w')
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for i in range(N_TRIALS):
                job_sub.write('python3 wfsim/py2c.py -i wfsim/data/wfsim_%s_%d -t %d -T %d --ns %d --dt %d -s %d\n' 
                            % (t, i, T0_VALS[t], T_VALS[t], ns, dt, i))
    job_sub.close()
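
To give a sense of the size of each test trajectory, the sketch below counts the sampled time points per scenario. It assumes that both the starting and ending generations are sampled; this is an assumption about `wfsim/py2c.py`, which is not shown here, so the counts may be off by one if the script uses a different convention. The helper `sample_times` is hypothetical.

```python
# Start/end generations copied from the global variables above
T0_VALS = dict(small_simple=0, medium_simple=0, small_complex=10, medium_complex=10)
T_VALS  = dict(small_simple=150, medium_simple=1000, small_complex=70, medium_complex=310)
dt = 10  # generations between sampling events

def sample_times(t0, T, dt):
    # ASSUMPTION: endpoints t0 and T are both included in the sampled generations
    return list(range(t0, T + 1, dt))

n_points = {k: len(sample_times(T0_VALS[k], T_VALS[k], dt)) for k in T0_VALS}
for k, n in n_points.items():
    print('%-15s %4d time points' % (k, n))
```

Under that assumption, each time point contributes 100 sampled sequences, so the complex scenarios cover far fewer observations than the simple ones.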

# Section 2. Running the inference algorithms and compiling output

### 1-4. MPL, MPL without mutation, SL, SL without mutation

First create the job files and run them.

In [5]:
pbs_str = """#!/bin/bash\n#PBS -m abe\n#PBS -M jpbarton\n#PBS -k oe\n#PBS -j oe\n#PBS -l nodes=1:ppn=1\n"""

ns_vals = [10, 20, 30, 40, 50,  80, 100, 1000]
dt_vals = [ 1,  5, 10, 20, 50, 100, 200,  250]

for t in TESTS:
    job_sub = open('MPL/jobs/run_wfinf_%s.sh' % t, 'w')
    job_sub.write('g++ MPL/src/main.cpp MPL/src/inf.cpp MPL/src/io.cpp -O3 -lgslcblas -lgsl -o mpl\n')
    for ns in ns_vals:
        for dt in dt_vals:
            trial_str = 'wfinf_%s_T%d_ns%d_dt%d' % (t, T_VALS[t], ns, dt)
            job_sub.write('qsub -q verylong %s > /dev/null\n' % ('MPL/jobs/'+trial_str+'.pbs'))
            with open('MPL/jobs/'+trial_str+'.pbs', 'w') as f:
                f.write(pbs_str)
                for i in range(N_TRIALS):
                    i_str = 'wfsim/data/wfsim_%s_%d_T%d_ns%d_dt%d' % (t, i, T_VALS[t], ns, dt)
                    o_str = 'MPL/out/%s_%d_T%d_ns%d_dt%d'          % (t, i, T_VALS[t], ns, dt)
                    f.write('python wfsim/py2c.py -i wfsim/data/wfsim_%s_%d -t %d -T %d --ns %d --dt %d -s %d\n' 
                            % (t, i, T0_VALS[t], T_VALS[t], ns, dt, i))
                    f.write('./mpl -i %s.dat -o %s_MPL.dat'            % (i_str, o_str))
                    f.write(' -g 1e3 -N %d -mu %.3e > /dev/null\n'     % (N_VALS[t], MU_VALS[t]))
                    f.write('./mpl -i %s.dat -o %s_MPL_noMu.dat'       % (i_str, o_str))
                    f.write(' -g 1e3 -N %d -mu 0 > /dev/null\n'        % (N_VALS[t]))
                    f.write('./mpl -i %s.dat -o %s_SL.dat'             % (i_str, o_str))
                    f.write(' -nc -g 1e3 -N %d -mu %.3e > /dev/null\n' % (N_VALS[t], MU_VALS[t]))
                    f.write('./mpl -i %s.dat -o %s_SL_noMu.dat'        % (i_str, o_str))
                    f.write(' -nc -g 1e3 -N %d -mu 0 > /dev/null\n'    % (N_VALS[t]))
                    f.write('rm %s.dat\n' % i_str)
    job_sub.close()

    methods = ['MPL', 'SL', 'MPL_noMu', 'SL_noMu']

    job_collect = open('MPL/jobs/run_wfinf_%s_collect.sh' % t, 'w')
    job_collect.write('python MPL/collect_s.py -i MPL/out/%s -n %d -T %d -t %d' % (t, N_TRIALS, T_VALS[t], T0_VALS[t]))
    for  m in methods: job_collect.write(  ' -m %s' %  m)
    for ns in ns_vals: job_collect.write(' --ns %d' % ns)
    for dt in dt_vals: job_collect.write(' --dt %d' % dt)
    job_collect.write(' &\n')
    job_collect.write('cd MPL/out && tar czf %s.tar.gz ' % (t))
    for m in methods: job_collect.write(' *%s*_%s.dat' % (t, m))
    job_collect.write(' && cd ../..')
    job_collect.close()

Next collect and organize the output.

In [11]:
for t in TESTS:
    true_ben = [1 if i in                       range(NB_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_del = [1 if i in  range(L_VALS[t]-ND_VALS[t], L_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_neu = [1 if i in range(NB_VALS[t], L_VALS[t]-ND_VALS[t]) else 0 for i in range(L_VALS[t])]
    coefs    = ['s%d' % j for j in range(L_VALS[t])]
    
    df              = pd.read_csv('MPL/out/%s_collected.csv' % (t), memory_map=True)
    df['AUROC_ben'] = pd.Series(data=[roc_auc_score(true_ben, np.array(df.iloc[i][coefs])) for i in range(len(df))])
    df['AUROC_del'] = pd.Series(data=[roc_auc_score(true_del,-np.array(df.iloc[i][coefs])) for i in range(len(df))])
    for i in range(L_VALS[t]):
        if   true_ben[i]: df['ds%d' % i] = df['s%d' % i] - SB_VALS[t]
        elif true_del[i]: df['ds%d' % i] = df['s%d' % i] - SD_VALS[t]
        elif true_neu[i]: df['ds%d' % i] = df['s%d' % i]
        
    df.to_csv('data/MPL_%s_collected_extended.csv.gz' % (t), compression='gzip')
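
The cell above scores each inferred set of selection coefficients in two ways: AUROC for classifying beneficial (or deleterious) sites, and the per-site error relative to the true coefficient. The sketch below illustrates both on toy numbers (the inferred values are made up for illustration). To stay dependency-free it uses a small pairwise AUROC function in place of `roc_auc_score`; both compute the probability that a positive site outranks a negative one, with ties counted as one half.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scenario: L = 6 sites, first 2 beneficial, last 2 deleterious, rest neutral
L, n_b, n_d = 6, 2, 2
s_b, s_d = 0.025, -0.025
true_ben = [1 if i < n_b else 0 for i in range(L)]
true_del = [1 if i >= L - n_d else 0 for i in range(L)]
true_s   = [s_b] * n_b + [0.0] * (L - n_b - n_d) + [s_d] * n_d

# Hypothetical inferred coefficients for one trajectory
s_inferred = [0.030, 0.010, 0.015, -0.005, -0.020, -0.040]

auroc_ben = auroc(true_ben, s_inferred)                # large s should flag beneficial sites
auroc_del = auroc(true_del, [-s for s in s_inferred])  # sign flipped for deleterious sites
ds = [s - s0 for s, s0 in zip(s_inferred, true_s)]     # per-site error, as in the ds columns

print(auroc_ben, auroc_del)
```

Negating the coefficients for the deleterious case mirrors the `-np.array(...)` in the cell above, so that more negative inferred values rank higher.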

### 5. CLEAR

First create the job files and run them.

In [9]:
pbs_str = """#!/bin/bash\n#PBS -m abe\n#PBS -M jpbarton\n#PBS -k oe\n#PBS -j oe\n#PBS -l nodes=1:ppn=1\n"""
pbs_str = pbs_str + 'START=$(date +"%s.%N")\n'
pbs_end = 'RUNTIME=$(echo "$(date +%s.%N) - $START" | bc)\necho "$RUNTIME" >> '

for t in TESTS:
    job_sub = open('CLEAR/jobs/run_%s.sh' % t, 'w')
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for i in range(N_TRIALS):
                _data     = np.loadtxt('wfsim/data/wfsim_%s_%d_T%d_ns%d_dt%d.dat' % (t, i, T_VALS[t], ns, dt))
                _L        = len(_data[0][2:])
                times     = np.unique(_data.T[0])
                positions = np.array(range(1, _L+1),int)

                levels     = [[1], [int(_t) for _t in times], ['C', 'D']]
                names      = ['REP', 'GEN', 'READ']
                indices    = ['CHROM', 'POS']
                col_values = {}
                col_tuples = []
                idx_tuples = [('chrI', l+1) for l in range(_L)]

                for j in range(len(times)):
                    _t_data = np.array([_d[2:] for _d in _data if _d[0]==times[j]])
                    _t_num  = np.array([ _d[1] for _d in _data if _d[0]==times[j]])
                    _t_sum  = np.einsum('i,ij->j', _t_num, _t_data)
                    for l in range(_L):
                        col_tuples.append((1, int(times[j]), 'C'))
                        col_tuples.append((1, int(times[j]), 'D'))
                        if (1, int(times[j]), 'C') in col_values:
                            col_values[(1, int(times[j]), 'C')].append(_t_sum[l]+1)
                            col_values[(1, int(times[j]), 'D')].append(np.sum(_t_num)+1)
                        else:
                            col_values[(1, int(times[j]), 'C')] = [_t_sum[l]+1]
                            col_values[(1, int(times[j]), 'D')] = [np.sum(_t_num)+1]

                df_CLEAR = pd.DataFrame(col_values, index = np.array(range(_L),int)+1)
                df_CLEAR.columns.names = tuple(names)
                df_CLEAR.to_pickle('CLEAR/data/%s_%d.df' % (t, i))
                
                o_str = '%s_%d' % (t, i)
                with open('CLEAR/jobs/%s.pbs' % (o_str), 'w') as f:
                    f.write(pbs_str)
                    f.write('python3 CLEAR/CLEAR.py --pandas CLEAR/data/%s.df' % (o_str))
                    f.write(' --N %d --out CLEAR/out/%s.df\n'                  % (N_VALS[t], o_str))
                    f.write('%sCLEAR/out/%s_time.dat\n'                        % (pbs_end, o_str))
                
                job_sub.write('qsub -q verylong CLEAR/jobs/%s_%d.pbs > /dev/null\n' % (t, i))
                
    job_sub.close()
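
The conversion loop above reduces each sampled time point to a per-site count of mutant alleles (`C`) and a total depth (`D`), each incremented by one. A minimal pure-Python sketch of that reduction follows. It assumes each row of the trajectory file is laid out as `[generation, number of copies, allele at site 1, ..., allele at site L]`, which is what the indexing in the cell above suggests; the toy rows and the helper name `counts_and_depth` are hypothetical.

```python
# Toy trajectory rows: [generation, copies, allele_0, allele_1, allele_2]
rows = [
    [0,  60, 0, 0, 0],
    [0,  40, 1, 0, 0],
    [10, 30, 0, 0, 0],
    [10, 50, 1, 0, 1],
    [10, 20, 1, 1, 0],
]

def counts_and_depth(rows, pseudocount=1):
    """Return {generation: {'C': per-site mutant counts, 'D': per-site depth}}."""
    n_sites = len(rows[0]) - 2
    out = {}
    for t in sorted({r[0] for r in rows}):
        at_t = [r for r in rows if r[0] == t]
        depth = sum(r[1] for r in at_t)  # total sequences sampled at time t
        # Weight each sequence's allele by how many copies of it were sampled
        mutant = [sum(r[1] * r[2 + l] for r in at_t) for l in range(n_sites)]
        out[t] = {'C': [c + pseudocount for c in mutant],
                  'D': [depth + pseudocount] * n_sites}
    return out

table = counts_and_depth(rows)
print(table)
```

The added pseudocount of one matches the `+1` terms in the cell above and keeps every count and depth strictly positive.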

Next collect and organize the output.

In [12]:
# Process CLEAR results

for t in TESTS:
    true_ben = [1 if i in                       range(NB_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_del = [1 if i in  range(L_VALS[t]-ND_VALS[t], L_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_neu = [1 if i in range(NB_VALS[t], L_VALS[t]-ND_VALS[t]) else 0 for i in range(L_VALS[t])]
    coefs    = ['s%d' % j for j in range(L_VALS[t])]
    
    f    = open('CLEAR/out/%s_collected.csv' % (t), 'w')
    head = 'trajectory,method,t0,T,ns,deltat,runtime,' + (','.join(coefs))
    f.write('%s\n' % head)
    
    for n in range(N_TRIALS):
        temp_df = pd.melt(pd.read_pickle('CLEAR/out/%s_%d.df' % (t, n)))
        temp_s  = np.array(temp_df[temp_df.stat=='s'].value)
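        # The timing file may hold several appended runtimes; keep the most recent one
        with open('CLEAR/out/%s_%d_time.dat' % (t, n)) as f_time:
            temp_t = float(f_time.readlines()[-1].split()[0])
        
        # ns and dt are not loop variables in this cell; set them explicitly to the comparison values
        ns, dt = COMP_NS_VALS[0], COMP_DT_VALS[0]
        f.write('%d,%s,%d,%d,%d,%d,%lf,' % (n, 'CLEAR', T0_VALS[t], T_VALS[t], ns, dt, temp_t))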
        f.write(','.join(['%lf' % s for s in temp_s]))
        f.write('\n')
    
    f.close()
    
    df              = pd.read_csv('CLEAR/out/%s_collected.csv' % (t), memory_map=True)
    df['AUROC_ben'] = pd.Series(data=[roc_auc_score(true_ben, np.array(df.iloc[i][coefs])) for i in range(len(df))])
    df['AUROC_del'] = pd.Series(data=[roc_auc_score(true_del,-np.array(df.iloc[i][coefs])) for i in range(len(df))])
    for i in range(L_VALS[t]):
        if   true_ben[i]: df['ds%d' % i] = df['s%d' % i] - SB_VALS[t]
        elif true_del[i]: df['ds%d' % i] = df['s%d' % i] - SD_VALS[t]
        elif true_neu[i]: df['ds%d' % i] = df['s%d' % i]
        
    df.to_csv('data/CLEAR_%s_collected_extended.csv.gz' % (t), compression='gzip')

### 6. EandR-timeseries

First create the job files and run them.

In [15]:
pbs_str = """#!/bin/bash\n#PBS -m abe\n#PBS -M jpbarton\n#PBS -k oe\n#PBS -j oe\n#PBS -l nodes=1:ppn=4\n"""
pbs_str = pbs_str + 'START=$(date +"%s.%N")\n'
pbs_end = 'RUNTIME=$(echo "$(date +%s.%N) - $START" | bc)\necho "$RUNTIME" >> '

for t in TESTS:
    job_sub_ind  = open('EandR/jobs/run_%s_independent.sh' % t, 'w')
    job_sub_link = open('EandR/jobs/run_%s_linked.sh'      % t, 'w')
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for i in range(N_TRIALS):
                o_str = 'EandR/out/%s_%d' % (t, i)
                i_str = 'wfsim/data/wfsim_%s_%d_T%d_ns%d_dt%d.dat' % (t, i, T_VALS[t], ns, dt)
                with open('EandR/jobs/%s_%d_independent.pbs' % (t, i), 'w') as f:
                    f.write(pbs_str)
                    f.write('python3 EandR/EandR.py -N %d -i %s -o %s_independent.dat\n' % (N_VALS[t], i_str, o_str))
                    f.write('%s%s_independent_time.dat\n'                                % (pbs_end, o_str))
                    job_sub_ind.write('qsub -q verylong EandR/jobs/%s_%d_independent.pbs > /dev/null\n' % (t, i))
                with open('EandR/jobs/%s_%d_linked.pbs' % (t, i), 'w') as f:
                    f.write(pbs_str)
                    f.write('python3 EandR/EandR.py -N %d -i %s -o %s_linked.dat -l\n' % (N_VALS[t], i_str, o_str))
                    f.write('%s%s_linked_time.dat\n'                                   % (pbs_end, o_str))
                    job_sub_link.write('qsub -q verylong EandR/jobs/%s_%d_linked.pbs > /dev/null\n' % (t, i))
                
    job_sub_ind.close()
    job_sub_link.close()
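
The timing wrapper used for CLEAR and EandR-timeseries depends on the shell, not Python, seeing the `%s.%N` format passed to `date`. The sketch below shows why the strings survive intact: the header is built by plain concatenation, and `pbs_end` is only ever passed as an argument to `%` formatting, never used as the format string itself. The command `method.py` and the file names are placeholders, and the PBS header is shortened for readability.

```python
pbs_head = '#!/bin/bash\n#PBS -l nodes=1:ppn=1\n'
# Plain concatenation: no % formatting is applied, so %s and %N reach the shell untouched
pbs_head = pbs_head + 'START=$(date +"%s.%N")\n'
pbs_end  = 'RUNTIME=$(echo "$(date +%s.%N) - $START" | bc)\necho "$RUNTIME" >> '

def render_job(command, time_file):
    # pbs_end is a format *argument* here, so the '%s.%N' inside it is not interpreted
    return pbs_head + command + '\n' + '%s%s\n' % (pbs_end, time_file)

script = render_job('python3 method.py -i input.dat -o output.dat', 'output_time.dat')
print(script)
```

Note that `%N` (nanoseconds) is a GNU `date` extension, so the generated scripts are expected to run on a Linux cluster rather than on macOS, and that `>>` appends a new runtime each time a job is rerun.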

Next collect and organize the output.

In [None]:
for t in TESTS:
    true_ben = [1 if i in                       range(NB_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_del = [1 if i in  range(L_VALS[t]-ND_VALS[t], L_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_neu = [1 if i in range(NB_VALS[t], L_VALS[t]-ND_VALS[t]) else 0 for i in range(L_VALS[t])]
    coefs    = ['s%d' % j for j in range(L_VALS[t])]
    
    f    = open('EandR/out/%s_collected.csv' % (t), 'w')
    head = 'trajectory,method,t0,T,ns,deltat,runtime,' + (','.join(coefs))
    f.write('%s\n' % head)
    
    for n in range(N_TRIALS):
        temp_s = np.loadtxt('EandR/out/%s_%d_linked.dat' % (t, n))
        temp_t = np.loadtxt('EandR/out/%s_%d_linked_time.dat' % (t, n))
        
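        # ns and dt are not loop variables in this cell; set them explicitly to the comparison values
        ns, dt = COMP_NS_VALS[0], COMP_DT_VALS[0]
        f.write('%d,%s,%d,%d,%d,%d,%lf,' % (n, 'EandR_linked', T0_VALS[t], T_VALS[t], ns, dt, temp_t))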
        f.write(','.join(['%lf' % s for s in temp_s]))
        f.write('\n')
        
        temp_s = np.loadtxt('EandR/out/%s_%d_independent.dat' % (t, n))
        temp_t = np.loadtxt('EandR/out/%s_%d_independent_time.dat' % (t, n))
        
        f.write('%d,%s,%d,%d,%d,%d,%lf,' % (n, 'EandR_independent', T0_VALS[t], T_VALS[t], ns, dt, temp_t))
        f.write(','.join(['%lf' % s for s in temp_s]))
        f.write('\n')
    
    f.close()
    
    df              = pd.read_csv('EandR/out/%s_collected.csv' % (t), memory_map=True)
    df['AUROC_ben'] = pd.Series(data=[roc_auc_score(true_ben, np.array(df.iloc[i][coefs])) for i in range(len(df))])
    df['AUROC_del'] = pd.Series(data=[roc_auc_score(true_del,-np.array(df.iloc[i][coefs])) for i in range(len(df))])
    for i in range(L_VALS[t]):
        if   true_ben[i]: df['ds%d' % i] = df['s%d' % i] - SB_VALS[t]
        elif true_del[i]: df['ds%d' % i] = df['s%d' % i] - SD_VALS[t]
        elif true_neu[i]: df['ds%d' % i] = df['s%d' % i]
            
    df.to_csv('data/EandR_%s_collected_extended.csv.gz' % (t), compression='gzip')