# Compilation of test data for performance tests

This notebook records the parameters for Wright-Fisher simulations used to generate our test data sets, as well as commands for running inference algorithms on the test data and compiling the results. In this work we consider one **simple** and one **complex** scenario, intended to explore different regimes of computational and evolutionary complexity. We also consider an **example** that has somewhat more complicated evolutionary trajectories. 

Parameters for each of these scenarios are given below in **Section 1**.

We analyzed the resulting trajectories with **7** different algorithms, in addition to MPL:

1. FIT [[paper](https://doi.org/10.1534/genetics.113.158220)]  
2. LLS [[paper](https://doi.org/10.1093/molbev/msx225)] [[code](https://github.com/ThomasTaus/poolSeq)]  
3. CLEAR [[paper](https://doi.org/10.1534/genetics.116.197566)] [[code](https://github.com/airanmehr/CLEAR)]  
4. EandR-timeseries [[paper](https://doi.org/10.1371/journal.pgen.1005069)] [[code](https://github.com/terhorst/EandR-timeseries)]  
5. ApproxWF [[paper](https://doi.org/10.1534/genetics.115.184598)] [[code](https://bitbucket.org/phaentu/approxwf/wiki/Home)]  
6. WFABC [[paper](https://doi.org/10.1111/1755-0998.12280)] [[code](http://jjensenlab.org/software)]  
7. IM [[paper](https://doi.org/10.1534/genetics.111.133975)]  


Each entry links to the corresponding paper and, where available, to the code repository. Scripts to run and compile output from these methods are collected in **Section 2**.

### Import required libraries and define global variables

In [1]:
# Full library list and version numbers

print('This notebook was prepared using:')

import sys
print('python version %s' % sys.version)

import numpy as np
print('numpy version %s' % np.__version__)

import pandas as pd
print('pandas version %s' % pd.__version__)

import sklearn as sk
from sklearn.metrics import roc_auc_score
print('scikit-learn version %s' % sk.__version__)


# GLOBAL VARIABLES

PBS_STR = """#!/bin/bash\n#PBS -m abe\n#PBS -M jpbarton\n#PBS -k oe\n#PBS -j oe\n#PBS -l nodes=1:ppn=1\n"""

# Code Ocean directories
# WFS_DIR = '../data/wfsim'
# WFS_DIR_REL = '../wfsim'
# MPL_DIR = 'MPL'
# SIM_MPL_DIR = '../data/simulation/MPL'
# CLR_DIR = 'external/CLEAR'
# EAR_DIR = 'external/EandR'
# SIM_DIR = '../data/simulation'

# GitHub directories
WFS_DIR = 'src/wfsim'
WFS_DIR_REL = '../wfsim'
MPL_DIR = 'src/MPL'
SIM_MPL_DIR = 'src/MPL/out'
CLR_DIR = 'src/external/CLEAR'
EAR_DIR = 'src/external/EandR'
SIM_DIR = 'data/simulation'

TESTS   = [   'example',      'medium_simple',      'medium_complex']
N_VALS  = dict(example=  1000, medium_simple=  1000, medium_complex=1000)
L_VALS  = dict(example=    50, medium_simple=    50, medium_complex=  50)
T0_VALS = dict(example=     0, medium_simple=     0, medium_complex=  10)
T_VALS  = dict(example=   400, medium_simple=  1000, medium_complex= 310)
MU_VALS = dict(example=  1e-3, medium_simple=  1e-4, medium_complex=1e-4)
NB_VALS = dict(example=    10, medium_simple=    10, medium_complex=  10)
ND_VALS = dict(example=    10, medium_simple=    10, medium_complex=  10)
SB_VALS = dict(example= 0.025, medium_simple= 0.025, medium_complex= 0.1)
SD_VALS = dict(example=-0.025, medium_simple=-0.025, medium_complex=-0.1)

N_TRIALS     =  100  # number of independent trials to run for each test set
COMP_NS_VALS = [100] # number of sequence samples to collect per time point 
COMP_DT_VALS = [ 10] # time between sampling events (in discrete generations)

This notebook was prepared using:
python version 3.7.7 (default, Mar 10 2020, 15:43:33) 
[Clang 11.0.0 (clang-1100.0.33.17)]
numpy version 1.18.4
pandas version 1.0.3
scikit-learn version 0.22.2.post1


# Section 1. Generation of test data through Wright-Fisher simulations

Wright-Fisher simulations are performed using `src/wfsim/Wright-Fisher.py`. The output of these simulations is saved for processing. The code below creates multiple job files for running many simulations in parallel on a computer cluster.
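
For orientation, one generation of a Wright-Fisher model with selection and mutation can be sketched as below. This is an illustrative sketch only, not the code in `Wright-Fisher.py`: the additive fitness function, the helper names `fitness` and `wf_generation`, and the dictionary representation of the population are all assumptions made for this example.

```python
import random

def fitness(seq, s):
    """Additive fitness 1 + sum_i s_i * g_i (an assumed form, for illustration)."""
    return 1.0 + sum(si * gi for si, gi in zip(s, seq))

def wf_generation(pop, s, mu, rng):
    """Advance a population {genotype tuple: count} by one generation.

    Selection: each of the N offspring picks a parent genotype with
    probability proportional to count * fitness.
    Mutation: each site of each offspring then flips with probability mu.
    """
    genotypes = list(pop)
    weights = [pop[g] * fitness(g, s) for g in genotypes]
    N = sum(pop.values())
    new_pop = {}
    for parent in rng.choices(genotypes, weights=weights, k=N):
        child = tuple(1 - a if rng.random() < mu else a for a in parent)
        new_pop[child] = new_pop.get(child, 0) + 1
    return new_pop

# Toy run: L = 3 sites (one beneficial, one neutral, one deleterious)
rng = random.Random(0)
s = [0.025, 0.0, -0.025]
pop = {(0, 0, 0): 1000}
for _ in range(10):
    pop = wf_generation(pop, s, mu=1e-3, rng=rng)
print(sum(pop.values()))  # the population size N is conserved
```

The population size stays fixed at `N` because exactly `N` offspring are drawn each generation; only the genotype composition changes.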

In [2]:
# EXAMPLE, 100x runs of
# 
#     N   = 10^3         (population size)
#     T   = 1000         (total number of generations to simulate)
#     mu  = 1 x 10^{-3}  (mutation rate)
#     L   = 50           (sequence length) 
#     n_b = 10           (number of beneficial mutations)
#     n_d = 10           (number of deleterious mutations)
#     s_b =  0.025       (selection coefficient for beneficial mutations)
#     s_d = -0.025       (selection coefficient for deleterious mutations)
#
# RANDOM STARTING POPULATION (3 groups)

test     = 'example'
job_pars = {'-T'   : 1000,
            '-N'   : N_VALS[test],
            '-L'   : L_VALS[test],
            '--mu' : MU_VALS[test],
            '--nB' : NB_VALS[test],
            '--fB' : SB_VALS[test],
            '--nD' : ND_VALS[test],
            '--fD' : SD_VALS[test],
            '--random' : 3 }
job_sub = open(WFS_DIR+'/jobs/run_wfsim_'+test+'.sh', 'w')
for t in range(N_TRIALS):
    trial_str = 'wfsim_'+test+'_%d' % t
    job_sub.write('qsub -q verylong %s > /dev/null\n' % ('jobs/'+trial_str+'.pbs'))
    with open(WFS_DIR+'/jobs/'+trial_str+'.pbs', 'w') as f:
        f.write(PBS_STR)
        f.write('python3 Wright-Fisher.py -o data/%s ' % (trial_str))
        f.write('%s\n' % (' '.join([k + ' ' + str(v) for k, v in job_pars.items()])))
job_sub.close()


# MEDIUM SIMPLE, 100x runs of
# 
#     N   = 10^3         (population size)
#     T   = 10^4         (total number of generations to simulate)
#     mu  = 1 x 10^{-4}  (mutation rate)
#     L   = 50           (sequence length) 
#     n_b = 10           (number of beneficial mutations)
#     n_d = 10           (number of deleterious mutations)
#     s_b =  0.025       (selection coefficient for beneficial mutations)
#     s_d = -0.025       (selection coefficient for deleterious mutations)

test     = 'medium_simple'
job_pars = {'-T'   : int(1.0e4),
            '-N'   : N_VALS[test],
            '-L'   : L_VALS[test],
            '--mu' : MU_VALS[test],
            '--nB' : NB_VALS[test],
            '--fB' : SB_VALS[test],
            '--nD' : ND_VALS[test],
            '--fD' : SD_VALS[test] }
job_sub = open(WFS_DIR+'/jobs/run_wfsim_'+test+'.sh', 'w')
for t in range(N_TRIALS):
    trial_str = 'wfsim_'+test+'_%d' % t
    job_sub.write('qsub -q verylong %s > /dev/null\n' % ('jobs/'+trial_str+'.pbs'))
    with open(WFS_DIR+'/jobs/'+trial_str+'.pbs', 'w') as f:
        f.write(PBS_STR)
        f.write('python3 Wright-Fisher.py -o data/%s ' % (trial_str))
        f.write('%s\n' % (' '.join([k + ' ' + str(v) for k, v in job_pars.items()])))
job_sub.close()


# MEDIUM COMPLEX, 100x runs of
# 
#     N   = 10^3         (population size)
#     T   = 10^4         (total number of generations to simulate)
#     mu  = 1 x 10^{-4}  (mutation rate)
#     L   = 50           (sequence length) 
#     n_b = 10           (number of beneficial mutations)
#     n_d = 10           (number of deleterious mutations)
#     s_b =  0.100       (selection coefficient for beneficial mutations)
#     s_d = -0.100       (selection coefficient for deleterious mutations)
#
# For these simulations the starting population is evenly split between
# 5 collections of sequences with randomly chosen mutations (probability
# of mutation is 50% at each site independent of other sites, 
# see Wright-Fisher.py for details)

test     = 'medium_complex'
job_pars = {'-T'   : int(1.0e4),
            '-N'   : N_VALS[test],
            '-L'   : L_VALS[test],
            '--mu' : MU_VALS[test],
            '--nB' : NB_VALS[test],
            '--fB' : SB_VALS[test],
            '--nD' : ND_VALS[test],
            '--fD' : SD_VALS[test],
            '--random' : 5 }
job_sub = open(WFS_DIR+'/jobs/run_wfsim_'+test+'.sh', 'w')
for t in range(N_TRIALS):
    trial_str = 'wfsim_'+test+'_%d' % t
    job_sub.write('qsub -q verylong %s > /dev/null\n' % ('jobs/'+trial_str+'.pbs'))
    with open(WFS_DIR+'/jobs/'+trial_str+'.pbs', 'w') as f:
        f.write(PBS_STR)
        f.write('python3 Wright-Fisher.py -o data/%s ' % (trial_str))
        f.write('%s\n' % (' '.join([k + ' ' + str(v) for k, v in job_pars.items()])))
job_sub.close()
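
The three blocks above differ only in the test name and the simulation parameters. One way to factor out the shared string-building logic is sketched here; the helper names `wfsim_command` and `qsub_line` are hypothetical and are not part of the repository. The sketch only builds strings, so it can be checked without a cluster.

```python
def wfsim_command(trial_str, job_pars):
    """Build the Wright-Fisher.py command line written to each PBS file."""
    args = ' '.join('%s %s' % (k, v) for k, v in job_pars.items())
    return 'python3 Wright-Fisher.py -o data/%s %s' % (trial_str, args)

def qsub_line(trial_str):
    """Build the submission line written to run_wfsim_<test>.sh."""
    return 'qsub -q verylong jobs/%s.pbs > /dev/null' % trial_str

# Example with the medium_simple parameters used above
job_pars = {'-T': int(1.0e4), '-N': 1000, '-L': 50, '--mu': 1e-4,
            '--nB': 10, '--fB': 0.025, '--nD': 10, '--fD': -0.025}
trial_str = 'wfsim_%s_%d' % ('medium_simple', 0)
cmd = wfsim_command(trial_str, job_pars)
print(cmd)
print(qsub_line(trial_str))
```

Keeping the parameters in a dictionary, as the notebook does, means a scenario-specific flag such as `--random` only needs to be added to `job_pars`.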

Once the Wright-Fisher trajectories have been generated, we subsample them to create our test trajectories using `src/wfsim/py2c.py`. For comparison between inference methods, we take 100 sequences per sample, with samples taken every 10 generations. The starting and ending generations of these test trajectories are:

1. example -- start 0, end 400
2. medium simple -- start 0, end 1000
3. medium complex -- start 10, end 310

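The code below produces the shell scripts `expand_medium_simple.sh` and `expand_medium_complex.sh` (the loop runs over `TESTS[1:]`, so no `expand_example.sh` is written; the MPL job files call `py2c.py` for every test set, including `example`). These scripts can be run to extract the trajectories from the compressed output of `src/wfsim/Wright-Fisher.py`.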
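
The subsampling step can be pictured as follows. This is a sketch under assumptions, not the implementation in `py2c.py`: it assumes the full simulation is available as a list of per-generation populations `{genotype: count}`, that the end generation is included, and the function name `subsample` is hypothetical. It draws `ns` sequences with replacement at generations `t0, t0 + dt, ..., T`.

```python
import random

def subsample(trajectory, t0, T, ns, dt, seed=0):
    """Sample ns sequences every dt generations from t0 to T (inclusive).

    trajectory[t] is assumed to be a dict {genotype tuple: count} for
    generation t. Returns a list of (time, count, genotype) rows.
    """
    rng = random.Random(seed)
    rows = []
    for t in range(t0, T + 1, dt):
        pop = trajectory[t]
        genotypes = list(pop)
        weights = [pop[g] for g in genotypes]
        counts = {}
        for g in rng.choices(genotypes, weights=weights, k=ns):
            counts[g] = counts.get(g, 0) + 1
        rows.extend((t, c, g) for g, c in counts.items())
    return rows

# Toy trajectory: a mutant rising linearly in a population of 1000
trajectory = [{(0,): 1000 - 20 * t, (1,): 20 * t} for t in range(31)]
rows = subsample(trajectory, t0=10, T=30, ns=100, dt=10)
times = sorted({t for t, _, _ in rows})
print(times)  # [10, 20, 30]
```

The row layout (time, count, alleles) mirrors how the notebook later reads the subsampled `.dat` files, where column 0 is the time, column 1 the number of sequences, and the remaining columns the alleles.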

# Section 2. Running the inference algorithms and compiling output

In [3]:
# Extract sub-trajectories from full samples

ns_vals = [10, 20, 30, 40, 50,  80, 100, 1000]
dt_vals = [ 1,  5, 10, 20, 50]

for t in TESTS[1:]:
    job_sub = open('%s/expand_%s.sh' % (WFS_DIR, t), 'w')
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for i in range(N_TRIALS):
                job_sub.write('python3 py2c.py -i data/wfsim_%s_%d -t %d -T %d --ns %d --dt %d -s %d\n' 
                            % (t, i, T0_VALS[t], T_VALS[t], ns, dt, i))
    job_sub.close()

### MPL

First create the job files and run them.

In [4]:
ns_vals = [10, 20, 30, 40, 50,  80, 100, 1000]
dt_vals = [ 1,  5, 10, 20, 50]

for t in TESTS:
    job_sub = open('%s/jobs/run_wfinf_%s.sh' % (MPL_DIR, t), 'w')
    job_sub.write('g++ src/main.cpp src/inf-binary.cpp src/io.cpp -O3 ')
    job_sub.write('-march=native -lgslcblas -lgsl -o bin/mpl-binary\n')
    for ns in ns_vals:
        for dt in dt_vals:
            trial_str = 'wfinf_%s_T%d_ns%d_dt%d' % (t, T_VALS[t], ns, dt)
            job_sub.write('qsub -q verylong %s > /dev/null\n' % ('jobs/'+trial_str+'.pbs'))
            with open(MPL_DIR+'/jobs/'+trial_str+'.pbs', 'w') as f:
                f.write(PBS_STR)
                for i in range(N_TRIALS):
                    i_str = '%s/data/wfsim_%s_%d_T%d_ns%d_dt%d' % (WFS_DIR_REL, t, i, T_VALS[t], ns, dt)
                    o_str = 'out/%s_%d_T%d_ns%d_dt%d'           % (             t, i, T_VALS[t], ns, dt)
                    f.write('python3 %s/py2c.py -i %s/data/wfsim_%s_%d -t %d -T %d --ns %d --dt %d -s %d\n' 
                            % (WFS_DIR_REL, WFS_DIR_REL, t, i, T0_VALS[t], T_VALS[t], ns, dt, i))
                    if ns in COMP_NS_VALS and dt in COMP_DT_VALS:
                        f.write('./bin/mpl-binary -i %s.dat -o %s_MPL.dat'    % (i_str, o_str))
                        f.write(' -g 1e3 -N %d -mu %.3e > %s_MPL_time.dat\n'  % (N_VALS[t], MU_VALS[t], o_str))
                        f.write('./bin/mpl-binary -i %s.dat -o %s_SL.dat -nc' % (i_str, o_str))
                        f.write(' -g 1e3 -N %d -mu %.3e > %s_SL_time.dat\n'   % (N_VALS[t], MU_VALS[t], o_str))
                    else:
                        f.write('./bin/mpl-binary -i %s.dat -o %s_MPL.dat' % (i_str, o_str))
                        f.write(' -g 1e3 -N %d -mu %.3e > /dev/null\n'     % (N_VALS[t], MU_VALS[t]))
                        f.write('./bin/mpl-binary -i %s.dat -o %s_SL.dat'  % (i_str, o_str))
                        f.write(' -nc -g 1e3 -N %d -mu %.3e > /dev/null\n' % (N_VALS[t], MU_VALS[t]))
                    f.write('./bin/mpl-binary -i %s.dat -o %s_MPL_noMu.dat' % (i_str, o_str))
                    f.write(' -g 1e3 -N %d -mu 0 > /dev/null\n'             % (N_VALS[t]))
                    f.write('./bin/mpl-binary -i %s.dat -o %s_SL_noMu.dat'  % (i_str, o_str))
                    f.write(' -nc -g 1e3 -N %d -mu 0 > /dev/null\n'         % (N_VALS[t]))
                    if t=='example' and ns==1000 and dt==1 and i==0:
                        f.write('./bin/mpl-binary -i %s.dat -o %s_MPL.dat'    % (i_str, o_str))
                        f.write(' -g 1e3 -N %d -mu %.3e -sc %s > /dev/null\n' 
                                % (N_VALS[t], MU_VALS[t], 
                                   o_str.split('/')[0]+'/covariance-'+o_str.split('/')[1]+'.dat'))
                    else:
                        f.write('rm %s.dat\n' % i_str)
    job_sub.close()

    methods = ['MPL', 'SL', 'MPL_noMu', 'SL_noMu']

    job_collect = open('%s/jobs/run_wfinf_%s_collect.sh' % (MPL_DIR, t), 'w')
    job_collect.write('python3 collect_s.py -i out/%s -n %d -T %d -t %d' % (t, N_TRIALS, T_VALS[t], T0_VALS[t]))
    for  m in methods: job_collect.write(  ' -m %s' %  m)
    for ns in ns_vals: job_collect.write(' --ns %d' % ns)
    for dt in dt_vals: job_collect.write(' --dt %d' % dt)
    job_collect.write(' &\n')
    job_collect.write('cd %s/out && tar czf %s.tar.gz ' % (MPL_DIR, t))
    for m in methods: job_collect.write(' *%s*_%s.dat' % (t, m))
    job_collect.write(' && cd ../..')
    job_collect.close()

Next collect and organize the output.

In [5]:
for t in TESTS:
    true_ben = [1 if i in                       range(NB_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_del = [1 if i in  range(L_VALS[t]-ND_VALS[t], L_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_neu = [1 if i in range(NB_VALS[t], L_VALS[t]-ND_VALS[t]) else 0 for i in range(L_VALS[t])]
    coefs    = ['s%d' % j for j in range(L_VALS[t])]
    
    df              = pd.read_csv('%s/%s_collected.csv' % (SIM_MPL_DIR, t), memory_map=True)
    df['AUROC_ben'] = pd.Series(data=[roc_auc_score(true_ben, np.array(df.iloc[i][coefs])) for i in range(len(df))])
    df['AUROC_del'] = pd.Series(data=[roc_auc_score(true_del,-np.array(df.iloc[i][coefs])) for i in range(len(df))])
    for i in range(L_VALS[t]):
        if   true_ben[i]: df['ds%d' % i] = df['s%d' % i] - SB_VALS[t]
        elif true_del[i]: df['ds%d' % i] = df['s%d' % i] - SD_VALS[t]
        elif true_neu[i]: df['ds%d' % i] = df['s%d' % i]
    
    df.to_csv('%s/MPL_%s_collected_extended.csv.gz' % (SIM_DIR, t), compression='gzip')
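
To make the classification metric concrete: `AUROC_ben` asks how well the inferred coefficients rank the beneficial sites above all others, and `AUROC_del` does the same for deleterious sites after flipping the sign. The pure-Python sketch below computes the same quantity through its rank interpretation (the probability that a randomly chosen positive site scores above a randomly chosen negative one, with ties counted as one half). The function `auroc` and the toy coefficients are illustrative and are not part of the notebook.

```python
def auroc(labels, scores):
    """AUROC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    pos = [x for y, x in zip(labels, scores) if y == 1]
    neg = [x for y, x in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Site labels built as in the cell above, for a toy case with L = 6, nB = nD = 2
L, nB, nD = 6, 2, 2
true_ben = [1 if i < nB else 0 for i in range(L)]
true_del = [1 if i >= L - nD else 0 for i in range(L)]

s_inferred = [0.03, 0.01, 0.02, -0.01, -0.02, -0.04]  # made-up values
auroc_ben = auroc(true_ben, s_inferred)
auroc_del = auroc(true_del, [-x for x in s_inferred])
print(auroc_ben, auroc_del)  # 0.875 1.0
```

Here one neutral site (0.02) outranks one beneficial site (0.01), so 7 of the 8 beneficial/non-beneficial pairs are ordered correctly, while the two deleterious sites have the most negative coefficients and are ranked perfectly.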

### 1, 5-7. FIT, ApproxWF, WFABC, and IM

See the Matlab scripts in the `src/Matlab/` directory for data processing and running the FIT, ApproxWF, WFABC, and IM inference routines. An overview of this analysis is presented in `src/Matlab/README.TXT`.

### 2. LLS

See the R script in the `src/R` directory for data processing and running the LLS inference routine. An overview of this analysis is presented in `src/R/README.TXT`.

### 3. CLEAR

First create the job files and run them.

In [6]:
pbs_str_clr = PBS_STR + 'START=$(date +"%s.%N")\n'
pbs_end     = 'RUNTIME=$(echo "$(date +%s.%N) - $START" | bc)\necho "$RUNTIME" >> '

for t in TESTS:
    if t=='example':
        continue
    job_sub = open('%s/jobs/run_%s.sh' % (CLR_DIR, t), 'w')
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for i in range(N_TRIALS):
                _data     = np.loadtxt('%s/data/wfsim_%s_%d_T%d_ns%d_dt%d.dat' % (WFS_DIR, t, i, T_VALS[t], ns, dt))
                _L        = len(_data[0][2:])
                times     = np.unique(_data.T[0])
                positions = np.array(range(1, _L+1),int)

                levels     = [[1], [int(_t) for _t in times], ['C', 'D']]
                names      = ['REP', 'GEN', 'READ']
                indices    = ['CHROM', 'POS']
                col_values = {}
                col_tuples = []
                idx_tuples = [('chrI', l+1) for l in range(_L)]

                for j in range(len(times)):
                    _t_data = np.array([_d[2:] for _d in _data if _d[0]==times[j]])
                    _t_num  = np.array([ _d[1] for _d in _data if _d[0]==times[j]])
                    _t_sum  = np.einsum('i,ij->j', _t_num, _t_data)
                    for l in range(_L):
                        col_tuples.append((1, int(times[j]), 'C'))
                        col_tuples.append((1, int(times[j]), 'D'))
                        if (1, int(times[j]), 'C') in col_values:
                            col_values[(1, int(times[j]), 'C')].append(_t_sum[l]+1)
                            col_values[(1, int(times[j]), 'D')].append(np.sum(_t_num)+1)
                        else:
                            col_values[(1, int(times[j]), 'C')] = [_t_sum[l]+1]
                            col_values[(1, int(times[j]), 'D')] = [np.sum(_t_num)+1]

                df_CLEAR = pd.DataFrame(col_values, index = np.array(range(_L),int)+1)
                df_CLEAR.columns.names = tuple(names)
                df_CLEAR.to_pickle('%s/data/%s_%d.df' % (CLR_DIR, t, i))
                
                o_str = '%s_%d' % (t, i)
                with open('%s/jobs/%s.pbs' % (CLR_DIR, o_str), 'w') as f:
                    f.write(pbs_str_clr)
                    f.write('python3 %s/CLEAR.py --pandas %s/data/%s.df' % (CLR_DIR, CLR_DIR, o_str))
                    f.write(' --N %d --out %s/out/%s.df\n'               % (N_VALS[t], CLR_DIR, o_str))
                    f.write('%s%s/out/%s_time.dat\n'                     % (pbs_end, CLR_DIR, o_str))
                
                job_sub.write('qsub -q verylong %s/jobs/%s_%d.pbs > /dev/null\n' % (CLR_DIR, t, i))
                
    job_sub.close()
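
The conversion above boils down to two numbers per site and time point: the mutant-allele count `C` and the read depth `D`, each incremented by one. The sketch below reproduces that arithmetic in plain Python on a made-up sample. The function name `clear_counts` is hypothetical, and reading the `+1` as a pseudocount is an assumption about intent rather than something stated in the notebook.

```python
def clear_counts(rows):
    """Return {time: (C, D)} from rows of [time, count, a_1, ..., a_L].

    C[l] = 1 + number of sampled sequences carrying the mutant allele at site l
    D    = 1 + total number of sampled sequences at that time
    """
    out = {}
    for t in sorted({int(r[0]) for r in rows}):
        t_rows = [r for r in rows if r[0] == t]
        n_sites = len(t_rows[0]) - 2
        depth = sum(r[1] for r in t_rows)
        C = [1 + sum(r[1] * r[2 + l] for r in t_rows) for l in range(n_sites)]
        out[t] = (C, 1 + depth)
    return out

# Made-up sample: two time points, three sites
rows = [[0, 90, 0, 0, 0],
        [0, 10, 1, 0, 0],
        [10, 60, 0, 0, 0],
        [10, 30, 1, 0, 0],
        [10, 10, 1, 1, 0]]
counts = clear_counts(rows)
print(counts[10])  # ([41, 11, 1], 101)
```

The inner sum is the same weighted count that the notebook obtains with `np.einsum('i,ij->j', _t_num, _t_data)`.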

Next collect and organize the output.

In [7]:
# Process CLEAR results

for t in TESTS:
    if t=='example':
        continue
    true_ben = [1 if i in                       range(NB_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_del = [1 if i in  range(L_VALS[t]-ND_VALS[t], L_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_neu = [1 if i in range(NB_VALS[t], L_VALS[t]-ND_VALS[t]) else 0 for i in range(L_VALS[t])]
    coefs    = ['s%d' % j for j in range(L_VALS[t])]
    
    f    = open('%s/out/%s_collected.csv' % (CLR_DIR, t), 'w')
    head = 'trajectory,method,t0,T,ns,deltat,runtime,' + (','.join(coefs))
    f.write('%s\n' % head)
    
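    # Loop over ns and dt explicitly so the values written below are defined in this cell
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for n in range(N_TRIALS):
                temp_df = pd.melt(pd.read_pickle('%s/out/%s_%d.df' % (CLR_DIR, t, n)))
                temp_s  = np.array(temp_df[temp_df.stat=='s'].value)
                # The timing file is appended to on every run; keep only the last entry
                with open('%s/out/%s_%d_time.dat' % (CLR_DIR, t, n)) as tf:
                    temp_t = float([i.split() for i in tf.readlines()][-1][0])

                f.write('%d,%s,%d,%d,%d,%d,%lf,' % (n, 'CLEAR', T0_VALS[t], T_VALS[t], ns, dt, temp_t))
                f.write(','.join(['%lf' % s for s in temp_s]))
                f.write('\n')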
    
    f.close()
    
    df              = pd.read_csv('%s/out/%s_collected.csv' % (CLR_DIR, t), memory_map=True)
    df['AUROC_ben'] = pd.Series(data=[roc_auc_score(true_ben, np.array(df.iloc[i][coefs])) for i in range(len(df))])
    df['AUROC_del'] = pd.Series(data=[roc_auc_score(true_del,-np.array(df.iloc[i][coefs])) for i in range(len(df))])
    for i in range(L_VALS[t]):
        if   true_ben[i]: df['ds%d' % i] = df['s%d' % i] - SB_VALS[t]
        elif true_del[i]: df['ds%d' % i] = df['s%d' % i] - SD_VALS[t]
        elif true_neu[i]: df['ds%d' % i] = df['s%d' % i]
        
    df.to_csv('%s/CLEAR_%s_collected_extended.csv.gz' % (SIM_DIR, t), compression='gzip')

### 4. EandR-timeseries

First create the job files and run them.

In [8]:
pbs_str_ear = """#!/bin/bash\n#PBS -m abe\n#PBS -M jpbarton\n#PBS -k oe\n#PBS -j oe\n#PBS -l nodes=1:ppn=4\n"""
pbs_str_ear = pbs_str_ear + 'START=$(date +"%s.%N")\n'
pbs_end     = 'RUNTIME=$(echo "$(date +%s.%N) - $START" | bc)\necho "$RUNTIME" >> '

for t in TESTS:
    if t=='example':
        continue
    job_sub = open('%s/jobs/run_%s.sh' % (EAR_DIR, t), 'w')
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for i in range(N_TRIALS):
                o_str = '%s/out/%s_%d' % (EAR_DIR, t, i)
                i_str = '%s/data/wfsim_%s_%d_T%d_ns%d_dt%d.dat' % (WFS_DIR, t, i, T_VALS[t], ns, dt)
                with open('%s/jobs/%s_%d.pbs' % (EAR_DIR, t, i), 'w') as f:
                    f.write(pbs_str_ear)
                    f.write('python3 %s/EandR.py -N %d -i %s -o %s.dat\n' % (EAR_DIR, N_VALS[t], i_str, o_str))
                    f.write('%s%s_time.dat\n'                             % (pbs_end, o_str))
                    job_sub.write('qsub -q verylong %s/jobs/%s_%d.pbs > /dev/null\n' % (EAR_DIR, t, i))
    job_sub.close()
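
Each job file wraps the inference command between two `date` calls and appends the elapsed time in seconds to a `_time.dat` file. Because the file is written with `>>`, rerunning a job adds a new line, which is why the collection step reads only the last entry. The sketch below illustrates that reading logic on a temporary file; `last_runtime` is a hypothetical helper, not a function from the repository, and the runtimes are made up.

```python
import os
import tempfile

def last_runtime(path):
    """Return the most recent runtime (last non-empty line) as a float."""
    with open(path) as f:
        lines = [line.split() for line in f if line.strip()]
    return float(lines[-1][0])

# Simulate a timing file that was appended to by two runs of the same job
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'medium_simple_0_time.dat')
    with open(path, 'a') as f:
        f.write('12.503\n')   # first run (made-up value)
    with open(path, 'a') as f:
        f.write('11.874\n')   # rerun, appended (made-up value)
    runtime = last_runtime(path)

print(runtime)  # 11.874
```

Deleting stale `_time.dat` files before resubmitting jobs would make the last-entry rule unnecessary, but reading the final line keeps the collection step robust either way.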

Next collect and organize the output.

In [9]:
for t in TESTS:
    if t=='example':
        continue
    true_ben = [1 if i in                       range(NB_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_del = [1 if i in  range(L_VALS[t]-ND_VALS[t], L_VALS[t]) else 0 for i in range(L_VALS[t])]
    true_neu = [1 if i in range(NB_VALS[t], L_VALS[t]-ND_VALS[t]) else 0 for i in range(L_VALS[t])]
    coefs    = ['s%d' % j for j in range(L_VALS[t])]
    
    f    = open('%s/out/%s_collected.csv' % (EAR_DIR, t), 'w')
    head = 'trajectory,method,t0,T,ns,deltat,runtime,' + (','.join(coefs))
    f.write('%s\n' % head)
    
    for ns in COMP_NS_VALS:
        for dt in COMP_DT_VALS:
            for n in range(N_TRIALS):
                temp_s = np.loadtxt('%s/out/%s_%d.dat' % (EAR_DIR, t, n))
                temp_t = np.loadtxt('%s/out/%s_%d_time.dat' % (EAR_DIR, t, n))
                if temp_t.shape!=(): temp_t = temp_t[-1]
                
                f.write('%d,%s,%d,%d,%d,%d,%lf,' % (n, 'EandR', T0_VALS[t], T_VALS[t], ns, dt, temp_t))
                f.write(','.join(['%lf' % s for s in temp_s]))
                f.write('\n')
    
    f.close()
    
    df              = pd.read_csv('%s/out/%s_collected.csv' % (EAR_DIR, t), memory_map=True)
    df['AUROC_ben'] = pd.Series(data=[roc_auc_score(true_ben, np.array(df.iloc[i][coefs])) for i in range(len(df))])
    df['AUROC_del'] = pd.Series(data=[roc_auc_score(true_del,-np.array(df.iloc[i][coefs])) for i in range(len(df))])
    for i in range(L_VALS[t]):
        if   true_ben[i]: df['ds%d' % i] = df['s%d' % i] - SB_VALS[t]
        elif true_del[i]: df['ds%d' % i] = df['s%d' % i] - SD_VALS[t]
        elif true_neu[i]: df['ds%d' % i] = df['s%d' % i]
            
    df.to_csv('%s/EandR_%s_collected_extended.csv.gz' % (SIM_DIR, t), compression='gzip')