<a id='contents'></a>

# Figures and analysis

This notebook contains the scripts that produce the main figures and results accompanying the manuscript. Here we organize and process the data, which are then passed to functions in `figures.py` and `mplot.py` (available at this [GitHub repository](https://github.com/johnbarton/mplot)) for detailed formatting. The figures are saved as PDFs in the `figures` folder (set by `FIG_DIR` below).

## Contents

- [Overview and table of contents](#contents)
- [Loading libraries and global variables](#global)
- [Figures and data analysis](#figures)  
    - [Figure 1](#fig1-figs1)  
    - [Figure 2](#fig2-figs2)
    - [Figure 3](#fig3)
    - [Figure 4](#fig4-figs4)
    - [Figure 5](#fig5)
    - [Figure 6](#fig6-figs9)
    - [Extended Data Figure 1](#fig1-figs1)
    - [Extended Data Figure 2](#fig2-figs2)
    - Extended Data Figures 3 and 4 are plotted in Matlab -- see the `src/Matlab` folder for details  
    - [Extended Data Figure 5](#figs3-figs6)
    - [Extended Data Figure 6](#fig4-figs4)
    - [Supplementary Figure 1](#figs5)
    - [Extended Data Figure 7](#figs3-figs6)
    - [Extended Data Figure 8](#figs7-figs8)
    - [Extended Data Figure 9](#figs7-figs8)
    - [Supplementary Figure 2](#fig6-figs9)
    - [Extended Data Figure 10](#figs11)

<a id='global'></a>

## Libraries and variables

In [1]:
# Full library list and version numbers

print('This notebook was prepared using:')

import sys, os
from copy import deepcopy
from importlib import reload
print('python version %s' % sys.version)

import numpy as np
print('numpy version %s' % np.__version__)

import scipy as sp
import scipy.stats as st
print('scipy version %s' % sp.__version__)

import pandas as pd
print('pandas version %s' % pd.__version__)

import matplotlib
import matplotlib.cm as cm
import matplotlib.pyplot as plot
import matplotlib.gridspec as gridspec
import matplotlib.image as mpimg
print('matplotlib version %s' % matplotlib.__version__)

import figures as fig
import mplot as mp


# GLOBAL VARIABLES

NUC = ['-', 'A', 'C', 'G', 'T']
REF = NUC[0]
CONS_TAG = 'CONSENSUS'
HXB2_TAG = 'B.FR.1983.HXB2-LAI-IIIB-BRU.K03455.19535'
B_PPTS = ['700010470', '700010077', '700010058', '700010040', '700010607']
C_PPTS = ['706010164', '705010198', '705010185', '705010162', '704010042', 
          '703010256', '703010159', '703010131', 'cap256']
PPT2LABEL = {'700010470': 'CH470',
             '700010077': 'CH77', 
             '700010058': 'CH58', 
             '700010040': 'CH40',
             '706010164': 'CH164', 
             '705010198': 'CH198', 
             '705010185': 'CH185', 
             '705010162': 'CH162', 
             '704010042': 'CH42', 
             '703010256': 'CH256',
             '703010159': 'CH159', 
             '703010131': 'CH131', 
             '700010607': 'CH607', 
             'cap256':    'CAP256'}

FIGPROPS = { 'transparent' : True, }

# # Code Ocean directories
# HIV_DIR = '../data/HIV'
# MPL_DIR = 'MPL'
# SIM_DIR = '../data/simulation'
# FIG_DIR = '../results'
# HIV_MPL_DIR = '../data/HIV/MPL'

# GitHub directories
HIV_DIR = 'data/HIV'
MPL_DIR = 'src/MPL'
SIM_DIR = 'data/simulation'
FIG_DIR = 'figures'
HIV_MPL_DIR = 'src/MPL/HIV'

TESTS   = [   'example',      'medium_simple',      'medium_complex']
N_VALS  = dict(example=  1000, medium_simple=  1000, medium_complex=1000)
L_VALS  = dict(example=    50, medium_simple=    50, medium_complex=  50)
T0_VALS = dict(example=     0, medium_simple=     0, medium_complex=  10)
T_VALS  = dict(example=   400, medium_simple=  1000, medium_complex= 310)
MU_VALS = dict(example=  1e-3, medium_simple=  1e-4, medium_complex=1e-4)
NB_VALS = dict(example=    10, medium_simple=    10, medium_complex=  10)
ND_VALS = dict(example=    10, medium_simple=    10, medium_complex=  10)
SB_VALS = dict(example= 0.025, medium_simple= 0.025, medium_complex= 0.1)
SD_VALS = dict(example=-0.025, medium_simple=-0.025, medium_complex=-0.1)

N_TRIALS     =  100  # number of independent trials to run for each test set
COMP_NS_VALS = [100] # number of sequence samples to collect per time point 
COMP_DT_VALS = [ 10] # time between sampling events (in discrete generations)

# NOTE: the values below are taken from the output of the second code cell of `HIV-analysis.ipynb`.
# If this same pipeline is run on new or different data, these values should be updated!

# ALL SUBTYPES
TOTAL_VARS       = 350045 # number of possible mutations
TOTAL_NS_EPITOPE = 7155   # total number of nonsynonymous mutations in epitopes
TOTAL_NS_REV     = 4383   # total number of nonsynonymous reversions
TOTAL_NS_REV_EPI = 127    # total number of nonsynonymous reversions in epitopes

# # Subtype B only
# TOTAL_VARS       = 133601 # number of possible mutations
# TOTAL_NS_EPITOPE = 3385   # total number of nonsynonymous mutations in epitopes
# TOTAL_NS_REV     = 1736   # total number of nonsynonymous reversions
# TOTAL_NS_REV_EPI = 66     # total number of nonsynonymous reversions in epitopes

# # Subtype C only
# TOTAL_VARS       = 216444 # number of possible mutations
# TOTAL_NS_EPITOPE = 3545   # total number of nonsynonymous mutations in epitopes
# TOTAL_NS_REV     = 2647   # total number of nonsynonymous reversions
# TOTAL_NS_REV_EPI = 54     # total number of nonsynonymous reversions in epitopes

This notebook was prepared using:
python version 3.9.8 (main, Nov 10 2021, 03:55:42) 
[Clang 13.0.0 (clang-1300.0.29.3)]
numpy version 1.21.4
scipy version 1.7.2
pandas version 1.3.4
matplotlib version 3.5.0


<a id='figures'></a>

## Figures and data analysis

<a id='fig1-figs1'></a>

### Figure 1. Example evolutionary path for a 50-site system and inferred selection coefficients.  
### Extended Data Figure 1. Plot of paths for individual alleles, inferred selection coefficients, and aggregate properties for different levels of sampling.
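Besides AUROC values, the cell below reports a normalized root-mean-square error (NRMSE) for each test set: the Euclidean norm of the `ds*` columns divided by the norm of the true selection coefficients, `sqrt(n_ben*s_ben**2 + n_del*s_del**2)`. The following is a minimal, dependency-free sketch of that calculation for a single trial. It assumes, as the normalization suggests, that the `ds*` columns hold the per-site differences between inferred and true selection coefficients; the function name `nrmse` and the toy error values are illustrative and are not part of `figures.py`.

```python
import math

def nrmse(ds, n_ben, s_ben, n_del, s_del):
    """Norm of per-site errors divided by the norm of the true selection coefficients."""
    numerator = math.sqrt(sum(d * d for d in ds))
    denominator = math.sqrt(n_ben * s_ben**2 + n_del * s_del**2)
    return numerator / denominator

# Toy trial: 50 sites, each inferred coefficient off by 0.005 (made-up value)
toy_errors = [0.005] * 50
toy_nrmse = nrmse(toy_errors, n_ben=10, s_ben=0.025, n_del=10, s_del=-0.025)
print('toy NRMSE: %.2f' % toy_nrmse)
```

Neutral sites contribute to the numerator but not to the denominator, because their true selection coefficient is zero. In the notebook the per-trial values are then averaged with `np.mean`.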

In [2]:
from importlib import reload  # `imp` is deprecated and was removed in Python 3.12
reload(mp)
reload(fig)


pdata = {
    'n_gen':   400,                               # number of generations
    'dg':      5,                                 # spacing between generations for plot
    'N':       1000,                              # population size
    'xfile':   'wfsim_example_0_T400_ns1000_dt1', # file in directory data/ containing example trajectory
    'name':    'example',                         # simulation data set identifier
    'hist_ns': 1000,                              # number of samples for histograms
    'hist_dt': 1,                                 # time spacing for histograms
    'method':  'MPL',                             # inference method
    'n_ben':   10,                                # number of beneficial mutations
    'n_neu':   30,                                # number of neutral mutations
    'n_del':   10,                                # number of deleterious mutations
    's_ben':   0.025,                             # selection coefficient of beneficial mutations
    's_neu':   0,                                 # selection coefficient of neutral mutations
    's_del':   -0.025,                            # selection coefficient of deleterious mutations
    'r_seed':  1,                                 # random seed for scattering inferred selection coefficients
}

fig.plot_figure_example_mpl(**pdata)
fig.plot_supplementary_figure_example_mpl(**pdata)

for test, dtval, nsval in [['example', 1, 1000], ['medium_simple', 10, 100], ['medium_complex', 10, 100]]:

    df   = pd.read_csv('%s/MPL_%s_collected_extended.csv.gz' % (SIM_DIR, test), memory_map=True)
    df   = df[df.method=='MPL']
    df_s = df[(df.deltat==dtval) & (df.ns==nsval)]

    print('')
    print('%s AUROC (beneficial, deleterious): (%.2f, %.2f)' 
          % (test, np.mean(df_s.AUROC_ben), np.mean(df_s.AUROC_del)))

    dscols = [j for j in df_s.columns if 'ds' in j]
    scols  = [j for j in df_s.columns if ('s' in j) and (j not in dscols+['ns'])]
    y_num = np.sqrt(np.sum(np.array(df_s[dscols]**2),axis=1))
    y_den = np.sqrt(np.sum([NB_VALS[test] * (SB_VALS[test]**2), ND_VALS[test] * (SD_VALS[test]**2)]))
    NRMSE = np.mean(np.array(y_num)/np.array(y_den))

    print('%s NRMSE: %.2f' % (test, NRMSE))


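# Compare against other methods. NOTE: `test` and `dscols` keep their values from the
# last iteration of the loop above, so this comparison covers 'medium_complex' only.
methods = ['ApproxWF', 'CLEAR', 'EandR', 'FIT', 'IM', 'LLS', 'WFABC']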
best_NRMSE,     best_NRMSE_method     = 100, 'none'
best_AUROC_ben, best_AUROC_ben_method =   0, 'none'
best_AUROC_del, best_AUROC_del_method =   0, 'none'
for m in methods:
    df    = pd.read_csv('%s/%s_%s_collected_extended.csv.gz' % (SIM_DIR, m, test), memory_map=True)
    y_num = np.sqrt(np.sum(np.array(df[dscols]**2),axis=1))
    y_den = np.sqrt(np.sum([NB_VALS[test] * (SB_VALS[test]**2), ND_VALS[test] * (SD_VALS[test]**2)]))
    NRMSE = np.mean(np.array(y_num)/np.array(y_den))
    AUROC_ben = np.mean(df.AUROC_ben)
    AUROC_del = np.mean(df.AUROC_del)
    if NRMSE<best_NRMSE:
        best_NRMSE = NRMSE
        best_NRMSE_method = m
    if AUROC_ben>best_AUROC_ben:
        best_AUROC_ben = AUROC_ben
        best_AUROC_ben_method = m
    if AUROC_del>best_AUROC_del:
        best_AUROC_del = AUROC_del
        best_AUROC_del_method = m
        
print('')
print('%s %s best AUROC beneficial:  %.2f' % (test, best_AUROC_ben_method, best_AUROC_ben))
print('%s %s best AUROC deleterious: %.2f' % (test, best_AUROC_del_method, best_AUROC_del))
print('%s %s best NRMSE: %.2f'             % (test, best_NRMSE_method, best_NRMSE))

MPL example done.
MPL supplementary example done.

example AUROC (beneficial, deleterious): (0.99, 0.99)
example NRMSE: 0.45

medium_simple AUROC (beneficial, deleterious): (0.97, 0.88)
medium_simple NRMSE: 0.68

medium_complex AUROC (beneficial, deleterious): (0.92, 0.93)
medium_complex NRMSE: 0.75

medium_complex LLS best AUROC beneficial:  0.83
medium_complex FIT best AUROC deleterious: 0.81
medium_complex CLEAR best NRMSE: 0.90


<a id='fig2-figs2'></a>

### Figure 2. Performance comparison of MPL versus other selection inference methods.
### Extended Data Figure 2. Performance improvement of MPL over other selection inference methods on the same data sets.
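In `get_plot_data` below, the error for LLS is handled separately: sites whose `ds` entry is null are skipped in the numerator, and the true coefficient of a beneficial or deleterious site enters the denominator only when that site has an estimate. A compact sketch of the same bookkeeping for one trial follows. The name `nrmse_skip_missing` and the toy inputs are hypothetical, and `None` stands in for the null entries that the notebook tests with `pd.isnull`.

```python
import math

def nrmse_skip_missing(ds, s_true):
    """NRMSE over sites that have an estimate; None marks a missing estimate."""
    num = 0.0
    den = 0.0
    for d, s in zip(ds, s_true):
        if d is None:
            continue          # no estimate: site is left out of both sums
        num += d * d
        den += s * s          # neutral sites (s = 0) add nothing here
    return math.sqrt(num) / math.sqrt(den)

# Toy system: 2 beneficial, 2 neutral, 2 deleterious sites (made-up values)
s_true = [0.1, 0.1, 0.0, 0.0, -0.1, -0.1]
ds     = [0.03, None, 0.04, None, None, 0.0]
toy_value = nrmse_skip_missing(ds, s_true)
print('toy NRMSE with missing sites: %.2f' % toy_value)
```

This sketch does not guard against the case where no selected site has an estimate, in which the denominator is zero.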

In [3]:
from importlib import reload  # `imp` is deprecated and was removed in Python 3.12
reload(fig)
reload(mp)


# Define a function to retrieve performance from stored data

def get_plot_data(test, df_dict, methods):
    w      = 0.25
    x_ben  = [i for i in range(len(methods))]
    x_del  = [i for i in x_ben]
    x_t    = [i for i in x_ben]
    x_err  = [i for i in x_ben]
    y_ben  = []
    y_del  = []
    y_t    = []
    y_err  = []

    for i in range(len(methods)):
        # Name stored in the `method` column; it differs from the file label for IM and WFABC
        method = {'IM': 'Det', 'WFABC': 'ABC'}.get(methods[i], methods[i])
        df_test = df_dict[methods[i]]
        df_test = df_test[(df_test.ns==100) & (df_test.deltat==10) & (df_test.method==method)]
        y_ben.append(df_test.AUROC_ben)
        y_del.append(df_test.AUROC_del)
        y_t.append(np.log10(df_test.runtime))
        
        dscols = [j for j in df_test.columns if 'ds' in j]
        scols  = [j for j in df_test.columns if ('s' in j) and (j not in dscols+['ns'])]
        y_num = np.sqrt(np.sum(np.array(df_test[dscols]**2),axis=1))
        y_den = np.sqrt(np.sum([NB_VALS[test] * (SB_VALS[test]**2), ND_VALS[test] * (SD_VALS[test]**2)]))
        if method=='LLS':
            y_num = []
            y_den = []
            for j in range(len(df_test)):
                _temp_num = 0
                _temp_den = 0
                for k in range(NB_VALS[test]):
                    if not pd.isnull(df_test.iloc[j]['ds%d' % k]):
                        _temp_num += df_test.iloc[j]['ds%d' % k]**2
                        _temp_den += SB_VALS[test]**2
                for k in range(NB_VALS[test],L_VALS[test]-ND_VALS[test]):
                    if not pd.isnull(df_test.iloc[j]['ds%d' % k]):
                        _temp_num += df_test.iloc[j]['ds%d' % k]**2
                for k in range(L_VALS[test]-ND_VALS[test],L_VALS[test]):
                    if not pd.isnull(df_test.iloc[j]['ds%d' % k]):
                        _temp_num += df_test.iloc[j]['ds%d' % k]**2
                        _temp_den += SD_VALS[test]**2
                y_num.append(np.sqrt(_temp_num))
                y_den.append(np.sqrt(_temp_den))
        y_err.append(np.array(y_num)/np.array(y_den))
        
    return x_ben, x_del, y_ben, y_del, x_t, y_t, x_err, y_err


# Choose the test data sets and get performance data

pdata = dict(test_sets=['medium_complex', 'medium_simple'],
             traj_file=['%s/wfsim_medium_complex_0_T310_ns100_dt10.dat' % (SIM_DIR),
                        '%s/wfsim_medium_simple_0_T1000_ns100_dt10.dat' % (SIM_DIR)],
             t_ticks=[[0,  75, 150, 225,  300],
                      [0, 250, 500, 750, 1000]],
            n_ben=[], n_neu=[], n_del=[], x_ben=[], y_ben=[], x_del=[], y_del=[], 
            x_err=[], y_err=[], x_t=[], y_t=[])

methods = ['MPL', 'FIT', 'LLS', 'CLEAR', 'EandR', 'ApproxWF', 'WFABC', 'IM']

for t in pdata['test_sets']:
    pdata['n_ben'].append(NB_VALS[t])
    pdata['n_neu'].append(L_VALS[t]-NB_VALS[t]-ND_VALS[t])
    pdata['n_del'].append(ND_VALS[t])
    
    df_dict   = {}
    for m in methods:
        if m=='SL':
            df_dict[m] = pd.read_csv('%s/MPL_%s_collected_extended.csv.gz' % (SIM_DIR, t))
        else:
            df_dict[m] = pd.read_csv('%s/%s_%s_collected_extended.csv.gz' % (SIM_DIR, m, t))
    
    x_ben, x_del, y_ben, y_del, x_t, y_t, x_err, y_err = get_plot_data(t, df_dict, methods)
    
    pdata['x_ben'].append(x_ben)
    pdata['y_ben'].append(y_ben)
    pdata['x_del'].append(x_del)
    pdata['y_del'].append(y_del)
    pdata['x_err'].append(x_err)
    pdata['y_err'].append(y_err)
    pdata['x_t'].append(x_t)
    pdata['y_t'].append(y_t)
    
#     # Uncomment to print performance difference plots individually
#     import seaborn as sns
#     for k in range(len(y_ben)):
#         y_ben[k] = np.array(y_ben[k])
#         y_del[k] = np.array(y_del[k])
#         y_err[k] = np.array(y_err[k])
        
#     imp_num = 0
#     imp_den = 0
        
#     print('%s\tAUC beneficial' % t)
#     for k in range(1, len(methods)):
#     #for k in [test_num]:
#         sns.distplot(y_ben[0] - y_ben[k])
#         plot.show()
#         plot.close()
#         print('\t'+str(st.ttest_rel(y_ben[0], y_ben[k])))
#         print('\t'+str(np.sum(y_ben[0]<=y_ben[k])))
#         print('\t'+str(np.min(y_ben[0]-y_ben[k]))+' '+str(np.max(y_ben[0]-y_ben[k])))
#         imp_num += np.sum(y_ben[0]<y_ben[k])
#         imp_den += len(y_ben[0])
        
#     print('%s\tAUC deleterious' % t)
#     for k in range(1, len(methods)):
#     #for k in [test_num]:
#         sns.distplot(y_del[0] - y_del[k])
#         plot.show()
#         plot.close()
#         print('\t'+str(st.ttest_rel(y_del[0], y_del[k])))
#         print('\t'+str(np.sum(y_del[0]<=y_del[k])))
#         print('\t'+str(np.min(y_del[0]-y_del[k]))+' '+str(np.max(y_del[0]-y_del[k])))
#         imp_num += np.sum(y_del[0]<y_del[k])
#         imp_den += len(y_del[0])
        
#     print('%s\tNRMSE' % t)
#     for k in range(1, len(methods)):
#     #for k in [test_num]:
#         if k>1:
#             sns.distplot(y_err[k] - y_err[0])
#             plot.show()
#             plot.close()
#         print('\t'+str(st.ttest_rel(y_err[0], y_err[k])))
#         print('\t'+str(np.sum(y_err[0]>=y_err[k])))
#         print('\t'+str(np.min(y_err[k]-y_err[0]))+' '+str(np.max(y_err[k]-y_err[0])))
#         imp_num += np.sum(y_err[0]>y_err[k])
#         imp_den += len(y_err[0])
        
#     print('')
#     print(imp_num, imp_den, imp_num/imp_den)
    
# Pass information to figure generator

fig.plot_figure_performance(**pdata)
fig.plot_supplementary_figure_performance(**pdata)

MPL performance done.
Performance comparison done.


<a id='fig3'></a>

### Figure 3. Summary of selection on HIV during intrahost evolution.  
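The enrichment values printed by the cell below are ratios: the fraction of the top-ranked mutations that fall in a category, divided by the background fraction of that category among all possible mutations (for example, `TOTAL_NS_EPITOPE / TOTAL_VARS`). The same counts fill the 2x2 table passed to `st.fisher_exact`. A minimal sketch under those definitions follows; the helper names and all counts in the example are illustrative assumptions, not results from the data.

```python
def fold_enrichment(n_hits, n_top, n_category, n_total):
    """Observed fraction in the top set divided by the background fraction."""
    return (n_hits / n_top) / (n_category / n_total)

def contingency_table(n_hits, n_top, n_category, n_total):
    """2x2 table laid out as in the notebook's calls to st.fisher_exact."""
    return [[n_hits,         n_category - n_hits],
            [n_top - n_hits, n_total - n_top - (n_category - n_hits)]]

# Made-up counts: 50 of the top 1,000 ranked mutations belong to a category
# that contains 7,000 of 350,000 possible mutations
fold  = fold_enrichment(50, 1000, 7000, 350000)
table = contingency_table(50, 1000, 7000, 350000)
print('fold enrichment: %.1f' % fold)
print(table)
```

The four entries of the table sum to the total number of possible mutations. Passing such a table to `scipy.stats.fisher_exact` yields an odds ratio and a p-value, which is the form of the tuples printed in the cell output.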

In [4]:
from importlib import reload  # `imp` is deprecated and was removed in Python 3.12
reload(mp)
reload(fig)


# Load in selection data and do the analysis

df = pd.read_csv('%s/analysis/total-selection.csv' % (HIV_DIR), comment='#', memory_map=True)
df = df[(df.ppt!='cap256')]
# df = df[(df.ppt!='cap256') & (df.subtype=='B')]
# df = df[(df.ppt!='cap256') & (df.subtype=='C')]


# Get fraction of top 1% most beneficial mutations that belong in different categories

top          = int(np.round(len(df)*0.01))
s_MPL_sorted = np.argsort(df.s_MPL)[::-1]
s_SL_sorted  = np.argsort(df.s_SL)[::-1]
s_set        = df.iloc[s_MPL_sorted[:top]]

n_poly = []
# Env exposed, non-epitope, nonsynonymous, no effect on glycosylation motifs
n_poly.append(np.sum((s_set.nonsynonymous> 0) & (s_set.in_epitope==False) & (s_set.glycan==0)
                      & (s_set.exposed==True))/top)
# +/- glycosylation motif, non-epitope, nonsynonymous
n_poly.append(np.sum((s_set.nonsynonymous> 0) & (s_set.in_epitope==False) & (s_set.glycan!=0))/top)
# CD8+ T cell epitope, nonsynonymous
n_poly.append(np.sum((s_set.nonsynonymous> 0) & (s_set.in_epitope==True))/top)
# Flanking CD8+ T cell epitope, nonsynonymous, not Env exposed and no effect on glycosylation motifs
n_poly.append(np.sum((s_set.nonsynonymous> 0) & (s_set.in_epitope==False) & (s_set.glycan==0)
                      & (s_set.exposed==False) & (s_set.flanking>0))/top)
# Synonymous, not reversion
n_poly.append(np.sum((s_set.nonsynonymous==0) & (s_set.nucleotide!=s_set.consensus_nucleotide))/top)
# Synonymous, reversion
n_poly.append(np.sum((s_set.nonsynonymous==0) & (s_set.nucleotide==s_set.consensus_nucleotide))/top)
# Nonsynonymous, reversion
n_poly.append(np.sum((s_set.nonsynonymous> 0) & (s_set.in_epitope==False) & (s_set.glycan==0)
                      & (s_set.exposed==False) & (s_set.flanking==0) 
                      & (s_set.nucleotide==s_set.consensus_nucleotide))/top)
# Other nonsynonymous
n_poly.append(1-np.sum(n_poly))


# # Subtype B vs. subtype C
# n_B = np.sum(s_set.subtype=='B')
# print('Number of mutations in subtype B sequences (top 1%%): %d (%.1f%%)' % (n_B, 100*n_B/top))

# n_B = np.sum(df.subtype=='B')
# print('Number of mutations in subtype B sequences (all): %d (%.1f%%)\n' % (n_B, 100*n_B/len(df)))


# # Short trajectories
# n_short = np.sum((s_set.ppt=='705010198') | (s_set.ppt=='700010607') | (s_set.ppt=='700010077') 
#                  | (s_set.tag=='705010185-5') | (s_set.tag=='700010058-3'))
# print('Number of mutations in short trajectories (top 1%%): %d (%.1f%%)' % (n_short, 100*n_short/top))

# n_short = np.sum((df.ppt=='705010198') | (df.ppt=='700010607') | (df.ppt=='700010077') 
#                  | (df.tag=='705010185-5') | (df.tag=='700010058-3'))
# print('Number of mutations in short trajectories (all): %d (%.1f%%)' % (n_short, 100*n_short/len(df)))


# Get enrichment curves

norm  = float(len(df))
top1  = int(np.round(len(df)*0.01))
top2  = int(np.round(len(df)*0.02))
top20 = int(np.round(len(df)*0.20))

x = []
y_CD8_MPL = []
y_CD8_SL  = []
y_rev_MPL = []
y_rev_SL  = []

CD8_bg = TOTAL_NS_EPITOPE / TOTAL_VARS
rev_bg = TOTAL_NS_REV / TOTAL_VARS
CD8_rev_bg = TOTAL_NS_REV_EPI / TOTAL_VARS

for n in range(top1, top20+1):
    s_set_MPL = df.iloc[s_MPL_sorted[:n]]
    s_set_SL  = df.iloc[s_SL_sorted[:n] ]
    x.append(n/norm)
    y_CD8_MPL.append(np.sum((s_set_MPL.nonsynonymous>0) & (s_set_MPL.in_epitope==True))/n)
    y_CD8_SL.append( np.sum((s_set_SL.nonsynonymous >0) & (s_set_SL.in_epitope ==True))/n)
    y_rev_MPL.append(np.sum((s_set_MPL.nonsynonymous>0) & (s_set_MPL.in_epitope==False) 
                            & (s_set_MPL.nucleotide==s_set_MPL.consensus_nucleotide))/n)
    y_rev_SL.append( np.sum((s_set_SL.nonsynonymous >0) & (s_set_SL.in_epitope==False)
                            & (s_set_SL.nucleotide ==s_set_SL.consensus_nucleotide ))/n)
    
y_CD8_MPL = np.array(y_CD8_MPL)/CD8_bg
y_CD8_SL  = np.array(y_CD8_SL )/CD8_bg
y_rev_MPL = np.array(y_rev_MPL)/rev_bg
y_rev_SL  = np.array(y_rev_SL )/rev_bg

s_set_MPL     = df.iloc[s_MPL_sorted[:top1]]
n_CD8_rev_MPL = np.sum((s_set_MPL.nonsynonymous>0) & (s_set_MPL.in_epitope==True) 
                       & (s_set_MPL.nucleotide==s_set_MPL.consensus_nucleotide))
n_CD8_MPL     = np.sum((s_set_MPL.nonsynonymous>0) & (s_set_MPL.in_epitope==True))
n_rev_MPL     = np.sum((s_set_MPL.nonsynonymous>0) & (s_set_MPL.in_epitope==False) 
                       & (s_set_MPL.nucleotide==s_set_MPL.consensus_nucleotide))

print('MPL enrichment in escape mutations that are reversions (top 1%%): %d' 
      % (np.round(n_CD8_rev_MPL/(top1 * CD8_rev_bg))))
print(st.fisher_exact([[       n_CD8_rev_MPL,                       TOTAL_NS_REV_EPI - n_CD8_rev_MPL], 
                       [top1 - n_CD8_rev_MPL, TOTAL_VARS - top1 - (TOTAL_NS_REV_EPI - n_CD8_rev_MPL)]]))
print('')

print('(MPL, SL) enrichment in escape mutations (top 1%%): (%d, %d)' 
      % (np.round(y_CD8_MPL[0]), np.round(y_CD8_SL[0])))
print('\tMPL fraction of SL: %.2f' % (y_CD8_MPL[0]/y_CD8_SL[0]))
print(st.fisher_exact([[       n_CD8_MPL,                       TOTAL_NS_EPITOPE - n_CD8_MPL], 
                       [top1 - n_CD8_MPL, TOTAL_VARS - top1 - (TOTAL_NS_EPITOPE - n_CD8_MPL)]]))
print('')

print('(MPL, SL) enrichment in non-epitope reversions (top 1%%): (%d, %d)' 
      % (np.round(y_rev_MPL[0]), np.round(y_rev_SL[0])))
print('\tMPL fraction of SL: %.2f' % (y_rev_MPL[0]/y_rev_SL[0]))
print(st.fisher_exact([[       n_rev_MPL,                       (TOTAL_NS_REV - TOTAL_NS_REV_EPI) - n_rev_MPL], 
                       [top1 - n_rev_MPL, TOTAL_VARS - top1 - ((TOTAL_NS_REV - TOTAL_NS_REV_EPI) - n_rev_MPL)]]))
print('')


# Pass information to figure generator

pdata = {
    'n_poly':    n_poly,     # fraction of top 1% most beneficial mutations in different categories
    'x_enr':     x,          # percentile values for enrichment analysis
    'y_CD8_MPL': y_CD8_MPL,  # enrichment in putative CD8+ T cell escape mutations, MPL
    'y_CD8_SL':  y_CD8_SL,   # enrichment in putative CD8+ T cell escape mutations, SL
    'y_rev_MPL': y_rev_MPL,  # enrichment in nonsynonymous reversions, MPL
    'y_rev_SL':  y_rev_SL,   # enrichment in nonsynonymous reversions, SL
#     'fig_title': 'hiv-summary-reference'
#     'fig_title': 'hiv-summary-subtype-B',
#     'fig_title': 'hiv-summary-subtype-C',
}

# fig.plot_figure_hiv_summary_alternate(**pdata)
# fig.plot_figure_hiv_summary(**pdata)
fig.plot_figure_hiv_summary_single_column(**pdata)

MPL enrichment in escape mutations that are reversions (top 1%): 370
(467.2555852355512, 1.258626289145048e-25)

(MPL, SL) enrichment in escape mutations (top 1%): (22, 12)
	MPL fraction of SL: 1.85
(39.603087633854706, 3.2962043784223833e-40)

(MPL, SL) enrichment in non-epitope reversions (top 1%): (11, 33)
	MPL fraction of SL: 0.32
(12.61765457290267, 5.3773045246302286e-09)

HIV summary done.


<a id='figs3-figs6'></a>

### Extended Data Figure 5. Distributions of total effects on inferred selection coefficients due to linkage.  
### Extended Data Figure 7. Distributions of linkage effects on inferred selection, based on genomic distance.
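The cell below reads each `*-delta-s.csv` file, drops rows in which the masked variant and the target variant coincide, and sums the absolute `effect` values over all targets for each masked variant. Variants with a sum above 0.4 are printed. A small sketch of that aggregation on hand-written rows follows; the tuple layout and the numbers are assumptions for illustration only and do not reproduce the CSV format.

```python
from collections import defaultdict

# Each row: (mask_index, mask_nuc, target_index, target_nuc, effect); made-up values
rows = [
    (23, 'T', 23, 'T',  0.90),   # self-effect, excluded below
    (23, 'T', 40, 'A', -0.30),
    (23, 'T', 41, 'G',  0.25),
    (57, 'C', 23, 'T',  0.05),
    (57, 'C', 57, 'C',  0.10),   # self-effect, excluded below
]

sum_abs_effect = defaultdict(float)
for mask_i, mask_a, target_i, target_a, effect in rows:
    if (mask_i, mask_a) == (target_i, target_a):
        continue                  # skip the effect of a variant on itself
    sum_abs_effect[(mask_i, mask_a)] += abs(effect)

threshold = 0.4                   # same cutoff as in the notebook cell
large = sorted(k for k, v in sum_abs_effect.items() if v > threshold)
for (i, a) in large:
    print('%d%s\t%.3f' % (i, a, sum_abs_effect[(i, a)]))
```

Only the variant at index 23 passes the cutoff in this toy example; its printed line has the same layout as the notebook output, without the leading tag column.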

In [5]:
from importlib import reload  # `imp` is deprecated and was removed in Python 3.12
reload(mp)
reload(fig)


# Iterate through patients/regions and get |Delta s| values

df_range = pd.read_csv('%s/range.csv' % (HIV_DIR), comment='#', memory_map=True)
ds_patient = []
ds_label = []
ds_region = []
sum_ds_values = []
full_ds_values = []
full_ds_distance = []
large_sum_ds = []

for it, entry in df_range.iterrows():
    tag = str(entry.tag)
    
    df_ds = pd.read_csv('%s/analysis/%s-delta-s.csv' % (HIV_DIR, tag), comment='#', memory_map=True)
    df_ds = df_ds[~((df_ds.mask_polymorphic_index==df_ds.target_polymorphic_index) 
                     & (df_ds.mask_nucleotide==df_ds.target_nucleotide))]
    site_ids = list(np.unique(df_ds.mask_polymorphic_index))
    
    temp_sum_ds_values = []
    for i in site_ids:
        df_site = df_ds[df_ds.mask_polymorphic_index==i]
        var_ids = list(np.unique(df_site.mask_nucleotide))
        for a in var_ids:
            df_var = df_site[df_site.mask_nucleotide==a]
            sum_ds = np.sum(np.fabs(df_var.effect))
            temp_sum_ds_values.append(sum_ds)
            full_ds_values += list(np.fabs(df_var.effect))
            full_ds_distance += list(np.array(df_var.distance))
            
            if sum_ds>0.4:
                large_sum_ds.append([tag, i, a])
                print('%s\t%d%s\t%.3f' % (tag, i, a, sum_ds))
    
    temp_sum_ds_values = np.array(temp_sum_ds_values)
    sum_ds_values.append(temp_sum_ds_values)
    
    ds_patient.append(entry.ppt)
    ds_label.append(PPT2LABEL[entry.ppt])
    ds_region.append(str(entry.tag)[-1])
    
    
# Pass information to figure generator

pdata = {
    'patient_list': ds_patient,
    'label_list':   ds_label,
    'region_list':  ds_region,
    'ds_values':    sum_ds_values,
    'fig_title':    'ed-fig-5-absolute-delta-s'
}

fig.plot_supplementary_figure_absolute_delta_s(**pdata)


# Pass information to figure generator

pdata = {
    'ds_values':   full_ds_values,
    'ds_distance': full_ds_distance,
    'fig_title':   'ed-fig-7-delta-s-icov-distance'
}

# fig.plot_supplementary_figure_delta_s_distance(**pdata)
fig.plot_supplementary_figure_delta_s_icov_distance(**pdata)

706010164-5	23T	0.590
706010164-5	44T	0.614
705010198-3	194G	0.510
705010185-3	225T	0.760
705010162-3	101G	0.761
705010162-3	266G	0.466
704010042-5	29A	0.504
704010042-5	30A	0.549
704010042-3	96G	0.864
704010042-3	874-	0.803
703010256-5	9A	0.874
703010256-5	172T	0.621
703010256-3	62A	0.601
703010256-3	353A	0.684
703010159-5	3G	0.406
703010159-5	34G	0.488
703010159-5	179A	0.436
703010159-3	370T	0.637
703010159-3	371A	0.725
703010159-3	399G	0.509
703010159-3	403G	0.658
703010159-3	424G	0.488
703010131-5	3G	0.582
703010131-3	31C	0.461
703010131-3	32T	0.402
703010131-3	52T	0.465
703010131-3	126T	0.446
703010131-3	263A	0.832
703010131-3	352A	1.072
703010131-3	602A	0.684
703010131-3	603G	0.548
703010131-3	605T	0.888
703010131-3	619G	0.827
703010131-3	620C	1.281
703010131-3	620T	0.422
703010131-3	621G	0.440
703010131-3	624G	0.532
700010607-3	199A	0.443
700010470-3	1T	0.612
700010470-3	10T	0.551
700010470-3	352T	0.517
700010470-3	355G	0.905
700010077-3	35C	0.802
700010077-3	87A	0.494
700010077

<a id='fig4-figs4'></a>

### Figure 4. Visualization of large contributions to inferred selection due to linkage for selected individuals.
### Extended Data Figure 6. Visualization of large contributions to inferred selection due to linkage.  

In [6]:
from importlib import reload  # `imp` is deprecated and was removed in Python 3.12
reload(mp)
reload(fig)


ds_patient = ['700010077', '700010058', '703010131', 
              '700010077', '700010058', '703010131' ]

ds_label = [PPT2LABEL[ppt] for ppt in ds_patient]

ds_region = ['5', '5', '5',
             '3', '3', '3' ]

pdata = {
    'patient_list': ds_patient,
    'label_list':   ds_label,
    'region_list':  ds_region,
    'fig_title':    'fig-4-delta-s-hive',
}

fig.plot_figure_delta_s_hive(**pdata)

700010077-5 ['FYKTLRAEQ', 'KISTESIVI', 'DEPAAVGVG', 'TSTLQEQVGW', 'ASRELERF', 'RMYSPTSIL', 'ISPRTLNAW']
700010058-5 ['TSTLQEQIGW', 'ISPRTLNAW']
703010131-5 ['RKAKIIKDY', 'VKVIEEKAF']
700010077-3 ['AVLNIPTRI', 'DEPAAVGVG', 'QF-RNKTIVF', 'KAALDLSHF', 'DLLKTVRLI', 'DRVIEELQR', 'TLSHVVDKL', 'VAREIHPEF', 'TTVPWNVSW']
700010058-3 ['HTQGYFPDW', 'ERYLRDQQL']
703010131-3 ['SPLSFQTLI', 'FQKKGLGISY', 'RKAKIIKDY', 'EEVGFPVKPQV', 'CPKISFDPI', 'KTACNNCYC', 'VTVYYGVPV']
fig-4-delta-s-hive done.


In [7]:
from importlib import reload  # `imp` is deprecated and was removed in Python 3.12
reload(mp)
reload(fig)


ds_patient = ['706010164', '706010164', '705010198', 
              '705010198', '705010185', '705010185', 
              '705010162', '705010162', '704010042', '704010042', '703010256', 
              '703010256', '703010159', '703010159', '703010131', '703010131', 
              '700010607', '700010470', '700010470', '700010077', 
              '700010077', '700010058', '700010058', '700010040', '700010040', 'cap256']

ds_label = [PPT2LABEL[ppt] for ppt in ds_patient]

ds_region = ['5', '3', '5',
             '3', '5', '3', 
             '5', '3', '5', '3', '5',
             '3', '5', '3', '5', '3',
             '3', '5', '3', '5',
             '3', '5', '3', '5', '3', '3']

pdata = {
    'patient_list': ds_patient,
    'label_list':   ds_label,
    'region_list':  ds_region,
    'fig_title':    'ed-fig-6-delta-s-hive',
}

fig.plot_supplementary_figure_delta_s_hive(**pdata)

706010164-5 ['GQVDCSPGIW', 'KEGHIARNCKA', 'PFRDYVDRF', 'EAMSQANNA']
706010164-3 ['NCYCKMCSY', 'EEIIIRSENL', 'KRQDILDLWVY', 'SFDPIPIHY', 'EEVGFPVRPQV']
705010198-5 ['TSTLQEQVAW']
705010198-3 ['KAAFDLSFF']
705010185-5 ['TPQDLNTML', 'GTEELRSLY']
705010185-3 []
705010162-5 ['GKEGHIAKN', 'VSRGIRKVL', 'EEMNLTGKW', 'VHKGIKVKD']
705010162-3 ['RIRKTAPTA', 'KPQVPLRPM', 'LQAVRIIKI', 'EILDLWVYH', 'HYFDCFAGS']
704010042-5 ['NETPGIRYQ', 'QMVHQALSP']
704010042-3 ['KQRVHALFY', 'REILDLWVY', 'DETLLQAVR', 'NYTDIIYRL']
703010256-5 ['NRETKMGKA', 'PIQLPEKDS', 'QMVHQPLSPR', 'DRFFKTLRA']
703010256-3 ['RNRSIRLVN', 'EDRWNKPQK', 'TAVPWDSSW', 'QLAHRHMAR', 'LVQDWGLEL', 'EPIDPNLEPW']
703010159-5 ['TPQDLNTML', 'NWMTDTLLI']
703010159-3 ['REVLIWKFD', 'PTEPVPFQL', 'FPRPWLHNL']
703010131-5 ['RKAKIIKDY', 'VKVIEEKAF']
703010131-3 ['SPLSFQTLI', 'FQKKGLGISY', 'RKAKIIKDY', 'EEVGFPVKPQV', 'CPKISFDPI', 'KTACNNCYC', 'VTVYYGVPV']
700010607-3 ['KRREILDLWVY']
700010470-5 ['QIYPGIKVK', 'GGKKKYQLK', 'ELYPMTSLK', 'RGRQKVVSL', 'DIKDTK

<a id='figs5'></a>

### Supplementary Figure 1. Distribution of the maximum observed change in frequency per day for highly influential variants versus other variants.
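For every variant, the cell below scans consecutive sampling times and records the largest absolute change in frequency divided by the time elapsed between the two samples. A minimal sketch of that scan follows; `max_dx_per_day` and the toy trajectory are hypothetical, and the times are assumed to be sorted and measured in days.

```python
def max_dx_per_day(times, freqs):
    """Largest |change in frequency| per unit time between consecutive samples."""
    dx_max = 0.0
    for k in range(len(times) - 1):
        dx = abs(freqs[k + 1] - freqs[k]) / (times[k + 1] - times[k])
        dx_max = max(dx_max, dx)
    return dx_max

# Toy trajectory (made-up values): sampling days and variant frequencies
toy_times = [0, 10, 30, 60]
toy_freqs = [0.0, 0.5, 0.75, 0.75]
toy_dx = max_dx_per_day(toy_times, toy_freqs)
print('max change in frequency per day: %.3f' % toy_dx)
```

In the notebook this value is sorted into one of two lists depending on whether the variant's summed absolute linkage effect exceeds 0.4.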

In [8]:
from importlib import reload  # `imp` is deprecated and was removed in Python 3.12
reload(mp)
reload(fig)


df_range     = pd.read_csv('%s/range.csv' % (HIV_DIR), comment='#', memory_map=True)
dx_max_sweep = []
dx_max_other = []
in_epitope_sweep = []
in_epitope_other = []
xx = []
xy = []

for it, entry in df_range.iterrows():
    tag = str(entry.tag)
    
    df_tr = pd.read_csv('%s/analysis/%s-poly.csv' % (HIV_DIR, tag), comment='#', memory_map=True)
    times = np.sort(np.array([c.split('_')[-1] for c in df_tr.columns if 'f_at' in c], int))
    
    df_ds = pd.read_csv('%s/analysis/%s-delta-s.csv' % (HIV_DIR, tag), comment='#', memory_map=True)
    df_ds = df_ds[~((df_ds.mask_polymorphic_index==df_ds.target_polymorphic_index) 
                     & (df_ds.mask_nucleotide==df_ds.target_nucleotide))]
    site_ids = np.unique(df_ds.mask_polymorphic_index)

    for i in site_ids:
        df_site = df_ds[df_ds.mask_polymorphic_index==i]
        var_ids = np.unique(df_site.mask_nucleotide)
        for a in var_ids:
            df_var = df_site[df_site.mask_nucleotide==a]
            sum_ds = np.sum(np.fabs(df_var.effect))
            dx_max = 0
            
            df_tr_var = df_tr[(df_tr.polymorphic_index==i) & (df_tr.nucleotide==a)].iloc[0]
            for k in range(len(times)-1):
                dx = np.fabs(df_tr_var['f_at_%d'%times[k+1]] - df_tr_var['f_at_%d'%times[k]]) / (times[k+1]-times[k])
                if dx>dx_max:
                    dx_max = dx
            
            if sum_ds>0.4:
                dx_max_sweep.append(dx_max)
                if 'cap256' not in tag:
                    if pd.notnull(df_tr_var.epitope):
                        in_epitope_sweep.append(1)
                    else:
                        in_epitope_sweep.append(0)
                    
                print('%s\t%d%s\t%.3f\t%s' % (tag, i, a, dx_max, 'EPITOPE' if pd.notnull(df_tr_var.epitope) else ''))
                
            else:
                dx_max_other.append(dx_max)
                if 'cap256' not in tag:
                    if pd.notnull(df_tr_var.epitope):
                        in_epitope_other.append(1)
                    else:
                        in_epitope_other.append(0)

print('Highly influential variants in epitopes: %.1f%%' % (100 * np.sum(in_epitope_sweep)/len(in_epitope_sweep)))

pdata = {
    'dx_sweep':  dx_max_sweep,
    'dx_other':  dx_max_other,
    'fig_title': 'sup-fig-1-max-dx',
}

fig.plot_supplementary_figure_max_dx(**pdata)

706010164-5	23T	0.008	
706010164-5	44T	0.010	EPITOPE
705010198-3	194G	0.055	EPITOPE
705010185-3	225T	0.015	
705010162-3	101G	0.039	
705010162-3	266G	0.022	
704010042-5	29A	0.013	EPITOPE
704010042-5	30A	0.026	EPITOPE
704010042-3	96G	0.037	EPITOPE
704010042-3	874-	0.037	
703010256-5	9A	0.006	
703010256-5	172T	0.004	EPITOPE
703010256-3	62A	0.018	
703010256-3	353A	0.006	
703010159-5	3G	0.005	
703010159-5	34G	0.028	
703010159-5	179A	0.028	
703010159-3	370T	0.020	EPITOPE
703010159-3	371A	0.011	EPITOPE
703010159-3	399G	0.050	
703010159-3	403G	0.029	
703010159-3	424G	0.020	EPITOPE
703010131-5	3G	0.008	
703010131-3	31C	0.006	
703010131-3	32T	0.097	
703010131-3	52T	0.009	
703010131-3	126T	0.012	
703010131-3	263A	0.014	
703010131-3	352A	0.011	
703010131-3	602A	0.069	
703010131-3	603G	0.069	
703010131-3	605T	0.014	
703010131-3	619G	0.049	EPITOPE
703010131-3	620C	0.028	EPITOPE
703010131-3	620T	0.003	EPITOPE
703010131-3	621G	0.021	EPITOPE
703010131-3	624G	0.021	EPITOPE
700010607-3	199A	0.026	EPITOPE
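
The inner loop in the cell above finds, for each variant, the largest absolute change in frequency per unit time between consecutive sampling times. A minimal vectorized sketch of the same quantity follows. The function name `max_rate_of_change` and the example trajectory are hypothetical, and the sketch assumes strictly increasing times and no missing frequencies.

```python
import numpy as np

def max_rate_of_change(times, freqs):
    """Largest |x(t_{k+1}) - x(t_k)| / (t_{k+1} - t_k) over consecutive samples."""
    times = np.asarray(times, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if len(times) < 2:
        return 0.0  # mirrors the dx_max = 0 starting value in the loop above
    rates = np.abs(np.diff(freqs)) / np.diff(times)
    return float(rates.max())

# Hypothetical trajectory: the fastest change occurs between t=10 and t=30
example_times = [0, 10, 30, 60]
example_freqs = [0.0, 0.1, 0.7, 0.7]
print('%.3f' % max_rate_of_change(example_times, example_freqs))
```

Unlike the explicit loop, `rates.max()` propagates NaN values, so trajectories with missing time points would need to be filtered first.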

<a id='fig5'></a>

### Figure 5. Escape from the KF9 epitope targeted by individual CH77.

In [9]:
from importlib import reload
reload(mp)
reload(fig)


# Pass information to figure generator

pdata = {
    'patient':       '700010077',                    # patient
    'region':        '3',                            # sequencing region in which the epitope is located
    'inf_idxs':      [[35,'C'],[87,'A'],[124,'G'],   # locations of highly influential variants (polymorphic index)
                      [149,'A']],                    # note: 186C = 9040C, so it is already included by default
    'epitope':       'KAALDLSHF',                    # epitope sequence
    'epitope_range': [[        6000,         6027],  #   rev: DLLKTVRLI  # CH077 epitopes
                      [6225+(340*3), 6225+(349*3)],  # gp120: TLSHVVDKL
                      [6225+(351*3), 6225+(361*3)],  # gp120: QF-RNKTIVF
                      [7758+ (94*3), 7758+(103*3)],  #  gp41: TTVPWNVSW
                      [7758+(315*3), 7758+(324*3)],  #  gp41: DRVIEELQR
                      [7758+(327*3), 7758+(336*3)],  #  gp41: AVLNIPTRI
                      [8797+ (25*3), 8797+ (31*3)],  #   nef: DEPAAVGVG (start is earlier but inserted wrt HXB2)
                      [8797+ (81*3), 8797+ (90*3)],  #   nef: KAALDLSHF
                      [8797+(193*3), 8797+(202*3)]], #   nef: VAREIHPEF
    'epitope_label': ['DI9', 'TL9', 'QF9', 'TW9',    # epitope labels
                      'DR9', 'AI9', 'DG9', 'KF9',
                      'VF9'],
    'cov_label':     'KF9',                          # plot covariance links for sites in this epitope
    'label2ddr':     {'tat exon 2': 0.09,            # label adjustment for plotting
                      'rev exon 2': 0.13,
                      'QF9':        0.005,
                      'AI9':        0.09,
                      'TL9':        0.09,
                      'DI9':        0.13,
                      'DR9':        0.15  },
    'legend_loc':    'bottom',                       # location of the circle legend
    'traj_ticks':    [0, 80, 160],                   # tick marks for trajectory plot
    'sel_ticks':     [0, 0.05, 0.10],                # tick marks for selection plot
    'sel_minors':    [0.025, 0.075],                 # minor tick marks for selection plot
    'sel_space':     0.02,                           # spacing between MPL and independent selection plots
    'fig_title':     'fig-5-ch77-kf9'                # title of figure
}

fig.plot_figure_ch77_kf9(**pdata)

0.0000	-0.0005	0.0048	0.0007	0.0020	0.0158	-0.0107	0.0022	-0.0053
-0.0023	0.0000	0.0036	0.0002	0.0008	0.0077	-0.0062	0.0011	0.0026
0.0209	0.0035	0.0000	0.0002	0.0007	0.0011	-0.0027	-0.0066	0.0000
0.0064	0.0005	0.0005	0.0000	0.0027	-0.0006	-0.0018	0.0003	0.0010
0.0098	0.0009	0.0008	0.0014	0.0000	-0.0021	-0.0032	0.0006	0.0008
0.02088378
GACCTGCTCAAGACAGTAAGATTAATC
ACTTTAAGCCATGTAGTTGACAAATTA
CAATTTAGG---AATAAAACAATAGTCTTT
ACTACTGTGCCTTGGAATGTTAGTTGG
GATAGGGTTATAGAAGAATTACAAAGG
GCTGTTCTTAACATACCTACAAGAATA
GATGAGCCAGCAGCAGTGGGGGTGGGA
AAGGCAGCTCTTGATCTTAGCCACTTT
GTAGCCCGAGAAATACATCCGGAGTTT
fig-5-ch77-kf9 done.
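
The `epitope_range` entries above are written as a gene start coordinate plus three times a codon offset. A small sketch of that arithmetic for the nef KF9 epitope follows. The helper name `codon_range` is hypothetical, the start value 8797 is copied from the cell above, and the end coordinate is assumed to be exclusive.

```python
def codon_range(gene_start, first_codon, last_codon):
    """Nucleotide [start, end) for codon offsets first_codon..last_codon (end exclusive)."""
    return [gene_start + 3 * first_codon, gene_start + 3 * last_codon]

NEF_START = 8797  # start coordinate used for nef in the cell above

start, end = codon_range(NEF_START, 81, 90)
print(start, end, end - start)  # 9040 9067 27

# The span covers three nucleotides per residue of the 9-mer KAALDLSHF
assert end - start == 3 * len('KAALDLSHF')
```

The start coordinate, 9040, is the same position referred to in the `inf_idxs` comment (186C = 9040C).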


<a id='figs7-figs8'></a>

### Extended Data Figure 8. A simple example of clonal interference (CH58 epitope TW10).  
### Extended Data Figure 9. A complex example of clonal interference (CH131 epitope EV11).

In [10]:
from importlib import reload
reload(mp)
reload(fig)


# Pass information to figure generator

pdata = {
    'patient':       '700010058',
    'region':        '5',
    'epitope':       'TSTLQEQIGW',
    'epitope_range': [[1186+ (14*3), 1186+ (23*3)],  #   p24: ISPRTLNAW  # CH058 epitopes
                      [1186+(107*3), 1186+(117*3)]], #   p24: TSTLQEQIGW
    'epitope_label': ['IW9', 'TW10'],
    'cov_label':     'TW10',
    'label2ddr':     {},
    'legend_loc':    'top', 
    'traj_ticks':    [0, 70, 140, 210, 280, 350],
    'sel_ticks':     [0, 0.06, 0.12],
    'sel_minors':    [0.03, 0.09],
    'sel_space':     0.06,  
    'fig_title':     'ed-fig-8-ch58-tw10'
}
fig.plot_supplementary_figure_epitope(**pdata)

print('')

pdata = {
    'patient':       '703010131',
    'region':        '3',
    'epitope':       'EEVGFPVKPQV',
    'epitope_range': [[5885, 5911+1],  #   tat: KTACNNCYC  # CH131 epitopes
                      [5942, 5971+1],  #   tat: FQKKGLGISY
                      [6330, 6356+1],  # gp120: VTVYYGVPV
                      [6837, 6863+1],  # gp120: CPKISFDPI
                      [8361, 8387+1],  #  gp41: SPLSFQTLI
                      [8989, 9018+1]], #   nef: EEVGFPVKPQV
    'epitope_label': ['KC9', 'FY10', 'VV9', 'CI9', 'SI9', 'EV11'],
    'cov_label':     'EV11',
    'label2ddr':     {'KC9':        0.065,
                      'rev exon 1': 0.225,
                      'FY10':       0.11,
                      'CI9':        0.005,
                      'tat exon 2': 0.10,            
                      'rev exon 2': 0.135,
                      'SI9':        0.15,
                      'EV11':       0.12  },
    'legend_loc':    'top',
    'traj_ticks':    [0, 70, 140, 210, 280, 350],
    'sel_ticks':     [0, 0.05, 0.10],
    'sel_minors':    [0.025, 0.075],
    'sel_space':     0.02, 
    'fig_title':     'ed-fig-9-ch131-ev11'
}
fig.plot_supplementary_figure_epitope(**pdata)

ATATCGCCTAGAACTTTAAATGCATGG
ACTAGTACCCTTCAGGAACAAATAGGATGG
ed-fig-8-ch58-tw10 done.

AAGACTGCTTGTAATAATTGTTATTGT
TTTCAGAAAAAAGGCTTAGGCATTTCCTAT
GTCACAGTCTATTATGGGGTACCTGTA
TGTCCAAAGATCTCGTTTGATCCGATT
TCACCTTTATCGTTCCAGACCCTTATC
GAGGTAGGCTTTCCAGTCAAACCCCAGGTG
ed-fig-9-ch131-ev11 done.


<a id='fig6-figs9'></a>

### Figure 6. HIV evolution in individual CAP256.  
### Supplementary Figure 2. Recombination between primary and superinfecting strains in individual CAP256.

In [11]:
from importlib import reload
reload(fig)
reload(mp)


# Pass information to figure generator

pdata = {
    'patient': 'cap256',     # patient
    'region':  '3',          # sequencing region in which the epitope is located
}

df = pd.read_csv('%s/analysis/total-selection.csv' % (HIV_DIR), comment='#', memory_map=True)
df = df[df.ppt!='cap256']
print('Percentile of the most strongly selected VRC26 resistance variant: %.1f%%\n' 
      % (100*np.sum(df.s_MPL>0.012)/float(len(df))))

fig.plot_figure_cap256_vrc26(**pdata)
fig.plot_supplementary_figure_cap256_recombination(**pdata)

Percentile of the most strongly selected VRC26 resistance variant: 8.1%

synonymous:	0 (MPL)	6 (SL)	 of 11 total
variant	s_MPL
6709C	0.041
6717T	-0.012
6730C	0.003
6730G	0.006
6730T	0.010
6731T	0.003

AATACAATCACAGAGGTAAGAGATAAGCAAAAGAA
CAP256-VRC26 done.


CAP256 recombination done.
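
The percentile printed above is the share of selection coefficients from the other individuals that exceed a fixed threshold. A minimal sketch of that calculation on made-up numbers follows. The function name `percent_above` and the example values are hypothetical and are not taken from `total-selection.csv`.

```python
import numpy as np

def percent_above(values, threshold):
    """Percentage of entries in values that are strictly greater than threshold."""
    values = np.asarray(values, dtype=float)
    return 100.0 * np.mean(values > threshold)

# Hypothetical selection coefficients: two of the five exceed 0.012
example_s = [-0.01, 0.0, 0.005, 0.02, 0.05]
print('%.1f%%' % percent_above(example_s, 0.012))  # 40.0%
```

A small percentage means that few variants in the comparison set are more strongly selected than the threshold value.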


<a id='figs11'></a>

### Extended Data Figure 10. Comparison between selection coefficients inferred using different conditions for data processing.

In [12]:
from importlib import reload
reload(fig)
reload(mp)


# Pass information to figure generator

pdata = {
    'columns': ['reference',         # columns in the .csv file specifying different data processing conventions
                'max-dt-200',
                'max-dt-400',
                'max-gap-freq-80',
                'max-gap-freq-99',
                'max-gap-num-50',
                'max-gap-num-500',
                'min-seqs-2',
                'min-seqs-6',
                'no-ambiguous'], 
    'labels':  ['Reference',         # corresponding labels for plotting
                'Max\n' + r'$\Delta t$' + ' = 200',
                'Max\n' + r'$\Delta t$' + ' = 400',
                'Max gap\nfreq. = 80%',
                'Max gap\nfreq. = 99%',
                'Max gap\nnum. = 50',
                'Max gap\nnum. = 500',
                'Min\nseqs. = 2',
                'Min\nseqs. = 6',
                'Remove\nambiguous'],      
}

fig.plot_supplementary_figure_s_conditions(**pdata)

choice 1		choice 2		R	R^2	p
reference		max-dt-200		0.99	0.98	0.00e+00
reference		max-dt-400		0.99	0.99	0.00e+00
reference		max-gap-freq-80		1.00	1.00	0.00e+00
reference		max-gap-freq-99		1.00	1.00	0.00e+00
reference		max-gap-num-50		1.00	0.99	0.00e+00
reference		max-gap-num-500		1.00	0.99	0.00e+00
reference		min-seqs-2		1.00	0.99	0.00e+00
reference		min-seqs-6		0.99	0.98	0.00e+00
reference		no-ambiguous		0.99	0.99	0.00e+00
max-dt-200		max-dt-400		0.99	0.97	0.00e+00
max-dt-200		max-gap-freq-80		0.99	0.98	0.00e+00
max-dt-200		max-gap-freq-99		0.99	0.98	0.00e+00
max-dt-200		max-gap-num-50		0.99	0.98	0.00e+00
max-dt-200		max-gap-num-500		0.99	0.98	0.00e+00
max-dt-200		min-seqs-2		0.99	0.98	0.00e+00
max-dt-200		min-seqs-6		0.98	0.97	0.00e+00
max-dt-200		no-ambiguous		0.99	0.97	0.00e+00
max-dt-400		max-gap-freq-80		0.99	0.99	0.00e+00
max-dt-400		max-gap-freq-99		0.99	0.99	0.00e+00
max-dt-400		max-gap-num-50		0.99	0.98	0.00e+00
max-dt-400		max-gap-num-500		0.99	0.98	0.00e+00
max-dt-400		min-s