# <font color = red> HIV Analysis
</font> Here we combine HIV sequence data from [the Los Alamos National Laboratory HIV Sequence Database](https://www.hiv.lanl.gov/content/index) and immunological data to investigate HIV evolution across 13 individuals. This data is contained in the data/HIV/ directory.

## Contents
- ### [Libraries and variables](#lib)
- ### Data analyze
    - #### [Escape information](#escape)
    - #### [Write shell script](#mpl)
    - #### [Deal with data](#deal)
    - #### [Calculate Δs_ij](#sij)

### <a id='lib'></a> Libraries and variables

In [5]:
print('This notebook was prepared using:')

import os
import sys
print('python version %s' % sys.version)

import numpy as np
print('numpy version %s' % np.__version__)

import pandas as pd
print('pandas version %s' % pd.__version__)

import math## <a id='lib'></a> Libraries and variables
from math import isnan

import matplotlib
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.image as mpimg
print('matplotlib version %s' % matplotlib.__version__)

import re
import sys
import argparse
import scipy as sp
try:
    import itertools.izip as zip
except ImportError:
    import itertools
import random

from scipy import integrate
import scipy.interpolate as sp_interpolate
import statistics

from dataclasses import dataclass
import time as time_module

# GitHub
# HIV_DIR = 'data/HIV200'
HIV_DIR = 'data/HIV'
SIM_DIR = 'data/simulation'
FIG_DIR = 'figures'
MPL_DIR = 'src'

NUC = ['-', 'A', 'C', 'G', 'T']


This notebook was prepared using:
python version 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:26:08) [Clang 14.0.6 ]
numpy version 1.24.2
pandas version 1.5.3
matplotlib version 3.7.1


#### <a id='escape'></a> Escape information

1. all epitopes just had one or two escape mutations (700010077-5,700010058-5,700010040-5,706010164-5,704010042-5)
2. all epitopes have more than two escape mutations (700010077-3,700010058-3,700010607-3,705010198-3,705010185-5,703010159-3) 
3. combination of the above (700010470-3 ,700010470-5 ,700010040-3,706010164-3, 705010162-3, 705010162-5, 704010042-3, 703010256-3, 703010256-5, 703010131-3, 703010131-5)
4. no mutation related to escape (700010607-5,705010198-5,705010185-3,703010159-5)

tag|sequence<br>length|escape<br>group|variants|max<br>time|time<br>step|gamma<br>s|raw<br>time|special<br>sites
:----:|:----:|:---:|:-----:|:----|:----|:----:|:----:|:----:
700010040-3|303|3|314|552|20|0.018|[0, 16, 45, 111, 181, 283, 412, 552]|[245, 246]
700010040-5|146|0|146|552|5 |0.018|[0, 16, 45, 111, 181, 283, 412, 552]|[31, 32, 39, 104]
700010058-3|90 |1|91 |85 |1 |0.118|[0, 8, 45, 85]|[]
700010058-5|96 |0|96 |350|1 |0.029|[0, 8, 45, 85, 154, 239, 252, 350]|[13, 16, 17]
700010077-3|203|6|221|159|5 |0.063|[0, 14, 32, 102, 159]|[]
700010077-5|48 |0|49 |159|1 |0.063|[0, 14, 32, 159]|[1, 4, 5, 47]
700010470-3|367|3|405|420|20|0.024|[0, 13, 41, 69, 174, 420]|[40, 334]
700010470-5|193|3|198|454|5 |0.022|[0, 13, 41, 69, 174, 420, 454]|[8, 9, 124]
700010607-3|239|1|252|21 |1 |0.476|[0, 9, 14, 21]|[]
700010607-5|78 |0|78 |21 |1 |     |[0, 9, 14, 21]|no mutation related to epitope	
703010131-3|744|1|807|333|20|0.030|[0, 21, 28, 34, 63, 91, 175, 273, 333]|[0, 45, 46, 47, 76, 189, 560, 561]
703010131-5|261|1|265|333|5 |0.030|[0, 21, 28, 34, 63, 91, 175, 273, 333]|[259]
703010159-3|477|2|517|302|20|0.033|[0, 8, 12, 22, 29, 56, 85, 302]|[]
703010159-5|216|0|224|302|5 |     |[0, 8, 12, 22, 29, 56, 85, 302]|no mutation related to epitope	
703010256-3|463|4|510|684|20|0.015|[0, 28, 63, 172, 426, 684]|[45, 46, 316, 317]
703010256-5|402|1|410|684|20|0.015|[0, 28, 63, 172, 426, 684]|[42, 208]
704010042-3|875|2|1074|676|20|0.015|[0, 21, 60, 172, 424, 676]|[95, 96, 765, 766]
704010042-5|266|0|272|676|5|0.015|[0, 21, 60, 172, 424, 676]|[29, 30, 125]
705010162-3|508|2|562|438|20|0.023|[0, 21, 77, 179, 438]|[33, 34, 327, 328]
705010162-5|254|3|259|438|5 |0.023|[0, 21, 77, 179, 438]|[155]
705010185-3|292|0|303|416|5 |     |[0, 25, 67, 180, 416]|no mutation related to epitope	
705010185-5|85 |1|89 |67 |1 |0.149|[0, 25, 67]|[]
705010198-3|204|1|223|60 |1 |0.167|[0, 11, 60]|[]
705010198-5|72 |0|73 |60 |1 |     |[0, 11, 60]|no mutation related to epitope	
706010164-3|485|3|532|434|20|0.023|[0, 14, 28, 70, 183, 434]|[463]
706010164-5|204|0|205|434|5 |0.023|[0, 14, 28, 70, 183, 434]|[44, 50, 51, 170]

Write sh file for HIV data

In [3]:
infer_tags = ['700010040-3','700010040-5','700010058-3','700010058-5','700010077-3','700010077-5',
              '700010470-3','700010470-5','700010607-3','703010131-3','703010131-5','703010159-3',
              '703010256-3','703010256-5','704010042-5','705010162-3','705010162-5',
              '705010185-5','705010198-3','706010164-3','706010164-5'] #

f = open("%s/HIV_run.sh"%MPL_DIR,'w')
f.write('#!/bin/bash\n')
f.write('cd ..\n')

# f.write('tags=(')
# for tag in infer_tags:
#     f.write('"%s" '%tag)
# f.write(')\n')
# f.write('for tag in "${tags[@]}"\n')
# f.write('do\n')
# f.write('   python ../inference_HIV.py -tag %s --raw & \n'%(tag))
# f.write('done\n')
# f.write('wait\n')
# f.write('echo "All scripts have been attempted."')
# f.close()  

for i in range(len(infer_tags)):
    tag = infer_tags[i]
    f.write('\tpython inference_HIV.py -tag %s --raw'%(tag))
    f.write(' || echo "CH%s failed, continuing with next script..." \n'%tag[-5:])
f.write('echo "All scripts have been attempted."')
f.close()  

python inference_HIV.py -tag '704010042-3' --raw
python inference_HIV.py -tag '700010077-5'

In [2]:
infer_tags = ['700010040-3','700010040-5','700010058-3','700010058-5','700010077-3','700010077-5',
              '700010470-3','700010470-5','700010607-3','703010131-3','703010131-5','703010159-3',
              '703010256-3','703010256-5','704010042-5','705010162-3','705010162-5',
              '705010185-5','705010198-3','706010164-3','706010164-5'] #'704010042-3'

f = open("%s/HIV_run_nog.sh"%MPL_DIR,'w')
f.write('#!/bin/bash\n')
f.write('cd ..\n')

for i in range(len(infer_tags)):
    tag = infer_tags[i]
    f.write('\tpython inference_HIV.py -tag %s -g1 0'%(tag))
    f.write(' || echo "CH%s failed, continuing with next script..." \n'%tag[-5:])
f.write('echo "All scripts have been attempted."')
f.close()  

### See the influence of the length of extended time
Use gamma = 0, and s(boundary) = 0 instead of s'(boundary) = 0

In [62]:
try_tags = ['700010040-3', '700010058-3','700010077-3','703010131-5','705010185-5']

theta_range = [0.5, 2, 5, 10]

f = open("%s/HIV_nog_try.sh"%MPL_DIR,'w')
f.write('#!/bin/bash\n')
f.write('cd ..\n')
for i in range(len(try_tags)):
    tag = try_tags[i]
    for theta in theta_range:
        f.write('python inference_HIV.py -tag %s -g1 0 -theta %s'%(tag, theta))
        f.write(' || echo "CH%s failed, continuing with next script..." \n'%tag[-5:])
f.write('echo "All scripts have been attempted."')
f.close()  

Dealing with mpl results (Normalize the selection coefficients such that the TF nucleotide at each site has a selection coefficient of zero) and record information in analyze files.

- /analysis/'tag'-analyze.csv
    - coefficients in 2 cases (old: constant case; new: time-varying case)
    - standard deviation for the selection coefficient
    - allele frequencies over time
- /epitopes/'tag'-trait.csv
    - individual information 
    - trait frequencies over time
    - escape coefficients over time

Use a very large $\gamma^{\prime}$ to get a flat selection coefficients. Use the average value of it. 

In [3]:
import epitope
import importlib
importlib.reload(epitope)

infer_tags = ['700010040-3','700010040-5','700010058-3','700010058-5','700010077-3','700010077-5',
          '700010470-3','700010470-5','700010607-3','703010131-3','703010131-5','703010159-3',
          '703010256-3','703010256-5','704010042-5','705010162-3','705010162-5',
          '705010185-5','705010198-3','706010164-3','706010164-5'] #'704010042-3'

HIV_DIR = 'data/HIV'
output  = 'output_nog'

# for tag in infer_tags:
#     epitope.analyze_result(HIV_DIR,output,tag)


In [7]:
tag = '700010040-3'
df_epi    = pd.read_csv('%s/epitopes/escape_group-%s.csv' %(HIV_DIR,tag), comment='#', memory_map=True)
cols_time = [i for i in list(df_epi) if 'f_at_' in i]
cols_xp   = [i for i in list(df_epi) if 'xp_at_' in i]
times     = [int(cols_time[i].split('_')[-1]) for i in range(len(cols_time))]

df_tv     = pd.read_csv('%s/%s/%s-tv.csv' %(HIV_DIR,output,tag), comment='#', memory_map=True)
cols      = [i for i in list(df_tv) if 'f_at_' not in i]

# get selection coefficient
data_pro     = np.load('%s/rawdata/rawdata_%s.npz'%(HIV_DIR,tag), allow_pickle="True")
time_step    = data_pro['time_step']

data_tc      = np.load('%s/%s/c_%s_%d.npz'%(HIV_DIR,output,tag,time_step), allow_pickle="True")
sc_tv_all    = data_tc['selection']# time range:times
muVec        = data_pro['muVec']

g = open('%s/%s/escape_group-%s.csv'%(HIV_DIR,output,tag),'w')
g.write('%s,' % (','.join(cols)))
g.write('%s' % (','.join(cols_time)))
for t in range(len(times)):
    g.write(',sc_at_%s'%times[t])
g.write('\n')

epitopes = df_epi['epitope'].unique()
for n in range(len(epitopes)):
    df_epi_n = df_epi[df_epi['epitope'] == epitopes[n]]
    for i in range(len(df_epi_n)):
        site_index = df_epi_n.iloc[i].polymorphic_index
        HXB2_index = df_epi_n.iloc[i].HXB2_index
        nucleotide = df_epi_n.iloc[i].nucleotide
        TF         = df_epi_n.iloc[i].TF
        consensus  = df_epi_n.iloc[i].consensus
        epitope    = df_epi_n.iloc[i].epitope
        escape     = df_epi_n.iloc[i].escape
        sc_c       = df_epi_n.iloc[i].sc_MPL
        
        index_mu   = muVec[site_index,NUC.index(nucleotide)]
        index_TF   = muVec[site_index,NUC.index(TF)]

        sc_tv      = sc_tv_all[int(index_mu)] - sc_tv_all[int(index_TF)]
        sc_mean    = np.average(sc_tv)
        sc_sigma   = statistics.stdev(sc_tv)
                
        g.write('%d,%s,%s,%s,' % (site_index, HXB2_index, nucleotide, TF))
        g.write('%s,%s,%s,%f,%s,%s,' % (consensus, epitope, escape, sc_c, sc_mean, sc_sigma))
        g.write('%s' % (','.join([str(df_epi_n.iloc[i][c]) for c in cols_time])))        
        for t in range(len(times)):
            g.write(',%f'%sc_tv[t])
        g.write('\n')
    
    sc_tv_n  = sc_tv_all[-(len(epitopes)-n)]
    sc_c = df_epi_n.iloc[0].tc_MPL
    g.write(' , , , , %s, , , %s, , ,'%(epitope,sc_c))
    g.write('%s' % (','.join([str(df_epi_n.iloc[0][c]) for c in cols_xp])))
    for t in range(len(times)):
        g.write(',%f'%sc_tv_n[t])
    g.write('\n')
    

### Write sh file for HIV data with shorter time - 200

In [50]:
import epitope
import importlib
importlib.reload(epitope)
    
min_n = 2
tags = ['700010040-3', '700010040-5', '700010058-3', '700010058-5', '700010077-3', '700010077-5',
        '700010470-3', '700010470-5', '700010607-3', '700010607-5', '703010131-3', '703010131-5', 
        '703010159-3', '703010159-5', '703010256-3', '703010256-5', '704010042-3', '704010042-5', 
        '705010162-3', '705010162-5', '705010185-3', '705010185-5', '705010198-3', '705010198-5', 
        '706010164-3', '706010164-5']  

HIV_DIR_200 = 'data/HIV200'
for tag in tags:
    epitope.find_trait_site(tag,min_n,HIV_DIR_200)   

	mutant at site 0 in codon for CH131-3 that does not terminate in alignment


In [51]:
for tag in tags:
    traitsite = epitope.read_file(HIV_DIR_200,'traitsite/traitsite-'+tag+'.dat')
    df = pd.read_csv('%s/interim/%s-escape.csv' %(HIV_DIR_200,tag), comment='#', memory_map=True)
    for i in range(len(traitsite)):
        for j in range(len(traitsite[i])):
            n_ij   = df[df['polymorphic_index'] == traitsite[i][j]]
            for a in range(len(n_ij)):
                if n_ij.iloc[a].escape == False and n_ij.iloc[a].nucleotide != n_ij.iloc[a].TF:
                    print(f'CH{tag[-5:]} {n_ij.iloc[a].polymorphic_index} {n_ij.iloc[a].nucleotide}')

CH077-3 35 C
CH077-3 161 A
CH162-3 194 T
CH162-3 198 T


In [52]:
special_tags = ['700010077-3', '705010162-3']
NUC = ['-', 'A', 'C', 'G', 'T']

# case 1: CH077-3 35C and 161A
tag = '700010077-3'
traitseq  = epitope.read_file(HIV_DIR_200,'traitseq/traitseq-'+tag+'.dat')
traitsite = epitope.read_file(HIV_DIR_200,'traitsite/traitsite-'+tag+'.dat')
f = open("%s/input/traitseq/traitallele-%s.dat"%(HIV_DIR_200,tag),'w')
for i in range(len(traitsite)):
    line_content = [
        f'{traitseq[i][j]}/{NUC.index("C")}' if traitsite[i][j] == 35 else
        f'{traitseq[i][j]}/{NUC.index("A")}' if traitsite[i][j] == 161 else
        str(traitseq[i][j])
        for j in range(len(traitsite[i]))
    ]
    
    line = '\t'.join(line_content)
    f.write(line + '\n')
f.close()

# case 2: CH162-3 194T and 198T
tag = '705010162-3'
traitseq  = epitope.read_file(HIV_DIR_200,'traitseq/traitseq-'+tag+'.dat')
traitsite = epitope.read_file(HIV_DIR_200,'traitsite/traitsite-'+tag+'.dat')
f = open("%s/input/traitseq/traitallele-%s.dat"%(HIV_DIR_200,tag),'w')
for i in range(len(traitsite)):
    line_content = [
        f'{traitseq[i][j]}/{NUC.index("T")}' if traitsite[i][j] == 194 else
        f'{traitseq[i][j]}/{NUC.index("T")}' if traitsite[i][j] == 198 else
        str(traitseq[i][j])
        for j in range(len(traitsite[i]))
    ]
    
    line = '\t'.join(line_content)
    f.write(line + '\n')
f.close()   

In [53]:
for tag in tags:
    epitope.analyze_result(tag,HIV_DIR_200) 

input sequences changed:\
CH470-3, CH470-5, CH159-3, CH256-3, CH256-5, CH042-3, CH042-5, CH162-3, CH162-5, CH164-3, CH164-5\
trait sites changed:\
CH470-5, CH256-3, CH256-5, CH042-3, CH162-5, CH164-3\
trait sites NOT changed:\
CH470-3, CH159-3, CH042-5, CH162-3, CH164-5

In [54]:
import inference_HIV as HIV

importlib.reload(HIV)

tags_200 = ['700010470-3','700010470-5','703010159-3','703010256-3','703010256-5','704010042-3',
            '704010042-5','705010162-3','705010162-5','706010164-3','706010164-5']

for i in range(len(tags_200)):
    tag = tags[i]
    result = HIV.AnalyzeData(tag,HIV_DIR_200)
    sample_times = result.uniq_t
    gamma  = round(10/sample_times[-1],3)
    print('%s|%d|%d|%d|%d|%d|%s|'%(tag,result.seq_length,len(result.escape_group),result.variants,\
                                   result.uniq_t[-1],result.time_step,gamma),end='')
    print([int(i) for i in result.uniq_t],end='')
    print('|',end='')
    print([int(i) for i in result.special_sites],end='')
    print('|')

700010040-3|303|3|314|552|20|0.018|[0, 16, 45, 111, 181, 283, 412, 552]|[245, 246]|
700010040-5|146|0|146|552|5|0.018|[0, 16, 45, 111, 181, 283, 412, 552]|[31, 32, 39, 104]|
700010058-3|90|1|91|85|1|0.118|[0, 8, 45, 85]|[]|
700010058-5|96|0|96|350|1|0.029|[0, 8, 45, 85, 154, 239, 252, 350]|[13, 16, 17]|
700010077-3|203|6|221|159|5|0.063|[0, 14, 32, 102, 159]|[]|
700010077-5|48|0|49|159|1|0.063|[0, 14, 32, 159]|[1, 4, 5]|
700010470-3|282|3|301|174|5|0.057|[0, 13, 41, 69, 174]|[32]|
700010470-5|139|1|141|174|5|0.057|[0, 13, 41, 69, 174]|[8, 17, 57, 59, 90]|
700010607-3|239|1|252|21|1|0.476|[0, 9, 14, 21]|[]|
700010607-5|78|0|78|21|1|0.476|[0, 9, 14, 21]|[]|
703010131-3|744|1|807|333|20|0.03|[0, 21, 28, 34, 63, 91, 175, 273, 333]|[45, 46, 47, 76, 189, 560, 561]|



tag|sequence<br>length|escape<br>group|variants|max<br>time|time<br>step|gamma<br>s|raw<br>time|special<br>sites
:----:|:----:|:---:|:-----:|:----|:----|:----:|:----:|:----:
700010470-3|282|3|301|174|5|0.057|[0, 13, 41, 69, 174]     |[32]|
700010470-5|139|1|141|174|5|0.057|[0, 13, 41, 69, 174]     |[8, 17, 57, 59, 90]|
703010159-3|312|2|322|85|1|0.118|[0, 8, 12, 22, 29, 56, 85]|[]|
703010256-3|175|3|180|172|5|0.058|[0, 28, 63, 172]         |[20, 105, 159, 160]|
703010256-5|236|1|238|172|5|0.058|[0, 28, 63, 172]         |[]|
704010042-3|499|1|585|172|20|0.058|[0, 21, 60, 172]        |[20, 21, 51, 52, 426]|
704010042-5|80|0|80|172|1|0.058|[0, 21, 60, 172]           |[9, 10]|
705010162-3|394|2|404|179|20|0.056|[0, 21, 77, 179]        |[18, 19, 224]|
705010162-5|135|2|135|179|5|0.056|[0, 21, 77, 179]         |[2, 3]|
706010164-3|275|3|280|183|5|0.055|[0, 14, 28, 70, 183]     |[269]|
706010164-5|109|0|110|183|5|0.055|[0, 14, 28, 70, 183]     |[27, 93]|


In [59]:
tags_200 = ['700010470-3','700010470-5','703010159-3','703010256-3','703010256-5','704010042-3',
            '704010042-5','705010162-3','705010162-5','706010164-3','706010164-5']

f = open("%s/HIV_200.sh"%MPL_DIR,'w')
f.write('#!/bin/bash\n')
f.write('cd ..\n')

for tag in tags_200:
    f.write('python inference_HIV.py -tag %s -dir \'data/HIV200\' --raw'%(tag))
    f.write(' || echo "CH%s failed, continuing with next script..." \n'%tag[-5:])
f.write('echo "All scripts have been attempted."')
f.close()  

### Write sh file for HIV data with shorter time - 250

In [38]:
import epitope
import importlib
importlib.reload(epitope)
    
min_n = 2
tags = ['700010040-3', '700010040-5', '700010058-3', '700010058-5', '700010077-3', '700010077-5',
        '700010470-3', '700010470-5', '700010607-3', '700010607-5', '703010131-3', '703010131-5', 
        '703010159-3', '703010159-5', '703010256-3', '703010256-5', '704010042-3', '704010042-5', 
        '705010162-3', '705010162-5', '705010185-3', '705010185-5', '705010198-3', '705010198-5', 
        '706010164-3', '706010164-5']  

HIV_DIR_250 = 'data/HIV250'
for tag in tags:
    epitope.find_trait_site(tag,min_n,HIV_DIR_250)   

	mutant at site 0 in codon for CH131-3 that does not terminate in alignment


In [39]:
for tag in tags:
    traitsite = epitope.read_file(HIV_DIR_250,'traitsite/traitsite-'+tag+'.dat')
    df = pd.read_csv('%s/interim/%s-escape.csv' %(HIV_DIR_250,tag), comment='#', memory_map=True)
    for i in range(len(traitsite)):
        for j in range(len(traitsite[i])):
            n_ij   = df[df['polymorphic_index'] == traitsite[i][j]]
            for a in range(len(n_ij)):
                if n_ij.iloc[a].escape == False and n_ij.iloc[a].nucleotide != n_ij.iloc[a].TF:
                    print(f'CH{tag[-5:]} {n_ij.iloc[a].polymorphic_index} {n_ij.iloc[a].nucleotide}')

CH077-3 35 C
CH077-3 161 A
CH162-3 194 T
CH162-3 198 T


In [40]:
special_tags = ['700010077-3', '705010162-3']
NUC = ['-', 'A', 'C', 'G', 'T']

# case 1: CH077-3 35C and 161A
tag = '700010077-3'
traitseq  = epitope.read_file(HIV_DIR_250,'traitseq/traitseq-'+tag+'.dat')
traitsite = epitope.read_file(HIV_DIR_250,'traitsite/traitsite-'+tag+'.dat')
f = open("%s/input/traitseq/traitallele-%s.dat"%(HIV_DIR_250,tag),'w')
for i in range(len(traitsite)):
    line_content = [
        f'{traitseq[i][j]}/{NUC.index("C")}' if traitsite[i][j] == 35 else
        f'{traitseq[i][j]}/{NUC.index("A")}' if traitsite[i][j] == 161 else
        str(traitseq[i][j])
        for j in range(len(traitsite[i]))
    ]
    
    line = '\t'.join(line_content)
    f.write(line + '\n')
f.close()

# case 2: CH162-3 194T and 198T
tag = '705010162-3'
traitseq  = epitope.read_file(HIV_DIR_250,'traitseq/traitseq-'+tag+'.dat')
traitsite = epitope.read_file(HIV_DIR_250,'traitsite/traitsite-'+tag+'.dat')
f = open("%s/input/traitseq/traitallele-%s.dat"%(HIV_DIR_250,tag),'w')
for i in range(len(traitsite)):
    line_content = [
        f'{traitseq[i][j]}/{NUC.index("T")}' if traitsite[i][j] == 194 else
        f'{traitseq[i][j]}/{NUC.index("T")}' if traitsite[i][j] == 198 else
        str(traitseq[i][j])
        for j in range(len(traitsite[i]))
    ]
    
    line = '\t'.join(line_content)
    f.write(line + '\n')
f.close()

In [41]:
for tag in tags:
    epitope.analyze_result(tag,HIV_DIR_250) 

input sequences changed:\
CH256-3, CH256-5, CH042-3, CH042-5, CH162-3, CH162-5, CH164-3, CH164-5\
trait sites changed:\
CH256-3, CH256-5, CH042-3, CH162-5, CH164-3\
trait sites NOT changed:\
CH042-5, CH162-3, CH164-5

In [47]:
import inference_HIV as HIV

importlib.reload(HIV)

tags_250 = ['703010256-3','703010256-5','704010042-3','704010042-5',
            '705010162-3','705010162-5','706010164-3','706010164-5']

for i in range(len(tags_250)):
    tag = tags[i]
    result = HIV.AnalyzeData(tag,HIV_DIR_250)
    sample_times = result.uniq_t
    gamma  = round(10/sample_times[-1],3)
    print('%s|%d|%d|%d|%d|%d|%s|'%(tag,result.seq_length,len(result.escape_group),result.variants,\
                                   result.uniq_t[-1],result.time_step,gamma),end='')
    print([int(i) for i in result.uniq_t],end='')
    print('|',end='')
    print([int(i) for i in result.special_sites],end='')
    print('|')

703010256-3|175|3|180|172|5|0.058|[0, 28, 63, 172]|[20, 105, 159, 160]|
703010256-5|236|1|238|172|5|0.058|[0, 28, 63, 172]|[]|
704010042-3|499|1|585|172|20|0.058|[0, 21, 60, 172]|[20, 21, 51, 52, 426]|
704010042-5|80|0|80|172|1|0.058|[0, 21, 60, 172]|[9, 10]|
705010162-3|394|2|404|179|20|0.056|[0, 21, 77, 179]|[18, 19, 224]|
705010162-5|135|2|135|179|5|0.056|[0, 21, 77, 179]|[2, 3]|
706010164-3|275|3|280|183|5|0.055|[0, 14, 28, 70, 183]|[269]|
706010164-5|109|0|110|183|5|0.055|[0, 14, 28, 70, 183]|[27, 93]|


tag|sequence<br>length|escape<br>group|variants|max<br>time|time<br>step|gamma<br>s|raw<br>time|special<br>sites
:----:|:----:|:---:|:-----:|:----|:----|:----:|:----:|:----:
703010256-3|175|3|180|172|5|0.058|[0, 28, 63, 172]         |[20, 105, 159, 160]|
703010256-5|236|1|238|172|5|0.058|[0, 28, 63, 172]         |[]|
704010042-3|499|1|585|172|20|0.058|[0, 21, 60, 172]        |[20, 21, 51, 52, 426]|
704010042-5|80|0|80|172|1|0.058|[0, 21, 60, 172]           |[9, 10]|
705010162-3|394|2|404|179|20|0.056|[0, 21, 77, 179]        |[18, 19, 224]|
705010162-5|135|2|135|179|5|0.056|[0, 21, 77, 179]         |[2, 3]|
706010164-3|275|3|280|183|5|0.055|[0, 14, 28, 70, 183]     |[269]|
706010164-5|109|0|110|183|5|0.055|[0, 14, 28, 70, 183]     |[27, 93]|


In [63]:
tags_250 = ['703010256-3','703010256-5','704010042-3','704010042-5',
            '705010162-3','705010162-5','706010164-3','706010164-5']

f = open("%s/HIV_250.sh"%MPL_DIR,'w')
f.write('#!/bin/bash\n')
f.write('cd ..\n')

for tag in tags_250:
    f.write('python inference_HIV.py -tag %s -dir \'data/HIV250\' --raw'%(tag))
    f.write(' || echo "CH%s failed, continuing with next script..." \n'%tag[-5:])
f.write('echo "All scripts have been attempted."')
f.close()  