# FitSeq2.0
This notebook walks through an example use case: generating data by simulation and then inferring fitness from it. Although we expect most users will run our code on their own experimental datasets, we include instructions for generating simulated data for completeness.

1. First, we install the required Python packages.

In [None]:
!pip install -r ../requirements.txt

2. Then we generate the input files required to run a simulation.

In [1]:
# input for fitseqsimu_run.py
import numpy as np
import itertools
import csv

dir_simulation = './'

lineages_num = int(1e4)

delta_t = 4
t_seq = np.arange(0, delta_t*6, delta_t)

tmp_1 = {'0':t_seq, 
         '1':20*np.ones(np.shape(t_seq)), 
         '2':50*np.ones(np.shape(t_seq)), 
         '3':100*np.ones(np.shape(t_seq))}

tmp = list(itertools.zip_longest(*list(tmp_1.values())))
with open(dir_simulation + 'simu_input_time_points.csv', 'w') as f:
    w = csv.writer(f)
    w.writerows(tmp)
        
        
        
####################
# initial cell number distribution
np.random.seed(10)
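param_a = 100
param_b = 2.5
mean_value = param_a                     # mean initial cell number per lineage
variance_value = 2 * param_a * param_b   # variance of the initial cell number
param_k = mean_value**2/variance_value   # gamma shape: mean^2 / variance
param_theta = variance_value/mean_value  # gamma scale: variance / mean
n0 = np.random.gamma(param_k, param_theta, lineages_num).astype('int')  # truncate to whole cells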


step_size = 0.001
np.random.seed(5)
s = np.random.normal(0, 0.15, lineages_num)
s_norm = s - np.dot(s, n0)/np.sum(n0) # normalize the fitness to relative fitness (relative to the mean fitness)
    
s_lim_left, s_lim_right = -1, 1
s_norm[s_norm < s_lim_left] = s_lim_left
s_norm[s_norm > s_lim_right] = s_lim_right
       
tmp_2 = {'0':s_norm, '1':n0}
tmp = list(itertools.zip_longest(*list(tmp_2.values())))
with open(dir_simulation + 'simu_input_fitness.csv', 'w') as f:
    w = csv.writer(f)
    w.writerows(tmp)
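
The cell above chooses the gamma shape and scale by moment matching (mean = shape × scale, variance = shape × scale²) and then shifts the fitness values so that their abundance-weighted mean is zero. The following is a minimal sketch that checks both properties on freshly drawn samples. The seed, the sample size, and the `*_demo` names are assumptions made for this illustration only and are not part of the simulation inputs.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for this illustration

# Same moment matching as in the cell above:
# mean = 100, variance = 2 * 100 * 2.5 = 500
mean_value = 100.0
variance_value = 500.0
shape_k = mean_value**2 / variance_value    # mean^2 / variance
scale_theta = variance_value / mean_value   # variance / mean

# Draw continuous cell numbers (no integer truncation in this sketch)
n0_demo = rng.gamma(shape_k, scale_theta, 100_000)
sample_mean = n0_demo.mean()   # should be close to mean_value
sample_var = n0_demo.var()     # should be close to variance_value

# Subtracting the abundance-weighted mean makes the weighted mean fitness zero
s_demo = rng.normal(0, 0.15, n0_demo.size)
s_rel = s_demo - np.dot(s_demo, n0_demo) / np.sum(n0_demo)
weighted_mean = np.dot(s_rel, n0_demo) / np.sum(n0_demo)
```

Note that the notebook cell truncates `n0` to integers, which lowers the realized mean slightly, and clips `s_norm` to [-1, 1], which leaves the weighted mean at zero only when no value is actually clipped.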
        


3. Then we run the command (fitseqsimu_run.py) to generate simulated data.

In [2]:
import os

dir_simulation = './'
dir_code = '../main_code/'
        
os.system('python ' + dir_code + 'fitseqsimu_run.py -t '
              + dir_simulation + 'simu_input_time_points.csv -s '
              + dir_simulation + 'simu_input_fitness.csv -o '
              + dir_simulation + 'test')


0
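
The `0` shown above is the value returned by `os.system`, which indicates that the command succeeded. As an alternative, the same command can be assembled as an argument list and passed to `subprocess.run`, which avoids string concatenation and can raise an error on failure. This is only a sketch: `build_simulation_command` is a hypothetical helper introduced here, and the flags `-t`, `-s`, and `-o` are taken from the cell above.

```python
import subprocess
import sys

def build_simulation_command(dir_code, dir_simulation, output_prefix):
    """Return the argument list for fitseqsimu_run.py (hypothetical helper)."""
    return [
        sys.executable, dir_code + 'fitseqsimu_run.py',
        '-t', dir_simulation + 'simu_input_time_points.csv',
        '-s', dir_simulation + 'simu_input_fitness.csv',
        '-o', dir_simulation + output_prefix,
    ]

cmd = build_simulation_command('../main_code/', './', 'test')

# Uncomment to run the simulation; with check=True a non-zero exit status
# raises subprocess.CalledProcessError instead of being silently returned.
# subprocess.run(cmd, check=True)
```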

4. Next, we generate the input files required to run the inference algorithm.

In [3]:
# input for fitseq2_run.py
import numpy as np
import pandas as pd
import itertools
import csv

dir_simulation = './'
dir_result = './'

csv_input = pd.read_csv(dir_simulation + 'simu_input_time_points.csv', low_memory=False, header=None)
t_seq = np.array(csv_input[0][~pd.isnull(csv_input[0])], dtype=float)

csv_input = pd.read_csv(dir_simulation + 'simu_input_fitness.csv', low_memory=False, header=None)
n0_array = np.array(csv_input[1][~pd.isnull(csv_input[1])], dtype=float)
cell_depth_seq = np.sum(n0_array) * np.ones(t_seq.shape) # total cell number at the bottleneck
    
input_tmp = {'0':t_seq, '1':cell_depth_seq}
tmp = list(itertools.zip_longest(*list(input_tmp.values())))
with open(dir_result + 'fitseq_input_time_points.csv', 'w') as f:
    w = csv.writer(f)
    w.writerows(tmp)
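
The input files are written column by column with `itertools.zip_longest`, which pads shorter columns with `None`, and `csv.writer` writes `None` as an empty field. That is why the cell above filters each column with `~pd.isnull(...)` after reading it back. In this notebook the columns happen to have equal length, so the filter removes nothing. The sketch below illustrates the padding behaviour with made-up columns of unequal length, using an in-memory buffer instead of a file.

```python
import csv
import io
import itertools

# Made-up columns of unequal length, for illustration only
columns = [[0, 4, 8], [100, 100]]

# zip_longest turns columns into rows, padding the shorter column with None
rows = list(itertools.zip_longest(*columns))

# csv.writer writes None as an empty field
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)

# Reading the text back shows the empty field in the last row
buffer.seek(0)
read_back = list(csv.reader(buffer))
```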


5. Finally, we run the command (fitseq2_run.py) to infer the fitness of each genotype from the simulated data.

In [4]:
import os

dir_code = '../main_code/'
dir_simulation = './'
dir_result = './'

os.system('python ' + dir_code + 'fitseq2_run.py -i ' 
          + dir_simulation + 'test_0_EvoSimulation_Read_Number.csv -t '
          + dir_result + 'fitseq_input_time_points.csv -o ' 
          + dir_result + 'test')


0
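
Because `os.system` only returns a status code and does not raise on failure, it can help to confirm that the input files exist before launching the inference. The sketch below shows one way to do this. `missing_inputs` is a hypothetical helper, and the temporary directory and file contents are placeholders used only to make the example self-contained.

```python
from pathlib import Path
import tempfile

def missing_inputs(paths):
    """Return the paths that are not existing files (hypothetical helper)."""
    return [str(p) for p in paths if not Path(p).is_file()]

# Demonstration in a temporary directory: one file is present, one is absent
with tempfile.TemporaryDirectory() as tmp_dir:
    present = Path(tmp_dir) / 'fitseq_input_time_points.csv'
    present.write_text('0,1000\n')  # placeholder content
    absent = Path(tmp_dir) / 'test_0_EvoSimulation_Read_Number.csv'

    missing = missing_inputs([present, absent])
    missing_names = [Path(p).name for p in missing]
```

In the notebook itself, the same check could be applied to `dir_simulation + 'test_0_EvoSimulation_Read_Number.csv'` and `dir_result + 'fitseq_input_time_points.csv'` before calling the script.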

6. Additionally, we include a Python re-implementation of the MATLAB tool FitSeq (fitseq1_run.py).

In [6]:
import os

dir_code = '../main_code/old_version/'
dir_simulation = './'
dir_result = './'

os.system('python ' + dir_code + 'fitseq_run.py -i ' 
          + dir_simulation + 'test_0_EvoSimulation_Read_Number.csv -t '
          + dir_result + 'fitseq_input_time_points.csv -o ' 
          + dir_result + 'test_old')


0