# FitMut2.0
In this notebook we walk through an example use case: we generate data from simulation and then infer the fitness effects and establishment times of adaptive mutations. Although we expect most users will want to apply our code to infer mutational parameters from their own experimental datasets, we include instructions for generating simulated data for completeness.

First we install the required Python packages (run this cell only if they are not already installed):

In [3]:
 !pip install -r ../requirements.txt

Collecting numpy==1.18.5 (from -r ../requirements.txt (line 1))
  Using cached numpy-1.18.5.zip (5.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      Running from numpy source directory.
      Traceback (most recent call last):
        File "/opt/anaconda3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/opt/anaconda3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_

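The install log above shows pip building `numpy==1.18.5` from source under Python 3.12 and failing. A minimal sketch of a pre-install check, assuming (from NumPy's release notes) that NumPy 1.18 supports Python 3.5 to 3.8 only; the constant and function names here are our own, not part of FitMut2:

```python
import sys

# Assumption: numpy==1.18.5, pinned in requirements.txt, supports Python up to
# 3.8, so newer interpreters force a source build like the one that failed above.
NUMPY_1_18_MAX_PYTHON = (3, 8)


def pinned_numpy_supported(version_info=sys.version_info):
    """Return False if this Python is newer than numpy 1.18's last supported version."""
    return tuple(version_info[:2]) <= NUMPY_1_18_MAX_PYTHON


if not pinned_numpy_supported():
    print('Python {}.{} is newer than 3.8; the pinned requirements may need '
          'a Python 3.8 environment.'.format(*sys.version_info[:2]))
```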
In [5]:
import numpy as np
import pandas as pd
import itertools
import csv
import os

Then we generate the input files required to run a simulation. The first file lists the sequencing time points; the second defines $\mu(s)$, the mutation rate as a function of fitness effect $s$, as a histogram with bins and frequencies. We separately specify the overall mutation rate $U = \int \mu(s)\,ds$.
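In discretized form (this is our reading of how the histogram and the overall rate fit together): with bin edges $s_0 < s_1 < \dots < s_K$, the histogram stores the probability $p_k$ that a new mutation falls in bin $k$, and $U$ rescales these probabilities into per-bin rates.

$$
U = \int \mu(s)\,ds, \qquad
p_k = \frac{1}{U}\int_{s_{k-1}}^{s_k} \mu(s)\,ds, \qquad
\sum_{k=1}^{K} p_k = 1 .
$$

For example, with $U = 10^{-5}$, a bin holding $p_k = 0.02$ of the probability mass contributes a mutation rate of $U p_k = 2 \times 10^{-7}$.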

In [3]:
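dir_sim = './'              # simulation input/output files
dir_code = '../main_code/'  # location of the FitMut2 scripts
dir_result = './'           # inference input/output files

# Sequencing time points: 15 samples, one every 8 generations (0, 8, ..., 112)
delta_t = 8
t_seq = np.arange(0, delta_t*15, delta_t)

# Column 0: time points. Columns 1-4: a constant value per time point for each
# of four simulation settings (10, 20, 50, 100), presumably the average number
# of reads per lineage.
tmp_1 = {'0':t_seq,
         '1':10*np.ones(np.shape(t_seq)),
         '2':20*np.ones(np.shape(t_seq)),
         '3':50*np.ones(np.shape(t_seq)),
         '4':100*np.ones(np.shape(t_seq))}

tmp = list(itertools.zip_longest(*list(tmp_1.values())))
with open(dir_sim + 'simu_input_time_points.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerows(tmp)

# Fitness effects s drawn from an exponential distribution (mean 0.045) and
# binned with width 0.001; density * bin width gives per-bin probabilities.
step_size = 0.001
np.random.seed(5)
s = np.random.exponential(0.045, 100000)
bins_edge = np.arange(0, 0.145, step_size)
freq_bin = np.histogram(s, bins=bins_edge, density=True)[0] * step_size

# Column 0: overall mutation rate U; column 1: bin edges; column 2: bin frequencies
input_test = {'0':[1e-5], '1':bins_edge, '2':freq_bin}
tmp = list(itertools.zip_longest(*list(input_test.values())))
with open(dir_sim + 'simu_input_mutation_fitness.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerows(tmp)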

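As a sanity check on the mutation fitness file, the sketch below regenerates the same histogram and confirms that the bin frequencies sum to about 1, so that $U$ times each frequency gives a per-bin rate. It also shows how `zip_longest` pads the shorter columns. The names `mu_per_bin` and `s_mid` are ours, and the midpoint mean is only an approximation.

```python
import csv
import io
import itertools

import numpy as np

U = 1e-5                        # overall mutation rate, first column of the file
step_size = 0.001               # bin width
rng = np.random.RandomState(5)  # same random stream as np.random.seed(5)
s = rng.exponential(0.045, 100000)
bins_edge = np.arange(0, 0.145, step_size)
freq_bin = np.histogram(s, bins=bins_edge, density=True)[0] * step_size

# density=True normalises over the samples inside the binned range, so
# multiplying by the bin width gives per-bin probabilities summing to ~1.
print('sum of freq_bin:', freq_bin.sum())

# Per-bin mutation rates U * p_k; their sum recovers U.
mu_per_bin = U * freq_bin
print('sum of per-bin rates:', mu_per_bin.sum())

# Approximate mean fitness effect of a new mutation, using bin midpoints.
s_mid = 0.5 * (bins_edge[:-1] + bins_edge[1:])
print('mean s:', np.sum(s_mid * freq_bin))

# zip_longest pads shorter columns with None, which csv writes as empty fields.
buf = io.StringIO()
csv.writer(buf).writerows(
    itertools.zip_longest([U], bins_edge.tolist(), freq_bin.tolist()))
rows = list(csv.reader(buf.getvalue().splitlines()))
print(len(rows), rows[1][0] == '', rows[-1][2] == '')
```

Because the exponential tail beyond the last bin edge is dropped before normalizing, the binned distribution is a truncated version of the sampled one, and its mean is a little below 0.045.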

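Then we run the script `fitmutsimu_run.py` to generate simulated data for `lineages_num` lineages, passing the time-points file (`-t`), the mutation fitness file (`-s`) and an output prefix (`-o`).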

In [4]:
lineages_num = int(1e4)  # number of lineages to simulate

cmd = ('python3 {}fitmutsimu_run.py -l {} -t {}simu_input_time_points.csv '
       '-s {}simu_input_mutation_fitness.csv -o {}simu').format(
           dir_code, lineages_num, dir_sim, dir_sim, dir_sim)
os.system(cmd)  # returns the exit status; 0 means success

0
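`os.system` only returns the exit status, so a failure inside the script is easy to miss. A sketch of an alternative, assuming you would rather get an exception: build the argument list explicitly and call `subprocess.run(..., check=True)`. The helpers `build_simu_command` and `run` are hypothetical, and `sys.executable` stands in for `python3` so the script runs under the notebook's own interpreter.

```python
import os
import subprocess
import sys


def build_simu_command(dir_code, dir_sim, lineages_num):
    """Hypothetical helper: argv list equivalent to the os.system call above."""
    return [
        sys.executable,
        os.path.join(dir_code, 'fitmutsimu_run.py'),
        '-l', str(lineages_num),
        '-t', os.path.join(dir_sim, 'simu_input_time_points.csv'),
        '-s', os.path.join(dir_sim, 'simu_input_mutation_fitness.csv'),
        '-o', os.path.join(dir_sim, 'simu'),
    ]


def run(cmd):
    """Run a command; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


cmd = build_simu_command('../main_code/', './', int(1e4))
print(cmd)
```

Usage: `run(cmd)` in place of `os.system(...)`. Passing a list also avoids shell quoting problems if a directory name contains spaces.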

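Next, we generate the time-points input file required by the inference algorithm. Its first column lists the sequencing time points (read back from the simulation input), and its second column gives a population-size value for each time point, computed as the average number of cells per lineage at the bottleneck × the number of lineages × `t_delta` (8 generations, the interval between time points).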

In [5]:
# Read the sequencing time points back from the simulation input file
csv_input = pd.read_csv(dir_sim + 'simu_input_time_points.csv', low_memory=False, header=None)
t_seq = np.array(csv_input[0][~pd.isnull(csv_input[0])], dtype=float)
cell_num_average_bottleneck = 100  # average number of cells per lineage at the bottleneck
t_delta = 8                        # generations between sequencing time points

# The simulated read-number file has one row per lineage
csv_input = pd.read_csv(dir_sim + 'simu_0_EvoSimulation_Read_Number.csv', low_memory=False, header=None)
lineages_num = csv_input.shape[0]

# Second column: bottleneck cells per lineage x number of lineages x t_delta
cell_depth_seq = cell_num_average_bottleneck*lineages_num*np.ones(t_seq.shape)*t_delta
input_tmp = {'0':t_seq, '1':cell_depth_seq}
tmp = list(itertools.zip_longest(*list(input_tmp.values())))
with open(dir_result + 'fitmut_input_time_points.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerows(tmp)

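A quick worked check of the second column, assuming the simulation output keeps all 10,000 lineages requested with `-l`: every entry is 100 × 10,000 × 8 = 8 × 10^6. The self-contained sketch below reproduces that value and the two-column layout without reading any files.

```python
import csv
import io
import itertools

import numpy as np

t_seq = np.arange(0, 8 * 15, 8, dtype=float)  # 0, 8, ..., 112
cell_num_average_bottleneck = 100
lineages_num = 10_000  # assumption: equals the -l value used for the simulation
t_delta = 8

cell_depth_seq = cell_num_average_bottleneck * lineages_num * np.ones(t_seq.shape) * t_delta

# Write the two columns the same way as the cell above, but into memory.
buf = io.StringIO()
csv.writer(buf).writerows(itertools.zip_longest(t_seq.tolist(), cell_depth_seq.tolist()))
rows = list(csv.reader(buf.getvalue().splitlines()))
print(rows[0], rows[-1])  # ['0.0', '8000000.0'] ['112.0', '8000000.0']
```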

Finally, we run the script `fitmut2_run.py` on the simulated read numbers in `simu_3_EvoSimulation_Read_Number.csv` to detect adaptive mutations and infer their fitness effects and establishment times.
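Before launching the inference, it can help to confirm that both input files exist, since a missing file otherwise surfaces only as a non-zero return value from `os.system`. A minimal sketch; `check_inputs` is a hypothetical helper and the file names are the ones used in this notebook:

```python
import os


def check_inputs(dir_sim, dir_result,
                 read_file='simu_3_EvoSimulation_Read_Number.csv'):
    """Hypothetical helper: raise early if an input to fitmut2_run.py is missing."""
    required = [os.path.join(dir_sim, read_file),
                os.path.join(dir_result, 'fitmut_input_time_points.csv')]
    missing = [p for p in required if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError('missing input file(s): ' + ', '.join(missing))
    return required
```

Usage: call `check_inputs(dir_sim, dir_result)` just before the `os.system` call in the next cell.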

In [6]:
cmd = ('python3 {}fitmut2_run.py -i {}simu_3_EvoSimulation_Read_Number.csv '
       '-t {}fitmut_input_time_points.csv -o {}test').format(
           dir_code, dir_sim, dir_result, dir_result)
os.system(cmd)  # returns the exit status; 0 means success

0
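The `-o {}test` argument sets an output prefix. Without assuming any particular output file names, a small sketch like the following lists whatever files were written under that prefix in `dir_result`; `list_outputs` is a hypothetical helper.

```python
import glob
import os


def list_outputs(dir_result, prefix='test'):
    """Return the files whose names start with the given -o prefix, sorted by name."""
    return sorted(glob.glob(os.path.join(dir_result, prefix + '*')))


for path in list_outputs('./'):
    print(path)
```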