<a href="https://colab.research.google.com/github/HayLab/AlleleSail/blob/main/AlleleSail_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modeling Allele Sail: A tutorial!
--------------------------------------
This is a Python-based tool with a command-line interface, so using a terminal is a prerequisite! Also, I'm a Windows user, so I don't know how much of this translates to Mac. With any questions, please don't hesitate to reach out to mljohnso@caltech.edu (or haybruce@caltech.edu, if my contact info ends up changing and I forget to update it here).

Let's start with how to get the simulation onto your computer. We'll clone the project and move into its directory. The simulation writes data to files, so we'll also create a folder to store the data files in.

In [None]:
!git clone https://github.com/HayLab/AlleleSail
%cd AlleleSail
!mkdir demo_data

Cloning into 'AlleleSail'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 23 (delta 6), reused 21 (delta 4), pack-reused 0
Receiving objects: 100% (23/23), 29.95 KiB | 3.74 MiB/s, done.
Resolving deltas: 100% (6/6), done.
/content/AlleleSail


If we were running this on a local machine, this is where we would create a virtual environment and install the required packages, but Colab takes care of this for us. To learn how to create a venv and install packages, see [this guide](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/). We'll move on to using the simulation, assuming that all required packages are installed.
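
For a local run, the environment can also be created from Python itself. The sketch below uses only the standard-library `venv` module. The helper name `make_env` and the directory name `.venv` are arbitrary choices of mine, and which packages to install afterwards depends on what `alleleSail_sim.py` imports, which this tutorial does not list.

```python
import venv
from pathlib import Path


def make_env(env_dir=".venv", with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip so packages can be installed afterwards
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling `make_env()` from inside the AlleleSail directory creates `.venv`. Activate it (`.venv\Scripts\activate` on Windows, `source .venv/bin/activate` on Mac/Linux) before installing packages with pip.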

The simulation uses a command-line interface to take in inputs. Let's see what inputs are possible.

In [6]:
!python alleleSail_sim.py -h

usage: alleleSail_sim.py [-h] [-a ALLELES [ALLELES ...]] [-m MODS] [--somatic]
                         [-sex SEX_DETERMINATION] [--MC] [-n NUM_GENS] [-i INTRO] [-e EFFICIENCY]
                         [-rd RECOMB_DIST [RECOMB_DIST ...]] [-fc FITNESS_COST [FITNESS_COST ...]]
                         [--dominant_FC] [-sc STERILITY_COST [STERILITY_COST ...]]
                         [-ai ADDITIONAL_INTROS [ADDITIONAL_INTROS ...]]
                         [-l RUN_LABEL_TYPE [RUN_LABEL_TYPE ...]] [-k K] [-no NUM_OFFSPRING]
                         [-gf GROWTH_FACTOR] [-r NUM_RUNS] [-fn FILE_NAME]

run a stochastic simulation of allele sail behavior

options:
  -h, --help            show this help message and exit
  -a ALLELES [ALLELES ...], --alleles ALLELES [ALLELES ...]
                        alleles being used for simulation
  -m MODS, --mods MODS  modifications being used. Honestly? Never touch this
  --somatic             sail cleavage occurs somaticly, as opposed to in the germline


That's a lot of inputs! To break it down a little more, let's see an example. We'll start by modeling an allele sail for modification.

The default configuration uses four alleles - **[['E', 'O'], ['S', 'W']]**. E and O represent locus 1; they are our Edited allele (E) and our Original allele (O). At the other locus, we either have our editor (S) or the wildtype locus (W).
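
To make the two-locus layout concrete, here is a small sketch that enumerates the diploid genotypes this allele list implies. It is an illustration only: `enumerate_genotypes` is a hypothetical helper, not part of AlleleSail, and the simulation may order or represent genotypes differently (for example, by tracking sex separately).

```python
from itertools import combinations_with_replacement, product

# Default allele configuration from the tutorial: locus 1, then locus 2
ALLELES = [["E", "O"], ["S", "W"]]


def enumerate_genotypes(alleles):
    """Return every unordered diploid genotype across all loci."""
    per_locus = [
        ["".join(pair) for pair in combinations_with_replacement(locus, 2)]
        for locus in alleles
    ]
    # Pick one allele pair per locus and combine across loci
    return [" ".join(combo) for combo in product(*per_locus)]


genotypes = enumerate_genotypes(ALLELES)
print(len(genotypes))  # 9
print(genotypes[:3])   # ['EE SS', 'EE SW', 'EE WW']
```

Each locus contributes three allele pairs, so two loci give nine combinations.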

Let's use a population of size 1000 (-k 1000), and we'll introduce our sail at a frequency of 5% (-i 0.05). I only want this simulation to run for 50 generations (-n 50) and to run 5 times (-r 5). We'll also add in maternal carryover (--MC) and a fitness cost (additive by default) on the Edited allele (E) that applies to both males and females (-fc 2 E 0.1). Finally, we'll label the run (-l) and give the prefix for our output files (-fn). *Note: If we wanted the fitness cost to apply only to females, we'd use -fc 0 E 0.1, and if we wanted it to apply only to males, we'd use -fc 1 E 0.1.*

In [25]:
!python alleleSail_sim.py -l intro_0.05_fc_0.1 -fn demo_data/modification_dominant_MC \
  -r 5 -n 50 -k 1000 -i 0.05 --MC -fc 2 E 0.1

[50]
[[2, ['E'], 0.1, ['Q'], 'additive']]
[]
[]
time taken on run 1: 6.156033992767334
time taken on run 2: 7.175011157989502
time taken on run 3: 5.915485382080078
time taken on run 4: 7.29848575592041
time taken on run 5: 6.009943246841431
time taken on run 6: 7.4186341762542725
time taken on run 7: 6.270315885543823
time taken on run 8: 6.953797340393066
time taken on run 9: 5.821960926055908
time taken on run 10: 7.198100805282593
intro_0.05_fc_0.1 appended to data/modification_dominant_MC
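
If you would rather launch runs from a Python script than type the command by hand, the same call can be assembled as an argument list. This is a sketch, not part of AlleleSail: `build_command` is a hypothetical helper, and it only uses flags that appear in the help text and the example above.

```python
import subprocess
import sys


def build_command(label, file_name, runs, gens, pop_size, intro,
                  maternal_carryover=False, fitness_cost=None):
    """Assemble the alleleSail_sim.py call as a list of arguments."""
    cmd = [sys.executable, "alleleSail_sim.py",
           "-l", label, "-fn", file_name,
           "-r", str(runs), "-n", str(gens),
           "-k", str(pop_size), "-i", str(intro)]
    if maternal_carryover:
        cmd.append("--MC")
    if fitness_cost is not None:
        # fitness_cost is (sex, allele, cost), e.g. (2, "E", 0.1)
        cmd += ["-fc"] + [str(part) for part in fitness_cost]
    return cmd


cmd = build_command("intro_0.05_fc_0.1", "demo_data/modification_dominant_MC",
                    runs=5, gens=50, pop_size=1000, intro=0.05,
                    maternal_carryover=True, fitness_cost=(2, "E", 0.1))
print(" ".join(cmd[1:]))
# To actually run it from inside the AlleleSail directory:
# subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems and makes it easy to loop over, say, several introduction frequencies.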



Let's check to see if our files are there!

In [26]:
!ls demo_data

modification_dominant_MC_allele.csv	suppression_XY_noMC_allele.csv
modification_dominant_MC_genotype.csv	suppression_XY_noMC_genotype.csv
modification_dominant_MC_NEWallele.csv	suppression_XY_noMC_NEWallele.csv
modification_dominant_MC_total.csv	suppression_XY_noMC_total.csv
modification_dominant_MC_total_pop.csv	suppression_XY_noMC_total_pop.csv


We now have five files! Each contains different information.

- **modification_dominant_MC_allele.csv** contains information about the number of allele-carriers in the population, for each allele.
- **modification_dominant_MC_genotype.csv** contains information on how many individuals of each genotype exist in each generation.
- **modification_dominant_MC_NEWallele.csv** contains information on how many alleles exist in the population, regardless of individuals.
- **modification_dominant_MC_total.csv** contains information about the total number of individuals in each generation, including those that are released.
- **modification_dominant_MC_total_pop.csv** contains information about the total number of individuals that grew up in the "wild" - this is the total number of individuals NOT including releases.

To analyze this data, use the CSV analyzer of your choice!
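
As one option, Python's built-in `csv` module is enough for a first look. The sketch below relies only on the five file-name suffixes listed above. The column layout is not documented in this tutorial, so the code treats the first row as a header (an assumption) and just reports that row and the number of remaining rows. `summarize_outputs` is a hypothetical helper name.

```python
import csv
from pathlib import Path

# Suffixes of the five files written for each -fn prefix
SUFFIXES = ["allele", "genotype", "NEWallele", "total", "total_pop"]


def summarize_outputs(prefix):
    """Return {suffix: (first_row, number_of_remaining_rows)} for existing files."""
    summary = {}
    for suffix in SUFFIXES:
        path = Path(f"{prefix}_{suffix}.csv")
        if not path.exists():
            continue  # skip files that have not been written yet
        with path.open(newline="") as handle:
            rows = list(csv.reader(handle))
        # Assumption: the first row is a header
        header = rows[0] if rows else []
        summary[suffix] = (header, max(len(rows) - 1, 0))
    return summary


if __name__ == "__main__":
    found = summarize_outputs("demo_data/modification_dominant_MC")
    for name, (header, n_rows) in found.items():
        print(name, header, n_rows)
```

Once you know what the columns are, the same files can be loaded into a spreadsheet or a data-frame library for plotting.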

Now, let's try modeling an allele sail for suppression. We'll use the XY system (-sex XY) without maternal carryover, for a population of size 1000. We'll label the run with the introduction frequency and number of repeated releases, and save our data to files with the prefix "suppression_XY_noMC". We'll also do 10 runs of 50 generations each.

Since we want to do releases every generation, as opposed to just at the beginning, let's set the original introduction frequency to 0 (-i 0) and use the 'additional intros' flag (-ai) to release every generation. Males are denoted by 1, the transgenic genotype we want is genotype 1 (more information on this below), we want to start at generation 0, and we'll release every generation until the simulation ends. I'll store this additional intro info in some variables to make it easier to reuse. To do so, we'll switch to the shell using the cell magic %%shell.

In [None]:
%%shell
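males=1       # sex of the released individuals (1 = male)
genotype=0    # genotype of the released transgenic individuals
first_gen=0   # generation of the first release
num_runs=10   # number of simulation runs
repeats=51    # number of releases
gap=1         # generations between releases
freq=0.1      # introduction frequency of each release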

python alleleSail_sim.py -l intro_${freq}_repeats_50 -fn demo_data/suppression_XY_noMC \
  -sex XY -r $num_runs -n 50 -k 1000  \
  -i 0 -ai $males $genotype $freq $first_gen $gap $repeats
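
To see what the `first_gen`, `gap`, and `repeats` values amount to, it helps to list the generations at which a release would happen. The sketch below assumes releases occur at `first_gen` and then every `gap` generations, `repeats` times in total. That reading of the `-ai` arguments is inferred from the description above, not checked against the simulation code, and `release_generations` is a hypothetical helper.

```python
def release_generations(first_gen, gap, repeats):
    """Generations at which a release happens, under the assumed schedule."""
    return [first_gen + i * gap for i in range(repeats)]


schedule = release_generations(first_gen=0, gap=1, repeats=51)
print(schedule[0], schedule[-1], len(schedule))  # 0 50 51
```

Under that assumption, 51 releases starting at generation 0 cover generations 0 through 50.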

Let's check to see if our files were written.

In [24]:
!ls demo_data/

suppression_XY_noMC_allele.csv	   suppression_XY_noMC_total.csv
suppression_XY_noMC_genotype.csv   suppression_XY_noMC_total_pop.csv
suppression_XY_noMC_NEWallele.csv


We now have information on allele carriers, individuals of different genotypes, allele frequencies, the total number of individuals (including those released), and the total number of individuals not including those released.
A note on runtimes: the above simulations, with 1000 individuals, take about 7 seconds per run on my machine. For the paper, we used simulations with a carrying capacity of 10000 individuals, and saw runtimes closer to 70 seconds per run.
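
For planning larger jobs, note that the two timings above (about 7 seconds at 1000 individuals, about 70 seconds at 10000) are consistent with runtime growing roughly in proportion to population size. The sketch below simply extrapolates on that assumption. `estimate_seconds` is a hypothetical helper, and real timings will depend on your machine and on the other flags.

```python
def estimate_seconds(num_runs, carrying_capacity, secs_per_run_at_1000=7.0):
    """Rough wall-clock estimate, assuming time scales linearly with population size."""
    return num_runs * secs_per_run_at_1000 * (carrying_capacity / 1000)


print(estimate_seconds(10, 1000))   # 70.0
print(estimate_seconds(10, 10000))  # 700.0
```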
This demo will hopefully be updated in the future with more info about how to play around with the code and analyze the files, but for any questions please feel free to reach out to mljohnso@caltech.edu.