## Simulate non-cyanogenic evolution via drift alone

The frequency of a non-cyanogenic phenotype has been found to increase along a cline from less urban to more urban sites.
The phenotype is genetically controlled by two loci, each of which has a segregating knockout allele.
Any individual that is homozygous for the knockout allele at either locus is non-cyanogenic.
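
To make the two-locus rule concrete, here is a minimal sketch that enumerates all nine two-locus genotypes, sums the frequencies of those homozygous for a knockout at either locus, and compares the total with the closed form qA² + qB² − qA²·qB². It assumes Hardy-Weinberg proportions and independent loci, and the function names are hypothetical rather than taken from the script.

```python
from itertools import product

def genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies at one locus, given the
    frequency p of the functional (uppercase) allele. Assumption: random mating."""
    q = 1.0 - p
    return {'dom_hom': p * p, 'het': 2 * p * q, 'rec_hom': q * q}

def acyanogenic_freq_enumerated(pA, pB):
    """Sum the frequencies of all two-locus genotypes that are homozygous
    for the knockout at locus A, locus B, or both."""
    total = 0.0
    for (gA, fA), (gB, fB) in product(genotype_freqs(pA).items(),
                                      genotype_freqs(pB).items()):
        if gA == 'rec_hom' or gB == 'rec_hom':
            total += fA * fB  # assumption: the two loci are independent
    return total

def acyanogenic_freq_closed_form(pA, pB):
    """Closed form: P(aa) + P(bb) - P(aa and bb)."""
    qA, qB = 1.0 - pA, 1.0 - pB
    return qA**2 + qB**2 - qA**2 * qB**2

print(acyanogenic_freq_enumerated(0.5, 0.5))   # 0.4375
print(acyanogenic_freq_closed_form(0.5, 0.5))  # 0.4375
```

With both functional alleles at a frequency of 0.5, just under half of the population is expected to be non-cyanogenic, even though each knockout genotype alone accounts for only a quarter.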

In this extremely simple simulation I create a 'population' represented by two lists of alleles (A/a and B/b).
To simulate evolution, I randomly sample with replacement from each list to create new lists that represent the next generation.

By repeating this process with different starting frequencies, population sizes and numbers of generations (functionally equivalent to steps in a strict stepping-stone model), we can track the change in frequency of the cyanogenic and non-cyanogenic phenotypes.
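
As a rough illustration of the resampling step described above, the sketch below runs drift at a single locus for a few population sizes and reports how often the 'A' allele has been lost or fixed by the final generation. The function names, the seed and the parameter values are assumptions chosen for illustration; they are not taken from the script.

```python
import random

def next_generation(locus, N, rng):
    """Draw N alleles with replacement from the current allele list."""
    return [rng.choice(locus) for _ in range(N)]

def drift_trajectory(p0, N, generations, rng):
    """Return the frequency of 'A' in each generation under drift alone."""
    n_A = int(round(N * p0))
    locus = ['A'] * n_A + ['a'] * (N - n_A)
    freqs = []
    for _ in range(generations):
        locus = next_generation(locus, N, rng)
        freqs.append(locus.count('A') / float(N))
    return freqs

rng = random.Random(42)  # fixed seed so the run is repeatable
for N in (10, 50, 250):
    # Final 'A' frequency from 100 independent runs of 50 generations each
    finals = [drift_trajectory(0.5, N, 50, rng)[-1] for _ in range(100)]
    lost_or_fixed = sum(f in (0.0, 1.0) for f in finals) / float(len(finals))
    print(N, lost_or_fixed)
```

Smaller populations are expected to lose or fix an allele far more often over the same number of generations, which is why population size is one of the parameters varied here.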

## Extension to previous simulations — Spatial structure

To build off of the previous single-population stepping stone model, I have added a few functions that allow us to add spatial structure to our simulations, effectively simulating the trajectory of multiple populations simultaneously. Once migration is added, this will be analogous to a metapopulation.

#### Major changes to script

1. Since there will be multiple populations, they are now stored as entries in a dictionary. Each key in the dictionary corresponds to a population and the value is a list of that population's allele and phenotype frequencies, with one record appended each generation.
2. The alleles that correspond to each population are stored in a separate dictionary with keys matching those from the population dictionary. As such, allele frequencies for populations are calculated by sampling the allele lists in the 'alleles' dictionary with the key matching the population.
3. Every generation, each population has some probability (p) of generating a new population with N alleles sampled from the population that created it. This is not the most realistic scenario but can easily be modified later. 
4. Simulations are now stored separately in a dictionary called 'sim' where each key corresponds to a simulation. 
5. These changes allow us to model clines in two ways:
    - Clines across 'time': looking at phenotype frequencies **within** populations
    - Clines across 'space': looking at phenotype frequencies **across** created populations
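
The layout described in the list above can be sketched with a small hand-written example. The numbers are made up for illustration, and the helpers `cline_over_time` and `cline_over_space` are hypothetical names that do not appear in the script; only the nesting (simulation, then population key, then per-generation records of `[N, generation, pA, pB, phenotype frequency]`) follows the script.

```python
from collections import OrderedDict

# One simulation (key 0) holding two populations. Each record is
# [N, generation, pA, pB, acyanogenic frequency]; values are illustrative only.
sim = {
    0: OrderedDict([
        ('1',   [[10, 0, 0.5, 0.5, 0.4375], [10, 1, 0.4, 0.5, 0.52]]),
        ('2_1', [[10, 0, 0.5, 0.4, 0.52],   [10, 1, 0.5, 1.0, 0.25]]),
    ])
}

def cline_over_time(sim, s, pop_key):
    """Phenotype frequency within one population, generation by generation."""
    return [(rec[1], rec[4]) for rec in sim[s][pop_key]]

def cline_over_space(sim, s, generation):
    """Phenotype frequency across all populations present in one generation."""
    out = OrderedDict()
    for pop_key, records in sim[s].items():
        for rec in records:
            if rec[1] == generation:
                out[pop_key] = rec[4]
    return out

print(cline_over_time(sim, 0, '1'))   # [(0, 0.4375), (1, 0.52)]
print(cline_over_space(sim, 0, 1))    # population '1' -> 0.52, '2_1' -> 0.25
```

Because populations are stored in creation order, reading across keys within a single generation gives a rough spatial ordering from the founding population outwards.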

In [11]:
# Modules used throughout script
import random
from collections import OrderedDict
import csv
import time
from datetime import datetime
import os
import itertools


# Randomly sample 'N' alleles from lists containing alleles for locus A 
def sample_population_A(locus_A, N):
    new_locus_A = [random.choice(locus_A) for _ in range(N)]
    return new_locus_A

# Randomly sample 'N' alleles from lists containing alleles for locus B
def sample_population_B(locus_B, N):
    new_locus_B = [random.choice(locus_B) for _ in range(N)]
    return new_locus_B

# From list containing alleles, calculate the frequency of 'A' or 'B' allele. 
def allele_freq(locus):
    p = sum(1*i.isupper() for i in locus)/float(len(locus))
    return p

# Create new population as empty list and add to 'pops' dictionary. Also create two lists of alleles
# sampled from pool of alleles from population that generated the new one. Alleles added to 'alleles'
# dictionary
def create_population(p, Akey, Avalue, pops, alleles): #'Akey' and 'Avalue' are local variables defined in the 'cline' function
    if not alleles['1']['A']: #If there are no alleles for first population, pass. Only valid for first iteration when the populations have yet to be initialized
        #print 'There are no populations from which to sample!!'
        pass
    else:
        # Bernoulli trial: create a new population with probability 'p'.
        # (A list of ten '0'/'1' entries only approximates 'p' to the nearest tenth.)
        if random.random() < p:
            global pop_counter #global variable that tracks the number of populations created
            pop_counter += 1 #Increment 'pop_counter' by 1 if population is being created. 
            #Add two lists to 'alleles' dictionary ('A' and 'B'). Naming: 'Pop.number_Population that created the new one'
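            # Index loci by name: '.items()[0][1]' only works in Python 2 and relies on dictionary ordering. 'N' is read from the global scope.
            alleles['{0}_{1}'.format(pop_counter,Akey)] = {'A':sample_population_A(alleles[Akey]['A'],N),'B':sample_population_B(alleles[Akey]['B'],N)}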
            pops['{0}_{1}'.format(pop_counter,Akey)] = [] #Empty list for new population. Naming same as alleles.
            
# Join list containing simulation results (e.g. Population size, allele frequencies, phenotype frequency, etc.) with 
# list containing iteration number (i.e. sim)
def final_results(results,sim):
    return [a + b for a,b in zip(results,sim)]


# Given the frequencies of 'A' and 'B' alleles, return the frequency of the 'acyanogenic' phenotype (i.e. recessive
# at either the A locus, B locus, or both) 
def phenotype(pA, pB):
    qA = 1-pA
    qB = 1-pB
    mut= qA**2 + qB**2 - (qA**2 * qB**2)
    WT = 1-mut
    return mut # Frequency of acyanogenic phenotype

# Cline function. Every generation, 'N' alleles are sampled from the pool of alleles sampled the previous generation.
# Every generation, every population has some probability (p) of generating a new population, with alleles
# sampled from the population that created it.
def cline(locus_A,locus_B, steps, N, p, pops, alleles):
        for i in range(steps):
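            for Akey, Avalue in list(alleles.items()): #Iterate over a copy because 'create_population' adds keys to 'alleles'
                if Akey in pops:
                    if 'A' in Avalue and 'B' in Avalue: #Both loci must be present ("'A' and 'B' in ..." only tests for 'B')
                        if not Avalue['A'] and not Avalue['B']: 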
                            #If allele lists are empty, sample from list of initial allele frequencies. Only used for first generation
                            Avalue['A'] = (sample_population_A(locus_A, N))
                            Avalue['B'] = (sample_population_B(locus_B, N))
                        else:
                            #If allele lists are not empty, sample from previously sampled set of alleles. 
                            Avalue['A'] = (sample_population_A(Avalue['A'], N))
                            Avalue['B'] = (sample_population_B(Avalue['B'], N))
                    create_population(p, Akey, Avalue, pops, alleles) #Create population. Alleles will be sampled (see above). Population is currently empty list
            for Pkey in pops.keys():
                #Calculate allele and phenotype frequencies for every population, including newly created ones. 
                pA = allele_freq(alleles[Pkey]['A']) 
                pB = allele_freq(alleles[Pkey]['B'])
                pops[Pkey].append([N, i, pA, pB, phenotype(pA, pB)])
        #return pops

        
# Using the functions defined above, 'simulate' performs 'sims' iterations of the cline function -- simulating the effects 
# of drift in a stepping stone model -- each time storing the results.
def simulate(pA, pB, steps, N, sims):
    qA = 1-pA # Frequency of 'a' allele
    qB = 1-pB
    # Make the two lists based on the allele frequency to represent the initial population
    # Round the count of 'A'/'B' alleles and fill the remainder so each list has exactly N entries
    locus_A = (['A'] * int(round(N*pA)) ) + (['a'] * (N - int(round(N*pA))) ) # [A,A,A,A,a,a,a,a,....]
    locus_B = (['B'] * int(round(N*pB)) ) + (['b'] * (N - int(round(N*pB))) ) 
    ####### sims simulations #####################
    # We will simulate 'steps' iterations of resampling this population to simulate drift
    # We will then repeat that simulation of 'steps' iterations 'sims' times
    ##############################################
    for s in range(sims):
        pops = OrderedDict({'1':[]}) # Re-initialize dictionary to store populations
        alleles = OrderedDict({'1':{'A':[],'B':[]}}) # Re-initialize dictionary to store allele lists
        global pop_counter # Reset population counter
        pop_counter = 1
        # reset the population for each iteration. I don't actually think this is necessary
        locus_A = (['A'] * int(round(N*pA)) ) + (['a'] * (N - int(round(N*pA))) )  # Re-initialize initial allele lists.
        locus_B = (['B'] * int(round(N*pB)) ) + (['b'] * (N - int(round(N*pB))) ) 
        cline(locus_A,locus_B, steps, N, p, pops, alleles) # Run cline function. 'p' is read from the global scope
        global sim
        sim[s] = pops # Append results to global 'sim' dictionary
        


In [12]:
N = 10 # Starting population size (i.e. sample 10 alleles)
pA = 0.5 # Initial frequency of 'A' allele
pB = 0.5 # Initial frequency of 'B' allele
qA = 1 - pA # 'a' allele
qB = 1 - pB # 'b' allele
steps = 5 # Number of generations
sims = 2 # Number of simulations
pop_counter = 1 # Count of number of populations being created. Modified by 'create_population'. Used in naming populations
p = 1 # Probability of creating a population. 
sim = {} # Global dictionary used for storing results of each iteration
simulate(pA, pB, steps, N, sims) # Simulate function. 
sim

{0: OrderedDict([('1',
               [[10, 0, 0.3, 0.6, 0.5715999999999999],
                [10, 1, 0.5, 0.9, 0.2575],
                [10, 2, 0.4, 1.0, 0.36],
                [10, 3, 0.5, 1.0, 0.25],
                [10, 4, 0.5, 1.0, 0.25]]),
              ('2_1',
               [[10, 0, 0.5, 0.4, 0.52],
                [10, 1, 0.4, 0.3, 0.6735999999999999],
                [10, 2, 0.3, 0.5, 0.6175],
                [10, 3, 0.3, 0.2, 0.8164],
                [10, 4, 0.5, 0.5, 0.4375]]),
              ('3_1',
               [[10, 1, 0.5, 1.0, 0.25],
                [10, 2, 0.4, 1.0, 0.36],
                [10, 3, 0.4, 1.0, 0.36],
                [10, 4, 0.5, 1.0, 0.25]]),
              ('4_2_1',
               [[10, 1, 0.2, 0.5, 0.7300000000000001],
                [10, 2, 0.1, 0.4, 0.8783999999999998],
                [10, 3, 0.0, 0.4, 0.9999999999999999],
                [10, 4, 0.0, 0.6, 1.0]]),
              ('5_1',
               [[10, 2, 0.3, 1.0, 0.48999999999999994],
        

In [10]:
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y
