## Simulate non-cyanogenic evolution via drift alone

Across a cline from less to more urban sites, the frequency of a non-cyanogenic phenotype has been found to increase.
The phenotype is genetically controlled by two loci, each of which has a segregating knock-out allele.
Any individual that is homozygous for the knock-out allele at either locus is non-cyanogenic.
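
As a worked check of this genetic model: if we assume random mating (Hardy-Weinberg proportions) and independent loci, which are assumptions of this sketch rather than findings, the expected frequency of the non-cyanogenic phenotype is qA² + qB² − qA²qB², where qA and qB are the knock-out allele frequencies. The snippet below compares that closed form with a brute-force enumeration of two-locus genotypes. The function names are illustrative only.

```python
from itertools import product

def acyanogenic_closed_form(qA, qB):
    # P(aa or bb) by inclusion-exclusion, assuming independent loci
    return qA**2 + qB**2 - (qA**2) * (qB**2)

def acyanogenic_enumerated(qA, qB):
    # Genotype frequencies at one locus under Hardy-Weinberg, keyed by
    # the number of knock-out alleles carried (0, 1 or 2).
    def hw(q):
        p = 1.0 - q
        return {0: p * p, 1: 2 * p * q, 2: q * q}
    total = 0.0
    for (nA, fA), (nB, fB) in product(hw(qA).items(), hw(qB).items()):
        # Homozygous knock-out at either locus gives the acyanogenic phenotype
        if nA == 2 or nB == 2:
            total += fA * fB
    return total

for qA, qB in [(0.5, 0.5), (0.1, 0.9), (0.0, 1.0)]:
    diff = acyanogenic_closed_form(qA, qB) - acyanogenic_enumerated(qA, qB)
    assert abs(diff) < 1e-12
```

With qA = qB = 0.5, both functions give 0.4375.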

In this extremely simple simulation I create a 'population' that is represented by two lists of alleles (A/a and B/b).
To simulate evolution I randomly sample with replacement from the lists to create new lists that represent the next generation.

By repeating this process with variable starting frequencies, population sizes and numbers of generations (functionally equivalent to steps in a strict stepping-stone model), we can look at the change in the frequency of cyanogenic and non-cyanogenic phenotypes.

In [96]:
import random
from collections import OrderedDict
import csv
import time


# Randomly sample 'N' alleles (with replacement) from lists containing alleles for locus A (alleles: A or a) and locus B (alleles: B or b). 
# Return two lists containing the sampled alleles.
def sample_population(locus_A, locus_B, N):
    new_locus_A = [random.choice(locus_A) for _ in range(N)]
    new_locus_B = [random.choice(locus_B) for _ in range(N)]
    return new_locus_A, new_locus_B

# From list containing alleles, calculate the frequency of 'A' or 'B' allele. 
def allele_freq(locus):
    p = sum(1*i.isupper() for i in locus)/float(len(locus))
    return p

# Join list containing simulation results (e.g. Population size, allele frequencies, phenotype frequency, etc.) with 
# list containing iteration number (i.e. sim)
def final_results(results,sim):
    return [a + b for a,b in zip(results,sim)]


# Given the frequencies of 'A' and 'B' alleles, return the frequency of the 'acyanogenic' phenotype (i.e. recessive
# at either the A locus, B locus, or both) 
def phenotype(pA, pB):
    qA = 1-pA
    qB = 1-pB
    mut = qA**2 + qB**2 - (qA**2 * qB**2) # P(aa or bb), assuming random mating and independent loci
    return mut # Frequency of acyanogenic phenotype


# This function returns the proportion of all simulations that resulted in fixation of either phenotype. 'FixedU' 
# is the proportion fixed for acyanogenesis and 'FixedD' for cyanogenesis. 
def prop_fixed(big_dict,sims):
    fixedU = 0
    fixedD = 0
    for s in big_dict[len(big_dict)-1]: # Only look at final generation
        if s[2] == 1.0:  # Index of 2 required to access third item in step_dict list (i.e. phenotype frequency)
            fixedU += 1
        if s[2] == 0.0:
            fixedD += 1
    return fixedU/float(sims), fixedD/float(sims)

# Randomly samples N alleles from locus A and locus B, calculates the frequency of both alleles followed by the frequency
# of the 'acyanogenesis' phenotype, and appends a row with these values to the globally defined 'results' list. Repeats the
# process 'steps' times. Note that locus_A and locus_B are reassigned inside the for loop. Therefore with each generation
# (i.e. step), loci are sampled from those sampled in the previous generation. This is analogous to a stepping stone model. 
def cline(locus_A,locus_B, steps, N):
    for i in range(steps):
        #print i,
        locus_A, locus_B = (sample_population(locus_A,locus_B, N))
        pA, pB = allele_freq(locus_A), allele_freq(locus_B)
        #print pA, pB, phenotype(pA, pB)
        results.append([N,i,pA,pB,phenotype(pA,pB)])
    return results # results contains pop. size (N), step, (generation), allele frequencies and frequency of acyanogenic phenotype


# Same function as above but used for generating results in a way that allows me to more easily calculate the proportion of 
# simulations resulting in fixation of either phenotype (using "prop_fixed" function). 
# I'm sure there is a way to avoid using the same function twice but I haven't figured it out yet. 
def cline_fixed(locus_A,locus_B, steps, N):
    step_dict = OrderedDict() # Ordered dictionary that stores the results for each step (i.e. each generation)
    for i in range(steps):
        #print i,
        locus_A, locus_B = (sample_population(locus_A,locus_B, N))
        pA, pB = allele_freq(locus_A), allele_freq(locus_B)
        #print pA, pB, phenotype(pA, pB)
        step_dict[i] = pA, pB, phenotype(pA, pB)
    return step_dict # step_dict contains allele frequencies and frequency of acyanogenesis
        
# Using the functions defined above, 'simulate' performs 'sims' iterations of the cline function -- simulating the effects 
# of drift in a stepping stone model -- each time storing the results.
def simulate(pA, pB, steps, N, sims):
    qA = 1-pA # Frequency of 'a' allele
    qB = 1-pB
    # Make the two lists based on the allele frequency to represent the initial population
    locus_A = (['A'] * int(N*pA) ) + (['a'] * int(round(N*qA)) ) # [A,A,A,A,a,a,a,a,....]
    locus_B = (['B'] * int(N*pB) ) + (['b'] * int(round(N*qB)) ) 
    ####### sims simulations #####################
    # We will simulate 'steps' iterations of resampling this population to simulate drift
    # We will then repeat that simulation of 'steps' iterations 'sims' times to get a mean
    ##############################################
    big_dict = OrderedDict() # Ordered dictionary to store the results of 'sims' iterations of the cline function
    for s in range(steps): big_dict[s] = [] # Create as many empty lists as there are steps (i.e. generations)
    for i in range(sims):
        # Reset the population for each iteration. This is not strictly necessary, because sample_population never modifies its input lists
        locus_A = (['A'] * int(N*pA) ) + (['a'] * int(round(N*qA)) ) 
        locus_B = (['B'] * int(N*pB) ) + (['b'] * int(round(N*qB)) ) 
        step_dict = cline_fixed(locus_A,locus_B, steps, N) # Returns per-step results. Only used for calculating fixation
        cline(locus_A,locus_B, steps, N) # Separate, independent run; appends results to globally defined 'results' list
        for s in step_dict: big_dict[s].append(step_dict[s])
        for x in range(steps):sim.append([i])
    fixed.append(prop_fixed(big_dict,sims)) # Proportion of simulations resulting in fixation of either phenotype.
    #return big_dict, step_dict
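
The duplication between `cline` and `cline_fixed` noted above can be avoided by having one function return its per-step rows and deriving everything else from them. The following is a minimal sketch of that idea, not the code used to produce the results in this notebook; `run_cline`, `run_sims` and the optional `rng` argument are hypothetical names. It also tests fixation on the allele frequencies rather than on the phenotype frequency, because `1 + qB**2 - qB**2` can in principle differ from 1.0 by a floating-point rounding error.

```python
import random

def allele_freq(locus):
    # Frequency of the uppercase allele ('A' or 'B') in a list of alleles
    return sum(1 for a in locus if a.isupper()) / float(len(locus))

def phenotype(pA, pB):
    # Expected frequency of the acyanogenic phenotype (aa or bb)
    qA, qB = 1 - pA, 1 - pB
    return qA**2 + qB**2 - (qA**2 * qB**2)

def run_cline(locus_A, locus_B, steps, N, rng=random):
    # One drift trajectory: returns a list of (step, pA, pB, phen) rows
    rows = []
    for step in range(steps):
        locus_A = [rng.choice(locus_A) for _ in range(N)]
        locus_B = [rng.choice(locus_B) for _ in range(N)]
        pA, pB = allele_freq(locus_A), allele_freq(locus_B)
        rows.append((step, pA, pB, phenotype(pA, pB)))
    return rows

def run_sims(pA, pB, steps, N, sims, rng=random):
    # Same starting-population construction as simulate() above
    locus_A = ['A'] * int(N * pA) + ['a'] * int(round(N * (1 - pA)))
    locus_B = ['B'] * int(N * pB) + ['b'] * int(round(N * (1 - pB)))
    results, fixed_up, fixed_down = [], 0, 0
    for sim in range(sims):
        rows = run_cline(locus_A, locus_B, steps, N, rng)
        # One output row per step, with the simulation number appended
        results.extend([N, step, a, b, phen, sim] for step, a, b, phen in rows)
        _, last_pA, last_pB, _ = rows[-1]
        if last_pA == 0.0 or last_pB == 0.0:     # 'a' or 'b' fixed: all acyanogenic
            fixed_up += 1
        elif last_pA == 1.0 and last_pB == 1.0:  # 'A' and 'B' fixed: all cyanogenic
            fixed_down += 1
    return results, (fixed_up / float(sims), fixed_down / float(sims))

# Example call; passing a seeded random.Random makes a run repeatable
results, (up, down) = run_sims(0.5, 0.5, steps=50, N=10, sims=10,
                               rng=random.Random(1))
```

Because each trajectory is generated once, the rows written to the results file and the fixation proportions describe the same simulations.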

In [97]:
# To look at how the effect of drift in generating clines in acyanogenesis changes with population size (N), we will perform
# simulations using the function above at varying population sizes. We will vary population size as follows:
# Population size (N): (Start, End, By) -- (10,100,10);(100,500,100);(500,1000,500);(1000,5000,1000);(5000,10000,5000)
# Note: the loop below stops after N = 5000, so the last range is not simulated.
# 'A' and 'B' alleles held constant at 0.5. 
# 'Steps' held constant at 50
# 'Sims' held constant at 10000 (set to 10 in the run shown here)
# All results exported as 'csv' files for import and analysis in R

start_time = time.time()

N = 10 # Starting population size (i.e. sample 10 alleles)
pA = 0.5
pB = 0.5
steps = 50
sims = 10
results = [['N','step','pA','pB','Phen']] # Results stored here. Header added.
sim = [['sim']] # Iteration of cline function (i.e. 'sim') stored here. Will later be appended to 'results'
fixed = [['Up','Down']] # Proportion of simulations resulting in fixation stored here
for i in range(100):
    if N >=10 and N < 100: # When N between 10 and 100, run simulation, increment population size by 10
        simulate(pA,pB,steps,N,sims) 
        N += 10
    elif N >= 100 and N < 500:
        simulate(pA,pB,steps,N,sims)
        N += 100
    elif N >= 500 and N < 1000:
        simulate(pA,pB,steps,N,sims)
        N += 500
    elif N >= 1000 and N <= 5000:
        simulate(pA,pB,steps,N,sims)
        N += 1000
    elif N > 5000:
        final = final_results(results,sim) # Store final results with 'sim' appended to the rest of 'results'
        with open("20160825_SEC_Drift_Nvary.results.csv", "wb") as f:
            writer = csv.writer(f)
            writer.writerows(final)
        with open("20160825_SEC_Drift_Nvary.fixed.csv", "wb") as f:
            writer = csv.writer(f)
            writer.writerows(fixed)
        break

print "My program took", time.time() - start_time, "seconds to run"

My program took 37.0164730549 seconds to run
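
The branching inside the loop above only serves to step N through a fixed schedule. As a sketch, the same sequence of population sizes can be written out as an explicit list, which makes it easier to see which values are actually simulated. `population_sizes` and `visited` are names introduced here for illustration.

```python
# Population sizes visited by the loop above: 10..90 by 10, 100..400 by 100,
# 500, then 1000..5000 by 1000.
population_sizes = (list(range(10, 100, 10)) +
                    list(range(100, 500, 100)) +
                    [500] +
                    list(range(1000, 5001, 1000)))

# Reproduce the original if/elif stepping to check that both agree.
visited = []
N = 10
while N <= 5000:
    visited.append(N)
    if N < 100:
        N += 10
    elif N < 500:
        N += 100
    elif N < 1000:
        N += 500
    else:
        N += 1000

assert visited == population_sizes
```

With such a list, the driver reduces to `for N in population_sizes: simulate(pA, pB, steps, N, sims)`, followed by the two CSV writes.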


In [1]:
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y


In [4]:
N = 10
pA = 0.5
pB = 0.5
qA = 0.5
qB = 0.5
steps = 50
locus_A = (['A'] * int(N*pA) ) + (['a'] * int(round(N*qA)) ) # [A,A,A,A,a,a,a,a,....]
locus_B = (['B'] * int(N*pB) ) + (['b'] * int(round(N*qB)) ) 
cline(locus_A,locus_B, steps, N)

[[10, 0, 0.0, 1.0, 1.0],
 [10, 1, 0.0, 1.0, 1.0],
 [10, 2, 0.0, 1.0, 1.0],
 [10, 3, 0.0, 1.0, 1.0],
 [10, 4, 0.0, 1.0, 1.0],
 [10, 5, 0.0, 1.0, 1.0],
 [10, 6, 0.0, 1.0, 1.0],
 [10, 7, 0.0, 1.0, 1.0],
 [10, 8, 0.0, 1.0, 1.0],
 [10, 9, 0.0, 1.0, 1.0],
 [10, 10, 0.0, 1.0, 1.0],
 [10, 11, 0.0, 1.0, 1.0],
 [10, 12, 0.0, 1.0, 1.0],
 [10, 13, 0.0, 1.0, 1.0],
 [10, 14, 0.0, 1.0, 1.0],
 [10, 15, 0.0, 1.0, 1.0],
 [10, 16, 0.0, 1.0, 1.0],
 [10, 17, 0.0, 1.0, 1.0],
 [10, 18, 0.0, 1.0, 1.0],
 [10, 19, 0.0, 1.0, 1.0],
 [10, 20, 0.0, 1.0, 1.0],
 [10, 21, 0.0, 1.0, 1.0],
 [10, 22, 0.0, 1.0, 1.0],
 [10, 23, 0.0, 1.0, 1.0],
 [10, 24, 0.0, 1.0, 1.0],
 [10, 25, 0.0, 1.0, 1.0],
 [10, 26, 0.0, 1.0, 1.0],
 [10, 27, 0.0, 1.0, 1.0],
 [10, 28, 0.0, 1.0, 1.0],
 [10, 29, 0.0, 1.0, 1.0],
 [10, 30, 0.0, 1.0, 1.0],
 [10, 31, 0.0, 1.0, 1.0],
 [10, 32, 0.0, 1.0, 1.0],
 [10, 33, 0.0, 1.0, 1.0],
 [10, 34, 0.0, 1.0, 1.0],
 [10, 35, 0.0, 1.0, 1.0],
 [10, 36, 0.0, 1.0, 1.0],
 [10, 37, 0.0, 1.0, 1.0],
 [10, 38, 0.0, 1.0, 1.

In [24]:
results = []
fixed = []
sim = [] # simulate() also appends to the globally defined 'sim' list
simulate(pA, pB, steps, N, 10)