## Simulate non-cyanogenic evolution via drift alone

Across a cline from less to more urban habitats, the frequency of a non-cyanogenic phenotype has been found to increase.
The phenotype is genetically controlled by two loci, each of which has a segregating knock-out allele.
Any individual homozygous for either knock-out is non-cyanogenic.

In this extremely simple simulation I create a 'population' represented by two lists of alleles (A/a and B/b).
To simulate evolution, I randomly sample with replacement from these lists to create new lists that represent the next generation.

By repeating this process with varying starting frequencies, population sizes and numbers of generations (functionally equivalent to steps in a strict stepping-stone model), we can look at the change in the frequency of cyanogenic and non-cyanogenic phenotypes.
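A minimal sketch of the single-population procedure described above, written in Python 3 rather than the Python 2 used in the notebook cells. It assumes Hardy-Weinberg proportions within each generation, so the non-cyanogenic frequency can be computed from the two allele frequencies alone. The function names ('drift_step', 'freq_upper', 'acyanogenic_freq', 'run_drift') are illustrative, not part of the original script.

```python
import random

def drift_step(locus, n, rng):
    # One generation of drift: draw n alleles with replacement from the current list
    return [rng.choice(locus) for _ in range(n)]

def freq_upper(locus):
    # Frequency of the functional (upper-case) allele, e.g. 'A' or 'B'
    return sum(1 for allele in locus if allele.isupper()) / len(locus)

def acyanogenic_freq(pA, pB):
    # P(aa or bb), assuming Hardy-Weinberg proportions and independent loci
    qA, qB = 1 - pA, 1 - pB
    return qA**2 + qB**2 - qA**2 * qB**2

def run_drift(pA, pB, n, generations, seed=None):
    rng = random.Random(seed)
    # Initial allele lists, e.g. [A, A, ..., a, a, ...]
    locus_A = ['A'] * round(n * pA) + ['a'] * (n - round(n * pA))
    locus_B = ['B'] * round(n * pB) + ['b'] * (n - round(n * pB))
    trajectory = []
    for gen in range(generations):
        locus_A = drift_step(locus_A, n, rng)
        locus_B = drift_step(locus_B, n, rng)
        pa, pb = freq_upper(locus_A), freq_upper(locus_B)
        trajectory.append((gen, pa, pb, acyanogenic_freq(pa, pb)))
    return trajectory

traj = run_drift(0.5, 0.5, n=100, generations=10, seed=1)
for gen, pa, pb, phen in traj:
    print(gen, round(pa, 3), round(pb, 3), round(phen, 3))
```

Repeating the run with different seeds, starting frequencies and population sizes gives the spread of phenotype trajectories expected under drift alone.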

## Extension to previous simulations — Spatial structure

To build off of the previous single-population stepping stone model, I have added a few functions that allow us to add spatial structure to our simulations, effectively simulating the trajectory of multiple populations simultaneously. Once migration is added, this will be analogous to a metapopulation.

#### Major changes to script

1. Since there will be multiple populations, they are now stored as entries in a dictionary. Each key in the dictionary corresponds to a population and the value is a list containing the allele and phenotype frequencies, updated each generation. 
2. The alleles that correspond to each population are stored in a separate dictionary with keys matching those from the population dictionary. As such, allele frequencies for populations are calculated by sampling the allele lists in the 'alleles' dictionary with the key matching the population.
3. Every generation, each population has some probability (p) of generating a new population with N alleles sampled from the population that created it. This is not the most realistic scenario but can easily be modified later. 
4. Simulations are now stored separately in a dictionary called 'sim' where each key corresponds to a simulation. 
5. These changes allow us to model clines in two ways:
    - Clines across 'time': looking at phenotype frequencies **within** populations
    - Clines across 'space': looking at phenotype frequencies **across** created populations
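A sketch of the dictionary layout described in the list above, assuming constant population sizes, sequential string IDs ('1', '2', ...) and a founding size 'n_found' for new populations. 'one_generation' and 'acyanogenic_from_lists' are hypothetical names, not functions from the script.

```python
import random

def acyanogenic_from_lists(locus_A, locus_B):
    # Frequency of aa or bb individuals, assuming Hardy-Weinberg proportions and independent loci
    qA = sum(a.islower() for a in locus_A) / len(locus_A)
    qB = sum(b.islower() for b in locus_B) / len(locus_B)
    return qA**2 + qB**2 - qA**2 * qB**2

def one_generation(pops, alleles, p, n_found, rng):
    # Iterate over a snapshot so populations founded now are first updated next generation
    for pid in list(alleles):
        A, B = alleles[pid]['A'], alleles[pid]['B']
        # Drift: resample each locus with replacement, keeping the population size constant
        alleles[pid]['A'] = [rng.choice(A) for _ in range(len(A))]
        alleles[pid]['B'] = [rng.choice(B) for _ in range(len(B))]
        # With probability p, found a new population from n_found alleles of this one
        if rng.random() < p:
            new_id = str(len(alleles) + 1)
            alleles[new_id] = {'A': [rng.choice(alleles[pid]['A']) for _ in range(n_found)],
                               'B': [rng.choice(alleles[pid]['B']) for _ in range(n_found)]}
            pops[new_id] = []
    # Record the phenotype frequency of every population, including newly founded ones
    for pid, loci in alleles.items():
        pops[pid].append(acyanogenic_from_lists(loci['A'], loci['B']))

rng = random.Random(42)
pops = {'1': []}
alleles = {'1': {'A': ['A'] * 50 + ['a'] * 50, 'B': ['B'] * 50 + ['b'] * 50}}
for _ in range(3):
    one_generation(pops, alleles, p=1.0, n_found=20, rng=rng)
print(sorted(pops, key=int))
```

With p = 1.0 every population founds a new one each generation, so the number of populations doubles (1, 2, 4, 8). Each entry in 'pops' is then a 'cline across time' for that population, and comparing entries across populations at the same generation gives a 'cline across space'.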

## Extension to previous simulations — Logistic population growth

To build off of the simulations that included spatial structure, I have incorporated logistic population growth into the simulations. 

**Major changes to script**

1. Newly created populations are founded through a bottleneck from the population that created them. The number of alleles sampled is a fraction (bot) of the source population's size, rounded up. 
2. Every generation, every population grows according to an approximate (discrete) logistic growth curve. The per capita growth rate R = r - d*N declines linearly with the current population size N, where the constant d = (r - 1)/K. At N = K the growth rate equals 1 and the population stops growing, so the carrying capacity is set by the choice of this constant. 
3. The size of every population is stored as an integer in a list (key 'S') within that population's entry in the alleles dictionary. This value is retrieved as needed by the functions that require population size as an argument (e.g. 'bottle', 'pop_growth').
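The growth rule above amounts to N(t+1) = ceil(N(t) * (r - d * N(t))) with d = (r - 1)/K. A standalone sketch of that rule ('logistic_step' and 'trajectory' are illustrative names). With r = 2.0, K = 1000 and a starting size of 100 it should reproduce the 'Pop_size' column in the output table further down.

```python
import math

def logistic_step(size, r, K):
    # Per capita growth rate falls linearly with size: R = r - d*size with d = (r - 1)/K,
    # so R == 1 (no change) when size == K
    d = (r - 1) / K
    R = r - d * size
    return int(math.ceil(R * size))  # round up to a whole number of alleles, as the script does

def trajectory(n0, r, K, generations):
    # Population size in each generation, starting from n0
    sizes = [n0]
    for _ in range(generations - 1):
        sizes.append(logistic_step(sizes[-1], r, K))
    return sizes

print(trajectory(100, 2.0, 1000, 10))
# [100, 190, 344, 570, 816, 967, 999, 1000, 1000, 1000]
```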

## Extension to previous simulations — Migration

To the simulations that already included spatial structure and logistic population growth, I have added migration.

**Major changes to script**

1. Every generation, every population exchanges alleles with every other population at a rate that declines with the distance between them. Migration rate declines linearly with distance and is determined by the 'migration_rate' function. 
2. Distances between populations are calculated as the difference between the lengths of the population IDs (keys in the alleles and pops dictionaries), since these IDs track population history. As such, a larger difference = greater distance = lower migration rate. 
3. Distances and migration rates are stored in a dictionary ('Dis'), which is re-created at the start of every generation since more populations may have been generated. The keys in this dictionary are concatenated strings formed from every pairwise combination of existing populations. This makes it easy to look up the correct migration rate between any pair of populations.
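A sketch of a linear distance-decay rule for migration rates: the rate equals 'y_int' at distance 0 and falls to 0 at the largest distance present. 'migration_rates' and the distances in the example are hypothetical; unlike the script, which appends each rate to the matching entry of its distance dictionary, this version returns a new dictionary.

```python
def migration_rates(distances, y_int):
    # Slope of the linear decline: the rate reaches 0 at the largest distance present
    max_dis = max(distances.values())
    slope = y_int / max_dis
    # Rate = y_int - slope * distance, rounded to 2 decimals as in the script
    return {key: round(y_int - slope * d, 2) for key, d in distances.items()}

# Hypothetical distances keyed by concatenated population IDs
distances = {'1.1': 0.0, '1.2': 1.0, '1.3': 1.41, '2.3': 12.73}
print(migration_rates(distances, y_int=0.3))
```

On the 10 x 10 grid used later in the notebook, the largest distance is about 12.73, so with 'y_int' = 0.3 adjacent cells get a rate of 0.28, the value printed from 'Distance_Dic' further down.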


## Extension to previous simulations — Explicit spatial structure

Building on the previous simulations, I have made the spatial structure among populations explicit.

**Major changes to script**

1. Simulations are now initialized by generating an m x n matrix with 1 unit between adjacent cells on the x and y axes. Distances along diagonals are calculated using the Pythagorean theorem. All distances and migration rates are calculated before simulations are run and are stored in the global 'Distance_Dic' dictionary.
2. The first population is placed in a random cell of the matrix. New populations can only be created in adjacent cells (including diagonals).
3. Populations are named sequentially based on order of creation (i.e. 1, 2, 3, etc.). Naming in matrix corresponds to naming in dictionaries containing population information (i.e. allele frequencies) and allele lists.
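A sketch of the grid set-up for an m x n matrix: all pairwise Euclidean distances (which covers the row, column and diagonal cases at once) keyed in the same '(i, j).(k, l)' string format as 'Distance_Dic', plus the list of vacant neighbouring cells (including diagonals) where a new population could be placed. 'grid_distances' and 'vacant_neighbours' are illustrative names.

```python
import itertools
import math
import numpy as np

def grid_distances(n_rows, n_cols):
    # Euclidean distance between every ordered pair of cells, rounded to 2 decimals
    cells = list(itertools.product(range(n_rows), range(n_cols)))
    return {'{0}.{1}'.format(a, b): round(math.hypot(a[0] - b[0], a[1] - b[1]), 2)
            for a in cells for b in cells}

def vacant_neighbours(matrix, row, col):
    # Up to 8 adjacent cells that lie inside the grid and hold no population (value 0)
    n_rows, n_cols = matrix.shape
    return [(r, c)
            for r in range(row - 1, row + 2)
            for c in range(col - 1, col + 2)
            if (r, c) != (row, col) and 0 <= r < n_rows and 0 <= c < n_cols and matrix[r, c] == 0]

matrix = np.zeros((3, 4), dtype=int)
matrix[0, 0] = 1                          # first population placed in a corner
dist = grid_distances(*matrix.shape)
print(len(dist), dist['(0, 0).(1, 1)'])   # 144 1.41
print(vacant_neighbours(matrix, 0, 0))    # [(0, 1), (1, 0), (1, 1)]
```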

In [1]:
# Modules used throughout script
import random
from collections import OrderedDict
import csv
import time
from datetime import datetime
import os
import itertools
import math
import numpy as np
import pandas as pd
from numpy.random import choice

# Randomly sample 'N' alleles from lists containing alleles for locus A 
def sample_population_A(locus_A, N):
    '''
    ## PARAMETERS ##
    
    locus_A: List containing alleles 'a' and 'A'
    
    N: Number of alleles to sample from locus_A (i.e. population size)
    
    ## USED IN FUNCTIONS ##
    
    1. migrate_A
    2. migrate_B
    3. create_population_2
    4. cline
    '''  
    new_locus_A = [random.choice(locus_A) for _ in range(N)]
    return new_locus_A

# Randomly sample 'N' alleles from lists containing alleles for locus B
def sample_population_B(locus_B, N):
    '''
    ## PARAMETERS ##
    
    locus_B: List containing alleles 'b' and 'B'
    
    N: Number of alleles to sample from locus_B (i.e. population size)
    
    ## USED IN FUNCTIONS ##
    
    1. migrate_A
    2. migrate_B
    3. create_population_2
    4. cline
    '''  
    new_locus_B = [random.choice(locus_B) for _ in range(N)]
    return new_locus_B

# Function used to determine migration rates based on distances between all populations. 
# Linear decrease in migration rate with increasing distance. Rate of decrease is set by the slope ('m'),
# which depends on the maximum distance between populations and the desired 'y_int' (i.e. migration rate when distance = 0)
def migration_rate(Distance_Dic, y_int):
    '''
    ## PARAMETERS ##
    
    Distance_Dic: Dictionary containing distances between all pairwise combinations of cells in 'Matrix'. 
    Will also contain migration rates after use of this function
    
    y_int: Desired migration rate when distance = 0
        
    ## USED IN FUNCTIONS ##
    
    1. Distance_Mig
    '''  
    max_dis = max(Distance_Dic.values()) # Max distance between populations
    m = (y_int - 0)/(max_dis[0] - 0) # Slope. Assumes close to no migration at max distance. Realized migration at max distance may be slightly greater than 0 due to rounding. 
    for Dkey, Dvalue in Distance_Dic.items():
        Mig_prop = y_int - m*Dvalue[0] 
        Distance_Dic[Dkey].append(round(Mig_prop,2)) # Append migration rate to 'Distance_Dic' dictionary
        
# Function to calculate distances between all pairwise combinations of cells in matrix. Also calls
# the migration rate between populations and appends this to the distance dictionary. 
def Distance_Mig(Matrix, y_int):
    '''
    ## PARAMETERS ##
    
    Matrix: m x n dimensional matrix, initialized at the outset of each simulation. 
    Contains empty cells that may become filled with populations. 
    
    y_int: Desired migration rate when distance = 0
        
    ## USED IN FUNCTIONS ##
    
    1. Simulate
    ''' 
    rows = range(Matrix.shape[0]) # Row indices
    cols = range(Matrix.shape[1]) # Column indices; shape[1] so that non-square (m x n) matrices work
    matelem = [(i,j) for i in rows for j in cols]
    dis = [[i,j] for i in matelem for j in matelem]
    for i in dis:
        if i[0][0] == i[1][0]: # Within rows
            dis1 = abs(i[1][1] - i[0][1])
            i.append(dis1)
        elif i[0][1] == i[1][1]: # Within columns
            dis2 = abs(i[1][0] - i[0][0])
            i.append(dis2)    
        else: # Diagonals
            dis3 = (((i[1][1] - i[0][1])**2) + ((i[1][0] - i[0][0])**2))**(0.5)
            i.append(dis3)
    global Distance_Dic
    Distance_Dic = {'{0}.{1}'.format(key1,key2):[round(key3,2)] for key1,key2,key3 in dis}
    migration_rate(Distance_Dic, y_int)
    
# Migration function for locus A. Every generation, every population exchanges alleles with every
# other at a rate that declines with increasing distance between populations. For each (FOCAL) population, 
# this function cycles through each other (SOURCE) population and determines how many alleles
# will migrate from the SOURCE population to the FOCAL population. This is determined by the product
# of the migration rate and the population size of the SOURCE population. Two new lists are then 
# created: one containing a random sample of current alleles from the FOCAL population that will be kept 
# (FOCAL population size minus the number of alleles being migrated) and one containing a random sample of alleles from the 
# SOURCE populations (equal to the number of alleles being migrated). These two lists are then combined,
# forming the new allele list of the FOCAL population. This process is repeated for every combination of
# populations.
def migrate_A(Akey, Avalue, Distance_Dic, alleles, pop_list, Matrix):
    '''
    ## PARAMETERS ##
    
    Akey: Index used to cycle through keys in alleles dictionary. 
    Note keys correspond to populations. See 'cline' function.  
    
    Avalue: Index used to cycle through values in alleles dictionary. See 'cline' function. 
    
    Distance_Dic: Dictionary containing distances and migration rates between all pairwise
    combinations of cells in 'Matrix'
    
    alleles: Dictionary used to store allele lists and population size for each population. 
    Updated every generation. 
    
    pop_list: List containing all populations currently involved in simulations.
    Created at the start of every generation. See 'cline' function.
    
    Matrix: m x n dimensional matrix, initialized at the outset of each simulation. Contains empty cells that may become filled with populations. 
            
    ## USED IN FUNCTIONS ##
    
    1. Cline
    ''' 
    size = Avalue['S'][0] # Current (post-growth) size of the focal population
    if len(pop_list) > 1: # Migration only occurs if there is more than one population
        loc = np.where(Matrix == int(Akey))
        to_pop = (int(loc[0][0]), int(loc[1][0])) # Location of focal population in matrix. Cast to int so str() gives '(i, j)' with any NumPy version
        Mig_Alleles = [] # Potential migratory alleles pooled from all source populations
        for i in pop_list: # For every population in existence
            if Akey == i: # Do not migrate within populations
                continue
            loc = np.where(Matrix == int(i))
            from_pop = (int(loc[0][0]), int(loc[1][0])) # Location of source population in matrix
            con = str(to_pop) + '.' + str(from_pop) # Key into 'Distance_Dic' for this focal/source pair
            mig_ind = int(math.ceil(Distance_Dic[con][1]*alleles[i]['S'][0])) # Migrants from this source: migration rate x source size. NOTE ROUNDING
            Mig_Alleles.extend(random.choice(alleles[i]['A']) for _ in range(mig_ind))
        n_mig = min(len(Mig_Alleles), size) # Total migrants across ALL sources, capped at the focal population size
        locus_A_keep = sample_population_A(Avalue['A'], size - n_mig) # Random sample of resident alleles that are kept
        locus_A_rep = [random.choice(Mig_Alleles) for _ in range(n_mig)] # Replacement alleles drawn from the pooled migrants
        Avalue['Am'] = locus_A_keep + locus_A_rep # Combine kept and replaced alleles into single list
        return Avalue['Am']
    else:
        # If there is only one population, resample the existing alleles (i.e. drift only)
        Avalue['Am'] = sample_population_A(Avalue['A'], size)
        return Avalue['Am']
    
# Migration function for locus B. See annotations above. 
def migrate_B(Akey, Avalue, Distance_Dic, alleles, pop_list, Matrix):
    '''
    ## PARAMETERS ##
    
    Akey: Index used to cycle through keys in alleles dictionary. 
    Note keys correspond to populations. See 'cline' function.  
    
    Avalue: Index used to cycle through values in alleles dictionary. See 'cline' function. 
    
    Distance_Dic: Dictionary containing distances and migration rates between all pairwise
    combinations of cells in 'Matrix'
    
    alleles: Dictionary used to store allele lists and population size for each population. 
    Updated every generation. 
    
    pop_list: List containing all populations currently involved in simulations.
    Created at the start of every generation. See 'cline' function.
    
    Matrix: m x n dimensional matrix, initialized at the outset of each simulation. 
    Contains empty cells that may become filled with populations. 
            
    ## USED IN FUNCTIONS ##
    
    1. Cline
    ''' 
    size = Avalue['S'][0] # Current (post-growth) size of the focal population
    if len(pop_list) > 1: # Migration only occurs if there is more than one population
        loc = np.where(Matrix == int(Akey))
        to_pop = (int(loc[0][0]), int(loc[1][0])) # Location of focal population in matrix. Cast to int so str() gives '(i, j)' with any NumPy version
        Mig_Alleles = [] # Potential migratory alleles pooled from all source populations
        for i in pop_list: # For every population in existence
            if Akey == i: # Do not migrate within populations
                continue
            loc = np.where(Matrix == int(i))
            from_pop = (int(loc[0][0]), int(loc[1][0])) # Location of source population in matrix
            con = str(to_pop) + '.' + str(from_pop) # Key into 'Distance_Dic' for this focal/source pair
            mig_ind = int(math.ceil(Distance_Dic[con][1]*alleles[i]['S'][0])) # Migrants from this source: migration rate x source size. NOTE ROUNDING
            Mig_Alleles.extend(random.choice(alleles[i]['B']) for _ in range(mig_ind))
        n_mig = min(len(Mig_Alleles), size) # Total migrants across ALL sources, capped at the focal population size
        locus_B_keep = sample_population_B(Avalue['B'], size - n_mig) # Random sample of resident alleles that are kept
        locus_B_rep = [random.choice(Mig_Alleles) for _ in range(n_mig)] # Replacement alleles drawn from the pooled migrants
        Avalue['Bm'] = locus_B_keep + locus_B_rep # Combine kept and replaced alleles into single list
        return Avalue['Bm']
    else:
        # If there is only one population, resample the existing alleles (i.e. drift only)
        Avalue['Bm'] = sample_population_B(Avalue['B'], size)
        return Avalue['Bm']

# From list containing alleles, calculate the frequency of 'A' or 'B' allele. 
def allele_freq(locus):
    '''
    ## PARAMETERS ##
    
    locus: List containing alleles
    
    ## USED IN FUNCTIONS ##
    
    1. cline
    '''
    p = sum(1*i.isupper() for i in locus)/float(len(locus))
    return p

# Function for logistic population growth. Takes current population size (from alleles dictionary) 
# as input and return new population size. 
def pop_growth(r, Akey, Avalue, alleles, K):
    '''
    ## PARAMETERS ##
    
    r: Maximum per capita population growth rate
    
    Akey: Index used to cycle through keys in alleles dictionary. 
    Note keys correspond to populations. See 'cline' function.  
    
    Avalue: Index used to cycle through values in alleles dictionary. See 'cline' function. 
    
    alleles: Dictionary used to store allele lists and population size for each population. 
    Updated every generation.
    
    K: Carrying capacity (i.e. maximum sustainable population size)
    
    ## USED IN FUNCTIONS ##
    
    1. cline
    '''
    size = Avalue['S'][0] # Retrieve size of population. 'Akey' allows indexing of alleles dictionary in cline function
    d = (r - 1)/K # Calculates the proportional reduction of population growth rate based on desired carrying capacity ('K'). At 'K', growth rate = 1 = no change
    R = r - (d*size) # Calculate modified growth rate. 
    new_size = [int(math.ceil(R*size))] # Calculate new population size
    return new_size # Returns new population size

# Simple bottleneck function. 
def bottle(bot, Akey, Avalue):
    '''
    ## PARAMETERS ##
    
    bot: Desired bottleneck proportion
    
    Akey: Index used to cycle through keys in alleles dictionary. 
    Note keys correspond to populations. See 'cline' function.  
    
    Avalue: Index used to cycle through values in alleles dictionary. See 'cline' function. 
    
    ## USED IN FUNCTIONS ##
    
    1. create_population_2
    '''
    return int(math.ceil(bot*Avalue['S'][0]))

# Create new population as empty list and add to 'pops' dictionary. Also create four lists of alleles
# sampled from pool of alleles from population that generated the new one. Alleles added to 'alleles'
# dictionary. Also adds population to matrix. This function first evaluates whether a population will 
# be created then randomly selects a vacant neighboring cell where population will go. If no cells are
# vacant, the function passes.
def create_population_2(p, Akey, Avalue, pops, alleles, bot, Matrix): 
    '''
    ## PARAMETERS ##
    
    p: Desired probability of creating a new population
    
    Akey: Index used to cycle through keys in alleles dictionary. 
    Note keys correspond to populations. See 'cline' function.  
    
    Avalue: Index used to cycle through values in alleles dictionary. See 'cline' function. 
    
    pops: Dictionary containing information (e.g. allele frequencies) of each population
    Updated at the end of every generation.
    
    alleles: Dictionary used to store allele lists and population size for each population. 
    Updated every generation.
    
    bot: Desired bottleneck proportion. 
    
    Matrix: m x n dimensional matrix, initialized at the outset of each simulation. 
    Contains empty cells that may become filled with populations. 
    
    ## USED IN FUNCTIONS ##
    
    1. cline
    '''
    if not alleles['1']['A']: # No alleles yet for the first population: only true in the first generation, before populations are initialized
        return
    if random.random() >= p: # Create a new population with probability 'p' (works for any p, not only multiples of 0.1)
        return
    x, y = np.where(Matrix == int(Akey))[0][0], np.where(Matrix == int(Akey))[1][0] # Location of the parent population
    X, Y = Matrix.shape[0] - 1, Matrix.shape[1] - 1 # Last valid row and column index (supports m x n matrices)
    # List containing all neighbouring cells (including diagonals) that lie inside the matrix
    Nlist = [(x2, y2) for x2 in range(x-1, x+2)
             for y2 in range(y-1, y+2)
             if (x != x2 or y != y2) and 0 <= x2 <= X and 0 <= y2 <= Y]
    # Reduced list containing only neighbouring cells that lack a population
    Nlist_red = [(i, j) for i, j in Nlist if Matrix[i, j] == 0]
    # If all neighbouring cells are occupied, no population is created
    if not Nlist_red:
        return
    # Otherwise, select an empty neighbouring cell at random and place the new population there
    i, j = random.choice(Nlist_red)
    global pop_counter # Global variable that tracks the number of populations created
    pop_counter += 1
    n_new = bottle(bot, Akey, Avalue) # Founding (bottleneck) size
    new_A = sample_population_A(Avalue['A'], n_new)
    new_B = sample_population_B(Avalue['B'], n_new)
    # Add allele lists and size for the new population to 'alleles'. Naming: sequential population number
    alleles['{0}'.format(pop_counter)] = {'A': new_A, 'B': new_B, 'S': [n_new], 'Am': list(new_A), 'Bm': list(new_B)}
    pops['{0}'.format(pop_counter)] = [] # Empty list for new population. Naming same as alleles.
    Matrix[i, j] = pop_counter
        
# Given the frequencies of 'A' and 'B' alleles, return the frequency of the 'acyanogenic' phenotype (i.e. recessive
# at either the A locus, B locus, or both) 
def phenotype(pA, pB):
    '''
    ## PARAMETERS ##
    
    pA: Frequency of 'A' allele
    
    pB: Frequency of 'B' allele
    
    ## USED IN FUNCTIONS ##
    
    1. cline
    '''
    qA = 1-pA
    qB = 1-pB
    mut= qA**2 + qB**2 - (qA**2 * qB**2)
    WT = 1-mut
    return mut # Frequency of acyanogenic phenotype

# Cline function. Every generation, alleles are exchanged among populations. Populations follow
# logistic population growth. Every generation, every population has some probability (p) of generating a new population, with alleles
# sampled from the population that created it.
def cline(locus_A, locus_B, steps, N, p, pops, alleles, bot, Matrix, K):
    '''
    ## PARAMETERS ##
    
    locus_A: List containing alleles 'a' and 'A'
    
    locus_B: List containing alleles 'b' and 'B'
    
    steps: Number of generations  
    
    N: Number of alleles to sample (i.e. population size). In this case, starting population size.
    
    p: Desired probability of creating a new population
    
    pops: Dictionary containing information (e.g. allele frequencies) of each population
    Updated at the end of every generation.
    
    alleles: Dictionary used to store allele lists and population size for each population. 
    Updated every generation.
    
    bot: Desired bottleneck proportion. 
    
    Matrix: m x n dimensional matrix, initialized at the outset of each simulation. 
    Contains empty cells that may become filled with populations. 
    
    K: Desired carrying capacity. 
    
    ## USED IN FUNCTIONS ##
    
    1. simulate
    '''
    for i in range(steps):
        pop_list = list(pops.keys()) # Snapshot of the populations existing at the start of this generation
        for Akey, Avalue in list(alleles.items()): # Iterate over a snapshot; new populations are added during the loop
            if Akey in pops:
                if 'A' in Avalue and 'B' in Avalue:
                    if not Avalue['A'] and not Avalue['B']: 
                        # If allele lists are empty, sample from list of initial allele frequencies. Only used for first generation
                        Avalue['S'] = [N]
                        Avalue['Am'] = sample_population_A(locus_A, N)
                        Avalue['Bm'] = sample_population_B(locus_B, N)
                    else:
                        # Otherwise grow the population (uses the global growth rate 'r'), then migrate/resample alleles
                        Avalue['S'] = pop_growth(r, Akey, Avalue, alleles, K)
                        Avalue['Am'] = migrate_A(Akey, Avalue, Distance_Dic, alleles, pop_list, Matrix)
                        Avalue['Bm'] = migrate_B(Akey, Avalue, Distance_Dic, alleles, pop_list, Matrix)
                create_population_2(p, Akey, Avalue, pops, alleles, bot, Matrix) # May create a new population with alleles sampled from this one
        for Akey, Avalue in alleles.items():
            Avalue['A'] = Avalue['Am']
            Avalue['B'] = Avalue['Bm']
            # Calculate allele and phenotype frequencies for every population, including newly created ones
            pA = allele_freq(Avalue['A']) 
            pB = allele_freq(Avalue['B'])
            pops[Akey].append([Avalue['S'][0], i, pA, pB, phenotype(pA, pB)])
    #return Matrix, pops

        
# Using the functions defined above, 'simulate' performs 'sims' iterations of the cline function -- simulating 
# the combined effects of drift and migration in a spatially explicit framework  -- each time storing the results.
def simulate(pA, pB, steps, N, sims):
    '''
    ## PARAMETERS ##
    
    pA: Initial frequency of 'A' alleles.
    
    pB: Initial frequency of 'B' alleles.
    
    steps: Number of generations  
    
    N: Number of alleles to sample (i.e. population size). In this case, starting population size.
    
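    sims: Number of iterations. 
    
    Note: also relies on the globals y_int, p, bot, K and r (the last via 'cline'), and stores results in the global 'sim'.
    '''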
    qA = 1-pA # Frequency of 'a' allele
    qB = 1-pB
    # Make the two lists based on the allele frequency to represent the initial population
    locus_A = (['A'] * int(N*pA) ) + (['a'] * int(round(N*qA)) ) # [A,A,A,A,a,a,a,a,....]
    locus_B = (['B'] * int(N*pB) ) + (['b'] * int(round(N*qB)) ) 
    ####### sims simulations #####################
    # Each simulation runs 'steps' generations of growth, migration, drift and population creation
    # The whole simulation is repeated 'sims' times so that results can be averaged
    ##############################################
    for s in range(sims):
        pops = OrderedDict({'1':[]}) # Re-initialize dictionary to store populations
        alleles = OrderedDict({'1':{'A':[],'B':[],'S':[],'Am':[],'Bm':[]}}) # Re-initialize dictionary to store allele lists
        Matrix = np.zeros((5,5), dtype = 'int')
        Distance_Mig(Matrix, y_int)
        i,j = random.randint(0, len(Matrix) - 1), random.randint(0, len(Matrix) - 1)
        Matrix[i,j] = 1
        global pop_counter # Reset population counter
        pop_counter = 1
        # reset the population for each iteration. I don't actually think this is necessary
        locus_A = (['A'] * int(N*pA) ) + (['a'] * int(round(N*qA)) )  # Re-initialize initial allele lists.
        locus_B = (['B'] * int(N*pB) ) + (['b'] * int(round(N*qB)) ) 
        cline(locus_A,locus_B, steps, N, p, pops, alleles, bot, Matrix, K) # Run cline function
        global sim
        sim[s] = pops # Append results to global 'sim' dictionary
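The comment above 'migrate_A' describes pooling migrants from all SOURCE populations before replacing part of the FOCAL population. A standalone sketch of that step for a single locus, assuming the total number of migrants is capped at the focal population size. 'migrate_locus' and its argument layout are hypothetical; replacements are drawn with replacement from the pool, as in the script.

```python
import math
import random

def migrate_locus(focal, sources, rng):
    """One migration step for a single locus (sketch).

    focal:   list of alleles in the focal population; its size is kept constant
    sources: list of (source_alleles, rate) pairs, one per other population
    """
    pool = []
    for source_alleles, rate in sources:
        # Migrants from this source: rate x source size, rounded up as in the script
        n_mig = int(math.ceil(rate * len(source_alleles)))
        pool.extend(rng.choice(source_alleles) for _ in range(n_mig))
    # Total migrants over ALL sources (assumption: capped at the focal population size)
    n_total = min(len(pool), len(focal))
    kept = [rng.choice(focal) for _ in range(len(focal) - n_total)]
    incoming = [rng.choice(pool) for _ in range(n_total)]
    return kept + incoming

rng = random.Random(0)
result = migrate_locus(['a'] * 10, [(['A'] * 10, 0.3)], rng)
print(result.count('A'), len(result))  # 3 10
```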
        


In [2]:
y_int = 0.3 # Migration rate when distance = 0
K = 1000 # Carrying capacity
N = 100 # Starting population size (i.e. sample 100 alleles)
pA = 0.5 # Initial frequency of 'A' allele
pB = 0.5 # Initial frequency of 'B' allele
qA = 1 - pA # 'a' allele
qB = 1 - pB # 'b' allele
steps = 10 # Number of generations
sims = 2 # Number of simulations
bot = 0.1 # Bottleneck proportion
pop_counter = 1 # Count of number of populations being created. Modified by 'create_population'. Used in naming populations
p = 1 # Probability of creating a population. 
r = float(2) # Growth rate
sim = {} # Global dictionary used for storing results of each iteration
simulate(pA, pB, steps, N, sims) # Simulate function. 
sim # Output: Sim: Pop: [N, Gen, pA, pB, Phen]

{0: OrderedDict([('1',
               [[100, 0, 0.49, 0.47, 0.46793791],
                [190,
                 1,
                 0.5368421052631579,
                 0.4789473684210526,
                 0.42777108524336066],
                [344,
                 2,
                 0.5668604651162791,
                 0.4941860465116279,
                 0.39545805147358315],
                [570,
                 3,
                 0.5403508771929825,
                 0.47368421052631576,
                 0.42976005402045725],
                [816,
                 4,
                 0.5416666666666666,
                 0.4571078431372549,
                 0.4428871731984428],
                [967,
                 5,
                 0.5584281282316442,
                 0.45191313340227507,
                 0.4368113749937109],
                [999,
                 6,
                 0.5505505505505506,
                 0.4544544544544545,
                 0.43950409069682],


In [3]:
DataFrame = []
Colnames = ["Sim","Population","Pop_size","Generation","pA","pB","Phen"]
for i in sim.keys():
    for j, x in sim[i].items():
        for z in x:
            DataFrame.append([i, j, z[0], z[1], z[2], z[3], z[4]])

In [4]:
Test = pd.DataFrame(DataFrame, columns = Colnames)

In [5]:
Test

   Sim  Population  Pop_size  Generation        pA        pB      Phen
0    0           1       100           0  0.490000  0.470000  0.467938
1    0           1       190           1  0.536842  0.478947  0.427771
2    0           1       344           2  0.566860  0.494186  0.395458
3    0           1       570           3  0.540351  0.473684  0.429760
4    0           1       816           4  0.541667  0.457108  0.442887
5    0           1       967           5  0.558428  0.451913  0.436811
6    0           1       999           6  0.550551  0.454454  0.439504
7    0           1      1000           7  0.543000  0.454000  0.444704
8    0           1      1000           8  0.548000  0.454000  0.441514
9    0           1      1000           9  0.546000  0.479000  0.421609
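A sketch of how the long-format results table could be summarised with pandas: mean phenotype frequency per generation (change over time) and per population within each simulation (differences across space). The small table below is made-up example data with the same column names as 'Test', not simulation output.

```python
import pandas as pd

def mean_by_generation(df):
    # Average phenotype frequency per generation across all simulations and populations
    return df.groupby('Generation')['Phen'].mean()

def mean_by_population(df):
    # Average phenotype frequency per population within each simulation
    return df.groupby(['Sim', 'Population'])['Phen'].mean()

# Hypothetical example rows (Phen computed from pA and pB under Hardy-Weinberg proportions)
toy = pd.DataFrame(
    [[0, '1', 100, 0, 0.5, 0.5, 0.4375],
     [0, '1', 190, 1, 0.6, 0.5, 0.37],
     [0, '2', 19, 1, 0.4, 0.5, 0.52],
     [1, '1', 100, 0, 0.5, 0.5, 0.4375]],
    columns=["Sim", "Population", "Pop_size", "Generation", "pA", "pB", "Phen"])
print(mean_by_generation(toy))
print(mean_by_population(toy))
```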


In [55]:
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y


In [2]:
pops = OrderedDict({'1':[]}) # Re-initialize dictionary to store populations
alleles = OrderedDict({'1':{'A':[],'B':[],'S':[10],'Am':[],'Bm':[]}}) # Re-initialize dictionary to store allele lists
Matrix = np.zeros((10,10), dtype = 'int')
y_int = 0.3
Distance_Mig(Matrix, y_int)
i,j = random.randint(0, len(Matrix) - 1), random.randint(0, len(Matrix) - 1)
Matrix[i,j] = 1
K = 10
N = 10 # Starting population size (i.e. sample 10 alleles)
pA = 0.5 # Initial frequency of 'A' allele
pB = 0.5 # Initial frequency of 'B' allele
qA = 1 - pA # 'a' allele
qB = 1 - pB # 'b' allele
steps = 10 # Number of generations
bot = 1
pop_counter = 1 # Count of number of populations being created. Modified by 'create_population'. Used in naming populations
p = 0.3 # Probability of creating a population. 
locus_A = (['A'] * int(N*pA) ) + (['a'] * int(round(N*qA)) ) # [A,A,A,A,a,a,a,a,....]
locus_B = (['B'] * int(N*pB) ) + (['b'] * int(round(N*qB)) )
r = 2.0

In [3]:
cline(locus_A, locus_B, steps, N, p, pops, alleles, bot, Matrix, K)

In [4]:
Matrix

array([[0, 0, 0, 0, 3, 0, 0, 0, 0, 0],
       [0, 0, 0, 2, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [5]:
for Akey, Avalue in alleles.items():
    print(Avalue['A'], allele_freq(Avalue['A']))

['A', 'A', 'A', 'A', 'A', 'A', 'A', 'a', 'A', 'a'] 0.8
['A', 'a', 'A', 'A', 'A', 'A', 'A', 'A', 'a', 'A'] 0.8
['a', 'a', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'] 0.8


In [9]:
Distance_Dic['(1, 4).(1, 3)'][1] * allele_freq(alleles['2']['A'])

0.22400000000000003

In [38]:
Distance_Dic['(4, 1).(4, 0)'][1]

0.28

In [174]:
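# Exploratory cell: accumulates expected numbers of 'A' alleles in each population after migration (compare with migrate_A)
# NOTE: 'meanF' is not defined anywhere in this notebook and must be assigned before this cell will run
pop_list = list(pops.keys())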
Back_Mig = {}
SumF = {}
Mig_Alleles = {}
Ns = {}
for Akey, Avalue in alleles.items():
    to_pop = (np.where(Matrix == int(Akey))[0][0], np.where(Matrix == int(Akey))[1][0]) # Location of focal population in matrix. 
    SumF[Akey] = []
    Back_Mig[Akey] = []
    Mig_Alleles[Akey] = []
    Ns[Akey] = []
    for i in pop_list: # For every population in existence
        if Akey == i: # Do not migrate within populations
            pass
        else:
            from_pop = (np.where(Matrix == int(i))[0][0], np.where(Matrix == int(i))[1][0]) # Location of source population in matrix
            con = str(to_pop) + '.' + str(from_pop) # Create concatenated string from current focal population and source population
            SumF[Akey].append(Distance_Dic[con][1] * alleles[i]['S'][0])
            SumF[Akey] = [sum(SumF[Akey])]
            Back_Mig[Akey] = (Distance_Dic[con][1] * alleles[i]['S'][0])/((alleles[Akey]['S'][0]*(1 - meanF)) + SumF[Akey][0])
            Ns[Akey].append(alleles[i]['S'][0])
            Mig_Alleles[Akey].append(Distance_Dic[con][1] * alleles[i]['S'][0] * allele_freq(alleles[i]['A']))
    Mig_Alleles[Akey].append((1 - Back_Mig[Akey]) * alleles[Akey]['S'][0] * allele_freq(alleles[Akey]['A']))
    Ns[Akey].append(alleles[Akey]['S'][0]) 
    Ns[Akey] = [sum(Ns[Akey])]
    Mig_Alleles[Akey] = [sum(Mig_Alleles[Akey])]
            #mig_ind = int(math.ceil(Distance_Dic[con][1]*alleles[i]['S'][0])) # Calculate the number of alleles to migrate from migration rate and population size. Determined from source population. NOTE ROUNDING
            #Mig_Alleles.append([random.choice(alleles[i]['A']) for _ in range(mig_ind)])         
    #locus_A_keep = (sample_population_A(Avalue['A'], Avalue['S'][0] - mig_ind)) # Create list of alleles in focal population to be kept. Random sample of existing alleles. Complement of number of migratory individuals
    #Mig_Alleles = [val for sublist in Mig_Alleles for val in sublist] # Flatten'Mig_Alleles' list
    #locus_A_rep = [random.choice(Mig_Alleles) for _ in range(mig_ind)] # Create list of replacement alleles by sampling alleles from list containing all migratory alleles from source populations. 
    #Avalue['Am'] = locus_A_keep + locus_A_rep # Combine kept and replaced alleles into single list. 
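The cell above weights each source j by a backward migration rate m_j N_j / (N_i (1 - meanF) + Σ_k m_k N_k), where m_j is the `Distance_Dic` rate and N the population sizes. Below is a self-contained sketch of the same arithmetic, assuming (my reading, not stated in the notebook) that the retained-resident share is 1 minus the summed backward rates, so that resident and source weights add up to 1 and the expected post-migration 'A' frequency is a weighted mean. Function names and the numbers in the usage line are hypothetical.

```python
def backward_rates(N_focal, sources, mean_rate):
    """Per-source backward migration rates into one focal population.

    sources   : list of (rate, N_source) pairs (forward rate from that source)
    mean_rate : average forward rate, used as the fraction of residents lost
    """
    immigrants = [rate * N for rate, N in sources]       # expected immigrants per source
    denom = N_focal * (1 - mean_rate) + sum(immigrants)  # staying residents + all immigrants
    return [m / denom for m in immigrants]


def freq_after_migration(p_focal, N_focal, sources, mean_rate):
    """Expected 'A' frequency after one round of migration.

    sources: list of (rate, N_source, p_source) triples
    """
    b = backward_rates(N_focal, [(r, N) for r, N, _ in sources], mean_rate)
    resident_share = 1 - sum(b)  # equals N_focal * (1 - mean_rate) / denom
    return resident_share * p_focal + sum(bj * p for bj, (_, _, p) in zip(b, sources))


# Hypothetical example: focal N = 100 at p = 0.3, two sources
p_new = freq_after_migration(0.3, 100, [(0.2, 100, 0.6), (0.15, 150, 0.9)], mean_rate=0.17)
```

Because the weights sum to 1, the result always lies between the smallest and largest source or focal frequency.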


In [239]:
((1 - 0.209)*(0.3)) + (0.2 * 0.6) + (0.15 * 0.9)

0.4923

In [237]:
(100 * (0.21 + 0.208))/(100 + 150 + 210)

0.0908695652173913

In [205]:
(0.2 * 100)/((100 * (1 - 0.17)) + 61.5)

0.2179930795847751

In [91]:
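values = [] # Forward migration rate for every ordered pair of populations in Distance_Dic
for key, value in Distance_Dic.items():
    values.append(value[1])
meanF = sum(values)/len(values) # Mean migration rate; used in the backward migration rates of the In [174] cell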

In [28]:
for Akey, Avalue in alleles.items():
    print Avalue['Am'],allele_freq(Avalue['Am'])

['a', 'a', 'a', 'A', 'a', 'a', 'a', 'a', 'a', 'A'] 0.2
['A', 'a', 'A', 'a', 'a', 'a', 'A', 'a', 'a', 'a'] 0.3
['a', 'A', 'A', 'A', 'a', 'a', 'A', 'A', 'a', 'a'] 0.5
['A', 'a', 'A', 'A', 'a', 'a', 'A', 'a', 'a', 'a'] 0.4
['a', 'A', 'A', 'A', 'A', 'A', 'a', 'a', 'a', 'A'] 0.6
['a', 'a', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'a'] 0.7
['A', 'A', 'A', 'a', 'a', 'A', 'a', 'a', 'a', 'a'] 0.4
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'] 0.0
['a', 'a', 'A', 'A', 'a', 'a', 'A', 'a', 'a', 'a'] 0.3
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'A', 'a', 'A'] 0.2
['A', 'a', 'a', 'a', 'a', 'A', 'A', 'a', 'A', 'a'] 0.4
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'a', 'a', 'a'] 0.7
['A', 'A', 'a', 'a', 'a', 'a', 'A', 'a', 'A', 'a'] 0.4
['a', 'a', 'A', 'A', 'A', 'a', 'a', 'A', 'a', 'a'] 0.4
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'A'] 0.1


In [130]:
Mig_Alleles

['a', 'A', 'a', 'A', 'A', 'A', 'a', 'A', 'A', 'A', 'a', 'a', 'a', 'a', 'A']

In [12]:
from numpy.random import choice # Weighted sampling with replacement
list_of_candidates = ['A','a']
number_of_items_to_pick = 100
probability_distribution= [0.6, 0.4]
draw = choice(list_of_candidates, number_of_items_to_pick, p=probability_distribution)
print draw

['A' 'a' 'a' 'A' 'a' 'A' 'A' 'A' 'A' 'A' 'A' 'a' 'A' 'a' 'a' 'a' 'a' 'A'
 'A' 'A' 'A' 'a' 'A' 'A' 'a' 'A' 'a' 'a' 'a' 'A' 'a' 'A' 'A' 'A' 'a' 'a'
 'A' 'a' 'A' 'A' 'A' 'a' 'A' 'A' 'A' 'a' 'A' 'A' 'a' 'a' 'A' 'a' 'a' 'A'
 'A' 'a' 'a' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'a' 'a' 'a' 'A' 'A' 'A'
 'A' 'A' 'A' 'a' 'A' 'A' 'a' 'a' 'a' 'a' 'A' 'A' 'A' 'a' 'a' 'a' 'a' 'A'
 'a' 'a' 'A' 'A' 'A' 'a' 'A' 'A' 'A' 'A']
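Drawing alleles with probabilities equal to the current allele frequencies is the same as sampling with replacement from the allele list, i.e. one generation of Wright-Fisher drift. Below is a sketch using NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy `numpy.random.choice` function used above; the helper names are hypothetical. The number of 'A' alleles drawn is Binomial(N, p), so the binomial shortcut gives the same distribution of counts without building the list.

```python
import numpy as np


def next_generation(p, N, rng):
    """N alleles for the next generation, each 'A' with probability p."""
    return list(rng.choice(['A', 'a'], size=N, p=[p, 1 - p]))


def next_count(p, N, rng):
    """Equivalent shortcut: the count of 'A' alleles is Binomial(N, p)."""
    return int(rng.binomial(N, p))


rng = np.random.default_rng(1)
draw = next_generation(0.6, 100, rng)
n_A = next_count(0.6, 100, rng)
```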


In [8]:
for i, v in alleles.items():
    print v['A']

['a', 'A', 'A', 'A', 'A', 'A', 'a', 'A', 'A', 'A']
['A', 'a', 'A', 'A', 'A', 'a', 'A', 'a', 'A', 'a']
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
['a', 'a', 'a', 'A', 'a', 'A', 'a', 'a', 'A', 'A']
['a', 'a', 'a', 'A', 'a', 'A', 'A', 'A', 'a', 'A']
['A', 'a', 'A', 'A', 'a', 'a', 'A', 'A', 'a', 'A']
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
['A', 'A', 'a', 'A', 'A', 'a', 'A', 'A', 'A', 'A']
['A', 'A', 'A', 'A', 'a', 'A', 'a', 'A', 'A', 'A']
['a', 'A', 'A', 'A', 'a', 'a', 'a', 'A', 'a', 'A']
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
['A', 'a', 'A', 'A', 'A', 'A', 'a', 'A', 'A', 'A']
['A', 'A', 'A', 'a', 'A', 'A', 'A', 'a', 'A', 'A']
['A', 'A', 'A', 'A', 'a', 'A', 'A', 'A', 'a', 'A']
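Seven of the eighteen lists printed above are already fixed for 'A', none has lost it, and the other eleven are still segregating. A small hypothetical helper that tallies this across all populations, e.g. `fixation_summary([v['A'] for v in alleles.values()])`, is sketched below; skipping empty lists is an assumption so that a population with no alleles is not counted as fixed.

```python
def fixation_summary(allele_lists, allele='A'):
    """Count populations fixed for, lost, or still segregating for an allele."""
    summary = {'fixed': 0, 'lost': 0, 'segregating': 0}
    for alist in allele_lists:
        if not alist:  # skip empty populations
            continue
        n = alist.count(allele)
        if n == len(alist):
            summary['fixed'] += 1
        elif n == 0:
            summary['lost'] += 1
        else:
            summary['segregating'] += 1
    return summary


toy = [['A', 'A'], ['a', 'a'], ['A', 'a'], []]
result = fixation_summary(toy)
```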


In [None]:
import math
import random
import numpy as np

def migrate_A(Akey, Avalue, Distance_Dic, alleles, pop_list, Matrix):
    # Relies on sample_population_A(allele_list, n), defined earlier in the notebook
    N_focal = Avalue['S'][0] # Current size of the focal population
    if len(pop_list) > 1: # Migration only occurs if there is more than one population
        to_pop = (np.where(Matrix == int(Akey))[0][0], np.where(Matrix == int(Akey))[1][0]) # Location of focal population in matrix
        Mig_Alleles = [] # Potential migratory alleles pooled from all source populations
        total_mig = 0 # Number of migrants summed over all source populations
        for i in pop_list: # For every population in existence
            if i == Akey: # Do not migrate within populations
                continue
            from_pop = (np.where(Matrix == int(i))[0][0], np.where(Matrix == int(i))[1][0]) # Location of source population in matrix
            con = str(to_pop) + '.' + str(from_pop) # Distance_Dic key: focal population, then source population
            mig_ind = int(math.ceil(Distance_Dic[con][1]*alleles[i]['S'][0])) # Alleles migrating from this source, from migration rate and source size. NOTE ROUNDING
            Mig_Alleles.extend(random.choice(alleles[i]['A']) for _ in range(mig_ind))
            total_mig += mig_ind
        total_mig = min(total_mig, N_focal) # Cannot replace more alleles than the focal population holds
        locus_A_keep = sample_population_A(Avalue['A'], N_focal - total_mig) # Residents kept: random sample of existing alleles, complement of the migrants
        locus_A_rep = [random.choice(Mig_Alleles) for _ in range(total_mig)] # Replacement alleles sampled from the pooled migratory alleles
        Avalue['Am'] = locus_A_keep + locus_A_rep # Combine kept and replaced alleles into a single list
    else:
        # With only one population there is no migration, so the existing alleles are simply resampled (i.e. drift)
        Avalue['Am'] = sample_population_A(Avalue['A'], N_focal)
    return Avalue['Am']
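To check the replace-with-migrants step in isolation, here is a self-contained toy version with migration rates passed in directly instead of looked up in `Distance_Dic`. `sample_alleles` stands in for `sample_population_A`, whose definition is not shown in this section (assumed here: n draws with replacement). Pooling migrants from every source and capping the number replaced at the focal size is an assumption about the intended behaviour; with no sources the step reduces to pure drift.

```python
import math
import random


def sample_alleles(allele_list, n):
    """Stand-in for sample_population_A: n draws with replacement."""
    return [random.choice(allele_list) for _ in range(n)]


def migrate(focal, sources):
    """Replace part of a focal allele list with migrant alleles.

    focal   : list of alleles in the focal population
    sources : list of (rate, allele_list) pairs, one per source population
    """
    pool = []
    for rate, src in sources:
        n_mig = int(math.ceil(rate * len(src)))  # migrants from this source, rounded up
        pool.extend(random.choice(src) for _ in range(n_mig))
    n_rep = min(len(pool), len(focal))             # never replace more than the focal size
    kept = sample_alleles(focal, len(focal) - n_rep)  # residents that stay, resampled (drift)
    replaced = [random.choice(pool) for _ in range(n_rep)]
    return kept + replaced


random.seed(0)
out = migrate(['a'] * 10, [(0.5, ['A'] * 10)])  # 5 residents kept, 5 migrants added
```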