In [1]:
import pandas as pd
import numpy as np

from IPython.display import display, display_html

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go

from simulator import *
from simulator_plotting import *

init_notebook_mode(connected=True)

https://nbviewer.jupyter.org/url/connor.jp/UCLA/Simulator.ipynb

https://nbviewer.jupyter.org/url/jops.bol.ucla.edu/New-Dataset.ipynb

https://nbviewer.jupyter.org/url/jops.bol.ucla.edu/18-05-31/Notebook.ipynb

## Which trajectory does a population follow?

The **greedy path** can be determined by applying a simple algorithm to the fitness landscape. Start at the seed genotype and choose its fittest (highest growth rate) mutational neighbor. Continue choosing fittest mutational neighbors until the global optimum is reached, or until a mutational neighbor with higher fitness than the current genotype can no longer be found. In the latter case, a local optimum has been reached.
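The greedy walk described above can be sketched in a few lines. This is a minimal illustration, not the simulator's own implementation: the names `hamming_neighbors` and `greedy_path_sketch` are hypothetical, a "mutational neighbor" is assumed to be a genotype differing at exactly one locus, and the landscape is assumed to be a list indexed by the integer value of the binary genotype string.

```python
def hamming_neighbors(genotype):
    """Return all genotypes that differ from `genotype` at exactly one locus."""
    flip = {'0': '1', '1': '0'}
    return [genotype[:i] + flip[genotype[i]] + genotype[i + 1:]
            for i in range(len(genotype))]


def greedy_path_sketch(landscape, seed):
    """Follow the fittest neighbor until no neighbor beats the current genotype.

    Assumption: landscape[int(g, 2)] is the growth rate of genotype string g.
    """
    path = [seed]
    current = seed
    while True:
        best = max(hamming_neighbors(current), key=lambda g: landscape[int(g, 2)])
        if landscape[int(best, 2)] <= landscape[int(current, 2)]:
            return path  # current is a local (possibly global) optimum
        path.append(best)
        current = best


# Toy two-locus landscape for genotypes '00', '01', '10', '11' (made-up values).
toy_landscape = [1.0, 1.5, 1.2, 2.0]
print(greedy_path_sketch(toy_landscape, '00'))
```

Because each step must strictly increase fitness, the walk cannot revisit a genotype and always terminates. It cannot tell by itself whether the endpoint is the global optimum or only a local one.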

Through modifications to the simulator code, an "**actual path**" can be determined. During the mutation phase of the simulation, determine whether each mutant indicates the first appearance of a genotype. If a genotype is appearing for the first time, store the genotype it mutated from in an array. After the simulation has finished running, generate a path by starting from the dominant genotype and adding the genotype it mutated from, and then the genotype that genotype mutated from, and so on, until the seed is reached.
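The bookkeeping described above might look like the following sketch. It uses a dictionary rather than an array to hold the parent of each genotype, and the function names and the toy list of mutation events are hypothetical; the simulator's real data structures may differ.

```python
def record_first_appearance(first_parent, parent, mutant):
    """Store the parent of `mutant` only the first time `mutant` appears."""
    if mutant not in first_parent:
        first_parent[mutant] = parent


def backtrack_path(first_parent, seed, dominant):
    """Walk parent links from the dominant genotype back to the seed."""
    path = [dominant]
    while path[-1] != seed:
        path.append(first_parent[path[-1]])
    return path[::-1]  # reverse so the path reads seed -> dominant


# Hypothetical (parent, mutant) events in the order they occur in a run.
events = [('0000', '0010'), ('0000', '0001'), ('0010', '0110'),
          ('0001', '0011'), ('0110', '1110'), ('0011', '0110')]

first_parent = {}
for parent, mutant in events:
    record_first_appearance(first_parent, parent, mutant)

print(backtrack_path(first_parent, seed='0000', dominant='1110'))
```

A genotype can only be recorded after its parent already exists, so following the links always moves backward in time and reaches the seed.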

Often, during a simulation the population will simply follow the trajectory of the greedy path (case 1). Sometimes a path leading to a local optimum will out-compete the greedy path, and so the "actual path" represents the true evolutionary trajectory of the population (case 2). Other times the greedy path and the "actual path" will differ, but both lead to the global optimum (case 3). In this case it can be said that the true trajectory of the population consists of the greedy path and the "actual path" being followed in parallel. Since both paths are being followed, re-running the simulation multiple times will likely lead to some proportion of the resulting "actual paths" being equivalent to the greedy path.
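As a rough illustration, the three cases can be told apart by comparing the two paths and their endpoints. The helper `classify_paths` is hypothetical, and it assumes the greedy path ends at the global optimum; it does not check that itself.

```python
def classify_paths(greedy_path, actual_path):
    """Return 1, 2 or 3 according to the three cases described above.

    Assumption: the greedy path ends at the global optimum, so an actual
    path with a different endpoint is taken to have reached a local optimum.
    """
    if actual_path == greedy_path:
        return 1  # the population simply followed the greedy path
    if actual_path[-1] != greedy_path[-1]:
        return 2  # the actual path ended somewhere else
    return 3      # different routes to the same endpoint


print(classify_paths(['0000', '0010', '0110', '1110'],
                     ['0000', '0010', '0110', '1110']))
print(classify_paths(['0011', '0111', '0110', '1110'],
                     ['0011', '0001', '0000']))
```

Applying this to the paths returned by repeated runs gives the proportion of runs that fall into each case.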

In sum, plotting the greedy path and the "actual path" will likely suffice in getting a sense of the evolutionary trajectory of the population. This is now the default for the `plot_simulation()` function.

### Case 1: The greedy path and the "actual path" are the same

*P. vivax* treated with 53.60μM of pyrimethamine starting from seed 0000:

In [108]:
landscape = dataset1[2].loc['53.60μM'].tolist()
results = simulate(landscape)
plot_simulation(results)
print('Greedy path: ' + str(results['greedy_path']))
print('Actual path: ' + str(results['actual_path']))

Greedy path: ['0000', '0010', '0110', '1110']
Actual path: ['0000', '0010', '0110', '1110']


### Case 2: The "actual path" leads to a local optimum

*P. vivax* with no drug treatment starting from seed 0011:

In [54]:
landscape = dataset1[2].loc['No drug'].tolist()
results = simulate(landscape, seed='0011')
plot_simulation(results)
print('Greedy path: ' + str(results['greedy_path']))
print('Actual path: ' + str(results['actual_path']))

Greedy path: ['0011', '0111', '0110', '1110']
Actual path: ['0011', '0001', '0000']


### Case 3: The greedy path and the "actual path" are explored simultaneously

*P. falciparum* treated with 1μM of pyrimethamine starting from seed 0001:

In [72]:
landscape = dataset1[0].loc['1μM'].tolist()
results = simulate(landscape, seed='0001')
plot_simulation(results)
print('Greedy path: ' + str(results['greedy_path']))
print('Actual path: ' + str(results['actual_path']))

Greedy path: ['0001', '1001', '1011', '1010', '1110']
Actual path: ['1001', '0001', '0101', '0111', '0110', '1110']


## The effects of population size on switching

By default we have been running simulations with the carrying capacity (and initial population of the seed genotype) set to 10^9.

### AMC and AM

Let us consider switching between AMC and AM every 100 timesteps.

In [70]:
AMC = dataset2.loc['AMC'].tolist()
AM = dataset2.loc['AM'].tolist()

results = simulate([AMC, AM], frequency=100)
plot_simulation(results)
print('Time to fixation: ' + str(results['T_f']))

Time to fixation: 434


We get a time to fixation of 434 timesteps. Switching more often, every 50 timesteps, appears to be detrimental:

In [129]:
results = simulate([AMC, AM], frequency=50)
plot_simulation(results)
print('Time to fixation: ' + str(results['T_f']))

Time to fixation: 773


But if we reduce the carrying capacity by an order of magnitude, to 10^8 (with `prob_mutation` set to 10^-7), switching every 50 timesteps becomes viable:

In [139]:
results = simulate([AMC, AM], frequency=50, carrying_cap=int(1.0e8), prob_mutation=1.0e-7)
plot_simulation(results)
print('Time to fixation: ' + str(results['T_f']))

Time to fixation: 360


Reducing the carrying capacity further continues to yield lower times to fixation:

In [14]:
results = simulate([AMC, AM], frequency=50, carrying_cap=int(1.0e7), prob_mutation=1.0e-6)
plot_simulation(results)
print('Time to fixation: ' + str(results['T_f']))

Time to fixation: 259


In [26]:
results = simulate([AMC, AM], frequency=50, carrying_cap=int(1.0e6), prob_mutation=1.0e-5)
plot_simulation(results)
print('Time to fixation: ' + str(results['T_f']))

Time to fixation: 157
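In the three reduced-capacity calls above, `prob_mutation` is raised by the same factor by which `carrying_cap` is lowered, so their product is the same in every call. The check below only does that arithmetic. Reading the product as the expected number of new mutants per timestep at carrying capacity is an assumption: it holds only if `prob_mutation` is a per-individual, per-timestep probability, which the notebook does not state.

```python
import math

# (carrying_cap, prob_mutation) pairs copied from the three calls above.
settings = [(int(1.0e8), 1.0e-7), (int(1.0e7), 1.0e-6), (int(1.0e6), 1.0e-5)]

products = [cap * mu for cap, mu in settings]
for (cap, mu), product in zip(settings, products):
    print(f"carrying_cap={cap:.0e}  prob_mutation={mu:.0e}  product={product:.3g}")

# The product is the same (10, up to floating-point rounding) in every call.
assert all(math.isclose(p, 10.0) for p in products)
```

Holding this product fixed means the runs differ in population size but not, under the stated assumption, in the supply of new mutants per timestep.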


### FEP and CAZ

Another example is alternating between FEP and CAZ every 100 timesteps. The problem is that the time to fixation varies widely from run to run.

In [114]:
FEP = dataset2.loc['FEP'].tolist()
CAZ = dataset2.loc['CAZ'].tolist()
results = simulate([FEP, CAZ], frequency=100)
plot_simulation(results)
display(results['T_f'])

487

We can produce a histogram of the results of multiple simulations.

In [31]:
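# Collect the time to fixation from 100 independent runs.
x = []
for i in range(100):
    t = simulate([FEP, CAZ], frequency=100)['T_f']
    if t != -1:
        x.append(t)
    else:
        # A T_f of -1 is recorded as 0, so these runs fall in the first (0-99) bin.
        x.append(0)
# Bin the times in 100-timestep intervals from 0 to 1200.
trace = go.Histogram(x=x, xbins=dict(start=0, size=100, end=1200))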

layout = go.Layout(
    title="Time to fixation during 100 simulations"
)

fig = go.Figure(data=[trace], layout=layout)
iplot(fig)
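The loop above records a `T_f` of -1 as 0, which places those runs in the same bin as any run that fixes in under 100 timesteps. One alternative is to count them separately. The sketch below uses a hypothetical helper, `summarize_fixation_times`, and assumes that -1 means the run never reached fixation, which is inferred from the loop rather than stated in the notebook.

```python
import statistics


def summarize_fixation_times(times, no_fixation=-1):
    """Separate runs that fixed from runs flagged with `no_fixation`."""
    fixed = [t for t in times if t != no_fixation]
    return {
        'n_runs': len(times),
        'n_no_fixation': len(times) - len(fixed),
        'median_T_f': statistics.median(fixed) if fixed else None,
    }


# Made-up values standing in for the 'T_f' results of four runs.
print(summarize_fixation_times([434, 773, -1, 360]))
```

The list of fixed times can then be passed to `go.Histogram` on its own, with the count of non-fixating runs reported alongside the plot.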

Reducing the carrying capacity to 10^7 and switching every 50 timesteps yields improved results. Most of the simulations still produce a time to fixation between 400 and 499, but more runs now fall below that bin.

In [55]:
plot_simulation(simulate([FEP, CAZ], frequency=50, carrying_cap=int(1.0e7), prob_mutation=1.0e-6))

In [56]:
x = []
for i in range(100):
    t = simulate([FEP, CAZ], frequency=50, carrying_cap=int(1.0e7), prob_mutation=1.0e-6)['T_f']
    if t != -1:
        x.append(t)
    else:
        x.append(0)
trace = go.Histogram(x=x, xbins=dict(start=0, size=100, end=1200))

layout = go.Layout(
    title="Time to fixation during 100 simulations"
)

fig = go.Figure(data=[trace], layout=layout)
iplot(fig)

At a carrying capacity of 10^6, returning to switching every 100 timesteps actually yields the best results.

In [68]:
plot_simulation(simulate([FEP, CAZ], frequency=100, carrying_cap=int(1.0e6), prob_mutation=1.0e-5))

In [61]:
x = []
for i in range(100):
    t = simulate([FEP, CAZ], frequency=100, carrying_cap=int(1.0e6), prob_mutation=1.0e-5)['T_f']
    if t != -1:
        x.append(t)
    else:
        x.append(0)
trace = go.Histogram(x=x, xbins=dict(start=0, size=100, end=1200))

layout = go.Layout(
    title="Time to fixation during 100 simulations"
)

fig = go.Figure(data=[trace], layout=layout)
iplot(fig)

We have already established that reducing the carrying capacity tends to reduce the time to fixation. In simulations that involve switching, the time to fixation can be reduced further because smaller populations also tolerate more frequent switching. However, this needs to be explored on a case-by-case basis: sometimes more frequent switching is detrimental even at low carrying capacities, for reasons that are not yet clear.

## Very small population sizes

Even at low carrying capacities, the dominant genotype remains far more abundant than all other genotypes combined (remember that the y-axis is log scale). In the run below it makes up about 98% of the final population:

In [102]:
landscape = dataset1[2].loc['53.60μM'].tolist()
results = simulate(landscape, carrying_cap=int(1.0e4), prob_mutation=1.0e-3, timesteps=2000)
plot_simulation(results, genotype_strings(16))
abundances = [v[-1] for v in results['trace'].values()]
max(abundances) / sum(abundances)

0.9816018398160185