### Optimizing promoter strengths for PhiX-174
This notebook optimizes the strength of each of the three promoter sites in PhiX-174.

#### Main Idea
1. Define a unique normal distribution (**~N(u,o)**) for each of the three promoters (pA, pB, pD). Let the promoter strength be **e^u** and the step size be the standard deviation **o**.

2. Define a normal distribution (**~N(0.5,?)**) to describe **lambda**. Here, **lambda** represents the value by which the step size will be scaled.

3. Randomly select a promoter to optimize (pA, pB, or pD). Feed the promoter values into pinetree and calculate the RMSE. If the new error (**NE**) is less than the old error (**OE**), keep the promoter value. If not, increment the promoter value by the step size (add **e^o**) and repeat with a randomly rescaled **o** (scaled by **lambda**) until **NE** < **OE**. *Stop after 5 iterations and move on to the next promoter value.*
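
The three steps above can be sketched without pinetree. This is only a rough illustration under stated assumptions: `toy_error` and `TARGET` are hypothetical stand-ins for a pinetree run plus the RMSE calculation, the search works on the log-strengths **u** directly, and the accept/reject rule is one possible reading of step 3 (it allows steps in both directions), not necessarily the rule the notebook ends up using.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target log-strengths; a stand-in for the experimental data.
TARGET = {"pA": 12.0, "pB": 17.0, "pD": 14.0}

def toy_error(u):
    """Stand-in for run_pt + get_error: RMSE between u and TARGET."""
    diffs = [u[k] - TARGET[k] for k in TARGET]
    return float(np.sqrt(np.mean(np.square(diffs))))

def optimize(u, o, n_rounds=200, max_tries=5):
    """Pick one promoter per round and try up to max_tries scaled steps on it."""
    u = dict(u)                      # work on a copy
    best = toy_error(u)
    for _ in range(n_rounds):
        name = str(rng.choice(list(u)))
        for _ in range(max_tries):
            lam = rng.normal(0.5, 0.2)            # lambda scales the step size
            trial = dict(u)
            trial[name] = u[name] + lam * rng.normal(0.0, o[name])
            err = toy_error(trial)
            if err < best:                        # NE < OE: keep the new value
                u, best = trial, err
                break
    return u, best

start = {"pA": 12.21, "pB": 17.7, "pD": 14.5}     # u values used later in the notebook
sd = {"pA": 2.0, "pB": 3.0, "pD": 2.0}            # o values used later in the notebook
u_opt, best_err = optimize(start, sd)
strengths = {k: float(np.exp(v)) for k, v in u_opt.items()}  # promoter strength = e^u
```

Because a trial value is only kept when it lowers the error, `best_err` can never exceed the starting error; how quickly it shrinks depends on the random draws.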

#### Key Attributes
1. 

#### Terminology/Naming Conventions

##### Import Packages & Set Directories

In [None]:
import pandas as pd
import numpy as np
import pinetree as pt
from sklearn.metrics import mean_squared_error 
base_dir = "/Users/tanviingle/Documents/Wilke/phix174/"

##### Run Pinetree
This function runs the pinetree simulator given parameters

STEPS: 
1. Feed in promoter values
2. Run pinetree simulator
3. Write the output file as gen_#.tsv --> stored as base_dir/output/opt_test/gen_#.tsv

In [None]:
def run_pt(gen, pA, pB, pD): # gen labels the output file; pA, pB, pD are the promoter strengths

    print(gen, pA)
    print("Defining PhiX-174 genome")
    
    # Create host cell & genome
    CELL_VOLUME = 1.1e-15 # from T7
    model = pt.Model(cell_volume=CELL_VOLUME)
    phage = pt.Genome(name="phix_174", length=5386)
    
    # Read genomic coordinates from csv into dataframe
    genomic_coords = pd.read_csv(base_dir + "output/" + "genomic_coords.csv")
    print(genomic_coords.at[0, "type"])
    
    
    # Add genomic elements (loop through the dataframe above); hardcode necessary strengths according to the preoptimized model
    ## one element is added per row of genomic_coords
    n = 0
    while(n < len(genomic_coords)):
        
        if genomic_coords.at[n, "type"] == "gene": 
            phage.add_gene(name= genomic_coords.at[n, "type"] + "_" + genomic_coords.at[n, "name"], 
                           start= genomic_coords.at[n, "new_start"], 
                           stop= genomic_coords.at[n, "new_end"],
                           rbs_start=genomic_coords.at[n, "new_start"], 
                           rbs_stop=genomic_coords.at[n, "new_start"] + 15, rbs_strength=1e7) 

        elif genomic_coords.at[n, "type"] == "promoter" and genomic_coords.at[n, "name"] == "A":
            phage.add_promoter(name= genomic_coords.at[n, "type"] + "_" + genomic_coords.at[n, "name"], 
                               start= genomic_coords.at[n, "new_start"], 
                               stop= genomic_coords.at[n, "new_end"],
                               interactions={"ecolipol": pA})

        elif genomic_coords.at[n, "type"] == "promoter" and genomic_coords.at[n, "name"] == "B1":
            phage.add_promoter(name= genomic_coords.at[n, "type"] + "_" + genomic_coords.at[n, "name"], 
                               start= genomic_coords.at[n, "new_start"], 
                               stop= genomic_coords.at[n, "new_end"],
                               interactions={"ecolipol": pB})

        elif genomic_coords.at[n, "type"] == "promoter" and genomic_coords.at[n, "name"] == "D":
            phage.add_promoter(name= genomic_coords.at[n, "type"] + "_" + genomic_coords.at[n, "name"], 
                               start= genomic_coords.at[n, "new_start"], 
                               stop= genomic_coords.at[n, "new_end"],
                               interactions={"ecolipol": pD})

        else:
            print("ignoring pB2")

        n = n+1
    
    print("all genes and promoters added")
    
    # Add terminators manually 
    phage.add_terminator(name="terminator_J", start=2402, stop=2403, # Right before gene F start=2404, stop=3687,
                       efficiency={"ecolipol": 0.7}) # 0.7
    phage.add_terminator(name="terminator_F", start=3796, stop=3797, # Right before gene G start=3798, stop=4325
                     efficiency={"ecolipol": 0.8}) # 0.8
    phage.add_terminator(name="terminator_G", start=4332, stop=4333, # Right before gene H start=4334, stop=5320
                     efficiency={"ecolipol": 0.6}) # 0.6
    phage.add_terminator(name="terminator_H", start=5321, stop=5322, # Right after gene H
                     efficiency={"ecolipol": 0.3}) # 0.3

    print("all terminators added")
    
    # Register genome after promoters/terminators are added
    model.register_genome(phage)
    print("genome is registered")

    # Define interactions
    print("Defining Polymerases & Interactions")
    # Add polymerases & species
    model.add_polymerase(name="ecolipol", speed=35, footprint=35, copy_number=0)
    model.add_species("bound_ecolipol", 1800)  # initialization
    model.add_species("ecoli_genome", 0)
    model.add_species("ecoli_transcript", 0)
    model.add_reaction(1e6, ["ecolipol", "ecoli_genome"], ["bound_ecolipol"]) # 1e7
    model.add_reaction(0.04, ["bound_ecolipol"], ["ecolipol", "ecoli_genome", "ecoli_transcript"])
    model.add_ribosome(10, 30, 100)
    model.add_species("bound_ribosome", 100)
    model.seed(34)
    
    # Run simulation
    print("running simulation")
    model.simulate(time_limit=1200, time_step=5, output= base_dir + "output/opt_test/gen_"+str(gen)+".tsv") # TODO change limit
    print("Simulation successful!")
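
The row-by-row dispatch inside `run_pt` can also be written with `itertuples` and a lookup table instead of a manual counter. The sketch below is an illustration only: the coordinates table is made up (it reuses the column names read from `genomic_coords.csv`), `plan_elements` is a hypothetical helper, and it builds a list of elements rather than calling pinetree, so it runs without the simulator.

```python
import pandas as pd

# Toy table with the same columns as genomic_coords.csv; the values are made up.
coords = pd.DataFrame({
    "type": ["promoter", "gene", "promoter", "promoter", "promoter"],
    "name": ["A", "A", "B1", "B2", "D"],
    "new_start": [1, 20, 500, 520, 900],
    "new_end": [10, 400, 510, 530, 910],
})

def plan_elements(coords, strengths):
    """Return (kind, label, start, stop, strength) tuples for the rows to add."""
    plan = []
    for row in coords.itertuples(index=False):
        label = f"{row.type}_{row.name}"
        if row.type == "gene":
            plan.append(("gene", label, row.new_start, row.new_end, None))
        elif row.type == "promoter" and row.name in strengths:
            plan.append(("promoter", label, row.new_start, row.new_end,
                         strengths[row.name]))
        # any other row (here promoter B2) is skipped, as in run_pt
    return plan

plan = plan_elements(coords, {"A": 2e4, "B1": 5e6, "D": 2e5})
```

Keeping the promoter strengths in a dictionary keyed by promoter name means a fourth promoter would need one new dictionary entry rather than another `elif` branch.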

##### Calculate Error
This function compares a pinetree run to the experimental data.

STEPS:
1. Read pinetree run file
2. Use RMSE to calculate error 
3. Return error

In [None]:
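def get_error(file):
    # Read the pinetree output (tab-separated)
    sim = pd.read_csv(file, sep = "\t")
    sim = sim.round({'time': 0})
    # Keep only the final time point and the gene rows; copy so new columns can be added safely
    sim = sim[sim['time'] == 1200.0]
    sim = sim[sim.species.str.match("gene_")].copy()
    # Normalize transcript counts to the first gene row
    sim["norm"] = sim['transcript']/(sim.iloc[0]["transcript"])
    # Experimental values, one per gene row in file order
    sim["exp"] = [1, 1, 6, 6, 17, 17, 11, 5, 1, 17, 6]
    # RMSE via np.sqrt, since squared = False is deprecated in recent scikit-learn releases
    error = np.sqrt(mean_squared_error(sim.exp, sim.norm))
    return error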
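
As a quick check of what `get_error` computes, here is a small worked RMSE example in plain NumPy. The `exp` values are the ones used above; the simulated counts are made up for illustration, and `rmse` is a hypothetical helper rather than part of the notebook.

```python
import numpy as np

def rmse(expected, observed):
    """Root-mean-square error between two equal-length sequences."""
    expected = np.asarray(expected, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((expected - observed) ** 2)))

exp = [1, 1, 6, 6, 17, 17, 11, 5, 1, 17, 6]                   # from get_error
counts = np.array([2, 2, 12, 12, 34, 34, 22, 10, 2, 34, 12])  # made-up simulated counts
norm = counts / counts[0]                                     # normalize to the first gene

perfect = rmse(exp, norm)       # counts are exactly 2x exp, so the error is 0
shifted = rmse(exp, norm + 1)   # every value is off by 1, so the error is 1
```

Since the counts are normalized to the first gene, only the ratios between genes matter: doubling every simulated count leaves the error unchanged.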

##### Define Normal Distributions

In [None]:
uA = 12.21
oA = 2
pA = np.exp(np.random.normal(uA, oA, 1))

uB = 17.7
oB = 3
pB = np.exp(np.random.normal(uB, oB, 1))

uD = 14.5
oD = 2
pD = np.exp(np.random.normal(uD, oD, 1))

l = np.random.normal(0.5, 0.2, 1) # add code to resample if l < 0 or l > 1
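
The resampling mentioned in the comment on `l` could look like the sketch below. It is a simple rejection loop, written here as an assumption about what is wanted: `sample_lambda` and its `max_tries` guard are hypothetical and not part of the notebook.

```python
import numpy as np

def sample_lambda(rng, mean=0.5, sd=0.2, low=0.0, high=1.0, max_tries=1000):
    """Draw from N(mean, sd) and redraw until the value lies in [low, high]."""
    for _ in range(max_tries):
        lam = rng.normal(mean, sd)
        if low <= lam <= high:
            return float(lam)
    raise RuntimeError("no sample fell inside the requested range")

rng = np.random.default_rng(1)
lams = [sample_lambda(rng) for _ in range(1000)]
```

With a mean of 0.5 and a standard deviation of 0.2, both bounds sit 2.5 standard deviations from the mean, so redraws should be rare.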

In [None]:
# randomly select item from a list of 3
params = ["pA", "pB", "pD"]
print(np.random.choice(params))

In [None]:
gen = 0
error = 1e10
step = np.exp(oA)
params = ["pA", "pB", "pD"]
report_df = pd.DataFrame(columns = ['gen', 'pA', 'pB', 'pD', 'error'])

first_run = {'gen': "NA", 'pA': pA, 'pB': pB, 'pD': pD,'error': error} 
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
report_df = pd.concat([report_df, pd.DataFrame([first_run])], ignore_index = True)

old_error = error
new_error = error

no_change = 0

while (old_error >= new_error and no_change <= 5):
    # STEP 1: Randomly select promoter to optimize; quit once gen reaches 5. 
    if (gen == 5):
        break
        
    pX = np.random.choice(params)
    # STEP 2: Run pinetree with selected promoter & value
    # Increment pX by oX
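    if (pX == "pA"):
        pA = pA + oA 
        try:
            run_pt(gen = gen, pA = pA, pB = pB, pD = pD)
        except RuntimeError:
            # Some promoter values make pinetree raise an overlap error;
            # count the failed attempt so the loop still terminates, then try again
            print("Overlap issue")
            no_change = no_change + 1
            continue
    elif (pX == "pB"):
        pB = pB + oB
        run_pt(gen = gen, pA = pA, pB = pB, pD = pD)
    elif (pX == "pD"):
        pD = pD + oD
        run_pt(gen = gen, pA = pA, pB = pB, pD = pD)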
    
    # STEP 3: Calculate Error
    old_error = new_error
    new_error = get_error(file = base_dir + "output/opt_test/gen_" +str(gen)+".tsv")
    
    # STEP 4: Compare Old Error to New Error; 
    if (new_error > old_error):
        no_change = no_change + 1
        print("new_error = " + str(new_error) + ", old_error = " + str(old_error) + "\n")
        new_error = old_error
        old_error = report_df.at[gen-1, "error"]
        if (pX == "pA"):
            step = oA*np.random.normal(0.5, 0.2, 1)
            pA = report_df.at[gen-1, "pA"] + np.exp(step)
            continue
        if (pX == "pB"):
            step = oB*np.random.normal(0.5, 0.2, 1)
            pB = report_df.at[gen-1, "pB"] + np.exp(step)
            continue
        if (pX == "pD"):
            step = oD*np.random.normal(0.5, 0.2, 1)
            pD = report_df.at[gen-1, "pD"] + np.exp(step)
            continue
        
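    # Record the error from this run (new_error), not the initial placeholder
    new_run = {'gen': gen, 'pA': pA, 'pB': pB, 'pD': pD,'error': new_error} 
    report_df = pd.concat([report_df, pd.DataFrame([new_run])], ignore_index = True)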
    if (pX == "pA"):
        pA = pA + np.exp(step)
    if (pX == "pB"):
        pB = pB + np.exp(step)
    if (pX == "pD"):
        pD = pD + np.exp(step)
   
    gen = gen + 1
    #display(report_df)
    print(f"\n")
        
#display(report_df)
report_df.to_csv(base_dir + "output/opt_test/relative_report_test.csv")
display(report_df)

#### Weird Pinetree Overlap Errors
Some promoter values cause an overlap error in the simulation. Here I use try/except to search for promoter values that do not trigger this error.

In [41]:
j = 0
k = 0
pA = np.exp(11.9)

while (j==0 and k < 100):
    print(k)
    try:
        print(pA)
        run_pt(gen = 0, pA = pA, pB = 5e6, pD = 2e5)
        j = 1
    except RuntimeError: # the overlap error is raised as a RuntimeError
        print("Overlap issue")
        print(f"\n")
        j = 0
        pA = pA + np.exp(6*np.random.normal(0.5, 0.2, 1))
        k = k + 1
        

0
147266.6252405527
0 147266.6252405527
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


1
[147325.74471158]
0 [147325.74471158]
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


2
[147336.00986554]
0 [147336.00986554]
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


3
[147338.53854913]
0 [147338.53854913]
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


4
[147350.93567637]
0 [147350.93567637]
Defining PhiX-174 genome
gene
ignoring pB2
all genes

Overlap issue


40
[148898.79784239]
0 [148898.79784239]
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


41
[148900.13946529]
0 [148900.13946529]
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


42
[148941.39926106]
0 [148941.39926106]
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


43
[148985.83569554]
0 [148985.83569554]
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation
Overlap issue


44
[149020.20311097]
0 [149020.20311097]
Defining PhiX-174 genome
gene
i

Simulation successful!


In [42]:
run_pt(gen = 0, pA = 2e4, pB = 5e6, pD = 2e5)

0 20000.0
Defining PhiX-174 genome
gene
ignoring pB2
all genes and promoters added
all terminators added
genome is registered
Defining Polymerases & Interactions
running simulation


RuntimeError: Polymerase __ribosome (start: 1520, stop: 1529, index: 4) is overlapping polymerase __ribosome (start: 1527, stop: 1536, index: 5) by more than one position on polymer __rna
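
The try/except search from cell `In [41]` can be wrapped in a small reusable helper. The sketch below is an assumption about how that might look: `find_working_value`, `fake_run`, and the proposal rule are all hypothetical, and `fake_run` only imitates a simulator that raises `RuntimeError` for some inputs.

```python
def find_working_value(run, start, propose, max_tries=100):
    """Call run(value) until it stops raising RuntimeError.

    Returns (value, attempts_failed) on success, or (None, max_tries)
    if every attempt failed.
    """
    value = start
    for attempt in range(max_tries):
        try:
            run(value)
            return value, attempt
        except RuntimeError:
            value = propose(value)   # move to a new candidate and retry
    return None, max_tries

def fake_run(value):
    """Hypothetical simulator: fails for any value below 10."""
    if value < 10:
        raise RuntimeError("overlap")

found, failures = find_working_value(fake_run, start=7, propose=lambda v: v + 1)
gave_up = find_working_value(fake_run, start=0, propose=lambda v: v, max_tries=5)
```

Catching only `RuntimeError` keeps other problems, such as a missing input file, visible instead of silently retrying them.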

In [None]:
"""
gen = 0
error = 1e10
step = np.exp(oA)
report_df = pd.DataFrame(columns = ['gen', 'pA', 'error'])

first_run = {'gen': "NA", 'pA': pA, 'error': error} 
report_df = report_df.append(first_run, ignore_index = True)
   
old_error = error
new_error = error

no_change = 0

# Introduce while loop conditional 
while (old_error >= new_error and no_change <= 5):
    run_pt(gen, pA) # output = file
    old_error = new_error
    new_error = get_error(file = base_dir + "output/opt_test/gen_" +str(gen)+".tsv") # output error
    if(new_error > old_error):
        no_change = no_change + 1
        print("new_error = " + str(new_error) + ", old_error = " + str(old_error) + "\n")
        # this means we've stepped too far
        # back up and use smaller step!
        step = oA*np.random.normal(0.5, 0.2, 1)
        new_error = old_error
        old_error = report_df.at[gen-1, "error"]
        pA = report_df.at[gen-1, "pA"] + np.exp(step)
        continue
    new_run = {'gen': gen, 'pA': pA, 'error': new_error} 
    report_df = report_df.append(new_run, ignore_index = True)
    if gen == 6:
        break
    pA = pA + np.exp(step)
    gen = gen + 1
    #display(report_df)
    print(f"\n")

#display(report_df)
report_df.to_csv(base_dir + "output/opt_test/relative_report_test.csv")
display(report_df)

"""