# Profiling ecModel simulations in Caffeine

This notebook contains tests on the stability of ecModels in the simulations service.

Note: this notebook uses the requirements from [simulations](https://github.com/DD-DeCaF/simulations), i.e. create a conda environment and, inside it, run `pip install -r ./requirements/requirements.txt`.

Note 2: For profiling, the warnings returned by the simulations service were adapted so that they also carry the value stored in `sp`:

* After L243 add `sp = top_protein.iloc[0]`
* Replace L252-255 with `warning = [top_protein, sp]`

Benjamín J. Sánchez, 2020-01-24

## 1. Loading Model and Data

In [1]:
# Add simulations to python's search path:
import sys
sys.path.append('../../simulations/src')

# Import simulation functions:
from simulations.modeling.driven import limit_proteins, bounds, flexibilize_proteomics
from simulations.modeling.simulations import simulate

# Check optlang solvers (GLPK does not work with ecModels):
import optlang
print(optlang.available_solvers)

# Other stuff:
import cobra
import copy
import os
import pandas as pd
import wget

#E. coli model:
wget.download("https://raw.githubusercontent.com/BenjaSanchez/notebooks/master/e_coli_simulations/eciML1515.xml", "eciML1515.xml", bar = False)
ecModel = cobra.io.read_sbml_model("eciML1515.xml")
os.remove("eciML1515.xml")

{'GUROBI': True, 'GLPK': True, 'MOSEK': False, 'CPLEX': False, 'SCIPY': True}
Using license file C:\Users\bejsab\gurobi.lic
Academic license - for non-commercial use only


In [2]:
#proteomics data:
data = pd.read_csv("ecoli_data_heinemann2016.csv")  # E. coli
original_proteomics = []
for key, value in data.iterrows():
    protein = {"identifier":value[0], "measurement":value[1], "uncertainty":0}
    original_proteomics.append(protein)

print(original_proteomics[:10])
print(len(original_proteomics))

[{'identifier': 'P0A8V2', 'measurement': 5.03e-06, 'uncertainty': 0}, {'identifier': 'P0AFG8', 'measurement': 8.2e-06, 'uncertainty': 0}, {'identifier': 'P36683', 'measurement': 7.7e-06, 'uncertainty': 0}, {'identifier': 'P0A6F5', 'measurement': 2.47e-05, 'uncertainty': 0}, {'identifier': 'P0A6Y8', 'measurement': 1.74e-05, 'uncertainty': 0}, {'identifier': 'P05793', 'measurement': 2.64e-05, 'uncertainty': 0}, {'identifier': 'P23847', 'measurement': 1.07e-05, 'uncertainty': 0}, {'identifier': 'P25665', 'measurement': 2.49e-05, 'uncertainty': 0}, {'identifier': 'P0A8T7', 'measurement': 1.18e-06, 'uncertainty': 0}, {'identifier': 'P00968', 'measurement': 3.69e-06, 'uncertainty': 0}]
2015


## 2. Preliminary Testing

In [3]:
solution = ecModel.optimize()
print(solution)

<Solution 0.877 at 0x2973d38bc48>


In [4]:
def compute_measurements(proteomics, ecModel):
    measurements = pd.DataFrame()
    for protein in proteomics:
        protein_id = protein["identifier"]
        lb, ub = bounds(protein["measurement"], protein["uncertainty"])
        for met in ecModel.metabolites:
            if protein_id in met.id:
                new_row = pd.DataFrame(data={"met_id": met.id, "value": ub}, index=[protein_id])
                measurements = measurements.append(new_row)
    return measurements

proteomics = copy.deepcopy(original_proteomics)
measurements = compute_measurements(proteomics, ecModel)

limit_proteins(ecModel, measurements["value"])
solution = ecModel.optimize()
print(solution.objective_value)

0.0002632811513809728


In [5]:
biomass_reaction = "BIOMASS_Ec_iML1515_core_75p37M"
growth_rate = {"measurement":0.1, "uncertainty":0}
new_growth_rate, new_proteomics, warnings = flexibilize_proteomics(ecModel, biomass_reaction, growth_rate, proteomics, [])
print(new_growth_rate)
print(len(new_proteomics))
print(len(warnings))
solution = ecModel.optimize()
print(solution)

{'measurement': 0.1, 'uncertainty': 0}
1993
22
<Solution 0.100 at 0x2973cf36788>


## 3. Consistency Testing

In [6]:
warnings_df = pd.DataFrame()
for i in range(10):
    # flexibilize proteins:
    proteomics = copy.deepcopy(original_proteomics)
    new_growth_rate, new_proteomics, warnings = flexibilize_proteomics(ecModel, biomass_reaction, growth_rate, proteomics, [])
    print(len(warnings))
    
    # create dataframe of warnings with shadow prices and add them to a cumulative one:
    warning_df = pd.DataFrame()
    for pos, warning in enumerate(warnings):
        # warning[0] is the flexibilized protein, warning[1] its shadow price
        new_row = pd.DataFrame({f"simulation_{i}": f"{pos} -> {warning[1]:.2f}"}, index=[warning[0]])
        warning_df = warning_df.append(new_row)
    warnings_df = pd.concat([warnings_df, warning_df], axis=1, sort=False)
    
    # compute growth:
    ecModel.reactions.get_by_id(biomass_reaction).bounds = (0.1, 0.1)
    flux_distribution, new_growth_rate = simulate(ecModel, biomass_reaction, "pfba", [], [])
    print(new_growth_rate)
    
warnings_df.to_csv("shadow_prices.csv")

20
0.1
22
0.1
22
0.1
23
0.1
21
0.1
22
0.1
21
0.1
21
0.1
21
0.1
21
0.1

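The protein counts printed by the loop above can be tallied to see how often each outcome occurs. This is a minimal sketch: the list `counts` is copied by hand from the output of the 10 iterations and is not produced by the simulations service itself.

```python
from collections import Counter

# Number of flexibilized proteins printed by each of the 10 iterations above
# (copied manually from the cell output).
counts = [20, 22, 22, 23, 21, 22, 21, 21, 21, 21]

# Count how many runs ended with each number of flexibilized proteins.
tally = Counter(counts)
for n_proteins, n_runs in sorted(tally.items()):
    print(f"{n_proteins} proteins flexibilized in {n_runs} run(s)")

print(f"range: {min(counts)} to {max(counts)}")
```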

We see that different iterations flexibilize between 20 and 23 proteins. A run that flexibilizes only 20 proteins could therefore fail to achieve growth when simulated later (although the latter was not observed in this notebook). One possible explanation is the variability of the shadow prices:

In [7]:
print(warnings_df)

            simulation_0       simulation_1       simulation_2  \
P15254      0 -> -401.77       0 -> -401.77       0 -> -401.77   
P76503     1 -> -2449.66                NaN                NaN   
P0A6C5     2 -> -9493.17      2 -> -9493.17      2 -> -9493.17   
P17854     3 -> -4873.07      3 -> -4873.15      3 -> -4873.15   
P0A7E3    4 -> -34825.02     5 -> -34825.02     5 -> -34825.02   
P00561    5 -> -34428.08     4 -> -34428.20     4 -> -34428.20   
P0A6X1   6 -> -151341.36    7 -> -151341.36    6 -> -151341.36   
P09053   7 -> -117556.34    6 -> -117556.34    7 -> -117556.34   
P77718   8 -> -193723.21    8 -> -193723.21    8 -> -193723.21   
P23845   9 -> -337484.17   11 -> -337489.62   10 -> -337489.62   
P09029   10 -> -47432.50     9 -> -47432.72    11 -> -47432.72   
P0A6I9  11 -> -256914.20   10 -> -256914.20   12 -> -256914.20   
P00934    12 -> -6524.22     14 -> -6524.22     13 -> -6524.22   
P36683    13 -> -8788.31                NaN                NaN   
P0A725  14

We distinguish 3 cases:

1. **Ties:** Some proteins tie in shadow price because they are isozymes (e.g. `P76503` and `P21151`) or form complexes (e.g. `P0AD65` and `P0AD68`), so the flexibilization will pick only one of them (isozymes) or switch their order (complexes). This is unavoidable and should not pose any risk.
2. **Additional proteins:** Some proteins appear at the end of the flexibilization in a few runs, namely `P0AC75`, `P0AGG0` and `P22188`. They are most likely due to approximation errors by the solver when the desired growth rate is close to the threshold. These cases should not be a problem either: if a given flexibilization removes MORE proteins than average, the later simulations should still work.
3. **Missing proteins:** Some proteins are not flexibilized in a few runs, e.g. `P21177`. This could pose a problem: if the flexibilization decides to keep the protein but the later simulation cannot accommodate it, the problem becomes infeasible. The solution for this case should be to add some wiggle room to the flexibilization (PR [#156](https://github.com/DD-DeCaF/simulations/pull/156)).
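
To help spot proteins that fall into these cases, the table can be summarized per protein: in how many runs it was flexibilized, at which positions, and how much its shadow price varied. The sketch below is illustrative only. The helper names `parse_cell` and `summarize` are hypothetical, and the input is assumed to be a plain dictionary such as the one returned by `warnings_df.to_dict(orient="index")`. The example values are copied from the first three columns of the table above.

```python
import math

def parse_cell(cell):
    """Parse a 'position -> shadow_price' string; return None for missing cells (NaN)."""
    if not isinstance(cell, str):
        return None
    pos, price = cell.split("->", 1)
    return int(pos), float(price)

def summarize(table):
    """Summarize a {protein_id: {run_name: cell}} mapping (hypothetical helper)."""
    summary = {}
    for protein, cells in table.items():
        parsed = [p for p in map(parse_cell, cells.values()) if p is not None]
        prices = [price for _, price in parsed]
        summary[protein] = {
            "runs_present": len(parsed),   # runs in which the protein was flexibilized
            "runs_total": len(cells),      # total number of runs in the table
            "positions": [pos for pos, _ in parsed],
            "price_spread": max(prices) - min(prices) if prices else math.nan,
        }
    return summary

# Example rows copied from the table above (first three simulations only).
nan = float("nan")
example = {
    "P15254": {"simulation_0": "0 -> -401.77", "simulation_1": "0 -> -401.77", "simulation_2": "0 -> -401.77"},
    "P76503": {"simulation_0": "1 -> -2449.66", "simulation_1": nan, "simulation_2": nan},
    "P17854": {"simulation_0": "3 -> -4873.07", "simulation_1": "3 -> -4873.15", "simulation_2": "3 -> -4873.15"},
}

for protein, stats in summarize(example).items():
    print(protein, stats)
```

A protein with `runs_present` below `runs_total` is a candidate for cases 1 to 3. Telling those cases apart still requires the biological context described above (isozymes, complexes, position in the flexibilization order).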