# Ensemble Modeling
This notebook will demonstrate how **MASSpy** can be used to generate an ensemble of models. 

In [1]:
# Disable logging output for this notebook.
try:
    import gurobipy
    gurobipy.setParam("OutputFlag", 0)
except ImportError:
    pass

import logging
logging.getLogger("").setLevel("CRITICAL")

# Configure roadrunner to allow for more output rows
import roadrunner
roadrunner.Config.setValue(
    roadrunner.Config.MAX_OUTPUT_ROWS, 1e6)

from cobra.sampling import sample

import mass.test

from mass.thermo import (
    ConcSolver, sample_concentrations)

from mass.thermo.ensemble import (
    Ensemble, generate_ensemble)

# Load the model
reference_model = mass.test.create_test_model("textbook", io="json")



## Generating Data for Ensembles

In addition to loading data from external sources (e.g., Excel sheets), sampling can be used to obtain valid data for generating ensembles. As an example, a small set of samples will be generated for use in this notebook.

Utilizing [**COBRApy** flux sampling](https://cobrapy.readthedocs.io/en/latest/sampling.html#), the following flux samples will be used in generating the ensemble of models. All steady state flux values will be allowed to deviate by up to 25% of their defined baseline values.

In [2]:
for reaction in reference_model.reactions:
    flux = reaction.steady_state_flux
    reaction.bounds = sorted([
        round(flux * 0.75, 12), round(flux * 1.25, 12)])
    
flux_samples = sample(reference_model, n=5, seed=100)
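
The bound calculation in the cell above can be illustrated in isolation. The following is a minimal sketch using plain numbers rather than model reactions; the helper name `flux_bounds` and the flux values are hypothetical and chosen only for illustration. It shows why the two candidate bounds are sorted: for a negative flux, multiplying by 1.25 produces the more negative value, which must become the lower bound.

```python
def flux_bounds(flux, fraction=0.25, ndigits=12):
    """Return (lower, upper) bounds allowing `flux` to deviate by `fraction`."""
    # Sorting keeps lower <= upper even when the flux is negative,
    # where scaling by (1 + fraction) gives the more negative value.
    lower, upper = sorted([
        round(flux * (1 - fraction), ndigits),
        round(flux * (1 + fraction), ndigits),
    ])
    return lower, upper


# Hypothetical steady state fluxes, for illustration only.
forward = flux_bounds(2.0)
reverse = flux_bounds(-2.0)
blocked = flux_bounds(0.0)

print("forward:", forward)
print("reverse:", reverse)
print("blocked:", blocked)
```

Note that under this scheme a reaction with a baseline flux of zero receives identical lower and upper bounds, so it stays fixed at zero during sampling.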

Utilizing [**MASSpy** concentration sampling](./thermo_concentrations.ipynb#Concentration-Sampling), the following concentration samples will be used in generating the ensemble of models. All concentration values will be allowed to deviate by up to 75% of their defined baseline values.

In [3]:
conc_solver = ConcSolver(
    reference_model,
    excluded_metabolites=["h_c", "h2o_c"],
    equilibrium_reactions=["HBDPG", "HBO1", "HBO2", "HBO3", "HBO4", "ADK1"],
    constraint_buffer=1e-7)

conc_solver.setup_sampling_problem(
    conc_percent_deviation=0.75,
    Keq_percent_deviation=0)

conc_samples = sample_concentrations(conc_solver, n=4,
                                     seed=100)

Because there are 5 flux data sets and 4 concentration data sets, pairing every flux sample with every concentration sample will yield $5 \times 4 = 20$ models in total.
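
The count can be checked with a short sketch that pairs sample indices using only the standard library. The `F<i>_C<j>` suffix pattern mirrors the model identifiers shown in the outputs later in this notebook; the variable names here are illustrative and are not part of **MASSpy**.

```python
from itertools import product

n_flux_samples = 5
n_conc_samples = 4

# Pair every flux sample index with every concentration sample index.
combinations = list(product(range(n_flux_samples), range(n_conc_samples)))

# Identifier suffixes following the F<i>_C<j> pattern seen in the
# model listings of this notebook.
suffixes = ["F{0}_C{1}".format(i, j) for i, j in combinations]

print("Number of combinations: {0}".format(len(combinations)))
print("First suffix: {0}, last suffix: {1}".format(suffixes[0], suffixes[-1]))
```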

## The Ensemble Object

The `Ensemble` object inherits from the `Simulation` object described in [Dynamic Simulation](./dynamic_simulation.ipynb). Just like the `Simulation` object, the `Ensemble` object requires a reference `MassModel` to be initialized.

In [4]:
ensemble = Ensemble(reference_model=reference_model, verbose=True)

Successfully loaded MassModel 'RBC_PFK' into RoadRunner.


The `Ensemble` object has two methods for creating models from `pandas.DataFrame` objects.

* The `create_models_from_flux_data` method creates an ensemble of models from a `DataFrame` containing flux data, where rows correspond to samples and columns correspond to reaction identifiers.
* The `create_models_from_concentration_data` method creates an ensemble of models from a `DataFrame` containing concentration data, where rows correspond to samples and columns correspond to metabolite identifiers.

The methods can be used separately or together to generate models. In this example, an ensemble of 20 models will be generated by utilizing both model generation methods.

First, the 5 flux samples will be used to generate 5 models with varying flux states from a single reference `MassModel`.

In [5]:
flux_models = ensemble.create_models_from_flux_data(
    models=reference_model, data=flux_samples)
flux_models

[<MassModel RBC_PFK_F0 at 0x12d8e6e90>,
 <MassModel RBC_PFK_F1 at 0x11a0dda10>,
 <MassModel RBC_PFK_F2 at 0x125689f50>,
 <MassModel RBC_PFK_F3 at 0x12589a210>,
 <MassModel RBC_PFK_F4 at 0x125a8cf10>]

Treating each of the 5 models with varying flux states as a reference model, the `list` of models can be passed to the `create_models_from_concentration_data` method along with the 4 concentration samples to create 4 models with varying concentration states per flux state, giving a total of 20 generated models.

In [6]:
conc_models = ensemble.create_models_from_concentration_data(
    models=flux_models, data=conc_samples)
conc_models

[<MassModel RBC_PFK_F0_C0 at 0x12ddcdf10>,
 <MassModel RBC_PFK_F0_C1 at 0x12ddcda10>,
 <MassModel RBC_PFK_F0_C2 at 0x12dd6c0d0>,
 <MassModel RBC_PFK_F0_C3 at 0x12e19dd90>,
 <MassModel RBC_PFK_F1_C0 at 0x12e3a9cd0>,
 <MassModel RBC_PFK_F1_C1 at 0x12e5c0e50>,
 <MassModel RBC_PFK_F1_C2 at 0x12e7d3fd0>,
 <MassModel RBC_PFK_F1_C3 at 0x12e9c7c10>,
 <MassModel RBC_PFK_F2_C0 at 0x12ebfbb90>,
 <MassModel RBC_PFK_F2_C1 at 0x12e7aca90>,
 <MassModel RBC_PFK_F2_C2 at 0x12e1786d0>,
 <MassModel RBC_PFK_F2_C3 at 0x12f221c90>,
 <MassModel RBC_PFK_F3_C0 at 0x12effcb50>,
 <MassModel RBC_PFK_F3_C1 at 0x12f658b90>,
 <MassModel RBC_PFK_F3_C2 at 0x12f863990>,
 <MassModel RBC_PFK_F3_C3 at 0x12fb18a90>,
 <MassModel RBC_PFK_F4_C0 at 0x12f631ad0>,
 <MassModel RBC_PFK_F4_C1 at 0x12fd13e50>,
 <MassModel RBC_PFK_F4_C2 at 0x1300b0e10>,
 <MassModel RBC_PFK_F4_C3 at 0x1302d3e10>]

Generating the models does not always ensure that the models will be thermodynamically feasible. The `ensure_positive_percs` method can be used to ensure that all reactions passed to the `reactions` argument produce positive PERCs for each provided model. Models that produce all positive PERCs are separated from those that produce at least one negative PERC, and the two lists are returned.

If the `update_values` argument is set to `True`, model PERC values will be updated for models that produce all positive PERCs.
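
To see why a PERC can come out negative, consider a minimal sketch that assumes a simple mass action rate law, $v = k_f \left(\prod \text{reactants} - \prod \text{products} / K_{eq}\right)$, with all stoichiometric coefficients equal to one. The function `mass_action_perc` and the numbers below are hypothetical and are not the **MASSpy** implementation; they only illustrate that the sign of the computed rate constant flips when the flux direction disagrees with the thermodynamic driving force.

```python
from math import prod


def mass_action_perc(flux, reactants, products, keq):
    """Solve v = kf * (prod(reactants) - prod(products) / Keq) for kf.

    Assumes unit stoichiometric coefficients (illustrative simplification).
    """
    driving_force = prod(reactants) - prod(products) / keq
    if driving_force == 0:
        raise ValueError("Reaction is at equilibrium; kf is undefined.")
    return flux / driving_force


# Hypothetical reaction A <=> B with Keq = 2 and a forward flux of 1.
# Product concentration below equilibrium: forward flux is consistent.
consistent = mass_action_perc(flux=1.0, reactants=[1.0], products=[1.0], keq=2.0)
# Product concentration above equilibrium: forward flux is inconsistent.
inconsistent = mass_action_perc(flux=1.0, reactants=[1.0], products=[4.0], keq=2.0)

print("Consistent state PERC: {0}".format(consistent))
print("Inconsistent state PERC: {0}".format(inconsistent))
```

In the second case the sampled concentrations favor the reverse direction while the sampled flux runs forward, so no positive rate constant can reproduce that state.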

In [7]:
reactions_to_check_percs = [
    r.id for r in reference_model.reactions
    if r not in reference_model.boundary
    and r not in reference_model.enzyme_modules.PFK.enzyme_module_reactions]

positive, negative = ensemble.ensure_positive_percs(
    models=conc_models, reactions=reactions_to_check_percs, 
    update_values=True)

print("Models with positive PERCs: {0}".format(len(positive)))
print("Models with negative PERCs: {0}".format(len(negative)))

Models with positive PERCs: 20
Models with negative PERCs: 0


To ensure that all generated models can reach a steady state, the `ensure_steady_state` method can be used. If the `update_values` argument is set to `True`, models that can reach a steady state will be updated with their steady state values.

In [8]:
feasible, infeasible = ensemble.ensure_steady_state(
    models=conc_models, strategy="simulate",
    update_values=True)

print("Reached steady state: {0}".format(len(feasible)))
print("No steady state reached: {0}".format(len(infeasible)))



Reached steady state: 15
No steady state reached: 5
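
As a conceptual aid, the idea of testing for a steady state by simulation can be sketched with a toy one-variable system. This is only an illustration under stated assumptions (forward Euler integration, a fixed tolerance on the derivative, hypothetical rate functions); it does not reproduce how the `"simulate"` strategy is implemented.

```python
def reaches_steady_state(rate, x0, dt=0.01, t_end=1000.0, tol=1e-9):
    """Integrate dx/dt = rate(x) with forward Euler until |dx/dt| < tol.

    Returns (True, x) if the derivative falls below `tol` before `t_end`,
    otherwise (False, x) with the final state.
    """
    x = x0
    for _ in range(int(t_end / dt)):
        dxdt = rate(x)
        if abs(dxdt) < tol:
            return True, x
        x += dt * dxdt
    return False, x


# Hypothetical system with constant production and first-order consumption;
# the derivative vanishes at x = 2.
converged, x_ss = reaches_steady_state(lambda x: 1.0 - 0.5 * x, x0=0.0)

# Hypothetical system with production only; it accumulates indefinitely.
accumulating, _ = reaches_steady_state(lambda x: 1.0, x0=0.0)

print("Balanced system reached steady state: {0}".format(converged))
print("Production-only system reached steady state: {0}".format(accumulating))
```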


Models that did not reach a steady state can be removed using the 
`remove_models` method.

In [9]:
ensemble.remove_models(infeasible)
print("Generated models remaining: {0}".format(
    len(ensemble.models) - 1))  # Subtract one to account for the reference

Generated models remaining: 15


The `perturbations` argument of the `ensure_steady_state` method can be used to ensure that models can reach a steady state with a given perturbation. 

In [10]:
feasible, infeasible = ensemble.ensure_steady_state(
    models=conc_models, strategy="simulate",
    perturbations={"kf_ATPM": "kf_ATPM * 1.5"},
    update_values=False)

print("Reached steady state: {0}".format(len(feasible)))
print("No steady state reached: {0}".format(len(infeasible)))



Reached steady state: 12
No steady state reached: 8


After removing any additional infeasible models, all of the models that remain in the ensemble are thermodynamically feasible and were able to reach a steady state, even with the given disturbance.

In [11]:
ensemble.remove_models(infeasible)
print("Generated models remaining: {0}".format(
    len(ensemble.models) - 1))  # Subtract one to account for the reference

conc_sol_list, flux_sol_list = ensemble.simulate(
    feasible, time=(0, 1000, 1e4 + 1),
    perturbations={"kf_ATPM": "kf_ATPM * 1.5"})

print("Number of simulation solutions produced: {0}".format(len(conc_sol_list)))

Generated models remaining: 12
Number of simulation solutions produced: 12


## Automated Ensemble Creation

Perhaps the easiest way of generating an ensemble of models is to utilize the `generate_ensemble` function. The `generate_ensemble` function has been built for performance; it therefore provides less control over the ensemble generation process and has a setup time associated with it. However, for the generation of large ensembles, this may be the more desirable approach.

This function utilizes a single `MassModel` as a reference model, and then utilizes data provided as a `pandas.DataFrame` for the `flux_data` and `conc_data` arguments. 

* For `flux_data`, the columns are the reaction identifiers where each value represents the steady state flux of the reaction for the data set given by each row.
* For `conc_data`, the columns are the metabolite identifiers where each value represents the concentration/initial condition of the metabolite for the data set given by each row.

At least one of the above arguments must be passed to the function. After generating models, an `Ensemble` object containing the models is returned.

In [12]:
ensemble = generate_ensemble(
    reference_model=reference_model, 
    flux_data=flux_samples,
    conc_data=conc_samples)

Total models generated: 20


To ensure that the PERCs for certain reactions are positive, a `list` of reactions to check can be passed to the `ensure_positive_percs` argument. 

In [13]:
reactions_to_check_percs = [
    r.id for r in reference_model.reactions
    if r not in reference_model.boundary
    and r not in reference_model.enzyme_modules.PFK.enzyme_module_reactions]

ensemble = generate_ensemble(
    reference_model=reference_model, 
    flux_data=flux_samples,
    conc_data=conc_samples,
    ensure_positive_percs=reactions_to_check_percs)

Total models generated: 20
Feasible: 20
Infeasible, negative PERCs: 0


To ensure that all models can reach a steady state with their new values, a strategy for finding the steady state can be passed to the `steady_state_strategy` argument. 

In [14]:
ensemble = generate_ensemble(
    reference_model=reference_model, 
    flux_data=flux_samples,
    conc_data=conc_samples,
    ensure_positive_percs=reactions_to_check_percs,
    steady_state_strategy="simulate")

Total models generated: 20
Feasible: 15
Infeasible, negative PERCs: 0
Infeasible, no steady state found: 5


To ensure that all models can reach a steady state with their new values after a given perturbation, in addition to passing a value to the `steady_state_strategy` argument, one or more perturbations can be given to the `perturbations` argument. The `perturbations` argument will take a `list` of `dict` objects containing the perturbations as described in [Dynamic Simulation](./dynamic_simulation.ipynb). 

In [15]:
ensemble = generate_ensemble(
    reference_model=reference_model, 
    flux_data=flux_samples,
    conc_data=conc_samples,
    ensure_positive_percs=reactions_to_check_percs,
    steady_state_strategy="simulate",
    perturbations=[
        {"kf_ATPM": "kf_ATPM * 1.5"},
        {"kf_ATPM": "kf_ATPM * 0.85"}]
)

Total models generated: 20
Feasible: 12
Infeasible, negative PERCs: 0
Infeasible, no steady state found: 5
Infeasible, no steady state with pertubration 1: 3
Infeasible, no steady state with pertubration 2: 0


Note that perturbations are not applied all at once; each `dict` provided corresponds to a new attempt to find a steady state. For example, two dictionaries passed to the `perturbations` argument indicate that 3 steady state determinations will be performed: once for the model without any perturbations and once for each `dict` passed.
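
The sequence of checks can be sketched as an ordered list of filters. This sketch assumes that a model is dropped at the first check it fails, which is consistent with the counts printed above but is not confirmed here; the function `tally_steady_state_checks`, the integer stand-ins for models, and the pass/fail predicates are all hypothetical and chosen only to reproduce those counts.

```python
def tally_steady_state_checks(models, checks):
    """Apply ordered (label, predicate) checks, dropping models on first failure.

    Returns the surviving models and a dict of failure counts per check.
    """
    remaining = list(models)
    failures = {}
    for label, passes in checks:
        results = [(model, passes(model)) for model in remaining]
        failures[label] = sum(1 for _, ok in results if not ok)
        remaining = [model for model, ok in results if ok]
    return remaining, failures


# Integers stand in for the 20 generated models (illustration only).
models = range(20)
perturbations = [{"kf_ATPM": "kf_ATPM * 1.5"}, {"kf_ATPM": "kf_ATPM * 0.85"}]

# One check without perturbations, then one per perturbation dict.
# The predicates are made up so that the tallies match the printed output.
checks = [
    ("no perturbation", lambda m: m >= 5),
    ("perturbation 1", lambda m: m >= 8),
    ("perturbation 2", lambda m: True),
]

remaining, failures = tally_steady_state_checks(models, checks)
print("Checks performed: {0}".format(len(checks)))
print("Feasible: {0}".format(len(remaining)))
print("Failures per check: {0}".format(failures))
```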

If it is desirable to also return the models that were deemed infeasible, the `return_infeasible` kwarg can be set to `True` to generate and return a second `Ensemble` object containing only the infeasible models.

In [16]:
feasible, infeasible = generate_ensemble(
    reference_model=reference_model, 
    flux_data=flux_samples,
    conc_data=conc_samples,
    ensure_positive_percs=reactions_to_check_percs,
    steady_state_strategy="simulate",
    perturbations=[
        {"kf_ATPM": "kf_ATPM * 1.5"},
        {"kf_ATPM": "kf_ATPM * 0.85"}],
    return_infeasible=True)

for ensemble in [feasible, infeasible]:
    print("{0} contains {1} generated models".format(
        ensemble.id, len(ensemble.models) - 1)) # Subtract one to account for the reference

Total models generated: 20
Feasible: 12
Infeasible, negative PERCs: 0
Infeasible, no steady state found: 5
Infeasible, no steady state with pertubration 1: 3
Infeasible, no steady state with pertubration 2: 0
RBC_PFK_Ensemble_Feasible contains 12 generated models
RBC_PFK_Ensemble_Infeasible contains 8 generated models


In general, it is recommended to utilize the methods in the `Ensemble` object to generate small ensembles while experimenting with various settings, and then to utilize the `generate_ensemble` function to generate the larger ensemble.