
###### Original notebook developed by Vitor Pereira

###### Lucas Carvalho modified "Refining solutions within the solution space" and "GKOProblem".


# Strain Optimization in MEWpy

This notebook exemplifies how MEWpy may be used in strain optimization tasks.
As a goal, we will consider increasing the production of succinate in E. coli under anaerobic conditions.

In [None]:
%matplotlib inline
from cobra.io.sbml import read_sbml_model
model = read_sbml_model("data/e_coli_core.xml.gz")

Analyse the model's default medium

In [None]:
from mewpy.simulation.environment import Environment
import pandas as pd

medium = Environment.from_model(model).get_compounds()
env = Environment.from_compounds(medium)
pd.DataFrame(env,index=['lb','ub']).T


Commercial solvers are better suited to address optimization tasks such as strain design. You might therefore want to check the linear programming solvers available on your system:

In [None]:
# older MEWpy versions
#from mewpy.simulation import solvers
#solvers

# depending on the MEWpy version, you could use
from mewpy.simulation import get_default_solver
get_default_solver()

from mewpy.simulation import set_default_solver
set_default_solver('glpk')

## Optimization problem

Before running any optimization, let us verify the production rate as well as the theoretical maximum rate of succinate under anaerobic conditions.
To make the script easier to reuse, we will define some reaction identifiers as constants:

In [None]:
BIOMASS = 'BIOMASS_Ecoli_core_w_GAM'
PRODUCT = 'EX_succ_e'
O2 = 'EX_o2_e'
GLC = 'EX_glc__D_e'

In [None]:
anaerobic = {O2: (0, 0)}

In [None]:
from mewpy.simulation import get_simulator
sim = get_simulator(model,envcond=anaerobic)
sim.FVA(reactions=[PRODUCT],format='df')

In [None]:
sim.simulate(method='pFBA').find(BIOMASS)

In [None]:
res = sim.simulate(method='pFBA')
res.fluxes[PRODUCT]

The optimization problem requires the definition of one or more objective functions.
In the next example, we define as objectives the maximization of the Biomass-Product Coupled Yield (BPCY) and the maximization of the targeted product flux. MEWpy makes other optimization objectives available; please refer to the documentation.

In [None]:
from mewpy.optimization.evaluation import BPCY, TargetFlux

objs = [ 
    BPCY(BIOMASS, PRODUCT),
    TargetFlux(PRODUCT)
]
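
As a rough illustration of what the first objective rewards, the biomass-product coupled yield is commonly defined as the growth rate multiplied by the product flux, optionally divided by the substrate uptake rate. The helper below is a hypothetical stand-alone sketch of that formula with made-up flux values. It is not MEWpy's implementation, and the optional uptake normalization is an assumption.

```python
def bpcy(biomass_flux, product_flux, uptake_flux=None):
    """Biomass-product coupled yield (illustrative sketch, not MEWpy's code)."""
    value = biomass_flux * product_flux
    if uptake_flux:
        # Uptake fluxes are usually negative, so use the absolute value
        value /= abs(uptake_flux)
    return value


# Made-up fluxes, for illustration only
growth = 0.2
succinate = 5.0

score = bpcy(growth, succinate)
print(score)

# Without uptake normalization, growth can be recovered from the two objectives
recovered_growth = score / succinate
print(recovered_growth)
```

Under this definition, a design with high product flux but no growth scores zero, which is why the objective is often paired with a plain product flux objective.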

We can now define the growth medium and select the type of modification.
We will begin with gene over/under expression (GOU) by selecting the appropriate problem class, GOUProblem. Gene deletion (GKOProblem) is covered in its own section further below.

In [None]:
from mewpy.problems import GKOProblem, GOUProblem

problem = GOUProblem(model, objs, envcond=anaerobic, candidate_max_size=6)

Other optimization methods/strategies may be used:
* Reaction deletion: *RKOProblem*
* Reaction under or over expression: *ROUProblem*
* Gene deletion: *GKOProblem*

A problem may include other parameters, such as the maximum number of genetic modifications (`candidate_max_size` above), a list of specific targets, or non-targets (reactions/genes not to be modified).

## Optimization algorithm

An optimization engine needs to be instantiated to solve the problem. MEWpy uses Evolutionary Algorithms (EAs) for this task.

EAs are algorithms that mimic the Darwinian evolutionary process, where a population of solutions evolves generation after generation. In the example we define a maximum of 50 generations.  

In [None]:
from mewpy.optimization import EA
ea = EA(problem, max_generations=50)
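
To make the idea of a population evolving over generations more concrete, here is a toy generational EA written in plain Python. It is only an illustration of the principle: the bit-string encoding, the single bit-flip mutation, the elitist selection and the `evolve` helper are all assumptions chosen for brevity, and none of it reflects how MEWpy's `EA` is implemented.

```python
import random


def evolve(fitness, n_bits=12, pop_size=20, max_generations=50, seed=0):
    """Toy generational EA over bit strings (illustration only)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(max_generations):
        # Rank the population, best individual first
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Keep the best half unchanged (elitism)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated copies of the survivors
        children = []
        for parent in survivors:
            child = parent[:]
            position = rng.randrange(n_bits)
            child[position] = 1 - child[position]  # flip one bit
            children.append(child)
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    return population[0], history


# Maximize the number of ones in the bit string
best, history = evolve(sum)
print(best, history[0], history[-1])
```

Because the best half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.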

For information on additional options, you may use the `help` command:

In [None]:
help(EA)

The EA accepts additional parameters, such as the choice of a specific algorithm (e.g. SPEA2, NSGAII, NSGAIII, GDE3, etc.), multiprocessing options, initial seeding, etc.

To start the optimization process invoke the run method:

In [None]:
solutions = ea.run()

We can now list the set of solutions:

In [None]:
df=ea.dataframe()
df

In [None]:
df["biomass"] = df["BPCY"]/df["TargetFlux"]
df

Or view the best solutions in the objective space, the so-called Pareto front:

In [None]:
ea.plot()
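
For readers unfamiliar with the term, a solution belongs to the Pareto front when no other solution is at least as good on every objective and strictly better on at least one. The sketch below applies that definition to a handful of made-up (BPCY, TargetFlux) pairs. The `pareto_front` helper and the values are hypothetical and independent of MEWpy.

```python
def pareto_front(points):
    """Return the points not dominated by any other (all objectives maximized)."""
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and better somewhere
        no_worse = all(x >= y for x, y in zip(a, b))
        better = any(x > y for x, y in zip(a, b))
        return no_worse and better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (BPCY, TargetFlux) pairs, for illustration only
candidates = [(1.0, 5.0), (0.8, 7.0), (0.5, 4.0), (1.2, 3.0), (0.8, 6.0)]
front = pareto_front(candidates)
print(front)
```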

We can save the solutions to file:

In [None]:
df.to_csv('solutions.csv')

or perform additional analysis on the solutions by retrieving the simulator used during the optimization.
 

## Working with solutions

Let us select the first solution:

In [None]:
solution = solutions[0]
solution

The solution is converted to metabolic constraints to be applied to the model. We can access these constraints using *solution.constraints*:

In [None]:
solution.constraints

We may run phenotypic simulations with the solution using those constraints. Note that there is no need to redefine the medium, as the environmental conditions are persistent in the simulator instance.

In [None]:
sim = problem.simulator
res=sim.simulate(constraints=solution.constraints,method='ROOM')
res

In [None]:
res.dataframe

In [None]:
res.find([PRODUCT,BIOMASS])

In [None]:
from mewpy.visualization.envelope import plot_flux_envelope

plot_flux_envelope(sim,BIOMASS,PRODUCT,constraints = solution.constraints)

## Refining solutions within the solution space

How can the solutions be filtered?

One approach is to keep only the solutions that satisfy one or more of the following criteria:

1. A minimum biomass flux of 0.1.
2. A minimum FVA value greater than 0 (FVA_min > 0) for the succinate exchange reaction when the solution is applied as constraints.
3. A limit on the number of modifications (mainly relevant for GKOProblem).

In [None]:
final_solutions=pd.DataFrame(columns=["solution","pFBA_succ","Biomass"])
k=0

for sol in solutions:
    
    sim = problem.simulator
    res=sim.simulate(constraints=sol.constraints,method='pFBA')
    
    #pFBA fluxes and biomass 
    #remember: Biomass = BPCY/TargetFlux
    fluxes = res.fluxes
    biomass = fluxes[BIOMASS]
    succ_pFBA = fluxes[PRODUCT]

    if biomass >= 0.1:
        final_solutions.loc[k]=[sol,succ_pFBA,biomass]
        k=k+1


In [None]:
final_solutions
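
The criteria listed at the start of this section can also be combined into a single predicate, which keeps the filtering loop short when more than one criterion is applied. The sketch below is a hypothetical helper operating on plain numbers. The name `keep_solution` and its default thresholds are assumptions, not part of MEWpy.

```python
def keep_solution(biomass, fva_min_product, size,
                  min_biomass=0.1, max_size=None):
    """Return True if a solution passes all the selected filters."""
    if biomass < min_biomass:
        return False  # criterion 1: not enough growth
    if fva_min_product <= 0:
        return False  # criterion 2: production is not guaranteed
    if max_size is not None and size > max_size:
        return False  # criterion 3: too many modifications
    return True


# Made-up values, for illustration only
print(keep_solution(biomass=0.15, fva_min_product=2.3, size=4))
print(keep_solution(biomass=0.05, fva_min_product=2.3, size=4))
print(keep_solution(biomass=0.15, fva_min_product=2.3, size=4, max_size=3))
```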

We could also find the reactions that are most frequently constrained across all solutions:

In [None]:
# Initialize the final list
final_list = []

for sol in solutions:
    
    values = sol.constraints.keys()

    # Add the different lists of reactions to the final list
    final_list.extend(list(values))

# Calculate the frequency of each reaction in the final list
react_frequency = {}
for react in final_list:
    if react in react_frequency:
        react_frequency[react] += 1
    else:
        react_frequency[react] = 1

# Print the frequency of each reaction
for react, frequency in react_frequency.items():
    print(f"{react}: {frequency}")
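
The same tally can be written more compactly with `collections.Counter` from the standard library, which also sorts the result by frequency. The constraint dictionaries below are made-up stand-ins for `sol.constraints`, used only so that the sketch runs on its own.

```python
from collections import Counter

# Made-up constraint dictionaries standing in for sol.constraints
example_constraints = [
    {"PFL": (0, 0), "LDH_D": (0, 0)},
    {"PFL": (0, 0), "ACKr": (0, 0)},
    {"PFL": (0, 0), "LDH_D": (0, 0), "ALCD2x": (0, 0)},
]

# Count how many solutions constrain each reaction
counts = Counter(react for cons in example_constraints for react in cons)

# List the reactions from most to least frequent
ranking = counts.most_common()
for react, frequency in ranking:
    print(f"{react}: {frequency}")
```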


## GKOProblem

Let's find gene deletions that increase succinate production.

We are going to reuse the BIOMASS and PRODUCT constants defined earlier for the GOUProblem.

In [None]:
BIOMASS = 'BIOMASS_Ecoli_core_w_GAM'
PRODUCT = 'EX_succ_e'
O2 = 'EX_o2_e'
GLC = 'EX_glc__D_e'

anaerobic = {O2: (0, 0)}

In [None]:
from mewpy.problems import GKOProblem

objs = [ 
    BPCY(BIOMASS, PRODUCT),
    TargetFlux(PRODUCT)
]

problem = GKOProblem(model, objs, envcond=anaerobic, candidate_max_size=6)


In [None]:
from mewpy.optimization import EA

ea = EA(problem, max_generations=20)

solutions = ea.run()


In [None]:
df=ea.dataframe()
print(df)


We may check the biomass production of each solution:

In [None]:
df["biomass"] = df["BPCY"]/df["TargetFlux"]
df

Save the solutions in a csv file

In [None]:
df.to_csv('solutions_GKOProblem.csv')

In GKOProblem, you may filter the solutions using the same criteria. If you would like to check the pFBA values to see the succinate production and compare them with the EA solution, you may run:

In [None]:
sol = df["Modification"][0] #select one solution
constraints = problem.solution_to_constraints(sol)
constraints

In [None]:
sim = get_simulator(model,envcond=anaerobic,constraints=constraints)
res=sim.simulate(constraints=constraints,method='pFBA')
res.find([BIOMASS,PRODUCT])
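
When comparing the re-simulated fluxes with the values reported by the EA, small numerical differences between solver runs are to be expected, so an exact equality test is usually too strict. A minimal sketch of a tolerance-based comparison follows. The helper name `consistent`, the tolerances and the numbers are assumptions for illustration.

```python
import math


def consistent(ea_value, simulated_value, rel_tol=1e-6, abs_tol=1e-9):
    """Check whether two flux values agree within a numerical tolerance."""
    return math.isclose(ea_value, simulated_value,
                        rel_tol=rel_tol, abs_tol=abs_tol)


# Made-up values, for illustration only
print(consistent(8.5031, 8.5031000004))
print(consistent(8.5031, 7.9))
```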

Let's filter the solutions based on the minimum biomass value and the minimum theoretical production given by FVA:

In [None]:

# final dataframe (the product is succinate, EX_succ_e)
column_names = ["solution", "Size", "pFBA", "EX_succ_pFBA", "Biomass", "FVA_min_succ", "FVA_max_succ"]
final_solutions = pd.DataFrame(columns=column_names)
k = 0

# the simulator is the same for all solutions
sim = problem.simulator

# iterate over all solutions
for i in range(len(df)):

    sol = df["Modification"].iloc[i]
    size = df["Size"].iloc[i]
    constraints = problem.solution_to_constraints(sol)

    # run pFBA and get the fluxes
    res = problem.simulate(constraints=constraints, method="pFBA")
    pfba_sol = res.objective_value
    fluxes = res.fluxes
    succ = fluxes[PRODUCT]
    biomass = fluxes[BIOMASS]

    # FVA of the product under the solution's constraints
    res_fva = sim.FVA(reactions=[PRODUCT], constraints=constraints, obj_frac=0.90)
    fva_min_succ = res_fva[PRODUCT][0]
    fva_max_succ = res_fva[PRODUCT][1]

    # keep solutions that grow and are guaranteed to produce succinate
    if biomass > 0.1 and fva_min_succ > 0:
        final_solutions.loc[k] = [sol, size, pfba_sol, succ, biomass, fva_min_succ, fva_max_succ]
        k = k + 1


In [None]:
final_solutions

## Exercise 1

Alter the notebook to run a gene over/under-expression (GOUProblem) optimization task. You may also try other optimization objectives (replacing or adding new objectives) such as `CandidateSize` or `BPCY_FVA`.

## Exercise 2

Alter the notebook to find possible genetic modifications for the increased production of ethanol (EX_etoh_e)

# Simulating user defined modifications

Genetic modifications at the gene, enzyme, transcription or regulatory levels need to be translated to the (pseudo) reaction level. This task is problem dependent and consequently requires the instantiation of a problem. If we do not intend to run any optimization task, there is no need to define optimization objectives.

In [None]:
problem = GOUProblem(model,[], envcond=anaerobic)
sim = problem.simulator
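
To give an intuition for what translating a gene-level modification to the reaction level involves, the sketch below handles the simplest case, gene deletions. A reaction stays active only if its gene-protein-reaction (GPR) rule still evaluates to true once the deleted genes are switched off. The helper, the rule and the gene identifiers are hypothetical. Expression folds, as used by GOUProblem, require additional logic that is not reproduced here.

```python
import re


def reaction_is_active(gpr_rule, deleted_genes):
    """Evaluate a boolean GPR rule given a set of deleted genes (simplified)."""
    if not gpr_rule.strip():
        return True  # no gene association: deletions do not affect it
    # Replace every gene identifier by True (present) or False (deleted)
    expression = re.sub(
        r"\b(?!and\b|or\b)\w+\b",
        lambda match: str(match.group(0) not in deleted_genes),
        gpr_rule,
    )
    # The expression now only contains True, False, and, or and parentheses.
    # eval is acceptable for this trusted toy input only.
    return eval(expression, {"__builtins__": {}}, {})


# Hypothetical rule: g1 is essential, g2 and g3 are isozymes
rule = "g1 and (g2 or g3)"
print(reaction_is_active(rule, {"g2"}))
print(reaction_is_active(rule, {"g1"}))
print(reaction_is_active(rule, {"g2", "g3"}))
```

An inactive reaction would then be constrained to zero flux, which is the kind of reaction-level constraint a problem instance produces from a gene-level solution.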

A solution for a GOUProblem is a dictionary of genes' expression folds, e.g.:

In [None]:
solution = {'b3956': 8, 'b2914': 4}

You may get information on the genes, such as the reactions they are associated with, using the simulator's lookup functions:

In [None]:
all_reactions = sim.get_gene_reactions()
gene_list = ['b3956','b2914']
found_reactions = [all_reactions[gene] for gene in gene_list if gene in all_reactions]

print(found_reactions)

The gene expressions are converted by the problem into reaction constraints:

In [None]:
constraints = problem.solution_to_constraints(solution)
constraints

that are used to run phenotypic simulations:

In [None]:
sim.simulate(constraints=constraints).find()

This functionality allows us, for example, to simulate the same genetic modification with different simulation methods or on other GEMs.