# Strain Optimization in MEWpy

This notebook exemplifies how MEWpy may be used in strain optimization tasks.
Our goal is to increase the production of succinate in E. coli under anaerobic conditions.

In [None]:
%matplotlib inline
from cobra.io.sbml import read_sbml_model
model = read_sbml_model("models/ec/e_coli_core.xml.gz")

Analyse the model's default medium:

In [None]:
from mewpy.simulation.environment import Environment
import pandas as pd

medium = Environment.from_model(model).get_compounds()
env = Environment.from_compounds(medium)
pd.DataFrame(env,index=['lb','ub']).T


Commercial solvers are better suited to optimization tasks such as strain design. You might therefore want to check which linear programming solvers are available on your system:

In [None]:
from mewpy.simulation import solvers
solvers

## Optimization problem

Before running any optimization, let us verify the production rate as well as the theoretical maximum rate of succinate under anaerobic conditions.
To make the script easier to reuse, we define some reaction identifiers as constants:

In [None]:
BIOMASS = 'BIOMASS_Ecoli_core_w_GAM'
PRODUCT = 'EX_succ_e'
O2 = 'EX_o2_e'
GLC = 'EX_glc__D_e'

In [None]:
anaerobic = {O2: (0, 0)}

In [None]:
from mewpy.simulation import get_simulator
sim = get_simulator(model,envcond=anaerobic)
sim.FVA(reactions=[PRODUCT],format='df')

In [None]:
sim.simulate(method='pFBA').find([BIOMASS,PRODUCT])

The optimization problem requires the definition of one or more objective functions.
In the next example, we define two objectives: the maximization of the Biomass-Product Coupled Yield (BPCY) and the maximization of the flux of the targeted product. MEWpy makes other optimization objectives available; please refer to the documentation.

In [None]:
from mewpy.optimization.evaluation import BPCY, TargetFlux

objs = [ 
    BPCY(BIOMASS, PRODUCT),
    TargetFlux(PRODUCT)
]
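
To make the first objective concrete, here is a minimal sketch of the usual BPCY definition: the biomass flux multiplied by the product flux, optionally divided by the substrate uptake rate. The function `bpcy` and the flux values below are hypothetical and only illustrate the arithmetic; MEWpy's own `BPCY` class may differ in its details.

```python
def bpcy(fluxes, biomass, product, uptake=None):
    """Biomass-Product Coupled Yield from a dict of reaction fluxes.

    Illustrative helper, not part of MEWpy.
    """
    value = fluxes[biomass] * fluxes[product]
    if uptake is not None:
        substrate = abs(fluxes[uptake])  # uptake fluxes are negative by convention
        if substrate == 0:
            return 0.0  # no substrate consumed: define the yield as zero
        value /= substrate
    return value


# Hypothetical flux distribution, for illustration only
fluxes = {
    "BIOMASS_Ecoli_core_w_GAM": 0.2,
    "EX_succ_e": 5.0,
    "EX_glc__D_e": -10.0,
}
score = bpcy(fluxes, "BIOMASS_Ecoli_core_w_GAM", "EX_succ_e", uptake="EX_glc__D_e")
```

A solution that produces succinate but cannot grow (or grows without producing) scores zero, which is why BPCY favours growth-coupled designs.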

We can now define the growth medium and select the type of modification.
We will begin with gene deletions (GKO) by selecting the appropriate problem class, `GKOProblem`.

In [None]:
from mewpy.problems import GKOProblem, GOUProblem

problem = GKOProblem(model, objs, envcond=anaerobic, candidate_max_size=6)

Other optimization methods/strategies may be used:
* Reaction deletion: *RKOProblem*
* Reaction under or over expression: *ROUProblem*
* Gene under or over expression: *GOUProblem*

A problem may include other parameters, such as the maximum number of genetic modifications, a list of specific targets, or a list of non-targets (reactions/genes not to be modified).

## Optimization algorithm

An optimization engine needs to be instantiated to solve the problem. MEWpy uses Evolutionary Algorithms (EAs) for this task.

EAs are algorithms that mimic the Darwinian evolutionary process, where a population of solutions evolves generation after generation. In the example we define a maximum of 50 generations.  

In [None]:
from mewpy.optimization import EA
ea = EA(problem, max_generations=50)
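
The generational loop described above can be sketched in plain Python. This is a toy, single-objective, mutation-only EA with elitist survivor selection, not MEWpy's implementation; the names `evolve`, `targets`, and the `gains` table are hypothetical and stand in for a real problem and its objectives.

```python
import random


def evolve(targets, fitness, max_size=3, pop_size=20, max_generations=50, seed=0):
    """Evolve sets of modification targets towards higher fitness."""
    rng = random.Random(seed)

    def random_candidate():
        size = rng.randint(1, max_size)
        return frozenset(rng.sample(targets, size))

    def mutate(candidate):
        genes = set(candidate)
        # Swap one target when the candidate is full (or half of the time otherwise)
        if len(genes) >= max_size or rng.random() < 0.5:
            genes.remove(rng.choice(sorted(genes)))
        genes.add(rng.choice(targets))
        return frozenset(genes)

    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(max_generations):
        offspring = [mutate(rng.choice(population)) for _ in range(pop_size)]
        # Remove duplicates (keeping order), then keep the fittest candidates
        merged = list(dict.fromkeys(population + offspring))
        population = sorted(merged, key=fitness, reverse=True)[:pop_size]
    return population[0]


# Hypothetical per-target gains standing in for a simulated objective
gains = {"r1": 3.0, "r2": -1.0, "r3": 2.0, "r4": 0.5, "r5": 4.0, "r6": -2.0}
targets = sorted(gains)


def toy_fitness(candidate):
    return sum(gains[t] for t in candidate)


best = evolve(targets, toy_fitness)
```

In a real run the fitness evaluation is a phenotype simulation of each candidate, which is why the number of generations dominates the running time.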

For information on additional options, you may use the `help` command:

In [None]:
help(EA)

The EA accepts additional parameters, such as the choice of a specific algorithm (e.g. SPEA2, NSGAII, NSGAIII, GDE3), multiprocessing options, and initial seeding.

To start the optimization process invoke the run method:

In [None]:
solutions = ea.run()

We can now list the set of solutions:

In [None]:
df=ea.dataframe()
df

Or view the best solutions in the objective space, the so-called Pareto front:

In [None]:
ea.plot()
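
For readers unfamiliar with the term, the Pareto front is the set of solutions that no other solution dominates, that is, no other solution is at least as good on every objective and strictly better on at least one. A minimal sketch for maximization objectives follows; the helper names and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (BPCY, target flux) pairs, for illustration only
points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(points)
```

Each point on the front represents a different trade-off between the objectives, so choosing among them is a design decision rather than a computation.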

We can save the solutions to file:

In [None]:
df.to_csv('solutions.csv')

We can also perform additional analyses on the solutions by retrieving the simulator used during the optimization.
 

## Working with solutions

Let us select the first solution:

In [None]:
solution = solutions[0]
solution

The solution is converted to metabolic constraints to be applied to the model. We can access these constraints using *solution.constraints*:

In [None]:
solution.constraints
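
As an illustration of how gene deletions can become reaction constraints, the sketch below evaluates gene-protein-reaction (GPR) rules as Boolean expressions and blocks every reaction whose rule can no longer be satisfied. The tuple-based GPR encoding, the helper names, and the gene and reaction identifiers are all hypothetical; the exact format of MEWpy's constraints may differ.

```python
def gpr_active(gpr, deleted):
    """Evaluate a GPR tree: a gene id (str) or an (operator, children) tuple."""
    if isinstance(gpr, str):
        return gpr not in deleted
    operator, children = gpr
    results = [gpr_active(child, deleted) for child in children]
    # 'and' models enzyme complexes, 'or' models isozymes
    return all(results) if operator == "and" else any(results)


def deletions_to_constraints(gprs, deleted):
    """Fix the flux of every reaction with an unsatisfied GPR to zero."""
    return {rxn: (0, 0) for rxn, gpr in gprs.items() if not gpr_active(gpr, deleted)}


# Hypothetical GPR rules, for illustration only
gprs = {
    "R1": ("and", ["g1", "g2"]),  # complex: needs both genes
    "R2": ("or", ["g1", "g3"]),   # isozymes: either gene suffices
    "R3": "g3",
}
single = deletions_to_constraints(gprs, {"g1"})
double = deletions_to_constraints(gprs, {"g1", "g3"})
```

Deleting `g1` alone blocks only `R1`, because `R2` still has the isozyme encoded by `g3`.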

We may run phenotypic simulations with the solution using those constraints. Note that there is no need to redefine the medium, as the environmental conditions persist in the simulator instance.

In [None]:
sim = problem.simulator
res=sim.simulate(constraints=solution.constraints,method='ROOM')
res

In [None]:
res.dataframe

In [None]:
res.find([PRODUCT,BIOMASS])

In [None]:
from mewpy.visualization.envelope import plot_flux_envelope

plot_flux_envelope(sim,BIOMASS,PRODUCT,constraints = solution.constraints)

## Exercise 1

Alter the notebook to run a gene over/under-expression (GOUProblem) optimization task. You may also try other optimization objectives (replacing or adding new objectives) such as `CandidateSize`, `ModificationType` or `BPCY_FVA`.

## Exercise 2

Alter the notebook to find possible genetic modifications for the increased production of ethanol (`EX_etoh_e`).

# Simulating user-defined modifications

Genetic modifications at the gene, enzyme, transcription or regulatory levels need to be translated to the (pseudo) reaction level. This task is problem dependent and consequently requires the instantiation of a problem. If we do not intend to run any optimization task, there is no need to define optimization objectives.

In [None]:
problem = GOUProblem(model,[], envcond=anaerobic)
sim = problem.simulator

A solution for a GOUProblem is a dictionary mapping genes to expression fold changes, e.g.:

In [None]:
solution = {'b3956': 8, 'b2914': 4}

You may get information on the genes using the simulator's find functions:

In [None]:
sim.find_genes('b3956|b2914')

The gene expressions are converted by the problem into reaction constraints:

In [None]:
constraints = problem.solution_to_constraints(solution)
constraints
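
One common way to perform this kind of conversion is to propagate gene folds through the GPR rules (minimum for `and`, maximum for `or`) and then scale a reference flux by the resulting reaction fold. The sketch below follows that scheme under stated assumptions: the GPR encoding, the helper names, the default bound `UPPER`, and the identifiers are hypothetical, and MEWpy's actual conversion may differ.

```python
UPPER = 1000.0  # assumed default flux bound


def reaction_fold(gpr, gene_folds):
    """Propagate gene folds through a GPR tree (gene id or (operator, children))."""
    if isinstance(gpr, str):
        return gene_folds.get(gpr, 1.0)  # unmodified genes keep a fold of 1
    operator, children = gpr
    values = [reaction_fold(child, gene_folds) for child in children]
    # A complex is limited by its scarcest subunit; isozymes by the most abundant
    return min(values) if operator == "and" else max(values)


def fold_to_bounds(reference_flux, fold):
    """Turn a reaction fold into (lower, upper) flux bounds; None means unchanged."""
    if fold == 1:
        return None
    if fold == 0:
        return (0.0, 0.0)  # deletion
    target = reference_flux * fold
    if fold > 1:  # over-expression: force at least the scaled flux
        return (target, UPPER) if reference_flux >= 0 else (-UPPER, target)
    # under-expression: cap the flux at the scaled value
    return (0.0, target) if reference_flux >= 0 else (target, 0.0)


# Hypothetical genes and reference flux, for illustration only
folds = {"g1": 8, "g2": 0.5}
complex_fold = reaction_fold(("and", ["g1", "g2"]), folds)
isozyme_fold = reaction_fold(("or", ["g1", "g3"]), folds)
bounds = fold_to_bounds(2.0, isozyme_fold)
```

The reference flux is typically taken from a wild-type simulation, so the resulting bounds depend on the environmental conditions used for that simulation.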

These constraints are then used to run phenotypic simulations:

In [None]:
sim.simulate(constraints=constraints).find()

This functionality makes it possible, for example, to simulate the same genetic modification with different simulation methods or on other GEMs.