# MEWpy Community Modeling

Author: Vitor Pereira, inspired by the work of Daniel Machado.

License: [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)

-------

In this tutorial:

- You will learn how to perform flux balance analysis of microbial communities
using a model of the [central carbon metabolism of *E. coli*](https://journals.asm.org/doi/10.1128/ecosalplus.10.2.1).


## Install requirements 
To run this notebook, we first need to install the required packages:

In [1]:
! pip install -U -q mewpy cplex escher

ERROR: Could not find a version that satisfies the requirement cplex (from versions: none)
ERROR: No matching distribution found for cplex

Verify the installation:

In [2]:
import mewpy
mewpy.info()

MEWpy version: 0.1.29
Author: BiSBII CEB University of Minho
Contact: vpereira@ceb.uminho.pt 

Available LP solvers: gurobi glpk
Default LP solver: gurobi 

Available ODE solvers: scipy
Default ODE solver: scipy 

Optimization Problems: AbstractKOProblem AbstractOUProblem CofactorSwapProblem CommunityKOProblem ETFLGKOProblem ETFLGOUProblem GKOProblem GOUProblem GeckoKOProblem GeckoOUProblem KcatOptProblem KineticKOProblem KineticOUProblem MediumProblem OptORFProblem OptRamProblem RKOProblem ROUProblem 

Available EA engines: inspyred jmetal
Default EA engine: jmetal
Available EAs: GA NSGAII NSGAIII SA SPEA2 



IMPORTANT: The notebooks require a MEWpy version >= 0.1.26

### Run in Google Colab

If you are running this notebook in Colab, perform the following steps; otherwise, skip them.

In [3]:
%%bash
[[ ! -e /colabtools ]] && exit
pip install -U -q PyDrive

In [4]:
if 'google.colab' in str(get_ipython()):
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    from google.colab import auth
    from oauth2client.client import GoogleCredentials

    auth.authenticate_user()
    gauth = GoogleAuth()
    gauth.credentials = GoogleCredentials.get_application_default()
    drive = GoogleDrive(gauth)

    model_file = drive.CreateFile({'id':'1o0XthuEOs28UJ4XTa9SfFSFofazV-2nN'})
    model_file.GetContentFile('e_coli_core.xml.gz')

## Setting up a community

We will create a synthetic microbial consortium of two *E. coli* mutants growing in minimal medium. In one mutant we will knock out the glucose transporter, and in the other we will knock out the ammonium transporter.

In [5]:
from cobra.io import read_sbml_model
from mewpy import get_simulator

model = read_sbml_model('models/ec/e_coli_core.xml.gz')
wildtype = get_simulator(model)
solution = wildtype.simulate()
print(solution)
solution.find('EX')

Set parameter Username
Academic license - for non-commercial use only - expires 2023-10-30
objective: 0.8739215069684301
Status: OPTIMAL
Method:SimulationMethod.FBA


Reaction ID,Flux rate
EX_co2_e,22.809833
EX_glc__D_e,-10.0
EX_h_e,17.530865
EX_h2o_e,29.175827
EX_nh4_e,-4.765319
EX_o2_e,-21.799493
EX_pi_e,-3.214895


Now we create our two mutants (`glc_ko` and `nh4_ko`):

In [6]:
glc_ko = wildtype.copy()
glc_ko.id = 'glc_ko'
glc_ko.set_reaction_bounds('GLCpts', 0, 0)

Read LP format model from file /var/folders/fw/kbs61_l15j587pjbwf3_y8780000gn/T/tmpmtknqeks.lp
Reading time = 0.00 seconds
: 72 rows, 190 columns, 720 nonzeros


In [7]:
nh4_ko = wildtype.copy()
nh4_ko.id = 'nh4_ko'
nh4_ko.set_reaction_bounds('NH4t', 0, 0)

Read LP format model from file /var/folders/fw/kbs61_l15j587pjbwf3_y8780000gn/T/tmppn2_q41w.lp
Reading time = 0.00 seconds
: 72 rows, 190 columns, 720 nonzeros


## Comparing models

Community models require that metabolites have the same identifiers across all models. MEWpy offers some functions to that end, computing the overlap in metabolites, reactions, and uptakes among a list of models.

In [8]:
from mewpy.cobra.com import *
mets, rxns, over = jaccard_similarity_matrices([glc_ko, nh4_ko])

In [9]:
mets

,glc_ko,nh4_ko
glc_ko,1.0,1.0
nh4_ko,1.0,1.0


In [10]:
rxns

,glc_ko,nh4_ko
glc_ko,1.0,0.978947
nh4_ko,0.978947,1.0


In [11]:
over

,glc_ko,nh4_ko
glc_ko,1.0,1.0
nh4_ko,1.0,1.0
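
The scores above are Jaccard indices, that is, the size of the intersection of two sets divided by the size of their union. The sketch below is a plain-Python illustration, not MEWpy's implementation. It rests on one assumption that happens to reproduce the reaction score of 0.978947: the core model has 95 reactions and each mutant counts only its 94 unblocked reactions, so the two sets share 93 of 95. The placeholder reaction names (`R0`, `R1`, ...) are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


# Assumption: 95 reactions in total, 93 placeholders plus the two transporters.
all_rxns = {f"R{i}" for i in range(93)} | {"GLCpts", "NH4t"}

# Each mutant loses the reaction that was blocked with bounds (0, 0).
glc_ko_rxns = all_rxns - {"GLCpts"}
nh4_ko_rxns = all_rxns - {"NH4t"}

score = jaccard(glc_ko_rxns, nh4_ko_rxns)
print(round(score, 6))
```

Under this reading, a score of 1.0 for metabolites and uptakes simply means both mutants expose identical sets of identifiers, which is what the community construction needs.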


## Building communities

**MEWpy** has some basic functionality for working with microbial communities. One is the `CommunityModel` class, which creates a microbial community from a list of models of individual species:

In [12]:
from mewpy.model import CommunityModel
community = CommunityModel([glc_ko, nh4_ko],flavor='cobra')

In [13]:
sim = community.get_community_model()

This community model ignores the environmental conditions that were specified in the original models (since these could be very different). 

To make our life easier, we will extract the nutrient composition specified in the wild-type model to use later.

In [14]:
from mewpy.simulation import Environment
M9 = Environment.from_model(wildtype)
M9

EX_ac_e	0.0	1000.0
EX_acald_e	0.0	1000.0
EX_akg_e	0.0	1000.0
EX_co2_e	-1000.0	1000.0
EX_etoh_e	0.0	1000.0
EX_for_e	0.0	1000.0
EX_fru_e	0.0	1000.0
EX_fum_e	0.0	1000.0
EX_glc__D_e	-10.0	1000.0
EX_gln__L_e	0.0	1000.0
EX_glu__L_e	0.0	1000.0
EX_h_e	-1000.0	1000.0
EX_h2o_e	-1000.0	1000.0
EX_lac__D_e	0.0	1000.0
EX_mal__L_e	0.0	1000.0
EX_nh4_e	-1000.0	1000.0
EX_o2_e	-1000.0	1000.0
EX_pi_e	-1000.0	1000.0
EX_pyr_e	0.0	1000.0
EX_succ_e	0.0	1000.0
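
To read this table, recall the usual COBRA sign convention: a negative lower bound on an exchange reaction allows uptake, while a lower bound of zero allows secretion only. The sketch below copies the bounds from the table into a plain dictionary (it does not use the `Environment` class) and lists which nutrients the medium supplies.

```python
# Bounds copied from the M9 table above: reaction -> (lower, upper).
m9_bounds = {
    "EX_ac_e": (0.0, 1000.0),
    "EX_acald_e": (0.0, 1000.0),
    "EX_akg_e": (0.0, 1000.0),
    "EX_co2_e": (-1000.0, 1000.0),
    "EX_etoh_e": (0.0, 1000.0),
    "EX_for_e": (0.0, 1000.0),
    "EX_fru_e": (0.0, 1000.0),
    "EX_fum_e": (0.0, 1000.0),
    "EX_glc__D_e": (-10.0, 1000.0),
    "EX_gln__L_e": (0.0, 1000.0),
    "EX_glu__L_e": (0.0, 1000.0),
    "EX_h_e": (-1000.0, 1000.0),
    "EX_h2o_e": (-1000.0, 1000.0),
    "EX_lac__D_e": (0.0, 1000.0),
    "EX_mal__L_e": (0.0, 1000.0),
    "EX_nh4_e": (-1000.0, 1000.0),
    "EX_o2_e": (-1000.0, 1000.0),
    "EX_pi_e": (-1000.0, 1000.0),
    "EX_pyr_e": (0.0, 1000.0),
    "EX_succ_e": (0.0, 1000.0),
}

# A negative lower bound means the compound can be taken up;
# store the maximum uptake rate as a positive number.
uptakes = {rxn: -lb for rxn, (lb, ub) in m9_bounds.items() if lb < 0}

for rxn, rate in sorted(uptakes.items()):
    print(f"{rxn}: max uptake {rate}")
```

Glucose is the only nutrient with a tight uptake limit (10); the other available compounds are bounded at 1000, which is effectively unlimited in this model.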

## Simulation using FBA

A very simple way to simulate a microbial community is to merge the individual models into a single model that mimics a "super organism", where each microbe lives inside its own compartment, and run a (conventional) FBA simulation for this *super organism*.

In [15]:
solution = sim.simulate(constraints=M9)

print(solution)
solution.find('EX')

objective: 0.8311955501858121
Status: OPTIMAL
Method:SimulationMethod.FBA


Reaction ID,Flux rate
EX_glc__D_e,-10.0
EX_h2o_e,30.682819
EX_h_e,16.673783
EX_nh4_e,-4.532343
EX_o2_e,-23.667113
EX_pi_e,-3.057719
EX_co2_e,24.628058


We can see that the model predicts a growth rate (total biomass per hour) similar to the wild-type, with an efficient consumption of glucose and ammonia that results in respiratory metabolism.

But what is each organism doing, and are both organisms actually growing at the same rate?

Let's print the biomass flux for each organism:

In [16]:
solution.find('BIOMASS', sort=True,show_nulls=True)

Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.0
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.831196


We can also list the metabolites of the community model, followed by the non-null fluxes of one organism (here `nh4_ko`):

In [17]:
sim.find_metabolites()

id,name,compartment,formula
community_biomass,Total community biomass,e,
glc__D_e,D-Glucose,e,C6H12O6
gln__L_c_glc_ko,L-Glutamine,c_glc_ko,C5H10N2O3
gln__L_e,L-Glutamine,e,C5H10N2O3
glu__L_c_glc_ko,L-Glutamate,c_glc_ko,C5H8NO4
...,...,...,...
fdp_c_nh4_ko,"D-Fructose 1,6-bisphosphate",c_nh4_ko,C6H10O12P2
for_c_nh4_ko,Formate,c_nh4_ko,CH1O2
fum_c_nh4_ko,Fumarate,c_nh4_ko,C4H2O4
g3p_c_nh4_ko,Glyceraldehyde 3-phosphate,c_nh4_ko,C3H5O6P


In [19]:
solution.find('nh4_ko')

Reaction ID,Flux rate
ACKr_nh4_ko,-1.810642
ACONTa_nh4_ko,2.354571
ACONTb_nh4_ko,2.354571
ACt2r_nh4_ko,-1.810642
AKGDH_nh4_ko,1.457794
AKGt2r_nh4_ko,-4.532343
ATPM_nh4_ko,8.39
ATPS4r_nh4_ko,44.074005
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.831196
CO2t_nh4_ko,-12.84167


In fact, only one of the organisms is growing. The other has an active metabolism (it exchanges metabolites with the environment and with the other organism) and acts as a bioconverter, but none of its flux is used for growth.

> Do you think this would be a stable consortium?

## Community Simulation with SteadyCom

**SteadyCom** by [Chan et al. (2017)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005539) is a community simulation method that accounts for the fact that, to reach a stable composition, the organisms need to grow at the same *specific growth rate* (1/h). This means that the *absolute growth rate* (gDW/h) of each organism is proportional to its *abundance* at steady state (gDW).

Let's simulate the same community using SteadyCom:

In [20]:
solution = SteadyCom(community, constraints=M9)

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


In this case the solution object shows the overall community growth rate and the relative abundance of each species:

In [21]:
solution

Community growth: 0.873046875
glc_ko	0.019267933674315795
nh4_ko	0.9807320663256842
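
The relation between specific and absolute growth can be checked by hand with the numbers above. This is a small arithmetic sketch, not a MEWpy call; it assumes the reported abundances are fractions of the total community biomass, so the absolute rates are expressed per unit of total biomass.

```python
# Values copied from the SteadyCom output above.
mu = 0.873046875  # community (specific) growth rate, 1/h
abundance = {
    "glc_ko": 0.019267933674315795,
    "nh4_ko": 0.9807320663256842,
}

# Absolute growth of each organism = specific growth rate * abundance.
absolute_growth = {org: mu * x for org, x in abundance.items()}

for org, rate in absolute_growth.items():
    # Dividing back by the abundance recovers the same specific rate for both.
    print(f"{org}: abundance={abundance[org]:.4f}, "
          f"absolute={rate:.4f}, specific={rate / abundance[org]:.4f}")
```

Both organisms share the same specific growth rate, which is what keeps the composition stable, even though `nh4_ko` accounts for almost all of the biomass produced.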

The `solution` object for community simulations implements a few additional features, such as enumerating all the cross-feeding interactions:

In [22]:
solution.cross_feeding(as_df=True).dropna().sort_values('rate', ascending=False)

,donor,receiver,compound,rate
4,nh4_ko,glc_ko,h_e,24.191611
14,glc_ko,nh4_ko,acald_e,19.267934
13,nh4_ko,glc_ko,ac_e,15.019305
6,nh4_ko,glc_ko,lac__D_e,7.13333
15,nh4_ko,glc_ko,akg_e,4.717029
1,glc_ko,nh4_ko,glu__L_e,4.668824
12,nh4_ko,glc_ko,pyr_e,2.280013
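
To make the direction of the exchanges easier to read, the rows of this table can be grouped by donor and receiver. The sketch below uses a plain list of tuples copied from the output above rather than the DataFrame returned by `cross_feeding`.

```python
from collections import defaultdict

# Rows copied from the cross-feeding table: (donor, receiver, compound, rate).
rows = [
    ("nh4_ko", "glc_ko", "h_e", 24.191611),
    ("glc_ko", "nh4_ko", "acald_e", 19.267934),
    ("nh4_ko", "glc_ko", "ac_e", 15.019305),
    ("nh4_ko", "glc_ko", "lac__D_e", 7.13333),
    ("nh4_ko", "glc_ko", "akg_e", 4.717029),
    ("glc_ko", "nh4_ko", "glu__L_e", 4.668824),
    ("nh4_ko", "glc_ko", "pyr_e", 2.280013),
]

# Group the exchanged compounds by direction of transfer.
flows = defaultdict(list)
for donor, receiver, compound, rate in rows:
    flows[(donor, receiver)].append(compound)

for (donor, receiver), compounds in flows.items():
    print(f"{donor} -> {receiver}: {', '.join(sorted(compounds))}")
```

One plausible reading of the grouping: `nh4_ko`, which can still import glucose, passes carbon compounds (acetate, lactate, 2-oxoglutarate, pyruvate) to `glc_ko`, while `glc_ko`, which can still import ammonium, returns nitrogen in the form of glutamate.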


We can plot the fluxes of each mutant in a map to help with interpretation of the results:

In [23]:
from mewpy.visualization.escher import build_escher
if 'google.colab' in str(get_ipython()):
    from google.colab import output
    output.enable_custom_widget_manager()

build_escher(fluxes=solution.internal['glc_ko'])

Downloading Map from https://escher.github.io/1-0-0/6/maps/Escherichia%20coli/e_coli_core.Core%20metabolism.json


Builder(reaction_data={'ACALD': -19.267933674315827, 'ACALDt': -19.267933674315827, 'ACKr': 15.019304841008928…

In [24]:
build_escher(fluxes=solution.internal['nh4_ko'])

Downloading Map from https://escher.github.io/1-0-0/6/maps/Escherichia%20coli/e_coli_core.Core%20metabolism.json


Builder(reaction_data={'ACALD': 19.267933674315827, 'ACALDt': 19.267933674315827, 'ACKr': -15.019304841008928,…

## Explore alternative solutions

Unfortunately, one limitation of **SteadyCom**, illustrated by [Chan et al. (2017)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005539) in their Figure 3, is the variability in the solution space when the community is not growing at the maximum (theoretical) growth rate.

> Would you expect a synthetic community to grow at its maximum growth rate?

**MEWpy** implements a variability analysis function for the SteadyCom solution space. Let's see what happens if the community is growing at 90% of the theoretical maximum:

In [25]:
from mewpy.cobra.com import SteadyComVA
variability = SteadyComVA(community, obj_frac=0.9, constraints=M9)

print('Strain\tMin\tMax')
for strain, (lower, upper) in variability.items():
    print(f'{strain}\t{lower:.1%}\t{upper:.1%}')

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
Strain	Min	Max
glc_ko	0.4%	98.3%
nh4_ko	1.7%	99.6%
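
In a two-member community the abundances sum to one, so the two ranges should mirror each other: the minimum of one strain corresponds to the maximum of the other. The sketch below checks this with the values printed above, which are rounded to 0.1%, so the comparison is only approximate.

```python
# Abundance ranges copied from the output above, as fractions.
ranges = {
    "glc_ko": (0.004, 0.983),
    "nh4_ko": (0.017, 0.996),
}

# Width of the feasible abundance interval for each strain.
widths = {org: hi - lo for org, (lo, hi) in ranges.items()}

# With two members, min(glc_ko) + max(nh4_ko) should be close to 1.
mirror_gap = abs(ranges["glc_ko"][0] + ranges["nh4_ko"][1] - 1.0)

for org, width in widths.items():
    print(f"{org}: feasible abundance spans {width:.1%} of the community")
print(f"mirror check (should be close to 0): {mirror_gap:.3f}")
```

Both intervals span almost the whole community, which is the numerical expression of the uncertainty discussed next.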


As you can see, there is a really large variability in this solution space. This means that we know in theory the two mutants **can** cooperate and survive in minimal media, but there is still a lot of uncertainty with regard to **how** they will achieve a stable consortium.

> How do you think we can reduce this uncertainty?

First, let's set the environmental conditions:

In [26]:
sim.set_environmental_conditions(M9)

We may now impose constraints on the growth of each organism, for example requiring that each organism grows at a rate of at least 0.1/h:

In [27]:
constraints={community.organisms_biomass['nh4_ko']:(0.1,1000), 
             community.organisms_biomass['glc_ko']:(0.1,1000)}
solution = sim.simulate(constraints=constraints)
solution

objective: 0.828309078247319
Status: OPTIMAL
Method:SimulationMethod.FBA

In [28]:
solution.find('BIOMASS')

Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.1
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.728309
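
For communities with more members, writing the constraint dictionary by hand becomes tedious. The helper below is a hypothetical convenience function, not part of MEWpy. It assumes only what the cell above shows: `community.organisms_biomass` maps organism ids to biomass reaction ids, and `simulate` accepts a dictionary of `(lower, upper)` bounds. A plain dictionary stands in for `community.organisms_biomass` so the sketch runs on its own.

```python
def min_growth_constraints(organisms_biomass, min_growth=0.1, upper=1000):
    """Build {biomass_reaction_id: (min_growth, upper)} for every organism.

    Hypothetical helper: `organisms_biomass` is any mapping from organism id
    to biomass reaction id, such as `community.organisms_biomass`.
    """
    return {rxn: (min_growth, upper) for rxn in organisms_biomass.values()}


# Stand-in for community.organisms_biomass, using the ids shown above.
organisms_biomass = {
    "glc_ko": "BIOMASS_Ecoli_core_w_GAM_glc_ko",
    "nh4_ko": "BIOMASS_Ecoli_core_w_GAM_nh4_ko",
}

constraints = min_growth_constraints(organisms_biomass, min_growth=0.1)
print(constraints)

# With the real objects this would be used as:
#   sim.simulate(constraints=min_growth_constraints(community.organisms_biomass))
```

Note that in the result above `glc_ko` grows at exactly 0.1/h: the minimum-growth constraint is binding, and the optimizer still gives all remaining resources to `nh4_ko`.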


Alternatively, we might choose to impose relative growth rates for each of the organisms:

In [29]:
community = CommunityModel([glc_ko, nh4_ko],
                           add_compartments=True,
                           merge_biomasses=True,
                           flavor='cobra')

In [30]:
sim = community.get_community_model()
sim.set_environmental_conditions(M9)

In [31]:
solution = sim.simulate()
print(solution)
solution.find('BIOMASS')

objective: 0.40757209363986213
Status: OPTIMAL
Method:SimulationMethod.FBA


Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.407572
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.407572


In [32]:
sim.find(community.biomass)

id,name,lb,ub,stoichiometry,gpr,annotations
community_growth,Community growth rate,0,inf,"{'Biomass_glc_ko': -1, 'Biomass_nh4_ko': -1}",,{}


By default, the relative abundances (relative growth rates) are equal. We can, however, change these ratios:

In [33]:
community.set_abundance({'glc_ko':1,'nh4_ko':2.5})
sim.simulate().find('BIOMASS')

Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.235022
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.587554
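
A quick way to confirm that the abundance weights were applied is to compare the ratio of the two biomass fluxes with the ratio of the weights. The sketch below uses the numbers printed above. Normalising the weights to fractions is shown for illustration only; whether MEWpy normalises them internally is not stated here.

```python
# Weights passed to set_abundance and fluxes copied from the output above.
weights = {"glc_ko": 1, "nh4_ko": 2.5}
fluxes = {"glc_ko": 0.235022, "nh4_ko": 0.587554}

# Weights expressed as fractions of the community (illustrative only).
total = sum(weights.values())
fractions = {org: w / total for org, w in weights.items()}

weight_ratio = weights["nh4_ko"] / weights["glc_ko"]
flux_ratio = fluxes["nh4_ko"] / fluxes["glc_ko"]

print(f"fractions: { {org: round(f, 4) for org, f in fractions.items()} }")
print(f"weight ratio: {weight_ratio:.3f}, biomass flux ratio: {flux_ratio:.3f}")
```

The two ratios agree to within the rounding of the printed fluxes, so the biomass fluxes follow the requested 1 : 2.5 proportion.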


## SMETANA

**SMETANA** implements several algorithms to analyse cross-feeding interactions in microbial communities. These are described in [Zelezniak et al., PNAS (2015)](https://www.pnas.org/doi/abs/10.1073/pnas.1421834112). Please read the paper for a more detailed explanation.

SCS (species coupling score): measures how much one species depends on the presence of the others to survive.

In [34]:
sc_score(community)

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


                     value
attribute                 
glc_ko     {'nh4_ko': 1.0}
nh4_ko     {'glc_ko': 1.0}

MUS (metabolite uptake score): measures how frequently a species needs to take up a metabolite to survive.

In [35]:
MUS = mu_score(community)
MUS

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


                                                       value
attribute                                                   
glc_ko     {'ac_e': 0.02, 'acald_e': 0.35, 'akg_e': 0.23,...
nh4_ko     {'ac_e': 0.0, 'acald_e': 0.0, 'akg_e': 0.0, 'c...

In [36]:
MUS.glc_ko

{'ac_e': 0.02,
 'acald_e': 0.35,
 'akg_e': 0.23,
 'co2_e': 0.0,
 'etoh_e': 0.17,
 'for_e': 0.0,
 'fru_e': 0.0,
 'fum_e': 0.0,
 'glc__D_e': 0.0,
 'gln__L_e': 0.0,
 'glu__L_e': 0.0,
 'h_e': 0.05,
 'h2o_e': 0.07,
 'lac__D_e': 0.24,
 'mal__L_e': 0.0,
 'nh4_e': 1.0,
 'o2_e': 0.93,
 'pi_e': 1.0,
 'pyr_e': 0.25,
 'succ_e': 0.08}

In [37]:
MUS.nh4_ko

{'ac_e': 0.0,
 'acald_e': 0.0,
 'akg_e': 0.0,
 'co2_e': 0.0,
 'etoh_e': 0.0,
 'for_e': 0.0,
 'fru_e': 0.0,
 'fum_e': 0.0,
 'glc__D_e': 1.0,
 'gln__L_e': 0.0,
 'glu__L_e': 1.0,
 'h_e': 0.0,
 'h2o_e': 0.0,
 'lac__D_e': 0.0,
 'mal__L_e': 0.0,
 'nh4_e': 0.0,
 'o2_e': 0.0,
 'pi_e': 1.0,
 'pyr_e': 0.0,
 'succ_e': 0.0}
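
A score such as `'o2_e': 0.93` can be read as a frequency: the fraction of alternative minimal uptake sets in which the metabolite appears. The sketch below illustrates only that counting step, with made-up uptake sets. It is not MEWpy's algorithm, which obtains the alternative sets by optimization, and the function name `uptake_frequency` is hypothetical.

```python
from collections import Counter


def uptake_frequency(solutions):
    """Fraction of solutions in which each metabolite is taken up."""
    counts = Counter(met for sol in solutions for met in set(sol))
    n = len(solutions)
    return {met: count / n for met, count in sorted(counts.items())}


# Made-up alternative minimal uptake sets for one species (illustration only).
solutions = [
    {"nh4_e", "pi_e", "o2_e", "acald_e"},
    {"nh4_e", "pi_e", "o2_e", "lac__D_e"},
    {"nh4_e", "pi_e", "o2_e", "pyr_e"},
    {"nh4_e", "pi_e", "akg_e"},
]

scores = uptake_frequency(solutions)
for met, score in scores.items():
    print(f"{met}: {score:.2f}")
```

Under this reading, a score of 1.0 marks a metabolite the species always has to take up (ammonium and phosphate for `glc_ko`; glucose, glutamate and phosphate for `nh4_ko`), while intermediate values mark interchangeable sources.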

MPS (metabolite production score): measures the ability of a species to produce a metabolite.

In [38]:
MPS = mp_score(community,environment=M9)
MPS

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


                                                       value
attribute                                                   
glc_ko     {'etoh_e': 1, 'for_e': 1, 'h2o_e': 1, 'pyr_e':...
nh4_ko     {'etoh_e': 1, 'for_e': 1, 'h2o_e': 1, 'pyr_e':...

MRO (metabolic resource overlap): calculates how much the species compete for the same metabolites.

In [39]:
score, MRO = mro_score(community,environment=M9)
print(score)
MRO

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
1.0


                                                              value
attribute                                                          
community_medium                                     {pi, gln, fru}
individual_media  {'glc_ko': {'pi', 'gln', 'fru'}, 'nh4_ko': {'p...

In [40]:
MRO.individual_media.glc_ko

{'fru', 'gln', 'pi'}

In [41]:
MRO.individual_media.nh4_ko

{'fru', 'gln', 'pi'}
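
The score of 1.0 follows from the two individual media being identical. The sketch below assumes a simplified definition of the overlap: the mean number of compounds shared by each pair of species, divided by the mean size of the individual media. This may differ in detail from what `mro_score` computes, and the function name `resource_overlap` is hypothetical.

```python
from itertools import combinations


def resource_overlap(individual_media):
    """Mean pairwise overlap of minimal media over mean medium size.

    Simplified, assumed definition for illustration only.
    """
    pairs = list(combinations(individual_media, 2))
    mean_shared = sum(
        len(individual_media[a] & individual_media[b]) for a, b in pairs
    ) / len(pairs)
    mean_size = sum(len(m) for m in individual_media.values()) / len(individual_media)
    return mean_shared / mean_size


# Individual minimal media copied from the output above.
media = {
    "glc_ko": {"fru", "gln", "pi"},
    "nh4_ko": {"fru", "gln", "pi"},
}

score = resource_overlap(media)
print(score)
```

With this definition, two species whose minimal media share nothing would score 0.0, and partial overlap gives a value in between.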