# Modeling-derived indices for ecosystem assessment


Assessing the functionality of a microbial community is crucial for monitoring the status of marine ecosystems. Community genome-scale metabolic models (cGEMs) can help us interrogate the functional potential of a microbial community and predict functional changes upon perturbations to community structure. In the following, we develop several indices for both purposes.


## Community genome-scale metabolic models

Community genome-scale metabolic models (cGEMs) are meta-models that integrate the individual genome-scale metabolic models (GEMs) of the species in a community. To this end, the individual models share the same external environment (with its compounds), which couples their flux spaces. The following toy community model depicts three interacting species, $A$, $B$ and $C$ (i.e., GEMs), and three shared external metabolites, $e_1$, $e_2$ and $e_3$. Individual GEMs interact through boundary reactions that export external metabolites into, or import them from, the shared environment. The community model is thus composed of the internal reactions of each individual model, the boundary reactions connecting the external metabolites to the internal reactions of each model, and the external compounds present in the environment. Note that an additional exchange reaction is added for each external compound to account for other processes that may affect its concentration, e.g., diffusion or degradation.

<div style="text-align:center; width:100%"><img src="images/toy_community_model.png" style="max-width:30%;"></div>


## Construction of the community stoichiometric matrix

Let's split each individual stoichiometric matrix into the part containing its internal reactions, $S^i_k$, and the part containing its boundary reactions, $S^b_k$, which connect external to internal metabolites. Additionally, since the pool of external metabolites is shared among models, we will extract the exchange reactions from each individual stoichiometric matrix and collect them into a single stoichiometric matrix, $E$. We can then represent $S^i$ as a block diagonal matrix, composed of the internal parts of each species' stoichiometric matrix:

$$
S^i = \begin{bmatrix}
S_1^i & 0 & \cdots & 0 \\
0 & S_2^i & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & S_k^i \\
\end{bmatrix}
$$

and construct a block matrix $M$ containing $S^i$ and $E$ (the latter holding all shared external reactions and metabolites):

$$
M = \begin{bmatrix}
S^i & 0 \\
0 & E \\
\end{bmatrix}.
$$

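Finally, we can construct the full community stoichiometric matrix $S_c$ by appending to $M$ the columns of $B$, which collects the boundary reactions $S^b_k$ of all species and thereby couples each species' internal metabolites to the shared external ones: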

$$
S_c = \begin{bmatrix}
M & B \\
\end{bmatrix}.
$$
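The assembly above can be sketched on a tiny, hypothetical two-species example. Everything here is an assumption made for illustration: the metabolites (`a1`, `b1`, `a2`, `b2`, `e1`, `e2`), the reactions, and the helpers `block_diag` and `hstack`, which are plain-Python stand-ins written only to make the block layout explicit. The row order is internal metabolites of species 1, internal metabolites of species 2, then external metabolites.

```python
def block_diag(*blocks):
    """Place matrices (lists of rows) along the diagonal, zeros elsewhere."""
    n_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            out.append([0] * offset + list(row) + [0] * (n_cols - offset - width))
        offset += width
    return out


def hstack(left, right):
    """Concatenate two matrices with the same number of rows column-wise."""
    assert len(left) == len(right), "row counts must match"
    return [l + r for l, r in zip(left, right)]


# Internal parts (hypothetical): species 1 converts a1 -> b1;
# species 2 converts a2 -> b2 and drains b2 (a biomass-like sink).
S1_int = [[-1], [1]]            # rows: a1, b1
S2_int = [[-1, 0], [1, -1]]     # rows: a2, b2

# One exchange reaction per shared external metabolite (rows: e1, e2).
E = [[-1, 0], [0, -1]]

# Boundary reactions over all rows (a1, b1, a2, b2, e1, e2):
# e1 -> a1 (uptake by 1), b1 -> e2 (secretion by 1), e2 -> a2 (uptake by 2).
B = [
    [1, 0, 0],
    [0, -1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [-1, 0, 0],
    [0, 1, -1],
]

S_int = block_diag(S1_int, S2_int)   # block-diagonal internal matrix S^i
M = block_diag(S_int, E)             # M = [[S^i, 0], [0, E]]
S_c = hstack(M, B)                   # S_c = [M  B]

for row in S_c:
    print(row)
```

In this toy, species 2 feeds on the `e2` secreted by species 1, so the only coupling between the two diagonal blocks comes from the columns of `B`.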


In [1]:
import cobra


cgem = cobra.io.read_sbml_model("../results/merged_community.xml")
cgem



| Attribute | Value |
|---|---|
| Name | merged_community |
| Memory address | 7f46d81943d0 |
| Number of metabolites | 11377 |
| Number of reactions | 17668 |
| Number of genes | 5363 |
| Number of groups | 0 |
| Objective expression | 1.0*community_growth - 1.0*community_growth_reverse_7473b |
| Compartments | extracellular environment, cytoplasm, periplasm, cytoplasm, periplasm, cytoplasm, periplasm, cytoplasm, periplasm, cytoplasm, periplasm, cytoplasm, periplasm, cytoplasm, periplasm |


## Metabolic Robustness Index I (MRI-I)

This index measures the capacity of a microbial community to adapt to perturbations in its composition, i.e., the removal of one or several community members. Adaptability is related to functional redundancy among community members: the more redundant the community, the more robust it is to perturbations in its composition. Here, by functional redundancy, we mean that the community, $C = \{M_1, \dots, M_k\}$, formed by a collection of interacting GEMs, is able to perform a set of fundamental metabolic functions that are required to sustain the ecosystem, for instance nitrogen fixation, primary carbon fixation or sulfate reduction. We can define the collection of fundamental ecosystem metabolic tasks as a set of reactions, $F = \{r_1, r_2, \dots, r_n\}$, which must always be present and potentially active in the community model. Additionally, to ensure that community members can still grow upon the perturbation, we will enforce that the biomass pseudo-reaction of every member of the community is able to carry flux, i.e., that the community is self-sufficient.

We could test robustness by eliminating each member of the community in turn and then testing whether the remaining community is able to sustain flux through all reactions in $F$ and still be self-sufficient under a given environment. However, by eliminating members one by one, we miss possible higher-order interactions, that is, submodules of two or more members that operate together to perform a function required for community growth. To account for this, we can follow a different approach, in which we count the number of subcommunities of $C$ that meet our two requirements, since this figure is related to the robustness of the community as a whole. At one end, we have a scenario where only the entire community is able to sustain all reactions in $F$ and be self-sufficient. In this case, the community has the lowest possible robustness, since the loss of even a single member would break our requirements. At the other end, every single species is able to meet our requirements by itself, thus interactions are not required for survival, and the robustness is at its maximum.

Thus, we can define the robustness of the community as

$$
MRI_I = \frac{1}{2^{|C|}} |C_F|,
$$

where $C_F$ is the set of all subcommunities of $C$ that meet our requirements, and $|C|$ is the number of species in the community. The index is normalized by the total number of subsets of $C$, so it lies between 0 and 1. As written, it equals $1/2^{|C|}$ when only the full community qualifies (the least robust case) and is bounded above by $(2^{|C|}-1)/2^{|C|}$, because the empty subcommunity can never qualify. These bounds approach 0 and 1, respectively, as $|C|$ grows.
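As a minimal sketch of the counting behind $MRI_I$, the snippet below enumerates every subcommunity of a hypothetical three-species toy. The dictionaries `NEEDS`, `MAKES` and `TASKS`, the task names, and the predicate `meets_requirements` are all assumptions introduced for illustration. In a real analysis the predicate would be replaced by an LP feasibility check on the corresponding sub-model.

```python
from itertools import combinations

# Hypothetical toy data: metabolites each species must import to grow,
# metabolites it releases, and the ecosystem tasks it can perform.
NEEDS = {"A": set(), "B": {"m1"}, "C": {"m2"}}
MAKES = {"A": {"m1"}, "B": {"m2"}, "C": set()}
TASKS = {
    "A": {"carbon_fixation"},
    "B": {"nitrogen_fixation"},
    "C": {"nitrogen_fixation"},
}
F = {"carbon_fixation", "nitrogen_fixation"}


def meets_requirements(subset):
    """Toy stand-in for the LP check: self-sufficiency and all tasks in F."""
    if not subset:
        return False  # the empty subcommunity never qualifies
    produced = set().union(*(MAKES[s] for s in subset))
    performed = set().union(*(TASKS[s] for s in subset))
    self_sufficient = all(NEEDS[s] <= produced for s in subset)
    return self_sufficient and F <= performed


def mri_1(species):
    """Fraction of all 2^|C| subsets that meet the requirements."""
    ordered = sorted(species)
    qualifying = [
        subset
        for size in range(len(ordered) + 1)
        for subset in combinations(ordered, size)
        if meets_requirements(subset)
    ]
    return len(qualifying) / 2 ** len(ordered), qualifying


index, qualifying = mri_1({"A", "B", "C"})
print(index, qualifying)
```

In this toy only the subcommunities `("A", "B")` and `("A", "B", "C")` qualify, so the index is $2/8 = 0.25$. The loop visits all $2^{|C|}$ subsets, so this approach is only practical for small communities.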

<div style="text-align:center; width:100%"><img src="images/subcommunities.png" style="max-width:30%;"></div>

## Metabolic Robustness Index II (MRI-II)

The approach above may be time-consuming, since it requires solving an LP for every possible subset. The number of subsets scales as $2^{|C|}$, which becomes infeasible for large communities. We can think of a simpler index that only requires solving a single MILP. Specifically, we can aim at finding the minimum subset size among all possible subsets, conditional on self-sufficiency. This size ranges between 1 and $|C|$, the size of the community. Values closer to $|C|$ indicate a larger number of metabolic interdependencies and, hence, more fragility upon perturbations in community composition. The index is normalized by $|C|$, so it ranges between $1/|C|$ and 1:

$$
MRI_{II} = \frac{1}{|C|} \min_{A \subseteq C} |A|.
$$

To find the minimum subset size, we can use a MILP formulation. Let $y_i$ be a binary variable that indicates whether species $i$ is in the subset $A$ or not. Thus we have 

$$
\min_{A \subseteq C} |A| = \min \sum_{i \in C} y_i,
$$

and the following MILP formulation can be used:

$$
\begin{aligned}
\min \quad & \sum_{i \in C} y_i \\
\text{s.t.} \quad
& S^i v^i = 0 \\
& v^i_{\min} \leq v^i \leq v^i_{\max} \\
& v^i_{bio} \geq y_i \delta_i \\
& y_i E^i_{x,\min} \leq E^i_x \leq y_i E^i_{x,\max} \\
& y_i \in \{0,1\} \\
& \forall i \in C
\end{aligned}
$$

where $S^i$ is the stoichiometric matrix of species $i$ in the community model, $v^i$ is its vector of fluxes, $v^i_{\min}$ and $v^i_{\max}$ are the lower and upper bounds on those fluxes, $v^i_{bio}$ is the biomass flux of species $i$, $\delta_i$ is a minimum biomass threshold for species $i$, and $E^i_x$ is the exchange flux of metabolite $x$ for species $i$, with bounds $E^i_{x,\min}$ and $E^i_{x,\max}$. The third constraint ensures that the biomass flux of every selected species is above a minimum threshold, thus guaranteeing self-sufficiency, while the fourth constraint forces exchange fluxes to zero for species that are not in the subset. Note that, unless some flux bound forces activity, $y_i = 0$ for all $i$ together with $v = 0$ satisfies every constraint, so an additional constraint (for example $\sum_{i \in C} y_i \geq 1$) is needed to exclude the empty subset.
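To make the objective concrete, the sketch below computes $MRI_{II}$ on a hypothetical three-species toy. It is not a MILP: it is a brute-force search over subsets in order of increasing size, which returns the same minimum that the MILP would return on a problem this small. Each candidate subset corresponds to one assignment of the binary variables $y_i$. The dictionaries `NEEDS` and `MAKES` and the predicate `is_self_sufficient` are assumptions standing in for the flux constraints.

```python
from itertools import combinations

# Hypothetical toy data: metabolites each species must import to grow,
# and metabolites it releases into the shared medium.
NEEDS = {"A": {"m2"}, "B": {"m1"}, "C": {"m1"}}
MAKES = {"A": {"m1"}, "B": {"m2"}, "C": set()}


def is_self_sufficient(subset):
    """Toy stand-in for the growth constraints: every member finds its needs."""
    produced = set().union(*(MAKES[s] for s in subset))
    return all(NEEDS[s] <= produced for s in subset)


def smallest_self_sufficient_subset(species):
    """Return the first self-sufficient subset found at the smallest size."""
    ordered = sorted(species)
    # Start at size 1: the empty subset (all y_i = 0) is excluded on purpose.
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            if is_self_sufficient(subset):
                return subset
    return None


community = {"A", "B", "C"}
best = smallest_self_sufficient_subset(community)
mri_2 = len(best) / len(community)
print(best, mri_2)
```

Here no single species can grow alone, while `A` and `B` cross-feed each other, so the minimum subset size is 2 and the index is $2/3$. For realistic community sizes the enumeration would be replaced by the MILP itself, solved with a mixed-integer solver.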

## Identifying keystone species

We could employ the two conditions defined above, i.e., performance of a minimal set of required community metabolic tasks (reactions), $F$, and self-sufficiency, i.e., positive growth of each community member, to find keystone species in the community. Keystone species would be those that are essential for the community to perform its required metabolic tasks and maintain self-sufficiency. Thus, we could remove each species, one at a time, and check whether the community is still self-sufficient and capable of performing all tasks in $F$. Alternatively, we could rank species by the fraction of tasks that are lost upon removal of each species.
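Both variants, the essentiality test and the ranking by lost tasks, can be sketched with single-species knockouts on a hypothetical toy community. The data in `NEEDS`, `MAKES` and `TASKS`, the task names, and the function `knockout_report` are assumptions made for illustration. With a real cGEM, the set-based checks would be replaced by flux-based feasibility tests on the community model with one member removed.

```python
# Hypothetical toy data: imports needed for growth, metabolites released,
# and ecosystem tasks each species can perform.
NEEDS = {"A": set(), "B": {"m1"}, "C": {"m2"}}
MAKES = {"A": {"m1"}, "B": {"m2"}, "C": set()}
TASKS = {
    "A": {"carbon_fixation"},
    "B": {"nitrogen_fixation"},
    "C": {"nitrogen_fixation"},
}
F = {"carbon_fixation", "nitrogen_fixation"}


def knockout_report(community):
    """Remove each species in turn and record what the remainder loses."""
    report = {}
    for species in sorted(community):
        rest = community - {species}
        produced = set().union(*(MAKES[m] for m in rest))
        performed = set().union(*(TASKS[m] for m in rest))
        tasks_lost = len(F - performed) / len(F)
        self_sufficient = all(NEEDS[m] <= produced for m in rest)
        report[species] = {
            "tasks_lost": tasks_lost,
            "self_sufficient": self_sufficient,
            # Essential if removal breaks either of the two requirements.
            "essential": tasks_lost > 0 or not self_sufficient,
        }
    return report


report = knockout_report({"A", "B", "C"})
# Rank species by the fraction of tasks in F lost upon their removal.
ranking = sorted(report, key=lambda s: -report[s]["tasks_lost"])
for species in ranking:
    print(species, report[species])
```

In this toy, removing `A` loses half of the tasks, removing `B` loses no task but leaves `C` unable to grow, and removing `C` breaks nothing. The two criteria can therefore disagree, which is a reason to report both.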

## Evaluating biodiversity alongside functional redundancy

See: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0425-4

## Ecological contextual information to identify BGCs

a) Fully identified BGCs with representation in GEM

b) Partially or not represented in GEM (only taxonomy + BGC structure)


How can cGEMs help?

* Delve into secondary metabolism capabilities of BGC-containing GEM -> hints at possible precursors 
* Analyses into BGC-GEM survival: sensitivity analyses to removing community members