# SPEED UP: test metabolite production in memote

The objective is to speed up the metabolite production test in memote, which is currently slow. This test checks which metabolites the model can produce.

**Current implementation**:

1. Add a demand reaction for every metabolite.
2. Optimize the model.
3. Check for flux through the demand reaction.
4. Remove the demand reaction (when the model context exits).

There are three possible ways of speeding up the process:

1. **Change the logic** of the test. Adding a new boundary reaction to the problem for every metabolite is very slow. The first optimization would consist of the following steps:
   * Add a variable just once (with bounds 0 and 1000).
   * For each metabolite:
     * get the linear coefficients of the metabolite's constraint;
     * add a "-1" coefficient for the variable in the constraint;
     * solve;
     * restore the constraint;
     * if there is no solution (or the objective is below the tolerance), add the metabolite to the returned output.
            
2. **Multiprocessing**: see FVA in cobrapy.
3. **Remove orphan metabolites** of the test (checked by a previous test). This would need refactoring and should be avoided if possible.
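
The change of logic in option 1 can be written out as a small LP. This is only a sketch, assuming the usual steady-state formulation: $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, $v_j$ is a flux with bounds $l_j \le v_j \le u_j$, and $z$ is the single extra variable. The symbols are notation introduced here, not taken from memote. The problem solved for metabolite $i$ is:

$$
\begin{aligned}
\max_{v,\,z}\quad & z \\
\text{s.t.}\quad & \sum_j S_{ij} v_j - z = 0, \\
& \sum_j S_{kj} v_j = 0 \quad \forall k \neq i, \\
& l_j \le v_j \le u_j, \qquad 0 \le z \le 1000 .
\end{aligned}
$$

Here $z$ plays the role of the demand flux: metabolite $i$ counts as produced when the optimum exceeds the solver tolerance. After solving, the coefficient of $z$ in row $i$ is reset to 0 and the next metabolite is tested, so only one coefficient changes between solves.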

The tests will be performed on a small BiGG model.

In [1]:
![ ! -f "iAB_RBC_283.xml" ] && curl -L -O 'http://bigg.ucsd.edu/static/models/iAB_RBC_283.xml'
VERBOSITY = False
PROCESSES = 4

## 1. Change logic

The new logic will be compared to the old in this section.

In [2]:
from time import time

import cobra
import numpy as np
from copy import deepcopy
from cobra.exceptions import Infeasible
from tqdm import tqdm
import sys

Old functions in memote (`consistency.py`).

In [3]:
def run_fba(model, rxn_id, direction="max", single_value=True):
    model.objective = model.reactions.get_by_id(rxn_id)
    model.objective_direction = direction
    if single_value:
        return model.slim_optimize()
    else:
        try:
            # Return the full solution object when the problem is feasible.
            solution = model.optimize()
            return solution
        except Infeasible:
            return np.nan


def open_exchanges(model):
    for rxn in model.exchanges:
        rxn.bounds = (-1000, 1000)


def test_old(model):
    """
    old_test in consistency.py
    """
    mets_not_produced = list()
    open_exchanges(model)
    pbar = tqdm(total=len(model.metabolites))
    for met in model.metabolites:
        with model:
            exch = model.add_boundary(
                met, type="irrex", reaction_id="IRREX", lb=0, ub=1000
            )
            solution = run_fba(model, exch.id)
            if np.isnan(solution) or solution < model.tolerance:
                mets_not_produced.append(met)
        pbar.update(1)
    return mets_not_produced

New logic.

In [4]:
def solve_boundary(metabolite, rxn, val=-1):
    """
    Solves the model when some reaction `rxn` has been added to the `metabolite`'s constraint.
    """
    metabolite.constraint.set_linear_coefficients({rxn: val})
    solution = metabolite.model.slim_optimize()
    # TODO: it seems like with context doesn't catch these changes, need to check
    # restore constraint
    metabolite.constraint.set_linear_coefficients({rxn: 0})
    return solution


def test_new(model):
    """
    New test
    """
    mets_not_produced = list()
    open_exchanges(model)
    irr = model.problem.Variable("irr", lb=0, ub=1000)
    with model:
        model.add_cons_vars(irr)
        # helper.run_fba() only accepts reactions in the model
        model.objective = irr
        pbar = tqdm(total=len(model.metabolites))
        for met in model.metabolites:
            solution = solve_boundary(met, irr)
            if np.isnan(solution) or solution < model.tolerance:
                mets_not_produced.append(met)
            pbar.update(1)
    return mets_not_produced
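
The restore step in `solve_boundary` only runs if the solve returns normally. As a minimal sketch, the set-and-restore pair could be wrapped in a context manager so that the coefficient is reset even when an exception is raised. `temporary_coefficient` and `FakeConstraint` are hypothetical names introduced here for illustration; the only assumption about the real constraint object is the `set_linear_coefficients` method already used above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_coefficient(constraint, variable, value):
    """Set one linear coefficient and reset it to 0 on exit, even on error."""
    constraint.set_linear_coefficients({variable: value})
    try:
        yield constraint
    finally:
        # Mirrors the manual "restore constraint" step: the variable is assumed
        # to be absent from the constraint beforehand, so 0 is its old value.
        constraint.set_linear_coefficients({variable: 0})


class FakeConstraint:
    """Hypothetical stand-in that only records coefficients (no solver)."""

    def __init__(self):
        self.coefficients = {}

    def set_linear_coefficients(self, mapping):
        self.coefficients.update(mapping)


demo = FakeConstraint()
with temporary_coefficient(demo, "irr", -1):
    inside = demo.coefficients["irr"]  # coefficient is -1 while solving
after = demo.coefficients["irr"]       # back to 0 once the block exits
```

With a real model, the body of the `with` block would be the call to `slim_optimize()`.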

Comparison between the two.

In [7]:
model = cobra.io.read_sbml_model("iAB_RBC_283.xml")
model.solver = "glpk"
if VERBOSITY:
    # Set the verbosity on the solver's own configuration object.
    model.solver.configuration.verbosity = 3

print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set(test_old(model))
print(
    f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)
# old is 99 for this model in 12.536982536315918 s
start = time()
new = set(test_new(model))
print(
    f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
print(old ^ new)

assert old == new

Number of metabolites in the model: 342
Identified metabolites by the old version: 22 in 2.7938179969787598 s
Identified metabolites by the new version: 22 in 2.174250602722168 s
set()

## 2. Multiprocessing

It will be implemented in a similar fashion to the FVA in [cobrapy](https://github.com/opencobra/cobrapy/blob/devel/cobra/flux_analysis/variability.py).

First, we need a model that is available to all the processes; each worker keeps its own copy as a global.

In [8]:
import multiprocessing

In [9]:
def _init_worker(model, irr, val):
    """Initialize a global model object for multiprocessing.

    Parameters
    ----------
    model : cobra.Model
        The metabolic model under investigation.
    irr : optlang.Variable or cobra.Reaction
        The reaction to be added to the linear coefficients. It must be in the
        variables of the model.
    val : int
        Value of the linear coefficient (1 for consumption, -1 for production).
    """
    global _model
    global _irr
    global _val
    _model = model
    _model.objective = irr
    _irr = irr
    _val = val


def _solve_metabolite_production(metabolite):
    """
    Solves the model when some reaction has been added to a `metabolite`'s
    constraint. The reaction and the model are passed as globals.

    Parameters
    ----------
    metabolite: cobra.Metabolite
        the reaction will be added to this metabolite as a linear coefficient

    Returns
    -------
    solution: float
        the value of the solution of the LP problem, *NaN* if infeasible.
    metabolite: cobra.Metabolite
        metabolite passed as argument (to use map as a filter)
    """
    global _model
    global _irr
    global _val
    constraint = _model.metabolites.get_by_id(metabolite.id).constraint
    constraint.set_linear_coefficients({_irr: _val})
    solution = _model.slim_optimize()
    constraint.set_linear_coefficients({_irr: 0})
    return solution, metabolite
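
The initializer pattern used above can be tried without a solver. The following is a minimal sketch with the standard library only: `find_failing`, `_evaluate` and the dictionary of toy values are hypothetical stand-ins for the model and the LP solve, not part of memote or cobrapy. Each worker stores the shared object once in a global, and every task returns `(value, item)` so that the caller can filter on the value.

```python
import math
import multiprocessing

_shared = None  # per-process global, filled in by the initializer


def _init_worker(shared):
    """Store the shared object once per worker process."""
    global _shared
    _shared = shared


def _evaluate(item):
    """Return (value, item) so the caller can filter on the value."""
    return _shared.get(item, math.nan), item


def find_failing(items, shared, tolerance=1e-7, processes=1):
    """Return the items whose value is NaN or below `tolerance`."""
    if processes > 1:
        with multiprocessing.Pool(
            processes, initializer=_init_worker, initargs=(shared,)
        ) as pool:
            results = list(pool.imap_unordered(_evaluate, items))
    else:
        # Serial fallback: initialize the global in the current process.
        _init_worker(shared)
        results = list(map(_evaluate, items))
    return [
        item for value, item in results
        if math.isnan(value) or value < tolerance
    ]


# Toy values: "a" has flux, "b" has none, "c" is missing (treated as NaN).
failing = find_failing(["a", "b", "c"], {"a": 10.0, "b": 0.0})
print(failing)  # → ['b', 'c']
```

With `processes > 1` the order of the result is not guaranteed, because `imap_unordered` yields results as they complete.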

In [10]:
def find_metabolites_not_produced_with_open_bounds(
    model, processes=None, prod=True
):
    """
    Return metabolites that cannot be produced with open exchange reactions.

    A demand reaction is set as the objective. Then, it is sequentially added as
    a coefficient for every metabolite and the solution is inspected.

    A perfect model should be able to produce each and every metabolite when
    all medium components are available.

    Parameters
    ----------
    model : cobra.Model
        The metabolic model under investigation.
    processes : int, optional
        Number of processes to use (defaults to `cobra.Configuration().processes`).
    prod : bool, optional
        If False, check for consumption instead of production (default True).

    Returns
    -------
    list
        Those metabolites that could not be produced.

    """
    if processes is None:
        # For now, borrow the number of processes from cobra's configuration
        processes = cobra.Configuration().processes
    n_mets = len(model.metabolites)
    processes = min(processes, n_mets)
    # manage the value of the linear coefficient to be added to each metabolite
    val = -1 # production
    if not prod:
        val = 1 # consumption
    open_exchanges(model)
    irr = model.problem.Variable("irr", lb=0, ub=1000)

    if processes > 1:
        chunk_s = n_mets // processes
        pool = multiprocessing.Pool(
            processes,
            initializer=_init_worker,
            initargs=(model, irr, val),
        )
        # use map as filter
        mets_not_produced = [met for solution, met in pool.imap_unordered(
            _solve_metabolite_production, model.metabolites, chunksize=chunk_s
        ) if np.isnan(solution) or solution < model.tolerance]
        pool.close()
        pool.join()
    else:
        _init_worker(model, irr, val)
        # use map as filter
        mets_not_produced = [met for solution, met in map(
            _solve_metabolite_production, model.metabolites
        ) if np.isnan(solution) or solution < model.tolerance]
    return mets_not_produced
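
The `chunksize` passed to `imap_unordered` determines how many metabolites are handed to a worker at once. As a small worked example of the `n_mets // processes` rule used above, the hypothetical helper `chunk_plan` (introduced here only for illustration) computes the chunk size and the resulting number of chunks:

```python
import math


def chunk_plan(n_items, processes):
    """Return (chunksize, number_of_chunks) for the `n_items // processes` rule."""
    if n_items < 1 or processes < 1:
        raise ValueError("n_items and processes must both be at least 1")
    processes = min(processes, n_items)  # never more workers than items
    chunksize = n_items // processes
    return chunksize, math.ceil(n_items / chunksize)


print(chunk_plan(342, 4))  # → (85, 5): four chunks of 85 and one of 2
```

For the 342 metabolites and 4 processes used here, the integer division leaves a small fifth chunk of 2 metabolites, which is picked up by whichever worker becomes free first.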

Finally, we can do the comparison.

In [13]:
if VERBOSITY:
    # Set the verbosity on the solver's own configuration object.
    model.solver.configuration.verbosity = 3

print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set([met.id for met in test_old(model)])
print(
    f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)
start = time()
new = set([met.id for met in find_metabolites_not_produced_with_open_bounds(model, PROCESSES)])
print(
    f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
print(old ^ new)

assert old == new

Number of metabolites in the model: 342
Identified metabolites by the old version: 22 in 2.826913356781006 s
Identified metabolites by the new version: 22 in 0.7438373565673828 s
set()
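
This comparison uses `met.id` rather than the metabolite objects themselves. A likely reason, assuming the objects compare by identity, is that results coming back from worker processes are copies of the originals. The sketch below illustrates the effect with a hypothetical `Item` class and `copy.deepcopy` standing in for the pickle round trip that multiprocessing performs:

```python
import copy


class Item:
    """Hypothetical stand-in for an object with an `id` and default equality."""

    def __init__(self, id):
        self.id = id


original = Item("atp_c")
# Stand-in for a result returned from a worker: a distinct but equivalent object.
returned = copy.deepcopy(original)

same_object = {original} == {returned}      # identity-based hash and equality
same_id = {original.id} == {returned.id}    # comparison by identifier

print(same_object, same_id)  # → False True
```

Comparing sets of identifiers therefore gives a meaningful symmetric difference even when the two functions return different object instances.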


# SPEED UP: test metabolite consumption in memote

The opposite test, for metabolite consumption, should also use the new logic and multiprocessing.
First, we need the old consumption test for comparison and the new function, which reuses the production test implementation with `prod=False`.

In [14]:
def consumed_old(model):
    """
    Return metabolites that cannot be consumed with open boundary reactions.
    When all metabolites can be secreted, it should be possible for each and
    every metabolite to be consumed in some form.
    Parameters
    ----------
    model : cobra.Model
        The metabolic model under investigation.
    Returns
    -------
    list
        Those metabolites that could not be consumed.
    """
    mets_not_consumed = list()
    open_exchanges(model)
    for met in model.metabolites:
        with model:
            exch = model.add_boundary(
                met, type="irrex", reaction_id="IRREX", lb=-1000, ub=0)
            solution = run_fba(model, exch.id, direction="min")
            if np.isnan(solution) or abs(solution) < model.tolerance:
                mets_not_consumed.append(met)
    return mets_not_consumed

def consumed_new(model, processes=None):
    return find_metabolites_not_produced_with_open_bounds(model, processes=processes, prod=False)

In [16]:
if VERBOSITY:
    # Set the verbosity on the solver's own configuration object.
    model.solver.configuration.verbosity = 3

print(f"Number of metabolites in the model: {len(model.metabolites)}")
start = time()
old = set(consumed_old(model))
print(
    f"Identified metabolites by the old version: {len(old)} in {time() - start} s"
)
old = set([met.id for met in old])
# old is 99 for this model in 12.536982536315918 s
start = time()
new = set(consumed_new(model, PROCESSES))
print(
    f"Identified metabolites by the new version: {len(new)} in {time() - start} s"
)
new = set([met.id for met in new])
print(old ^ new)

assert old == new

Number of metabolites in the model: 342
Identified metabolites by the old version: 20 in 2.8499839305877686 s
Identified metabolites by the new version: 20 in 0.8409318923950195 s
set()
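
To put the three comparisons side by side, the speedups can be computed from the wall-clock times printed above. This is a small sketch: the timings are copied from the outputs of this notebook, the labels are added here for readability, and the number of processes is the `PROCESSES = 4` setting from the first cell.

```python
# (old seconds, new seconds), copied from the printed outputs above.
timings = {
    "production, new logic only": (2.7938179969787598, 2.174250602722168),
    "production, new logic + 4 processes": (2.826913356781006, 0.7438373565673828),
    "consumption, new logic + 4 processes": (2.8499839305877686, 0.8409318923950195),
}

speedups = {name: old / new for name, (old, new) in timings.items()}
for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```

These are single runs on a small model, so the ratios should be read as rough indications rather than benchmarks.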
