# Inspecting the pyTFA package
The integration of metabolomics data and thermodynamic information can be formulated as a Thermodynamics-based Flux Analysis (TFA) problem, which is nicely implemented in the pyTFA package.
```
Thermodynamics-based Flux Analysis, in Python. Paper : Pierre Salvy, Georgios Fengos, Meric Ataman, Thomas Pathier, Keng C Soh, Vassily Hatzimanikatis. "pyTFA and matTFA: a Python package and a Matlab toolbox for Thermodynamics-based Flux Analysis" Bioinformatics (2018), bty499, DOI: https://doi.org/10.1093/bioinformatics/bty499
```
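
TFA couples the direction of each reaction's flux to the sign of its Gibbs energy of reaction, $\Delta G_r = \Delta G^{\circ}_r + RT \sum_i \nu_i \ln c_i$, which is where metabolite concentrations enter. The sketch below only evaluates that relation for fixed concentrations. In the full TFA formulation the $\ln c_i$ are variables bounded by the metabolomics data, and binary variables enforce the sign rule inside a MILP. The gas-constant value and the toy reaction are assumptions for illustration, not pyTFA code.

```python
import math

# Assumed constants: gas constant in kcal/(mol*K) and the 298.15 K reported in the model log.
R_KCAL = 1.9872e-3
TEMPERATURE = 298.15


def delta_g_r(delta_g0, stoich, conc, temperature=TEMPERATURE):
    """dG_r = dG0_r + R*T * sum_i(nu_i * ln c_i), in kcal/mol.

    `stoich` maps metabolite IDs to stoichiometric coefficients
    (negative for substrates, positive for products); `conc` maps the
    same IDs to concentrations.
    """
    log_term = sum(nu * math.log(conc[met]) for met, nu in stoich.items())
    return delta_g0 + R_KCAL * temperature * log_term


def forward_flux_allowed(dg):
    """TFA directionality rule: forward flux requires dG_r < 0."""
    return dg < 0


# Hypothetical reaction A -> B with dG0_r = +1.0 kcal/mol.
stoich = {"A": -1, "B": 1}
dg_equal = delta_g_r(1.0, stoich, {"A": 1e-3, "B": 1e-3})
dg_skewed = delta_g_r(1.0, stoich, {"A": 1e-2, "B": 1e-4})
print(dg_equal, forward_flux_allowed(dg_equal))   # 1.0 False
print(dg_skewed, forward_flux_allowed(dg_skewed))
```

With equal concentrations the log term vanishes and the unfavourable $\Delta G^{\circ}_r$ blocks forward flux; a large substrate/product ratio makes $\Delta G_r$ negative, which is exactly how concentration bounds can change a TFA solution.
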
The first step is to evaluate the performance, to make sure the integration is feasible. Five operations have to be considered:

1. Translate the model from cobrapy to pyTFA.
2. Add user-provided thermodynamic data.
3. Add user-provided metabolomics data.
4. Compute group contributions. The package does this only for metabolites annotated with SEED IDs.
5. Solve the resulting optimization problem (a MILP, since TFA adds binary variables for reaction directionality).
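
The notebook times each step with `%%time`. Outside Jupyter, the same per-step timings can be collected with a small stdlib helper. `timed` below is a hypothetical helper, not part of pyTFA; the commented lines only show how it could wrap the notebook's own calls.

```python
import time
from typing import Any, Callable, Dict


def timed(label: str, fn: Callable[..., Any], *args: Any,
          log: Dict[str, float], **kwargs: Any) -> Any:
    """Call fn(*args, **kwargs) and record its wall-clock time (s) in log[label]."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    log[label] = time.perf_counter() - start
    return result


timings: Dict[str, float] = {}
# With the notebook's objects this could read, e.g.:
#   thermo_data = timed("load thermoDB", load_thermoDB, "thermo_data.thermodb", log=timings)
#   solution = timed("solve", tmodel.optimize, log=timings)
toy_total = timed("toy step", sum, range(1000), log=timings)
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```
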

These operations are tested on the [tutorial that reproduces the figure of the publication](https://github.com/EPFL-LCSB/pytfa/blob/master/tutorials/figure_paper.py).

In [1]:
import os
import errno
import pytfa

from pytfa.io import import_matlab_model, load_thermoDB

from pytfa.optim.variables import DeltaG,DeltaGstd,ThermoDisplacement
from pytfa.analysis import (variability_analysis,
                            apply_reaction_variability,
                            apply_generic_variability,
                            apply_directionality)

from cobra.flux_analysis.variability import flux_variability_analysis

from math import log

Academic license - for non-commercial use only


In [2]:
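def apply_concentration_bound(measure, tmodel):
    """Constrain the concentration of one metabolite.

    `measure` is a (metabolite_id, lower_bound, upper_bound) tuple.
    """
    met, lb, ub = measure
    the_conc_var = tmodel.log_concentration.get_by_id(met)
    # The model's concentration variables are natural logs, so transform the bounds
    the_conc_var.ub = log(ub)
    the_conc_var.lb = log(lb)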

CPLEX = 'optlang-cplex'
GUROBI = 'optlang-gurobi'
GLPK = 'optlang-glpk'

metabolomics_data = [
    ('atp_c', 1e-3, 1e-2),
#    ('atp_c', 1e-4, 1e-3),
    ('adp_c', 4e-4, 7e-4),
    ('atp_c', 2e-4, 3e-4)
]
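
Because the model variables are natural-log concentrations, each measured range becomes a pair of log bounds. Note that `atp_c` appears twice in `metabolomics_data`; since the bounds are applied in order, the second range (2e-4 to 3e-4) is the one left on the model. The helpers below are a hypothetical pre-check in plain Python, independent of pyTFA, that makes both points visible before touching the model.

```python
from math import log

# Same tuples as in the cell above: (metabolite ID, lower bound, upper bound).
tutorial_measurements = [
    ('atp_c', 1e-3, 1e-2),
    ('adp_c', 4e-4, 7e-4),
    ('atp_c', 2e-4, 3e-4),
]


def to_log_bounds(measurements):
    """Map metabolite IDs to (ln lb, ln ub); a repeated ID keeps its last range."""
    bounds = {}
    for met, lb, ub in measurements:
        if not 0 < lb <= ub:
            raise ValueError(f"invalid concentration range for {met}: ({lb}, {ub})")
        bounds[met] = (log(lb), log(ub))
    return bounds


def duplicated_ids(measurements):
    """Return the metabolite IDs that appear more than once."""
    seen, dups = set(), set()
    for met, _, _ in measurements:
        if met in seen:
            dups.add(met)
        seen.add(met)
    return dups


print(duplicated_ids(tutorial_measurements))  # {'atp_c'}
print(to_log_bounds(tutorial_measurements)['atp_c'])
```
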

In [3]:
# Load the cobra_model
# Downloading these files from the repository with curl did not work,
# probably because they are stored with Git LFS
cobra_model = import_matlab_model('small_ecoli.mat')
cobra_model.solver = "glpk"
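
If the downloaded `.mat` and `.thermodb` files are Git LFS pointers (small text files) rather than the actual data, loading them fails. A quick check, assuming the download problem is indeed Git LFS as guessed in the comment above; `is_lfs_pointer` is a hypothetical helper and the demo uses dummy files. If a file turns out to be a pointer, cloning the repository with Git LFS installed (or running `git lfs pull`) should fetch the real content.

```python
import tempfile
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_SPEC_LINE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` contains a Git LFS pointer instead of the real data."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_SPEC_LINE)) == LFS_SPEC_LINE


# Self-contained demo with dummy files; on the real data one would call
# is_lfs_pointer("small_ecoli.mat") before import_matlab_model.
with tempfile.TemporaryDirectory() as tmp:
    pointer = Path(tmp) / "pointer.mat"
    pointer.write_bytes(LFS_SPEC_LINE + b"\n")  # dummy pointer content
    binary = Path(tmp) / "binary.mat"
    binary.write_bytes(b"MATLAB 5.0 MAT-file")  # dummy MAT-file header
    pointer_detected = is_lfs_pointer(pointer)
    binary_detected = is_lfs_pointer(binary)

print(pointer_detected, binary_detected)  # True False
```
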

In [4]:
%%time
# Load reaction DB
thermo_data = load_thermoDB('thermo_data.thermodb') 

CPU times: user 334 ms, sys: 35.8 ms, total: 369 ms
Wall time: 367 ms


In [5]:
%%time
# Initialize the thermodynamic model
tmodel = pytfa.ThermoModel(thermo_data, cobra_model)
tmodel.solver = GLPK
tmodel.prepare()
tmodel.convert(add_displacement = False)
tmodel.print_info()

2019-11-20 19:05:30,647 - thermomodel_new - INFO - # Model initialized with units kcal/mol and temperature 298.15 K
2019-11-20 19:05:30,647 - thermomodel_new - INFO - # Model preparation starting...
2019-11-20 19:05:31,358 - thermomodel_new - INFO - # Model preparation done.
2019-11-20 19:05:31,359 - thermomodel_new - INFO - # Model conversion starting...
2019-11-20 19:05:33,075 - thermomodel_new - INFO - # Model conversion done.
2019-11-20 19:05:33,076 - thermomodel_new - INFO - # Updating cobra_model variables...
2019-11-20 19:05:33,091 - thermomodel_new - INFO - # cobra_model variables are up-to-date


                value
key                  
name              new
description       new
num constraints  3349
num variables    3482
num metabolites   304
num reactions     599
                           value
key                             
num metabolites(thermo)      300
num reactions(thermo)        418
pct metabolites(thermo)  98.6842
pct reactions(thermo)     69.783
CPU times: user 2.65 s, sys: 53.7 ms, total: 2.7 s
Wall time: 2.68 s


In [6]:
sol_nomet = tmodel.optimize()
sol_nomet

| reaction | fluxes | reduced_costs |
|---|---|---|
| DM_4CRSOL | 0.000181 | |
| DM_5DRIB | 0.000187 | |
| DM_AMOB | 0.000002 | |
| DM_MTHTHF | 0.001087 | |
| Ec_biomass_iJO1366_WT_53p95M | 0.810997 | |
| ... | ... | ... |
| LMPD_250_trp-L_c | 0.044795 | |
| LMPD_251_tyr-L_c | 0.108668 | |
| LMPD_252_udcpdp_c | 0.000045 | |
| LMPD_253_utp_c | 0.113622 | |


In [7]:
%%time
[apply_concentration_bound(measure, tmodel) for measure in metabolomics_data]

CPU times: user 31 µs, sys: 2 µs, total: 33 µs
Wall time: 37 µs


[None, None, None]

Applying the concentration bounds takes negligible time.

In [8]:
%%time
sol_ori = tmodel.optimize()
sol_ori

CPU times: user 10.6 s, sys: 28.3 ms, total: 10.6 s
Wall time: 10.5 s


In [9]:
print(f"Sum of fluxes: {sum(sol_nomet.fluxes)}")
print(f"Sum of fluxes with metabolomics: {sum(sol_ori.fluxes)}")

Sum of fluxes: 1141.4288279868517
Sum of fluxes with metabolomics: 1141.4288279868517


In [10]:
(sol_ori.fluxes != sol_nomet.fluxes).any()

False

Metabolomics hasn't changed the solution at all.
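
The check above uses exact float comparison, which is enough here because the two solutions are identical. When comparing solutions from different models or solvers, a tolerance-based comparison is more robust. A minimal stdlib sketch, with plain dicts standing in for the `fluxes` Series (with pandas, `numpy.isclose(a, b)` gives the same element-wise test); `changed_reactions` is a hypothetical helper and the tolerances are arbitrary choices.

```python
import math


def changed_reactions(fluxes_a, fluxes_b, rel_tol=1e-9, abs_tol=1e-9):
    """Sorted IDs of reactions whose flux differs beyond the given tolerances.

    Both arguments map reaction IDs to flux values and must cover the same reactions.
    """
    if fluxes_a.keys() != fluxes_b.keys():
        raise ValueError("the two flux mappings cover different reactions")
    return sorted(
        rid for rid in fluxes_a
        if not math.isclose(fluxes_a[rid], fluxes_b[rid],
                            rel_tol=rel_tol, abs_tol=abs_tol)
    )


# Rounded values from the solution tables in this notebook (built-in vs custom model).
builtin = {"DM_4CRSOL": 0.000181, "Ec_biomass_iJO1366_WT_53p95M": 0.810997}
custom = {"DM_4CRSOL": 0.000181, "Ec_biomass_iJO1366_WT_53p95M": 0.810956}
print(changed_reactions(builtin, custom))  # ['Ec_biomass_iJO1366_WT_53p95M']
```
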

In [11]:
%%time
# FVA cobra
fva_fluxes = flux_variability_analysis(cobra_model)

CPU times: user 113 ms, sys: 153 ms, total: 265 ms
Wall time: 674 ms


In [12]:
# Thermodynamic variability analysis on the tmodel;
# it gets stuck on my computer, so it is not run here
#tva_fluxes_lc = variability_analysis(tmodel, kind='reactions')

### Results
It looks like solving the optimization problem is the only demanding operation, but it should stay below 10 s when running on the platform. `variability_analysis` has to be set aside for now: it solves an optimization problem for every reaction in the model, using that reaction as the objective.
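
A back-of-envelope estimate shows why. Assuming one minimization and one maximization per reaction (the usual variability-analysis scheme) and that each solve costs about as much as the single `tmodel.optimize()` timed above (about 10.5 s wall time), the 599 reactions reported by `print_info` add up to hours. Warm-started solves may well be faster, so treat this as an order of magnitude only; `estimate_variability_time` is a hypothetical helper.

```python
def estimate_variability_time(n_reactions, seconds_per_solve):
    """Rough cost of a variability analysis: one minimization and one
    maximization per reaction, each assumed to take `seconds_per_solve`."""
    n_solves = 2 * n_reactions
    return n_solves, n_solves * seconds_per_solve


# 599 reactions (print_info above) and ~10.5 s for one tmodel.optimize().
n_solves, seconds = estimate_variability_time(599, 10.5)
print(f"{n_solves} solves, about {seconds / 3600:.1f} h")  # 1198 solves, about 3.5 h
```
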

The metabolomics constraints from the tutorial do not change the result of the optimization at all.

On my machine it takes longer than it should, apparently because the solver struggles with some infeasible solutions. Gurobi does not seem to work.

# Can we use pyTFA with just $\Delta G_r$?
A quick test using the `small_ecoli` model.  

__TL;DR__: the resulting `ThermoModel` returns a solution that is similar, but not identical, to the previous one. Also, in this case the metabolomics constraints do change the solution.

In [13]:
tmodel.reactions.ACALD.thermo

{'isTrans': False,
 'computed': True,
 'deltaGR': -8.173542457764427,
 'deltaGRerr': 1.5972194589348077}
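
Each reaction carries a small dict with `isTrans`, `computed`, `deltaGR` and `deltaGRerr`. Reactions with `'computed': False` (for example the `DM_` demand reactions in the next output) carry `1000.0` for both values, which looks like a placeholder for "no estimate". The sketch below filters the estimated reactions and, assuming `deltaGRerr` is the uncertainty used to relax $\Delta G^{\circ}_r$, turns each estimate into an interval. The helpers are hypothetical and operate on two entries copied from these outputs.

```python
# Two entries copied from the outputs in this notebook.
example_thermo = {
    "ACALD": {"isTrans": False, "computed": True,
              "deltaGR": -8.173542457764427, "deltaGRerr": 1.5972194589348077},
    "DM_4CRSOL": {"isTrans": False, "computed": False,
                  "deltaGR": 1000.0, "deltaGRerr": 1000.0},
}


def reactions_with_estimates(thermo):
    """Keep only reactions whose thermo dict is flagged as computed."""
    return {rid: t for rid, t in thermo.items() if t["computed"]}


def dg_interval(t):
    """(deltaGR - deltaGRerr, deltaGR + deltaGRerr), in the model units (kcal/mol)."""
    return t["deltaGR"] - t["deltaGRerr"], t["deltaGR"] + t["deltaGRerr"]


usable = reactions_with_estimates(example_thermo)
print(sorted(usable))  # ['ACALD']
low, high = dg_interval(usable["ACALD"])
print(f"ACALD: [{low:.2f}, {high:.2f}] kcal/mol")  # ACALD: [-9.77, -6.58] kcal/mol
```
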

In [14]:
reac_thermo = {reac.id: reac.thermo for reac in tmodel.reactions}
reac_thermo

{'DM_4CRSOL': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_5DRIB': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_AMOB': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_MTHTHF': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'Ec_biomass_iJO1366_WT_53p95M': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_5mtr_e': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_ac_e': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_acald_e': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_acser_e': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGRerr': 1000.0},
 'DM_akg_e': {'isTrans': False,
  'computed': False,
  'deltaGR': 1000.0,
  'deltaGR

In [15]:
def convert_reac_data(model, reac_data):
    """
    Convert the cobrapy `model` to a pyTFA ThermoModel using only the
    reaction thermodynamics in `reac_data` (reaction ID -> thermo dict).
    """
    thermodata = {
        'units': 'kcal/mol',
        'cues' : None,
        'metabolites' : None
    }
    # Wrap the cobrapy model passed as argument
    tmodel = pytfa.ThermoModel(thermodata, model=model)

    # Similar to .prepare(), but without metabolite thermodynamic data
    for met in tmodel.metabolites:
        met.thermo = pytfa.thermo.MetaboliteThermo(None, None, None)
    for reaction in tmodel.reactions:
        reaction.thermo = reac_data[reaction.id]
    tmodel.convert()

    return tmodel

In [16]:
t_testm = convert_reac_data(cobra_model, reac_thermo)

2019-11-20 19:05:56,462 - thermomodel_new - INFO - # Model initialized with units kcal/mol and temperature 298.15 K
2019-11-20 19:05:56,465 - thermomodel_new - INFO - # Model conversion starting...
2019-11-20 19:05:58,297 - thermomodel_new - INFO - # Model conversion done.
2019-11-20 19:05:58,298 - thermomodel_new - INFO - # Updating cobra_model variables...
2019-11-20 19:05:58,314 - thermomodel_new - INFO - # cobra_model variables are up-to-date


In [17]:
sol = t_testm.optimize()
sol

| reaction | fluxes | reduced_costs |
|---|---|---|
| DM_4CRSOL | 0.000181 | |
| DM_5DRIB | 0.000187 | |
| DM_AMOB | 0.000002 | |
| DM_MTHTHF | 0.001087 | |
| Ec_biomass_iJO1366_WT_53p95M | 0.810956 | |
| ... | ... | ... |
| LMPD_250_trp-L_c | 0.044792 | |
| LMPD_251_tyr-L_c | 0.108662 | |
| LMPD_252_udcpdp_c | 0.000045 | |
| LMPD_253_utp_c | 0.113616 | |


In [18]:
sum(sol.fluxes)

1190.6526819344149

In [19]:
[apply_concentration_bound(measure, t_testm) for measure in metabolomics_data]
sol_mets = t_testm.optimize()
sol_mets

| reaction | fluxes | reduced_costs |
|---|---|---|
| DM_4CRSOL | 0.000181 | |
| DM_5DRIB | 0.000187 | |
| DM_AMOB | 0.000002 | |
| DM_MTHTHF | 0.001087 | |
| Ec_biomass_iJO1366_WT_53p95M | 0.810956 | |
| ... | ... | ... |
| LMPD_250_trp-L_c | 0.044792 | |
| LMPD_251_tyr-L_c | 0.108662 | |
| LMPD_252_udcpdp_c | 0.000045 | |
| LMPD_253_utp_c | 0.113616 | |


In [20]:
print(f"Custom preparation\n------------------\nSum of fluxes: {sum(sol.fluxes)}")
print(f"Sum of fluxes with metabolomics: {sum(sol_mets.fluxes)}")
print(f"\nBuilt-in prepare method\n-----------------------\nSum of fluxes: {sum(sol_nomet.fluxes)}")
print(f"Sum of fluxes with metabolomics: {sum(sol_ori.fluxes)}")

Custom preparation
------------------
Sum of fluxes: 1190.6526819344149
Sum of fluxes with metabolomics: 1190.3929131283878

Built-in prepare method
-----------------------
Sum of fluxes: 1141.4288279868517
Sum of fluxes with metabolomics: 1141.4288279868517


In [21]:
(sol.fluxes == sol_mets.fluxes).all()

False

Before and after the metabolomics integration the solutions differ, and the sum of fluxes is slightly lower with the metabolomics constraints. Furthermore, both solutions differ from those of the `tmodel` built with the built-in `.prepare()` method. Still, the new solutions are closer to the `tmodel`'s than to the plain cobra model's.

In [22]:
sol_plain = cobra_model.optimize()
sum(sol_plain.fluxes)

143.13836864239119
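
The sum of fluxes is a coarse fingerprint (fluxes of opposite sign can cancel, and different flux distributions can share a sum), but expressing the reported sums relative to the built-in `tmodel` makes the comparison explicit: the custom model is about 4% higher, while the plain cobra model is far lower. The dictionary simply collects the numbers printed in this notebook; `relative_difference` is a hypothetical helper.

```python
# Sums of fluxes reported in this notebook.
flux_sums = {
    "plain cobra": 143.13836864239119,
    "built-in prepare": 1141.4288279868517,
    "built-in prepare + metabolomics": 1141.4288279868517,
    "custom (deltaG_r only)": 1190.6526819344149,
    "custom (deltaG_r only) + metabolomics": 1190.3929131283878,
}


def relative_difference(value, reference):
    """Signed relative difference of `value` with respect to `reference`."""
    return (value - reference) / reference


reference = flux_sums["built-in prepare"]
for label, total in flux_sums.items():
    print(f"{label:40s} {total:10.2f} {relative_difference(total, reference):+8.2%}")
```
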