# Compatibility between Python and Matlab

The goal is to make the g-thermo model compatible between the Python and Matlab implementations of COBRA ([cobrapy](https://github.com/opencobra/cobrapy) and [cobratoolbox](https://github.com/opencobra/cobratoolbox)).

Benjamín J. Sánchez, 2020-04-07

## 1. Sort annotations

The first noteworthy difference between the two tools is that cobrapy always preserves the order in which annotations appear in each reaction and stores them in that order, whereas cobratoolbox pre-sorts them. This could be improved in the `write_sbml_model` function of cobrapy (the order carries no meaning anyway), so we will fix it in [this branch](https://github.com/BenjaSanchez/cobrapy/tree/refactor/sort-sbml) of my cobrapy fork. Now we can read and write the model once to confirm that annotations are sorted from now on:

In [1]:
# Before:
import cobra
model = cobra.io.read_sbml_model("../model/g-thermo.xml")
for annotation in model.reactions[0].annotation:
    print(annotation)
cobra.io.write_sbml_model(model, "../model/g-thermo.xml")

sbo
kegg.reaction
metanetx.reaction
rhea
ec-code


In [2]:
# After:
model = cobra.io.read_sbml_model("../model/g-thermo.xml")
for annotation in model.reactions[0].annotation:
    print(annotation)

sbo
ec-code
kegg.reaction
metanetx.reaction
rhea


Leaving aside SBO (which is a separate SBML object), we see that all annotations are now in alphabetical order.
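
The ordering seen above can be reproduced on a plain dictionary. This is a minimal sketch, not the code in the cobrapy branch: `sort_annotation` is a hypothetical helper, and all annotation values are placeholders.

```python
def sort_annotation(annotation):
    """Return a copy with 'sbo' first (if present) and all other keys sorted."""
    rest = sorted(key for key in annotation if key != "sbo")
    keys = (["sbo"] if "sbo" in annotation else []) + rest
    return {key: annotation[key] for key in keys}


# Keys in the order printed before the fix; the values are placeholders.
before = {
    "sbo": "placeholder",
    "kegg.reaction": "placeholder",
    "metanetx.reaction": "placeholder",
    "rhea": "placeholder",
    "ec-code": "placeholder",
}

after = sort_annotation(before)
print(list(after))
# ['sbo', 'ec-code', 'kegg.reaction', 'metanetx.reaction', 'rhea']
```

Python dictionaries keep insertion order (3.7+), so rebuilding the dictionary in sorted key order is enough to fix the order in which keys are later iterated.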

## 2. Redundant reaction fields

Let's take a look at the reaction notes:

In [3]:
model = cobra.io.read_sbml_model("../model/g-thermo.xml")
print(model.reactions[0].notes)

{'SUBSYSTEM': 'X', 'GENE_ASSOCIATION': '( RTMO00131 or RTMO05104 )', 'KEGG ID': 'R00004', 'ENZYME': '3.6.1.1', 'NAME': 'diphosphate phosphohydrolase; pyrophosphate phosphohydrolase', 'DEFINITION': 'Diphosphate + H2O ⇌ 2 Orthophosphate'}


Both the `SUBSYSTEM` and `GENE_ASSOCIATION` fields are already included in the model (as `groups` and `reaction.gene_reaction_rule` fields, respectively), so we can remove them (as cobratoolbox removes them by default):

In [4]:
for reaction in model.reactions:
    reaction.notes.pop("SUBSYSTEM", None)
    reaction.notes.pop("GENE_ASSOCIATION", None)
cobra.io.write_sbml_model(model, "../model/g-thermo.xml")
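
The removal relies on `dict.pop(key, None)`, which does nothing when the key is absent, so the cell can be re-run safely. A minimal sketch on a plain dictionary, using a subset of the notes printed above (`strip_redundant` and `REDUNDANT_FIELDS` are hypothetical names):

```python
# Fields that duplicate information already stored elsewhere in the model.
REDUNDANT_FIELDS = ("SUBSYSTEM", "GENE_ASSOCIATION")


def strip_redundant(notes):
    """Remove redundant fields in place; missing fields are ignored."""
    for field in REDUNDANT_FIELDS:
        notes.pop(field, None)
    return notes


notes = {
    "SUBSYSTEM": "X",
    "GENE_ASSOCIATION": "( RTMO00131 or RTMO05104 )",
    "KEGG ID": "R00004",
    "ENZYME": "3.6.1.1",
}

cleaned = strip_redundant(dict(notes))   # work on a copy
again = strip_redundant(dict(cleaned))   # second pass changes nothing
print(cleaned)
# {'KEGG ID': 'R00004', 'ENZYME': '3.6.1.1'}
```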

## 3. Missing SBO terms

When exporting, cobratoolbox complains that some reactions are missing SBO terms. Let's first list the affected reactions:

In [5]:
model = cobra.io.read_sbml_model("../model/g-thermo.xml")
for reaction in model.reactions:
    if "sbo" not in reaction.annotation:
        print(f"{reaction.id}: {reaction}")

CMTEPISO: CMTEPISO: cmtdepp_c <=> cthzp_c
DXYL5PTST: DXYL5PTST: dhgly_c + dxyl5p_c + h_c + tcscp_c <=> cmtdepp_c + 2.0 h2o_c + scpgg_c
LCTST: LCTST: cys__L_c + enzcys_c <=> ala__L_c + enzscys_c
ATPTAT: ATPTAT: atp_c + h_c + scpgg_c <=> ascp_c + ppi_c
LPROQOR: LPROQOR: pro__L_c + ubiquin_c --> pyr5c_c + qh2_c
BTNLIG: BTNLIG: atp_c + btn_c + h_c --> b5amp_c + ppi_c
BTN5AMPL: BTN5AMPL: b5amp_c + h2o_c <=> amp_c + btn_c + 2.0 h_c
MDH2: MDH2: mal__L_c + ubiquin_c --> oaa_c + qh2_c
OBO2OR: OBO2OR: 2obut_c + 2.0 h_c + o2_c + pi_c --> co2_c + h2o2_c + ppap_c
LALDPOR: LALDPOR: lald__L_c + 2.0 nad_c <=> mthgxl_c + 2.0 nadh_c
PHEAOR: PHEAOR: h2o_c + phe__D_c --> 2.0 h_c + nh4_c + phpyr_c
PYRLLOR: PYRLLOR: h_c + lpam_c + pyr_c --> adhlam_c + co2_c
MOX: MOX: mal__L_c + o2_c --> h2o2_c + oaa_c
TRPS3: TRPS3: 3ig3p_c --> g3p_c + indole_c
PGL: PGL: 6pgl_c + h2o_c --> 6pgc_c
MAHMPDC: MAHMPDC: 2mahmp_c + cthzp_c --> co2_c + h_c + ppi_c + thmmp_c
GSPMDS: GSPMDS: atp_c + gthrd_c + spmd_c --> adp_c + gtspmd

We will assign SBO terms to these reactions by splitting them into 3 groups based on the reaction ID:
* SBO:0000185 (translocation reaction): all reactions whose ID ends in either `t` or `t2`.
* SBO:0000655 (transport reaction): all reactions whose ID ends in either `tpts` or `tabc`.
* SBO:0000176 (biochemical reaction): all the rest.

In [6]:
for reaction in model.reactions:
    if "sbo" not in reaction.annotation:
        if reaction.id.endswith("t") or reaction.id.endswith("t2"):
            reaction.annotation["sbo"] = "SBO:0000185"
        elif reaction.id.endswith("tpts") or reaction.id.endswith("tabc"):
            reaction.annotation["sbo"] = "SBO:0000655"
        else:
            reaction.annotation["sbo"] = "SBO:0000176"
cobra.io.write_sbml_model(model, "../model/g-thermo.xml")
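
The suffix rule can also be written as a standalone function, which makes it easy to check on individual IDs. This is a sketch under assumptions: `infer_sbo` is a hypothetical helper, and the `GLC...` IDs are made up for illustration, while `PGL` and `LCTST` come from the listing above.

```python
def infer_sbo(reaction_id):
    """Pick an SBO term from the reaction ID suffix (case-sensitive)."""
    # str.endswith accepts a tuple of suffixes.
    if reaction_id.endswith(("t", "t2")):
        return "SBO:0000185"  # translocation reaction
    if reaction_id.endswith(("tpts", "tabc")):
        return "SBO:0000655"  # transport reaction
    return "SBO:0000176"      # biochemical reaction


examples = ["PGL", "LCTST", "GLCt", "GLCt2", "GLCtpts", "GLCtabc"]
assigned = {rid: infer_sbo(rid) for rid in examples}
for rid, sbo in assigned.items():
    print(rid, sbo)
```

Because the comparison is case-sensitive, an ID such as `LCTST` (ending in an uppercase `T`) falls into the biochemical group rather than the translocation group.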

## 4. Sort groups and group members

Another difference that can be addressed on the cobrapy side is the order of the groups and of the members within each group: groups should be sorted alphabetically, and members should follow the order of the reactions in the model. We'll modify the writing function of cobrapy accordingly and test whether it worked:

In [1]:
# Before:
import cobra
model = cobra.io.read_sbml_model("../model/g-thermo.xml")
for group in model.groups:
    print(group)
print(" ")
for member in model.groups.get_by_id("X").members[0:10]:
    print(member)
cobra.io.write_sbml_model(model, "../model/g-thermo.xml")

X
00770
Central Carbon Metabolism
00290
00280
_C
00010
00280, 00290
00030
Pyrimidine metabolism
240
Fatty Acid Biosynthesis
Butanoate Metabolism
NGAM reaction
S_
 
HIBDm: 3hmp_c + nad_c <=> 2mop_c + h_c + nadh_c
SHSL1: cys__L_c + suchms_c --> cyst__L_c + h_c + succ_c
ACCOAC: accoa_c + atp_c + hco3_c --> adp_c + malcoa_c + pi_c
PC6AR: h_c + nadph_c + pre6a_c --> nadp_c + pre6b_c
ACOAD4f: dcacoa_c + fad_c + 3.0 h_c <=> dc2coa_c + fadh2_c
NTD3: dcmp_c + h2o_c --> dcyt_c + h_c + pi_c
UPPDC1: 4.0 co2_c + cpppg3_c <-- 4.0 h_c + uppg3_c
DTPCNOPPCT: dtpcnopp_c + ipdp_c --> decdp_c + h_c + ppi_c
GLYCLTDx: glyclt_c + nad_c <-- glx_c + h_c + nadh_c
CELLBHY: cellb_c + h2o_c --> 2.0 glc__D_c


In [2]:
# After:
model = cobra.io.read_sbml_model("../model/g-thermo.xml")
for group in model.groups:
    print(group)
print(" ")
for member in model.groups.get_by_id("X").members[0:10]:
    print(member)

00010
00030
00280
00280, 00290
00290
00770
240
Butanoate Metabolism
Central Carbon Metabolism
Fatty Acid Biosynthesis
NGAM reaction
Pyrimidine metabolism
S_
X
_C
 
IDPh: h2o_c + ppi_c --> h_c + 2.0 pi_c
CAT: 2.0 h2o2_c --> 2.0 h2o_c + o2_c
CCP: 2.0 focytCB_c + h2o2_c + 2.0 h_c <=> 2.0 focytCA_c + 2.0 h2o_c
HYDA: 2.0 fdxrd_c + 2.0 h_c <=> 2.0 fdxox_c + h2_c
MALHYDRO: h2o_c + malt_c --> 2.0 glc__D_c
PPBNGS: 2.0 5aop_c --> 2.0 h2o_c + h_c + ppbng_c
RBFSb: 2.0 dmlz_c --> 4r5au_c + ribflv_c
FERO: 4.0 fe2_c + 4.0 h_c + o2_c --> 4.0 fe3_c + 2.0 h2o_c
FOCYCTCOR: 4.0 focytCB_c + 4.0 h_c + o2_c --> 4.0 focytCA_c + 2.0 h2o_c
FOCYTCCOR: 4.0 focytcc553_c + o2_c --> 4.0 ficytcc553_c + 2.0 h2o_c
