# Using the automatic cross-reference expansion

CobraMod has functions that can add missing cross-references to the metabolites and reactions of a model. This is possible as long as the corresponding object contains at least one valid identifier that can be resolved using MetaNetX <cite data-cite="Moretti2020,Moretti2015,Ganter2013,Bernard2012">MetaNetX Ref</cite>.

These identifiers must either be used as the id of the object or be included in the annotations. For all of the identifiers, available cross-references are gathered. The missing ones are then added to the given object.
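The step "the missing ones are then added" can be pictured as a merge of two annotation dictionaries. The following is only a minimal sketch under that assumption; 'merge_annotations' is a hypothetical helper written for this illustration and is not part of CobraMod.

```python
from typing import Dict, List, Union

Annotation = Dict[str, Union[str, List[str]]]


def merge_annotations(existing: Annotation, found: Annotation) -> Annotation:
    """Return a copy of `existing` extended by the entries in `found`.

    Hypothetical helper for illustration; not CobraMod's implementation.
    """
    merged: Annotation = {}
    keys = list(existing) + [key for key in found if key not in existing]
    for key in keys:
        values: List[str] = []
        for source in (existing, found):
            entry = source.get(key, [])
            # An annotation entry may be a single string or a list of strings.
            for value in [entry] if isinstance(entry, str) else entry:
                if value not in values:
                    values.append(value)
        # Single values stay plain strings, as in the outputs in this section.
        merged[key] = values[0] if len(values) == 1 else values
    return merged


before = {"bigg.metabolite": "13dpg", "hmdb": "HMDB01270"}
found = {"hmdb": ["HMDB01270", "HMDB0001270"], "metacyc.compound": "DPG"}
after = merge_annotations(before, found)
print(after)
```

Existing values are never removed in this sketch; new values are only appended, which matches the behaviour described above.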

Entire models, groups, reactions, or metabolites can be extended this way. The extension functions of CobraMod use this function internally before adding objects to the model to ensure that the new objects have all available cross-references.

### The internal procedure

Internally, the cross-references are retrieved using MetaNetX. The EC numbers are taken from the 'reac_prop' file provided by MetaNetX. This file contains all reactions listed in MetaNetX, the corresponding MetaNetX IDs, and their EC numbers. This file is downloaded and cached once per Python instance.
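The "download once and cache" behaviour can be sketched with 'functools.lru_cache'. Everything in the block is an assumption made for illustration: the tab-separated column layout (ID, equation, reference, EC field, two flags), the semicolon as EC separator, the sample rows, and the function names. The real download is replaced by a stand-in so that the sketch runs offline.

```python
import csv
import io
from functools import lru_cache
from typing import Dict, List

# Made-up rows in the assumed layout; they are not copied from MetaNetX.
SAMPLE_REAC_PROP = (
    "#ID\tmnx_equation\treference\tclassifs\tis_balanced\tis_transport\n"
    "MNXR_A\t(equation)\texample:A\t1.2.1.10\tB\t\n"
    "MNXR_B\t(equation)\texample:B\t2.7.2.3;2.7.2.10\tB\t\n"
)


def download_reac_prop() -> str:
    """Stand-in for the real download; returns the sample text instead."""
    return SAMPLE_REAC_PROP


@lru_cache(maxsize=None)
def ec_table() -> Dict[str, List[str]]:
    """Parse the file once; later calls in the same process reuse the result."""
    table: Dict[str, List[str]] = {}
    reader = csv.reader(io.StringIO(download_reac_prop()), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and the commented header
        mnx_id, ec_field = row[0], row[3]
        table[mnx_id] = [ec for ec in ec_field.split(";") if ec]
    return table


print(ec_table()["MNXR_B"])
print(ec_table() is ec_table())  # the cached object is returned again
```

Because the cache lives in the running interpreter, a new Python process would trigger a fresh download, which corresponds to "once per Python instance".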

MetaNetX.org contains cross-references for the following databases <cite data-cite="Moretti2020,Moretti2015,Ganter2013,Bernard2012">MetaNetX Ref</cite>:

| Database | Metabolites | Reactions |
| --- | --- | --- |
| BIGG | x | x |
| ChEBI | x | |
| enviPath | x | |
| HMDB | x |  |
| KEGG | x | x |
| LipidMaps | x | |
| MetaCyc | x | x |
| Reactome | x | |
| Rhea | | x |
| SABIO-RK | x | x |
| SwissLipids | x | |
| The SEED | x | x |

Accordingly, the COBRApy object must contain at least one valid ID from one of these databases. In addition, the InChI, InChIKey, and EC numbers are also added using MetaNetX.

The 'pubchem.compound' references are retrieved directly from PubChem <cite data-cite="Kim2020">Kim2020 Ref</cite>. For this, an InChIKey must exist in the object as an annotation.
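A lookup from an InChIKey to PubChem compound IDs can be sketched with the PubChem PUG REST interface. Whether CobraMod uses exactly this endpoint is not stated here, so the block is an assumption. The helper names are hypothetical and the response payload is made up, so no network access is needed.

```python
import json
from typing import List
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey: str) -> str:
    """Build the PUG REST URL that lists compound IDs for an InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"


def parse_cids(payload: str) -> List[str]:
    """Extract the compound IDs from a PUG REST 'cids' JSON response."""
    data = json.loads(payload)
    cids = data.get("IdentifierList", {}).get("CID", [])
    # Annotations store identifiers as strings.
    return [str(cid) for cid in cids]


url = cid_lookup_url("LJQLQCAXBUHEAZ-UHFFFAOYSA-N")
sample_payload = '{"IdentifierList": {"CID": [1, 2]}}'  # made-up response
print(url)
print(parse_cids(sample_payload))
```

An object without an 'inchikey' annotation offers nothing to look up, which is why the InChIKey is a precondition for this step.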

Finally, the Brenda <cite data-cite="Chang2020">Brenda Ref</cite> identifiers are added. These are derived from the EC numbers associated with the respective object. 

### Automatic addition of missing cross-references

add_pathway
add_reaction

## Extending the annotations of different COBRApy objects

In the following, we first extend a metabolite and then a reaction with 'add_crossreferences'. Then the annotations of the default model provided by Memote are expanded.

### Metabolite

First, we generate a metabolite that we want to expand and look at its existing cross-references.

In [12]:
from cobramod.test import textbook
from cobramod.core.crossreferences import add_crossreferences

directory = "/home/jan/arbeit/memote_test/memote-model-repository_forXRef"
%cd $directory

model = textbook.copy()
metabolite = model.metabolites[0]
metabolite

/home/jan/arbeit/memote_test/memote-model-repository_forXRef


0,1
Metabolite identifier,13dpg_c
Name,3-Phospho-D-glyceroyl phosphate
Memory address,0x07f1b40fe57d0
Formula,C3H4O10P2
Compartment,c
In 2 reaction(s),"PGK, GAPD"


In [2]:
metabolite.annotation

{'bigg.metabolite': '13dpg',
 'biocyc': 'DPG',
 'chebi': ['CHEBI:16001',
  'CHEBI:1658',
  'CHEBI:20189',
  'CHEBI:57604',
  'CHEBI:11881'],
 'hmdb': 'HMDB01270',
 'kegg.compound': 'C00236',
 'pubchem.substance': '3535',
 'reactome': 'REACT_29800',
 'seed.compound': 'cpd00203',
 'unipathway.compound': 'UPC00236'}

Now we execute the function 'add_crossreferences' and display the cross-references again. The argument 'consider_sub_elements' has no influence on the function at this point because, unlike reactions and whole models, metabolites do not consist of further reactions or metabolites. The 'include_metanetx_specific_ec' argument specifies whether MetaNetX-specific EC numbers should be included.

In [3]:
add_crossreferences(metabolite,
                    consider_sub_elements=True,
                    include_metanetx_specific_ec=False)

In [4]:
metabolite.annotation

{'bigg.metabolite': '13dpg',
 'biocyc': 'DPG',
 'chebi': ['CHEBI:89363',
  'CHEBI:57604',
  'CHEBI:1658',
  'CHEBI:11881',
  'CHEBI:16001',
  'CHEBI:20189'],
 'hmdb': ['HMDB0062758', 'HMDB01270', 'HMDB62758', 'HMDB0001270'],
 'kegg.compound': 'C00236',
 'pubchem.substance': '3535',
 'reactome': ['REACT_29800', 'R-ALL-29800'],
 'seed.compound': 'cpd00203',
 'unipathway.compound': 'UPC00236',
 'reactomem': 'R-ALL-29800',
 'sabiork.compound': ['29', '21215'],
 'biggm': ['M_13dpg', '13dpg'],
 'sabiorkm': ['29', '21215'],
 'keggc': ['M_C00236', 'C00236'],
 'metacyc.compound': 'DPG',
 'seedm': ['M_cpd00203', 'cpd00203'],
 'inchikey': ['LJQLQCAXBUHEAZ-UHFFFAOYSA-N', 'LJQLQCAXBUHEAZ-UWTATZPHSA-J'],
 'inchi': ['InChI=1S/C3H8O10P2/c4-2(1-12-14(6,7)8)3(5)13-15(9,10)11/h2,4H,1H2,(H2,6,7,8)(H2,9,10,11)',
  'InChI=1S/C3H8O10P2/c4-2(1-12-14(6,7)8)3(5)13-15(9,10)11/h2,4H,1H2,(H2,6,7,8)(H2,9,10,11)/p-4/t2-/m1/s1'],
 'metacycm': 'DPG',
 'pubchem.compound': ['683', '46878409']}
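To see at a glance what the expansion contributed, the annotations before and after can be compared. The sketch below uses a hypothetical helper, 'added_entries', and a small excerpt of the two dictionaries shown above.

```python
from typing import Dict, List, Union

Annotation = Dict[str, Union[str, List[str]]]


def as_list(entry: Union[str, List[str]]) -> List[str]:
    """Normalise an annotation entry to a list of strings."""
    return [entry] if isinstance(entry, str) else list(entry)


def added_entries(before: Annotation, after: Annotation) -> Dict[str, List[str]]:
    """Return, per key, the values present in `after` but not in `before`."""
    added: Dict[str, List[str]] = {}
    for key, entry in after.items():
        old_values = as_list(before.get(key, []))
        extra = [value for value in as_list(entry) if value not in old_values]
        if extra:
            added[key] = extra
    return added


# Excerpt of the annotations shown above.
before = {"hmdb": "HMDB01270", "reactome": "REACT_29800"}
after = {
    "hmdb": ["HMDB0062758", "HMDB01270", "HMDB62758", "HMDB0001270"],
    "reactome": ["REACT_29800", "R-ALL-29800"],
    "metacyc.compound": "DPG",
}
print(added_entries(before, after))
```

With real objects, 'before' would have to be a copy of 'metabolite.annotation' taken ahead of the call, since the annotation is modified in place.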

### Reaction

Here we repeat the procedure from before but this time we use a reaction instead of a metabolite.

In [5]:
reaction = model.reactions[0]
reaction

0,1
Reaction identifier,ACALD
Name,acetaldehyde dehydrogenase (acetylating)
Memory address,0x07f1b40d0ffd0
Stoichiometry,acald_c + coa_c + nad_c <=> accoa_c + h_c + nadh_c  Acetaldehyde + Coenzyme A + Nicotinamide adenine dinucleotide <=> Acetyl-CoA + H+ + Nicotinamide adenine dinucleotide - reduced
GPR,b0351 or b1241
Lower bound,-1000.0
Upper bound,1000.0


In [6]:
reaction.annotation

{'bigg.reaction': 'ACALD'}

This time the argument 'consider_sub_elements' does influence the function, since it determines whether the annotations of the metabolites of this reaction should also be expanded.

In [7]:
add_crossreferences(reaction,
                    consider_sub_elements=True,
                    include_metanetx_specific_ec=False)
reaction.annotation

{'bigg.reaction': ['ACALDh', 'R_ACALDh', 'R_ACALD', 'ACALD'],
 'rhea': ['23288', '23289', '23290', '23291'],
 'biggr': ['ACALDh', 'R_ACALD', 'R_ACALDh', 'ACALD'],
 'seedr': ['rxn32711', 'rxn27656', 'rxn00171', 'rxn32710'],
 'sabiorkr': '163',
 'sabiork.reaction': '163',
 'metacyc.reaction': 'ACETALD-DEHYDROG-RXN',
 'kegg.reaction': 'R00228',
 'metacycr': 'ACETALD-DEHYDROG-RXN',
 'rhear': ['23288', '23289', '23290', '23291'],
 'seed.reaction': ['rxn32711', 'rxn27656', 'rxn00171', 'rxn32710'],
 'keggr': 'R00228'}
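The effect of 'consider_sub_elements' on a reaction can be sketched as a question of which objects get visited. The classes below are simplified stand-ins for the COBRApy objects and 'objects_to_expand' is a hypothetical function; the sketch only illustrates the selection logic described above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Metabolite:
    """Simplified stand-in for a COBRApy metabolite."""
    id: str


@dataclass
class Reaction:
    """Simplified stand-in for a COBRApy reaction."""
    id: str
    metabolites: List[Metabolite] = field(default_factory=list)


def objects_to_expand(
    obj: Union[Metabolite, Reaction], consider_sub_elements: bool = True
) -> List[str]:
    """List the IDs of all objects whose annotations would be expanded."""
    ids = [obj.id]
    # Only reactions have sub-elements here; metabolites are leaves.
    if consider_sub_elements and isinstance(obj, Reaction):
        ids.extend(metabolite.id for metabolite in obj.metabolites)
    return ids


participants = ["acald_c", "coa_c", "nad_c", "accoa_c", "h_c", "nadh_c"]
acald = Reaction("ACALD", [Metabolite(name) for name in participants])

print(objects_to_expand(acald, consider_sub_elements=True))
print(objects_to_expand(acald, consider_sub_elements=False))
print(objects_to_expand(Metabolite("13dpg_c")))
```

For a metabolite the flag makes no difference, which mirrors the remark in the metabolite example.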

### Model

In [21]:
directory = "/home/jan/arbeit/memote_test/memote-model-repository_forXRef"
%cd $directory

from cobra.io import write_sbml_model, validate_sbml_model
from cobramod.core.crossreferences import add_crossreferences

model, errors = validate_sbml_model("model.xml")

/home/jan/arbeit/memote_test/memote-model-repository_forXRef


Again we run 'add_crossreferences'. This time we get a progress bar because the cross-reference extension of a whole model generally takes some time.

In [22]:
add_crossreferences(model)

100%|██████████| 172/172 [02:12<00:00,  1.30it/s]


In [23]:
write_sbml_model(model,"model_with_Xref.xml")

Now we use Memote to create a report that compares the model before and after expanding the annotations. We will then display this report.

In [37]:
!memote report diff --filename _static/xref.html model.xml model_with_Xref.xml 

Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled
The current solver interface glpk doesn't support setting the optimality tolerance.
The current solver interface glpk doesn't support setting the optimality tolerance.
platform linux -- Python 3.7.4, pytest-4.6.11, py-1.10.0, pluggy-0.13.1
rootdir: /home/jan
plugins: anyio-3.3.0
platform linux -- Python 3.7.4, pytest-4.6.11, py-1.10.0, pluggy-0.13.1
rootdir: /home/jan
plugins: anyio-3.3.0
collected 146 items / 1 skipped / 145 selected

Writing diff report to '_static/xref.html'.


The report is written to '_static/xref.html' instead of 'index.html' to prevent problems with files that have the same name.

In [36]:
from IPython.display import IFrame
IFrame(src='./_static/xref.html', width="100%", height=800)

It should be noted that for the "Metabolite Annotation Conformity Per Database", both the InChI and HMDB can perform comparatively poorly. 

For InChI, the reason is that CobraMod adds all available InChIs based on the existing IDs. This can include an InChI that is shared by several compounds.

For HMDB, the reason is that Memote limits the IDs to 5 digits. However, the current definition of identifiers.org does not limit the number of digits, and there are valid HMDB identifiers that have more than 5 digits. For this reason, CobraMod adds the HMDB IDs even if this lowers the score in Memote.
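The difference between the two HMDB conventions can be made concrete with two regular expressions. Both patterns are assumptions written for this illustration (a five-digit form standing in for the Memote check and an open-length form standing in for the identifiers.org definition); the IDs are the HMDB values from the metabolite example above.

```python
import re

# Assumed patterns for illustration; not copied from Memote or identifiers.org.
FIVE_DIGIT_FORM = re.compile(r"^HMDB\d{5}$")
OPEN_LENGTH_FORM = re.compile(r"^HMDB\d+$")

hmdb_ids = ["HMDB0062758", "HMDB01270", "HMDB62758", "HMDB0001270"]

five_digit = [hmdb_id for hmdb_id in hmdb_ids if FIVE_DIGIT_FORM.match(hmdb_id)]
any_length = [hmdb_id for hmdb_id in hmdb_ids if OPEN_LENGTH_FORM.match(hmdb_id)]

print(five_digit)
print(any_length)
```

Under these assumed patterns, only the two five-digit IDs satisfy the stricter form, while all four satisfy the open-length form, so adding the longer IDs lowers a conformity score based on the stricter check.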

.. bibliography::