# Genome scale model annotation
The goal of this notebook is to achieve high annotation scores from [MEMOTE](https://memote.io/).

This notebook annotates a preliminary Rhodococcus opacus PD630 genome-scale model reconstruction that was generated by [CarveMe](https://academic.oup.com/nar/article/46/15/7542/5042022?login=false) using the [2021 genome from the University of Bologna](https://www.ncbi.nlm.nih.gov/assembly/GCF_020542785.1).<br>

### Annotation Methods (repeated for metabolites, reactions, and genes)
<ol>
<li>Get annotations for the components of the R. opacus model from the BiGG universal model</li>
<li>Convert the BiGG list-of-lists annotation data structure to a dictionary</li>
<li>Relabel the keys of the annotation dictionary to match MEMOTE's expectations</li>
</ol>

# Setup imports and initial models

In [1]:
import pandas as pd
import cobra
from IPython.display import IFrame

In [2]:
model = cobra.io.read_sbml_model("../models/ropacus_model_B.xml")

In [3]:
multiple_formula_metabolites = [metabolite for metabolite in model.metabolites if ';' in metabolite.formula]

print(f'There are {len(multiple_formula_metabolites)} metabolites with multiple formulas')

There are 0 metabolites with multiple formulas


Get BiGG universal model (this step can take about a minute)

In [4]:
%%time
bigg_universal = cobra.io.load_json_model("../models/universal_model.json")
bigg_universal

CPU times: user 38 s, sys: 731 ms, total: 38.8 s
Wall time: 38.8 s


| Attribute | Value |
|---|---|
| Name | bigg_universal |
| Memory address | 0x07f697dfad490 |
| Number of metabolites | 15638 |
| Number of reactions | 28301 |
| Number of groups | 0 |
| Objective expression | 0 |
| Compartments | |


Check initial MEMOTE performance (the report renders on JupyterHub but not on GitHub)

In [5]:
# IFrame('../memotes/model_B.html', 1500, 800)

In [6]:
print(f'The model has {len(model.reactions)} reactions, {len(model.metabolites)} metabolites, and {len(model.genes)} genes.')

The model has 3019 reactions, 1951 metabolites, and 1773 genes.


# Annotate Metabolites

Check whether any metabolites in the R. opacus model are missing from the universal model

In [7]:
universal_metabolite_ids = {met.id for met in bigg_universal.metabolites}  # build the lookup once

for m in model.metabolites:
    if m.id not in universal_metabolite_ids:
        print(f'The metabolite with name {m.name} and id {m.id} is not in the universal model')

The metabolite with name Guaiacol and id guaiacol_e is not in the universal model
The metabolite with name Guaiacol and id guaiacol_c is not in the universal model


The metabolites that are missing were added as custom metabolites in Notebook A and are therefore not expected to be in the universal model. The custom metabolites are collected in a list so they can be skipped during annotation.

In [8]:
custom_metabolites = ['guaiacol_e', 'guaiacol_c', 'tag_c']
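Cell 11 calls `get_by_id` for every metabolite that is not in `custom_metabolites`, so any other metabolite missing from the universal model would raise a `KeyError` there. A minimal, cobra-free sketch of a guard, assuming the IDs have already been pulled into plain Python lists; the helper `unexpected_missing` is hypothetical and the sample IDs are illustrative, not the real model contents:

```python
def unexpected_missing(model_ids, universal_ids, custom_ids):
    """Return model IDs that are absent from the universal model and not marked custom."""
    missing = set(model_ids) - set(universal_ids)
    return sorted(missing - set(custom_ids))


# Illustrative IDs only; in the notebook these would come from
# [m.id for m in model.metabolites] and [m.id for m in bigg_universal.metabolites].
model_ids = ["glc__D_c", "guaiacol_c", "guaiacol_e"]
universal_ids = ["glc__D_c", "glc__D_e"]
custom_ids = ["guaiacol_e", "guaiacol_c", "tag_c"]

leftover = unexpected_missing(model_ids, universal_ids, custom_ids)
# Fail early instead of hitting a KeyError inside the annotation loop
assert not leftover, f"Metabolites need handling before annotation: {leftover}"
```

Using set differences keeps the check linear in the number of metabolites, rather than rebuilding the universal ID list for every model metabolite.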

Check how metabolites are annotated in the BiGG universal model

In [9]:
bigg_universal.metabolites.get_by_id('glc__D_c').annotation

[['KEGG Compound', 'http://identifiers.org/kegg.compound/C00031'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:12965'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:17634'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:20999'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:4167'],
 ['KEGG Drug', 'http://identifiers.org/kegg.drug/D00009'],
 ['Human Metabolome Database', 'http://identifiers.org/hmdb/HMDB00122'],
 ['Human Metabolome Database', 'http://identifiers.org/hmdb/HMDB06564'],
 ['BioCyc', 'http://identifiers.org/biocyc/META:Glucopyranose'],
 ['MetaNetX (MNX) Chemical',
  'http://identifiers.org/metanetx.chemical/MNXM41'],
 ['InChI Key', 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N'],
 ['SEED Compound', 'http://identifiers.org/seed.compound/cpd00027'],
 ['SEED Compound', 'http://identifiers.org/seed.compound/cpd26821']]

We see that the BiGG annotations are lists of lists, so they need to be converted to dictionaries. <br>
First we test casting the list of lists to a dictionary with `dict()`

In [10]:
dict(bigg_universal.metabolites.get_by_id('glc__D_c').annotation)

{'KEGG Compound': 'http://identifiers.org/kegg.compound/C00031',
 'CHEBI': 'http://identifiers.org/chebi/CHEBI:4167',
 'KEGG Drug': 'http://identifiers.org/kegg.drug/D00009',
 'Human Metabolome Database': 'http://identifiers.org/hmdb/HMDB06564',
 'BioCyc': 'http://identifiers.org/biocyc/META:Glucopyranose',
 'MetaNetX (MNX) Chemical': 'http://identifiers.org/metanetx.chemical/MNXM41',
 'InChI Key': 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'SEED Compound': 'http://identifiers.org/seed.compound/cpd26821'}

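Now we apply this method to all non-custom metabolites in the R. opacus genome-scale model (GSM). Note that `dict()` keeps only the last value for each repeated key, so, as the output above shows, only one CHEBI, HMDB, and SEED identifier survives per metabolite.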

In [11]:
for m in model.metabolites:
    if m.id not in custom_metabolites:
        m.annotation = dict(bigg_universal.metabolites.get_by_id(m.id).annotation)
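If every cross-reference should be kept instead of only the last one per key, one option is to group repeated keys into lists; cobra annotation dictionaries can hold list values, as the `rhea` entry on PPS in cell 17 shows. A stdlib-only sketch (the helper `group_annotations` is hypothetical, not part of the notebook), using a subset of the `glc__D_c` pairs from cell 9:

```python
from collections import defaultdict


def group_annotations(pairs):
    """Collapse [[key, value], ...] into {key: value or [values]} without dropping repeats."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if value not in grouped[key]:  # skip exact duplicate pairs
            grouped[key].append(value)
    # Single values stay plain strings; repeated keys become lists
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


# Subset of the BiGG annotation for glc__D_c shown in cell 9
glc_pairs = [
    ["KEGG Compound", "http://identifiers.org/kegg.compound/C00031"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:12965"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:17634"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:4167"],
]
grouped = group_annotations(glc_pairs)
print(grouped["CHEBI"])  # all three CHEBI URLs, whereas dict(glc_pairs) keeps only CHEBI:4167
```

The conversion in cell 13 calls `v.rsplit('/', 1)` on each value, so it would also need to handle list values before this variant could replace `dict()`. Whether MEMOTE scores list-valued entries differently has not been checked here.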

Check annotations on R. opacus model metabolites

In [12]:
model.metabolites.get_by_id('glc__D_c').annotation

{'KEGG Compound': 'http://identifiers.org/kegg.compound/C00031',
 'CHEBI': 'http://identifiers.org/chebi/CHEBI:4167',
 'KEGG Drug': 'http://identifiers.org/kegg.drug/D00009',
 'Human Metabolome Database': 'http://identifiers.org/hmdb/HMDB06564',
 'BioCyc': 'http://identifiers.org/biocyc/META:Glucopyranose',
 'MetaNetX (MNX) Chemical': 'http://identifiers.org/metanetx.chemical/MNXM41',
 'InChI Key': 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'SEED Compound': 'http://identifiers.org/seed.compound/cpd26821'}

Convert the keys in the metabolite annotation dictionaries to be MEMOTE-compatible. <br>
Also reduce each value from a URL to only the portion after the final '/'

In [13]:
# Map BiGG annotation labels to the identifiers.org namespace prefixes MEMOTE expects
memote_key_converter = {'BioCyc': 'biocyc',
                        'CHEBI': 'chebi',
                        'Human Metabolome Database': 'hmdb',
                        'InChI Key': 'inchikey',
                        'KEGG Compound': 'kegg.compound',
                        'KEGG Drug': 'kegg.drug',
                        'KEGG Glycan': 'kegg.glycan',
                        'LipidMaps': 'lipidmaps',
                        'MetaNetX (MNX) Chemical': 'metanetx.chemical',
                        'Reactome Compound': 'reactome',
                        'SEED Compound': 'seed.compound'}

for m in model.metabolites:
    if m.annotation:
        # Rename each key and keep only the identifier after the last '/' of the URL
        m.annotation = {memote_key_converter[k]: v.rsplit('/', 1)[-1] for k, v in m.annotation.items()}
    # Record the BiGG ID itself as a cross-reference
    m.annotation['bigg.metabolite'] = m.id
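The key conversion above raises a `KeyError` for any database label missing from `memote_key_converter`. A small sketch of the same conversion as a standalone function, applied to part of the `glc__D_c` annotation; the function name `to_memote`, the trimmed `key_map`, and the `"Some New DB"` entry are hypothetical, and passing unknown labels through unchanged is an assumption added here, not the notebook's method:

```python
def to_memote(annotation, key_map):
    """Rename keys via key_map and keep only the part of each URL after the last '/'."""
    converted = {}
    for key, url in annotation.items():
        new_key = key_map.get(key, key)  # unknown labels pass through unchanged (assumption)
        converted[new_key] = url.rsplit("/", 1)[-1]
    return converted


key_map = {
    "CHEBI": "chebi",
    "InChI Key": "inchikey",
    "KEGG Compound": "kegg.compound",
}
glc = {
    "KEGG Compound": "http://identifiers.org/kegg.compound/C00031",
    "CHEBI": "http://identifiers.org/chebi/CHEBI:4167",
    "InChI Key": "https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N",
    "Some New DB": "http://identifiers.org/newdb/X1",  # hypothetical unmapped label
}
print(to_memote(glc, key_map))
```

A label that passes through unchanged is not a MEMOTE namespace, so it would still need an entry in the converter; the fallback only avoids stopping the whole loop on the first unmapped key.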

Add Systems Biology Ontology (SBO) terms to metabolites; SBO:0000247 is the term for a simple chemical. <br>
[https://www.ebi.ac.uk/sbo/main/SBO:0000247](https://www.ebi.ac.uk/sbo/main/SBO:0000247)

In [14]:
for m in model.metabolites:
    m.annotation['sbo'] = 'SBO:0000247'

Check how metabolite annotations look now

In [15]:
model.metabolites.get_by_id('glc__D_c').annotation

{'kegg.compound': 'C00031',
 'chebi': 'CHEBI:4167',
 'kegg.drug': 'D00009',
 'hmdb': 'HMDB06564',
 'biocyc': 'META:Glucopyranose',
 'metanetx.chemical': 'MNXM41',
 'inchikey': 'WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'seed.compound': 'cpd26821',
 'bigg.metabolite': 'glc__D_c',
 'sbo': 'SBO:0000247'}

# Annotate Reactions

Check which reactions in the R. opacus model are not in the universal model

In [16]:
universal_reaction_ids = {rxn.id for rxn in bigg_universal.reactions}  # build the lookup once

for r in model.reactions:
    if r.id not in universal_reaction_ids:
        print(f'{r.name} with the id, {r.id}, is not in the universal model')

R_sink_2ohph_c with the id, sink_2ohph_c, is not in the universal model
R_sink_amob_c with the id, sink_amob_c, is not in the universal model
R_sink_hemeO_c with the id, sink_hemeO_c, is not in the universal model
R_sink_sheme_c with the id, sink_sheme_c, is not in the universal model
R_EX_2ameph_e with the id, EX_2ameph_e, is not in the universal model
R_EX_5drib_e with the id, EX_5drib_e, is not in the universal model
R_EX_5mdru1p_e with the id, EX_5mdru1p_e, is not in the universal model
R_EX_ala_L_thr__L_e with the id, EX_ala_L_thr__L_e, is not in the universal model
R_EX_glc__aD_e with the id, EX_glc__aD_e, is not in the universal model
R_EX_gly_gln_e with the id, EX_gly_gln_e, is not in the universal model
R_EX_gly_leu_e with the id, EX_gly_leu_e, is not in the universal model
R_EX_glyglygln_e with the id, EX_glyglygln_e, is not in the universal model
R_EX_lysglugly_e with the id, EX_lysglugly_e, is not in the universal model
R_EX_prohisglu_e with the id, EX_prohisglu_e, is not i

Check the current reaction annotations in the R. opacus model

In [17]:
model.reactions.get_by_id('PPS').annotation

{'sbo': 'SBO:0000176',
 'rhea': ['11364', '11365', '11366', '11367'],
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147'}

In [18]:
# for r in model.reactions:
#     if r.annotation != {}:
#         print(r.id)

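The check above is commented out, so it produces no output. As the previous cell shows, some reactions (e.g. PPS) already carry annotations read from the SBML file. <br>
Replace these with the annotations from the BiGG universal model for every reaction it contains; reactions that are not in the universal model keep their existing annotations.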

In [19]:
for r in model.reactions:
    if r.id in bigg_universal.reactions:
        r.annotation = dict(bigg_universal.reactions.get_by_id(r.id).annotation)

Check reaction annotation format

In [20]:
model.reactions.get_by_id('PPS').annotation

{'RHEA': 'http://identifiers.org/rhea/11367',
 'EC Number': 'http://identifiers.org/ec-code/2.7.9.2',
 'BioCyc': 'http://identifiers.org/biocyc/META:PEPSYNTH-RXN',
 'MetaNetX (MNX) Equation': 'http://identifiers.org/metanetx.reaction/MNXR103140',
 'KEGG Reaction': 'http://identifiers.org/kegg.reaction/R00199',
 'SEED Reaction': 'http://identifiers.org/seed.reaction/rxn00147'}

Convert the keys in the reaction annotation dictionaries to be MEMOTE-compatible. <br>
Also reduce each value from a URL to only the portion after the final '/'

In [21]:
# Map BiGG labels to identifiers.org prefixes; keys already in MEMOTE form
# (annotations read from the SBML file) map to themselves
memote_key_converter = {'BioCyc': 'biocyc',
                        'EC Number': 'ec-code',
                        'ec-code': 'ec-code',
                        'KEGG Reaction': 'kegg.reaction',
                        'kegg.reaction': 'kegg.reaction',
                        'MetaNetX (MNX) Equation': 'metanetx.reaction',
                        'metanetx.reaction': 'metanetx.reaction',
                        'RHEA': 'rhea',
                        'rhea': 'rhea',
                        'Reactome Reaction': 'reactome',
                        'reactome': 'reactome',
                        'SBO': 'sbo',
                        'sbo': 'sbo',
                        'SEED Reaction': 'seed.reaction',
                        'seed.reaction': 'seed.reaction',
                        'biocyc': 'biocyc'}

for r in model.reactions:
    if r.annotation:
        # Rename each key and keep only the identifier after the last '/'
        r.annotation = {memote_key_converter[k]: v.rsplit('/', 1)[-1] for k, v in r.annotation.items()}
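Reactions that are not in the universal model keep the annotations read from the SBML file, and those values can be lists (cell 17 shows `'rhea': ['11364', '11365', '11366', '11367']` for PPS before it was replaced). Because `rsplit` exists only on strings, the loop above would raise an `AttributeError` if such a list-valued entry were still present. A minimal sketch of a value normalizer that accepts either form; the name `strip_identifier` is hypothetical:

```python
def strip_identifier(value):
    """Return the part after the last '/' for a string, or for each item of a list."""
    if isinstance(value, list):
        return [item.rsplit("/", 1)[-1] for item in value]
    return value.rsplit("/", 1)[-1]


print(strip_identifier("http://identifiers.org/rhea/11367"))
print(strip_identifier(["11364", "http://identifiers.org/rhea/11365"]))
```

Inside the loop it would stand in for `v.rsplit('/', 1)[-1]`; values that are already bare identifiers, such as `SBO:0000176`, come back unchanged.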

Check reaction annotation in R. opacus model

In [22]:
model.reactions.get_by_id('PPS').annotation

{'rhea': '11367',
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147'}

Add Systems Biology Ontology (SBO) terms to reactions <br>
exchange reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000627](http://www.ebi.ac.uk/sbo/main/SBO:0000627) <br>
sink reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000632](http://www.ebi.ac.uk/sbo/main/SBO:0000632) <br>
growth (biomass production) reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000629](http://www.ebi.ac.uk/sbo/main/SBO:0000629) <br>
demand reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000628](http://www.ebi.ac.uk/sbo/main/SBO:0000628) <br>
transport reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000655](http://www.ebi.ac.uk/sbo/main/SBO:0000655) <br>
biochemical reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000176](http://www.ebi.ac.uk/sbo/main/SBO:0000176) <br>

In [23]:
for r in model.reactions:
    if r.id.startswith('EX_'):
        r.annotation['sbo'] = 'SBO:0000627'
    elif r.id.startswith('sink_'):
        r.annotation['sbo'] = 'SBO:0000632'
    elif r.id.startswith('Growth'):
        r.annotation['sbo'] = 'SBO:0000629'
    elif r.id.startswith('ATPM'):
        r.annotation['sbo'] = 'SBO:0000628'
    elif len(r.compartments) > 1:
        r.annotation['sbo'] = 'SBO:0000655'
    else:
        r.annotation['sbo'] = 'SBO:0000176'

In [24]:
model.reactions.get_by_id('PPS').annotation

{'rhea': '11367',
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147',
 'sbo': 'SBO:0000176'}

In [25]:
for r in model.reactions:
    if r.annotation == {}:
        print(r.id)

No output indicates that all reactions have at least some annotation
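Because every reaction now receives an SBO term, this check passes trivially. A more informative sanity check is to count how many reactions received each SBO term. A minimal sketch with `collections.Counter`; the helper `sbo_counts` is hypothetical and the sample annotations are illustrative (with the notebook objects one could pass `(r.annotation for r in model.reactions)`):

```python
from collections import Counter


def sbo_counts(annotations):
    """Count SBO terms across annotation dicts; entries without 'sbo' are counted under None."""
    return Counter(a.get("sbo") for a in annotations)


sample = [
    {"sbo": "SBO:0000627"},
    {"sbo": "SBO:0000627"},
    {"sbo": "SBO:0000176", "ec-code": "2.7.9.2"},
    {},
]
counts = sbo_counts(sample)
print(counts.most_common())
```

Comparing the exchange and sink counts against the number of `EX_` and `sink_` reactions is a quick way to spot a prefix rule that did not match as intended.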

# Annotate Genes
Check how many genes are in the R. opacus model

In [26]:
print(f'There are {len(model.genes)} genes in the model')

There are 1773 genes in the model


Add SBO terms to gene annotations <br>
gene: [http://www.ebi.ac.uk/sbo/main/SBO:0000243](http://www.ebi.ac.uk/sbo/main/SBO:0000243)

In [27]:
for g in model.genes:
    g.annotation['sbo'] = 'SBO:0000243'

Check that all genes have an annotation

In [28]:
for g in model.genes:
    if g.annotation == {}:
        print(g.id)

No output indicates that every gene has at least one annotation (the SBO term just added) <br>
Check what gene annotations in the R. opacus model look like

In [29]:
model.genes.get_by_id('WP_005239747_1').annotation

{'sbo': 'SBO:0000243'}
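At this point gene annotations carry only the SBO term. The gene IDs look like NCBI protein accessions with the version dot replaced by an underscore (e.g. `WP_005239747_1` for `WP_005239747.1`); that is an assumption about how the IDs were encoded, not something verified here. Under that assumption, a sketch of recovering the accession, which could then be stored under a gene database key that MEMOTE recognizes (for example `ncbiprotein` or `refseq`; which one accepts `WP_` accessions should be checked against MEMOTE's identifier patterns). The helper name and regex are hypothetical:

```python
import re

# Assumed ID shape: two-letter prefix, underscore, digits, underscore, version number
GENE_ID = re.compile(r"^([A-Z]{2}_\d+)_(\d+)$")


def gene_id_to_accession(gene_id):
    """Turn 'WP_005239747_1' into 'WP_005239747.1'; return None if the ID does not fit."""
    match = GENE_ID.match(gene_id)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


print(gene_id_to_accession("WP_005239747_1"))
print(gene_id_to_accession("spontaneous"))
```

Returning `None` for IDs that do not fit the pattern leaves those genes with only their SBO term rather than attaching a malformed accession.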

# Export annotated model

In [30]:
model.id = 'ropacus_annotated'
model.name = 'Rhodococcus opacus PD630 annotated'
model.description = 'Rhodococcus opacus PD630 model with metabolite, reaction, and gene annotations. Model metabolite formulas have not been curated'

In [31]:
cobra.io.write_sbml_model(model, "../models/Ropacus_annotated.xml")

Check MEMOTE output of annotated model

In [32]:
# IFrame('../data/memotes/ropacus_annotated.html', 1500, 800)