# Genome scale model annotation
The goal of this notebook is to achieve high annotation scores from [MEMOTE](https://memote.io/).

This notebook annotates a preliminary Rhodococcus opacus PD630 genome-scale model reconstruction that was generated by [CarveMe](https://pubmed.ncbi.nlm.nih.gov/30192979/) using the [2011 genome from the Broad Institute](https://www.ncbi.nlm.nih.gov/assembly/GCF_000234335.1).<br>
### CarveMe Instructions
The draft reconstruction (Ropacus_carveme_grampos.xml) was generated with the following commands:

<ol>
<li>carve --refseq GCF_000234335.1 -o Ropacus_carveme.xml </li>
<li>gapfill Ropacus_carveme.xml -m M9,LB -o new_model.xml</li>
</ol>

### Annotation Methods (repeated for metabolites, reactions, and genes)
<ol>
<li>Get annotations for the components of the R. opacus model from the BiGG universal model</li>
<li>Convert the BiGG list-of-lists annotation data structure to a dictionary</li>
<li>Relabel the keys of the annotation dictionary to match MEMOTE's expectations</li>
</ol>
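The three steps above can be sketched on plain Python data, without loading either model. This is a minimal illustration only: the function name `pairs_to_memote_annotation` and the two-entry key map are assumptions made for the example, and the notebook itself performs these steps in separate cells below.

```python
def pairs_to_memote_annotation(pairs, key_map):
    """Turn BiGG-style [label, url] pairs into a {memote_key: identifier} dict.

    Labels missing from key_map are skipped (an assumption of this sketch).
    If a label repeats, the last value wins, as with a plain dict() cast.
    """
    annotation = {}
    for label, url in pairs:
        key = key_map.get(label)
        if key is None:
            continue
        # Keep only the part of the URL after the final '/'
        annotation[key] = url.rsplit("/", 1)[-1]
    return annotation


# Two entries copied from the glc__D_c annotation shown later in this notebook
example_pairs = [
    ["KEGG Compound", "http://identifiers.org/kegg.compound/C00031"],
    ["MetaNetX (MNX) Chemical", "http://identifiers.org/metanetx.chemical/MNXM41"],
]
example_key_map = {
    "KEGG Compound": "kegg.compound",
    "MetaNetX (MNX) Chemical": "metanetx.chemical",
}

example_annotation = pairs_to_memote_annotation(example_pairs, example_key_map)
print(example_annotation)
```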

# Setup imports and initial models

In [1]:
import pandas as pd
import cobra
from IPython.display import IFrame

In [2]:
model = cobra.io.read_sbml_model("../GSMs/ropacus_model_B.xml")

In [3]:
# multiple_formula_metabolites = [metabolite for metabolite in model.metabolites if ';' in metabolite.formula]

# print(f'There are {len(multiple_formula_metabolites)} metabolites with multiple formulas')

Get BiGG universal model (this step can take about 45 seconds)

In [4]:
%%time
bigg_universal = cobra.io.load_json_model("../GSMs/universal_model.json")
bigg_universal

CPU times: user 41.6 s, sys: 1.24 s, total: 42.9 s
Wall time: 42.9 s


| Attribute | Value |
|---|---|
| Name | bigg_universal |
| Memory address | 0x07f15ac1e75d0 |
| Number of metabolites | 15638 |
| Number of reactions | 28301 |
| Number of groups | 0 |
| Objective expression | 0 |
| Compartments | |


Check initial MEMOTE performance (the report renders on JupyterHub but not on GitHub)

In [5]:
IFrame('../memotes/model_B.html', 1500, 800)

In [6]:
print(f'The model has {len(model.reactions)} reactions, {len(model.metabolites)} metabolites, and {len(model.genes)} genes.')

The model has 1763 reactions, 1237 metabolites, and 811 genes.


# Annotate Metabolites

Check if any metabolites in R. opacus model are not in the universal model

In [7]:
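# Build the set of universal metabolite ids once instead of on every iteration
universal_metabolite_ids = {m.id for m in bigg_universal.metabolites}
for m in model.metabolites:
    if m.id not in universal_metabolite_ids:
        print(f'The metabolite with name {m.name} and id {m.id} is not in the universal model')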

The metabolite with name Guaiacol and id guaiacol_e is not in the universal model
The metabolite with name Guaiacol and id guaiacol_c is not in the universal model


Check if all the metabolites in the R. opacus model are now in the BiGG model

The remaining metabolites were added as custom metabolites in Notebook A and are therefore not expected to be in the universal model. They are collected in a list so that later steps can skip them.

In [8]:
custom_metabolites = ['guaiacol_e', 'guaiacol_c', 'tag_c']

Check how metabolites are annotated in the BiGG universal model

In [9]:
bigg_universal.metabolites.get_by_id('glc__D_c').annotation

[['KEGG Compound', 'http://identifiers.org/kegg.compound/C00031'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:12965'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:17634'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:20999'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:4167'],
 ['KEGG Drug', 'http://identifiers.org/kegg.drug/D00009'],
 ['Human Metabolome Database', 'http://identifiers.org/hmdb/HMDB00122'],
 ['Human Metabolome Database', 'http://identifiers.org/hmdb/HMDB06564'],
 ['BioCyc', 'http://identifiers.org/biocyc/META:Glucopyranose'],
 ['MetaNetX (MNX) Chemical',
  'http://identifiers.org/metanetx.chemical/MNXM41'],
 ['InChI Key', 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N'],
 ['SEED Compound', 'http://identifiers.org/seed.compound/cpd00027'],
 ['SEED Compound', 'http://identifiers.org/seed.compound/cpd26821']]

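We see that the BiGG annotations are lists of lists. We need to convert them to dictionaries. <br>
First we test casting the list of lists to a dictionary. Because dictionary keys are unique, only the last entry is kept for labels that occur more than once, such as CHEBI.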

In [10]:
dict(bigg_universal.metabolites.get_by_id('glc__D_c').annotation)

{'KEGG Compound': 'http://identifiers.org/kegg.compound/C00031',
 'CHEBI': 'http://identifiers.org/chebi/CHEBI:4167',
 'KEGG Drug': 'http://identifiers.org/kegg.drug/D00009',
 'Human Metabolome Database': 'http://identifiers.org/hmdb/HMDB06564',
 'BioCyc': 'http://identifiers.org/biocyc/META:Glucopyranose',
 'MetaNetX (MNX) Chemical': 'http://identifiers.org/metanetx.chemical/MNXM41',
 'InChI Key': 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'SEED Compound': 'http://identifiers.org/seed.compound/cpd26821'}
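In the output above, only one of the four CHEBI entries survives the `dict()` cast. If all identifiers should be retained, one option is to group values by label. The sketch below illustrates that alternative on a subset of the glucose annotation; it is not the approach used in this notebook, the helper `group_pairs` is a name invented for the example, and whether list values suit the rest of the workflow would need to be verified.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group [label, url] pairs into {label: [url, ...]}, keeping every value."""
    grouped = defaultdict(list)
    for label, url in pairs:
        grouped[label].append(url)
    return dict(grouped)


# Subset of the glc__D_c annotation printed above
glucose_pairs = [
    ["KEGG Compound", "http://identifiers.org/kegg.compound/C00031"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:12965"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:17634"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:20999"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:4167"],
]

flat = dict(glucose_pairs)            # last CHEBI entry overwrites the others
grouped = group_pairs(glucose_pairs)  # all CHEBI entries are kept

print(flat["CHEBI"])
print(len(grouped["CHEBI"]))
```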

Now we apply this method to all non-custom metabolites in the R. opacus GSM

In [11]:
for m in model.metabolites:
    if m.id not in custom_metabolites:
        m.annotation = dict(bigg_universal.metabolites.get_by_id(m.id).annotation)

Check annotations on R. opacus model metabolites

In [12]:
model.metabolites.get_by_id('glc__D_c').annotation

{'KEGG Compound': 'http://identifiers.org/kegg.compound/C00031',
 'CHEBI': 'http://identifiers.org/chebi/CHEBI:4167',
 'KEGG Drug': 'http://identifiers.org/kegg.drug/D00009',
 'Human Metabolome Database': 'http://identifiers.org/hmdb/HMDB06564',
 'BioCyc': 'http://identifiers.org/biocyc/META:Glucopyranose',
 'MetaNetX (MNX) Chemical': 'http://identifiers.org/metanetx.chemical/MNXM41',
 'InChI Key': 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'SEED Compound': 'http://identifiers.org/seed.compound/cpd26821'}

Convert keys in metabolite annotation dictionaries to be MEMOTE compatible. <br>
Also convert values from urls to only the portion of the url after the final '/'

In [13]:
memote_key_converter = dict({'BioCyc': 'biocyc',
                     'CHEBI': 'chebi',
                     'Human Metabolome Database': 'hmdb',
                     'InChI Key': 'inchikey',
                     'KEGG Compound': 'kegg.compound',
                     'KEGG Drug': 'kegg.drug',
                     'KEGG Glycan': 'kegg.glycan',
                     'LipidMaps': 'lipidmaps',
                     'MetaNetX (MNX) Chemical': 'metanetx.chemical',
                     'Reactome Compound': 'reactome',
                     'SEED Compound': 'seed.compound'})

for m in model.metabolites:
    if m.annotation:
        m.annotation = dict((memote_key_converter[k], v.rsplit('/',1)[-1]) for k, v in m.annotation.items())
    m.annotation['bigg.metabolite'] = m.id
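The conversion above looks up every label with `memote_key_converter[k]`, so a label that is missing from the converter raises a `KeyError`. A quick pre-check can list such labels before converting. The sketch below works on plain dictionaries; the helper name `unmapped_labels` and the label `'Some Other Database'` are hypothetical and only serve the illustration.

```python
def unmapped_labels(annotations, key_map):
    """Return the sorted labels that appear in annotations but not in key_map."""
    labels = set()
    for annotation in annotations:
        labels.update(k for k in annotation if k not in key_map)
    return sorted(labels)


# Small stand-ins for the converter and for m.annotation dictionaries
sample_key_map = {"CHEBI": "chebi", "KEGG Compound": "kegg.compound"}
sample_annotations = [
    {"CHEBI": "http://identifiers.org/chebi/CHEBI:4167",
     "KEGG Compound": "http://identifiers.org/kegg.compound/C00031"},
    # Hypothetical label that the converter does not know about
    {"CHEBI": "http://identifiers.org/chebi/CHEBI:4167",
     "Some Other Database": "http://example.org/db/123"},
]

unmapped = unmapped_labels(sample_annotations, sample_key_map)
print(unmapped)
```

With the real model, the same function could be called as `unmapped_labels((m.annotation for m in model.metabolites), memote_key_converter)` before running the conversion loop.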

Add Systems Biology Ontology (SBO) values to metabolites. <br>
[https://www.ebi.ac.uk/sbo/main/SBO:0000247](https://www.ebi.ac.uk/sbo/main/SBO:0000247)

In [14]:
for m in model.metabolites:
    m.annotation['sbo'] = 'SBO:0000247'

Check how metabolite annotations look now

In [15]:
model.metabolites.get_by_id('glc__D_c').annotation

{'kegg.compound': 'C00031',
 'chebi': 'CHEBI:4167',
 'kegg.drug': 'D00009',
 'hmdb': 'HMDB06564',
 'biocyc': 'META:Glucopyranose',
 'metanetx.chemical': 'MNXM41',
 'inchikey': 'WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'seed.compound': 'cpd26821',
 'bigg.metabolite': 'glc__D_c',
 'sbo': 'SBO:0000247'}

# Annotate Reactions

Check which reactions in the R. opacus model are not in the universal model

In [16]:
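# Build the set of universal reaction ids once instead of on every iteration
universal_reaction_ids = {r.id for r in bigg_universal.reactions}
for r in model.reactions:
    if r.id not in universal_reaction_ids:
        print(f'{r.name} with the id, {r.id}, is not in the universal model')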

R_sink_mobd_c with the id, sink_mobd_c, is not in the universal model
Biomass reaction with the id, Growth, is not in the universal model
R_sink_4hba_c with the id, sink_4hba_c, is not in the universal model
fe2 transport with the id, fe2_transport, is not in the universal model
Phenol transport with the id, phenol_transport, is not in the universal model
Phenol monooxygenase with the id, PHEMOX, is not in the universal model
Vanillate transport with the id, vanlt_transport, is not in the universal model
Hydroxybenzoate transport with the id, 4_hydroxybenzoate_transport, is not in the universal model
Guaiacol exchange with the id, EX_guaiacol_e, is not in the universal model
Guaiacol transport with the id, guaiacol_transport, is not in the universal model
Guaicaol O-demethylase with the id, GUADEM, is not in the universal model
Benzoate transport with the id, bz_transport, is not in the universal model
4-hydroxybenzoate 3-monooxygenase with the id, HBHYOX, is not in the universal model

Check the current reaction annotations in the R. opacus model

In [21]:
model.reactions.get_by_id('12DGR120tipp').annotation

{'MetaNetX (MNX) Equation': 'http://identifiers.org/metanetx.reaction/MNXR94675'}

In [25]:
for r in model.reactions:
    if r.annotation != {}:
        print(r.id)

12DGR120tipp
12DGR140tipp
12DGR141tipp
12DGR160tipp
12DGR161tipp
12DGR180tipp
12DGR181tipp
13PPDH
2AGPA120tipp
2AGPA141tipp
2AGPA161tipp
2AGPA180tipp
2AGPA181tipp
2AGPE120tipp
2AGPE140tipp
2AGPE141tipp
2AGPE160tipp
2AGPE161tipp
2AGPE180tipp
2AGPE181tipp
2AGPEAT120
2AGPEAT140
2AGPEAT141
2AGPEAT160
2AGPEAT161
2AGPEAT180
2AGPEAT181
2AGPG120tipp
2AGPG141tipp
2AGPG160tipp
2AGPG161tipp
2AGPG180tipp
2AGPGAT120
2AGPGAT141
2AGPGAT160
2AGPGAT161
2AGPGAT180
2HH24DDH
2INSD
2S6HCCi
3AMACHYD
3HCINNMH
3HOXPACt2pp
3HPAOX
3HPPPNH
3MBt2pp
3MBt4pp
3MBtex
3OADPCOAT
3OXCOAT
3PEPTabcpp
3PEPTtex
3hoxpactex
4H2KPILY
4HBCOAH
4HOXPACMOF
4HOXPACt2pp
4PCP
4PEPTabcpp
5DGLCNR
AACPS1
AACPS2
AACPS3
AACPS4
AACPS5
AACPS6
AACPS7
ABTA
ABTt
ABTt_1
ABUTD
ABUTt2pp
ABUTt2r
ACACCT
ACACT1r
ACACT2r
ACACT3r
ACACT4r
ACACT5r
ACACT5r_1
ACACT6r
ACACT6r_1
ACACT7r
ACACT8r
ACACt2pp
ACACtex
ACALD
ACALDt
ACCOAC
ACCOAL
ACGApts
ACGAptspp
ACGAtex
ACGK
ACGS
ACHBS
ACKr
ACLS
ACLS_a
ACNAMt2pp
ACNAMtex
ACNML
ACOAD1
ACOAD1f
ACOAD2
ACOAD20
ACOAD2f

The reactions listed above already carry an annotation; no output would indicate that no reaction is annotated. <br>
Get reaction annotations from BiGG model

In [23]:
for r in model.reactions:
    if r.id in bigg_universal.reactions:
        r.annotation = dict(bigg_universal.reactions.get_by_id(r.id).annotation)

Check reaction annotation format

In [26]:
model.reactions.get_by_id('PPS').annotation

{'RHEA': 'http://identifiers.org/rhea/11367',
 'EC Number': 'http://identifiers.org/ec-code/2.7.9.2',
 'BioCyc': 'http://identifiers.org/biocyc/META:PEPSYNTH-RXN',
 'MetaNetX (MNX) Equation': 'http://identifiers.org/metanetx.reaction/MNXR103140',
 'KEGG Reaction': 'http://identifiers.org/kegg.reaction/R00199',
 'SEED Reaction': 'http://identifiers.org/seed.reaction/rxn00147'}

Convert keys in reaction annotation dictionaries to be MEMOTE compatible. <br>
Also convert values from urls to only the portion of the url after the final '/'

In [37]:
memote_key_converter = dict({'BioCyc': 'biocyc',
                     'EC Number': 'ec-code',
                     'ec-code': 'ec-code',
                     'KEGG Reaction': 'kegg.reaction',
                     'kegg.reaction': 'kegg.reaction',
                     'MetaNetX (MNX) Equation': 'metanetx.reaction',
                     'metanetx.reaction': 'metanetx.reaction',
                     'RHEA': 'rhea',
                     'rhea': 'rhea',
                     'Reactome Reaction': 'reactome',
                     'reactome':'reactome',
                     'SBO': 'sbo',
                     'sbo': 'sbo',
                     'SEED Reaction': 'seed.reaction',
                     'seed.reaction': 'seed.reaction',
                     'biocyc': 'biocyc'
                    })

for r in model.reactions:
    if r.annotation:
        r.annotation = dict((memote_key_converter[k], v.rsplit('/',1)[-1]) for k, v in r.annotation.items())

Check reaction annotation in R. opacus model

In [38]:
model.reactions.get_by_id('PPS').annotation

{'rhea': '11367',
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147'}

Add systems biology ontology for reactions <br>
exchange reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000627](http://www.ebi.ac.uk/sbo/main/SBO:0000627) <br>
sink reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000632](http://www.ebi.ac.uk/sbo/main/SBO:0000632) <br>
growth reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000629](http://www.ebi.ac.uk/sbo/main/SBO:0000629) <br>
demand reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000628](http://www.ebi.ac.uk/sbo/main/SBO:0000628) <br>
transport reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000655](http://www.ebi.ac.uk/sbo/main/SBO:0000655) <br>
biochemical reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000176](http://www.ebi.ac.uk/sbo/main/SBO:0000176) <br>

In [39]:
for r in model.reactions:
    if r.id.startswith('EX_'):
        r.annotation['sbo'] = 'SBO:0000627'
    elif r.id.startswith('sink_'):
        r.annotation['sbo'] = 'SBO:0000632'
    elif r.id.startswith('Growth'):
        r.annotation['sbo'] = 'SBO:0000629'
    elif r.id.startswith('ATPM'):
        r.annotation['sbo'] = 'SBO:0000628'
    elif len(r.compartments) > 1:
        r.annotation['sbo'] = 'SBO:0000655'
    else:
        r.annotation['sbo'] = 'SBO:0000176'
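The same decision rules can be written as a small pure function, which makes them easy to check on a few reaction ids without loading the model. This is a sketch that mirrors the cell above; the function name `sbo_term_for_reaction` is an assumption of the example, and the compartment counts passed in the usage lines are illustrative rather than read from the model.

```python
def sbo_term_for_reaction(reaction_id, n_compartments):
    """Pick an SBO term from the reaction id prefix and its compartment count."""
    if reaction_id.startswith("EX_"):
        return "SBO:0000627"  # exchange reaction
    if reaction_id.startswith("sink_"):
        return "SBO:0000632"  # sink reaction
    if reaction_id.startswith("Growth"):
        return "SBO:0000629"  # growth (biomass) reaction
    if reaction_id.startswith("ATPM"):
        return "SBO:0000628"  # demand reaction
    if n_compartments > 1:
        return "SBO:0000655"  # transport reaction
    return "SBO:0000176"      # biochemical reaction


# Ids taken from this notebook; compartment counts are assumed for illustration
print(sbo_term_for_reaction("EX_guaiacol_e", 1))
print(sbo_term_for_reaction("guaiacol_transport", 2))
print(sbo_term_for_reaction("PPS", 1))
```

The order of the checks matters: prefix rules are tested first, so a reaction whose id starts with `EX_` or `sink_` is never classified by its compartment count.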

In [40]:
model.reactions.get_by_id('PPS').annotation

{'rhea': '11367',
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147',
 'sbo': 'SBO:0000176'}

In [41]:
for r in model.reactions:
    if r.annotation == {}:
        print(r.id)

No output indicates that all reactions have at least some annotation
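Since the previous cell adds an `sbo` key to every reaction, the emptiness check passes even for custom reactions that received nothing from the universal model. Counting how often each annotation key occurs gives a more detailed picture. The sketch below does this on plain dictionaries; `annotation_key_counts` is a name chosen for the example, and the second sample entry is an assumed stand-in for a custom reaction that carries only an SBO term.

```python
from collections import Counter


def annotation_key_counts(annotations):
    """Count how many annotation dictionaries contain each key."""
    counts = Counter()
    for annotation in annotations:
        counts.update(annotation.keys())
    return counts


sample_reaction_annotations = [
    # PPS annotation as printed above
    {"rhea": "11367", "ec-code": "2.7.9.2", "biocyc": "META:PEPSYNTH-RXN",
     "metanetx.reaction": "MNXR103140", "kegg.reaction": "R00199",
     "seed.reaction": "rxn00147", "sbo": "SBO:0000176"},
    # Assumed shape of a custom reaction with only an SBO term
    {"sbo": "SBO:0000655"},
]

key_counts = annotation_key_counts(sample_reaction_annotations)
print(key_counts.most_common())
```

On the real model, passing `(r.annotation for r in model.reactions)` would show how many of the reactions have, for example, a `kegg.reaction` or `rhea` identifier.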

# Annotate Genes
Check how many genes are in R. opacus model

In [42]:
print(f'There are {len(model.genes)} genes in the model')

There are 811 genes in the model


Add SBO terms to gene annotations <br>
gene: [http://www.ebi.ac.uk/sbo/main/SBO:0000243](http://www.ebi.ac.uk/sbo/main/SBO:0000243)

In [43]:
for g in model.genes:
    g.annotation['sbo'] = 'SBO:0000243'

Check that all genes have an annotation

In [44]:
for g in model.genes:
    if g.annotation == {}:
        print(g.id)

No output indicates that all genes are annotated <br>
Check what gene annotations in R. opacus model look like

In [45]:
model.genes.get_by_id('WP_005239747_1').annotation

{'sbo': 'SBO:0000243'}
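At this point each gene carries only an SBO term. The gene id shown above, `WP_005239747_1`, resembles a RefSeq protein accession with the version dot replaced by an underscore. Assuming that pattern holds for the gene ids in this model (it has not been verified here), an accession string could be recovered as sketched below. The function name is hypothetical, and which annotation key MEMOTE expects for such identifiers should be checked against its documentation before adding them.

```python
import re


def guess_refseq_accession(gene_id):
    """Guess a RefSeq-style accession from an id like 'WP_005239747_1'.

    Assumes the id is '<two letters>_<digits>_<version>'; returns None otherwise.
    """
    match = re.fullmatch(r"([A-Z]{2}_\d+)_(\d+)", gene_id)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


guessed = guess_refseq_accession("WP_005239747_1")
print(guessed)
```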

# Export annotated model

In [46]:
model.id = 'ropacus_annotated'
model.name = 'Rhodococcus opacus PD630 annotated'
model.description = 'Rhodococcus opacus PD630 model with metabolite, reaction, and gene annotations. Model metabolite formulas have not been curated'

In [47]:
cobra.io.write_sbml_model(model, "../GSMs/Ropacus_annotated.xml")

Check MEMOTE output of annotated model

In [48]:
IFrame('../memotes/ropacus_annotated.html', 1500, 800)