# R. opacus model annotation and curation
The goal of this notebook is to achieve high annotation scores from [MEMOTE](https://memote.io/).

This notebook annotates a preliminary Rhodococcus opacus PD630 genome-scale model reconstruction that was generated by [CarveMe](https://academic.oup.com/nar/article/46/15/7542/5042022?login=false) using the [2021 genome from the University of Bologna](https://www.ncbi.nlm.nih.gov/assembly/GCF_020542785.1).<br>

### Method
<ol>
<li>Setup imports and load models</li>
<li>Add oxidative phosphorylation reactions from the BiGG universal model to the model</li>
<li>Annotate metabolites</li>
<li>Annotate reactions</li>
<li>Annotate genes</li>
<li>Curate the model to improve model performance</li>
<li>Export annotated model</li>
</ol>

### 1. Setup imports and load models

In [2]:
import pandas as pd
import cobra
from IPython.display import IFrame

Load the model from Notebook B

In [2]:
model = cobra.io.read_sbml_model("../models/ropacus_model_B.xml")

Load BiGG universal model (this step can take about a minute)

In [3]:
%%time
bigg_universal = cobra.io.load_json_model("../models/universal_model.json")
bigg_universal

CPU times: user 38.1 s, sys: 1.08 s, total: 39.2 s
Wall time: 39.1 s


| Attribute | Value |
|---|---|
| Name | bigg_universal |
| Memory address | 0x07f84f4fa0f90 |
| Number of metabolites | 15638 |
| Number of reactions | 28301 |
| Number of groups | 0 |
| Objective expression | 0 |
| Compartments | |


### 2. Add oxidative phosphorylation reactions from the BiGG universal model to the model

In [4]:
rxns_to_add = ['CBFC2pp', 'NDH1_4pp', 'CYO1bpp_syn', 'CYTBDpp']

for rxn_id in rxns_to_add:
    rxn = bigg_universal.reactions.get_by_id(rxn_id)
    print(rxn)

CBFC2pp: 2.0 ficytc6_p + 2.0 h_c + pqh2_p --> 2.0 focytc6_p + 4.0 h_p + pq_p
NDH1_4pp: co2_p + h2o_c + 3.0 h_c + nadph_c + pq_p --> 3.0 h_p + hco3_c + nadp_c + pqh2_p
CYO1bpp_syn: 2.0 focytc6_p + 4.0 h_c + 0.5 o2_p --> 2.0 ficytc6_p + h2o_p + 2.0 h_p
CYTBDpp: 2.0 h_c + 0.5 o2_c + q8h2_c --> h2o_c + 2.0 h_p + q8_c


In [5]:
# These reactions pump protons (h_c -> h_p) into the periplasm; the resulting gradient drives ATP synthase flux
rxns_to_add = ['CBFC2pp', 'NDH1_4pp', 'CYO1bpp_syn', 'CYTBDpp']

for rxn_id in rxns_to_add:
    
    # add reactions to the model
    rxn = bigg_universal.reactions.get_by_id(rxn_id)
    model.add_reactions([rxn])
    
    # format annotations as dictionaries
    model.reactions.get_by_id(rxn.id).annotation = dict(rxn.annotation)
    for m in rxn.metabolites:
        model.metabolites.get_by_id(m.id).annotation = dict(m.annotation)
        
        # add compartment info for each metabolite
        if m.id.endswith('_c'):
            m.compartment = 'C_c'
        elif m.id.endswith('_p'):
            m.compartment = 'C_p'
        else:
            m.compartment = 'C_e'
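The comment above says these reactions move protons into the periplasm. As a minimal sketch, the stoichiometries below are transcribed by hand from the printed BiGG reactions (negative = consumed, positive = produced), and the hypothetical helper periplasmic_protons simply reads the h_p coefficient to show how many protons each reaction deposits in the periplasm per unit flux.

```python
# Stoichiometries copied from the printed BiGG reactions above.
OXPHOS_RXNS = {
    "CBFC2pp": {"ficytc6_p": -2.0, "h_c": -2.0, "pqh2_p": -1.0,
                "focytc6_p": 2.0, "h_p": 4.0, "pq_p": 1.0},
    "NDH1_4pp": {"co2_p": -1.0, "h2o_c": -1.0, "h_c": -3.0, "nadph_c": -1.0,
                 "pq_p": -1.0, "h_p": 3.0, "hco3_c": 1.0, "nadp_c": 1.0,
                 "pqh2_p": 1.0},
    "CYO1bpp_syn": {"focytc6_p": -2.0, "h_c": -4.0, "o2_p": -0.5,
                    "ficytc6_p": 2.0, "h2o_p": 1.0, "h_p": 2.0},
    "CYTBDpp": {"h_c": -2.0, "o2_c": -0.5, "q8h2_c": -1.0,
                "h2o_c": 1.0, "h_p": 2.0, "q8_c": 1.0},
}


def periplasmic_protons(stoichiometry):
    """Return the h_p coefficient: protons released into the periplasm per unit flux."""
    return stoichiometry.get("h_p", 0.0)


for rxn_id, stoich in OXPHOS_RXNS.items():
    print(rxn_id, periplasmic_protons(stoich))
```

Inside the notebook the same numbers should be available directly from cobra with model.reactions.get_by_id(rxn_id).get_coefficient('h_p').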
    

### 3. Annotate metabolites

Check whether any metabolites in the R. opacus model are missing from the universal model

In [7]:
for m in model.metabolites:
    # DictList supports membership tests by id string, so no list needs to be rebuilt per metabolite
    if m.id not in bigg_universal.metabolites:
        print(f'The metabolite with name {m.name} and id {m.id} is not in the universal model')

The metabolite with name Guaiacol and id guaiacol_e is not in the universal model
The metabolite with name Guaiacol and id guaiacol_c is not in the universal model
The metabolite with name triacylglycerol (16:0/16:0/16.0) and id tag_c is not in the universal model


These three metabolites were added manually in Notebook A, so they are not expected to be in the universal model. They are collected in a list of custom metabolites to skip during annotation.

In [8]:
custom_metabolites = ['guaiacol_e', 'guaiacol_c', 'tag_c']
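Rather than hard-coding the list, it could be derived from the same comparison. A minimal sketch using a set difference (ids_missing_from is a hypothetical helper, and the ids below are a toy subset rather than the full models):

```python
def ids_missing_from(model_ids, universal_ids):
    """Return, sorted, the ids present in model_ids but absent from universal_ids."""
    return sorted(set(model_ids) - set(universal_ids))


# Toy inputs; in the notebook these would be
# [m.id for m in model.metabolites] and [m.id for m in bigg_universal.metabolites].
model_ids = ["glc__D_c", "guaiacol_e", "guaiacol_c", "tag_c", "h_c"]
universal_ids = ["glc__D_c", "h_c", "h_p"]
print(ids_missing_from(model_ids, universal_ids))
```

Building the sets once keeps each lookup constant-time, which matters when the universal model holds more than 15,000 metabolites.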

Check how metabolites are annotated in the BiGG universal model

In [9]:
bigg_universal.metabolites.get_by_id('glc__D_c').annotation

[['KEGG Compound', 'http://identifiers.org/kegg.compound/C00031'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:12965'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:17634'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:20999'],
 ['CHEBI', 'http://identifiers.org/chebi/CHEBI:4167'],
 ['KEGG Drug', 'http://identifiers.org/kegg.drug/D00009'],
 ['Human Metabolome Database', 'http://identifiers.org/hmdb/HMDB00122'],
 ['Human Metabolome Database', 'http://identifiers.org/hmdb/HMDB06564'],
 ['BioCyc', 'http://identifiers.org/biocyc/META:Glucopyranose'],
 ['MetaNetX (MNX) Chemical',
  'http://identifiers.org/metanetx.chemical/MNXM41'],
 ['InChI Key', 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N'],
 ['SEED Compound', 'http://identifiers.org/seed.compound/cpd00027'],
 ['SEED Compound', 'http://identifiers.org/seed.compound/cpd26821']]

The BiGG annotations are lists of [database, URL] pairs, so they need to be converted to dictionaries. <br>
First, test casting the list of pairs to a dictionary with dict()

In [10]:
dict(bigg_universal.metabolites.get_by_id('glc__D_c').annotation)

{'KEGG Compound': 'http://identifiers.org/kegg.compound/C00031',
 'CHEBI': 'http://identifiers.org/chebi/CHEBI:4167',
 'KEGG Drug': 'http://identifiers.org/kegg.drug/D00009',
 'Human Metabolome Database': 'http://identifiers.org/hmdb/HMDB06564',
 'BioCyc': 'http://identifiers.org/biocyc/META:Glucopyranose',
 'MetaNetX (MNX) Chemical': 'http://identifiers.org/metanetx.chemical/MNXM41',
 'InChI Key': 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'SEED Compound': 'http://identifiers.org/seed.compound/cpd26821'}
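Casting with dict() keeps only the last value for each repeated key: of the four ChEBI IDs only CHEBI:4167 survives, and one HMDB ID and one SEED ID are dropped as well. If those extra cross-references should be kept, a minimal sketch that groups the pairs instead is below (pairs_to_annotation is a hypothetical helper, not a cobra function). It assumes annotation values may be lists, as the rhea entry on the PPS reaction later in this notebook shows.

```python
from collections import defaultdict


def pairs_to_annotation(pairs):
    """Group [key, value] pairs into a dict without discarding repeated keys.

    Keys with one value map to a string; repeated keys map to a list of
    unique values in their original order.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        if value not in grouped[key]:
            grouped[key].append(value)
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


pairs = [
    ["KEGG Compound", "http://identifiers.org/kegg.compound/C00031"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:12965"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:4167"],
]
print(pairs_to_annotation(pairs))
```

Using this in place of dict() would mean the URL-stripping step further down must also handle list values.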

Now we apply this method to all non-custom metabolites in the R. opacus GSM

In [11]:
for m in model.metabolites:
    if m.id not in custom_metabolites:
        m.annotation = dict(bigg_universal.metabolites.get_by_id(m.id).annotation)

Check annotations on R. opacus model metabolites

In [12]:
model.metabolites.get_by_id('glc__D_c').annotation

{'KEGG Compound': 'http://identifiers.org/kegg.compound/C00031',
 'CHEBI': 'http://identifiers.org/chebi/CHEBI:4167',
 'KEGG Drug': 'http://identifiers.org/kegg.drug/D00009',
 'Human Metabolome Database': 'http://identifiers.org/hmdb/HMDB06564',
 'BioCyc': 'http://identifiers.org/biocyc/META:Glucopyranose',
 'MetaNetX (MNX) Chemical': 'http://identifiers.org/metanetx.chemical/MNXM41',
 'InChI Key': 'https://identifiers.org/inchikey/WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'SEED Compound': 'http://identifiers.org/seed.compound/cpd26821'}

Convert the keys in the metabolite annotation dictionaries to MEMOTE-compatible namespaces. <br>
Also reduce each URL value to the identifier after the final '/'

In [13]:
memote_key_converter = {
    'BioCyc': 'biocyc',
    'CHEBI': 'chebi',
    'Human Metabolome Database': 'hmdb',
    'InChI Key': 'inchikey',
    'KEGG Compound': 'kegg.compound',
    'KEGG Drug': 'kegg.drug',
    'KEGG Glycan': 'kegg.glycan',
    'LipidMaps': 'lipidmaps',
    'MetaNetX (MNX) Chemical': 'metanetx.chemical',
    'Reactome Compound': 'reactome',
    'reactome': 'reactome',
    'SEED Compound': 'seed.compound',
    'sbo': 'sbo',
}

for m in model.metabolites:
    if m.annotation:
        m.annotation = dict((memote_key_converter[k], v.rsplit('/',1)[-1]) for k, v in m.annotation.items())
    m.annotation['bigg.metabolite'] = m.id
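The cell above raises KeyError for any database name missing from memote_key_converter, and rsplit fails on list-valued entries. A more defensive variant is sketched below; the name convert_annotation and the choice to skip (and report) unmapped keys are assumptions for illustration, not part of the original workflow.

```python
def convert_annotation(annotation, key_map):
    """Rename keys via key_map and strip URL values down to the bare identifier.

    Unmapped keys are reported and skipped instead of raising KeyError;
    list values are converted element by element.
    """
    def strip(value):
        # Keep only the part after the final '/', e.g. '.../chebi/CHEBI:4167' -> 'CHEBI:4167'
        return value.rsplit("/", 1)[-1]

    converted = {}
    for key, value in annotation.items():
        new_key = key_map.get(key)
        if new_key is None:
            print(f"Skipping unmapped annotation key: {key!r}")
            continue
        if isinstance(value, list):
            converted[new_key] = [strip(v) for v in value]
        else:
            converted[new_key] = strip(value)
    return converted


key_map = {"CHEBI": "chebi", "KEGG Compound": "kegg.compound"}
example = {
    "KEGG Compound": "http://identifiers.org/kegg.compound/C00031",
    "CHEBI": ["http://identifiers.org/chebi/CHEBI:12965",
              "http://identifiers.org/chebi/CHEBI:4167"],
    "Unknown DB": "http://example.org/x/1",
}
print(convert_annotation(example, key_map))
```

In the notebook this would be called as m.annotation = convert_annotation(m.annotation, memote_key_converter).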

Add Systems Biology Ontology (SBO) terms to metabolites (SBO:0000247, simple chemical). <br>
[https://www.ebi.ac.uk/sbo/main/SBO:0000247](https://www.ebi.ac.uk/sbo/main/SBO:0000247)

In [14]:
for m in model.metabolites:
    try:
        m.annotation['sbo'] = 'SBO:0000247'
    except TypeError:
        # the annotation is still a list of pairs rather than a dictionary
        print(f'Need to convert the annotation of {m} to a dictionary')
        m.annotation = dict(m.annotation)
        m.annotation['sbo'] = 'SBO:0000247'

Check how metabolite annotations look now

In [15]:
model.metabolites.get_by_id('glc__D_c').annotation

{'kegg.compound': 'C00031',
 'chebi': 'CHEBI:4167',
 'kegg.drug': 'D00009',
 'hmdb': 'HMDB06564',
 'biocyc': 'META:Glucopyranose',
 'metanetx.chemical': 'MNXM41',
 'inchikey': 'WQZGKKKJIJFFOK-GASJEMHNSA-N',
 'seed.compound': 'cpd26821',
 'bigg.metabolite': 'glc__D_c',
 'sbo': 'SBO:0000247'}

### 4. Annotate reactions

Check which reactions in the R. opacus model are not in the universal model

In [16]:
for r in model.reactions:
    # DictList supports membership tests by id string, so no list needs to be rebuilt per reaction
    if r.id not in bigg_universal.reactions:
        print(f'{r.name} with the id, {r.id}, is not in the universal model')

R_sink_2ohph_c with the id, sink_2ohph_c, is not in the universal model
R_sink_amob_c with the id, sink_amob_c, is not in the universal model
R_sink_hemeO_c with the id, sink_hemeO_c, is not in the universal model
R_sink_sheme_c with the id, sink_sheme_c, is not in the universal model
R_EX_2ameph_e with the id, EX_2ameph_e, is not in the universal model
R_EX_5drib_e with the id, EX_5drib_e, is not in the universal model
R_EX_5mdru1p_e with the id, EX_5mdru1p_e, is not in the universal model
R_EX_ala_L_thr__L_e with the id, EX_ala_L_thr__L_e, is not in the universal model
R_EX_glc__aD_e with the id, EX_glc__aD_e, is not in the universal model
R_EX_gly_gln_e with the id, EX_gly_gln_e, is not in the universal model
R_EX_gly_leu_e with the id, EX_gly_leu_e, is not in the universal model
R_EX_glyglygln_e with the id, EX_glyglygln_e, is not in the universal model
R_EX_lysglugly_e with the id, EX_lysglugly_e, is not in the universal model
R_EX_prohisglu_e with the id, EX_prohisglu_e, is not i

Check the current reaction annotations in the R. opacus model

In [17]:
model.reactions.get_by_id('PPS').annotation

{'sbo': 'SBO:0000176',
 'rhea': ['11364', '11365', '11366', '11367'],
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147'}

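Copy reaction annotations from the BiGG universal model. This replaces the existing annotation (including the SBO term) on every reaction that is also in the universal model.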

In [18]:
for r in model.reactions:
    if r.id in bigg_universal.reactions:
        r.annotation = dict(bigg_universal.reactions.get_by_id(r.id).annotation)

Check reaction annotation format

In [19]:
model.reactions.get_by_id('PPS').annotation

{'RHEA': 'http://identifiers.org/rhea/11367',
 'EC Number': 'http://identifiers.org/ec-code/2.7.9.2',
 'BioCyc': 'http://identifiers.org/biocyc/META:PEPSYNTH-RXN',
 'MetaNetX (MNX) Equation': 'http://identifiers.org/metanetx.reaction/MNXR103140',
 'KEGG Reaction': 'http://identifiers.org/kegg.reaction/R00199',
 'SEED Reaction': 'http://identifiers.org/seed.reaction/rxn00147'}

Convert the keys in the reaction annotation dictionaries to MEMOTE-compatible namespaces. <br>
Also reduce each URL value to the identifier after the final '/'

In [20]:
# Identity entries cover reactions that kept their original, already namespaced annotations
memote_key_converter = {
    'BioCyc': 'biocyc',
    'biocyc': 'biocyc',
    'EC Number': 'ec-code',
    'ec-code': 'ec-code',
    'KEGG Reaction': 'kegg.reaction',
    'kegg.reaction': 'kegg.reaction',
    'MetaNetX (MNX) Equation': 'metanetx.reaction',
    'metanetx.reaction': 'metanetx.reaction',
    'RHEA': 'rhea',
    'rhea': 'rhea',
    'Reactome Reaction': 'reactome',
    'reactome': 'reactome',
    'SBO': 'sbo',
    'sbo': 'sbo',
    'SEED Reaction': 'seed.reaction',
    'seed.reaction': 'seed.reaction',
}

for r in model.reactions:
    if r.annotation:
        r.annotation = dict((memote_key_converter[k], v.rsplit('/',1)[-1]) for k, v in r.annotation.items())

Check reaction annotation in R. opacus model

In [21]:
model.reactions.get_by_id('PPS').annotation

{'rhea': '11367',
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147'}

Add Systems Biology Ontology (SBO) terms to reactions <br>
exchange reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000627](http://www.ebi.ac.uk/sbo/main/SBO:0000627) <br>
sink reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000632](http://www.ebi.ac.uk/sbo/main/SBO:0000632) <br>
growth reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000629](http://www.ebi.ac.uk/sbo/main/SBO:0000629) <br>
demand reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000628](http://www.ebi.ac.uk/sbo/main/SBO:0000628) <br>
transport reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000655](http://www.ebi.ac.uk/sbo/main/SBO:0000655) <br>
biochemical reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000176](http://www.ebi.ac.uk/sbo/main/SBO:0000176) <br>

In [22]:
for r in model.reactions:
    if r.id.startswith('EX_'):
        r.annotation['sbo'] = 'SBO:0000627'
    elif r.id.startswith('sink_'):
        r.annotation['sbo'] = 'SBO:0000632'
    elif r.id.startswith('Growth'):
        r.annotation['sbo'] = 'SBO:0000629'
    elif r.id.startswith('ATPM'):
        r.annotation['sbo'] = 'SBO:0000628'
    elif len(r.compartments) > 1:
        r.annotation['sbo'] = 'SBO:0000655'
    else:
        r.annotation['sbo'] = 'SBO:0000176'

In [23]:
model.reactions.get_by_id('PPS').annotation

{'rhea': '11367',
 'ec-code': '2.7.9.2',
 'biocyc': 'META:PEPSYNTH-RXN',
 'metanetx.reaction': 'MNXR103140',
 'kegg.reaction': 'R00199',
 'seed.reaction': 'rxn00147',
 'sbo': 'SBO:0000176'}

In [24]:
for r in model.reactions:
    if r.annotation == {}:
        print(r.id)

No output indicates that all reactions have at least some annotation
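A tally of SBO terms is a quick sanity check that the prefix rules above classified reactions as expected. A minimal sketch (count_sbo_terms is a hypothetical helper; the input below is toy data):

```python
from collections import Counter


def count_sbo_terms(annotations):
    """Count how many annotation dicts carry each SBO term ('missing' if absent)."""
    return Counter(a.get("sbo", "missing") for a in annotations)


# Toy input; in the notebook: [r.annotation for r in model.reactions]
annotations = [{"sbo": "SBO:0000627"}, {"sbo": "SBO:0000176"},
               {"sbo": "SBO:0000176"}, {}]
print(count_sbo_terms(annotations))
```

Comparing the count for SBO:0000627 against the number of EX_ reactions, for example, would reveal any exchange reaction whose id does not follow the prefix convention.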

### 5. Annotate genes
Check how many genes are in the R. opacus model

In [25]:
print(f'There are {len(model.genes)} genes in the model')

There are 1773 genes in the model


Add an SBO term to the gene annotations <br>
gene: [http://www.ebi.ac.uk/sbo/main/SBO:0000243](http://www.ebi.ac.uk/sbo/main/SBO:0000243)

In [26]:
for g in model.genes:
    g.annotation['sbo'] = 'SBO:0000243'

Check that all genes have an annotation

In [27]:
for g in model.genes:
    if g.annotation == {}:
        print(g.id)

No output indicates that all genes are annotated <br>
Check what gene annotations in R. opacus model look like

In [28]:
model.genes.get_by_id('WP_005239747_1').annotation

{'sbo': 'SBO:0000243'}
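The genes currently carry only an SBO term. The gene ids look like RefSeq protein accessions with the version dot replaced by an underscore (WP_005239747_1 for WP_005239747.1); that encoding is an assumption, not something stated in this notebook. Under it, a hypothetical helper could recover the accession so it can be stored under a database namespace such as refseq (check the MEMOTE report for which gene namespaces it scores).

```python
import re

# Assumed encoding: two-letter RefSeq prefix, digits, then the version after an underscore.
REFSEQ_GENE_ID = re.compile(r"^([A-Z]{2}_\d+)_(\d+)$")


def gene_id_to_refseq(gene_id):
    """Return the RefSeq accession encoded in gene_id, or None if it does not match."""
    match = REFSEQ_GENE_ID.match(gene_id)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


print(gene_id_to_refseq("WP_005239747_1"))
```

In the notebook, a loop over model.genes could set g.annotation['refseq'] whenever the helper returns an accession, leaving non-matching ids untouched.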

### 6. Curate the model to improve model performance

Define helper functions for model curation

In [29]:
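def should_be_balanced(r):
    # exchange, sink and growth (biomass) reactions are unbalanced by design
    return not r.id.startswith(('EX_', 'sink_', 'Growth'))

def balanced_or_only_hydrogen_unbalanced(r):
    # despite the name, True only when hydrogen is the sole unbalanced element
    return should_be_balanced(r) and list(r.check_mass_balance().keys()) == ['H']

def fix_unbalanced_hydrogen(r):
    # check_mass_balance reports products minus reactants; round() avoids int()
    # truncating values such as 1.9999999 down to 1
    hydrogen_error = round(r.check_mass_balance()['H'])
    # subtracting the error as cytosolic protons restores the hydrogen balance
    r.subtract_metabolites({model.metabolites.get_by_id("h_c"): hydrogen_error})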

Set every metabolite charge to zero, so that charge is excluded from the mass-balance checks below

In [30]:
for m in model.metabolites:
    m.charge = 0

Fix reactions that only have hydrogen errors

In [31]:
unbalanced_rxns = [r for r in model.reactions if should_be_balanced(r) and r.check_mass_balance() != {}]

for r in unbalanced_rxns:
    if balanced_or_only_hydrogen_unbalanced(r):
        fix_unbalanced_hydrogen(r)
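To make the sign convention behind fix_unbalanced_hydrogen concrete, here is a pure-Python sketch on a toy reaction; the metabolite ids a_c and b_c and their formulas are invented for illustration. element_balance sums coefficient times atom count per element (products minus reactants, with reactants carrying negative coefficients), following the convention of cobra's check_mass_balance, though charge is omitted here. fix_hydrogen then subtracts the rounded hydrogen error from the h_c coefficient, as subtract_metabolites does.

```python
from collections import defaultdict


def element_balance(stoichiometry, formulas):
    """Return non-zero per-element totals of coefficient * atom count."""
    totals = defaultdict(float)
    for met, coeff in stoichiometry.items():
        for element, count in formulas[met].items():
            totals[element] += coeff * count
    return {e: v for e, v in totals.items() if v != 0}


def fix_hydrogen(stoichiometry, formulas, proton="h_c"):
    """Return a copy with the hydrogen error offset by protons."""
    error = element_balance(stoichiometry, formulas).get("H", 0)
    fixed = dict(stoichiometry)
    fixed[proton] = fixed.get(proton, 0) - round(error)
    return fixed


FORMULAS = {"a_c": {"C": 1, "H": 4}, "b_c": {"C": 1, "H": 2}, "h_c": {"H": 1}}
rxn = {"a_c": -1, "b_c": 1}  # a_c -> b_c loses two hydrogens

print(element_balance(rxn, FORMULAS))
fixed = fix_hydrogen(rxn, FORMULAS)
print(fixed, element_balance(fixed, FORMULAS))
```

A negative H error means reactants carry more hydrogen than products, so subtracting it adds protons on the product side.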

Check that only negligible mass balance errors, at the level of floating-point rounding, remain

In [32]:
unbalanced_rxns = [r for r in model.reactions if should_be_balanced(r) and r.check_mass_balance() != {}]

for r in unbalanced_rxns:
    print(r.id, r.check_mass_balance())

AGPATr_BS {'C': -7.105427357601002e-15, 'H': -1.4210854715202004e-14, 'O': -3.552713678800501e-15, 'N': -8.881784197001252e-16}
G3POA_BS {'C': -7.105427357601002e-15, 'N': -8.881784197001252e-16, 'O': -3.552713678800501e-15}
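The remaining errors are around 1e-14, far below one atom. A minimal sketch of a tolerance filter that separates such rounding noise from real imbalances (significant_imbalances and the 1e-9 default are assumptions for illustration):

```python
def significant_imbalances(mass_balance, tol=1e-9):
    """Keep only element errors whose magnitude exceeds tol."""
    return {element: err for element, err in mass_balance.items() if abs(err) > tol}


# Values copied from the AGPATr_BS output above
agpatr = {'C': -7.105427357601002e-15, 'H': -1.4210854715202004e-14,
          'O': -3.552713678800501e-15, 'N': -8.881784197001252e-16}
print(significant_imbalances(agpatr))
```

In the notebook, filtering with significant_imbalances(r.check_mass_balance()) instead of comparing against {} would leave AGPATr_BS and G3POA_BS out of the unbalanced list.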


### 7. Export annotated model

In [33]:
model.id = 'ropacus_annotated_curated'
model.name = 'Rhodococcus opacus PD630 annotated and curated'
model.description = 'Rhodococcus opacus PD630 annotated curated'

cobra.io.write_sbml_model(model, "../models/Ropacus_annotated_curated.xml")

In [34]:
model

| Attribute | Value |
|---|---|
| Name | ropacus_annotated_curated |
| Memory address | 0x07f8465ca93d0 |
| Number of metabolites | 1956 |
| Number of reactions | 3025 |
| Number of groups | 0 |
| Objective expression | 1.0*Growth - 1.0*Growth_reverse_699ae |
| Compartments | cytosol, periplasm, extracellular space |


Display the MEMOTE report for the annotated model

In [4]:
IFrame('../data/memotes/Ropacus_annotated_curated.html', 1500, 800)
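The report shown above is a static HTML file. Assuming the memote command-line tool is installed, one way to regenerate it after the model changes is memote's snapshot report command. The sketch below only assembles that command (memote_snapshot_command is a hypothetical helper); the paths are the ones used in this notebook.

```python
import shlex
import subprocess


def memote_snapshot_command(model_path, report_path):
    """Build the memote CLI call that writes a standalone HTML snapshot report."""
    return ["memote", "report", "snapshot", "--filename", report_path, model_path]


cmd = memote_snapshot_command("../models/Ropacus_annotated_curated.xml",
                              "../data/memotes/Ropacus_annotated_curated.html")
print(shlex.join(cmd))


def run_snapshot(command):
    """Run the command; requires memote on PATH and may take several minutes."""
    subprocess.run(command, check=True)
```

Calling run_snapshot(cmd) and then re-running the IFrame cell would show the refreshed scores.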