# Genome scale model annotation
The goal of this notebook is to achieve high annotation scores from [MEMOTE](https://memote.io/).

This notebook annotates a preliminary Rhodococcus opacus PD630 genome-scale model reconstruction that was generated by [CarveMe](https://pubmed.ncbi.nlm.nih.gov/30192979/) using the [2011 genome from the Broad Institute](https://www.ncbi.nlm.nih.gov/assembly/GCF_000234335.1).<br>
### CarveMe Instructions
The draft reconstruction (Ropacus_carveme_grampos.xml) was generated with the following commands:

<ol>
<li>carve --refseq GCF_000234335.1 -o Ropacus_carveme.xml </li>
<li>gapfill Ropacus_carveme.xml -m M9,LB -o new_model.xml</li>
</ol>

### Annotation Methods (repeated for metabolites, reactions, and genes)
<ol>
<li>Get annotations for the components of the R. opacus model from the BiGG universal model</li>
<li>Convert the BiGG list-of-lists data structure to a dictionary</li>
<li>Relabel the keys of the annotation dictionary to match MEMOTE's expectations</li>
</ol>

# Setup imports and initial models

In [1]:
import pandas as pd
import cobra
from IPython.display import IFrame

In [3]:
model_B = cobra.io.read_sbml_model("../GSMs/ropacus_model_B.xml")
model_initial = cobra.io.read_sbml_model("../GSMs/Ropacus_carveme_grampos.xml")

In [6]:
multiple_formula_metabolites_B = [metabolite for metabolite in model_B.metabolites if ';' in metabolite.formula]

print(f'There are {len(multiple_formula_metabolites_B)} metabolites with multiple formulas')

# for m in model.metabolites:
#     if ';' in m.formula:
#         print(m.formula)

There are 162 metabolites with multiple formulas


Get initial R. opacus model

In [None]:
model = cobra.io.read_sbml_model("../GSMs/Ropacus_carveme_grampos.xml")
model

Get BiGG universal model (this step can take about 45 seconds). The cells below expect it to be available as `bigg_universal`.

In [None]:
# Count the reactions in the initial model
print(len(model.reactions))

In [None]:
multiple_formula_metabolites = [metabolite for metabolite in model.metabolites if ';' in metabolite.formula]

print(f'There are {len(multiple_formula_metabolites)} metabolites with multiple formulas')

for m in model.metabolites:
    if ';' in m.formula:
        print(m.formula)

Check initial MEMOTE performance (the report renders on JupyterHub but not on GitHub)

In [None]:
IFrame('../memotes/ropacus_carveme_grampos.html', 1500, 800)

In [None]:
print(f'The model has {len(model.reactions)} reactions, {len(model.metabolites)} metabolites, and {len(model.genes)} genes.')

# Annotate Metabolites

Check whether any metabolites in the R. opacus model are missing from the universal model

In [None]:
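# Build the set of universal ids once instead of rebuilding a list on every iteration
universal_metabolite_ids = {m.id for m in bigg_universal.metabolites}
for m in model.metabolites:
    if m.id not in universal_metabolite_ids:
        print(f'The metabolite with name {m.name} and id {m.id} is not in the universal model')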

Check if Cystathionine is duplicated in the R. opacus model

In [None]:
for m in model.metabolites:
    if 'Cystathionine' in m.name:
        print(f'There is a metabolite with name {m.name} and id {m.id} in the model involved in {len(m.reactions)} reactions')

Decide which L-Cystathionine metabolite to remove

In [None]:
for r in model.metabolites.get_by_id('cyst__L_c').reactions:
    print(f'{r.id}: {r.reaction}')

print()

for r in model.metabolites.get_by_id('cysth__L_c').reactions:
    print(f'{r.id}: {r.reaction}')

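We see that the metabolite missing from the universal model is involved in redundant reactions, so we remove those reactions. Removing them with `remove_orphans=True` also drops the metabolite once no remaining reaction uses it.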

In [None]:
model.reactions.get_by_id('CYSTGL_1').remove_from_model(remove_orphans=True)
model.reactions.get_by_id('CYSTS_2').remove_from_model(remove_orphans=True)

Check whether all the metabolites in the R. opacus model are now in the BiGG universal model

In [None]:
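# Build the set of universal ids once instead of rebuilding a list on every iteration
universal_metabolite_ids = {m.id for m in bigg_universal.metabolites}
for m in model.metabolites:
    if m.id not in universal_metabolite_ids:
        print(f'The metabolite with name {m.name} and id {m.id} is not in the universal model')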

No output indicates that all the metabolites in the R. opacus model are also in the BiGG universal model

Check how metabolites are annotated in the BiGG universal model

In [None]:
bigg_universal.metabolites.get_by_id('glc__D_c').annotation

We see that the BiGG annotations are lists of lists. We need to convert them to dictionaries. <br>
First we test a method for casting the list of lists to a dictionary

In [None]:
dict(bigg_universal.metabolites.get_by_id('glc__D_c').annotation)
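One property of the `dict(...)` cast is worth checking before applying it everywhere: if the same database name appears in more than one pair, only the last pair survives. The sketch below uses made-up pairs (the `pairs` list is illustrative, not real BiGG output) to compare the plain cast with a variant that groups repeated keys into lists. `group_pairs` is a hypothetical helper, not part of cobra.

```python
from collections import defaultdict

# Illustrative stand-in for a BiGG-style annotation: a list of [key, value] pairs.
pairs = [
    ["KEGG Compound", "http://identifiers.org/kegg.compound/C00031"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:4167"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:17634"],
]

# Plain cast: a repeated key keeps only its last value.
flat = dict(pairs)


def group_pairs(pairs):
    """Hypothetical helper: collect every value for each key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


grouped = group_pairs(pairs)
print(flat["CHEBI"])
print(len(grouped["CHEBI"]))
```

If the grouped form were used instead, the later URL-trimming step would have to handle lists rather than single strings.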

Now we apply this method to all metabolites in the R. opacus GSM

In [None]:
for m in model.metabolites:
    m.annotation = dict(bigg_universal.metabolites.get_by_id(m.id).annotation)

Check annotations on R. opacus model metabolites

In [None]:
model.metabolites.get_by_id('glc__D_c').annotation

Convert the keys in the metabolite annotation dictionaries to be MEMOTE compatible. <br>
Also convert the values from URLs to only the portion of the URL after the final '/'

In [None]:
memote_key_converter = {
    'BioCyc': 'biocyc',
    'CHEBI': 'chebi',
    'Human Metabolome Database': 'hmdb',
    'InChI Key': 'inchikey',
    'KEGG Compound': 'kegg.compound',
    'KEGG Drug': 'kegg.drug',
    'KEGG Glycan': 'kegg.glycan',
    'LipidMaps': 'lipidmaps',
    'MetaNetX (MNX) Chemical': 'metanetx.chemical',
    'Reactome Compound': 'reactome',
    'SEED Compound': 'seed.compound',
}

for m in model.metabolites:
    if m.annotation:
        m.annotation = dict((memote_key_converter[k], v.rsplit('/',1)[-1]) for k, v in m.annotation.items())
    m.annotation['bigg.metabolite'] = m.id
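The comprehension above raises a `KeyError` as soon as an annotation contains a key that is absent from `memote_key_converter`. A more tolerant variant might look like the sketch below, which skips unknown keys and reports them. `relabel_annotation` is a hypothetical helper, and the converter excerpt and annotation values are made up for illustration.

```python
def relabel_annotation(annotation, key_converter):
    """Hypothetical helper: rename keys and trim URL values to their last path segment.

    Keys missing from key_converter are returned separately instead of raising KeyError.
    """
    relabeled = {}
    unknown_keys = []
    for key, value in annotation.items():
        if key not in key_converter:
            unknown_keys.append(key)
            continue
        relabeled[key_converter[key]] = value.rsplit('/', 1)[-1]
    return relabeled, unknown_keys


# Illustrative inputs: a two-entry excerpt of the converter and a made-up annotation.
key_converter = {'CHEBI': 'chebi', 'KEGG Compound': 'kegg.compound'}
annotation = {
    'CHEBI': 'http://identifiers.org/chebi/CHEBI:4167',
    'KEGG Compound': 'http://identifiers.org/kegg.compound/C00031',
    'Some Other Database': 'http://example.org/entry/123',
}

relabeled, unknown_keys = relabel_annotation(annotation, key_converter)
print(relabeled)
print(unknown_keys)
```

Collecting the unknown keys rather than silently dropping them makes it easy to see which entries the converter dictionary still needs.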

Add Systems Biology Ontology (SBO) terms to metabolites. <br>
[https://www.ebi.ac.uk/sbo/main/SBO:0000247](https://www.ebi.ac.uk/sbo/main/SBO:0000247)

In [None]:
for m in model.metabolites:
    m.annotation['sbo'] = 'SBO:0000247'

Check how metabolite annotations look now

In [None]:
model.metabolites.get_by_id('glc__D_c').annotation

# Annotate Reactions

Check which reactions in the R. opacus model are not in the universal model

In [None]:
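# Build the set of universal ids once instead of rebuilding a list on every iteration
universal_reaction_ids = {r.id for r in bigg_universal.reactions}
for r in model.reactions:
    if r.id not in universal_reaction_ids:
        print(f'{r.name} with the id, {r.id}, is not in the universal model')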

Check the current reaction annotations in the R. opacus model

In [None]:
for r in model.reactions:
    if r.annotation != {}:
        print(r.id)

No output indicates that no reaction is annotated yet <br>
Get reaction annotations from BiGG model

In [None]:
for r in model.reactions:
    if r.id in bigg_universal.reactions:
        r.annotation = dict(bigg_universal.reactions.get_by_id(r.id).annotation)

Check reaction annotation format

In [None]:
model.reactions.get_by_id('PGI').annotation

Convert the keys in the reaction annotation dictionaries to be MEMOTE compatible. <br>
Also convert the values from URLs to only the portion of the URL after the final '/'

In [None]:
memote_key_converter = {
    'BioCyc': 'biocyc',
    'EC Number': 'ec-code',
    'KEGG Reaction': 'kegg.reaction',
    'MetaNetX (MNX) Equation': 'metanetx.reaction',
    'RHEA': 'rhea',
    'Reactome Reaction': 'reactome',
    'SBO': 'sbo',
    'SEED Reaction': 'seed.reaction',
}

for r in model.reactions:
    if r.annotation:
        r.annotation = dict((memote_key_converter[k], v.rsplit('/',1)[-1]) for k, v in r.annotation.items())

Check reaction annotation in R. opacus model

In [None]:
model.reactions.get_by_id('PGI').annotation

Add Systems Biology Ontology (SBO) terms to reactions <br>
exchange reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000627](http://www.ebi.ac.uk/sbo/main/SBO:0000627) <br>
sink reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000632](http://www.ebi.ac.uk/sbo/main/SBO:0000632) <br>
growth reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000629](http://www.ebi.ac.uk/sbo/main/SBO:0000629) <br>
demand reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000628](http://www.ebi.ac.uk/sbo/main/SBO:0000628) <br>
transport reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000655](http://www.ebi.ac.uk/sbo/main/SBO:0000655) <br>
biochemical reaction: [http://www.ebi.ac.uk/sbo/main/SBO:0000176](http://www.ebi.ac.uk/sbo/main/SBO:0000176) <br>

In [None]:
for r in model.reactions:
    if r.id.startswith('EX_'):
        r.annotation['sbo'] = 'SBO:0000627'
    elif r.id.startswith('sink_'):
        r.annotation['sbo'] = 'SBO:0000632'
    elif r.id.startswith('Growth'):
        r.annotation['sbo'] = 'SBO:0000629'
    elif r.id.startswith('ATPM'):
        r.annotation['sbo'] = 'SBO:0000628'
    elif len(r.compartments) > 1:
        r.annotation['sbo'] = 'SBO:0000655'
    else:
        r.annotation['sbo'] = 'SBO:0000176'
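The branch order above matters: the id prefixes are tested first, and the compartment count is only consulted when no prefix matches. As a sketch, the same rule can be written as a pure function that is easy to check without loading a model. `sbo_term_for` is a hypothetical helper, and the reaction ids in the example are used only as plain strings.

```python
def sbo_term_for(reaction_id, n_compartments):
    """Hypothetical helper mirroring the if/elif chain above; the first match wins."""
    prefix_terms = [
        ('EX_', 'SBO:0000627'),     # exchange reaction
        ('sink_', 'SBO:0000632'),   # sink reaction
        ('Growth', 'SBO:0000629'),  # growth reaction
        ('ATPM', 'SBO:0000628'),    # demand reaction
    ]
    for prefix, term in prefix_terms:
        if reaction_id.startswith(prefix):
            return term
    if n_compartments > 1:
        return 'SBO:0000655'  # transport reaction
    return 'SBO:0000176'  # biochemical reaction


# Example calls with illustrative ids and compartment counts.
examples = [('EX_glc__D_e', 1), ('ATPM', 1), ('GLCpts', 2), ('PGI', 1)]
for reaction_id, n_compartments in examples:
    print(reaction_id, sbo_term_for(reaction_id, n_compartments))
```

With a cobra model, the second argument would correspond to `len(r.compartments)`, as in the cell above.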

In [None]:
model.reactions.get_by_id('PGI').annotation

In [None]:
for r in model.reactions:
    if r.annotation == {}:
        print(r.id)

No output indicates that all reactions have at least some annotation

# Annotate Genes
Check how many genes are in the R. opacus model

In [None]:
print(f'There are {len(model.genes)} genes in the model')

Import r_opacus_gene_converter.csv as a pandas DataFrame and display the first 15 rows. <br>
The index is the current gene name in the model, and GeneID is the NCBI gene ID.

In [None]:
# gene_converter = pd.read_csv('../gene_converter/r_opacus_gene_converter.csv', index_col = 0)
# gene_converter.head(15)

Apply gene ids to genes in the model

In [None]:
# for g in model.genes:
#     if g.id in gene_converter.index:
#         try:
#             g.annotation['ncbiprotein'] = str(int(gene_converter.loc[g.id]['GeneID']))
#         except:
#             print(f'Problem with gene: {g.id}')
#     else:
#         print(f'gene {g.id} not in gene_converter.csv')
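The `str(int(...))` conversion in the cell above is needed because pandas reads an integer column that contains missing entries as floats, with `NaN` in the gaps. A narrower alternative to the bare `except` might look like the sketch below. `normalize_gene_id` is a hypothetical helper and the sample values are made up.

```python
import math


def normalize_gene_id(raw_value):
    """Hypothetical helper: return an integer-like ID as a string, or None if unusable."""
    if raw_value is None:
        return None
    try:
        number = float(raw_value)
    except (TypeError, ValueError):
        return None
    # Reject NaN, infinities, and values with a fractional part.
    if math.isnan(number) or math.isinf(number) or not number.is_integer():
        return None
    return str(int(number))


# Made-up values of the kinds a CSV column might contain.
for raw_value in [4219368.0, float('nan'), '12', 'abc']:
    print(repr(raw_value), normalize_gene_id(raw_value))
```

Returning `None` for unusable values lets the calling loop decide whether to skip the gene or report it.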

In [None]:
# model.genes.get_by_id('spontaneous').annotation = {'ncbiprotein': 'spontaneous'}

Add SBO terms to gene annotations <br>
gene: [http://www.ebi.ac.uk/sbo/main/SBO:0000243](http://www.ebi.ac.uk/sbo/main/SBO:0000243)

In [None]:
for g in model.genes:
    g.annotation['sbo'] = 'SBO:0000243'

Check that all genes have an annotation

In [None]:
for g in model.genes:
    if g.annotation == {}:
        print(g.id)

No output indicates that all genes are annotated <br>
Check what gene annotations in the R. opacus model look like

In [None]:
model.genes.get_by_id('WP_005239747_1').annotation

# Export annotated model

In [None]:
model.id = 'ropacus_annotated'
model.name = 'Rhodococcus opacus PD630 annotated'
model.description = 'Rhodococcus opacus PD630 model with metabolite, reaction, and gene annotations. Model reactions have not been curated'

In [None]:
cobra.io.write_sbml_model(model, "../GSMs/Ropacus_annotated.xml")

Check MEMOTE output of annotated model

In [None]:
IFrame('../memotes/ropacus_annotated.html', 1500, 800)