# Introduction
Now that the metabolite IDs are fixed, we can add annotations to the model to improve its workability further, mainly because all of this information is currently stored in the notes field.
For general usability, each metabolite should ideally have KEGG, ChEBI and MetaNetX IDs.

This notebook will cover that. 

In [1]:
import cobra
import pandas as pd
import cameo

In [2]:
#import models
model = cobra.io.read_sbml_model("../model/g-thermo.xml")

In [3]:
#import the bacillus and e. coli models too
model_e_coli = cameo.load_model("iML1515")
model_b_sub = cameo.load_model("iYO844")

In [30]:
#copy annotations from the e. coli model
unannotated_met = []
for met in model.metabolites:
    try:
        #look up the E. coli metabolite with the same BiGG ID
        ref_met = model_e_coli.metabolites.get_by_id(met.id)
    except KeyError:
        #no metabolite with this ID in the E. coli model
        unannotated_met.append(met)
        continue
    #merge the E. coli annotation into the old one (update works in place and returns None)
    met.annotation.update(ref_met.annotation)
len(unannotated_met)

425

In [31]:
#for the leftover metabolites, copy from the b. sub model
unannotated_met_bsub = []
for met in model.metabolites:
    if met in unannotated_met:
        try:
            #look up the B. subtilis metabolite with the same BiGG ID
            ref_met = model_b_sub.metabolites.get_by_id(met.id)
        except KeyError:
            #no metabolite with this ID in the B. subtilis model either
            unannotated_met_bsub.append(met)
            continue
        #merge the B. subtilis annotation into the old one
        met.annotation.update(ref_met.annotation)
len(unannotated_met_bsub)

389

There are still 389 metabolites that could not be annotated automatically from the E. coli or B. subtilis models. For these, we will try to add the KEGG, ChEBI and MetaNetX IDs as annotations.
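
The two passes above (first E. coli, then B. subtilis) can be folded into one helper that tries each reference in order of priority. The following is only a minimal sketch: it uses plain dictionaries in place of cobra objects, and the function name `copy_annotations` and the toy IDs are hypothetical, not part of this notebook or of cobrapy.

```python
def copy_annotations(target, references):
    """Merge reference annotations into `target` in place; return unmatched IDs.

    target:     dict mapping metabolite ID -> annotation dict
    references: list of dicts with the same shape, in order of priority
    """
    unmatched = []
    for met_id, annotation in target.items():
        for reference in references:
            if met_id in reference:
                # dict.update works in place and returns None,
                # so its result is deliberately not assigned.
                annotation.update(reference[met_id])
                break
        else:
            # No reference contains this ID.
            unmatched.append(met_id)
    return unmatched


# Hypothetical toy data standing in for the three models.
gth = {"atp_c": {}, "glc__D_c": {}, "made_up_c": {}}
e_coli = {"atp_c": {"kegg.compound": "C00002"}}
b_sub = {"glc__D_c": {"kegg.compound": "C00031"}}

leftover = copy_annotations(gth, [e_coli, b_sub])
print(leftover)  # ['made_up_c']
```

Because the inner loop stops at the first reference that contains the ID, a metabolite found in the E. coli stand-in is never overwritten by the B. subtilis one, which mirrors the order used in the cells above.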

In [6]:
#save and commit
cobra.io.write_sbml_model(model,"../model/g-thermo.xml")

## Adding manual annotations

For the unannotated metabolites, we want to add the KEGG, ChEBI and MetaNetX IDs. This has to be done semi-manually here.

In [13]:
#import
model = cobra.io.read_sbml_model("../model/g-thermo.xml")

In [34]:
#first add the KEGG ID stored in the notes
unannotated_met_bsub_kegg = []
for met in model.metabolites:
    if met in unannotated_met_bsub:
        try:
            met.annotation["kegg.compound"] = met.notes["KEGG"]
        except KeyError:
            #no KEGG ID in the notes of this metabolite
            unannotated_met_bsub_kegg.append(met)
len(unannotated_met_bsub_kegg)

37

There are 37 metabolites without a KEGG annotation; for these we hope to add ChEBI or MetaNetX IDs.

In [35]:
#save and commit
cobra.io.write_sbml_model(model,"../model/g-thermo.xml")

In [7]:
#now try to add the ChEBI annotation
unannotated_met_bsub_chebi = []
for met in model.metabolites:
    if met in unannotated_met_bsub:
        try:
            met.annotation["chebi"] = "CHEBI:" + met.notes["ChEBI"]
        except (KeyError, TypeError):
            #no ChEBI entry in the notes, or the entry is not a string
            unannotated_met_bsub_chebi.append(met)
len(unannotated_met_bsub_chebi)

59

There are 57 metabolites without a ChEBI ID.
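
The cell above prepends `CHEBI:` unconditionally, which would produce a doubled prefix if a note already carried one. How the notes are formatted in this model is not shown here, so the following is only a sketch under the assumption that a note may hold the bare number or an already prefixed ID; `normalise_chebi` is a hypothetical helper name.

```python
def normalise_chebi(raw):
    """Return a 'CHEBI:<number>' string, or None if `raw` is not usable."""
    if not isinstance(raw, str):
        return None
    value = raw.strip()
    # Drop an existing prefix so it is never doubled.
    if value.upper().startswith("CHEBI:"):
        value = value[len("CHEBI:"):]
    # Only accept purely numeric identifiers.
    return f"CHEBI:{value}" if value.isdigit() else None


print(normalise_chebi("15422"))        # CHEBI:15422
print(normalise_chebi("CHEBI:15422"))  # CHEBI:15422
print(normalise_chebi(None))           # None
```

Returning `None` instead of raising lets the caller decide whether to add the metabolite to the unannotated list.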

In [28]:
#import dataframe from MetaNetX
ch_df = pd.read_csv("../../Databases/chem_xref.tsv", sep="\t", skiprows=385)
ch_df.sample(5)

         #XREF                   MNX_ID      Evidence   Description
553275   slm:000010688           MNXM266879  reference  1-(5Z,8Z,11Z,14Z-eicosatetraenoyl)-2-(11-methy...
763959   slm:000118654           MNXM465303  reference  1-tridecanoyl-2-(11Z,14Z,17Z-eicoastrienoyl)-s...
1103006  MNXM427977              MNXM427977  identity
1304116  MNXM573009              MNXM573009  identity
418074   lipidmaps:LMGP02010705  MNXM71793   reference  1-(6Z,9Z,12Z-octadecatrienoyl)-2-docosanoyl-gl...


In [29]:
#also try to add the MetaNetX ID where possible
#for this, we look up the KEGG ID in the dataframe with all MetaNetX cross-references

unannotated_met_bsub_meta = []
for met in model.metabolites:
    if met in unannotated_met_bsub:
        try:
            kegg_id = "kegg:" + met.notes["KEGG"]
        except KeyError:
            unannotated_met_bsub_meta.append(met)
            continue
        #find the MetaNetX ID for this compound
        matches = ch_df.loc[ch_df["#XREF"] == kegg_id, "MNX_ID"]
        if matches.empty:
            #this KEGG ID has no cross-reference in the dataframe
            unannotated_met_bsub_meta.append(met)
            continue
        met.annotation["metanetx.chemical"] = matches.iloc[0]
len(unannotated_met_bsub_meta)

35
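
Filtering the whole cross-reference table once per metabolite is simple but repeats the same scan many times. One alternative is to build a dictionary from `#XREF` to `MNX_ID` once and look IDs up in it. The sketch below uses the standard `csv` module instead of pandas so that it runs on its own; the rows come from the sample shown earlier, while `build_xref_lookup` and the ID `kegg:C99999` are hypothetical.

```python
import csv
import io

# Three rows from the sample above, tab-separated like chem_xref.tsv.
SAMPLE = (
    "#XREF\tMNX_ID\tEvidence\tDescription\n"
    "slm:000010688\tMNXM266879\treference\t1-(5Z,8Z,11Z,14Z-eicosatetraenoyl)-2-(11-methy...\n"
    "MNXM427977\tMNXM427977\tidentity\t\n"
    "lipidmaps:LMGP02010705\tMNXM71793\treference\t1-(6Z,9Z,12Z-octadecatrienoyl)-2-docosanoyl-gl...\n"
)


def build_xref_lookup(handle):
    """Map each external reference to its MetaNetX ID (last duplicate wins)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["#XREF"]: row["MNX_ID"] for row in reader}


lookup = build_xref_lookup(io.StringIO(SAMPLE))
print(lookup.get("lipidmaps:LMGP02010705"))  # MNXM71793
print(lookup.get("kegg:C99999"))             # None, no such row in the sample
```

Using `.get` returns `None` for an unknown reference, so a missing cross-reference can be recorded instead of raising an error.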

There are 35 metabolites without a KEGG ID, 57 without a ChEBI ID and 35 without a MetaNetX ID. Let's see whether these lists overlap and, if so, whether any metabolites have no annotation at all.

In [30]:
#define an intersection function
def intersection(lst1, lst2): 
    lst3 = [value for value in lst1 if value in lst2] 
    return lst3 

In [32]:
#metabolites without both KEGG and MetaNetX
len(intersection(unannotated_met_bsub_kegg, unannotated_met_bsub_meta))

35

So all metabolites without a KEGG ID also lack a MetaNetX annotation. This makes sense, as the MetaNetX annotation is looked up via the KEGG ID assigned to each metabolite.
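
The same overlap checks can be written with Python sets, which also make the "every metabolite without KEGG lacks MetaNetX" claim an explicit subset test. This is a sketch on toy data: the first three IDs appear among the unannotated metabolites in this notebook, but which set each ID belongs to here is made up for illustration.

```python
# Toy sets of metabolite IDs lacking each annotation (membership is illustrative).
no_kegg = {"sucrose_c", "pant_c", "raff_c"}
no_metanetx = {"sucrose_c", "pant_c", "raff_c"}
no_chebi = {"sucrose_c", "pant_c", "raff_c", "atp_c"}

# Subset test: is every metabolite without KEGG also without MetaNetX?
print(no_kegg <= no_metanetx)  # True

# Metabolites lacking all three annotations.
missing_all = no_kegg & no_metanetx & no_chebi
print(sorted(missing_all))  # ['pant_c', 'raff_c', 'sucrose_c']
```

Comparing IDs rather than metabolite objects also keeps the check valid after the model has been reloaded from disk.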

In [33]:
#metabolites without KEGG/MetaNetX and ChEBI
len(intersection(unannotated_met_bsub_kegg, unannotated_met_bsub_chebi))

35

These 35 metabolites without KEGG and MetaNetX IDs thus also have no ChEBI ID assigned to them. Let's inspect them and decide whether we should add these manually.

In [34]:
intersection(unannotated_met_bsub_kegg, unannotated_met_bsub_chebi)

[<Metabolite myinp_c at 0x20e20b5d988>,
 <Metabolite uamagdgll_c at 0x20e20b5f6c8>,
 <Metabolite sucrose_c at 0x20e20b63d88>,
 <Metabolite pant_c at 0x20e20b68048>,
 <Metabolite maltose_c at 0x20e20b68508>,
 <Metabolite ftethplg_c at 0x20e20b6c388>,
 <Metabolite homocarn_c at 0x20e20b6c648>,
 <Metabolite isomalt_c at 0x20e20b6cf48>,
 <Metabolite carsine_c at 0x20e20b71348>,
 <Metabolite treh6p_c at 0x20e20b731c8>,
 <Metabolite rpantholcys_c at 0x20e20b80488>,
 <Metabolite focytB561_c at 0x20e20b87108>,
 <Metabolite capra_c at 0x20e20b8c988>,
 <Metabolite bdxyl_c at 0x20e20b8e708>,
 <Metabolite sus6p_c at 0x20e20b8ea48>,
 <Metabolite a4oxopent_c at 0x20e20b97248>,
 <Metabolite xylan_c at 0x20e20b9b4c8>,
 <Metabolite epimelbio_c at 0x20e20b9b7c8>,
 <Metabolite abmaagapc_c at 0x20e20ba2b88>,
 <Metabolite naneura_c at 0x20e20bea348>,
 <Metabolite raff_c at 0x20e20c06788>,
 <Metabolite betaine_c at 0x20e20c1d108>,
 <Metabolite ino1p_c at 0x20e20c30608>,
 <Metabolite Biomass_c at 0x20e20c724

# Conclusion
There are still 35 metabolites without any annotation. These appear to be more peripheral metabolites, so we can probably leave them without annotations. In general they still have names, so if needed one can look them up in a general database or search engine.
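
To keep track of how complete the annotations are, one could count how many metabolites carry each annotation key. The sketch below again uses plain dictionaries rather than the model itself; `annotation_coverage` is a hypothetical helper and the two entries are toy data.

```python
from collections import Counter


def annotation_coverage(annotations,
                        keys=("kegg.compound", "chebi", "metanetx.chemical")):
    """Count, per annotation key, how many metabolites have that key."""
    counts = Counter()
    for annotation in annotations.values():
        for key in keys:
            if key in annotation:
                counts[key] += 1
    # Counter returns 0 for keys that were never seen.
    return {key: counts[key] for key in keys}


# Toy data: one annotated and one unannotated metabolite.
mets = {
    "atp_c": {"kegg.compound": "C00002", "chebi": "CHEBI:15422"},
    "sucrose_c": {},
}
print(annotation_coverage(mets))
# {'kegg.compound': 1, 'chebi': 1, 'metanetx.chemical': 0}
```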

In [35]:
cobra.io.write_sbml_model(model,'../model/Beata_model_orig_g-thermo.xml')