# Introduction
Now that we have fixed the metabolite IDs, we can add annotations to the model to make it easier to work with. At the moment, most cross-reference information is stored in the metabolite notes, whereas the proper place for it is the annotation field.
In general, each metabolite should have a KEGG, ChEBI and MetaNetX ID.
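As a minimal sketch of the target state, the check below reports which of the three namespaces are still missing from a metabolite's annotation. It assumes annotations are plain dicts (as cobra's `Metabolite.annotation` is) keyed by the namespace names used later in this notebook (`kegg.compound`, `chebi`, `metanetx.chemical`); the helper `missing_namespaces` and the example annotations are hypothetical.

```python
# Namespaces this notebook aims to fill for every metabolite
REQUIRED_NAMESPACES = ("kegg.compound", "chebi", "metanetx.chemical")


def missing_namespaces(annotation):
    """Return the required namespaces that are absent or empty in an annotation dict.

    Hypothetical helper: `annotation` is assumed to be a plain dict such as
    cobra's Metabolite.annotation.
    """
    return [ns for ns in REQUIRED_NAMESPACES if not annotation.get(ns)]


# Illustrative annotations, not taken from the model
annotations = {
    "h2o_c": {"kegg.compound": "C00001", "chebi": "CHEBI:15377"},
    "unknown_c": {},
}

for met_id, ann in annotations.items():
    print(met_id, missing_namespaces(ann))
```

Running the same check over every metabolite gives a quick coverage summary before and after each annotation step.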

This notebook will cover that. 

In [2]:
import cobra
import pandas as pd
import cameo

In [2]:
#import models
model = cobra.io.read_sbml_model("../../model/g-thermo.xml")

In [6]:
#import the bacillus and e. coli models too
model_e_coli = cameo.load_model("iML1515")
model_b_sub = cameo.load_model("iYO844")

In [14]:
#copy annotations from the e. coli model
unannotated_met = []
for met in model.metabolites:
    try:
        # look up the metabolite with the same BiGG ID in the E. coli model
        ecoli_met = model_e_coli.metabolites.get_by_id(met.id)
    except KeyError:
        # no metabolite with this ID in iML1515
        unannotated_met.append(met)
        continue
    # dict.update() merges the E. coli annotation into met.annotation in place (it returns None)
    met.annotation.update(ecoli_met.annotation)
len(unannotated_met)

425
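The loop above relies on `dict.update`, which merges the donor annotation into the existing one in place and returns `None`, so its return value should not be stored. Keys present in both dicts are overwritten by the donor's value. A minimal sketch with plain dicts standing in for the two models; `merge_annotations` and the example IDs are hypothetical.

```python
def merge_annotations(target, donor):
    """Merge donor annotations into target annotations, matched by metabolite ID.

    Both arguments are hypothetical stand-ins for models: dicts mapping a
    BiGG-style metabolite ID to its annotation dict. Returns the IDs that
    had no counterpart in the donor.
    """
    not_found = []
    for met_id, annotation in target.items():
        if met_id not in donor:
            not_found.append(met_id)
            continue
        # dict.update mutates `annotation` in place and returns None;
        # keys present in both dicts take the donor's value
        annotation.update(donor[met_id])
    return not_found


target = {"glc__D_c": {"sbo": "SBO:0000247"}, "xylan_c": {}}
donor = {"glc__D_c": {"kegg.compound": "C00031"}}
print(merge_annotations(target, donor))  # ['xylan_c']
print(target["glc__D_c"])
```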

In [15]:
# for the left over metabolites, copy from the b. sub model
unannotated_met_bsub = []
for met in unannotated_met:
    try:
        # look up the metabolite with the same BiGG ID in the B. subtilis model
        bsub_met = model_b_sub.metabolites.get_by_id(met.id)
    except KeyError:
        # no metabolite with this ID in iYO844 either
        unannotated_met_bsub.append(met)
        continue
    # merge the B. subtilis annotation into met.annotation in place
    met.annotation.update(bsub_met.annotation)
len(unannotated_met_bsub)

389

There are still 389 metabolites that could not be matched to either model. For these, we will try to add the KEGG, ChEBI and MetaNetX IDs as annotations.
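The two passes above (E. coli first, then B. subtilis for whatever is left) generalise to a list of donor models tried in priority order. A minimal sketch, assuming each model is represented as a plain dict from metabolite ID to annotation dict; `annotate_from_donors` and the placeholder IDs are hypothetical.

```python
def annotate_from_donors(target, donors):
    """Fill annotations from donor models tried in priority order.

    Hypothetical sketch: `target` and each donor are dicts mapping metabolite
    IDs to annotation dicts. A donor is only consulted for IDs that earlier
    donors could not match. Returns the IDs that no donor could match.
    """
    remaining = list(target)
    for donor in donors:
        still_missing = []
        for met_id in remaining:
            if met_id in donor:
                target[met_id].update(donor[met_id])
            else:
                still_missing.append(met_id)
        remaining = still_missing
    return remaining


# Placeholder IDs and annotations, for illustration only
target = {"a_c": {}, "b_c": {}, "c_c": {}}
e_coli = {"a_c": {"kegg.compound": "K1"}}
b_sub = {"a_c": {"kegg.compound": "K9"}, "b_c": {"chebi": "CHEBI:1"}}
print(annotate_from_donors(target, [e_coli, b_sub]))  # ['c_c']
```

Because the E. coli model comes first, its annotation is kept whenever both models contain the metabolite; the B. subtilis entry for `a_c` is never consulted.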

In [6]:
#save and commit
cobra.io.write_sbml_model(model,"../../model/g-thermo.xml")

## Adding manual annotations

For the remaining unannotated metabolites, we add the KEGG, ChEBI and MetaNetX IDs ourselves. This has to be done semi-manually, starting from the IDs stored in the metabolite notes.
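The semi-manual step amounts to copying IDs from the notes into the annotation under the proper namespace key. A minimal sketch, assuming the notes use the `KEGG` and `ChEBI` keys seen in the cells below and that the ChEBI note may or may not already carry the `CHEBI:` prefix; the mapping `NOTE_TO_NAMESPACE` and the helper `notes_to_annotation` are hypothetical.

```python
# Hypothetical mapping: notes key -> (annotation namespace, prefix to ensure)
NOTE_TO_NAMESPACE = {
    "KEGG": ("kegg.compound", ""),
    "ChEBI": ("chebi", "CHEBI:"),
}


def notes_to_annotation(notes, annotation):
    """Copy known IDs from a notes dict into an annotation dict.

    Returns the notes keys that were missing, so callers can track which
    metabolites still lack each namespace.
    """
    missing = []
    for note_key, (namespace, prefix) in NOTE_TO_NAMESPACE.items():
        value = notes.get(note_key)
        if not value:
            missing.append(note_key)
            continue
        value = str(value).strip()
        # add the prefix only once, in case the note already contains it
        if prefix and not value.startswith(prefix):
            value = prefix + value
        annotation[namespace] = value
    return missing


ann = {}
print(notes_to_annotation({"KEGG": "C00001", "ChEBI": "15377"}, ann))  # []
print(ann)  # {'kegg.compound': 'C00001', 'chebi': 'CHEBI:15377'}
```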

In [38]:
#import
model = cobra.io.read_sbml_model("../../model/g-thermo.xml")

In [19]:
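#first add the KEGG ID stored in the notes
# compare by ID so the check still works after the model has been re-read from SBML
unannotated_ids = {m.id for m in unannotated_met_bsub}
unannotated_met_bsub_kegg = []
for met in model.metabolites:
    if met.id not in unannotated_ids:
        continue
    try:
        met.annotation["kegg.compound"] = met.notes["KEGG"]
    except KeyError:
        # no KEGG ID in the notes
        unannotated_met_bsub_kegg.append(met)
len(unannotated_met_bsub_kegg)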

37

37 metabolites have no KEGG ID in their notes. For these, we hope to add a ChEBI or MetaNetX ID instead.

In [35]:
#save and commit
cobra.io.write_sbml_model(model,"../../model/g-thermo.xml")

In [3]:
#import
model = cobra.io.read_sbml_model("../../model/g-thermo.xml")

In [20]:
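#now try to add the ChEBI annotation
# compare by ID so the check still works after the model has been re-read from SBML
unannotated_ids = {m.id for m in unannotated_met_bsub}
unannotated_met_bsub_chebi = []
for met in model.metabolites:
    if met.id not in unannotated_ids:
        continue
    try:
        # the notes hold the bare ChEBI number; the annotation uses the "CHEBI:" prefix
        met.annotation["chebi"] = "CHEBI:" + met.notes["ChEBI"]
    except KeyError:
        # no ChEBI ID in the notes
        unannotated_met_bsub_chebi.append(met)
len(unannotated_met_bsub_chebi)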

59

59 metabolites have no ChEBI ID in their notes. Finally, we will try to add a MetaNetX ID for the metabolites.

In [10]:
#save and commit
cobra.io.write_sbml_model(model,"../../model/g-thermo.xml")

In [11]:
#import
model = cobra.io.read_sbml_model("../../model/g-thermo.xml")

In [12]:
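#import dataframe from MetaNetX
# forward slashes also work on Windows and avoid the invalid "\c" escape sequence
# skiprows=385 skips the comment block above the "#XREF" header line
ch_df = pd.read_csv("../../../Databases/chem_xref.tsv", sep="\t", skiprows=385)
ch_df.sample(5)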

| | #XREF | MNX_ID | Evidence | Description |
|---|---|---|---|---|
| 812602 | slm:000143816 | MNXM423510 | reference | 1-O-hexadecyl-2-(13-methyltetradecanoyl)-3-(5Z... |
| 801140 | MNXM436446 | MNXM436446 | identity | |
| 873481 | slm:000175024 | MNXM301072 | reference | 1-(8Z,11Z,14Z,17Z-eicosatetraenoyl)-2-(13-meth... |
| 1071260 | MNXM312652 | MNXM312652 | identity | |
| 1517721 | MNXM629889 | MNXM629889 | identity | |
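`skiprows=385` depends on the exact number of comment lines in the downloaded MetaNetX release. Passing `comment="#"` to `pd.read_csv` would not help, because the header line itself starts with `#`. A sketch of a release-independent parser, assuming (as the `#XREF` column name suggests) that the file opens with `#` comment lines and that the last of them is the tab-separated header; `read_mnx_xref` is a hypothetical helper and the sample text is made up.

```python
import io


def read_mnx_xref(handle):
    """Parse a MetaNetX-style xref TSV into a list of row dicts.

    Assumes every comment line starts with '#' and that the last such line
    is the tab-separated header (e.g. '#XREF<TAB>MNX_ID<TAB>Evidence<TAB>Description').
    """
    header = None
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            header = line.split("\t")  # keep overwriting: the last '#' line wins
            continue
        if not line:
            continue  # skip blank lines
        if header is None:
            raise ValueError("data line found before any header line")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Made-up sample in the assumed layout
sample = io.StringIO(
    "### comment line\n"
    "#XREF\tMNX_ID\tEvidence\tDescription\n"
    "kegg:C00001\tMNXM_EXAMPLE\tidentity\twater\n"
)
rows = read_mnx_xref(sample)
print(rows[0]["MNX_ID"])  # MNXM_EXAMPLE
```

The row dicts keep the `#XREF` key, so they line up with the column names of `ch_df`.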


In [16]:
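# also try to add the MetaNetX ID where possible
# for this, we match the KEGG ID from the notes against the "#XREF" column of the MetaNetX table
# compare by ID so the check still works after the model has been re-read from SBML
unannotated_ids = {m.id for m in unannotated_met_bsub}
unannotated_met_bsub_meta = []
for met in model.metabolites:
    if met.id not in unannotated_ids:
        continue
    try:
        kegg_id = "kegg:" + met.notes["KEGG"]
    except KeyError:
        # without a KEGG ID there is nothing to match on
        unannotated_met_bsub_meta.append(met)
        continue
    # find the MetaNetX ID(s) cross-referenced to this KEGG compound
    matches = ch_df.loc[ch_df["#XREF"] == kegg_id, "MNX_ID"].values
    if len(matches) == 0:
        # KEGG ID not present in the MetaNetX table
        unannotated_met_bsub_meta.append(met)
        continue
    met.annotation["metanetx.chemical"] = matches[0]
len(unannotated_met_bsub_meta)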

37
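Filtering the whole DataFrame once per metabolite scans every row each time, and taking `[0]` of an empty result raises an `IndexError` when a KEGG ID has no MetaNetX entry. A sketch of an alternative: build a dictionary from `#XREF` to `MNX_ID` once and look IDs up in it, treating a missing key as "no annotation". It reuses the `kegg:` prefix from the cell above; `build_xref_index` and the sample rows are hypothetical.

```python
def build_xref_index(rows, prefix="kegg:"):
    """Map cross-reference IDs with the given prefix to their MNX_ID.

    `rows` is an iterable of dicts with '#XREF' and 'MNX_ID' keys. The first
    MNX_ID seen for an XREF is kept, i.e. the first match wins.
    """
    index = {}
    for row in rows:
        xref = row["#XREF"]
        if xref.startswith(prefix) and xref not in index:
            index[xref] = row["MNX_ID"]
    return index


# Made-up rows in the layout of the MetaNetX table
rows = [
    {"#XREF": "kegg:C00001", "MNX_ID": "MNXM_A"},
    {"#XREF": "kegg:C00001", "MNX_ID": "MNXM_B"},
    {"#XREF": "chebi:15377", "MNX_ID": "MNXM_A"},
]
index = build_xref_index(rows)
print(index.get("kegg:C00001"))  # MNXM_A
print(index.get("kegg:C99999"))  # None -> leave the metabolite unannotated
```

With the DataFrame loaded above, `ch_df.to_dict("records")` yields rows in this shape.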

In total, 37 metabolites lack a KEGG ID, 59 lack a ChEBI ID and 37 lack a MetaNetX ID. Let's check whether these lists overlap and, if so, whether any metabolites are left without any of the three annotations.

In [17]:
#define an intersection function
def intersection(lst1, lst2): 
    lst3 = [value for value in lst1 if value in lst2] 
    return lst3 
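`intersection` checks membership in a list, so it compares every element of `lst1` with every element of `lst2`. For a few hundred metabolites that is fine, but a set of IDs makes each membership check constant-time and, by comparing IDs rather than objects, also works when the two lists hold metabolites from separate loads of the model. A sketch; `intersection_by_id` is a hypothetical name, and it assumes every item has an `.id` attribute like cobra metabolites do.

```python
from collections import namedtuple

# Stand-in for cobra.Metabolite: only the .id attribute is used
Met = namedtuple("Met", ["id"])


def intersection_by_id(lst1, lst2):
    """Items of lst1 whose ID also occurs in lst2, in lst1's order."""
    ids2 = {item.id for item in lst2}
    return [item for item in lst1 if item.id in ids2]


# Placeholder IDs, for illustration only
no_kegg = [Met("a_c"), Met("b_c"), Met("c_c")]
no_chebi = [Met("c_c"), Met("a_c")]
print([m.id for m in intersection_by_id(no_kegg, no_chebi)])  # ['a_c', 'c_c']
```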

In [21]:
#metabolites without KEGG and without MetaNetX
len(intersection(unannotated_met_bsub_kegg, unannotated_met_bsub_meta))

37

So every metabolite without a KEGG ID also lacks a MetaNetX annotation, and since both lists contain 37 metabolites, they are identical. This makes sense, because the MetaNetX ID is looked up via the KEGG ID of each metabolite.

In [22]:
#metabolites without KEGG/MetaNetX and without ChEBI
len(intersection(unannotated_met_bsub_kegg, unannotated_met_bsub_chebi))

37

So these 37 metabolites without a KEGG or MetaNetX ID also have no ChEBI ID. Let's inspect them and decide whether we should annotate them manually.

In [23]:
intersection(unannotated_met_bsub_kegg, unannotated_met_bsub_chebi)

[<Metabolite myinp_c at 0x269d280a688>,
 <Metabolite uamagdgll_c at 0x269d2806848>,
 <Metabolite sucrose_c at 0x269d2801188>,
 <Metabolite pant_c at 0x269d27fd208>,
 <Metabolite maltose_c at 0x269d774a8c8>,
 <Metabolite ftethplg_c at 0x269d27f4d88>,
 <Metabolite homocarn_c at 0x269d27f5688>,
 <Metabolite isomalt_c at 0x269d27efd08>,
 <Metabolite carsine_c at 0x269d27e5f08>,
 <Metabolite treh6p_c at 0x269d27dd288>,
 <Metabolite rpantholcys_c at 0x269d27b14c8>,
 <Metabolite focytB561_c at 0x269d27a7ac8>,
 <Metabolite capra_c at 0x269d2794c08>,
 <Metabolite bdxyl_c at 0x269d278fd08>,
 <Metabolite sus6p_c at 0x269d278ee48>,
 <Metabolite a4oxopent_c at 0x269d2772788>,
 <Metabolite xylan_c at 0x269d276b9c8>,
 <Metabolite epimelbio_c at 0x269d276b748>,
 <Metabolite malt6p2_c at 0x269d276c808>,
 <Metabolite Glycan_2_c at 0x269d276bf48>,
 <Metabolite abmaagapc_c at 0x269d2758b48>,
 <Metabolite naneura_c at 0x269d3a13748>,
 <Metabolite raff_c at 0x269d36ca108>,
 <Metabolite betaine_c at 0x269d36

# Conclusion
There are still 37 metabolites without any of the three annotations. These appear to be more peripheral metabolites, so we can probably leave them unannotated. In general they still have names, so if needed they can be looked up in a general database or search engine.

In [24]:
cobra.io.write_sbml_model(model,'../../model/g-thermo.xml')