# Introduction
Now that all the reactions have proper names, we can add the annotations needed to make further searching easier, and from there start fixing stoichiometric problems. Right now, no reaction has any annotation at all. For some of them we can lift the annotation from the notes.

For the reactions, we want KEGG IDs, MetaNetX IDs and E.C. numbers wherever possible.

In [1]:
import cobra
import pandas as pd
import cameo

In [2]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [4]:
matteo_model = cobra.io.read_sbml_model("../databases/g-thermo-Matteo.xml")

## KEGG IDs
First we will add KEGG IDs. This can be done by lifting the KEGG reaction information from the reaction name or reaction notes.

In [5]:
#first try to add the KEGG reaction IDs from the reaction notes
unannotated_rct_kegg = []
for rct in model.reactions:
    try:
        rct.annotation["kegg.reaction"] = rct.notes["KEGG ID"]
    except KeyError:
        #no KEGG ID in the notes of this reaction
        unannotated_rct_kegg.append(rct)
len(unannotated_rct_kegg)
#the majority of the unannotated reactions are exchange and transport reactions, as expected.

339
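
The expectation in the comment above can be counted rather than eyeballed. The sketch below is one possible way to do that, assuming the `boundary` and `compartments` attributes that cobrapy `Reaction` objects expose. `classify_reactions` is a hypothetical helper, and the stand-in objects with made-up IDs only mimic those two attributes so that the example runs without the model file.

```python
from collections import Counter
from types import SimpleNamespace


def classify_reactions(reactions):
    """Count reactions as 'exchange', 'transport' or 'other'.

    Assumes each reaction has `boundary` (bool) and `compartments`
    (set of compartment IDs), as cobrapy Reaction objects do.
    """
    counts = Counter()
    for rct in reactions:
        if rct.boundary:
            counts["exchange"] += 1
        elif len(rct.compartments) > 1:
            counts["transport"] += 1
        else:
            counts["other"] += 1
    return counts


# Stand-in reactions (hypothetical) that mimic the two attributes used above
toy_reactions = [
    SimpleNamespace(id="EX_glc__D_e", boundary=True, compartments={"e"}),
    SimpleNamespace(id="GLCt", boundary=False, compartments={"c", "e"}),
    SimpleNamespace(id="PGI", boundary=False, compartments={"c"}),
]
toy_counts = classify_reactions(toy_reactions)
print(dict(toy_counts))
```

In this notebook the call would be `classify_reactions(unannotated_rct_kegg)`. Note that the compartment test is only a heuristic for transport reactions.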

In [6]:
cobra.io.write_sbml_model(model,"../model/g-thermo.xml")

## MetaNetX IDs
Now I will try to annotate each reaction with a MetaNetX ID, where possible, based on the KEGG IDs found.

In [7]:
#load metaNetX dataframe
rct_df = pd.read_csv("../../databases/reac_xref.tsv", sep="\t", skiprows=385)
#NOTE: an extra column header called 'Note' was added to the file on line 386, so that the headers line up with the data columns

In [8]:
rct_df

Unnamed: 0,#XREF,MNX_ID,Note
0,MNXR01,MNXR01,Synthetic reaction
1,bigg:10FTHF5GLUtl,MNXR94668,1 10fthf5glu@l = 1 10fthf5glu@c
2,bigg:R_10FTHF5GLUtl,MNXR94668,1 10fthf5glu@l = 1 10fthf5glu@c
3,MNXR94668,MNXR94668,
4,bigg:10FTHF5GLUtm,MNXR94668,1 10fthf5glu@c = 1 10fthf5glu@m
...,...,...,...
238202,deprecated:MNXR56427,MNXR116132,
238203,deprecated:MNXR56426,MNXR108670,
238204,deprecated:MNXR56425,MNXR108723,
238205,deprecated:MNXR56424,MNXR106728,


In [9]:
unannotated_rct_meta = []
for rct in model.reactions:
    try:
        rct_id = "kegg:"+ rct.notes["KEGG ID"]
    except KeyError:
        unannotated_rct_meta.append(rct)
        continue
    #find the MetaNetX ID for this reaction
    try:
        rct_new_ann = rct_df.loc[rct_df["#XREF"] == rct_id,"MNX_ID"].values[0]
    except IndexError:
        unannotated_rct_meta.append(rct)
        continue
    rct.annotation["metanetx.reaction"] = rct_new_ann

len(unannotated_rct_meta)

374
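
The loop above filters the whole `rct_df` once per reaction. A possible alternative, sketched here under assumptions, is to build a plain dictionary from the KEGG rows once and look each reaction up in it. The toy table reuses the column names of `reac_xref.tsv`, but its rows and the `MNXR_A` to `MNXR_D` identifiers are hypothetical.

```python
import pandas as pd

# Toy cross-reference table with the same column names as reac_xref.tsv;
# all rows and MetaNetX IDs here are hypothetical
toy_xref = pd.DataFrame(
    {
        "#XREF": ["kegg:R00001", "bigg:EXAMPLE", "kegg:R00002", "kegg:R00001"],
        "MNX_ID": ["MNXR_A", "MNXR_B", "MNXR_C", "MNXR_D"],
    }
)

# Keep only the KEGG rows and the first hit per cross-reference,
# mirroring the `.values[0]` choice made in the loop
kegg_rows = toy_xref[toy_xref["#XREF"].str.startswith("kegg:")]
kegg_to_mnx = (
    kegg_rows.drop_duplicates(subset="#XREF", keep="first")
    .set_index("#XREF")["MNX_ID"]
    .to_dict()
)

# A missing key returns None instead of raising an IndexError
print(kegg_to_mnx.get("kegg:R00001"))
print(kegg_to_mnx.get("kegg:R99999"))
```

With such a dictionary, the second `try`/`except` in the loop could become a single `kegg_to_mnx.get(rct_id)` check.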

In [10]:
cobra.io.write_sbml_model(model,"../model/g-thermo.xml")

## E.C. numbers
First we will try to add E.C. numbers based on the E.C. numbers present in the iML1515 model. To do so, we first make a dictionary of these values and then use this to match our reactions to it.

As a second attempt, we will annotate all reactions with a Rhea ID, based on the KEGG ID of the reaction. Then, for all reactions that did not get an E.C. number from the iML1515 model, we will add the E.C. annotation on the basis of the Rhea annotation.

In [11]:
#load the iML1515 model
model_e_coli = cameo.load_model("iML1515")

In [12]:
#making the e. coli iML1515 based dictionary of EC numbers based on BiGG ID
EC_dict = {reaction.id : reaction.annotation["ec-code"] for reaction in model_e_coli.reactions if "ec-code" in reaction.annotation}

In [13]:
len(EC_dict)

1096

In [14]:
#copy E.C. numbers from the E. coli model on the basis of the BiGG IDs.
no_ec_rct = []
for reaction in model.reactions:
    if reaction.id in EC_dict:
        reaction.annotation["ec-code"] = EC_dict[reaction.id]
    else:
        #no matching reaction ID in iML1515
        no_ec_rct.append(reaction)
len(no_ec_rct)

904

So not many reactions get an E.C. number this way. For the reactions that are not exchange or transport reactions, we can try to add one using the Rhea database.

For all reactions we first add a Rhea ID, based on the KEGG reaction ID of the reaction.
Then we can couple this to an E.C. number for the reactions that do not have one yet.

Files used for this section can be found here: https://www.rhea-db.org/download

In [15]:
#load the table that maps Rhea IDs to KEGG reaction IDs
rhea2kegg_df = pd.read_csv("../../databases/rhea2kegg_reaction.tsv", sep="\t")
rhea2kegg_df

Unnamed: 0,RHEA_ID,DIRECTION,MASTER_ID,ID
0,10003,BI,10000,R02938
1,10007,BI,10004,R04010
2,10011,BI,10008,R07180
3,10015,BI,10012,R07170
4,10019,BI,10016,R02381
...,...,...,...,...
6456,61319,BI,61316,R12130
6457,61343,BI,61340,R00923
6458,61439,BI,61436,R00122
6459,61503,BI,61500,R06536


In [16]:
#add rhea annotation to all reactions
unannotated_rct_rhea = []
for reaction in model.reactions:
    try:
        rct_kegg = reaction.annotation["kegg.reaction"]
    except KeyError:
        unannotated_rct_rhea.append(reaction)
        continue
    #find the Rhea ID for this reaction
    try:
        rhea_ann = rhea2kegg_df.loc[rhea2kegg_df["ID"]== rct_kegg,"MASTER_ID"].values[0]
    except IndexError:
        unannotated_rct_rhea.append(reaction)
        continue
    reaction.annotation["rhea"] = str(rhea_ann)

len(unannotated_rct_rhea)
#quite a lot remain unannotated; the large majority of these are exchange and transport reactions, as they have no KEGG ID.

545

Now, with the Rhea annotation added, we add the corresponding E.C. number for the reactions in no_ec_rct, using the rhea2ec table provided on the Rhea website.

In [37]:
#load the dataframe
rhea2ec_df = pd.read_csv("../../databases/rhea2ec.tsv", sep="\t")
rhea2ec_df

Unnamed: 0,RHEA_ID,DIRECTION,MASTER_ID,ID
0,10000,UN,10000,3.5.1.50
1,10004,UN,10004,5.99.1.1
2,10008,UN,10008,1.11.1.15
3,10012,UN,10012,1.5.3.6
4,10016,UN,10016,3.1.1.49
...,...,...,...,...
6930,61392,UN,61392,2.7.1.172
6931,61396,UN,61396,2.7.1.172
6932,61500,UN,61500,2.4.1.115
6933,61504,UN,61504,2.4.1.115


In [65]:
#add the E.C. numbers based on the rhea IDs given
unannotated_rct_ec = []
for reaction in model.reactions:
    #only look at reactions that did not get an E.C. number from iML1515
    if reaction not in no_ec_rct:
        continue
    try:
        rct_rhea = reaction.annotation["rhea"]
    except KeyError:
        unannotated_rct_ec.append(reaction)
        continue
    #all E.C. numbers linked to this rhea ID
    ec_anns = rhea2ec_df.loc[rhea2ec_df["MASTER_ID"] == int(rct_rhea), "ID"].tolist()
    if not ec_anns:
        unannotated_rct_ec.append(reaction)
        continue
    #store at most the first two E.C. numbers, as a list and not as a string
    reaction.annotation["ec-code"] = ec_anns[:2]

len(unannotated_rct_ec)

602
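
The loop above keeps at most two E.C. numbers per Rhea ID. If all of them are wanted, one option is to group the table once, as in the sketch below. It assumes the `MASTER_ID` and `ID` columns shown in the `rhea2ec_df` output. The first five rows are copied from that output, while the last row with `9.9.9.9` is a hypothetical addition so that one Rhea ID maps to two E.C. numbers.

```python
import pandas as pd

# First five rows as displayed for rhea2ec_df, plus one hypothetical row
# (9.9.9.9) so that Rhea ID 10000 maps to two E.C. numbers
toy_rhea2ec = pd.DataFrame(
    {
        "MASTER_ID": [10000, 10004, 10008, 10012, 10016, 10000],
        "ID": ["3.5.1.50", "5.99.1.1", "1.11.1.15", "1.5.3.6", "3.1.1.49", "9.9.9.9"],
    }
)

# One list of E.C. numbers per Rhea master ID
rhea_to_ec = toy_rhea2ec.groupby("MASTER_ID")["ID"].apply(list).to_dict()

print(rhea_to_ec[10000])
# Unknown Rhea IDs fall back to an empty list
print(rhea_to_ec.get(99999, []))
```

Inside the loop, `rhea_to_ec.get(int(rct_rhea), [])` would then return every E.C. number for the reaction, with an empty list marking it as unannotated.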

So we have now annotated a bit more than half of the reactions with an E.C. number. Not everything is completely annotated, but most reactions have some annotation, or otherwise additional information in the notes that allows one to identify the reaction. Therefore I will not manually go through each reaction to make sure it has the correct KEGG, MetaNetX and E.C. annotations. Also, no such identifiers are allocated to transport and exchange reactions, so there will always be a large fraction of unannotated reactions.
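
To put numbers on this coverage, a small summary per annotation key can help. The following is a minimal sketch: `annotation_coverage` is a hypothetical helper, the keys are the ones used in this notebook, and the stand-in objects with made-up annotation values only mimic the `annotation` dictionary of a cobrapy reaction.

```python
from types import SimpleNamespace


def annotation_coverage(reactions, keys):
    """Return {key: number of reactions whose annotation contains that key}."""
    return {
        key: sum(1 for rct in reactions if key in rct.annotation)
        for key in keys
    }


# Stand-in reactions (hypothetical IDs and annotation values)
toy_reactions = [
    SimpleNamespace(id="RXN_A", annotation={"kegg.reaction": "R00001", "rhea": "10000"}),
    SimpleNamespace(id="RXN_B", annotation={"kegg.reaction": "R00002"}),
    SimpleNamespace(id="EX_RXN_C", annotation={}),
]

# The annotation keys written in this notebook
keys = ["kegg.reaction", "metanetx.reaction", "rhea", "ec-code"]
coverage = annotation_coverage(toy_reactions, keys)
for key, count in coverage.items():
    print(f"{key}: {count}/{len(toy_reactions)}")
```

In the notebook the call would be `annotation_coverage(model.reactions, keys)`.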

In [68]:
cobra.io.write_sbml_model(model,"../model/g-thermo.xml")