# Introduction
In the previous notebooks we modified the metabolite IDs and annotations, and ensured that most metabolites have a mass associated with them. Now we will start to tackle the reactions, to make them more insightful.

In this notebook we will change the reaction IDs to make them easier to work with and to support further curation. 
To do so, we first need to copy the notes that Matteo assigned in his version of the model to this model. Then, where a reaction has a KEGG ID, we can use it to give the reaction a more insightful ID. (This process is similar to what was done for the metabolites.) 

In [1]:
import cobra
import pandas as pd
import cameo

In [2]:
model = cobra.io.read_sbml_model("../../model/g-thermo.xml")

In [5]:
matteo_model = cobra.io.read_sbml_model("../../databases/g-thermo-Matteo.xml")

## Copying the notes from Matteo's model

We can now try to copy the relevant notes from Matteo's model to the original model. 

In [6]:
# Copy the desired notes from Matteo's model
no_notes_KEGG = []
no_notes_ec = []
no_notes_rct = []
for rct in model.reactions:
    if rct in matteo_model.reactions:
        rct_matteo = matteo_model.reactions.get_by_id(rct.id)
        try:
            rct.notes["KEGG ID"] = rct_matteo.notes["ID"]
        except KeyError:
            no_notes_KEGG.append(rct)
        try:
            rct.notes["ENZYME"] = rct_matteo.notes["ENZYME"]
        except KeyError:
            no_notes_ec.append(rct)
        try:
            rct.notes["NAME"] = rct_matteo.notes["NAME"]
        except KeyError:
            # NOTE: this also skips the DEFINITION check below for this reaction
            continue
        try:
            rct.notes["DEFINITION"] = rct_matteo.notes["DEFINITION"]
        except KeyError:
            if rct not in no_notes_rct:
                no_notes_rct.append(rct)
    else:
        # Reaction does not exist in Matteo's model at all
        no_notes_rct.append(rct)
print(len(no_notes_KEGG))
print(len(no_notes_ec))
print(len(no_notes_rct))

280
290
87
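The cell above repeats one try/except per note key. A minimal sketch of the same idea as a helper is given below, using plain dictionaries in place of `rct.notes`. The function name `copy_notes`, the `KEY_MAP` constant and the example values are illustrative assumptions and not part of cobra or of either model.

```python
def copy_notes(source_notes, target_notes, key_map):
    """Copy selected keys from source_notes into target_notes.

    key_map maps target key -> source key. Returns the list of target
    keys that could not be copied because the source key was missing.
    """
    missing = []
    for target_key, source_key in key_map.items():
        if source_key in source_notes:
            target_notes[target_key] = source_notes[source_key]
        else:
            missing.append(target_key)
    return missing


# Target key -> source key, mirroring the four notes copied above
KEY_MAP = {
    "KEGG ID": "ID",
    "ENZYME": "ENZYME",
    "NAME": "NAME",
    "DEFINITION": "DEFINITION",
}

# Hypothetical notes for one reaction (values are made up)
matteo_notes = {"ID": "R00006", "NAME": "example name"}
own_notes = {}
missing_keys = copy_notes(matteo_notes, own_notes, KEY_MAP)
print(own_notes)
print(missing_keys)
```

Unlike the cell above, this sketch checks every key independently, so a missing NAME does not skip the DEFINITION check.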


In [7]:
no_notes_rct

[<Reaction EX_179_e at 0x148af5c0e88>,
 <Reaction EX_82_e at 0x148af5da408>,
 <Reaction EX_394_e at 0x148af5f0588>,
 <Reaction EX_991_e at 0x148af5f0dc8>,
 <Reaction EX_2328_e at 0x148af5f0fc8>,
 <Reaction EX_1231_e at 0x148af605288>,
 <Reaction EX_1039_e at 0x148af605c88>,
 <Reaction EX_1243_e at 0x148af60a508>,
 <Reaction EX_1014_e at 0x148af614208>,
 <Reaction EX_813_e at 0x148af6190c8>,
 <Reaction EX_299_e at 0x148af61f548>,
 <Reaction 117 at 0x148af704448>,
 <Reaction 211 at 0x148af7a7b08>,
 <Reaction 323 at 0x148af881c48>,
 <Reaction 349 at 0x148af8b4848>,
 <Reaction 366 at 0x148af8d5cc8>,
 <Reaction 372 at 0x148af8df188>,
 <Reaction 430 at 0x148af938848>,
 <Reaction 436 at 0x148af93d6c8>,
 <Reaction 507 at 0x148af9aabc8>,
 <Reaction 544 at 0x148af9e2e48>,
 <Reaction 644 at 0x148afa6c4c8>,
 <Reaction 645 at 0x148afa6cbc8>,
 <Reaction 654 at 0x148afa77b08>,
 <Reaction 730 at 0x148afacd708>,
 <Reaction 831 at 0x148ac096948>,
 <Reaction 838 at 0x148af5f9d48>,
 <Reaction 839 at 0x148

The only reactions without a KEGG ID are the exchange and transport reactions, which makes sense. We will write a separate script for these later anyway, to make them more meaningful. For the vast majority of the reactions, the same holds for the EC number. 

There are still 87 reactions without any notes (they are either unique to Beata's original version or have no notes in Matteo's version).  
For these reactions, we will need to modify the IDs manually later anyway, so it is fine to leave them for now. 

In [8]:
#save & commit
cobra.io.write_sbml_model(model,"../../model/g-thermo.xml")

## Automatic ID modification

Once the reactions have their notes, we can modify the reaction IDs automatically by mapping a MetaNetX ID to each reaction via its assigned KEGG reaction ID. From there, we can look up a BiGG ID to assign to the reaction. For this we use a downloaded copy of the MetaNetX database, which can be found here: https://www.metanetx.org/mnxdoc/mnxref.html

Then we can inspect the reactions that remain unnamed to see why, and how to rename them more intuitively using another approach.
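The two-step lookup described here (KEGG ID to MetaNetX ID, then MetaNetX ID to BiGG ID) can be sketched with plain dictionaries. This is only an illustration under stated assumptions: the identifiers in `XREF_TSV` are made up, the helper names `build_lookups` and `kegg_to_bigg` are hypothetical, and only the two column names used in this notebook (`#XREF`, `MNX_ID`) are taken from the source.

```python
import csv
import io

# Made-up excerpt with the two columns used in this notebook
XREF_TSV = (
    "#XREF\tMNX_ID\n"
    "kegg:R00001\tMNXR_A\n"
    "bigg:ZZZ\tMNXR_A\n"
    "bigg:ABC-1\tMNXR_A\n"
    "kegg:R00002\tMNXR_B\n"
)


def build_lookups(tsv_text):
    """Return (xref -> MNX ID, MNX ID -> list of BiGG IDs)."""
    xref_to_mnx = {}
    mnx_to_bigg = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        xref, mnx = row["#XREF"], row["MNX_ID"]
        xref_to_mnx[xref] = mnx
        if xref.startswith("bigg:"):
            mnx_to_bigg.setdefault(mnx, []).append(xref[len("bigg:"):])
    return xref_to_mnx, mnx_to_bigg


def kegg_to_bigg(kegg_id, xref_to_mnx, mnx_to_bigg):
    """Return a BiGG-style ID for a KEGG reaction ID, or None if unmatched."""
    mnx = xref_to_mnx.get("kegg:" + kegg_id)
    candidates = mnx_to_bigg.get(mnx, [])
    if not candidates:
        return None
    # Alphabetically first candidate, with "-" replaced by "__"
    return min(candidates).replace("-", "__")


xref_to_mnx, mnx_to_bigg = build_lookups(XREF_TSV)
print(kegg_to_bigg("R00001", xref_to_mnx, mnx_to_bigg))
print(kegg_to_bigg("R00002", xref_to_mnx, mnx_to_bigg))
```

Building the dictionaries once avoids filtering the whole table for every reaction, which may matter for a cross-reference file of this size.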

In [9]:
cobra.io.read_sbml_model("../../model/g-thermo.xml")

| | |
|---|---|
| Name | GTModelBeata2015 |
| Memory address | 0x0148b251ab88 |
| Number of metabolites | 851 |
| Number of reactions | 1202 |
| Number of groups | 15 |
| Objective expression | `1.0*M_biomass - 1.0*M_biomass_reverse_e7414` |
| Compartments | c, e |


In [4]:
# Load database of reaction IDs, taken from MetaNetX
rct_df = pd.read_csv("../databases/reac_xref.tsv", sep="\t", skiprows=385)
# NOTE: an extra column header called 'note' was added on line 386 of the file so that the headers line up with the data columns
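The hard-coded `skiprows=385` depends on the exact version of the downloaded file. A minimal sketch of locating the header line programmatically follows. It assumes that the file opens with a block of `#`-prefixed lines of which the last one is the column header. The function name `find_header_index` and the sample lines are illustrative only.

```python
def find_header_index(lines):
    """Return the 0-based index of the last line in the leading '#' block.

    Returns None when the first line does not start with '#'.
    """
    header_index = None
    for index, line in enumerate(lines):
        if not line.startswith("#"):
            break
        header_index = index
    return header_index


# Made-up sample: two comment lines, the header line, one data line
sample = [
    "### some licence text",
    "### some version text",
    "#XREF\tMNX_ID\tnote",
    "kegg:R00001\tMNXR_A\t",
]
print(find_header_index(sample))
```

The returned index could then be passed as `skiprows` to `pd.read_csv`, which treats the first line that is not skipped as the header.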

In [5]:
# Try to change reaction IDs to match BiGG IDs
unmatched_rct = []
for rct in model.reactions:
    # Construct the string that matches the reaction's KEGG ID to the IDs in the database
    try:
        rct_id = "kegg:" + rct.notes["KEGG ID"]
    except KeyError:
        unmatched_rct.append(rct)
        continue
    # Try to find the MetaNetX ID for this reaction
    try:
        rct_new_id = rct_df.loc[rct_df["#XREF"] == rct_id, "MNX_ID"].values[0]
    except IndexError:
        unmatched_rct.append(rct)
        continue
    # Find all entries that share this MetaNetX ID
    matched_compounds = rct_df[rct_df["MNX_ID"] == rct_new_id]
    # Take the alphabetically first BiGG ID that corresponds to our MetaNetX ID;
    # if there is none, put the reaction in the unmatched_rct list
    try:
        new_id = (
            matched_compounds[matched_compounds["#XREF"].str.startswith("bigg:")]["#XREF"]
            .str.replace("bigg:", "").sort_values(ascending=True).values[0]
        )
    except IndexError:
        unmatched_rct.append(rct)
        continue
    # Overwrite the model ID with the matched ID ("-" is replaced by "__")
    try:
        if "-" in new_id:
            new_id = new_id.replace("-", "__")
        rct.id = new_id
    except ValueError:
        # e.g. the new ID is already in use in the model
        unmatched_rct.append(rct)
        continue

In [6]:
len(unmatched_rct)

579
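The selection step in the renaming loop sorts the candidate BiGG IDs with `sort_values(ascending=True)` and takes the first entry, which is the alphabetically first ID and not necessarily the shortest one. The small sketch below shows the difference on made-up candidates. The list is hypothetical, and the "shortest" rule is an alternative shown for comparison, not what the notebook does.

```python
# Hypothetical BiGG candidates that share one MetaNetX ID
candidates = ["ALCD2ir", "ALCD2x", "ETOHD"]

# What the loop does: alphabetical order, first entry
alphabetical_first = sorted(candidates)[0]

# An alternative rule: shortest ID, ties broken alphabetically
shortest = min(candidates, key=lambda bigg_id: (len(bigg_id), bigg_id))

print(alphabetical_first, shortest)
```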

In [9]:
unmatched_rct

[<Reaction EX_1819_e at 0x1978d9f3dc8>,
 <Reaction EX_499_e at 0x1978d9f3d88>,
 <Reaction EX_23_e at 0x1978d9fb148>,
 <Reaction EX_43_e at 0x1978d9fb348>,
 <Reaction EX_1864_e at 0x1978d9fb5c8>,
 <Reaction EX_85_e at 0x1978d9fb708>,
 <Reaction EX_140_e at 0x1978d9fb948>,
 <Reaction EX_408_e at 0x1978d9fbbc8>,
 <Reaction EX_14_e at 0x1978d9fbe08>,
 <Reaction EX_29_e at 0x1978da03208>,
 <Reaction EX_280_e at 0x1978da035c8>,
 <Reaction EX_1754_e at 0x1978da03888>,
 <Reaction EX_1755_e at 0x1978da03a88>,
 <Reaction EX_1_e at 0x1978da03cc8>,
 <Reaction EX_7_e at 0x1978da03f48>,
 <Reaction EX_11_e at 0x1978da06188>,
 <Reaction EX_71_e at 0x1978da06388>,
 <Reaction EX_209_e at 0x1978da06648>,
 <Reaction EX_179_e at 0x1978da06888>,
 <Reaction EX_37_e at 0x1978da06ac8>,
 <Reaction EX_55_e at 0x1978da06d88>,
 <Reaction EX_45_e at 0x1978da0b148>,
 <Reaction EX_357_e at 0x1978da0b488>,
 <Reaction EX_452_e at 0x1978da0b788>,
 <Reaction EX_333_e at 0x1978da0ba08>,
 <Reaction EX_34_e at 0x1978da0bc48

In [8]:
#save & commit
cobra.io.write_sbml_model(model,"../../model/g-thermo.xml")

About half of the reactions (623 of 1202) now have modified IDs. If we inspect the unmatched reactions, we can see that many are exchange and transport reactions.

## Renaming Transport and Exchange Reactions
We want the IDs of the transport and exchange reactions to contain the ID of the metabolite they refer to, together with a logical prefix that distinguishes transport from exchange. We will revisit these reactions later anyway to check whether they are all correct, so that is not done here.

To fix the exchange reactions, we will first remove them. Then for each extracellular metabolite we will rename its transport reaction. Finally, we can add a new exchange reaction for each extracellular metabolite.
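The three steps can be sketched on a toy model in which each reaction is just a set of metabolite IDs. This is an illustration under stated assumptions, not the cobra workflow itself: the function name `rebuild_boundary_reactions` and the example reactions are made up, and an exchange is taken to be any reaction with a single metabolite. The `T_` and `EX_` prefixes follow the naming used in this notebook.

```python
def rebuild_boundary_reactions(reactions, extracellular):
    """Toy version of the remove / rename / re-add procedure.

    reactions: dict mapping reaction ID -> set of metabolite IDs.
    extracellular: set of extracellular metabolite IDs.
    Returns (new reactions dict, metabolites without a transport reaction).
    """
    # Step 1: drop the old exchanges (assumed: single-metabolite reactions)
    result = {rid: set(mets) for rid, mets in reactions.items() if len(mets) > 1}
    no_transport = []
    for met in sorted(extracellular):
        # Step 2: rename the first remaining reaction that uses the metabolite
        users = [rid for rid, mets in result.items() if met in mets]
        if users:
            result[f"T_{met}"] = result.pop(users[0])
        else:
            no_transport.append(met)
        # Step 3: add a fresh exchange for the metabolite
        result[f"EX_{met}"] = {met}
    return result, no_transport


# Made-up reactions: two old exchanges, one transport, one internal reaction
toy_reactions = {
    "EX_1_e": {"glc_e"},
    "57": {"glc_e", "glc_c"},
    "EX_2_e": {"cit_e"},
    "12": {"glc_c", "g6p_c"},
}
new_reactions, no_transport = rebuild_boundary_reactions(toy_reactions, {"glc_e", "cit_e"})
print(sorted(new_reactions))
print(no_transport)
```

Metabolites that end up in `no_transport` still receive an exchange in this sketch, so they would need separate handling afterwards.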

In [5]:
model = cobra.io.read_sbml_model("../../model/g-thermo.xml")

In [7]:
#gather all exchanges
remove_ex = model.exchanges
len(remove_ex)

179

In [8]:
#remove those reactions
model.remove_reactions(remove_ex)

In [9]:
#check there are no exchanges left
model.exchanges

There are no boundary reactions in this model. Therefore specific types of boundary reactions such as 'exchanges', 'demands' or 'sinks' cannot be identified.


[]

In [10]:
# Give the compartments more descriptive names to make further work easier.
model.compartments = {'c': 'cytosol', 'e': 'extracellular space'}

In [11]:
model.compartments

{'c': 'cytosol', 'e': 'extracellular space'}

In [12]:
# Which metabolites are extracellular?
extracell_mets = [met for met in model.metabolites if met.compartment == "e"]

In [13]:
len(extracell_mets)

169

In [14]:
# In one go, rename the transport reaction of each extracellular metabolite
# (the only reaction it has left now), then add an exchange reaction for it
strange_extracell_met = []
for met in extracell_mets:
    # Rename transport
    try:
        transport_reaction = list(met.reactions)[0]
        transport_reaction.id = f"T_{met.id}"
        transport_reaction.name = f"Transport of metabolite {met.id}"
    except Exception:
        # e.g. the metabolite has no reaction left to rename
        strange_extracell_met.append(met)
        print(met.id)
    # Add exchange
    model.add_boundary(met, type="exchange")

f1p_e
adhlam_e
cit_e
fe3_e
pnto__R_e
glc__aD_e
M_6pgl_e
urea_e
adocbl_e
spmd_e
M_2hmcnsad_e
uaccg_e
pppi_e
ppap_e
fprica_e
M_3hmoa_e
uppg1_e
dad_5_e


In [15]:
cobra.io.write_sbml_model(model,"../../model/g-thermo.xml")

So, there are 18 metabolites that have intracellular reactions but no transport reaction. In their current state they will not necessarily interfere with the model, but to tidy up unnecessary components, the extracellular metabolites listed above will be removed.

In [4]:
model = cobra.io.read_sbml_model("../../model/g-thermo.xml")

In [6]:
remove_mets = ["f1p_e","adhlam_e", "cit_e", "fe3_e", "pnto__R_e", "glc__aD_e", "M_6pgl_e", "urea_e", "adocbl_e", "spmd_e",
               "M_2hmcnsad_e", "uaccg_e", "pppi_e", "ppap_e", "fprica_e", "M_3hmoa_e", "uppg1_e","dad_5_e"]

In [7]:
remove_metabolites = [model.metabolites.get_by_id(mid) for mid in remove_mets]
model.remove_metabolites(remove_metabolites)

We should also remove the exchange reactions associated with these metabolites.

In [8]:
# Build the exchange reaction IDs and remove those reactions
exch = [f"EX_{met}" for met in remove_mets]
remove_exch = [model.reactions.get_by_id(rid) for rid in exch]
model.remove_reactions(remove_exch)

In [9]:
cobra.io.write_sbml_model(model, "../../model/g-thermo.xml")

## Manually modifying leftover reaction IDs
Now we should make sure that the reactions that were not renamed automatically are renamed manually. For this, we first need to make a list of all unmatched reactions, excluding the transport and exchange reactions. This list can then be exported to Excel, where we can manually add new IDs before re-importing the file here.

In [9]:
# Which reactions are left with obscure IDs?
unmatched_rct_left = []
for rct in model.reactions:
    if rct not in unmatched_rct:
        continue
    # Skip exchange reactions and transport reactions (IDs starting with "T")
    if rct in model.exchanges or rct.id[0] == 'T':
        continue
    unmatched_rct_left.append(rct)

In [10]:
len(unmatched_rct_left)

277

There are still 277 reactions that are not named properly. These will have to be fixed manually, one by one. 

To do so, we assemble three lists: one of the rct.id, one of the rct.name (which is a KEGG ID), and one of the name in the notes. From there we can export these as a dataframe and then manually add a column of new reaction IDs.
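The three lists can also be gathered in a single pass, one row per reaction. The sketch below uses plain tuples in place of cobra reactions and the standard `csv` module in place of pandas. The helper names `build_rows` and `rows_to_csv` and the example values are hypothetical, while the column names and the `-` placeholder for a missing note follow this notebook.

```python
import csv
import io


def build_rows(reactions):
    """reactions: iterable of (reaction_id, name, notes_dict) tuples."""
    rows = []
    for reaction_id, name, notes in reactions:
        rows.append({
            "ID": reaction_id,
            "Name_KEGG": name,
            # '-' marks reactions without a NAME note
            "Name_Notes": notes.get("NAME", "-"),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ID", "Name_KEGG", "Name_Notes"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example reactions
example = [("2", "R00006", {"NAME": "example name"}), ("9", "R00028", {})]
csv_text = rows_to_csv(build_rows(example))
print(csv_text)
```

Building each row in one place keeps the three columns aligned by construction, so no separate length checks are needed.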

In [11]:
#list of reaction Ids.
unmatched_rct_left_ID =[]
for rct in model.reactions:
    if rct in unmatched_rct_left:
        unmatched_rct_left_ID.append(rct.id)
    else: continue
len(unmatched_rct_left_ID)

277

In [12]:
#list of reaction names.
unmatched_rct_left_name =[]
for rct in model.reactions:
    if rct in unmatched_rct_left:
        unmatched_rct_left_name.append(rct.name)
    else: continue
len(unmatched_rct_left_name)

277

In [13]:
#list of reaction name notes.
unmatched_rct_left_note =[]
for rct in model.reactions:
    if rct in unmatched_rct_left:
        try: 
            unmatched_rct_left_note.append(rct.notes["NAME"])
        except KeyError:
            unmatched_rct_left_note.append('-')
    else: continue
len(unmatched_rct_left_note)

277

In [14]:
# Export to a csv file to manipulate in Excel
df_rct = pd.DataFrame({'ID' : unmatched_rct_left_ID, 'Name_KEGG' : unmatched_rct_left_name, 'Name_Notes':unmatched_rct_left_note})    
df_rct.to_csv('../databases/Reaction IDs.csv')

In [22]:
# Import the csv file with the new reaction IDs
# Sort the Excel file and import it into Python, to change the IDs of the reactions renamed by hand.
new_rct_id = pd.read_csv("../../databases/Reaction IDs_new.csv", sep = ';')
new_rct_id

| Unnamed: 0.1 | Unnamed: 0 | ID | Name_KEGG | Name_Notes | New ID |
|---|---|---|---|---|---|
| 0 | 276 | M_biomass | biomass | - | biomass |
| 1 | 0 | 2 | R00006 | pyruvate:pyruvate acetaldehydetransferase (dec... | PYRACTT |
| 2 | 1 | 9 | R00028 | - | MALHYDRO |
| 3 | 2 | 15 | R00081 | Ferrocytochrome-c:oxygen oxidoreductase | FOCYCTCOR |
| 4 | 3 | 16 | R00082 | Ferrocytochrome c2:oxygen oxidoreductase | FOCYTCCOR |
| ... | ... | ... | ... | ... | ... |
| 259 | 274 | 2116 | R10147 | L-aspartate-4-semialdehyde hydro-lyase [adding... | ASPSALY |
| 260 | 273 | 2115 | R10305 | O3-acetyl-L-serine:L-homocysteine S-(2-amino-2... | ACSERT |
| 261 | 167 | 1059 | R10404 | - | SAM |
| 262 | 272 | 2114 | R10619 | aldose 1-epimerase | GALISO |


In [23]:
# Rename the reactions that were so far unnamed
matched_rct = []
for rct in model.reactions:
    found = new_rct_id[new_rct_id["Name_KEGG"] == rct.name]
    # Skip reactions without an entry in the file, or with an empty (NaN) new ID
    if found["New ID"].empty or found["New ID"].isna().values[0]:
        continue
    try:
        rct.id = found["New ID"].values[0]
        matched_rct.append(rct)
    except ValueError:
        # The new ID is already in use in the model
        print(rct.id, "non unique")
len(matched_rct)

264

We have now renamed 264 of the reactions. However, there were originally 277 unnamed reactions, so I'll try to find the 13 missing ones now. 

In [24]:
unnamed_rct =[]
for rct in unmatched_rct_left:
    if rct in matched_rct:
        continue
    else: 
        unnamed_rct.append(rct.id)
len(unnamed_rct)
        

13

In [25]:
unnamed_rct

['729',
 '730',
 '737',
 '751',
 '752',
 '1504',
 '1548',
 '1551',
 '1552',
 '1553',
 '1554',
 '1555',
 '1905']

For some reason these were not included in the Excel file I made, so now I will just change their IDs manually.

In [26]:
model.reactions.get_by_id("729").id = "MENAOR"

In [27]:
model.reactions.get_by_id("730").id = "PANTPTRANS"

In [28]:
model.reactions.get_by_id("737").id = "HISTIOR"

In [29]:
model.reactions.get_by_id("751").id = "ALACPH"

In [30]:
model.reactions.get_by_id("752").id = "DHMBISO"

In [31]:
model.reactions.get_by_id("1504").id = "S7PTRANSPP"

In [32]:
model.reactions.get_by_id("1548").id = "H2MB4POR"

In [33]:
model.reactions.get_by_id("1551").id = "AMETTRANS"

In [34]:
model.reactions.get_by_id("1552").id = "AHCYSMETTRANS"

In [35]:
model.reactions.get_by_id("1553").id = "SBZCOAHYDRO"

In [36]:
model.reactions.get_by_id("1554").id = "ADOCBITRANS"

In [37]:
model.reactions.get_by_id("1555").id = "DECDPPP"

In [38]:
model.reactions.get_by_id("1905").id = "CCTTPENPP"
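The thirteen single-line renames above could equivalently be driven from one mapping, which also makes it easy to check for typos and ID collisions before anything is changed. The sketch below works on a plain list of IDs rather than a cobra model. The function name `apply_renames` is hypothetical, and `"PFK"` is a made-up ID standing in for the rest of the model. The three example renames are taken from the cells above.

```python
def apply_renames(reaction_ids, renames):
    """Return a new list of IDs with renames applied.

    Raises KeyError for old IDs that are not present and ValueError
    when the renaming would produce duplicate IDs.
    """
    unknown = [old for old in renames if old not in reaction_ids]
    if unknown:
        raise KeyError(f"IDs not found: {unknown}")
    new_ids = [renames.get(rid, rid) for rid in reaction_ids]
    if len(set(new_ids)) != len(new_ids):
        raise ValueError("renaming would create duplicate IDs")
    return new_ids


# Three of the manual renames from this notebook
renames = {"729": "MENAOR", "730": "PANTPTRANS", "737": "HISTIOR"}
current_ids = ["729", "730", "737", "PFK"]  # "PFK" is a made-up extra ID
renamed_ids = apply_renames(current_ids, renames)
print(renamed_ids)
```

With a cobra model, the same mapping could be applied in a loop using `model.reactions.get_by_id(old).id = new`, as in the individual cells above.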

Now that each reaction has a more insightful and easy to work with name, the model can be saved, and in the next notebook we can add all the annotations needed.

In [40]:
cobra.io.write_sbml_model(model,"../../model/g-thermo.xml")