# Introduction
In notebook 33 we saw that the notes field stored information about the pathways each reaction plays a role in. This has disappeared in the new wrapper form. The information was incomplete to begin with, but it could still be useful to include in the model. Therefore I will add reaction groups that represent the subsystem/pathway each reaction falls under.

This information is based on, and stored in, KEGG, so ideally we find a way to lift it from there. First I will inspect the groups, and the notes about groups, that we have. Then I will try to assign the reactions whose notes mention a subsystem to those groups.

Finally, for the ungrouped reactions I will try to find out which pathway they belong to, and fit as many reactions as possible into the correct groups.

In [1]:
import cobra.io

In [2]:
import pandas as pd

In [3]:
import cameo

In [4]:
from cobra.core import Group

In [5]:
model = cobra.io.read_sbml_model('../model/p-thermo.xml')

In [None]:
model_e_coli = cameo.load_model('iML1515')

# Group clean up
First, our model already contains some groups. I will check which they are and also do some 'clean up' so that they are tidy before we start creating new groups and assigning reactions to them.

In [None]:
for group in model.groups:
    print(group)

In [None]:
for member in model.groups.get_by_id("S_").members[0:10]:
    print(member)

It seems that none of the groups in our model are particularly useful. Each group is quite incomplete, or carries fairly 'useless' information, e.g. 'X'.
S_ seems to contain some of the transport reactions, which would be a useful group to include as these do not have KEGG IDs anyway. Exchange reactions are already included in the model.exchanges field, but to later ensure that each reaction is in a group, I will add them in a separate group.

For clean up, I will remove all the groups that are present. I will then later recategorize the reactions into groups.

In [None]:
#copy the groups into a plain list, so we do not remove from model.groups while iterating over it
groups = list(model.groups)

In [None]:
model.remove_groups(groups)

In [None]:
model.groups

In [None]:
#save&commit
cobra.io.write_sbml_model(model,'../model/p-thermo.xml')

Now that we have removed all groups, I can start adding them again. First I will make groups for the transport reactions, as these require separate codes and will be grouped differently.

## Transport reactions
The transport reactions in the model are named according to the type of transport they reflect.
- 't' suffix: just a permease or diffusion
    - use code 02000 'Passive transport'. This is not an actual KEGG compliant code, but will be used to group these reactions. It doesn't overlap with any other code so should be fine.
- 'tpts' suffix: PTS system
    - use 02060 'Phosphotransferase system (PTS)'
- 'tabc' suffix: ABC system
    - use code 02010 'ABC transporters' 
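
The suffix rules above can be sketched as a small lookup function. This is a minimal sketch in plain Python, not the code used in this notebook: the helper name `transport_group_for` and the example IDs are hypothetical, and it assumes reaction IDs are plain strings. Longer suffixes are listed first so the result does not depend on a lucky ordering of the tests.

```python
# Hypothetical helper: map a reaction ID to a transport group code by its suffix.
TRANSPORT_SUFFIXES = [
    ("tabc", "02010"),  # ABC transporters
    ("tpts", "02060"),  # Phosphotransferase system (PTS)
    ("t", "02000"),     # Passive transport (not an official KEGG code)
]

def transport_group_for(reaction_id):
    """Return the group code for a transport reaction ID, or None if no suffix matches."""
    for suffix, code in TRANSPORT_SUFFIXES:
        if reaction_id.endswith(suffix):
            return code
    return None

# Made-up IDs, for illustration only
for rid in ["GLCt", "GLCtabc", "GLCtpts", "PGK", "Kt2"]:
    print(rid, transport_group_for(rid))
```

A reaction such as Kt2 does not end in any of the three suffixes, so a rule like this returns `None` for it and it has to be assigned by hand.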


In [None]:
#first create the three groups

In [None]:
model.add_groups([Group('02000',name='Passive transport')]) #add_groups expects a list

In [None]:
model.add_groups([Group('02060',name='Phosphotransferase system (PTS)')])

In [None]:
model.add_groups([Group('02010',name='ABC transporters')])

In [None]:
#add each reaction to the appropriate group (add_members expects a list)
for rct in model.reactions:
    if rct.id.endswith('tabc'): #ABC transporters
        model.groups.get_by_id('02010').add_members([rct])
    elif rct.id.endswith('tpts'): #PTS system
        model.groups.get_by_id('02060').add_members([rct])
    elif rct.id.endswith('t'): #all the passive transports
        model.groups.get_by_id('02000').add_members([rct])

In [None]:
#one transport reaction is named a bit differently, so add it by hand
model.groups.get_by_id('02000').add_members([model.reactions.Kt2])

In [None]:
#save&commit
cobra.io.write_sbml_model(model,'../model/p-thermo.xml')

## Exchange reactions
The exchange reactions are already captured in model.exchanges, but to make sure that each reaction is associated with a group I will add the group 00000, 'Exchange reactions'.

In [None]:
model.add_groups(Group('00000',name='Exchange reactions'))

In [None]:
for rct in model.reactions:
    if rct.id.startswith('EX'): #exchange reactions carry the EX prefix
        model.groups.get_by_id('00000').add_members([rct])

In [None]:
#save&commit
cobra.io.write_sbml_model(model,'../model/p-thermo.xml')

# Metabolic reactions
For the metabolic reactions, I will first make a list of the possible reaction groups and the associated names to generate all the groups we would want included in the model.

For example: the 1.0 Global and overview maps contain almost all reactions and are not very informative so will be excluded. 

So I will only include the groups that are in the 'Metabolism' section, as this seems most relevant. I will use the list of groups from http://rest.kegg.jp/list/pathway, exclude the groups I do not want, and use the rest to make all the groups. At the end I will check which groups have no reactions added and decide whether to remove them.

After that, I will make a dataframe linking KEGG IDs to the reaction groups I've added.
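
As a rough illustration of the idea, the pathway listing can also be filtered by pathway code instead of by row position, which does not depend on the order of the rows in the listing. This is only a sketch under assumptions: the helper `parse_pathway_list` is hypothetical, the three sample rows stand in for the full listing, and the `path:map` prefix is assumed to be present as it is in this notebook.

```python
# Sample rows in the two-column, tab-separated format of the KEGG pathway listing
# (assumed prefix style; the real listing has many more rows).
sample = (
    "path:map00010\tGlycolysis / Gluconeogenesis\n"
    "path:map00020\tCitrate cycle (TCA cycle)\n"
    "path:map01100\tMetabolic pathways\n"
)

def parse_pathway_list(text, exclude=()):
    """Return {pathway code: name}, skipping any codes listed in `exclude`."""
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        raw_id, name = line.split("\t", 1)
        code = raw_id.replace("path:map", "")
        if code not in exclude:
            groups[code] = name
    return groups

# Exclude the global overview map by its code (illustrative choice)
groups = parse_pathway_list(sample, exclude={"01100"})
print(groups)
```

Which codes to exclude remains a manual decision; the set passed to `exclude` here is just an example.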


In [6]:
df_groups = pd.read_csv('http://rest.kegg.jp/list/pathway', header=None, sep = '\t')

In [7]:
df_groups.columns = ['ID', 'Name']

In [8]:
df_groups['ID'] = df_groups['ID'].str.replace(r'path:map', '')

In [9]:
#now filter out the IDs I don't want to include
#keep rows 0 up to and including 153, and drop everything after that
df_groups = df_groups[0:154]

In [10]:
#convert df to dictionary
groups_dict = dict(zip(df_groups.ID, df_groups.Name))

In [None]:
#now add all these groups to the model (add_groups expects a list)
for grp_id, grp_name in groups_dict.items():
    model.add_groups([Group(grp_id,name=grp_name)])

In [None]:
#save & commit
cobra.io.write_sbml_model(model,'../model/p-thermo.xml')

## Mapping reactions to groups
Now that the groups are added, the reactions should be mapped to these groups using their KEGG IDs.

I found a list in the KEGG API that links KEGG reaction IDs to pathways. It is found here: http://rest.kegg.jp/link/reaction/pathway

First I will import it and turn it into a proper dataframe, so that a KEGG ID can be looked up to find the pathways it is associated with.

In [52]:
model = cobra.io.read_sbml_model('../model/p-thermo.xml')

In [7]:
df = pd.read_csv('http://rest.kegg.jp/link/reaction/pathway', header=None)

In [8]:
df.columns = ['map'] #rename the column

In [9]:
df[0:5]

Unnamed: 0,map
0,path:map00010\trn:R00014
1,path:rn00010\trn:R00014
2,path:map00010\trn:R00200
3,path:rn00010\trn:R00200
4,path:map00010\trn:R00229


In [10]:
# new data frame with the single column split at the tab character
new = df["map"].str.split("\t", n = 1, expand = True)

# first part is the pathway (map) ID
df["Map"]= new[0]

# second part is the reaction ID
df["Reaction"]= new[1]

# drop the original combined column
df.drop(columns =["map"], inplace = True)

In [11]:
df[0:5]

Unnamed: 0,Map,Reaction
0,path:map00010,rn:R00014
1,path:rn00010,rn:R00014
2,path:map00010,rn:R00200
3,path:rn00010,rn:R00200
4,path:map00010,rn:R00229


In [12]:
#remove the 'path:rn' prefix from Map (the 'rn:' prefix of Reaction is removed in the next cell)
df['Map'] = df['Map'].str.replace(r'path:rn', '')

In [13]:
df['Reaction'] = df['Reaction'].str.replace(r'rn:', '')

In [14]:
df[0:5]

Unnamed: 0,Map,Reaction
0,path:map00010,R00014
1,00010,R00014
2,path:map00010,R00200
3,00010,R00200
4,path:map00010,R00229


In [15]:
#remove the rows with 'path:map' to prevent duplication
df = df[~df['Map'].str.contains("map")]

In [16]:
#rename Map
df.columns = ['Subsystem','Reaction']

In [17]:
df[0:5]

Unnamed: 0,Subsystem,Reaction
1,00010,R00014
3,00010,R00200
5,00010,R00229
7,00010,R00235
9,00010,R00341


This dataframe still includes all subsystems, even the ones I didn't want to include in the model. This is something to keep in mind when making the final mapping.
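
The cleaning steps above amount to building a lookup from reaction ID to pathway codes. The sketch below shows the same idea with the standard library only, using the first rows of the link table shown earlier as sample input. The function name `reaction_to_pathways` is hypothetical, and the sketch assumes every row has the `path:...<tab>rn:...` shape seen in this notebook.

```python
from collections import defaultdict

# First rows of the link table, as displayed earlier in this notebook
sample = (
    "path:map00010\trn:R00014\n"
    "path:rn00010\trn:R00014\n"
    "path:map00010\trn:R00200\n"
    "path:rn00010\trn:R00200\n"
    "path:map00010\trn:R00229\n"
)

def reaction_to_pathways(text):
    """Map each KEGG reaction ID to the list of pathway codes it appears in."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        pathway, reaction = line.split("\t", 1)
        if not pathway.startswith("path:rn"):
            continue  # skip the duplicate 'path:map' rows
        code = pathway[len("path:rn"):]       # e.g. '00010', kept as a string
        rid = reaction.replace("rn:", "", 1)  # e.g. 'R00014'
        mapping[rid].append(code)
    return dict(mapping)

lookup = reaction_to_pathways(sample)
print(lookup)
```

Keeping the pathway code as a string preserves the leading zeros, which matters because the group IDs in the model are strings such as '00010'.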

NOTE: before continuing I need to make sure all KEGG IDs are moved from the notes to the annotations, to simplify the code. I will run the part of the code from notebook 33 that converts the reaction KEGG info to the annotation, and commit this before moving forward.

In [18]:
#need a list of all the group ids
group_IDs = [grp.id for grp in model.groups]

In [19]:
for rct in model.reactions:
    if rct.id.startswith('EX'): #exclude exchanges
        continue
    if rct.id.endswith(('t', 'tabc', 'tpts')): #ignore all the transports
        continue
    kegg = rct.annotation.get('kegg.reaction') #lift the kegg reaction id
    if kegg is None: #no kegg id: print the reaction so it can be checked by hand
        print(rct.id)
        continue
    if isinstance(kegg, list): #if several kegg ids are stored as a list, take just the first
        kegg = kegg[0]
    else: #if several kegg ids are stored in a single string, isolate the first (6 characters)
        kegg = kegg[0:6]
    found = df.loc[df["Reaction"] == kegg] #find which subsystems the reaction is in
    for found_grp in found['Subsystem']: #go through each pathway found for the reaction
        if found_grp in group_IDs: #only use subsystems that are in the model
            model.groups.get_by_id(found_grp).add_members([rct]) #add_members expects a list

biomass
GTBIHY
TURAHY
BGAL
TAG1PK
ATPS4r
Kt2
CYTBO3


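There are 8 reactions that did not map automatically to a group. I will check them by hand now.

Biomass: will be added to its own group 'Biomass'.

Kt2: is a transport reaction that is just named a bit differently, so it goes into the passive transport group (02000).

For the remaining reactions without a KEGG ID, I will add a group called 'Other' to which we can add those.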


In [20]:
model.add_groups([Group(id = 'Biomass', name ='Biomass reaction' )]) #pass a list to avoid the 'need to pass in a list' warning


In [21]:
model.groups.get_by_id('Biomass').add_members([model.reactions.biomass])

In [22]:
model.add_groups([Group(id = 'Other', name ='Other reactions, without KEGG' )])

In [23]:
model.groups.Other.add_members([model.reactions.GTBIHY])

In [24]:
model.groups.Other.add_members([model.reactions.TURAHY])

In [25]:
model.groups.Other.add_members([model.reactions.BGAL])

In [26]:
model.groups.Other.add_members([model.reactions.TAG1PK])

In [27]:
model.groups.Other.add_members([model.reactions.ATPS4r])

In [28]:
#one transport reaction is named a bit differently
model.groups.get_by_id('02000').add_members([model.reactions.Kt2])

In [29]:
model.groups.Other.add_members([model.reactions.CYTBO3])

In [30]:
#save & commit after checking the individual reactions
cobra.io.write_sbml_model(model,'../model/p-thermo.xml')

I will now just check to ensure that all reactions are mapped to a group.

I can't seem to get the group for each singular reaction, so I will go about it a different way. I will make a set of all reactions that are in a group, and from there deduce which reactions are not in a group.
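
The set-based check can be sketched on toy data as follows. The dictionary and reaction IDs below are made-up stand-ins for `model.groups` and `model.reactions`, so this only illustrates the logic and is not the notebook's actual data. (cobrapy's `Model.get_associated_groups` may offer the per-reaction lookup directly, depending on the installed version; that is not verified here.)

```python
# Toy stand-ins: group id -> member reaction ids, and the full list of reaction ids
groups = {
    "00010": ["PGK", "PYK"],
    "00020": ["PYK"],   # a reaction can sit in more than one group
    "02000": ["GLCt"],
    "Other": [],
}
all_reactions = ["PGK", "PYK", "GLCt", "XYZ1", "XYZ2"]

# Collect every reaction that appears in at least one group; the set removes duplicates
grouped = set()
for members in groups.values():
    grouped.update(members)

# Whatever is left over has no group yet
ungrouped = [rid for rid in all_reactions if rid not in grouped]
print(len(grouped), ungrouped)
```

Using a set matters for the count: a reaction that belongs to several pathways would otherwise be counted once per group.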

In [40]:
rct_w_grp = set() #a set, so reactions that are in several groups are only counted once
for grp in model.groups:
    for member in grp.members:
        rct_w_grp.add(member.id)

In [41]:
len(set(rct_w_grp))

816

In [44]:
len(model.reactions)

907

There are about 90 reactions (907 - 816 = 91) that are not in a group currently. I will inspect some, and decide what to do with them. Possibly I will add them all to the 'Other' group, as they may otherwise belong to groups that are too obscure to be meaningful.

In [53]:
#all reactions whose id is not in any group
rct_wo_grp = [rct for rct in model.reactions if rct.id not in rct_w_grp]

In [54]:
#add all to the Other group (add_members takes the whole list at once)
model.groups.Other.add_members(rct_wo_grp)

In [55]:
#save&commit
cobra.io.write_sbml_model(model,'../model/p-thermo.xml')

Finally, I will just check how many groups have no reactions associated with them.

In [8]:
#all groups that have no members
group_wo_members = [grp for grp in model.groups if len(grp.members) == 0]

In [11]:
len(group_wo_members)/len(model.groups)

0.475

So about half of our groups don't have any members associated with them. Most seem to be related to secondary metabolites, which was to be expected. I could remove them from the model, but I can also leave them, so that reactions added later can be placed in the appropriate groups when necessary.