# Introduction
In this notebook; I want to create an output of the different types of reactions present in our model, which we can include in the paper. I will use the kegg classification of the different groups as the different reaction types.

I will go through all the different groups and tally the number of reactions in there. Then also tally the number of reactions not in a group, and use this to make a donut chart in R to visualize it. 

In [1]:
import cameo
import pandas as pd
import cobra.io

In [2]:
model = cobra.io.read_sbml_model('../model/p-thermo.xml')

### Mapping reactions to groups
It seems some new reactions were added/modified that weren't added to a group yet. I took this code from previosuly, so will reuse it. (from notebook 34) The code is slightly modified, to now map the E.C. code, not the KEGG idea so that we can cover more reactions into the groups they belong. 

I used a list in the KEGG API that links EC codes to pathways. it is found here: http://rest.kegg.jp/link/enzyme/pathway 

First I will import it and make it into a proper dataframe to allow searching of the KEGG ID to find the the pathway it is associated to. 

In [3]:
df = pd.read_csv('http://rest.kegg.jp/link/enzyme/pathway', header=None)

In [4]:
df.columns = ['map'] #rename the column

In [5]:
df[0:5]

Unnamed: 0,map
0,path:map00010\tec:1.1.1.1
1,path:ec00010\tec:1.1.1.1
2,path:map00010\tec:1.1.1.2
3,path:ec00010\tec:1.1.1.2
4,path:map00010\tec:1.1.1.27


In [6]:
# new data frame with split value columns 
new = df["map"].str.split("\t", n = 1, expand = True) 
  
# making separate first name column from new data frame 
df["Map"]= new[0] 
  
# making separate last name column from new data frame 
df["Reaction"]= new[1] 
  
# Dropping old Name columns 
df.drop(columns =["map"], inplace = True) 

In [7]:
df[0:5]

Unnamed: 0,Map,Reaction
0,path:map00010,ec:1.1.1.1
1,path:ec00010,ec:1.1.1.1
2,path:map00010,ec:1.1.1.2
3,path:ec00010,ec:1.1.1.2
4,path:map00010,ec:1.1.1.27


In [8]:
#remove all 'path:' and 'rn:'
df['Map'] = df['Map'].str.replace(r'path:map', '')

In [9]:
df['Reaction'] = df['Reaction'].str.replace(r'ec:', '')

In [10]:
df[0:5]

Unnamed: 0,Map,Reaction
0,00010,1.1.1.1
1,path:ec00010,1.1.1.1
2,00010,1.1.1.2
3,path:ec00010,1.1.1.2
4,00010,1.1.1.27


In [11]:
#remove the rows with 'path_map' to prevent duplication
df = df[~df['Map'].str.contains("path")]

In [12]:
#rename Map
df.columns = ['Subsystem','Reaction']

In [13]:
df[0:5]

Unnamed: 0,Subsystem,Reaction
0,10,1.1.1.1
2,10,1.1.1.2
4,10,1.1.1.27
6,10,1.1.2.7
8,10,1.1.2.8


This dataframe now will include all subsystems, even ones I didn't want to include in the model. Just to note for making the final mapping.


In [14]:
#need a list of all the group ids
group_IDs = []
for grp in model.groups:
    group_IDs.append(grp.id[0:5])

In [15]:
#also need a dataframe of all the groups & their name

In [16]:
df_groups = pd.read_csv('http://rest.kegg.jp/list/pathway', header=None, sep = '\t')

In [17]:
df_groups.columns = ['ID', 'Name']

In [18]:
df_groups['ID'] = df_groups['ID'].str.replace(r'path:map', '')

In [19]:
#now filter out the IDs I dont want to include
#i want to remove all rows below number 153
df_groups = df_groups[0:154]

In [20]:
for rct in model.reactions:
    if rct.id[:2] in 'EX': #exclude exchanges
        continue
    if rct.id[-1:] in 't': #ignore all the transports
        continue
    if rct.id[-4:] in 'tabc': 
        continue
    if rct.id[-4:] in 'tpts':
        continue
    else: 
        try:
            kegg = rct.annotation['ec-code']#lift the kegg reaction id
            if type(kegg) == list: #if there are more than one kegg stored as a list, take just the first
                kegg = kegg[0]
            elif type(kegg) == str: #if there are more than one kegg id stored as a single string, isolate the first like this
                kegg = rct.annotation['ec-code']
            found = df.loc[df["Reaction"] == kegg] #find which subsystems the reaction is in
            for index, react in found.iterrows(): #goes through each row and gives the pathway for the found reaction
                found_grp = react['Subsystem'] #name the subsystem
                if found_grp in group_IDs: #if this subsystem is in the model
                    try:
                        found_df = df_groups.loc[df_groups['ID']==found_grp]
                        group = found_df['ID'].values[0] + ' - ' + found_df['Name'].values[0] 
                        model.groups.get_by_id(group).add_members(rct) #add the reaction to that group
                    except KeyError:
                        continue
                else: #if not, then just ignore it
                    continue        
        except KeyError:
            continue


need to pass in a list



Just to check, I will do the same with the KEGG info stored in the annotations.

In [21]:
df = pd.read_csv('http://rest.kegg.jp/link/reaction/pathway', header=None)

In [22]:
df.columns = ['map'] #rename the column

In [23]:
df[0:5]

Unnamed: 0,map
0,path:map00010\trn:R00014
1,path:rn00010\trn:R00014
2,path:map00010\trn:R00199
3,path:rn00010\trn:R00199
4,path:map00010\trn:R00200


In [24]:
# new data frame with split value columns 
new = df["map"].str.split("\t", n = 1, expand = True) 
  
# making separate first name column from new data frame 
df["Map"]= new[0] 
  
# making separate last name column from new data frame 
df["Reaction"]= new[1] 
  
# Dropping old Name columns 
df.drop(columns =["map"], inplace = True) 

In [25]:
df[0:5]

Unnamed: 0,Map,Reaction
0,path:map00010,rn:R00014
1,path:rn00010,rn:R00014
2,path:map00010,rn:R00199
3,path:rn00010,rn:R00199
4,path:map00010,rn:R00200


In [26]:
#remove all 'path:' and 'rn:'
df['Map'] = df['Map'].str.replace(r'path:map', '')

In [27]:
df['Reaction'] = df['Reaction'].str.replace(r'ec:', '')

In [28]:
df[0:5]

Unnamed: 0,Map,Reaction
0,00010,rn:R00014
1,path:rn00010,rn:R00014
2,00010,rn:R00199
3,path:rn00010,rn:R00199
4,00010,rn:R00200


In [29]:
#remove the rows with 'path_map' to prevent duplication
df = df[~df['Map'].str.contains("path")]

In [30]:
#rename Map
df.columns = ['Subsystem','Reaction']

In [31]:
df[0:5]

Unnamed: 0,Subsystem,Reaction
0,10,rn:R00014
2,10,rn:R00199
4,10,rn:R00200
6,10,rn:R00206
8,10,rn:R00229


This dataframe now will include all subsystems, even ones I didn't want to include in the model. Just to note for making the final mapping.


In [38]:
for rct in model.reactions:
    if rct.id[:2] in 'EX': #exclude exchanges
        continue
    if rct.id[-1:] in 't': #ignore all the transports
        continue
    if rct.id[-4:] in 'tabc': 
        continue
    if rct.id[-4:] in 'tpts':
        continue
    else: 
        try:
            kegg = rct.annotation['kegg.reaction']#lift the kegg reaction id
            if type(kegg) == list: #if there are more than one kegg stored as a list, take just the first
                kegg = kegg[0]
            elif type(kegg) == str: #if there are more than one kegg id stored as a single string, isolate the first like this
                kegg = kegg[0:6]
            found = df.loc[df["Reaction"] == kegg] #find which subsystems the reaction is in
            for index, react in found.iterrows(): #goes through each row and gives the pathway for the found reaction
                found_grp = react['Subsystem'] #name the subsystem
                if found_grp in group_IDs: #if this subsystem is in the model
                    try:
                        found_df = df_groups.loc[df_groups['ID']==found_grp]
                        group = found_df['ID'].values[0] + ' - ' + found_df['Name'].values[0] 
                        model.groups.get_by_id(group).add_members(rct) #add the reaction to that group
                    except KeyError:
                        continue
                else: #if not, then just ignore it
                    continue        
        except KeyError:
            continue

Now that i have checked all reactions possible are in a group, I can tally what kind of reaction they fall into.

In [39]:
names =[]
rcts = []
for grps in model.groups:
    if grps.id[0:1] in '0':
        names.append(grps.id[8:])
        rcts.append(len(grps.members))
    else: 
        names.append(grps.id)
        rcts.append(len(grps.members))
df_grps = pd.DataFrame({'Class': names, 'Number of reactions': rcts} )

Now I need to just add a row with the information of reactions not present in a group. First, I will make a list of all reactions present in a group. Then search all reactions in the model through this list to check the number which are not associated to a group.

In [40]:
reactions = []
for grps in model.groups:
    for rcts in grps.members:
        reactions.append(rcts.id)

In [41]:
no_grp = []
for rct in model.reactions:
    if rct.id in reactions:
        continue
    else:
        no_grp.append(rct.id)

In [42]:
len(no_grp)

269

I see that some reactions aren't in the group they should be. So I will fix those and then run the above code again. 

In [43]:
#make sure all exchange reactions are in the right group
for rct in model.reactions:
    if rct.id in reactions:
        continue
    else:
        if rct.id[:2] in 'EX':
            model.groups.get_by_id('00000 - Exchange reactions').add_members(rct)


need to pass in a list



In [44]:
#make sure all are in the transport they should be
for rct in model.reactions:
    if rct.id in reactions:
        continue
    else:
        if rct.id[-1:] in 't':
            model.groups.get_by_id('02000 - Passive transport').add_members(rct)
        if rct.id[-2:] in 't2':
            model.groups.get_by_id('02000 - Passive transport').add_members(rct)
        elif rct.id[-4:] in 'tabc':
            model.groups.get_by_id('02010 - ABC transporters').add_members(rct)

In [45]:
#make sure all lipid biosynthesis are in the right category too
for rct in model.reactions:
    if rct.id not in reactions:
        if rct.id[:4] in ['3HAD', '3HMA']:
            model.groups.get_by_id('00061 - Fatty acid biosynthesis').add_members(rct)
        if rct.id[:3] in ['EAR','EMA']:
            model.groups.get_by_id('00061 - Fatty acid biosynthesis').add_members(rct)
        if rct.id[:2] in ['3O', 'E3', '3H']:
            model.groups.get_by_id('00061 - Fatty acid biosynthesis').add_members(rct)
    else:
        continue

Now I can re-run the above code to get new, up to date values and see how many reactions are then undefined still.

In [46]:
names =[]
rcts = []
for grps in model.groups:
    if grps.id[0:1] in '0':
        names.append(grps.id[8:])
        rcts.append(len(grps.members))
    else: 
        names.append(grps.id)
        rcts.append(len(grps.members))
df_grps = pd.DataFrame({'Class': names, 'Number of reactions': rcts} )

Now I need to just add a row with the information of reactions not present in a group. First, I will make a list of all reactions present in a group. Then search all reactions in the model through this list to check the number which are not associated to a group.

In [47]:
reactions = []
for grps in model.groups:
    for rcts in grps.members:
        reactions.append(rcts.id)

In [48]:
no_grp = []
for rct in model.reactions:
    if rct.id in reactions:
        continue
    else:
        no_grp.append(rct.id)

In [49]:
len(no_grp)

70

## Summarize
so now, i will add a last row to the df_grps data frame that contains the uncategorized reactions. 

In [54]:
df_grps = df_grps.append({'Class':'Other','Number of reactions': 70}, ignore_index=True)

In [56]:
#now export the table to a csv file
df_grps.to_csv('../databases/Reaction types.csv')

In [58]:
#save&commit to model
cobra.io.write_sbml_model(model,'../model/p-thermo.xml')