# Recon 3D CHO GPRs
## Description
This notebook describes how we mine Recon3D for information to add to our CHO cell reconstruction. First, the human genes in each Recon3D GPR are mapped to their CHO orthologs, producing a dataset of all Recon3D reactions with their corresponding CHO GPRs (1). Then, a dataset of the Recon3D reactions that are not present in our reconstruction, together with their annotated CHO GPRs, is generated so that these reactions can be added to it (2).

In [None]:
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm

import cobra
from cobra.io.mat import load_matlab_model

import gspread

### 1. Finding CHO orthologs for Human GPRs in Recon3D
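Using the dataset generated in the **Orthologs** notebook, we replace the human gene IDs in the GPR column of the Recon3D dataset with their CHO orthologs and store the result in a new column called "CHO GPR". Human genes without a CHO ortholog keep their ID, prefixed with `h`.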

In [None]:
#Generate recon3d df from the recon3d supplementary data
df1 = pd.read_excel('../../Data/GPR_Curation/recon3d_gprs.xlsx')
df1

In [None]:
# Load the Recon3D model from the MATLAB file
recon3d_model = load_matlab_model('../../Data/GPR_Curation/Recon3D_301.mat')
recon3d_model

In [None]:
# Generate another df from the reactions in the recon3d model
attributes = []
for reaction in tqdm(recon3d_model.reactions):
    # gene_reaction_rule holds the GPR as a plain string
    attributes.append([reaction.id, reaction.name, reaction.reaction, reaction.gene_reaction_rule,
                       reaction.subsystem, reaction.lower_bound, reaction.upper_bound])

df2 = pd.DataFrame(data=attributes, columns=['m_reaction', 'Reaction Name', 'm_metabolites', 'm_gene_reaction_rule', 'm_subsystem', 'Lower bound', 'Upper bound'])
# Turn compartment suffixes such as '[e]' into '_e'; regex=False treats the brackets literally
df2['m_reaction'] = df2['m_reaction'].str.replace("[", "_", regex=False)
df2['m_reaction'] = df2['m_reaction'].str.replace("]", "", regex=False)
df2

In [None]:
# Unify df1 and df2 into one dataset containing all the necessary information
recon3d = pd.concat([df2, df1])
recon3d = recon3d.groupby('m_reaction').first()
recon3d = recon3d.reset_index(drop = False)
recon3d['m_reaction'] = recon3d['m_reaction'].str.replace('_hs$', '', regex=True) # Erase the _hs at the end of the reaction IDs
recon3d['m_gene_reaction_rule'] = recon3d['m_gene_reaction_rule'].fillna('') # Reactions without a GPR get an empty string
recon3d.to_excel('../../Data/Reconciliation/datasets/recon3D_all_reactions.xlsx')
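
The merge above relies on `groupby(...).first()` returning, for each reaction, the first non-null value found in each column, so rows from the model export are preferred and gaps are filled from the supplementary table. A minimal sketch of that behaviour on toy data (the reaction IDs, GPRs and subsystems below are invented for illustration and are not taken from Recon3D):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df2 (model export) and df1 (supplementary table); all values are made up
model_df = pd.DataFrame({
    'm_reaction': ['R1', 'R2'],
    'm_gene_reaction_rule': ['10.1 and 20.1', np.nan],
    'm_subsystem': [np.nan, 'Glycolysis'],
})
supp_df = pd.DataFrame({
    'm_reaction': ['R2', 'R3'],
    'm_gene_reaction_rule': ['30.1', '40.1'],
    'm_subsystem': ['Other', 'Transport'],
})

merged = (
    pd.concat([model_df, supp_df])
    .groupby('m_reaction')
    .first()          # first non-null value per column within each reaction
    .reset_index()
)
print(merged)
```

In this toy case `R2` takes its GPR from the second table (the first one had none) but keeps the subsystem from the first table, while `R1` keeps a missing subsystem because no table provides one.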

In [None]:
# Generate orthologs dict from the dataset generated in the Orthologs notebook
orthologs = pd.read_excel('../../Data/GPR_Curation/orthologs.xlsx', dtype=str)
orthologs.fillna('', inplace=True)
orthologs_dict = orthologs.set_index('Human GeneID')['CHO GeneID'].to_dict()
orthologs_dict
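
One thing worth checking when building a dictionary this way: if a human gene ID appears more than once in the table, `to_dict()` keeps only the last CHO ID seen for it. A small sketch with a hypothetical ortholog table (the IDs are placeholders, not real gene IDs) shows the effect and one way to list the duplicated entries:

```python
import pandas as pd

# Hypothetical ortholog table; the IDs are placeholders, not real gene IDs
toy_orthologs = pd.DataFrame({
    'Human GeneID': ['10', '20', '20', '30'],
    'CHO GeneID': ['1001', '2001', '2002', ''],
}, dtype=str)

# Same construction as above: duplicated keys are silently overwritten
toy_dict = toy_orthologs.set_index('Human GeneID')['CHO GeneID'].to_dict()
print(toy_dict)

# Rows whose human ID occurs more than once, for manual inspection
duplicated = toy_orthologs['Human GeneID'].duplicated(keep=False)
print(toy_orthologs[duplicated])
```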

In [None]:
# Extract GPR info from Recon3D and swap gene IDs from human to CHO
import re

def human_to_cho(match):
    # Recon3D gene IDs look like '<human gene ID>.<suffix>'; keep only the part before the dot
    human_g = match.group(0).split('.')[0]
    # Genes with no CHO ortholog (missing or empty entry) keep the human ID prefixed with 'h'
    return orthologs_dict.get(human_g) or f'h{human_g}'

cho_gpr = []
for row in recon3d['m_gene_reaction_rule']:
    # Replace every ID in a single pass, so an ID that was already swapped
    # is never rewritten by a later substring match
    cho_gpr.append(re.sub(r'[\d.]*\d+', human_to_cho, str(row)))

In [None]:
cho_gpr
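
To make the ID swap concrete, here is a minimal sketch on a single toy rule. The mapping and the rule are hypothetical; `swap_ids` is an illustrative helper, not a function from this notebook. It uses `re.sub` with a callback, so each gene ID is rewritten exactly once, and it applies the same fallback as above (an `h` prefix when no CHO ortholog is available):

```python
import re

# Hypothetical human -> CHO mapping; an empty string means "no ortholog found"
toy_orthologs = {'10': '1001', '20': ''}

def swap_ids(rule, mapping):
    """Replace every human gene ID in a GPR string with its CHO ortholog."""
    def repl(match):
        # Drop the suffix after the dot, e.g. '10.1' -> '10'
        human_id = match.group(0).split('.')[0]
        # Missing or empty mapping: keep the human ID, prefixed with 'h'
        return mapping.get(human_id) or f'h{human_id}'
    return re.sub(r'[\d.]*\d+', repl, rule)

toy_result = swap_ids('(10.1 and 20.1) or 30.2', toy_orthologs)
print(toy_result)  # (1001 and h20) or h30
```

Gene `10` is mapped, gene `20` has an empty ortholog entry and gene `30` is absent from the mapping, so the last two fall back to the prefixed human ID.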

In [None]:
# Generate Recon3D dataset with the addition of GPR with CHO genes
recon3d['CHO GPR'] = cho_gpr
recon3d.to_excel('../../Data/GPR_Curation/recon3D_chogprs.xlsx')
recon3d

The dataset generated here is then used in **Part 6** of the **Reactions** notebook.

### 2. Addition of new reactions from Recon3D
Recon3D reactions that have a mapped CHO GPR and are not yet in our reconstruction are collected so that they can be added to it.

In [None]:
# Subset of Recon 3D dataset with information on CHO GPRs
recon3d_cho = recon3d[recon3d['CHO GPR'] != '']
recon3d_cho = recon3d_cho.reset_index(drop=True)

In [None]:
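# regex=False treats the brackets as literal characters ("[" alone is not a valid regular expression)
recon3d_cho['m_metabolites'] = recon3d_cho['m_metabolites'].str.replace("[", "_", regex=False)
recon3d_cho['m_metabolites'] = recon3d_cho['m_metabolites'].str.replace("]", "", regex=False)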
recon3d_cho
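
As a quick illustration of what this clean-up does to a reaction formula, the sketch below applies literal (non-regex) replacements to one made-up formula string. The formula is only an example of the bracketed compartment notation, not a row taken from the dataset:

```python
import pandas as pd

# Made-up formula using the bracketed compartment notation
formulas = pd.Series(['atp[c] + h2o[c] -> adp[c] + pi[c]'])

cleaned = (
    formulas
    .str.replace('[', '_', regex=False)   # 'atp[c]' -> 'atp_c]'
    .str.replace(']', '', regex=False)    # 'atp_c]' -> 'atp_c'
)
print(cleaned.iloc[0])  # atp_c + h2o_c -> adp_c + pi_c
```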

In [None]:
# Retrieve information on the reactions in our reconstruction from Google Sheets

# give service account details to gspread
sa = gspread.service_account(filename='../credentials.json')

# sa is a gspread client, which can be used for connecting to the sheets
# by using the open method and the sheet name.
cho_recon = sa.open('CHO Network Reconstruction')

# we also need to specify the page name before getting the data. In this case we use the Rxns sheet.
rxns_sheet = cho_recon.worksheet('Rxns')

# We can extract the data using the get_all_records method and create pd DataFrames

# Reactions IDs, names, formulas, GPRs
rxns = pd.DataFrame(rxns_sheet.get_all_records())
del rxns[rxns.columns[0]]
rxns

In [None]:
# List of all the reactions in our reconstruction
cho_rxns_list = list(rxns['Reaction'])

# List of all the reactions with CHO GPRs in Recon3D 
recon3D_rxns_list = list(recon3d_cho['m_reaction'])

In [None]:
# Set of reaction IDs in our reconstruction, for fast membership tests
cho_rxns_set = set(cho_rxns_list)

# List of Recon3D reactions that are in our reconstruction
rxns_in_cho = [rxn for rxn in recon3D_rxns_list if rxn in cho_rxns_set]

# List of Recon3D reactions that are NOT in our reconstruction
rxns_not_in_cho = [rxn for rxn in recon3D_rxns_list if rxn not in cho_rxns_set]

In [None]:
# Generation of a dataset containing information about Recon3D reactions that are not in our reconstruction
subset_df = recon3d_cho[recon3d_cho['m_reaction'].isin(rxns_not_in_cho)]
subset_df
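
The same split can also be expressed directly on the DataFrame with a boolean mask, without building intermediate lists. A minimal sketch on toy data (the reaction IDs and GPRs are invented, and `toy_recon3d` and `toy_cho_rxns` are hypothetical stand-ins for `recon3d_cho` and `cho_rxns_list`):

```python
import pandas as pd

# Toy stand-ins for recon3d_cho and cho_rxns_list; all values are made up
toy_recon3d = pd.DataFrame({
    'm_reaction': ['R1', 'R2', 'R3'],
    'CHO GPR': ['1001', 'h20', '2001'],
})
toy_cho_rxns = ['R2', 'R9']

# True for Recon3D reactions whose ID already exists in the reconstruction
in_cho = toy_recon3d['m_reaction'].isin(toy_cho_rxns)

already_present = toy_recon3d[in_cho]
missing = toy_recon3d[~in_cho]   # candidates to add to the reconstruction
print(missing)
```

Note that this comparison, like the one above, matches on reaction IDs only, so a reaction stored under a different ID in the reconstruction would still be reported as missing.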

In [None]:
# This dataset contains all the Recon3D reactions to be added to our reconstruction

rxns_recon3d_toadd = subset_df[['m_reaction', 'Reaction Name', 'm_metabolites', 'm_subsystem', 'CHO GPR', 'Lower bound', 'Upper bound']]
rxns_recon3d_toadd.to_excel('../../Data/Reconciliation/datasets/rxns_recon3d_toadd.xlsx')