# Whole Cell Network Reconstruction in CHO Cells
The following notebook retrieves and updates information in the "Whole Cell Network Reconstruction for CHO Cells" Google Sheet file.

### 1. Access and retrieve information from the Google Sheet file through the Google Sheets API
Using the gspread library, we can access the Google Sheet file and load a worksheet into a pandas DataFrame to inspect it.

In [41]:
import gspread
import pandas as pd
import numpy as np

In [42]:
# give service account details to gspread
sa = gspread.service_account(filename='credentials.json')

# sa is a gspread client, which can be used for connecting to the sheets
# by using the open method and the sheet name.
cho_recon = sa.open('CHO Network Reconstruction')

# we also need to specify the page name before getting the data. In this case we use the Rxns sheet.
rxns_sheet = cho_recon.worksheet('Rxns')

In [45]:
# list all the worksheets in the spreadsheet
for sheet in cho_recon:
    print(sheet)

<Worksheet 'Info' id:0>
<Worksheet 'Rxns' id:1966089892>
<Worksheet 'Attributes' id:745769606>
<Worksheet 'Added Rxns' id:1377582373>
<Worksheet 'Genes' id:239167986>
<Worksheet 'Metabolites' id:1367015881>


In [46]:
# We can extract the data using the get_all_records method and create a pd DataFrame
df = pd.DataFrame(rxns_sheet.get_all_records())
df

Unnamed: 0,Curated,Reaction,Reaction Name,Reaction Formula,Subsystem,GPR_hef,GPR_fou,GPR_yeo,GPR_Recon3D,GPR_final,Conf. Score,Curation Notes,References
0,PD,10FTHF5GLUtl,"5-glutamyl-10FTHF transport, lysosomal",10fthf5glu_c --> 10fthf5glu_l,"TRANSPORT, LYSOSOMAL",,,,,,1,No information available in the literature abo...,
1,PD,10FTHF5GLUtm,"5-glutamyl-10FTHF transport, mitochondrial",10fthf5glu_m --> 10fthf5glu_c,"TRANSPORT, MITOCHONDRIAL",,,,,,1,No information available in the literature abo...,
2,PD,10FTHF6GLUtl,"6-glutamyl-10FTHF transport, lysosomal",10fthf6glu_c --> 10fthf6glu_l,"TRANSPORT, LYSOSOMAL",,,,,,1,No information available in the literature abo...,
3,PD,10FTHF6GLUtm,"6-glutamyl-10FTHF transport, mitochondrial",10fthf6glu_m --> 10fthf6glu_c,"TRANSPORT, MITOCHONDRIAL",,,,,,1,No information available in the literature abo...,
4,PD,10FTHF7GLUtl,"7-glutamyl-10FTHF transport, lysosomal",10fthf7glu_c --> 10fthf7glu_l,"TRANSPORT, LYSOSOMAL",,,,,,1,No information available in the literature abo...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8184,,r2537,Utilized transport,lnlncgcoa_c <=> lnlncgcoa_r,Transport,,,,,,,,
8185,,r2538,Utilized transport,dlnlcgcoa_c <=> dlnlcgcoa_r,Transport,,,,,,,,
8186,,r2539,Postulated transport reaction,L2aadp6sa_c + L2aadp_m <=> L2aadp6sa_m + L2aadp_c,Transport,,,,,,,,
8187,PD,ALLTTtm,"Allantoate transport via diffusion, mitochondria",alltt_c <=> alltt_m,"Transport, mitochondria",,,,,,1,The transport of Allantoate from the cytoplasm...,


### 2. Add information to the "Genes" sheet

Using a list of all the genes included in the dataset, we can retrieve the Gene Symbol, Gene Name, Gene Ensembl ID, mRNA ID, and protein ID of each gene from the NIH database.

In [47]:
# Generation of gene_list from all the genes in the "Whole Cell Network Reconstruction in CHO Cells" dataset
import re

gene_list = []
for index, row in df.iterrows():
    if row['GPR_final'] != '':
        gpr = str(row['GPR_final'])
        num = re.findall(r'\d+', gpr)
        for n in num:
            gene_list.append(n)
        
gene_list = list(set(gene_list))
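
The extraction above can also be written as a small function over the list-of-dicts form that `get_all_records` returns. This is a minimal sketch rather than the notebook's own code: `extract_gene_ids` and the sample records are hypothetical, and it makes the same assumption as the regex above, namely that every run of digits in a `GPR_final` rule is an Entrez gene ID. Sorting the result gives a reproducible order, which a bare `set` does not guarantee.

```python
import re

def extract_gene_ids(records, column="GPR_final"):
    """Return the sorted, de-duplicated digit runs found in `column`."""
    ids = set()
    for record in records:
        # Numeric cells may arrive as int, so normalize to str first
        rule = str(record.get(column, ""))
        ids.update(re.findall(r"\d+", rule))
    # Sort numerically so the order is stable between runs
    return sorted(ids, key=int)

# Hypothetical sample rows, for illustration only
sample_records = [
    {"Reaction": "R1", "GPR_final": "100766828 and 3419"},
    {"Reaction": "R2", "GPR_final": ""},
    {"Reaction": "R3", "GPR_final": "(3419 or 57834)"},
]
print(extract_gene_ids(sample_records))
```

With the real data, the input would be `rxns_sheet.get_all_records()` instead of `sample_records`.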

In [48]:
gene_list

['100766828',
 '100774202',
 '100689033',
 '100772585',
 '100750667',
 '100752083',
 '100758278',
 '100754811',
 '100751551',
 '100763203',
 '100756509',
 '100765249',
 '100765571',
 '100766541',
 '100763756',
 '3419',
 '100765192',
 '100756695',
 '100771549',
 '100758424',
 '100758244',
 '100763881',
 '100752746',
 '100760123',
 '100774248',
 '100769225',
 '100760062',
 '100763714',
 '100764101',
 '100761469',
 '100526794',
 '100773614',
 '100763658',
 '100767405',
 '100774815',
 '100767908',
 '100753375',
 '100755870',
 '100772030',
 '100753811',
 '100750927',
 '100769732',
 '100682536',
 '57834',
 '100773620',
 '100773026',
 '100750893',
 '100765152',
 '100689188',
 '100756173',
 '100763432',
 '100751762',
 '100768417',
 '100774578',
 '100755982',
 '100759818',
 '100773187',
 '100751197',
 '100775004',
 '100768771',
 '100754978',
 '100689022',
 '100756632',
 '100766362',
 '100753200',
 '100764885',
 '100773198',
 '100768962',
 '100773095',
 '100766757',
 '100771717',
 '100751224',
 

In [49]:
# Fetch information from the NIH database
import time
from utils import get_gene_info

# Open the Genes worksheet
cho_temporary = sa.open('CHO Network Reconstruction')
genes_sheet = cho_temporary.worksheet('Genes')
df = pd.DataFrame(genes_sheet.get_all_records())
df = df.set_index('Index')

def write_cells(sheet_row, start_col, values):
    # Write values into consecutive cells of one row, pausing after each
    # request to stay under the Google Sheets API rate limit
    for offset, value in enumerate(values):
        genes_sheet.update_cell(sheet_row, start_col + offset, value)
        time.sleep(5)

# Columns 3-10 of the sheet follow the order returned by get_gene_info:
# gene_symbol, gene_name, gene_description, picr_ensembl_id,
# chok1gs_ensembl_id, mRNA_ncbi_id, protein_ncbi_id, go_terms
required_cols = ('Gene Symbol', 'Gene Name', 'PICR Ensembl ID', 'Transcript ID', 'Protein ID')

# Complete null or blank information in the already generated "Genes" sheet
known_ids = {str(x) for x in df['Gene Entrez ID']}
for i, row in df.iterrows():
    if row['Gene Entrez ID'] == '':
        # Fill the blank row with the first gene that is not yet in the sheet
        for gene in gene_list:
            if gene not in known_ids:
                print(i)
                write_cells(i + 1, 1, [i, gene, *get_gene_info(gene)])
                # Track the gene so the next blank row gets a different one
                known_ids.add(gene)
                break
    elif any(row[col] == '' for col in required_cols):
        print(i)
        write_cells(i + 1, 3, get_gene_info(row['Gene Entrez ID']))

# Add genes from gene_list that are not yet in the "Genes" sheet
for gene in gene_list:
    df = pd.DataFrame(genes_sheet.get_all_records())
    # The first try/except avoids overwriting data when the sheet already holds some information
    try:
        gene_sheet_list = [str(x) for x in df['Gene Entrez ID']]
        row_id = max(df['Index']) + 2
    except (KeyError, ValueError):
        # Empty sheet: start right below the header row
        gene_sheet_list = []
        row_id = 2
    if gene not in gene_sheet_list:
        try:
            gene_info = get_gene_info(gene)
            print(row_id - 1)
            write_cells(row_id, 1, [row_id - 1, gene, *gene_info])
        except Exception:
            # Any failure here is reported as a quota error, although other causes are possible
            print('Google API quota exceeded')
            time.sleep(5)
            continue
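
The final loop re-reads the whole sheet for every gene, and each read counts against the API quota. One possible alternative, sketched below under stated assumptions, reads the sheet once and works out the missing genes and their target rows up front. `plan_new_rows` is a hypothetical helper, not part of the notebook; it assumes row 1 holds the header and that existing records occupy consecutive rows starting at row 2.

```python
def plan_new_rows(existing_records, gene_list, id_column="Gene Entrez ID"):
    """Return (sheet_row, index, gene_id) for genes not yet in the sheet."""
    present = {str(r.get(id_column, "")) for r in existing_records}
    next_row = len(existing_records) + 2  # row 1 holds the header
    plan = []
    for gene in gene_list:
        if gene in present:
            continue
        plan.append((next_row, next_row - 1, gene))
        present.add(gene)  # guard against duplicates in gene_list
        next_row += 1
    return plan

# Hypothetical sheet content, for illustration only
existing = [{"Index": 1, "Gene Entrez ID": 3419}]
print(plan_new_rows(existing, ["3419", "57834", "100766828"]))
```

Each tuple can then be passed to whatever routine writes a row, so the sheet is read once rather than once per gene.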

Google API quota exceeded
1370
Google API quota exceeded
Google API quota exceeded


APIError: Error 502 (Server Error). The server encountered a temporary error and could not complete your request. Please try again in 30 seconds.
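
A 502 response like the one above is transient, so retrying after a pause is a common way to handle it. The sketch below is one possible approach, not the notebook's method: `with_retries` and `write_gene_row` are hypothetical helpers, and the row is sent in one request through gspread's `Worksheet.batch_update` instead of one `update_cell` call per cell, which should also reduce quota usage. Catching bare `Exception` is a simplification here; in practice it could be narrowed to `gspread.exceptions.APIError`.

```python
import time

def with_retries(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # assumption: narrow to gspread.exceptions.APIError in real use
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)

def write_gene_row(worksheet, row, values, **retry_kwargs):
    """Write one row with a single request instead of one call per cell."""
    end_col = chr(ord("A") + len(values) - 1)  # valid for up to 26 columns
    payload = [{"range": f"A{row}:{end_col}{row}", "values": [list(values)]}]
    return with_retries(lambda: worksheet.batch_update(payload), **retry_kwargs)
```

A call such as `write_gene_row(genes_sheet, 3, [2, gene, *get_gene_info(gene)])` would then write columns A through J of row 3 in a single request, waiting 1, 2, 4, and 8 seconds between failed attempts with the default settings.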
