# Final CHO Model
This notebook assesses the validity and completeness of our reconstruction.

[1. Generation of the dataset and model reconstruction](#generation) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.1 Retrieve reaction and metabolite information from the Google Sheet datasets**<br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.2 Build a model and populate it with the information from the generated DataFrames** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.3 Save and validate the model** <br>

[2. Identification of Blocked Reactions and Dead-End Metabolites](#blocked&deadends) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.1 Identification of Blocked Reactions**<br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.2 Identification of Dead-End Metabolites** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.3 Addition of Extracellular Exchange Reactions** <br>

[3. Generation of the Mass Flow Graph](#MFG) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**3.1 Generation of the "D-Matrix"**<br>
&nbsp;&nbsp;&nbsp;&nbsp;**3.2 Plotting the "D-Matrix", Normalized Flow Graph (NFG)** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**3.3 Generation of the "FluxOpenValue" matrix** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**3.4 Generation of the "Mass Flow Graph (MFG) Matrix"** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**3.5 Plotting the "MFG Matrix"** <br>

[4. Identification of duplicates through Chemical Formulas](#formulas) <br>

## 1. Generation of the dataset and model reconstruction <a id='generation'></a>
Here we generate the CHO model from the dataset stored in the Google Sheet file. We first use the google_sheet module to extract all the necessary information from the original dataset. Then we use these datasets and the COBRApy library to: (1) create a new model and add the reactions from the **Rxns Sheet**, (2) add information on each reaction from the **Rxns Sheet** and **Attributes Sheet**, (3) add boundary reactions from the **BoundaryRxns Sheet**, and (4) add information on each metabolite from the **Metabolites Sheet**. Finally, we save the model as an SBML file and validate it with the COBRApy function "validate_sbml_model( )".
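Steps (1) and (2) rely on COBRApy's `build_reaction_from_string( )`, which turns each **Reaction Formula** entry into stoichiometric coefficients and creates any metabolite ID the model has not seen yet (COBRApy logs a message for each one). The simplified parser below only illustrates that idea; it is not COBRApy's implementation, and it assumes formulas written as `2 a_c + b_c --> c_c`, with terms separated by `" + "` and `-->`, `<--` or `<=>` as the arrow.

```python
import re

# Simplified arrow set (assumption); COBRApy itself accepts more variants.
ARROWS = {"<=>": "reversible", "-->": "forward", "<--": "backward"}


def parse_formula(formula):
    """Return ({metabolite_id: coefficient}, direction) for a reaction string.

    Substrates get negative coefficients and products positive ones.
    """
    for arrow, direction in ARROWS.items():
        if arrow in formula:
            left, right = formula.split(arrow, 1)
            break
    else:
        raise ValueError(f"no arrow found in {formula!r}")

    stoich = {}
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in side.split(" + "):
            term = term.strip()
            if not term:
                continue  # empty side, e.g. an exchange reaction 'glc_D_e <=> '
            # Optional numeric coefficient followed by whitespace, then the ID.
            # IDs may start with digits (e.g. '10fthf_c'), so the space is required.
            match = re.fullmatch(r"(?:(\d+(?:\.\d+)?)\s+)?(\S+)", term)
            if match is None:
                raise ValueError(f"cannot parse term {term!r}")
            coeff = float(match.group(1) or 1.0)
            met = match.group(2)
            stoich[met] = stoich.get(met, 0.0) + sign * coeff
    return stoich, direction


example, example_direction = parse_formula("atp_c + h2o_c --> adp_c + h_c + pi_c")
```

Every metabolite key returned here that is not yet in the model is what COBRApy would create on the fly.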

In [1]:
# Import libraries
import pandas as pd
import numpy as np
from datetime import datetime

import cobra
from cobra import Model, Reaction, Metabolite
from cobra.io import validate_sbml_model, save_json_model, write_sbml_model

from tqdm.notebook import tqdm

from google_sheet import GoogleSheet

### 1.1 Retrieve reaction and metabolite information from the Google Sheet datasets

In [2]:
##### ----- Generate datasets from Google Sheet ----- #####

#Credential file
KEY_FILE_PATH = 'credentials.json'

#CHO Network Reconstruction + Recon3D_v3 Google Sheet ID
SPREADSHEET_ID = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID, KEY_FILE_PATH)

# Read data from the Google Sheet
sheet_met = 'Metabolites'
sheet_rxns = 'Rxns'
sheet_attributes = 'Attributes'
sheet_boundary = 'BoundaryRxns'

metabolites = sheet.read_google_sheet(sheet_met)
rxns = sheet.read_google_sheet(sheet_rxns)
rxns_attributes = sheet.read_google_sheet(sheet_attributes)
boundary_rxns = sheet.read_google_sheet(sheet_boundary)
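Cell [4] reads the bounds from the **Attributes Sheet** by row position, so row *i* of `rxns_attributes` must describe the same reaction as row *i* of `rxns`. A minimal sketch of a check for this, assuming (hypothetically) that both sheets carry the reaction ID in a column named `Reaction`:

```python
import pandas as pd


def check_sheet_alignment(rxns, attributes, id_col="Reaction"):
    """Return the row positions where the two sheets disagree on the reaction ID.

    `id_col` is an assumed column name; an empty list means the sheets are aligned.
    """
    if len(rxns) != len(attributes):
        raise ValueError(f"row count differs: {len(rxns)} vs {len(attributes)}")
    return [
        pos
        for pos, (a, b) in enumerate(zip(rxns[id_col], attributes[id_col]))
        if a != b
    ]


# Toy sheets: the second row is out of order in the attributes sheet
toy_rxns = pd.DataFrame({"Reaction": ["R1", "R2", "R3"]})
toy_attrs = pd.DataFrame({"Reaction": ["R1", "R9", "R3"],
                          "Lower bound": ["0", "0", "-1000"]})
mismatches = check_sheet_alignment(toy_rxns, toy_attrs)
```

On the real sheets, `check_sheet_alignment(rxns, rxns_attributes)` should return an empty list before cell [4] is run.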

### 1.2 Build a model and populate it with the information from the generated DataFrames

In [3]:
##### ----- Create a model and add reactions ----- #####
model = Model("iCHO")
lr = []
for _, row in rxns.iterrows():
    r = Reaction(row['Reaction'])
    lr.append(r)    
model.add_reactions(lr)

Set parameter Username
Academic license - for non-commercial use only - expires 2024-03-24


In [4]:
##### ----- Add information to each one of the reactions ----- #####
# Assumes row i of the Rxns and Attributes sheets describes model.reactions[i]
for i, r in enumerate(tqdm(model.reactions)):
    r.build_reaction_from_string(rxns['Reaction Formula'].iloc[i])
    r.name = rxns['Reaction Name'].iloc[i]
    r.subsystem = rxns['Subsystem'].iloc[i]
    r.gene_reaction_rule = str(rxns['GPR_final'].iloc[i])
    # Set both bounds together to avoid a transient lower bound > upper bound
    r.bounds = (float(rxns_attributes['Lower bound'].iloc[i]),
                float(rxns_attributes['Upper bound'].iloc[i]))

model

Malformed gene_reaction_rule '(__cobra_escape__100766805 and __cobra_escape__3979182 and __cobra_escape__100768142 and __cobra_escape__100774773 and __cobra_escape__100750822 and __cobra_escape__100766032 and __cobra_escape__100759002 and __cobra_escape__100763175 and __cobra_escape__100764001 and __cobra_escape__100766809 and __cobra_escape__100768457 and __cobra_escape__100770670 and __cobra_escape__100757101 and __cobra_escape__103161269 and __cobra_escape__100755207 and __cobra_escape__100765757 and __cobra_escape__100762701) or (__cobra_escape__100766805 and __cobra_escape__100774773 and __cobra_escape__100768142 and __cobra_escape__3979182 and __cobra_escape__100750822 and __cobra_escape__100759002 and __cobra_escape__100766032 and __cobra_escape__100763175 and __cobra_escape__100764001 and __cobra_escape__100766809 __cobra_escape__100768457 and __cobra_escape__100770670 and __cobra_escape__100757101 and __cobra_escape__103161269 and __cobra_escape__100755207 and __cobra_escape__

GPR will be empty


Malformed gene_reaction_rule '__cobra_escape__100750552 and __cobra_escape__100756663 __cobra_escape__100756951 __cobra_escape__100766960' for 100750552 and 100756663 100756951 100766960
Traceback (most recent call last):
  File "/Users/pablodigiusto/miniforge3/envs/mncho2.0/lib/python3.10/site-packages/cobra/core/gene.py", line 350, in from_string
    tree = ast_parse(escaped_str, "<string>", "eval")
  File "/Users/pablodigiusto/miniforge3/envs/mncho2.0/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<string>", line 1
    __cobra_escape__100750552 and __cobra_escape__100756663 __cobra_escape__100756951 __cobra_escape__100766960
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pablodigiusto/miniforge3/envs/mncho2.0/lib/python3.10/site-packages/cobra/cor

Malformed gene_reaction_rule '__cobra_escape__100756663 __cobra_escape__100756951 __cobra_escape__100766960 or __cobra_escape__100773287 or __cobra_escape__100750552' for 100756663 100756951 100766960 or 100773287 or 100750552
Traceback (most recent call last):
  File "/Users/pablodigiusto/miniforge3/envs/mncho2.0/lib/python3.10/site-packages/cobra/core/gene.py", line 350, in from_string
    tree = ast_parse(escaped_str, "<string>", "eval")
  File "/Users/pablodigiusto/miniforge3/envs/mncho2.0/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<string>", line 1
    __cobra_escape__100756663 __cobra_escape__100756951 __cobra_escape__100766960 or __cobra_escape__100773287 or __cobra_escape__100750552
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pablodigiusto/miniforge3/envs/mncho2.0/

| Name | iCHO |
|---|---|
| Memory address | 1591d11b0 |
| Number of metabolites | 7754 |
| Number of reactions | 10511 |
| Number of genes | 3103 |
| Number of groups | 0 |
| Objective expression | 0 |
| Compartments | |
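The malformed `gene_reaction_rule` warnings from cell [4] come from GPR strings with a missing operator (e.g. `100750552 and 100756663 100756951 100766960`). The tracebacks show that COBRApy escapes the gene IDs and parses the rule with Python's `ast` module, and that the GPR is left empty when parsing fails. A rough pre-check on the sheet in the same spirit is sketched below; it is not COBRApy's code, and it simply treats every token other than `and`, `or` and parentheses as a gene ID.

```python
import ast
import re


def _escape(match):
    """Keep boolean operators, replace any other token with a placeholder name."""
    token = match.group(0)
    return token.lower() if token.lower() in ("and", "or") else "g"


def malformed_gprs(rules):
    """Return {reaction_id: rule} for GPR strings that do not parse as boolean expressions."""
    bad = {}
    for rxn_id, rule in rules.items():
        if not rule or not rule.strip():
            continue  # an empty GPR is allowed
        escaped = re.sub(r"[^\s()]+", _escape, rule)
        try:
            ast.parse(escaped, mode="eval")
        except SyntaxError:
            bad[rxn_id] = rule
    return bad


# Toy rules; R2 and R4 reuse the malformed strings reported in the warnings
toy_rules = {
    "R1": "100750552 and 100756663",
    "R2": "100750552 and 100756663 100756951 100766960",
    "R3": "(100766805 and 3979182) or 100773287",
    "R4": "100756663 100756951 100766960 or 100773287 or 100750552",
    "R5": "",
}
flagged = malformed_gprs(toy_rules)
```

Applied to the sheet, something like `malformed_gprs(dict(zip(rxns['Reaction'], rxns['GPR_final'].astype(str))))` would list the rows to fix in the **Rxns Sheet** before the model is rebuilt.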


In [5]:
##### ----- Add Boundary Reactions ----- #####
dr = [Reaction(rxn_id) for rxn_id in boundary_rxns['Reaction']]
model.add_reactions(dr)

# Index the BoundaryRxns sheet by reaction ID to look up each reaction's attributes
boundary_rxns_dict = boundary_rxns.set_index('Reaction').to_dict()

# Only the newly added boundary reactions need to be filled in
for r in tqdm(dr):
    r.build_reaction_from_string(boundary_rxns_dict['Reaction Formula'][r.id])
    r.name = boundary_rxns_dict['Reaction Name'][r.id]
    r.subsystem = boundary_rxns_dict['Subsystem'][r.id]
    # Set both bounds together to avoid a transient lower bound > upper bound
    r.bounds = (float(boundary_rxns_dict['Lower bound'][r.id]),
                float(boundary_rxns_dict['Upper bound'][r.id]))
model

| Name | iCHO |
|---|---|
| Memory address | 1591d11b0 |
| Number of metabolites | 7754 |
| Number of reactions | 11495 |
| Number of genes | 3103 |
| Number of groups | 0 |
| Objective expression | 0 |
| Compartments | |
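The bounds read from the **BoundaryRxns Sheet** decide which boundary reactions can import or export material. A minimal sketch that labels each row, assuming the usual COBRA exchange convention `met_e <=>` (metabolite on the left), where negative flux means uptake and positive flux means secretion; reactions written the other way round would have the labels swapped:

```python
import pandas as pd


def classify_boundary(df):
    """Label each boundary reaction as 'uptake', 'secretion', 'both' or 'closed'.

    Assumes the exchange is written as 'met_e <=>' so that a negative lower
    bound allows uptake and a positive upper bound allows secretion.
    """
    labels = []
    for low, up in zip(df["Lower bound"].astype(float), df["Upper bound"].astype(float)):
        if low < 0 and up > 0:
            labels.append("both")
        elif low < 0:
            labels.append("uptake")
        elif up > 0:
            labels.append("secretion")
        else:
            labels.append("closed")
    return pd.Series(labels, index=df["Reaction"].tolist(), name="direction")


# Toy rows; bounds are strings, as they may be when read from a sheet
toy_boundary = pd.DataFrame({
    "Reaction": ["EX_glc", "EX_lac", "EX_o2", "EX_x"],
    "Lower bound": ["-10", "0", "-1000", "0"],
    "Upper bound": ["1000", "1000", "0", "0"],
})
directions = classify_boundary(toy_boundary)
```

`classify_boundary(boundary_rxns).value_counts()` would then summarize how open the medium is.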


In [6]:
##### ----- Add information for each metabolite ----- #####
metabolites_dict = metabolites.set_index('BiGG ID').to_dict('dict')
for met in model.metabolites:
    met.name = metabolites_dict['Name'][met.id]
    met.formula = metabolites_dict['Formula'][met.id]
    # Keep only the compartment code (the part before ' - ')
    met.compartment = metabolites_dict['Compartment'][met.id].split(' - ')[0]
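The loop above raises a `KeyError` as soon as a metabolite created from a reaction formula has no row in the **Metabolites Sheet** (for example, because of a typo in a formula). A small sketch for listing such IDs beforehand; the compartment helper assumes labels shaped like `c - cytosol`, which is what the `.split(' - ')[0]` call suggests but is not confirmed by the sheet itself:

```python
def missing_metabolites(model_ids, sheet_ids):
    """Return the model metabolite IDs that have no row in the sheet, sorted."""
    return sorted(set(model_ids) - set(sheet_ids))


def compartment_code(label):
    """Return the compartment code from a label assumed to look like 'c - cytosol'."""
    return label.split(" - ")[0].strip()


# Toy IDs: 'xyz_c' appears in a reaction formula but not in the sheet
missing = missing_metabolites(["atp_c", "h2o_c", "xyz_c"], ["atp_c", "h2o_c", "adp_c"])
```

In the notebook this would be called as `missing_metabolites([m.id for m in model.metabolites], metabolites['BiGG ID'])`.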
    

### 1.3 Save and validate the model

In [7]:
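##### ----- Build the S matrix and test the biomass objective ----- #####
S = cobra.util.create_stoichiometric_matrix(model, array_type='dense')
model.S = S  # keep S on the model object for later use

# Optimize for CHO biomass as a quick functional check
model.objective = 'biomass_cho'
model.optimize()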

| | fluxes | reduced_costs |
|---|---|---|
| 10FTHF5GLUtl | 0.0 | 0.0 |
| 10FTHF5GLUtm | 0.0 | 0.0 |
| 10FTHF6GLUtl | 0.0 | 0.0 |
| 10FTHF6GLUtm | 0.0 | 0.0 |
| 10FTHF7GLUtl | 0.0 | 0.0 |
| ... | ... | ... |
| EX_c8dc | 0.0 | 0.0 |
| EX_docosac | -750.0 | 0.0 |
| EX_ivcrn | 0.0 | 0.0 |
| EX_tdcrn | 0.0 | 0.0 |
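`model.optimize()` solves the flux balance analysis problem: maximize $c^T v$ subject to $S\,v = 0$ and $lb \le v \le ub$, where $S$ has one row per metabolite and one column per reaction. The toy example below (hypothetical IDs, not taken from iCHO) builds such a matrix for a three-reaction chain and checks the steady-state condition for two flux vectors:

```python
import numpy as np


def stoichiometric_matrix(reactions, metabolites):
    """Build S (metabolites x reactions) from {rxn_id: {met_id: coeff}}."""
    row = {met: i for i, met in enumerate(metabolites)}
    S = np.zeros((len(metabolites), len(reactions)))
    for j, coeffs in enumerate(reactions.values()):
        for met, coeff in coeffs.items():
            S[row[met], j] = coeff
    return S


# Toy chain: supply of A, conversion A -> B, removal of B
chain = {
    "IN_A": {"A": 1.0},
    "R1": {"A": -1.0, "B": 1.0},
    "OUT_B": {"B": -1.0},
}
S_chain = stoichiometric_matrix(chain, ["A", "B"])

v_steady = np.array([2.0, 2.0, 2.0])  # same flux through every step
v_unbalanced = np.array([2.0, 1.0, 1.0])  # A is supplied faster than it is used

residual_steady = S_chain @ v_steady  # zero vector: no metabolite accumulates
residual_unbalanced = S_chain @ v_unbalanced  # nonzero entry for A
```

Only flux vectors like `v_steady` are feasible for the solver; `v_unbalanced` violates $S\,v = 0$ because A accumulates.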


In [8]:
##### ----- Save the model ----- #####
model_name_xml = 'iCHOv3_CHO_'+datetime.now().strftime("%d%m%Y")+'.xml' 
write_sbml_model(model, model_name_xml)

# Also save as JSON, because the SBML file does not store the subsystems
model_name_json = 'iCHOv3_CHO_'+datetime.now().strftime("%d%m%Y")+'.json' 
save_json_model(model, model_name_json)

In [9]:
##### ----- Test for errors in the reconstruction ----- ######

# import tempfile
# from pprint import pprint
# from cobra.io import write_sbml_model, validate_sbml_model
# with tempfile.NamedTemporaryFile(suffix='.xml') as f_sbml:
#     write_sbml_model(model, filename=f_sbml.name)
#     report = validate_sbml_model(filename=f_sbml.name)
# pprint(report)

from cobra.io import read_sbml_model, validate_sbml_model
(_, errors) = validate_sbml_model(model_name_xml)
errors

{'SBML_FATAL': [],
 'SBML_ERROR': [],
 'SBML_SCHEMA_ERROR': [],
 'COBRA_FATAL': [],
 'COBRA_ERROR': [],
 'COBRA_CHECK': []}

## 2. Identification of Blocked Reactions and Dead-End Metabolites <a id='blocked&deadends'></a>
In this second part of the notebook we use two different functions from the utils module to: (1) run a flux variability analysis and identify blocked reactions, and (2) identify dead-end metabolites. Finally, we add extracellular exchange reactions for the dead-end metabolites in the extracellular compartment.

In [9]:
import pandas as pd
from cobra.io import read_sbml_model
from cobra.flux_analysis import find_blocked_reactions
from utils import detect_dead_ends

In [None]:
##### ----- Read Model ----- #####
model = read_sbml_model("iCHOv3_CHO_27072023.xml")

### 2.1 Identification of Blocked Reactions
The runMinMax_Single( ) function from the utils module runs a flux variability analysis (FVA) to find the range of flux each reaction can carry at the optimum. The results, listing every reaction ID with its minimum and maximum flux, are stored in the "FVA_Results.txt" file. Reactions that carry no flux (**Blocked Reactions**) are written to a separate file, "Blocked_Reactions.txt".
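As an illustration of what this step computes, here is a minimal sketch that does not use `utils.runMinMax_Single` (its implementation is not shown in this notebook). Assuming a small made-up network and SciPy's `linprog` in place of cobrapy, it minimises and then maximises each flux subject to steady state ($\mathbf{S}\mathbf{v} = 0$) and the flux bounds, with no objective constraint. The hypothetical `find_blocked` helper flags a reaction when both extremes are zero within a tolerance rather than exactly zero, since LP solvers return floating-point values.

```python
import numpy as np
from scipy.optimize import linprog


def toy_fva(S, lb, ub):
    """Return a (min, max) flux pair per reaction subject to S v = 0 and lb <= v <= ub.

    Hypothetical helper for illustration; it is not utils.runMinMax_Single.
    """
    S = np.asarray(S, dtype=float)
    n_mets, n_rxns = S.shape
    bounds = list(zip(lb, ub))
    b_eq = np.zeros(n_mets)
    ranges = []
    for j in range(n_rxns):
        c = np.zeros(n_rxns)
        c[j] = 1.0
        low = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")    # minimise v_j
        high = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")  # maximise v_j
        ranges.append((low.fun, -high.fun))
    return ranges


def find_blocked(ranges, tol=1e-9):
    """Indices of reactions whose min and max flux are both zero within tol (assumed tolerance)."""
    return [j for j, (vmin, vmax) in enumerate(ranges) if abs(vmin) < tol and abs(vmax) < tol]


# Made-up network: metabolites A, B, C (rows); reactions R0..R3 (columns)
#   R0: -> A (uptake, at most 10)   R1: A -> B   R2: B ->   R3: A -> C  (C is never consumed)
S = [[1, -1, 0, -1],
     [0, 1, -1, 0],
     [0, 0, 0, 1]]
lb = [0, 0, 0, 0]
ub = [10, 1000, 1000, 1000]

ranges = toy_fva(S, lb, ub)
print(find_blocked(ranges))  # [3]: R3 is blocked because C could only accumulate
```

For a real cobrapy model, `cobra.flux_analysis.flux_variability_analysis` with `fraction_of_optimum=0.0` and `find_blocked_reactions` (used in the next cells) cover the same ground.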

In [None]:
##### ----- FVA ----- #####
import utils

model.solver = 'gurobi'


for rxn_exchange in model.exchanges:
    rxn_exchange.bounds = (-1000, 1000)
minmax = utils.runMinMax_Single(model, end_rxn_index=None)

##### Print Reactions with the min and max fluxes #####
with open('FVA_Results.txt', 'w') as f:
    for i, j in enumerate(minmax):
        print(model.reactions[i].id, "Min: ",j[0], "Max: ", j[1], file=f)
        
##### Print Blocked Reactions (min and max flux both zero, within an assumed tolerance of 1e-9) #####
with open('Blocked_Reactions.txt', 'w') as f:
    for i, j in enumerate(minmax):
        if abs(j[0]) < 1e-9 and abs(j[1]) < 1e-9:
            print(model.reactions[i].id, "Min: ",j[0], "Max: ", j[1], file=f)

In [10]:
##### ----- Blocked Reactions ----- #####
model.solver = 'gurobi'
blocked_reactions = find_blocked_reactions(model)

print(blocked_reactions)

Set parameter Username
Academic license - for non-commercial use only - expires 2024-03-24
Read LP format model from file /var/folders/_x/tfg8s2ks4n1ftkkwzp5sqjpc0000gn/T/tmpkzqp4p35.lp
Reading time = 0.06 seconds
: 7755 rows, 22991 columns, 98373 nonzeros

In [11]:
##### ----- Print the number and % of blocked reactions ----- #####
print('##### ----- Blocked Reactions ----- #####')
print(f'The model has {len(model.reactions)} total reactions')
print(f'The model has {len(blocked_reactions)} ({round(len(blocked_reactions)/len(model.reactions)*100)}%) blocked reactions')

##### ----- Blocked Reactions ----- #####
The model has 11495 total reactions
The model has 2791 (24%) blocked reactions


### 2.2 Identification of Dead-End Metabolites
The detect_dead_ends( ) function from the utils module returns a list flagging all the **dead-end** metabolites in our model. A dead-end metabolite is one that, in a given metabolic network, is only consumed and never produced, or only produced and never consumed. The results are stored in the "Dead_ends.txt" file.
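The definition above can be checked directly on the stoichiometric matrix. The sketch below is not the implementation of `utils.detect_dead_ends` (which is not shown here); it is a minimal NumPy version under the assumption that a reaction can run forward when its upper bound is positive and in reverse when its lower bound is negative, so a reversible reaction counts as both producing and consuming its metabolites. The network and the `find_dead_ends` helper are hypothetical.

```python
import numpy as np


def find_dead_ends(S, lb, ub):
    """Boolean mask of metabolites (rows of S) that cannot be both produced and consumed.

    Hypothetical helper; a reaction can run forward if ub > 0 and in reverse if lb < 0.
    """
    S = np.asarray(S, dtype=float)
    fwd = np.asarray(ub, dtype=float) > 0   # reaction can run forward
    rev = np.asarray(lb, dtype=float) < 0   # reaction can run in reverse
    # Broadcasting applies each reaction's direction flags to its column of S
    produced = ((S > 0) & fwd) | ((S < 0) & rev)
    consumed = ((S < 0) & fwd) | ((S > 0) & rev)
    return ~(produced.any(axis=1) & consumed.any(axis=1))


# Made-up network: metabolites A, B, C (rows); reactions R0..R3 (columns)
#   R0: -> A   R1: A -> B   R2: B ->   R3: A -> C  (the only reaction touching C)
S = [[1, -1, 0, -1],
     [0, 1, -1, 0],
     [0, 0, 0, 1]]
print(find_dead_ends(S, lb=[0, 0, 0, 0], ub=[10, 1000, 1000, 1000]).tolist())
# [False, False, True]
```

This purely structural test has a blind spot: if R3 were reversible (lower bound -1000), C would no longer be flagged even though it still cannot carry a steady-state flux. Such cases are caught by the blocked-reaction analysis of section 2.1 instead.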

In [12]:
##### ----- Detect Dead-Ends ----- #####
model.solver = 'gurobi'  # replace 'gurobi' with 'glpk', the cobrapy default solver, if Gurobi is not available
dead_ends = detect_dead_ends(model)


with open('Dead_ends.txt', 'w') as f:
    for i,j in enumerate(dead_ends):
        if j:
            print(model.metabolites[i], file=f)
            
rxn_c = 0
met_c = 0
for i,j in enumerate(dead_ends):
    if j:
        met_c += 1
        print("\n", model.metabolites[i],": ")
        for met_rxn in model.metabolites[i].reactions:
            rxn_c += 1
            print(met_rxn)
            print(rxn_c)

print(f'Percentage of Dead-End metabolites is {round(met_c/len(model.metabolites)*100)}%' )


 phsphings_g : 
ACER23g: h2o_g + phcrm_hs_g --> Rtotal_g + phsphings_g
1

 phcrm_hs_r : 
ACER31r: h2o_r + phcrm_hs_r --> Rtotal_r + phsphings_r
2

 phsphings_r : 
ACER31r: h2o_r + phcrm_hs_r --> Rtotal_r + phsphings_r
3

 dca_x : 
ACOATE100p: dcacoa_x + h2o_x --> coa_x + dca_x + h_x
4

 ddca_m : 
ACOATE120m: ddcacoa_m + h2o_m --> coa_m + ddca_m + h_m
5

 ddca_x : 
ACOATE120p: ddcacoa_x + h2o_x --> coa_x + ddca_x + h_x
6

 ttdca_m : 
ACOATE140m: h2o_m + tdcoa_m --> coa_m + h_m + ttdca_m
7

 hdca_m : 
ACOATE160m: h2o_m + pmtcoa_m --> coa_m + h_m + hdca_m
8

 vacc_m : 
ACOATE1811m: h2o_m + octd11ecoa_m --> coa_m + h_m + vacc_m
9

 elaid_m : 
ACOATE1812m: h2o_m + od2coa_m --> coa_m + elaid_m + h_m
10

 lnlncg_m : 
ACOATE1831m: h2o_m + lnlncgcoa_m --> coa_m + h_m + lnlncg_m
11

 lnlnca_m : 
ACOATE1832m: h2o_m + lnlncacoa_m --> coa_m + h_m + lnlnca_m
12

 strdnc_m : 
ACOATE184m: h2o_m + strdnccoa_m --> coa_m + h_m + strdnc_m
13

 eicostri_c : 
ACOATE203: eicostricoa_c + h2o_c --> coa_c + ei

### 2.3 Addition of Extracellular Exchange Reactions
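The following cell adds an **EXTRACELLULAR EXCHANGE** reaction for each dead-end metabolite in the extracellular compartment (IDs ending in `_e`) from the list generated above. If this introduces no duplicated reactions, the updated table is written back to the **BoundaryRxns** Google Sheet.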
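A possible refactor of the row-building loop is sketched below, assuming the dead-end metabolite IDs and names have already been collected into two lists. The hypothetical `exchange_rows` helper builds all the new rows at once, with the same column names and values as the cell below, so that `boundary_rxns` needs a single `pd.concat` instead of one per metabolite. The metabolite IDs and names in the example are made up.

```python
import pandas as pd


def exchange_rows(met_ids, met_names, start_index):
    """Build BoundaryRxns-style rows for the metabolites whose ID ends in '_e'.

    Hypothetical helper; column names and values are copied from the notebook cell.
    """
    rows = [
        {'Curated': '', 'Reaction': f'EX_{met_id}', 'Reaction Name': f'Exchange of {name}',
         'Reaction Formula': f'{met_id} <=>', 'Subsystem': 'EXTRACELLULAR EXCHANGE',
         'Reversible': 1, 'Lower bound': -1000, 'Upper bound': 1000, 'Objective': 0}
        for met_id, name in zip(met_ids, met_names)
        if met_id.endswith('_e')  # only extracellular metabolites get an exchange reaction
    ]
    # Continue the existing integer index, as the original cell does with len(boundary_rxns)
    return pd.DataFrame(rows, index=range(start_index, start_index + len(rows)))


# Made-up dead-end metabolites
new_rows = exchange_rows(['glc_e', 'pyr_c', 'lac_e'], ['Glucose', 'Pyruvate', 'Lactate'], start_index=10)
print(new_rows[['Reaction', 'Reaction Formula']])
```

In the notebook this would be used as `boundary_rxns = pd.concat([boundary_rxns, exchange_rows(ids, names, len(boundary_rxns))])`, followed by the same duplicate check before updating the sheet.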

In [None]:
##### ----- Automatically add EXTRACELLULAR EXCHANGE reactions to the "BoundaryRxns" Sheet ----- #####
added_exchange = False
for i,j in enumerate(dead_ends):
    if j:
        if str(model.metabolites[i]).endswith('_e'):
            new_row_data = {'Curated': '', 'Reaction': 'EX_'+str(model.metabolites[i]), 'Reaction Name': 'Exchange of '+model.metabolites[i].name, 'Reaction Formula': str(model.metabolites[i])+' <=>', 'Subsystem': 'EXTRACELLULAR EXCHANGE',
                                    'Reversible': 1, 'Lower bound': -1000, 'Upper bound': 1000, 'Objective': 0}
            new_row_df = pd.DataFrame(new_row_data, index=[len(boundary_rxns)])
            boundary_rxns = pd.concat([boundary_rxns, new_row_df])
            added_exchange = True

#Check for duplicated reactions added to the boundary_rxns dataset, IF NOT: update the google sheet file
if added_exchange:
    if not boundary_rxns['Reaction'].duplicated().any() and not boundary_rxns['Reaction Formula'].duplicated().any():
        sheet.update_google_sheet(sheet_boundary, boundary_rxns)
        print("BoundaryRxns Google Sheet updated.")
    else:
        print('Duplicated values found in the dataset')

### 2.4 Gapfill for blocked reactions
Cobrapy has a gap-filling implementation very similar to that of Reed et al.: a mixed-integer linear program finds the smallest number of reactions, drawn from a user-defined collection of reactions (i.e. a universal model), that must be added so that the model can carry flux through its objective.
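The mixed-integer program itself is beyond a short example, but the quantity it optimises can be illustrated by brute force. The sketch below does not use cobrapy: assuming a made-up model and universal reaction set and using SciPy's `linprog`, it tries candidate subsets in order of increasing size and returns the first one that lets a target reaction carry at least `min_flux` (playing the role of the `lower_bound` argument of cobrapy's `gapfill`). It is exponential in the size of the universal set, so it is only meant for toy networks; all helper names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog


def max_flux(S, lb, ub, target):
    """Maximum steady-state flux through reaction `target` (0.0 if the LP fails)."""
    c = np.zeros(S.shape[1])
    c[target] = -1.0  # linprog minimises, so minimise -v_target
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=list(zip(lb, ub)), method="highs")
    return -res.fun if res.status == 0 else 0.0


def brute_force_gapfill(S_model, lb, ub, S_univ, lb_univ, ub_univ, target, min_flux=0.05):
    """Smallest subset of universal reactions (columns of S_univ) that lets `target` carry
    at least `min_flux`. Hypothetical helper: exhaustive search, toy networks only."""
    S_model = np.asarray(S_model, dtype=float)
    S_univ = np.asarray(S_univ, dtype=float)
    for k in range(S_univ.shape[1] + 1):                  # subsets of increasing size
        for subset in combinations(range(S_univ.shape[1]), k):
            cols = np.array(subset, dtype=int)
            S = np.hstack([S_model, S_univ[:, cols]])
            lbs = list(lb) + [lb_univ[c] for c in cols]
            ubs = list(ub) + [ub_univ[c] for c in cols]
            if max_flux(S, lbs, ubs, target) >= min_flux:
                return list(subset)
    return None  # not even the full universal set is enough


# Made-up model (rows A, B, C): R_up: -> A, R_target: C ->  (blocked: nothing makes C)
S_model = [[1, 0],
           [0, 0],
           [0, -1]]
# Made-up universal set: U0: A -> B, U1: B -> C, U2: A -> C, U3: B ->
S_univ = [[-1, 0, -1, 0],
          [1, -1, 0, -1],
          [0, 1, 1, 0]]
print(brute_force_gapfill(S_model, [0, 0], [10, 1000], S_univ, [0] * 4, [1000] * 4, target=1))
# [2]: adding U2 alone is enough, so the two-step route U0 + U1 is never returned
```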

In [None]:
import cobra
from cobra.flux_analysis import gapfill

#recon_3d = read_sbml_model("../Data/GPR_curation/Recon3D.xml")
#iCHO2291 = read_sbml_model("../Data/Reconciliation/models/iCHO2291.xml")
#universal = recon_3d.merge(iCHO2291)

In [None]:
# Requires `iCHO2291` from the commented-out cell above and `blocked_reactions` from section 2.1
for blocked_reaction in blocked_reactions:
    with model:  # restore the original objective after each iteration
        model.objective = blocked_reaction
        try:
            solution = gapfill(model, iCHO2291, demand_reactions=True)
            print(blocked_reaction)
            print(solution)
        except Exception as e:
            print(f'Gapfill failed for {blocked_reaction}: {str(e)}')

### Test CHO - Recon GEM

In [None]:
universal

In [None]:
# iCHO_recon3d
# from cobra.io import read_sbml_model
# read_sbml_model(".xml")

model_EX = [i for i, rxn in enumerate(model.reactions) if 'EX_' in rxn.id]
model_SK = [i for i, rxn in enumerate(model.reactions) if 'SK_' in rxn.id]
model_DM = [i for i, rxn in enumerate(model.reactions) if 'DM_' in rxn.id]
for i in model_EX:
    model.reactions[i].bounds = -1000, 1000

for i in model_SK:
    model.reactions[i].bounds = -1000, 1000

for i in model_DM:
    model.reactions[i].bounds = 0, 1000
    

In [None]:
model.objective = "biomass" # 
sol1 = model.optimize()
print(sol1.objective_value)

model.objective = "biomass_prod" # 
sol2 = model.optimize()
print(sol2.objective_value)

model.objective = "biomass_producing" # 
sol3 = model.optimize()
print(sol3.objective_value)


In [None]:
##### ----- Test model KOs ----- #####
# Knock out one reaction at a time; leaving the `with model:` block restores its original bounds
for reaction in model.reactions:
    with model:
        reaction.knock_out()
        model.optimize()
        print('%s blocked (bounds: %s), new growth rate %f' %
              (reaction.id, str(reaction.bounds), model.objective.value))

## 3. Generation of the Mass Flow Graph <a id='MFG'></a>
Based on the publication **_Flux-dependent graphs for metabolic networks_** by _Beguerisse-Diaz et al. (2018)_ (https://www.nature.com/articles/s41540-018-0067-y). Here we use our model to build the **D Matrix** and plot the corresponding graph, then the **M Matrix** and plot the corresponding graph, and finally we generate the **PageRank** file with all the reactions in our reconstruction sorted by importance.

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import networkx as nx
from scipy.linalg import pinv

import time

from skimage import draw
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from collections import Counter

import cobra
from cobra.io import read_sbml_model

### 3.1 Generation of the "D-Matrix"
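The "D-Matrix" defines the weight of the edge between reaction nodes $R_i$ and $R_j$ as the probability that a metabolite chosen at random is produced by $R_i$ (reaction i) and consumed by $R_j$ (reaction j). Summing over all metabolites and normalizing, we obtain the edge weights of the adjacency matrix of the Normalized Flow Graph (NFG):

$$\mathbf{D} = \frac{1}{n}\left(\mathbf{W}_{+}\mathbf{S}_{2m}^{+}\right)^{T}\left(\mathbf{W}_{-}\mathbf{S}_{2m}^{-}\right), \qquad \mathbf{W}_{\pm} = \operatorname{diag}\left(\mathbf{S}_{2m}^{\pm}\mathbf{1}_{2m}\right)^{\dagger},$$

where $n$ is the number of metabolites, $\mathbf{S}_{2m}^{+}$ and $\mathbf{S}_{2m}^{-}$ are the production and consumption parts of the unfolded stoichiometric matrix of the $2m$ forward and reverse reactions, $\mathbf{1}_{2m}$ is a vector of ones, and $\dagger$ denotes the Moore-Penrose pseudoinverse.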
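A minimal NumPy sketch of the D-Matrix construction on a made-up three-reaction chain ($\varnothing \to A \to B \to \varnothing$, all irreversible) may help check intuition. It mirrors the next cell, except that the pseudoinverse of each diagonal weight matrix is computed as the elementwise reciprocal of its nonzero entries, which gives the same result without inverting a dense matrix. The `d_matrix` helper is hypothetical. Because every metabolite in the chain is both produced and consumed, the entries of the resulting matrix sum to 1.

```python
import numpy as np


def d_matrix(S, r):
    """Normalized Flow Graph adjacency matrix for S (n metabolites x m reactions).

    r is the reversibility vector (1 = reversible, 0 = irreversible). Hypothetical helper.
    """
    S = np.asarray(S, dtype=float)
    n, m = S.shape
    r = np.asarray(r, dtype=float)
    S2m = np.hstack([S, -S * r])          # forward block and reverse block (zero for irreversible)
    S_plus = (np.abs(S2m) + S2m) / 2      # production stoichiometry
    S_minus = (np.abs(S2m) - S2m) / 2     # consumption stoichiometry
    prod = S_plus.sum(axis=1)
    cons = S_minus.sum(axis=1)
    # Pseudoinverse of a diagonal matrix: reciprocal of the nonzero diagonal entries
    w_plus = np.divide(1.0, prod, out=np.zeros_like(prod), where=prod != 0)
    w_minus = np.divide(1.0, cons, out=np.zeros_like(cons), where=cons != 0)
    return (w_plus[:, None] * S_plus).T @ (w_minus[:, None] * S_minus) / n


# Made-up chain: R0: -> A, R1: A -> B, R2: B ->  (all irreversible)
S = [[1, -1, 0],
     [0, 1, -1]]
D = d_matrix(S, r=[0, 0, 0])
print(D.shape)             # (6, 6): a forward and a reverse copy of each reaction
print(D[0, 1], D[1, 2])    # 0.5 0.5
print(D.sum())             # 1.0
```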

In [None]:
# Load the model
model = read_sbml_model('iCHOv3_CHO_06062023.xml')

# Extract stoichiometric matrix, m=reactions, n=metabolites
start = time.time()
S = cobra.util.array.create_stoichiometric_matrix(model)
n, m = S.shape
end = time.time()
print(f"Time taken to generate S, n and m: {end - start} seconds")

# Create S2m matrix
start = time.time()
r = np.array([reaction.reversibility for reaction in model.reactions]) # m-dimensional reversibility vector with components rj = 1 if reaction Rj is reversible and rj = 0 if it is irreversible.
Im = np.eye(m) # m × m identity matrix
diagR = r * Im # m × m matrix contains r in its main diagonal
S2m2 = np.block([[Im, np.zeros((m, m))], [np.zeros((m, m)), np.diag(r)]])
S2m1 = np.hstack((S, -S))
S2m = S2m1 @ S2m2 # unfolded version of the stoichiometric matrix of the 2m forward and reverse reactions.
end = time.time()
print(f"Time taken to generate S2m: {end - start} seconds")

# Create S2mplus and S2mminus matrices
start = time.time()
S2mplus = (np.abs(S2m) + S2m) / 2 # production stoichiometric matrix
S2mminus = (np.abs(S2m) - S2m) / 2 # consumption stoichiometric matrix
end = time.time()
print(f"Time taken to generate S2mplus and S2minus: {end - start} seconds")

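# Calculate weights: W± = pinv(diag(S2m± · 1)). The pseudoinverse of a diagonal matrix is the
# elementwise reciprocal of its nonzero entries, so keep the diagonals as vectors instead of
# inverting two dense n × n matrices
start = time.time()
prod_totals = S2mplus.sum(axis=1)   # total stoichiometric production of each metabolite
cons_totals = S2mminus.sum(axis=1)  # total stoichiometric consumption of each metabolite
Wplus = np.divide(1.0, prod_totals, out=np.zeros_like(prod_totals), where=prod_totals != 0)
Wminus = np.divide(1.0, cons_totals, out=np.zeros_like(cons_totals), where=cons_totals != 0)
end = time.time()
print(f"Time taken to generate Wplus and Wminus: {end - start} seconds")

# Calculate D matrix (scaling the rows of S2m± by W± equals left-multiplying by diag(W±))
start = time.time()
D = 1/n * (Wplus[:, None] * S2mplus).T @ (Wminus[:, None] * S2mminus)
end = time.time()
print(f"Time taken to generate the D-Matrix: {end - start} seconds")

# Remove the reverse copies (indices m..2m-1) with no incoming edges; for irreversible reactions
# these columns of S2m are all zero. Use `>= m` with 0-based indexing: index m is the reverse
# copy of the first reaction
start = time.time()
IDr = np.nonzero(np.sum(D, axis=0) == 0)[0]
IDr = IDr[IDr >= m]

D = np.delete(D, IDr, axis=0)
D = np.delete(D, IDr, axis=1)
end = time.time()
print(f"Time taken to remove unused reactions from the D-Matrix: {end - start} seconds")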

### 3.2 Plotting the "D-Matrix", Normalized Flow Graph (NFG)

In [None]:
# Convert D matrix into a graph
G = nx.from_numpy_array(D, create_using=nx.DiGraph)

# Create a subgraph containing only the nodes with a degree of at least min_degree
degrees = {node: val for (node, val) in G.degree()} # Calculate the degrees of each node
min_degree = 2500 # Define the minimum degree for a node to be kept.
nodes_to_keep = [node for node, degree in degrees.items() if degree >= min_degree]
G_filtered = G.subgraph(nodes_to_keep)

# Normalize weights for the edges of the nodes
weights = [G_filtered[u][v]['weight'] for u,v in G_filtered.edges()]
normalized_weights = [(w - min(weights)) / (max(weights) - min(weights)) * (7 - 0.001) + 0.001 for w in weights]

# Create a dictionary mapping node indices to reaction names for labeling
# (drop the names of the reverse copies removed from D so that labels stay aligned with nodes)
reaction_names = np.concatenate(([reaction.id for reaction in model.reactions],
                                 [reaction.id + '_r' for reaction in model.reactions]))
reaction_names = np.delete(reaction_names, IDr)
node_labels_filtered = {i: reaction_name for i, reaction_name in enumerate(reaction_names) if i in nodes_to_keep}

# Normalize the degrees for color mapping
degrees_filtered = [val for (node, val) in G_filtered.degree()]
normalized_degrees_filtered = [(d - min(degrees_filtered)) / (max(degrees_filtered) - min(degrees_filtered)) for d in degrees_filtered]

# Use a colormap to map normalized degrees to colors
cmap = plt.get_cmap('OrRd')  # Choose a colormap here
node_colors_filtered = [cmap(deg) for deg in normalized_degrees_filtered]

# Plot the graph
fig, ax = plt.subplots(figsize=(20, 20))
pos_filtered = nx.random_layout(G_filtered)
nx.draw(G_filtered, pos_filtered, width=normalized_weights, node_color=node_colors_filtered, edge_color='lightgray', node_size=300, ax=ax)
plt.title('Normalized Flow Graph (NFG)', fontsize=25, y=0.95)

# Labels
label_pos_filtered = {node: (x + 0.01, y + 0.01) for node, (x, y) in pos_filtered.items()}
nx.draw_networkx_labels(G_filtered, label_pos_filtered, labels=node_labels_filtered, font_size=12, ax=ax)

# Add a colorbar as the legend for node colors
sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin=min(normalized_degrees_filtered), vmax=max(normalized_degrees_filtered)))
sm.set_array([])
cbar_ax = fig.add_axes([0.7, 0.85, 0.07, 0.01])
cbar = plt.colorbar(sm, cax=cbar_ax, label='Node Degree', orientation='horizontal')

plt.savefig('network_flux_graph.png')
plt.show()

### 3.3 Generation of the "FluxOpenValue" matrix

In [None]:
# Fix the bounds for boundary reactions
model_EX = [i for i, rxn in enumerate(model.reactions) if 'EX_' in rxn.id]
model_SK = [i for i, rxn in enumerate(model.reactions) if 'SK_' in rxn.id]
model_DM = [i for i, rxn in enumerate(model.reactions) if 'DM_' in rxn.id]
for i in model_EX:
    model.reactions[i].bounds = -1000, 1000

for i in model_SK:
    model.reactions[i].bounds = -1000, 1000

for i in model_DM:
    model.reactions[i].bounds = 0, 1000

# Perform pFBA for Biomass on Non-Producing and Producing Cell Lines
model.solver = 'gurobi'
objectives = ['biomass', 'biomass_producing']

fluxes_list = []
for objective in objectives:
    model.objective = objective
    pfba_solution = cobra.flux_analysis.pfba(model)
    fluxes = np.array(pfba_solution.fluxes)
    fluxes_list.append(fluxes)
    
# Stack arrays horizontally
FluxOpenValue = np.column_stack(fluxes_list)

FluxOpenValue
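The cell above relies on cobrapy's `pfba`, which (with its default `fraction_of_optimum=1.0`) first solves the FBA problem and then, among the optimal solutions, returns one with the smallest total absolute flux. A minimal two-step sketch of the same idea with SciPy's `linprog` is shown below. It assumes a made-up network whose reactions are all irreversible, so the total absolute flux is simply the sum of fluxes; the `toy_pfba` helper is hypothetical. It illustrates why pFBA prefers the shorter of two equally productive routes.

```python
import numpy as np
from scipy.optimize import linprog


def toy_pfba(S, lb, ub, objective, fraction_of_optimum=1.0):
    """Two-step parsimonious FBA for a network of irreversible reactions. Hypothetical helper."""
    if np.any(np.asarray(lb) < 0):
        raise ValueError("this sketch assumes irreversible reactions (lb >= 0)")
    S = np.asarray(S, dtype=float)
    n_mets, n_rxns = S.shape
    b_eq = np.zeros(n_mets)
    bounds = list(zip(lb, ub))

    # Step 1: FBA, maximise the objective flux (linprog minimises, hence the sign flip)
    c = np.zeros(n_rxns)
    c[objective] = -1.0
    optimum = -linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs").fun

    # Step 2: keep the objective at (a fraction of) its optimum and minimise the total flux
    bounds[objective] = (fraction_of_optimum * optimum, ub[objective])
    res = linprog(np.ones(n_rxns), A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    return optimum, res.x


# Made-up network with two routes from A to C:
#   R0: -> A (at most 10)   R1: A -> C   R2: A -> B   R3: B -> C   R4: C -> (objective)
S = [[1, -1, -1, 0, 0],
     [0, 0, 1, -1, 0],
     [0, 1, 0, 1, -1]]
optimum, fluxes = toy_pfba(S, lb=[0] * 5, ub=[10, 1000, 1000, 1000, 1000], objective=4)
print(optimum, np.round(fluxes, 6))  # the direct route R1 carries all the flux; R2 and R3 stay at zero
```

Plain FBA would accept either route (or any mix) since both give the same objective value; the second step breaks the tie in favour of the route with fewer reaction steps.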

In [None]:
# Count the number of nonzero values in each row
nonzero_counts = np.count_nonzero(FluxOpenValue, axis=1)

# Count the number of rows that contain only zeros
num_all_zero_rows = np.count_nonzero(nonzero_counts == 0)

# Count the number of rows that contain some nonzero value
num_some_nonzero_rows = np.count_nonzero(nonzero_counts != 0)

print("Number of Rxns with no flux:", num_all_zero_rows)
print("Number of Rxns with any flux:", num_some_nonzero_rows)

### 3.4 Generation of the "Mass Flow Graph (MFG) Matrix"
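The MFG provides a versatile framework to create environment-specific metabolic graphs from FBA solutions. Given a flux vector $\mathbf{v}$ unfolded into the $2m$-dimensional vector $\mathbf{v}_{2m}$ of forward and reverse fluxes, its adjacency matrix is

$$\mathbf{M}(\mathbf{v}) = \left(\mathbf{S}_{2m}^{+}\mathbf{V}\right)^{T}\mathbf{J}_{\mathbf{v}}^{\dagger}\left(\mathbf{S}_{2m}^{-}\mathbf{V}\right), \qquad \mathbf{V} = \operatorname{diag}\left(\mathbf{v}_{2m}\right), \qquad \mathbf{J}_{\mathbf{v}} = \operatorname{diag}\left(\mathbf{S}_{2m}^{+}\mathbf{v}_{2m}\right),$$

so that $M_{ij}$ is the total flux of metabolites produced by reaction $R_i$ that is consumed by reaction $R_j$.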
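A minimal NumPy sketch of the MFG construction, mirroring the next cell but computing the pseudoinverse of the diagonal matrix `J_v` as the elementwise reciprocal of its nonzero entries. On a made-up branch point where reaction R0 produces A at flux 3 and reactions R1 and R2 consume it at fluxes 1 and 2, the flux produced by R0 is split between its consumers in proportion to their fluxes. The `mass_flow_graph` helper is hypothetical.

```python
import numpy as np


def mass_flow_graph(S, r, v):
    """MFG adjacency matrix for S (n x m), reversibility vector r and flux vector v. Hypothetical helper."""
    S = np.asarray(S, dtype=float)
    r = np.asarray(r, dtype=float)
    v = np.asarray(v, dtype=float)
    S2m = np.hstack([S, -S * r])
    S_plus = (np.abs(S2m) + S2m) / 2
    S_minus = (np.abs(S2m) - S2m) / 2
    v2m = np.concatenate([(np.abs(v) + v) / 2, (np.abs(v) - v) / 2])  # unfolded forward/reverse fluxes
    J = S_plus @ v2m                                                  # production flux of each metabolite
    J_inv = np.divide(1.0, J, out=np.zeros_like(J), where=J != 0)     # pseudoinverse of diag(J)
    # Multiplying S by v2m scales column j by v2m[j], i.e. S @ diag(v2m)
    return (S_plus * v2m).T @ (J_inv[:, None] * (S_minus * v2m))


# Made-up branch point: R0: -> A (flux 3), R1: A -> (flux 1), R2: A -> (flux 2)
M = mass_flow_graph([[1, -1, -1]], r=[0, 0, 0], v=[3, 1, 2])
print(np.round(M[0, :3], 6))  # [0. 1. 2.]
```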

In [None]:
# Calculation of the MFG for each pFBA analysis
PageRank = []
M_list = []
for i in range(FluxOpenValue.shape[1]):
    start = time.time()
    v1 = FluxOpenValue[:, i].T

    # unfolding the flux vector
    # creation of vplus and vminus
    vplus = (np.abs(v1) + v1) / 2
    vminus = (np.abs(v1) - v1) / 2

    # creation of v2m
    v2m = np.concatenate((vplus, vminus))

    # creation of J_v
    J_v = S2mplus @ v2m.reshape(-1)

    # calculation of the MFG
    M = (S2mplus * v2m).T @ pinv(np.diag(J_v)) @ (S2mminus * v2m)
    M_list.append(M)

    # Calculate and store PageRank
    G = nx.from_numpy_array(M, create_using=nx.DiGraph)
    pr = nx.pagerank(G)
    PageRank.append(pr)
    end = time.time()
    print(f"Time taken to go through iteration {i}: {end - start} seconds")
    

# Post-processing of PageRank
df = pd.DataFrame(PageRank)
PageRank = df.values
PageRank = np.array(PageRank).T
PageRankRxns = PageRank[:m, :]
PageRankRxns_back = PageRank[m:, :]

# For each reaction keep the larger PageRank of its forward and reverse copies
PageRankRxns = np.maximum(PageRankRxns, PageRankRxns_back)

### 3.5 Plotting the "MFG Matrix"

In [None]:
# Convert the MFG (M matrix) of the producing cell line into a graph
M_biomass = M_list[0]
M_biomass_producing = M_list[1]
G = nx.from_numpy_array(M_biomass_producing, create_using=nx.DiGraph)

# Create a subgraph containing only the nodes with a degree of at least min_degree
degrees = {node: val for (node, val) in G.degree()} # Calculate the degrees of each node
min_degree = 30 # Define the minimum degree for a node to be kept.
nodes_to_keep = [node for node, degree in degrees.items() if degree >= min_degree]
G_filtered = G.subgraph(nodes_to_keep)

# Normalize weights for the edges of the nodes
weights = [G_filtered[u][v]['weight'] for u,v in G_filtered.edges()]
normalized_weights = [(w - min(weights)) / (max(weights) - min(weights)) * (7 - 0.001) + 0.001 for w in weights]

# Create a dictionary mapping node indices to reaction names for labeling
reaction_names = np.concatenate(([reaction.id for reaction in model.reactions],
                         [reaction.id + '_r' for reaction in model.reactions]))
node_labels_filtered = {i: reaction_name for i, reaction_name in enumerate(reaction_names) if i in nodes_to_keep}

# Normalize the degrees for color mapping
degrees_filtered = [val for (node, val) in G_filtered.degree()]
normalized_degrees_filtered = [(d - min(degrees_filtered)) / (max(degrees_filtered) - min(degrees_filtered)) for d in degrees_filtered]

# Use a colormap to map normalized degrees to colors
cmap = plt.get_cmap('OrRd')  # Choose a colormap here
node_colors_filtered = [cmap(deg) for deg in normalized_degrees_filtered]

# Plot the graph
fig, ax = plt.subplots(figsize=(20, 20))
pos_filtered = nx.random_layout(G_filtered)
nx.draw(G_filtered, pos_filtered, width=normalized_weights, node_color=node_colors_filtered, edge_color='lightgray', node_size=300, ax=ax)
plt.title('Mass Flow Graph (MFG)', fontsize=20, y=0.95)

# Labels
label_pos_filtered = {node: (x + 0.01, y + 0.01) for node, (x, y) in pos_filtered.items()}
nx.draw_networkx_labels(G_filtered, label_pos_filtered, labels=node_labels_filtered, font_size=12, ax=ax)

# Add a colorbar as the legend for node colors
sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin=min(normalized_degrees_filtered), vmax=max(normalized_degrees_filtered)))
sm.set_array([])
cbar_ax = fig.add_axes([0.7, 0.85, 0.07, 0.01])
cbar = plt.colorbar(sm, cax=cbar_ax, label='Node Degree', orientation='horizontal')


plt.savefig('mass_flow_graph.png')
plt.show()

### 3.6 Word Cloud Plot for Metabolite Frequencies in PageRank

In [None]:
#row_sums = PageRankRxns.sum(axis=1)
df = pd.DataFrame()
for i,n in enumerate(objectives):
    sorted_indices = np.argsort(PageRankRxns[:,i])
    rxns_list = []
    values_list = []
    for s in sorted_indices[::-1]:
        rxns_list.append(model.reactions[s].id)
        values_list.append(PageRankRxns[s,i])
    
    df[n] = pd.Series(rxns_list)
    df[f'values_{n}'] = pd.Series(values_list)

In [None]:
for i,v in df.iterrows():
    print(v['biomass_producing'],v['values_biomass_producing'])

In [None]:
mets_list = []
for rxn in df['biomass'][df['values_biomass'] > 0.0000412]:
    r = model.reactions.get_by_id(rxn)
    mets = r.metabolites
    for met in mets:
        mets_list.append(met.id)
        
for rxn in df['biomass_producing'][df['values_biomass_producing'] > 0.0000412]:
    r = model.reactions.get_by_id(rxn)
    mets = r.metabolites
    for met in mets:
        mets_list.append(met.id)

In [None]:
# Count the frequencies of each metabolite
mets_freq = Counter(mets_list)

# Drop ubiquitous currency metabolites (matched by ID prefix) so they do not dominate the word cloud
excluded_prefixes = (
    'h2o_',              # water
    'h2o2_',             # hydrogen peroxide
    'co2_',              # carbon dioxide
    'nh4_',              # ammonium
    'h_',                # protons
    'atp_', 'adp_', 'amp_',
    'nad_', 'nadh_', 'nadp_', 'nadph_',
    'na1_',              # sodium
    'coa_',              # CoA
    'accoa_',            # acetyl-CoA
    'pi_',               # phosphate
    'ppi_',              # pyrophosphate
    'fadh2_', 'fad_',    # FADH2 and FAD
    'o2_',               # oxygen
)
mets_freq = {k: v for k, v in mets_freq.items() if not k.startswith(excluded_prefixes)}

# Create a circular mask
radius = 500  # you can change to the size you need
circle_img = np.zeros((2*radius, 2*radius), np.uint8)
rr, cc = draw.disk((radius, radius), radius)
circle_img[rr, cc] = 1

# Create the word cloud
wordcloud = WordCloud(width = 1000, height = 500, mask=circle_img, background_color="rgba(255, 255, 255, 0)", mode="RGBA").generate_from_frequencies(mets_freq)

plt.figure(figsize=(8,8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")

plt.savefig('wordcloud.png', bbox_inches='tight', transparent=True, pad_inches=0)
plt.show()

In [None]:
counter=0
for met in mets_freq:
    print(met,mets_freq[met])
    counter+=1
    
print(counter)

In [None]:
#Store the metabolites and their frequencies in a .txt file

with open('metabolites.txt', 'w') as f:
    for i, j in enumerate(mets_freq):
        print(j,'Freq:',mets_freq[j], file=f)

## 4. Identification of duplicates through Chemical Formulas <a id='formulas'></a>

In [None]:
# Chemical Formula
from cobra.io import read_sbml_model
model = read_sbml_model("iCHOv3_CHO_23052023.xml")

import pandas as pd
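data = []

# Replace each metabolite name with its chemical formula so that reactions can be compared by formula
for m in model.metabolites:
    m.name = m.formula
exchange_ids = {r.id for r in model.exchanges}  # compute the exchange reactions once, not once per reaction
for r in model.reactions:
    if r.id not in exchange_ids:
        print(r)
        reaction_id = r.id
        reaction_name = r.name
        reaction_formula = r.build_reaction_string(use_metabolite_names=True)
        data.append([reaction_id, reaction_name, reaction_formula])
df = pd.DataFrame(data, columns=['ID', 'Name', 'Reaction'])
df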

In [None]:
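# Rebuild the table from scratch: without resetting `data`, running this cell after the previous
# one appends every reaction twice and makes every row look like a duplicate
data = []
for m in model.metabolites:
    m.name = m.formula
exchange_ids = {r.id for r in model.exchanges}
for r in model.reactions:
    if r.id not in exchange_ids:
        reaction_id = r.id
        reaction_name = r.name
        reaction_formula = r.build_reaction_string(use_metabolite_names=True)
        data.append([reaction_id, reaction_name, reaction_formula])
df = pd.DataFrame(data, columns=['ID', 'Name', 'Reaction'])
df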

In [None]:
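# Sort first, then reset the index so that positions and labels agree
subset_duplicates = (df[df.duplicated(subset='Reaction', keep=False)]
                     .sort_values(by=['Reaction'])
                     .reset_index(drop=True))
subset_duplicates.to_excel('Test.xlsx')

# Print the name of each reaction whose formula matches that of the next row
# (stop one row early so the last row has no missing "next" value)
reactions = subset_duplicates['Reaction']
for index in range(len(subset_duplicates) - 1):
    if reactions.iloc[index] == reactions.iloc[index + 1]:
        print(subset_duplicates['Name'].iloc[index])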

        

### Duplicates identified

In [None]:
print(model.metabolites.get_by_id('uppg1_c').name, model.metabolites.get_by_id('HC01609_c').name)
for r in model.metabolites.get_by_id('uppg1_c').reactions:
    print(r)
print('------')
for r in model.metabolites.get_by_id('HC01609_c').reactions:
    print(r)
# Based on reactions HMR_4772 and GapFill-R04972, HC01609_c and uppg1_c are the same metabolite

In [None]:
print(model.metabolites.get_by_id('HC02187_c').name, model.metabolites.get_by_id('triodthy_c').name)
for r in model.metabolites.get_by_id('HC02187_c').reactions:
    print(r)
print('------')
for r in model.metabolites.get_by_id('triodthy_c').reactions:
    print(r)
# Based on reactions HMR_6834 and HMR_6826, triodthy_c and HC02187_c are the same metabolite