Description:

In 20250425_Biosynthesis_Kinetic_Genes, 20250429_OtherMetabolism_Kinetic_Genes, and 20250501_CellProcesses_Kinetic_Genes, we attempted to fix DEMs and induce flow through new metabolic reactions that had been added to the model but initially carried no flow in the 20250307_track_reaction_usage script. Although not all attempts succeeded, the successful updates can still be incorporated into a new checkpoint that builds on checkpoint 1. For this checkpoint, we also triaged ~40 new genes and selected a few (~10) that are easy to incorporate, as indicated by green highlighting in the Excel file gene_usage_fix_all.xsl.

The purpose of this script is to incorporate these changes into the whole-cell model, whether by adding a new transport reaction, a novel reaction, etc., and to see how many more new genes are used compared with the results in 20250307_track_reaction_usage and checkpoint 1. Specifically, the genes incorporated into this checkpoint are:

G7408 (yqaB), EG11671 (atoE), EG11724 (adeP), EG10816 (rbsC), EG10818 (rbsK), EG11055 (uidA), EG12393 (ttdT), EG11054 (uhpT), EG10172 (cybB), EG10780 (pspE).

In [1]:
import numpy as np
import ast
import pandas as pd
import os
import matplotlib.pyplot as plt
import dill
import requests
import xmltodict
import cvxpy as cp
import itertools
import networkx as nx
import plotly.express as px

from scipy.special import logsumexp

%matplotlib inline
# sns.set(style='darkgrid', palette='viridis', context='talk')

os.chdir(os.path.expanduser('~/dev/vivarium-ecoli'))

from ecoli.processes.metabolism_redux_classic import NetworkFlowModel, FlowResult
from ecoli.library.schema import numpy_schema, bulk_name_to_idx, listener_schema, counts
from ecoli.processes.registries import topology_registry
TOPOLOGY = topology_registry.access("ecoli-metabolism-redux")

In [2]:
# load basal new model
time = '400'
date = '2025-01-30'
experiment = 'NEW_updated_bad_rxn'
condition = 'basal'
entry = f'{experiment}_{time}_{date}'
folder = f'out/metabolism-comparison/{condition}/{entry}/'

output = np.load(folder + '0_output.npy', allow_pickle=True).item()
# output = np.load(r"out/geneRxnVerifData/output_glc.npy", allow_pickle=True, encoding='ASCII').tolist()
output = output['agents']['0']
fba_unmodified = output['listeners']['fba_results']
bulk_unmodified = pd.DataFrame(output['bulk'])
# use a context manager so the file is closed even if loading fails
with open(folder + 'agent_steps.pkl', 'rb') as f:
    agent_unmodified = dill.load(f)

In [3]:
# get commonly stored variables
metabolism_unmodified = agent_unmodified['ecoli-metabolism-redux-classic']
stoichiometry_unmodified = metabolism_unmodified.stoichiometry.copy()
reaction_names_unmodified = metabolism_unmodified.reaction_names
fba_new_reaction_ids_unmodified = metabolism_unmodified.parameters["fba_new_reaction_ids"]
fba_reaction_ids_to_base_reaction_ids_unmodified = metabolism_unmodified.parameters['fba_reaction_ids_to_base_reaction_ids']
metabolites_unmodified = metabolism_unmodified.metabolite_names
binary_kinetic_idx_unmodified = metabolism_unmodified.binary_kinetic_idx
exchange_molecules_unmodified = metabolism_unmodified.exchange_molecules

S_unmodified = stoichiometry_unmodified.copy()
S_unmodified = pd.DataFrame(S_unmodified, index=metabolites_unmodified, columns=reaction_names_unmodified)
homeostatic_count_unmodified = pd.DataFrame(fba_unmodified["homeostatic_metabolite_counts"], columns=metabolism_unmodified.homeostatic_metabolites).loc[24, :]
homeostatic_unmodified = pd.DataFrame(fba_unmodified["target_homeostatic_dmdt"], columns=metabolism_unmodified.homeostatic_metabolites).loc[24, :]
maintenance_unmodified = pd.DataFrame(fba_unmodified["maintenance_target"][1:], columns=['maintenance_reaction']).iat[24, 0]

In [4]:
# load checkpoint 1 model
time = '400'
date = '2025-04-18'
experiment = 'NEW_NewGenes_checkpoint1'
condition = 'basal'
entry = f'{experiment}_{time}_{date}'
folder = f'out/metabolism-comparison/{condition}/{entry}/'

output = np.load(folder + '0_output.npy', allow_pickle=True).item()
# output = np.load(r"out/geneRxnVerifData/output_glc.npy", allow_pickle=True, encoding='ASCII').tolist()
output = output['agents']['0']
fba_cp1 = output['listeners']['fba_results']
bulk_cp1 = pd.DataFrame(output['bulk'])
# use a context manager so the file is closed even if loading fails
with open(folder + 'agent_steps.pkl', 'rb') as f:
    agent_cp1 = dill.load(f)

In [5]:
# get commonly stored variables
metabolism_cp1 = agent_cp1['ecoli-metabolism-redux-classic']
stoichiometry_cp1 = metabolism_cp1.stoichiometry.copy()
reaction_names_cp1 = metabolism_cp1.reaction_names
fba_new_reaction_ids_cp1 = metabolism_cp1.parameters["fba_new_reaction_ids"]
fba_reaction_ids_to_base_reaction_ids_cp1 = metabolism_cp1.parameters['fba_reaction_ids_to_base_reaction_ids']
metabolites_cp1 = metabolism_cp1.metabolite_names.copy()
binary_kinetic_idx_cp1 = metabolism_cp1.binary_kinetic_idx
exchange_molecules_cp1 = metabolism_cp1.exchange_molecules

S_cp1 = stoichiometry_cp1.copy()
S_cp1 = pd.DataFrame(S_cp1, index=metabolites_cp1, columns=reaction_names_cp1)
homeostatic_count_cp1 = pd.DataFrame(fba_cp1["homeostatic_metabolite_counts"], columns=metabolism_cp1.homeostatic_metabolites).loc[24, :]
homeostatic_cp1 = pd.DataFrame(fba_cp1["target_homeostatic_dmdt"], columns=metabolism_cp1.homeostatic_metabolites).loc[24, :]
maintenance_cp1 = pd.DataFrame(fba_cp1["maintenance_target"][1:], columns=['maintenance_reaction']).iat[24, 0]
kinetic_cp1 = pd.DataFrame(fba_cp1["target_kinetic_fluxes"], columns=metabolism_cp1.kinetic_constraint_reactions).loc[24, :].copy()

In [6]:
# load checkpoint 2 model
time = '400'
date = '2025-05-15'
experiment = 'NEW_NewGenes_checkpoint2'
condition = 'basal'
entry = f'{experiment}_{time}_{date}'
folder = f'out/metabolism-comparison/{condition}/{entry}/'

output = np.load(folder + '0_output.npy', allow_pickle=True).item()
# output = np.load(r"out/geneRxnVerifData/output_glc.npy", allow_pickle=True, encoding='ASCII').tolist()
output = output['agents']['0']
fba_cp2 = output['listeners']['fba_results']
bulk_cp2 = pd.DataFrame(output['bulk'])
# use a context manager so the file is closed even if loading fails
with open(folder + 'agent_steps.pkl', 'rb') as f:
    agent_cp2 = dill.load(f)

In [7]:
# get commonly stored variables
metabolism_cp2 = agent_cp2['ecoli-metabolism-redux-classic']
stoichiometry_cp2 = metabolism_cp2.stoichiometry.copy()
reaction_names_cp2 = metabolism_cp2.reaction_names
fba_new_reaction_ids_cp2 = metabolism_cp2.parameters["fba_new_reaction_ids"]
fba_reaction_ids_to_base_reaction_ids_cp2 = metabolism_cp2.parameters['fba_reaction_ids_to_base_reaction_ids']
metabolites_cp2 = metabolism_cp2.metabolite_names.copy()
binary_kinetic_idx_cp2 = metabolism_cp2.binary_kinetic_idx
exchange_molecules_cp2 = metabolism_cp2.exchange_molecules

S_cp2 = stoichiometry_cp2.copy()
S_cp2 = pd.DataFrame(S_cp2, index=metabolites_cp2, columns=reaction_names_cp2)
homeostatic_count_cp2 = pd.DataFrame(fba_cp2["homeostatic_metabolite_counts"], columns=metabolism_cp2.homeostatic_metabolites).loc[24, :]
homeostatic_cp2 = pd.DataFrame(fba_cp2["target_homeostatic_dmdt"], columns=metabolism_cp2.homeostatic_metabolites).loc[24, :]
maintenance_cp2 = pd.DataFrame(fba_cp2["maintenance_target"][1:], columns=['maintenance_reaction']).iat[24, 0]
kinetic_cp2 = pd.DataFrame(fba_cp2["target_kinetic_fluxes"], columns=metabolism_cp2.kinetic_constraint_reactions).loc[24, :].copy()
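
The three load cells above differ only in the experiment name and the date. A minimal sketch of a path helper that captures the naming pattern follows. `checkpoint_folder` is a hypothetical name, not part of vivarium-ecoli, and its defaults simply mirror the values used in this notebook.

```python
def checkpoint_folder(experiment, date, time='400', condition='basal'):
    """Build the output folder path following the pattern used in the load cells."""
    entry = f'{experiment}_{time}_{date}'
    return f'out/metabolism-comparison/{condition}/{entry}/'

# Experiment names and dates copied from the load cells of this notebook.
folders = {
    'unmodified': checkpoint_folder('NEW_updated_bad_rxn', '2025-01-30'),
    'cp1': checkpoint_folder('NEW_NewGenes_checkpoint1', '2025-04-18'),
    'cp2': checkpoint_folder('NEW_NewGenes_checkpoint2', '2025-05-15'),
}
```

Keeping the pattern in one place makes it harder for one of the three copies to drift when a fourth checkpoint is added.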

In [8]:
print(metabolism_unmodified.allowed_exchange_uptake == metabolism_cp2.allowed_exchange_uptake)
print(metabolism_unmodified.exchange_molecules == metabolism_cp2.exchange_molecules)

True
False
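
The second comparison is `False`, so the exchange molecules differ between the unmodified model and checkpoint 2. A small sketch of how the difference could be listed is shown here. `diff_sets` is a hypothetical helper and the molecule names are placeholders, not the actual contents of either model.

```python
def diff_sets(a, b):
    """Return (items only in a, items only in b), each sorted for stable display."""
    a, b = set(a), set(b)
    return sorted(a - b), sorted(b - a)

# Placeholder molecule sets for illustration only.
old = {'GLC[p]', 'OXYGEN-MOLECULE[p]'}
new = {'GLC[p]', 'OXYGEN-MOLECULE[p]', 'HCN[p]'}
only_old, only_new = diff_sets(old, new)
```

In the notebook, the same call would presumably be `diff_sets(metabolism_unmodified.exchange_molecules, metabolism_cp2.exchange_molecules)`.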


In [9]:
# parameters that are the same across the two simulations
kinetic_reaction_ids = metabolism_cp1.kinetic_constraint_reactions
allowed_exchange_uptake = metabolism_cp1.allowed_exchange_uptake
FREE_RXNS = ["TRANS-RXN-145", "TRANS-RXN0-545", "TRANS-RXN0-474"]
ADDED_RXNS = ['HS-TRANSPORT-RXN-CPD0-1202', 'HS-TRANSPORT-RXN-CPD0-1202 (reverse)',
                   'HS-TRANSPORT-RXN[CCO-OUTER-MEM]-OXAMATE', 'HS-TRANSPORT-RXN[CCO-OUTER-MEM]-OXAMATE (reverse)',
                   'HS-TRANSPORT-RXN[CCO-PM-BAC-NEG]-OXAMATE', 'HS-TRANSPORT-RXN[CCO-PM-BAC-NEG]-OXAMATE (reverse)',
                   'HS-BETA-GLUCURONID-RXN_CPD-3611//METOH', 'HS-SPONTANEOUS-TRANSPORT[CCO-OUTER-MEM]-HCN', 'HS-SPONTANEOUS-TRANSPORT[CCO-OUTER-MEM]-HCN (reverse)',
                   'HS-SPONTANEOUS-TRANSPORT[CCO-PM-BAC-NEG]-HCN','HS-SPONTANEOUS-TRANSPORT[CCO-PM-BAC-NEG]-HCN (reverse)']

# Define functions for testing

In [10]:
def get_subset_S(S, met_of_interest):
    S_met = S.loc[met_of_interest, :]
    S_met = S_met.loc[:,~np.all(S_met == 0, axis=0)]
    return S_met, S_met.columns

def get_keys(mapping, value):
    # reverse lookup: return every key whose value equals `value`
    return [key for key in mapping if mapping[key] == value]

def test_NetworkFlowModel(objective_weights,
                          uptake_addition = set([]), uptake_removal = set([]), new_exchange_molecules = set([]),
                          add_metabolite = None, add_reaction = None, add_kinetic = None, remove_reaction = None, force_reaction = None):
    # update exchanges
    uptake = metabolism_cp2.allowed_exchange_uptake.copy()
    uptake = set(uptake)
    uptake = uptake | uptake_addition
    uptake = uptake - uptake_removal
    
    exchange_molecules = metabolism_cp2.exchange_molecules.copy()
    exchange_molecules = exchange_molecules | new_exchange_molecules
    
    # update stoichiometry
    reaction_names = metabolism_cp2.reaction_names.copy()
    kinetic_reaction_ids = metabolism_cp2.kinetic_constraint_reactions.copy()
    kinetic = pd.DataFrame(fba_cp2["target_kinetic_fluxes"], columns=metabolism_cp2.kinetic_constraint_reactions).loc[24, :].copy()
    metabolites = metabolism_cp2.metabolite_names.copy()
    
    S_new = stoichiometry_cp2.copy()
    
    if add_metabolite is not None: # add metabolites that are not yet included in the model
        n_added = 0
        for m in add_metabolite:
            if m not in metabolites:
                metabolites.append(m)
                n_added += 1
        # append one row of zeros to S_new per newly added metabolite
        S_new = np.concatenate((S_new, np.zeros((n_added, S_new.shape[1]))), axis=0)
    
    if add_reaction is not None:
        # assert add_reaction is a dictionary
        assert isinstance(add_reaction, dict)
        
        for r,s in add_reaction.items():
            if r in reaction_names:
                # already in the model: skip so S_new columns stay aligned with reaction_names
                continue
            reaction_names.append(r)
            # append a column holding the reaction stoichiometry to S_new
            new_reaction = np.zeros((S_new.shape[0], 1))
            for m, v in s.items():
                new_reaction[metabolites.index(m), 0] = v
            S_new = np.concatenate((S_new, new_reaction), axis=1)
    
    if add_kinetic is not None:
        # assert add_kinetic is a dictionary
        assert isinstance(add_kinetic, dict)
        
        for r, v in add_kinetic.items():
            if r not in kinetic_reaction_ids:
                kinetic_reaction_ids.append(r)
                kinetic[r] = v
                
    if remove_reaction is not None:
        for r in remove_reaction:
            r_idx = reaction_names.index(r)
            S_new = np.delete(S_new, r_idx, axis=1)
            reaction_names.remove(r)
            if r in kinetic_reaction_ids:
                kinetic_reaction_ids.remove(r)
                del kinetic[r]
                
    if force_reaction is not None:
        force_reaction_idx = np.array([reaction_names.index(r) for r in force_reaction])
    else:
        force_reaction_idx = force_reaction

    # Solve NetworkFlowModel
    model = NetworkFlowModel(
            stoich_arr=S_new,
            metabolites=metabolites,
            reactions=reaction_names,
            homeostatic_metabolites=metabolism_cp2.homeostatic_metabolites,
            kinetic_reactions=kinetic_reaction_ids,
            free_reactions=FREE_RXNS)
    model.set_up_exchanges(exchanges=exchange_molecules, uptakes=uptake)
    solution: FlowResult = model.solve(
            homeostatic_concs=homeostatic_count_cp2 * metabolism_cp2.counts_to_molar.asNumber(), # in conc
            homeostatic_dm_targets=np.array(list(dict(homeostatic_cp2).values())), # *10^7
            maintenance_target=maintenance_cp2, # *10^6 ish
            kinetic_targets=np.array(list(dict(kinetic).values())), # *10^6 ish
            # binary_kinetic_idx=binary_kinetic_idx, #7646
            binary_kinetic_idx=None,
            force_flow_idx=force_reaction_idx,
            objective_weights=objective_weights, #same
            upper_flux_bound= 1000000000, # increase to 10^9 because notebook runs FlowResult using Counts, WC runs using conc.   
            solver=cp.GLOP) #SCS. ECOS, MOSEK
    print(f'Optimal objective function reached is: {solution.objective}')
    return solution.velocities, reaction_names, S_new, metabolites, kinetic
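
The stoichiometry update inside `test_NetworkFlowModel` can be hard to picture, so here is a toy sketch of the same bookkeeping: one row of zeros per new metabolite, then one column per new reaction. The two-metabolite model and all names in it are hypothetical.

```python
import numpy as np

# Toy model: 2 metabolites x 1 reaction (A -> B). All names are hypothetical.
metabolites = ['A', 'B']
reactions = ['R1']
S = np.array([[-1.0], [1.0]])

# Add one new metabolite: append a row of zeros.
metabolites.append('C')
S = np.concatenate((S, np.zeros((1, S.shape[1]))), axis=0)

# Add one new reaction B -> C: append a column built from its stoichiometry.
new_stoich = {'B': -1, 'C': 1}
column = np.zeros((S.shape[0], 1))
for met, coeff in new_stoich.items():
    column[metabolites.index(met), 0] = coeff
reactions.append('R2')
S = np.concatenate((S, column), axis=1)
```

Rows stay aligned with `metabolites` and columns with `reactions` only if every appended name has a matching appended row or column, which is why metabolites must be added before the reactions that use them.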

# Create Dataframes to Store Results


In [11]:
# all reactions
sim_unmodified = pd.DataFrame(fba_unmodified["estimated_fluxes"], columns= reaction_names_unmodified).loc[24, :].copy()
sim_cp1 = pd.DataFrame(fba_cp1["estimated_fluxes"], columns= reaction_names_cp1).loc[24, :].copy()
sim_cp2 = pd.DataFrame(fba_cp2["estimated_fluxes"], columns= reaction_names_cp2).loc[24, :].copy()

# kinetic reactions
kc_target_unmodified = pd.DataFrame(fba_unmodified["target_kinetic_fluxes"], columns= kinetic_reaction_ids).loc[24, :].copy()
kc_target_cp1 = pd.DataFrame(fba_cp1["target_kinetic_fluxes"], columns= kinetic_reaction_ids).loc[24, :].copy()
kc_target_cp2 = pd.DataFrame(fba_cp2["target_kinetic_fluxes"], columns= kinetic_reaction_ids).loc[24, :].copy()

In [12]:
df_all = pd.concat([sim_unmodified, sim_cp1, sim_cp2], axis=1).fillna(0)
df_all.columns = ['sim_unmodified','sim_cp1_basal', 'sim_cp2_basal']

In [13]:
df_all.head()

Unnamed: 0,sim_unmodified,sim_cp1_basal,sim_cp2_basal
1-ACYLGLYCEROL-3-P-ACYLTRANSFER-RXN,0.0,0.0,0.0
1.1.1.127-RXN,0.0,0.0,0.0
1.1.1.127-RXN (reverse),0.0,0.0,0.0
1.1.1.215-RXN (reverse),0.0,0.0,0.0
1.1.1.251-RXN,0.0,0.0,0.0
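
The models do not share an identical reaction list, so `pd.concat(..., axis=1)` aligns the flux vectors on the union of reaction names and `fillna(0)` assigns zero flux to reactions missing from a model. A minimal sketch with hypothetical reactions and fluxes:

```python
import pandas as pd

# Hypothetical fluxes: the second model has one reaction the first lacks.
sim_a = pd.Series({'R1': 5.0, 'R2': 0.0})
sim_b = pd.Series({'R1': 4.0, 'R2': 1.0, 'R3': 2.0})

# Align on the union of reaction names; reactions missing from a model get 0.
df = pd.concat([sim_a, sim_b], axis=1).fillna(0)
df.columns = ['sim_a', 'sim_b']
```

One consequence is that a zero in `df_all` can mean either "reaction present but unused" or "reaction absent from that model".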


# Network Flow New Environments

In [14]:
conditions = {
    'G7408-α-D-fructopyranose': {
        'Add': set(['CPD-10725[e]']),
        'Remove': set(['GLC[p]']),
    },
    'G7408-β-D-fructopyranose': {
        'Add': set(['CPD-10726[e]']),
        'Remove': set(['GLC[p]']),
    },
    'G7408-keto-D-fructose':{
        'Add': set(['CPD-15382[e]']),
        'Remove': set(['GLC[p]']),
    },
    'EG11671-3-KETOBUTYRATE':{
        'Add': set(['3-KETOBUTYRATE[p]']),
        'Remove': set(['GLC[p]']),
    },
    'EG11724-KO_EG11691':{
        'Add': set(['ADENINE[e]']),
        'Remove': set([]),
        'Remove Reaction': ['TRANS-RXN0-577']
    },
    'EG10816-α-D-ribopyranose':{
        'Add': set(['CPD-15829[e]']),
        'Remove': set(['GLC[p]'])
    },
    'EG10816-β-D-ribopyranose':{
        'Add': set(['CPD0-1110[e]']),
        'Remove': set(['GLC[p]'])
    },
    'EG10818-aldehydo-D-ribose':{
        'Add': set(['CPD-15818[e]']),
        'Remove': set(['GLC[p]'])
    },
    'EG11055-1-O-methyl-β-D-glucuronate':{
        'Add': set(['CPD-3611[e]']),
        'Remove': set(['GLC[p]'])
    },
    'EG12393-ADD_TARTRATE[p]':{
        'Add': set(['TARTRATE[p]']),
        'Remove': set([])
    },
    'EG11054-α-D-mannopyranose 6-phosphate':{
        'Add': set(['MANNOSE-6P[e]']),
        'Remove': set(['GLC[p]'])
    },
    'EG10172-Oxidative_Stress':{
        'Add': set(['SUPER-OXIDE[e]']),
        'Remove': set([])
    },
    'EG10780-Thiosulfate':{
        'Add': set(['S2O3[e]','HCN[e]']),
        'Remove': set(['SULFATE[p]'])
    },
    'Isoprimeverose': {
        'Add': set(['CPD0-1202[e]']),
        'Remove': set(['GLC[p]']),
    },
    'Oxamate': {
        'Add': set(['S-ALLANTOIN[e]']),
        'Remove': set(['AMMONIUM[c]', 'OXYGEN-MOLECULE[p]']),
    },
    'Trehalose':{
        'Add': set(['TREHALOSE[e]']),
        'Remove': set(['GLC[p]']),
    },
    'GLC-1-P':{
        'Add': set(['GLC-1-P[e]']),
        'Remove': set(['GLC[p]']),
    },
    '3-HYDROXYPHENYL-PROPIONATE':{
        'Add': set(['3-HYDROXYPHENYL-PROPIONATE[e]']),
        'Remove': set(['GLC[p]'])
    },
    '3-PHENYLPROPIONATE':{
        'Add': set(['3-PHENYLPROPIONATE[e]']),
        'Remove': set(['GLC[p]'])
    },
    'cinnate':{
        'Add': set(['CPD-10797[e]', 'CPD-674[e]']),
        'Remove': set(['GLC[p]'])
    },
    'L-galactonate':{
        'Add': set(['CPD0-1083[e]']),
        'Remove': set(['GLC[p]'])
    },
    'D-GLUCARATE':{
        'Add': set(['D-GLUCARATE[e]']),
        'Remove': set(['GLC[p]'])
    }
}
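
Each entry in `conditions` describes an environment as a change to the allowed uptake set: molecules under `'Add'` are made available and molecules under `'Remove'` are withdrawn. A minimal sketch of that set arithmetic, mirroring the first lines of `test_NetworkFlowModel`, is given here. `apply_condition` is a hypothetical helper and the base uptake set is a placeholder.

```python
def apply_condition(base_uptake, condition):
    """Return the uptake set after adding and then removing the condition's molecules."""
    return (set(base_uptake) | condition['Add']) - condition['Remove']

# Placeholder base set; the notebook starts from metabolism_cp2.allowed_exchange_uptake.
base_uptake = {'GLC[p]', 'OXYGEN-MOLECULE[p]', 'AMMONIUM[c]'}
trehalose = {'Add': {'TREHALOSE[e]'}, 'Remove': {'GLC[p]'}}
uptake = apply_condition(base_uptake, trehalose)
```

Because removal is applied after addition, a molecule listed under both keys ends up removed.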

In [15]:
df_all = pd.concat([sim_unmodified, sim_cp1, sim_cp2], axis=1).fillna(0)
df_all.columns = ['sim_unmodified','sim_cp1_basal', 'sim_cp2_basal']
condition_names = []
for condition_name, condition in conditions.items():
    
    if 'Remove Reaction' in condition:
        remove_reaction = condition['Remove Reaction']
    else:
        remove_reaction = None
    
    objective_weights = {'secretion': 0.01, 'efficiency': 0.000001, 'kinetics': 0.000001}
    
    solution_flux, test_reaction_names, S_new, test_metabolites, test_kinetic = test_NetworkFlowModel(
                                            objective_weights,
                                            uptake_addition=condition['Add'], uptake_removal=condition['Remove'],
                                            remove_reaction=remove_reaction,
    )    
    
    # get the fluxes
    sim_flux = pd.DataFrame({f'sim_cp2_{condition_name}': solution_flux}, index = test_reaction_names)
    condition_names.append(f'sim_cp2_{condition_name}')
    df_all = pd.concat([df_all, sim_flux], axis=1)
    
    print(f"""Finished environment: {condition_name}""")

Optimal objective function reached is: 19392.15326359196
Finished environment: G7408-α-D-fructopyranose
Optimal objective function reached is: 19392.15326359196
Finished environment: G7408-β-D-fructopyranose
Optimal objective function reached is: 19392.15326324289
Finished environment: G7408-keto-D-fructose
Optimal objective function reached is: 20080.02654779716
Finished environment: EG11671-3-KETOBUTYRATE
Optimal objective function reached is: 16835.095467054645
Finished environment: EG11724-KO_EG11691
Optimal objective function reached is: 20300.910359885554
Finished environment: EG10816-α-D-ribopyranose
Optimal objective function reached is: 20300.91036015805
Finished environment: EG10816-β-D-ribopyranose
Optimal objective function reached is: 20301.354128669984
Finished environment: EG10818-aldehydo-D-ribose
Optimal objective function reached is: 24898.764548893858
Finished environment: EG11055-1-O-methyl-β-D-glucuronate
Optimal objective function reached is: 18409.451777001108
Fi

In [16]:
df_all.head()

Unnamed: 0,sim_unmodified,sim_cp1_basal,sim_cp2_basal,sim_cp2_G7408-α-D-fructopyranose,sim_cp2_G7408-β-D-fructopyranose,sim_cp2_G7408-keto-D-fructose,sim_cp2_EG11671-3-KETOBUTYRATE,sim_cp2_EG11724-KO_EG11691,sim_cp2_EG10816-α-D-ribopyranose,sim_cp2_EG10816-β-D-ribopyranose,...,sim_cp2_EG10780-Thiosulfate,sim_cp2_Isoprimeverose,sim_cp2_Oxamate,sim_cp2_Trehalose,sim_cp2_GLC-1-P,sim_cp2_3-HYDROXYPHENYL-PROPIONATE,sim_cp2_3-PHENYLPROPIONATE,sim_cp2_cinnate,sim_cp2_L-galactonate,sim_cp2_D-GLUCARATE
1-ACYLGLYCEROL-3-P-ACYLTRANSFER-RXN,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0
1.1.1.127-RXN,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0
1.1.1.127-RXN (reverse),0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0
1.1.1.215-RXN (reverse),0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0
1.1.1.251-RXN,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0


In [17]:
df_all['is_new'] = 'Old Reactions'
df_all.loc[ADDED_RXNS, 'is_new'] = 'Heena\'s Reactions'
df_all.loc[fba_new_reaction_ids_unmodified, 'is_new'] = 'New Reactions'

df_all['kinetic'] = [kinetic_cp1[r] if r in kinetic_reaction_ids else False for r in df_all.index]

In [18]:
df_all.loc[['RXN0-5186__G7408-MONOMER','TRANS-RXN0-281', 'TRANS-RXN0-447', 'ABC-28-RXN-ATP/CPD-15829/WATER//ADP/CPD-15829/Pi/PROTON.45.', 
            'ABC-28-RXN-ATP/CPD-15829/WATER//ADP/CPD0-1110/Pi/PROTON.45.', 'ABC-28-RXN-ATP/CPD-15829/WATER//ADP/D-Ribopyranose/Pi/PROTON.50.',
            'ABC-28-RXN-ATP/CPD0-1110/WATER//ADP/CPD-15829/Pi/PROTON.45.', 'ABC-28-RXN-ATP/CPD0-1110/WATER//ADP/CPD0-1110/Pi/PROTON.45.',
            'ABC-28-RXN-ATP/CPD0-1110/WATER//ADP/D-Ribopyranose/Pi/PROTON.50.', 'HS-BETA-GLUCURONID-RXN_CPD-3611//METOH',
            'THIOSULFATE-SULFURTRANSFERASE-RXN[CCO-PERI-BAC]-S2O3/HCN//HSCN/SO3/PROTON.40.','HS-SPONTANEOUS-TRANSPORT[CCO-PM-BAC-NEG]-HCN','HS-SPONTANEOUS-TRANSPORT[CCO-PM-BAC-NEG]-HCN (reverse)',
            'THIOSULFATE--THIOL-SULFURTRANSFERASE-RXN', 'TRANS-RXN-127', 'TRANS-RXN-127 (reverse)','TRANS-RXN-33', 'TRANS-RXN0-502', 'TRANS-RXN0-501', 'TRANS-RXN0-534',
            'RXN-20148-CPD-9956/OXYGEN-MOLECULE//SUPER-OXIDE/UBIQUINONE-8/PROTON.58. (reverse)']]

Unnamed: 0,sim_unmodified,sim_cp1_basal,sim_cp2_basal,sim_cp2_G7408-α-D-fructopyranose,sim_cp2_G7408-β-D-fructopyranose,sim_cp2_G7408-keto-D-fructose,sim_cp2_EG11671-3-KETOBUTYRATE,sim_cp2_EG11724-KO_EG11691,sim_cp2_EG10816-α-D-ribopyranose,sim_cp2_EG10816-β-D-ribopyranose,...,sim_cp2_Oxamate,sim_cp2_Trehalose,sim_cp2_GLC-1-P,sim_cp2_3-HYDROXYPHENYL-PROPIONATE,sim_cp2_3-PHENYLPROPIONATE,sim_cp2_cinnate,sim_cp2_L-galactonate,sim_cp2_D-GLUCARATE,is_new,kinetic
RXN0-5186__G7408-MONOMER,0.0,0.0,0.0,1571.0,1571.0,1571.0,0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,0.0,0.0,0.0,0.0,0.0,New Reactions,2598.0
TRANS-RXN0-281,0.0,0.0,0.0,-0.0,-0.0,-0.0,706925.55,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
TRANS-RXN0-447,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,292974.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
ABC-28-RXN-ATP/CPD-15829/WATER//ADP/CPD-15829/Pi/PROTON.45.,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
ABC-28-RXN-ATP/CPD-15829/WATER//ADP/CPD0-1110/Pi/PROTON.45.,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,443768.887097,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
ABC-28-RXN-ATP/CPD-15829/WATER//ADP/D-Ribopyranose/Pi/PROTON.50.,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
ABC-28-RXN-ATP/CPD0-1110/WATER//ADP/CPD-15829/Pi/PROTON.45.,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
ABC-28-RXN-ATP/CPD0-1110/WATER//ADP/CPD0-1110/Pi/PROTON.45.,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,443768.887097,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
ABC-28-RXN-ATP/CPD0-1110/WATER//ADP/D-Ribopyranose/Pi/PROTON.50.,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,False
HS-BETA-GLUCURONID-RXN_CPD-3611//METOH,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Heena's Reactions,False


# calculate % gene usage

In [19]:
%store -r df_gene_usage genes_to_enzymes df_all_gene_usage new_genes

In [20]:
conditions_previous = ['basal', 'acetate', 'rich', 'anaerobic_basal', 'anaerobic_acetate', 'anaerobic_rich']
conditions_all = conditions_previous + condition_names

df_all_gene_usage = df_all_gene_usage.copy()

gene_used_before = df_gene_usage['is_used']
all_gene_used_before = df_all_gene_usage['is_used']

df_all_gene_usage = df_all_gene_usage.drop(['is_used'], axis = 1)
df_all_gene_usage_cp2 = df_all_gene_usage.copy()
reaction_catalysts = metabolism_cp2.parameters["reaction_catalysts"]

for condition in condition_names:
    df_all_gene_usage_cp2[condition] = 0
    new_reaction_usage = df_all.loc[fba_new_reaction_ids_cp2, condition]
    for rxn in fba_new_reaction_ids_cp2:
        # flux of this reaction in this condition; any nonzero flux counts as "used"
        is_used = new_reaction_usage.loc[rxn]
        enzymes = reaction_catalysts.get(rxn,[])
        for enz in enzymes:
            # drop the trailing 3-character suffix (presumably the compartment tag, e.g. '[c]')
            enz = enz[:-3]
            genes = get_keys(genes_to_enzymes, enz)
            for gene in genes:
                if is_used:
                    # count how many used reactions this gene's enzyme catalyzes
                    df_all_gene_usage_cp2.loc[gene, condition] += 1
                if rxn in kinetic_reaction_ids:
                    df_all_gene_usage_cp2.loc[gene, 'has_kinetic'] = True


# tally usage
df_all_gene_usage_cp2['is_used'] = np.any(df_all_gene_usage_cp2[conditions_all] > 0, axis=1)
df_gene_usage_cp2 = df_all_gene_usage_cp2.loc[new_genes,:]
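
The tally above calls `get_keys` once per enzyme, and each call scans the whole `genes_to_enzymes` dictionary. A sketch of an alternative is to invert the mapping once and then use direct lookups. This assumes `genes_to_enzymes` maps each gene ID to a single enzyme ID, which is inferred from how `get_keys` is used. `invert_mapping` is a hypothetical helper, and the three entries are taken from the `Enzyme encoded` column of the table shown further down.

```python
from collections import defaultdict

def invert_mapping(gene_to_enzyme):
    """Map each enzyme ID to the list of gene IDs that encode it."""
    enzyme_to_genes = defaultdict(list)
    for gene, enzyme in gene_to_enzyme.items():
        enzyme_to_genes[enzyme].append(gene)
    return dict(enzyme_to_genes)

# Small excerpt for illustration; the real dictionary is restored with %store -r.
genes_to_enzymes_demo = {
    'EG11647': 'ACETYL-COA-CARBOXYLTRANSFER-CPLX',
    'EG10217': 'ACETYL-COA-CARBOXYLTRANSFER-CPLX',
    'G7408': 'G7408-MONOMER',
}
enzymes_to_genes = invert_mapping(genes_to_enzymes_demo)
```

With the inverted dictionary, `enzymes_to_genes.get(enz, [])` would replace `get_keys(genes_to_enzymes, enz)` inside the loop.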

In [21]:
# previous % new gene usage
num_genes = len(gene_used_before)
assert len(gene_used_before) == len(df_gene_usage_cp2['is_used']), 'Different number of new genes'

perc_gene_usage_prev = sum(gene_used_before)/num_genes
perc_gene_usage_new  = sum(df_gene_usage_cp2['is_used'])/num_genes

print(f'% new genes usage before checkpoint 1 and 2 is: {perc_gene_usage_prev: 0.2%}')
print(f'          % new genes usage at checkpoint 2 is: {perc_gene_usage_new: 0.2%}')

% new genes usage before checkpoint 1 and 2 is:  4.56%
          % new genes usage at checkpoint 2 is:  17.26%


In [22]:
df_all_gene_usage_cp2[df_all_gene_usage_cp2['is_used']]

Gene ID (EcoCyc),Enzyme encoded,Reactions,has_kinetic,basal,acetate,rich,anaerobic_basal,anaerobic_acetate,anaerobic_rich,sim_cp2_G7408-α-D-fructopyranose,...,sim_cp2_Isoprimeverose,sim_cp2_Oxamate,sim_cp2_Trehalose,sim_cp2_GLC-1-P,sim_cp2_3-HYDROXYPHENYL-PROPIONATE,sim_cp2_3-PHENYLPROPIONATE,sim_cp2_cinnate,sim_cp2_L-galactonate,sim_cp2_D-GLUCARATE,is_used
EG11647,ACETYL-COA-CARBOXYLTRANSFER-CPLX,[RXN0-5055],False,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,True
EG10276,BIOTIN-CARBOXYL-CPLX,[BIOTIN-CARBOXYL-RXN],True,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,True
EG10217,ACETYL-COA-CARBOXYLTRANSFER-CPLX,[RXN0-5055],False,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,True
EG10022,ISOCIT-LYASE,[ISOCIT-CLEAV-RXN],False,1,1,0,0,1,0,0,...,0,1,0,0,1,1,1,0,1,True
EG10023,MALATE-SYNTHASE,[MALSYN-RXN],False,1,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
EG12115,EG12115-MONOMER,"[RXN-14025, RXN-14026, AMP-DEPHOSPHORYLATION-R...",False,2,2,2,2,2,0,0,...,0,0,0,0,0,0,0,0,0,True
G7408,G7408-MONOMER,[RXN0-5186],True,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,True
G6393,B0752-MONOMER,[TRANS-RXN0-200],False,1,1,1,1,1,0,0,...,0,0,0,0,0,0,0,0,0,True
EG11167,CPLX0-8617,"[RXN0-12, TRANS-RXN-8, RXN0-16, RXN0-10, TRANS...",False,3,3,3,3,2,0,0,...,0,0,0,0,0,0,0,0,0,True


# Plot 1: Create dataframe used for kinetic reactions combined shiny app

In [23]:
df_all.head()

Unnamed: 0,sim_unmodified,sim_cp1_basal,sim_cp2_basal,sim_cp2_G7408-α-D-fructopyranose,sim_cp2_G7408-β-D-fructopyranose,sim_cp2_G7408-keto-D-fructose,sim_cp2_EG11671-3-KETOBUTYRATE,sim_cp2_EG11724-KO_EG11691,sim_cp2_EG10816-α-D-ribopyranose,sim_cp2_EG10816-β-D-ribopyranose,...,sim_cp2_Oxamate,sim_cp2_Trehalose,sim_cp2_GLC-1-P,sim_cp2_3-HYDROXYPHENYL-PROPIONATE,sim_cp2_3-PHENYLPROPIONATE,sim_cp2_cinnate,sim_cp2_L-galactonate,sim_cp2_D-GLUCARATE,is_new,kinetic
1-ACYLGLYCEROL-3-P-ACYLTRANSFER-RXN,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Old Reactions,False
1.1.1.127-RXN,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Old Reactions,False
1.1.1.127-RXN (reverse),0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Old Reactions,False
1.1.1.215-RXN (reverse),0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Old Reactions,False
1.1.1.251-RXN,0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Old Reactions,False


In [24]:
np.all(kc_target_cp1 == kc_target_cp2)

True

In [25]:
df_kc_target  = pd.DataFrame({'new_kc_target_unmodified': kc_target_unmodified, 'new_kc_target_cp1': kc_target_cp1, 'new_kc_target_cp2': kc_target_cp2,
                       'is_new': ['New Reactions' if rxn_id in fba_new_reaction_ids_cp2 else 'Old Reactions' for rxn_id in kinetic_reaction_ids]})
cols = condition_names + ['is_new', 'kinetic']
df_kc_sim_cp2 = df_all.loc[kinetic_reaction_ids, cols]

In [26]:
df_kc_target.head()

Unnamed: 0,new_kc_target_unmodified,new_kc_target_cp1,new_kc_target_cp2,is_new
1.1.1.39-RXN,17428.0,15764.0,15764.0,Old Reactions
1.1.1.83-RXN,930.0,698.0,698.0,Old Reactions
1.13.11.16-RXN,100.0,67.0,67.0,New Reactions
2.1.1.79-RXN-CPD-18361/S-ADENOSYLMETHIONINE//CPD-18373/ADENOSYL-HOMO-CYS/PROTON.67.,87.0,90.0,90.0,New Reactions
2.1.1.79-RXN-CPD-18362/S-ADENOSYLMETHIONINE//CPD-18406/ADENOSYL-HOMO-CYS/PROTON.67.,87.0,90.0,90.0,New Reactions


In [27]:
df_kc_sim_cp2.head()

Unnamed: 0,sim_cp2_G7408-α-D-fructopyranose,sim_cp2_G7408-β-D-fructopyranose,sim_cp2_G7408-keto-D-fructose,sim_cp2_EG11671-3-KETOBUTYRATE,sim_cp2_EG11724-KO_EG11691,sim_cp2_EG10816-α-D-ribopyranose,sim_cp2_EG10816-β-D-ribopyranose,sim_cp2_EG10818-aldehydo-D-ribose,sim_cp2_EG11055-1-O-methyl-β-D-glucuronate,sim_cp2_EG12393-ADD_TARTRATE[p],...,sim_cp2_Oxamate,sim_cp2_Trehalose,sim_cp2_GLC-1-P,sim_cp2_3-HYDROXYPHENYL-PROPIONATE,sim_cp2_3-PHENYLPROPIONATE,sim_cp2_cinnate,sim_cp2_L-galactonate,sim_cp2_D-GLUCARATE,is_new,kinetic
1.1.1.39-RXN,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,15764.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Old Reactions,15764.0
1.1.1.83-RXN,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,Old Reactions,698.0
1.13.11.16-RXN,0.0,0.0,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,-0.0,...,-0.0,0.0,0.0,295747.0625,276318.423358,0.0,0.0,0.0,New Reactions,67.0
2.1.1.79-RXN-CPD-18361/S-ADENOSYLMETHIONINE//CPD-18373/ADENOSYL-HOMO-CYS/PROTON.67.,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,90.0
2.1.1.79-RXN-CPD-18362/S-ADENOSYLMETHIONINE//CPD-18406/ADENOSYL-HOMO-CYS/PROTON.67.,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,...,-0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,New Reactions,90.0


In [28]:
df_kc_sim_cp2_log = df_kc_sim_cp2.copy()
df_kc_sim_cp2_log['kinetic'] = pd.to_numeric(df_kc_sim_cp2_log['kinetic'], errors='coerce')
cols = condition_names + [ 'kinetic']
df_kc_sim_cp2_log.loc[:,cols] += 1e-6
df_kc_sim_cp2_log.loc[:,cols] = np.log10(df_kc_sim_cp2_log.loc[:,cols])
df_kc_sim_cp2_log

Unnamed: 0,sim_cp2_G7408-α-D-fructopyranose,sim_cp2_G7408-β-D-fructopyranose,sim_cp2_G7408-keto-D-fructose,sim_cp2_EG11671-3-KETOBUTYRATE,sim_cp2_EG11724-KO_EG11691,sim_cp2_EG10816-α-D-ribopyranose,sim_cp2_EG10816-β-D-ribopyranose,sim_cp2_EG10818-aldehydo-D-ribose,sim_cp2_EG11055-1-O-methyl-β-D-glucuronate,sim_cp2_EG12393-ADD_TARTRATE[p],...,sim_cp2_Oxamate,sim_cp2_Trehalose,sim_cp2_GLC-1-P,sim_cp2_3-HYDROXYPHENYL-PROPIONATE,sim_cp2_3-PHENYLPROPIONATE,sim_cp2_cinnate,sim_cp2_L-galactonate,sim_cp2_D-GLUCARATE,is_new,kinetic
1.1.1.39-RXN,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,4.197666,...,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,Old Reactions,4.197666
1.1.1.83-RXN,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,...,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,Old Reactions,2.843855
1.13.11.16-RXN,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,...,-6.000000,-6.000000,-6.000000,5.470920,5.441410,-6.000000,-6.000000,-6.000000,New Reactions,1.826075
2.1.1.79-RXN-CPD-18361/S-ADENOSYLMETHIONINE//CPD-18373/ADENOSYL-HOMO-CYS/PROTON.67.,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,...,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,New Reactions,1.954243
2.1.1.79-RXN-CPD-18362/S-ADENOSYLMETHIONINE//CPD-18406/ADENOSYL-HOMO-CYS/PROTON.67.,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,...,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,New Reactions,1.954243
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UGD-RXN,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,...,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,1.875061,Old Reactions,1.812913
UNDECAPRENYL-DIPHOSPHATASE-RXN[CCO-CYTOSOL]-UNDECAPRENYL-DIPHOSPHATE/WATER//CPD-9646/Pi/PROTON.64.__PGPPHOSPHAB-MONOMER,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,...,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,Old Reactions,1.000000
URA-PHOSPH-RXN,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,...,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,-6.000000,Old Reactions,3.122871
UROGENIIISYN-RXN,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,...,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,1.431364,Old Reactions,4.350248
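
The -6.000000 entries in the table above are the floor produced by the 1e-6 pseudocount: a flux of exactly zero becomes log10(1e-6) = -6. The following is a minimal sketch on made-up numbers; the reaction names, column names, and flux values are hypothetical and do not come from the model.

```python
import numpy as np
import pandas as pd

# Toy fluxes; names and values are illustrative only.
toy = pd.DataFrame(
    {"cond_a": [0.0, 75.0], "cond_b": [0.0, 27.0]},
    index=["rxn_zero", "rxn_active"],
)

pseudocount = 1e-6
toy_log = np.log10(toy + pseudocount)

# Zero flux lands on the floor, log10(1e-6) = -6.
floor = np.log10(pseudocount)

# Boolean array, True where a flux sits on the floor (i.e. was zero).
is_zero_flux = np.isclose(toy_log, floor)
```

Comparing with `np.isclose` rather than `== -6` is a design choice of this sketch; it tolerates floating-point rounding in the log transform.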


In [29]:
# work with df_kc_sim_cp2_log (log10-transformed simulated fluxes and kinetic targets)
# create a dataframe with rows as reactions: 'flux' is the simulated flux closest to the target, 'condition' lists the condition(s) that reach it
df_kc_cp2_log_combined = pd.DataFrame({'flux': np.nan, 'condition': 'none', 'target': df_kc_sim_cp2_log['kinetic'], 'is_new': df_kc_sim_cp2_log['is_new']}, index=kinetic_reaction_ids)

# create temporary dataframe that tracks the difference between sim flux and target
temp = df_kc_sim_cp2_log.copy()
temp.replace(0, np.nan, inplace=True)
for condition in condition_names:
    temp.loc[:, condition] = np.abs(temp.loc[:, condition] - df_kc_sim_cp2_log['kinetic'].values)

# parse by reaction (index) to find the conditions with flux closest to target (minimal difference)
for reaction in temp.index:
    # get the minimum difference across all conditions
    min_diff = temp.loc[reaction, condition_names].min()
    # get the condition(s) with the minimum difference
    conditions = temp.loc[reaction, condition_names][temp.loc[reaction, condition_names] == min_diff].index.tolist()
    # update the dataframe; a row of all -6 (log10 of the 1e-6 pseudocount) means no flux in any condition
    if np.all(df_kc_sim_cp2_log.loc[reaction,condition_names] == -6):
        df_kc_cp2_log_combined.loc[reaction, 'flux'] = -6
    else:
        df_kc_cp2_log_combined.loc[reaction, 'flux'] = df_kc_sim_cp2_log.loc[reaction,conditions[0]]
        if conditions == condition_names:
            df_kc_cp2_log_combined.loc[reaction, 'condition'] = 'all'
        else:
            df_kc_cp2_log_combined.loc[reaction, 'condition'] = ', '.join(conditions)
# 
# df_kc_cp1_log_combined = df_kc_cp1_combined.copy()         
# df_kc_cp1_log_combined['target'] = df_kc_cp1_log_combined['target'].apply(pd.to_numeric, errors='coerce')
# df_kc_cp1_log_combined.loc[:,['flux','target']] += 1e-6
# df_kc_cp1_log_combined.loc[:,['flux','target']] = np.log10(df_kc_cp1_log_combined.loc[:,['flux','target']])
# 
df_kc_cp2_log_combined.to_csv('notebooks/Heena notebooks/Metabolism_New Genes/df_kc_cp2_combined.csv', index=True)
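
The per-reaction loop above can also be written without iterating over the index. This is a minimal sketch of the same closest-to-target selection on toy data, not a drop-in replacement; the frame `toy` and the names `cond_cols`, `closest`, and `best_flux` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy log10 fluxes; names and values are illustrative only.
cond_cols = ["cond_a", "cond_b", "cond_c"]
toy = pd.DataFrame(
    {
        "cond_a": [-6.0, 1.0, 2.0],
        "cond_b": [-6.0, 3.0, 2.0],
        "cond_c": [-6.0, 2.5, 2.0],
        "kinetic": [2.0, 2.0, 2.0],
    },
    index=["rxn_off", "rxn_one_best", "rxn_tied"],
)

# Absolute distance of each condition's flux from the kinetic target.
diff = toy[cond_cols].sub(toy["kinetic"], axis=0).abs()

# Mark every condition that ties for the minimum distance in its row.
is_closest = diff.eq(diff.min(axis=1), axis=0)
closest = {
    rxn: diff.columns[mask].tolist()
    for rxn, mask in zip(diff.index, is_closest.to_numpy())
}

# idxmin returns the first closest condition; read its flux back out.
best_col = diff.idxmin(axis=1)
best_flux = pd.Series(
    [toy.at[rxn, col] for rxn, col in best_col.items()], index=toy.index
)
```

A reaction whose `closest` list equals `cond_cols` corresponds to the `'all'` label in the loop version.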

# Plot 2: Create grouped bar chart to track gene usage

In [30]:
%store -r df_all_gene_usage_cp1 df_all_gene_usage

In [31]:
df_all_gene_usage.head()

Gene ID (EcoCyc),Enzyme encoded,Reactions,has_kinetic,basal,acetate,rich,anaerobic_basal,anaerobic_acetate,anaerobic_rich,is_used
EG11679,AAS-MONOMER,"[ACYLGPEACYLTRANS-RXN, RXN-5741, RXN0-5513]",False,0,0,0,0,0,0,False
EG11647,ACETYL-COA-CARBOXYLTRANSFER-CPLX,[RXN0-5055],False,1,1,1,1,0,0,True
EG10276,BIOTIN-CARBOXYL-CPLX,[BIOTIN-CARBOXYL-RXN],True,1,1,1,1,0,0,True
EG10217,ACETYL-COA-CARBOXYLTRANSFER-CPLX,[RXN0-5055],False,1,1,1,1,0,0,True
EG10022,ISOCIT-LYASE,[ISOCIT-CLEAV-RXN],False,1,1,0,0,1,0,True


In [32]:
df_gene_usage = df_all_gene_usage.loc[new_genes,:]
df_gene_usage_cp1 = df_all_gene_usage_cp1.loc[new_genes,:]

In [33]:
# % new gene usage
num_genes = len(df_gene_usage)
perc_gene_usage_basal = sum(df_gene_usage_cp2.basal > 0)/num_genes * 100
perc_gene_usage_basic_conditions = sum(gene_used_before)/num_genes * 100
perc_gene_usage_cp1  = sum(df_gene_usage_cp1['is_used'])/num_genes * 100
perc_gene_usage_cp2  = sum(df_gene_usage_cp2['is_used'])/num_genes * 100

perc = [perc_gene_usage_basal, perc_gene_usage_basic_conditions, perc_gene_usage_cp1, perc_gene_usage_cp2] 
x_label = ['Basal', 'Basal, Acetate, Rich, Anaerobic', 'Checkpoint 1', 'Checkpoint 2']

In [34]:
# % all gene usage
num_genes = len(df_all_gene_usage)
perc_all_gene_usage_basal = sum(df_all_gene_usage_cp2.basal > 0)/num_genes * 100
perc_all_gene_usage_basic_conditions = sum(df_all_gene_usage['is_used'])/num_genes * 100
perc_all_gene_usage_cp1  = sum(df_all_gene_usage_cp1['is_used'])/num_genes * 100
perc_all_gene_usage_cp2  = sum(df_all_gene_usage_cp2['is_used'])/num_genes * 100

perc_all = [perc_all_gene_usage_basal, perc_all_gene_usage_basic_conditions, perc_all_gene_usage_cp1, perc_all_gene_usage_cp2]
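
The percentages in the two cells above all follow one pattern: count the genes that satisfy a condition and divide by the number of genes. A small helper makes that explicit. This is a sketch on toy data; `percent_used` and `toy_usage` are hypothetical names, and it assumes the numerator and denominator come from the same set of genes.

```python
import pandas as pd


def percent_used(flags):
    """Return the percentage of True entries in a boolean Series."""
    return 100.0 * flags.astype(bool).mean()


# Toy usage table; gene names and counts are illustrative only.
toy_usage = pd.DataFrame(
    {"basal": [1, 0, 0, 2], "is_used": [True, False, True, True]},
    index=["gene_a", "gene_b", "gene_c", "gene_d"],
)

perc_basal = percent_used(toy_usage["basal"] > 0)  # 2 of 4 genes
perc_any = percent_used(toy_usage["is_used"])      # 3 of 4 genes
```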

In [45]:
# Create a DataFrame
df = pd.DataFrame({
    "Condition": x_label,
    "New Metabolic Gene": perc,
    "All Metabolic Gene": perc_all,
})

# Reshape to long format
df_long = df.melt(id_vars="Condition", 
                  value_vars=["New Metabolic Gene", "All Metabolic Gene"],
                  var_name="Gene Group",
                  value_name="Percent Usage")

# Create grouped bar plot
fig = px.bar(
    df_long,
    x="Condition",
    y="Percent Usage",
    color="Gene Group",              # creates a legend
    barmode="group",                 # side-by-side bars
    text="Percent Usage",
    title="New vs All Metabolic Gene Usage by Condition",
    labels={"Percent Usage": "Percent gene usage (%)"},
    color_discrete_map={
        "New Metabolic Gene": "#4C78A8",
        "All Metabolic Gene": "#c26426"
    }
)

# Customize appearance
fig.update_traces(texttemplate='%{text:.2f}%', textposition='outside')
fig.update_layout(
    paper_bgcolor='rgba(0,0,0,0)',
    yaxis_title="Percent gene usage (%)",
    yaxis=dict(range=[0, 45]),
    xaxis_title=None,
    xaxis_tickangle=-35,
    uniformtext_minsize=8,
    uniformtext_mode='hide',
    margin=dict(t=50, b=50, l=50, r=50),
    width=800,   # width in pixels
    height=500
)

# fig.show(renderer='browser')
fig.write_image('notebooks/Heena notebooks/Metabolism_New Genes/gene_usage_histogram_cp2.png', width=800, height=500, scale=5)

# Plot 3: Create heatmap of fluxes going through the genes

In [36]:
%store -r genes_to_enzymes

In [37]:
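def get_keys(mapping, value):
    # return every key in `mapping` whose value equals `value` (avoids shadowing the built-in `dict`)
    return [key for key, val in mapping.items() if val == value]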

In [38]:
reaction_catalysts = metabolism_cp2.parameters["reaction_catalysts"]
conditions = df_all.columns[:-2]

# initialize with float zeros so float fluxes can be accumulated without dtype upcasting
heatmap_gene_flux = pd.DataFrame(index = df_all_gene_usage_cp2.index, columns = conditions, data=0.0)

# len(reaction_names)
# basal_new_reaction_usage = df_reaction_track.loc[fba_new_reaction_ids,'basal']
for condition in conditions:
    for rxn in reaction_names_cp2:
        enzymes = reaction_catalysts.get(rxn,[])
        for enz in enzymes:
            enz = enz[:-3]  # drop the trailing 3-character compartment tag, e.g. '[c]'
            genes = get_keys(genes_to_enzymes, enz)
            for gene in genes:
                heatmap_gene_flux.loc[gene, condition] += df_all.loc[rxn, condition]
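
The nested loop above calls `get_keys` for every reaction, enzyme, and condition, and each call scans the whole gene-to-enzyme mapping. Inverting the mapping once and adding whole rows is one way to get the same totals with less work. This is a minimal sketch on toy data, assuming one enzyme string per gene as in `get_keys`; every name prefixed with `toy_` is a hypothetical stand-in.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-ins for genes_to_enzymes, reaction_catalysts and df_all.
toy_genes_to_enzymes = {"gene_a": "ENZ-1", "gene_b": "ENZ-1", "gene_c": "ENZ-2"}
toy_reaction_catalysts = {"rxn_1": ["ENZ-1[c]"], "rxn_2": ["ENZ-1[c]", "ENZ-2[p]"]}
toy_flux = pd.DataFrame(
    {"cond_a": [10.0, 4.0], "cond_b": [0.0, 6.0]}, index=["rxn_1", "rxn_2"]
)

# Invert gene -> enzyme once so each enzyme lookup is a single dict access.
enzyme_to_genes = defaultdict(list)
for gene, enzyme in toy_genes_to_enzymes.items():
    enzyme_to_genes[enzyme].append(gene)

gene_flux = pd.DataFrame(
    0.0, index=list(toy_genes_to_enzymes), columns=toy_flux.columns
)
for rxn, enzymes in toy_reaction_catalysts.items():
    for enz in enzymes:
        # Drop the 3-character compartment tag, e.g. "[c]".
        for gene in enzyme_to_genes.get(enz[:-3], []):
            # Add the reaction's whole row: all conditions in one step.
            gene_flux.loc[gene] += toy_flux.loc[rxn]
```

Here `gene_a` and `gene_b` share an enzyme, so both accumulate the flux of `rxn_1` and `rxn_2`, while `gene_c` only receives `rxn_2`.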

In [40]:
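# normalize each gene's flux (row) by its maximum across conditions;
# genes with zero flux in every condition give 0/0 = NaN and are set to -1 below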
heatmap_gene_flux_normalized = heatmap_gene_flux.copy()
heatmap_gene_flux_normalized = heatmap_gene_flux_normalized.div(heatmap_gene_flux_normalized.max(axis=1), axis=0)
heatmap_gene_flux_normalized.fillna(-1, inplace=True)
heatmap_gene_flux_normalized

Gene ID (EcoCyc),sim_unmodified,sim_cp1_basal,sim_cp2_basal,sim_cp2_G7408-α-D-fructopyranose,sim_cp2_G7408-β-D-fructopyranose,sim_cp2_G7408-keto-D-fructose,sim_cp2_EG11671-3-KETOBUTYRATE,sim_cp2_EG11724-KO_EG11691,sim_cp2_EG10816-α-D-ribopyranose,sim_cp2_EG10816-β-D-ribopyranose,...,sim_cp2_EG10780-Thiosulfate,sim_cp2_Isoprimeverose,sim_cp2_Oxamate,sim_cp2_Trehalose,sim_cp2_GLC-1-P,sim_cp2_3-HYDROXYPHENYL-PROPIONATE,sim_cp2_3-PHENYLPROPIONATE,sim_cp2_cinnate,sim_cp2_L-galactonate,sim_cp2_D-GLUCARATE
EG11679,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000
EG11647,1.000000,0.800000,0.800000,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,...,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333
EG10276,1.000000,0.800000,0.800000,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,...,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333
EG10217,1.000000,0.800000,0.800000,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,...,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333,0.733333
EG10022,0.008442,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,0.031923,0.000000,0.000000,...,0.002336,0.000000,0.859592,0.000000,0.000000,0.105241,0.165488,0.147124,0.000000,0.084222
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
EG12215,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000
G7017,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000
G7018,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000
EG11167,1.000000,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,...,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829,0.880829
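
The -1.000000 rows above are genes with no flux in any condition, and a value of 1.000000 marks the condition in which a gene carries its maximum flux. A minimal sketch of the same row-wise normalization on toy data (gene names and values are hypothetical, and fluxes are assumed to be non-negative):

```python
import pandas as pd

# Toy summed fluxes per gene; names and values are illustrative only.
toy = pd.DataFrame(
    {"cond_a": [2.0, 0.0], "cond_b": [8.0, 0.0]},
    index=["gene_active", "gene_silent"],
)

# Divide each row by its own maximum across conditions.
row_max = toy.max(axis=1)
toy_norm = toy.div(row_max, axis=0)

# 0/0 yields NaN for genes with no flux anywhere; tag them with -1.
toy_norm = toy_norm.fillna(-1)
```

After this step `gene_active` reads 0.25 and 1.0, and `gene_silent` reads -1 in both columns.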


In [41]:
heatmap_new_gene_flux_normalized = heatmap_gene_flux_normalized.loc[new_genes,:]

In [47]:
from plotly import graph_objects as go
data = heatmap_new_gene_flux_normalized

# Convert to numpy array for masking
z = data.values

# Create custom colorscale: map 0 to grey, and nonzero values to white→red
custom_colorscale = [
    [0.0, 'rgb(211,211,211)'],  # grey for zeros
    [1e-8, 'rgb(255,255,255)'], # white just above zero
    [1.0, '#b53131']   # soft red for max
]

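# No explicit masking is applied here: with zmin=0, the -1 placeholders (genes with no flux
# in any condition) fall below the color range and are expected to render at the grey end of the scale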

fig = go.Figure(data=go.Heatmap(
    z=z,
    x=data.columns,
    y=data.index,
    colorscale=custom_colorscale,
    colorbar=dict(title='Normalized Flux'),
    zmin=0,
    zmax=1,  # ensures white → red applies to nonzeros
    hoverongaps=False
))

fig.update_layout(
    title='Gene Flux Heatmap (Normalized)',
    xaxis_title='Condition',
    yaxis_title='Gene ID',
    height=1000,
)

# fig.show(renderer='browser')
fig.write_html('notebooks/Heena notebooks/Metabolism_New Genes/gene_usage_bt_condition_heatmap_cp2.html', include_plotlyjs='cdn')

In [48]:
%store df_all_gene_usage_cp2

Stored 'df_all_gene_usage_cp2' (DataFrame)
