# orthoMCL_GSM_geneComparisons.

This notebook seeks to reconcile genes and reactions for the *Lipomyces starkeyi* GSM. Gene orthologs were obtained via orthoMCL (provided by Dr. Yichao Han for four species). 


GSMs were originally built using only bidirectional best hits (BBH) between each scaffold species and *L. starkeyi*. For example, BBH was performed between *Rhodosporidium toruloides* and *L. starkeyi*, and reactions with suspected *L. starkeyi* genes were ported from the rto scaffold to the lst GSM.
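To make the bidirectional-best-hit criterion concrete, here is a minimal sketch using made-up best-hit tables. The gene names and both dictionaries are hypothetical; in practice they would be derived from sequence-search results run in each direction.

```python
# Toy best-hit tables (hypothetical IDs): query gene -> top-scoring hit in the other proteome.
best_hits_rto_to_lst = {"rto_1": "lst_A", "rto_2": "lst_B", "rto_3": "lst_B"}
best_hits_lst_to_rto = {"lst_A": "rto_1", "lst_B": "rto_3", "lst_C": "rto_2"}


def bidirectional_best_hits(forward, reverse):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


bbh_pairs = bidirectional_best_hits(best_hits_rto_to_lst, best_hits_lst_to_rto)
print(sorted(bbh_pairs))
```

Here `rto_2` is dropped because its best hit, `lst_B`, points back to `rto_3` instead. This strictness is one reason a pairwise criterion can miss orthologs that a group-based method reports.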


In [1]:
import cobra
import pandas as pd
import numpy as np

In [2]:
# load GSMs.
rto = cobra.io.load_json_model('../models/Rt_IFO0880.json')
sce = cobra.io.load_matlab_model('../models/yeast8_modifiedwBIGGnames.mat')
ylip = cobra.io.load_matlab_model('../models/twoModels/iYLI647_corr.mat')

# lst_from_sce = cobra.io.load_matlab_model("../models/Lst_v_0.0_modelRemovedYeast8Genes.matlab")
# lst_from_ylip = cobra.io.load_json_model('../models/OLD_models_prior16Dec22/Lst_v0.1_model_yarrowiaBased.json')

# lst_from_rto = cobra.io.load_json_model("../models/Lst_v_0.1_modelRemovedGenes.json")

Set parameter TokenServer to value "leghorn.emsl.pnl.gov"


This model seems to have confidenceScores instead of rxnConfidenceScores field. Will use confidenceScores for what rxnConfidenceScores represents.
This model seems to have metCharge instead of metCharges field. Will use metCharge for what metCharges represents.
No defined compartments in model model. Compartments will be deduced heuristically using regular expressions.
Using regular expression found the following compartments:c, e, g, m, n, r, v, x


In [3]:
# adjust iYLI647 parameters for consistent annotation. 
for m in ylip.metabolites:
    if ('[' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('[','_').replace(']','')
    if ('_L_' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('_L_','__L_')
    if ('_D_' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('_D_','__D_')
for r in ylip.reactions:
    if ('(e)' in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('(e)','_e')
    if(('_L_') in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('_L_','__L_')

    if(('_D_') in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('_D_','__D_')
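The renaming rules in the cell above can be checked on plain strings before touching a model. The sketch below is an assumption-laden restatement: `normalize_id` is a hypothetical helper, the example IDs are invented, and it applies every rule to any identifier, whereas the cell applies the bracket rule only to metabolites and the `(e)` rule only to reactions.

```python
def normalize_id(identifier):
    """Apply the same string replacements as the loops above to a single ID."""
    new_id = identifier.replace("[", "_").replace("]", "")
    new_id = new_id.replace("(e)", "_e")
    new_id = new_id.replace("_L_", "__L_").replace("_D_", "__D_")
    return new_id


# Invented identifiers in the bracketed style.
examples = ["ala_L[c]", "glc_D[e]", "EX_glc_D(e)", "atp[m]"]
converted = {old: normalize_id(old) for old in examples}
for old, new in converted.items():
    print(old, "->", new)
```

The replacements are not idempotent: an ID that already contains `__L_` still contains `_L_`, so a second pass would insert an extra underscore. The conversion should therefore run only once per loaded model.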


Load the *L. starkeyi* framework model.

In [4]:
## below is the original model used. 
# model = cobra.io.load_json_model("../models/Lst_v_0.1_modelRemovedGenes.json")
## this is a model to verify. 
model = cobra.io.load_json_model("../models/Lst_v_0.0_removedRTOGenes_forPUB.json")


### load orthoMCL data. 

In [5]:
orthoMCL = pd.read_csv('../orthoMCL/orthoMCL_orthologs_2023.csv')
orthoMCL.head(4)

Unnamed: 0.1,Unnamed: 0,Lst,Rto,Sce,Yli,s2,s3,s4
0,Lst|G1IA6-22289-MONOMER,Lipst1_1_113024,10878.0,YML069W,YALI0D25058g,1.447,1.776,1.588
1,Lst|G1IA6-16751-MONOMER,Lipst1_1_441,14212.0,,YALI0F19734g,1.055,,1.272
2,Lst|G1IA6-23047-MONOMER,Lipst1_1_175925,12668.0,YKL046C,YALI0D02563g,0.159,1.579,1.284
3,Lst|G1IA6-17901-MONOMER,Lipst1_1_91016,,,YALI0E00396g,,,0.895


In [6]:
# obtain the genes present in each GSM. 

g = ylip.genes
ylip_GSM_genes = [temp.id for temp in g]

g = rto.genes
rto_GSM_genes = [temp.id for temp in g]

g = sce.genes
sce_GSM_genes = [temp.id for temp in g]

In [7]:
# the number of genes in the GSM that have Lst orthologs. 

# rto to Lst
print(len(set(orthoMCL.Rto.to_list()).intersection(set(rto_GSM_genes))))

# sce to Lst
print(len(set(orthoMCL.Sce.to_list()).intersection(set(sce_GSM_genes))))

# yli to Lst
print(len(set(orthoMCL.Yli.to_list()).intersection(set(ylip_GSM_genes))))

810
790
512
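The table preview shows Rto identifiers as `10878.0`. Whether that affects matching depends on how the column and the model gene IDs are typed, which the notebook does not show. The sketch below illustrates the general pitfall on invented values and one way to guard against it; `to_gene_id` is a hypothetical helper.

```python
import math


def to_gene_id(value):
    """Convert a table entry to a gene-ID string, or None for missing values."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        if value.is_integer():
            return str(int(value))
    return str(value)


table_column = [10878.0, 14212.0, float("nan"), 12668.0]  # invented column parsed as floats
model_gene_ids = ["10878", "12668", "99999"]               # invented model gene IDs (strings)

# Floats never equal strings, so the raw intersection is empty.
raw_overlap = set(table_column) & set(model_gene_ids)

# Normalizing both sides to strings recovers the shared IDs.
clean_ids = {to_gene_id(v) for v in table_column} - {None}
clean_overlap = clean_ids & set(model_gene_ids)
print(len(raw_overlap), len(clean_overlap))
```

If the counts printed by the notebook are already non-zero, the identifiers evidently match in that environment and this normalization is only a safeguard.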


In [8]:
# more genes are present in the direct bbh blasting scaffolds for GSMs. 
# print(len(lst_from_rto.genes))
# print(len(lst_from_sce.genes))
# print(len(lst_from_ylip.genes))

In [9]:
# create dictionary mapping the genes for each species. 
lst_rto_dict = dict(zip(orthoMCL.Rto,orthoMCL.Lst))
lst_sce_dict = dict(zip(orthoMCL.Sce,orthoMCL.Lst))
lst_ylip_dict = dict(zip(orthoMCL.Yli,orthoMCL.Lst))
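One property of `dict(zip(...))` worth keeping in mind: if a scaffold gene appears in more than one row, only the last Lst ortholog is kept. Whether the orthoMCL table contains such repeats is not shown here, so the following is a sketch on invented rows of how a one-to-many mapping could be built instead.

```python
from collections import defaultdict

# Invented (scaffold gene, Lst gene) rows; None stands for a missing scaffold ortholog.
rows = [
    ("10878", "lst_A"),
    ("10878", "lst_B"),
    ("12668", "lst_C"),
    (None, "lst_D"),
]

# dict() keeps only the last value seen for each key.
one_to_one = dict(rows)

# A list-valued mapping keeps every ortholog and skips missing keys.
one_to_many = defaultdict(list)
for scaffold_gene, lst_gene in rows:
    if scaffold_gene is not None:
        one_to_many[scaffold_gene].append(lst_gene)

print(one_to_one["10878"], one_to_many["10878"])
```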



### Next to do.

Compare reactions and genes between the GSMs using Python.


In [10]:
# determine the reactions shared by all three GSMs,
# or only by the ylip:rto or sce:rto pairs.

ylip_rto_overlap = []
ylip_sce_rto_overlap = []
rto_sce_overlap = []

# start with Rhodo GSM reaction names. 
for r in rto.reactions:
    
    # search for ID in yarrowia. 
    if r.id in ylip.reactions:
        
        # now search for ID in saccharomyces.
        if r.id in sce.reactions:
            ylip_sce_rto_overlap.append(r.id)
        
        # reaction not in saccharomyces. 
        # ylip and rto overlapping reactions. 
        else:
            ylip_rto_overlap.append(r.id)
    
    # reaction not in yarrowia.
    # sce and rto overlapping reactions.
    elif r.id in sce.reactions:
        rto_sce_overlap.append(r.id)
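The nested membership tests above amount to a Venn partition of reaction IDs, which can also be written with set operations. The sketch below uses invented ID sets; with the loaded models, each set could be built as `{r.id for r in model.reactions}`.

```python
# Invented reaction-ID sets standing in for the three scaffold models.
rto_ids = {"PGI", "PFK", "CSm", "GAPP"}
sce_ids = {"PGI", "PFK", "ICL"}
ylip_ids = {"PGI", "CSm", "ICL", "XYZ"}

all_three = rto_ids & sce_ids & ylip_ids        # shared by every model
rto_sce_only = (rto_ids & sce_ids) - ylip_ids   # rto and sce, not ylip
rto_ylip_only = (rto_ids & ylip_ids) - sce_ids  # rto and ylip, not sce
sce_ylip_only = (sce_ids & ylip_ids) - rto_ids  # sce and ylip, not rto

print(all_three, rto_sce_only, rto_ylip_only, sce_ylip_only)
```

Sets do not preserve the model's reaction order, so the list-based loop is preferable when the order of the output rows matters.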


                

In [11]:
# obtain the ylip:sce reaction overlap. 
# obtain the sce and ylip only genes. 

ylip_sce_overlap = []
ylip_rxn_only = []
ylip_gene_only = []
sce_rxn_only = []
sce_gene_only = []

# now loop through the yarrowia reactions. 
for r in ylip.reactions:
    
    # check if the id is not already in one of the previously determined lists. 
    if (r.id not in ylip_sce_rto_overlap) and (r.id not in rto_sce_overlap) and (r.id not in ylip_rto_overlap):
        
        # check if the reaction is in saccharomyces. 
        if r.id in sce.reactions:
            ylip_sce_overlap.append(r.id)
        
        # reaction is only present in yarrowia. 
        else:
            ylip_genes = r.gene_reaction_rule
            
            # obtain the gpr for the reaction if it is also present in the orthoMCL groups. 
            for g in [ylip_genes]:
                if g in set(orthoMCL.Yli.to_list()).intersection(set(ylip_GSM_genes)):
                    ylip_rxn_only.append(r.id)
                    ylip_gene_only.append(g)

# now loop through saccharomyces reactions. 
for r in sce.reactions:
    
    # check if the reaction has already been included in one of the previously determined lists. 
    if (r.id not in ylip_sce_rto_overlap) and (r.id not in rto_sce_overlap) and (r.id not in ylip_rto_overlap) and (r.id not in ylip_sce_overlap):
        
        # reaction is only in saccharomyces. 
        sce_genes = r.gene_reaction_rule
        
        # obtain the gpr for the reaction if it is also present in the orthoMCL groups. 
        # note: compare against the Sce column and the sce GSM genes (not the Yli ones).
        for g in [sce_genes]:
            if g in set(orthoMCL.Sce.to_list()).intersection(set(sce_GSM_genes)):
                sce_rxn_only.append(r.id)
                sce_gene_only.append(g)
                

## Grab the L. starkeyi ortholog for reactions that have genes present in the orthoMCL groups.

#### note, this section does not modify the GSM. It just obtains orthologs and new GPRs.


compare all three sets. 

In [12]:
# definition to compare the genes in common reactions. 
def compare_three_lists(list1, list2, list3):

    
    # Convert lists to sets.
    set1, set2, set3 = set(list1), set(list2), set(list3)
    
    # Find all intersections and differences.
    common_all = set1 & set2 & set3
    
    only_12 = (set1 & set2) - set3
    only_13 = (set1 & set3) - set2
    only_23 = (set2 & set3) - set1
    
    only_1 = set1 - set2 - set3
    only_2 = set2 - set1 - set3
    only_3 = set3 - set1 - set2
    
    all_three_diff = (set1 | set2 | set3) - common_all - only_12 - only_13 - only_23 - only_1 - only_2 - only_3
    
    # Construct verbose explanation.
    explanation = {}
    
#     explanation['common_all'] = f"Items common in all three lists: {list(common_all)}"
#     explanation['only_12'] = f"Items found only in list 1 and list 2: {list(only_12)}"
#     explanation['only_13'] = f"Items found only in list 1 and list 3: {list(only_13)}"
#     explanation['only_23'] = f"Items found only in list 2 and list 3: {list(only_23)}"
#     explanation['only_1'] = f"Items unique to list 1: {list(only_1)}"
#     explanation['only_2'] = f"Items unique to list 2: {list(only_2)}"
#     explanation['only_3'] = f"Items unique to list 3: {list(only_3)}"
#     explanation['all_three_diff'] = f"Items present in all three lists but not shared by all of them simultaneously: {list(all_three_diff)}"
    
    explanation['common_all'] = list(common_all)
    explanation['only_12'] = list(only_12)
    explanation['only_13'] = list(only_13)
    explanation['only_23'] = list(only_23)
    explanation['only_1'] = list(only_1)
    explanation['only_2'] = list(only_2)
    explanation['only_3'] = list(only_3)
    explanation['all_three_diff'] = list(all_three_diff)
    
    
    return explanation


# gene overlap in all three.

_names = []
_gpr_sce = []
_gpr_ylip = []
_gpr_rto = []

_lst_rto_gene = []
_lst_sce_gene = []
_lst_ylip_gene = []

_id = []
_stoich = []
_subsystem = []
_lst_gene = []
_ylip_gene = []


consensus = []
consensus_count = []
consensus_source = []
_output = []


_common_all = []
_only_12 = []
_only_13 = []
_only_23 = []
_only_1 = []
_only_2 = []
_only_3 = []
_all_three_diff = []

# loop through the reactions present in all three GSMs and determine the L. starkeyi ortholog. 
for rxn in ylip_sce_rto_overlap:
# test cases: 
# for rxn in ['NNDPRm','CSp','MTHFCm','PYRt2','PDE4','CSm']:
    r = rto.reactions.get_by_id(rxn)
    y = ylip.reactions.get_by_id(rxn)
    s = sce.reactions.get_by_id(rxn)
    
    _names.append(r.name)
    _gpr_rto.append(r.gene_reaction_rule)
    _gpr_sce.append(s.gene_reaction_rule)
    _gpr_ylip.append(y.gene_reaction_rule)

    _id.append(r.id)
    _stoich.append(r.reaction)
    _subsystem.append(r.subsystem)
    
    
    # rto genes ('NA' marks a gene without an Lst ortholog).
    temp1 = [lst_rto_dict.get(g.id, 'NA') for g in r.genes]
    _lst_rto_gene.append(temp1)
    
    # sce genes.
    temp2 = [lst_sce_dict.get(g.id, 'NA') for g in s.genes]
    _lst_sce_gene.append(temp2)
    
    # ylip genes. 
    temp3 = [lst_ylip_dict.get(g.id, 'NA') for g in y.genes]
    _lst_ylip_gene.append(temp3)

    ### comparisons below - general gene info is above.

    # remove NA for comparisons (build new lists so the recorded lists keep their NA entries). 
    temp1 = [e for e in temp1 if e != 'NA']
    temp2 = [e for e in temp2 if e != 'NA']
    temp3 = [e for e in temp3 if e != 'NA']
    
    result = compare_three_lists(temp1, temp2, temp3)
    _output.append(result)
    
    
    _common_all.append(result['common_all'])
    _only_12.append(result['only_12'])
    _only_13.append(result['only_13'])
    _only_23.append(result['only_23'])
    _only_1.append(result['only_1'])
    _only_2.append(result['only_2'])
    _only_3.append(result['only_3'])
    _all_three_diff.append(result['all_three_diff'])
#     for key, value in result.items():
        
#         print(f"{key}: {value}")
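The lookup-then-filter pattern used per species in this cell can be factored into two small helpers. This is a sketch only: `map_orthologs`, `drop_missing`, and the toy mapping are hypothetical names, not part of the notebook.

```python
def map_orthologs(gene_ids, mapping, missing="NA"):
    """Look up each gene ID; keep a placeholder where no ortholog is known."""
    return [mapping.get(gene_id, missing) for gene_id in gene_ids]


def drop_missing(orthologs, missing="NA"):
    """Return a new list without placeholders, leaving the input untouched."""
    return [o for o in orthologs if o != missing]


toy_map = {"g1": "lst_A", "g3": "lst_C"}
recorded = map_orthologs(["g1", "g2", "g3"], toy_map)  # kept for the output table
compared = drop_missing(recorded)                       # used for set comparisons
print(recorded, compared)
```

Returning a new list from `drop_missing` matters because the recorded list is also stored in the output columns. Filtering it in place would silently change what is written to the table.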

In [13]:
# df3 = pd.DataFrame([_names,_id,_stoich,_subsystem,_gpr_rto,_gpr_sce,_gpr_ylip,_lst_rto_gene,_lst_sce_gene,_lst_ylip_gene,_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3,_all_three_diff]).T

In [14]:
# df3

In [15]:
# df3 = pd.DataFrame([_names,_id,_stoich,_subsystem,_gpr_rto,_gpr_sce,_gpr_ylip,_lst_rto_gene,_lst_sce_gene,_lst_ylip_gene,_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3,_all_three_diff]).T
# df3.to_csv('allcomparisions_genes.csv')

compare the ylip:sce overlap. 

In [16]:
# rxn overlap for ylip and sce.    
# _names = []
# _gpr_sce = []
# _gpr_ylip = []
# _gpr_rto = []

# _lst_rto_gene = []
# _lst_sce_gene = []
# _lst_ylip_gene = []

# _id = []
# _stoich = []
# _subsystem = []
# _lst_gene = []
# _ylip_gene = []

# _common_all = []
# _only_12 = []
# _only_13 = []
# _only_23 = []
# _only_1 = []
# _only_2 = []
# _only_3 = []
# _all_three_diff = []

for rxn in ylip_sce_overlap:
#     r = rto.reactions.get_by_id(rxn)
    y = ylip.reactions.get_by_id(rxn)
    s = sce.reactions.get_by_id(rxn)
    
    _names.append(s.name)
#     _gpr_rto.append(r.gene_reaction_rule)
    _gpr_rto.append('None')
    _gpr_sce.append(s.gene_reaction_rule)
    _gpr_ylip.append(y.gene_reaction_rule)

    _id.append(s.id)
    _stoich.append(s.reaction)
    _subsystem.append(s.subsystem)
    
    temp1 = []
#     for g in [r.gene_reaction_rule]:
#         try:
#             temp1.append(lst_rto_dict[g])
#         except:
    temp1.append('NA')
    _lst_rto_gene.append(temp1)
    
    temp2 = [lst_sce_dict.get(g.id, 'NA') for g in s.genes]
    _lst_sce_gene.append(temp2)
    
    temp3 = [lst_ylip_dict.get(g.id, 'NA') for g in y.genes]
    _lst_ylip_gene.append(temp3)
    
    # remove NA for comparisons (build new lists so the recorded lists keep their NA entries). 
    temp1 = [e for e in temp1 if e != 'NA']
    temp2 = [e for e in temp2 if e != 'NA']
    temp3 = [e for e in temp3 if e != 'NA']
    
    result = compare_three_lists(temp1, temp2, temp3)
    
    
    _common_all.append(result['common_all'])
    _only_12.append(result['only_12'])
    _only_13.append(result['only_13'])
    _only_23.append(result['only_23'])
    _only_1.append(result['only_1'])
    _only_2.append(result['only_2'])
    _only_3.append(result['only_3'])
    _all_three_diff.append(result['all_three_diff'])
#     for key, value in result.items():
           

In [17]:
# df3 = pd.DataFrame([_names,_id,_stoich,_subsystem,_gpr_sce,_gpr_ylip,_lst_sce_gene,_lst_ylip_gene,_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3,_all_three_diff]).T

In [18]:
# df3.head()

In [19]:
# df3.to_csv('ylip_sce_rxn_overlap_genes.csv')

compare the ylip:rto overlap. 

In [20]:
# overlap ylip and rto.  
# _names = []
# _gpr_sce = []
# _gpr_ylip = []
# _gpr_rto = []

# _lst_rto_gene = []
# _lst_sce_gene = []
# _lst_ylip_gene = []

# _id = []
# _stoich = []
# _subsystem = []
# _lst_gene = []
# _ylip_gene = []

# _common_all = []
# _only_12 = []
# _only_13 = []
# _only_23 = []
# _only_1 = []
# _only_2 = []
# _only_3 = []
# _all_three_diff = []

for rxn in ylip_rto_overlap:
    r = rto.reactions.get_by_id(rxn)
    y = ylip.reactions.get_by_id(rxn)
#     s = sce.reactions.get_by_id(rxn)
    
    _names.append(r.name)
    _gpr_rto.append(r.gene_reaction_rule)
#     _gpr_sce.append(s.gene_reaction_rule)
    _gpr_sce.append('None')
    
    
    _gpr_ylip.append(y.gene_reaction_rule)

    _id.append(r.id)
    _stoich.append(r.reaction)
    _subsystem.append(r.subsystem)
    
    temp1 = [lst_rto_dict.get(g.id, 'NA') for g in r.genes]
    _lst_rto_gene.append(temp1)
    
    # no sce reaction in this overlap, so there is no sce-derived ortholog.
    temp2 = ['NA']
    _lst_sce_gene.append(temp2)
    
    temp3 = [lst_ylip_dict.get(g.id, 'NA') for g in y.genes]
    _lst_ylip_gene.append(temp3)
    
    # remove NA for comparisons (build new lists so the recorded lists keep their NA entries). 
    temp1 = [e for e in temp1 if e != 'NA']
    temp2 = [e for e in temp2 if e != 'NA']
    temp3 = [e for e in temp3 if e != 'NA']
    
    result = compare_three_lists(temp1, temp2, temp3)
    
    _common_all.append(result['common_all'])
    _only_12.append(result['only_12'])
    _only_13.append(result['only_13'])
    _only_23.append(result['only_23'])
    _only_1.append(result['only_1'])
    _only_2.append(result['only_2'])
    _only_3.append(result['only_3'])
    _all_three_diff.append(result['all_three_diff'])

In [21]:
# df3 = pd.DataFrame([_names,_id,_stoich,_subsystem,_gpr_rto,_gpr_ylip,_lst_rto_gene,_lst_ylip_gene,_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3,_all_three_diff]).T

In [22]:
# df3.to_csv('ylip_rto_overlap_genes.csv')

compare the sce:rto overlap. 

In [23]:
# overlap sce and rto.    
# _names = []
# _gpr_sce = []
# _gpr_ylip = []
# _gpr_rto = []

# _lst_rto_gene = []
# _lst_sce_gene = []
# _lst_ylip_gene = []

# _id = []
# _stoich = []
# _subsystem = []
# _lst_gene = []
# _ylip_gene = []

# _common_all = []
# _only_12 = []
# _only_13 = []
# _only_23 = []
# _only_1 = []
# _only_2 = []
# _only_3 = []
# _all_three_diff = []

for rxn in rto_sce_overlap:
    r = rto.reactions.get_by_id(rxn)
#     y = ylip.reactions.get_by_id(rxn)
    s = sce.reactions.get_by_id(rxn)
    
    _names.append(r.name)
    _gpr_rto.append(r.gene_reaction_rule)
    _gpr_sce.append(s.gene_reaction_rule)
#     _gpr_ylip.append(y.gene_reaction_rule)
    _gpr_ylip.append('None')


    _id.append(r.id)
    _stoich.append(r.reaction)
    _subsystem.append(r.subsystem)
    
    temp1 = []
    for g in r.genes:
        try:
            temp1.append(lst_rto_dict[g.id])
            print(temp1)
        except KeyError:
            temp1.append('NA')

    _lst_rto_gene.append(temp1)
    
    # sce genes without an Lst ortholog are skipped here (no 'NA' placeholder).
    temp2 = []
    for g in s.genes:
        try:
            temp2.append(lst_sce_dict[g.id])
        except KeyError:
            pass
            
    _lst_sce_gene.append(temp2)
    
    # no ylip reaction in this overlap, so there is no ylip-derived ortholog.
    temp3 = ['NA']
    _lst_ylip_gene.append(temp3)
    
    # remove NA for comparisons (build new lists so the recorded lists keep their NA entries). 
    temp1 = [e for e in temp1 if e != 'NA']
    temp2 = [e for e in temp2 if e != 'NA']
    temp3 = [e for e in temp3 if e != 'NA']
        
        
        
    result = compare_three_lists(temp1, temp2, temp3)
    
    
    _common_all.append(result['common_all'])
    _only_12.append(result['only_12'])
    _only_13.append(result['only_13'])
    _only_23.append(result['only_23'])
    _only_1.append(result['only_1'])
    _only_2.append(result['only_2'])
    _only_3.append(result['only_3'])
    _all_three_diff.append(result['all_three_diff'])           

['NA', 'NA', 'NA', 'Lipst1_1_6303']
['NA', 'NA', 'NA', 'Lipst1_1_74900']
['Lipst1_1_314464']
['Lipst1_1_5453']
['Lipst1_1_265510']
['Lipst1_1_265510', 'Lipst1_1_127634']
['Lipst1_1_289621']
['Lipst1_1_70370']
['Lipst1_1_76289']
['Lipst1_1_105640']
['Lipst1_1_4472']
['NA', 'NA', 'NA', 'Lipst1_1_6303']
['Lipst1_1_102591']
['Lipst1_1_314768']
['Lipst1_1_423']
['Lipst1_1_1263']
['Lipst1_1_2889']
['Lipst1_1_107753']
['Lipst1_1_73838']
['Lipst1_1_3878']
['Lipst1_1_53536']
['Lipst1_1_72751']
['NA', 'Lipst1_1_2256']
['NA', 'Lipst1_1_2256', 'Lipst1_1_61026']
['Lipst1_1_76289']
['Lipst1_1_338549']
['Lipst1_1_48101']
['Lipst1_1_27755']
['NA', 'Lipst1_1_282']
['Lipst1_1_81155']
['Lipst1_1_46126']
['Lipst1_1_142956']
['Lipst1_1_3974']
['Lipst1_1_3974']
['Lipst1_1_246321']
['Lipst1_1_2195']
['Lipst1_1_314464']
['Lipst1_1_46126']
['Lipst1_1_3313']
['NA', 'Lipst1_1_102207']
['Lipst1_1_71146']
['Lipst1_1_1110']
['Lipst1_1_931']
['Lipst1_1_76359']
['NA', 'NA', 'Lipst1_1_69867']
['NA', 'NA', 'Lipst1_1_69

In [24]:
# df3 = pd.DataFrame([_names,_id,_stoich,_subsystem,_gpr_rto,_gpr_sce,_lst_rto_gene,_lst_sce_gene,_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3,_all_three_diff]).T

In [25]:
# df3.to_csv('rto_sce_overlap_genes.csv')

In [26]:
# save output. 
# df3 = pd.DataFrame([_names,_id,_stoich,_subsystem,_gpr_rto,_gpr_sce,_gpr_ylip,_lst_rto_gene,_lst_sce_gene,_lst_ylip_gene,_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3,_all_three_diff]).T
# df3.to_csv('allcomparisions_genes.csv')

#### next 

Automate the GPR for each reaction and model. 
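As a simplified illustration of the idea, the sketch below builds one reaction's GPR string from its seven comparison categories and records which categories contributed. It is not the notebook's function: `build_gpr` and `SOURCE_LABELS` are hypothetical names, the gene IDs are invented, and it joins everything with a plain `or` without the `or;` marker used to flag possible "and" relationships.

```python
# Category order follows the comparison output: common to all, pairwise, then single-source.
SOURCE_LABELS = ["rto,sce,ylip", "rto,sce", "rto,ylip", "sce,ylip", "rto", "sce", "ylip"]


def build_gpr(gene_groups):
    """Join non-empty gene groups with 'or' and report which groups contributed."""
    parts, labels = [], []
    for label, genes in zip(SOURCE_LABELS, gene_groups):
        if genes:
            parts.append(" or ".join(genes))
            labels.append(label)
    return " or ".join(parts), " ; ".join(labels)


# One reaction: a gene supported by all three scaffolds, plus two from rto and ylip only.
gpr, sources = build_gpr([["lst_A"], [], ["lst_B", "lst_C"], [], [], [], []])
print(f"{gpr} (From: {sources})")
```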


In [27]:
def concatenate_elements(*lists):
    # Ensure that all lists have the same length by filling shorter lists with empty strings
    labels_list = []

        
    max_len = max(len(lst) for lst in lists)
    for lst in lists:
        while len(lst) < max_len:
            lst.append([''])
    
    # Concatenate lists
    result = []
    
    

    for i in range(max_len):
        combined_string = " or ".join(
            " or; ".join(sub_item.replace("None", "").replace("NA"," ") for sub_item in lst[i]) for lst in lists 
            if len(lst) > i and lst[i]
            )
        # potential alternative.
#         combined_string = ' or '.join(item.replace("None", "").replace("NA"," ") for sublist in nested_lists for subsublist in sublist for item in subsublist)

        result.append([combined_string])
        
    
        # Generate label for this concatenated string
        if combined_string:
            current_labels = []
            for j, lst in enumerate(lists):
                if len(lst) > 0 and lst[i]:
                    if j == 0:
                        current_labels.append("rto,sce,ylip")
                    elif j == 1:
                        current_labels.append("rto,sce")
                    elif j == 2:
                        current_labels.append("rto,ylip")
                    elif j == 3:
                        current_labels.append("sce,ylip")
                    elif j == 4:
                        current_labels.append("rto")
                    elif j == 5:
                        current_labels.append("sce")
                    elif j == 6:
                        current_labels.append("ylip")
            labels_list.append(" ; ".join(current_labels))
        else:
            labels_list.append('')    
            
            
            
            
#             current_labels = [f"list_{j+1}" for j, lst in enumerate(lists) if len(lst) > i and lst[i]]
#             labels_list.append(" or ".join(current_labels))
    
    return result,labels_list



In [28]:
# gene annotation for each ortholog and the species source of the ortholog. 
result,result_labels = concatenate_elements(_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3)


In [29]:
for r, label in zip(result, result_labels):
    print(f"{r[0]} (From: {label})")

Lipst1_1_73838 (From: rto,ylip)
Lipst1_1_103302 (From: rto,sce)
Lipst1_1_258919 (From: rto,sce,ylip)
Lipst1_1_63385 (From: rto,sce,ylip)
Lipst1_1_60832 or Lipst1_1_43727 (From: rto,sce,ylip ; rto,ylip)
Lipst1_1_68609 or Lipst1_1_70253 (From: rto,sce,ylip ; rto,sce)
Lipst1_1_3835 or Lipst1_1_57709 (From: sce ; ylip)
Lipst1_1_4166 or Lipst1_1_6739 (From: rto,sce,ylip ; rto)
Lipst1_1_47324 (From: rto,sce,ylip)
Lipst1_1_2102 or Lipst1_1_70967 (From: rto ; sce)
Lipst1_1_54023 (From: rto,sce,ylip)
Lipst1_1_68065 or Lipst1_1_47067 (From: rto ; ylip)
Lipst1_1_302882 (From: rto,sce,ylip)
Lipst1_1_102877 (From: rto,sce,ylip)
Lipst1_1_3600 or Lipst1_1_111440 (From: rto ; sce)
Lipst1_1_72751 (From: rto,sce,ylip)
Lipst1_1_102877 (From: rto,sce,ylip)
Lipst1_1_92627 or Lipst1_1_47067 (From: rto,sce,ylip ; sce,ylip)
Lipst1_1_4094 or Lipst1_1_2516 (From: rto,sce ; ylip)
Lipst1_1_69224 or Lipst1_1_1857 (From: rto,sce,ylip ; ylip)
Lipst1_1_72774 (From: rto,sce,ylip)
Lipst1_1_73707 (From: rto,sce,ylip)
Li

### Note - section above did not add to model - just obtained orthologs and new GPRs.

In [30]:
# save output. 
# df3 = pd.DataFrame([_names,_id,_stoich,_subsystem,_gpr_rto,_gpr_sce,_gpr_ylip,_lst_rto_gene,_lst_sce_gene,_lst_ylip_gene,_common_all,_only_12,_only_13,_only_23,_only_1,_only_2,_only_3,_all_three_diff,result,result_labels]).T
# df3.to_csv('allcomparisions_genes_withGPR_.csv')

In [31]:
# rxnID to new GPR.
__dict = dict(zip(_id,result))

In [32]:
# rxnID to organism source.
__dict2 = dict(zip(_id,result_labels))

Verify the model and add a reaction needed for growth on lipid compounds.

In [33]:
# reaction needed to enable growth for lipid compounds. 
model.add_reactions([rto.reactions.GAPP.copy()])

In [34]:
# examine the model genes for non-Lst genes.
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))

71

Check that each non-Lst gene is associated with a reaction, and remove the ones that are not.

In [35]:
wrong_g = []
wrong_rx = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g.append(g)
            wrong_rx.append(rx)
print(len(set(wrong_g)))
print(len(set(wrong_rx)))

71
21


71 genes are associated with 21 reactions that currently do not have Lst genes. 
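The same bookkeeping can be tried on a plain dictionary before running it on a model. In this sketch the gene-to-reaction mapping, the IDs, and `split_non_lst_genes` are all hypothetical; only the `Lipst1` prefix test mirrors the cell above.

```python
def split_non_lst_genes(gene_to_reactions, lst_prefix="Lipst1"):
    """Separate non-Lst genes into orphans (no reactions) and reaction-linked genes."""
    orphans = []
    linked = {}
    for gene_id, reactions in gene_to_reactions.items():
        if lst_prefix in gene_id:
            continue  # already an L. starkeyi gene
        if reactions:
            linked[gene_id] = set(reactions)
        else:
            orphans.append(gene_id)
    return orphans, linked


toy = {
    "Lipst1_x": {"RXN1"},
    "scaffold_1": {"RXN2", "RXN3"},
    "scaffold_2": {"RXN3"},
    "scaffold_3": set(),
}
orphans, linked = split_non_lst_genes(toy)
reactions_to_review = set().union(*linked.values())
print(orphans, len(linked), sorted(reactions_to_review))
```

Orphans are candidates for removal, while the linked genes point to the reactions whose GPRs still need an *L. starkeyi* assignment.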

In [36]:
set(wrong_rx)

{<Reaction ACRNtp at 0x2a1bc3a10>,
 <Reaction ALDD2ym at 0x15b811ad0>,
 <Reaction ATPS3m at 0x15bcd1890>,
 <Reaction C30CPT1 at 0x2a1bc32d0>,
 <Reaction C3STKR1er at 0x2a1bc1e10>,
 <Reaction C3STKR2er at 0x2a17cab10>,
 <Reaction CRNtp at 0x2a1bc8150>,
 <Reaction CYOO6m at 0x2a18c2f10>,
 <Reaction CYOR_u9m at 0x15b829810>,
 <Reaction DNTPPA at 0x2a1841d90>,
 <Reaction DRTPPD at 0x2a1b92f90>,
 <Reaction FA160COAabcp at 0x2a1ba3990>,
 <Reaction FA183COAabcp at 0x2a1920c90>,
 <Reaction FA240tp at 0x2a1a134d0>,
 <Reaction GAPP at 0x16a228f90>,
 <Reaction H2Otm at 0x2a1782c90>,
 <Reaction OIVD2m at 0x2a1bbb6d0>,
 <Reaction PMDPHT at 0x2a1bc2350>,
 <Reaction PRPPS at 0x2a1bba210>,
 <Reaction SERPTer at 0x2a13c8f50>,
 <Reaction TRDR at 0x2a1b93390>}

In [37]:
model_v01 = model.copy()

Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp_184yos8.lp
Reading time = 0.01 seconds
: 1898 rows, 4166 columns, 16794 nonzeros


In [38]:
model.reactions.get_by_id('ICLp')

0,1
Reaction identifier,ICLp
Name,Isocitrate lyase
Memory address,0x2a1395150
Stoichiometry,icit_x --> glx_x + succ_x  Isocitrate --> Glyoxylate + Succinate
GPR,Lipst1_1_112377
Lower bound,0.0
Upper bound,1000.0


## The cell block below replaces the Lst model reaction GPRs with orthoMCL-derived GPRs and adds reactions that were not in the model but are in rto and have orthologs.

It also keeps gapfilled reactions that enabled growth. 
Replace genes.
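One reading of the decision logic, written as pure functions so it can be checked without a model. The returned category names match the bookkeeping lists used in the notebook; the function names and example GPR strings are hypothetical.

```python
def strip_and_marker(gpr):
    """Remove ';' markers and report whether the rule needs an 'and' check."""
    return gpr.replace(";", ""), ";" in gpr


def classify_gpr_update(in_model, old_gpr, new_gpr, in_scaffold):
    """Decide what happens to a reaction given its old and orthoMCL-derived GPRs."""
    if in_model:
        if new_gpr:
            return "gpr_replaced"        # orthoMCL rule overrides the existing one
        if old_gpr:
            return "maintained_v01_gpr"  # no orthoMCL rule, keep the existing one
        return "no_gpr_bothMethods"      # neither method provides a rule
    if new_gpr and in_scaffold:
        return "added_to_model"          # copy the reaction from the scaffold
    return "not_in_model"


cleaned, needs_check = strip_and_marker("lst_A or; lst_B")
print(cleaned, needs_check)
print(classify_gpr_update(True, "9014", "", True))
```

Keeping the decision separate from the model mutation makes it easier to confirm that every reaction lands in exactly one category.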

In [39]:
len(model.reactions)

2083

In [40]:
# model.reactions.GAPP

In [41]:
if 'GAPP' in list(__dict.keys()):
    print('yes')

In [42]:
# list(__dict.keys())

In [43]:
# all the reactions added below come from the rto model. The sources of the Lst genes can come from rto, sce or ylip. 

and_checks = []
maintained_v01_gpr = []
gpr_replaced = []
no_gpr_bothMethods = []
added_to_model = []
not_in_model = []
added_to_model_source = []




# iterate through dict (reaction.id:gpr)
for r in list(__dict.keys()):

    # obtain the new gpr for this reaction only, so nothing carries over between iterations.
    raw_gpr = __dict[r][0]
    new_gpr = raw_gpr.replace(';','')

    # a semi-colon represents a potential "and" reaction for further validation.
    needs_and_check = ';' in raw_gpr

    # if the r is included in the model.
    if r in model.reactions:

        # obtain the reaction and its current gpr.
        model_rxn = model.reactions.get_by_id(r)
        old_gpr = model_rxn.gene_reaction_rule

        if needs_and_check:
            and_checks.append(r)

        if (r=='GAPP'):
            print('new',new_gpr,'old',old_gpr)

        # if there is a gene value from orthoMCL, replace the gpr.
        if new_gpr:
            gpr_replaced.append(r)
            model_rxn.gene_reaction_rule = new_gpr
            if ("Lipst1" not in new_gpr):
                print("new",r, new_gpr)

        # no new gpr: keep the old gpr (candidate for removal or review).
        elif old_gpr:
            maintained_v01_gpr.append(r)
            if ("Lipst1" not in old_gpr):
                print("old", r, old_gpr)

        # no gpr from either method.
        else:
            no_gpr_bothMethods.append(r)

    # reaction not in model, but in rto and with an orthoMCL gpr: add it.
    elif (new_gpr) and (r in rto.reactions):
        copy_rxn = rto.reactions.get_by_id(r).copy()
        model.add_reactions([copy_rxn])
        model.reactions.get_by_id(r).gene_reaction_rule = new_gpr

        if needs_and_check:
            and_checks.append(r)

        added_to_model.append(r)
        added_to_model_source.append(__dict2[r])

    # reaction not in model and not added.
    else:
        not_in_model.append(r)

old H2Otm 9014
old DNTPPA 15385
old PMDPHT 11513
old ACRNtp 10149
old CRNtp 10149
old FA240tp 9912
old FA160COAabcp 13167 and 9637
old DRTPPD 15506


In [44]:
maintained_v01_gpr

['H2Otm',
 'DNTPPA',
 'PMDPHT',
 'ACRNtp',
 'CRNtp',
 'FA240tp',
 'FA160COAabcp',
 'DRTPPD']

In [45]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))

105

In [46]:
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
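# NOTE: wrong_g is the list built in In [35]; use wrong_g2 to count the genes collected in this cell.
print(len(set(wrong_g)))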
print(len(set(wrong_rx2)))

12737 has no associated reaction
9688 has no associated reaction
12730 has no associated reaction
10848 has no associated reaction
9940 has no associated reaction
16359 has no associated reaction
11958 has no associated reaction
ATP8 has no associated reaction
14786 has no associated reaction
13842 has no associated reaction
9619 has no associated reaction
13759 has no associated reaction
11967 has no associated reaction
ATP9 has no associated reaction
15287 has no associated reaction
13424 has no associated reaction
15880 has no associated reaction
14912 has no associated reaction
13053 has no associated reaction
ATP6 has no associated reaction
13252 has no associated reaction
15589 has no associated reaction
10674 has no associated reaction
9080 has no associated reaction
13426 has no associated reaction
11145 has no associated reaction
16592 has no associated reaction
15900 has no associated reaction
9910 has no associated reaction
14014 has no associated reaction
11290 has no assoc

In [47]:
set(wrong_rx2)

{<Reaction ACRNtp at 0x2a1bc3a10>,
 <Reaction C30CPT1 at 0x2a1bc32d0>,
 <Reaction C3STKR1er at 0x2a1bc1e10>,
 <Reaction C3STKR2er at 0x2a17cab10>,
 <Reaction CRNtp at 0x2a1bc8150>,
 <Reaction CYOO6m at 0x2a18c2f10>,
 <Reaction CYOR_u9m at 0x15b829810>,
 <Reaction DNTPPA at 0x2a1841d90>,
 <Reaction DRTPPD at 0x2a1b92f90>,
 <Reaction FA160COAabcp at 0x2a1ba3990>,
 <Reaction FA183COAabcp at 0x2a1920c90>,
 <Reaction FA240tp at 0x2a1a134d0>,
 <Reaction GAPP at 0x16a228f90>,
 <Reaction H2Otm at 0x2a1782c90>,
 <Reaction OIVD2m at 0x2a1bbb6d0>,
 <Reaction PMDPHT at 0x2a1bc2350>,
 <Reaction SERPTer at 0x2a13c8f50>}

{<Reaction ACRNtp at 0x2a19ea1d0>,
 <Reaction C30CPT1 at 0x2a19e9a90>,
 <Reaction C3STKR1er at 0x2a08e0a50>,
 <Reaction C3STKR2er at 0x2a19e8b10>,
 <Reaction CRNtp at 0x2a19ea8d0>,
 <Reaction CYOO6m at 0x2a10856d0>,
 <Reaction CYOR_u9m at 0x2a19bbad0>,
 <Reaction DNTPPA at 0x2a19df810>,
 <Reaction DRTPPD at 0x2a19c5710>,
 <Reaction FA160COAabcp at 0x2a1092590>,
 <Reaction FA183COAabcp at 0x2a176b410>,
 <Reaction FA240tp at 0x2a1769c90>,
 <Reaction GAPP at 0x2a19ead90>,
 <Reaction H2Otm at 0x2a090d450>,
 <Reaction OIVD2m at 0x2a19a6d90>,
 <Reaction PMDPHT at 0x2a19e92d0>,
 <Reaction SERPTer at 0x2a0bc76d0>}

In [48]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))

44

In [49]:
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g)))
print(len(set(wrong_rx2)))

71
17


In [50]:
len(model.reactions)-2083

39

39 reactions were added relative to `model_v01` (2083 reactions).

In [51]:
model.reactions.get_by_id('ICLp')

0,1
Reaction identifier,ICLp
Name,Isocitrate lyase
Memory address,0x2a1395150
Stoichiometry,icit_x --> glx_x + succ_x  Isocitrate --> Glyoxylate + Succinate
GPR,Lipst1_1_112377
Lower bound,0.0
Upper bound,1000.0


In [52]:
maintained_v01_gpr

['H2Otm',
 'DNTPPA',
 'PMDPHT',
 'ACRNtp',
 'CRNtp',
 'FA240tp',
 'FA160COAabcp',
 'DRTPPD']

In [53]:
len(added_to_model)

39

In [54]:
model_v01.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,5.101e-05,0,0.00%
cu2_e,EX_cu2_e,3.219e-05,0,0.00%
fe3_e,EX_fe3_e,0.0003782,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.86%
k_e,EX_k_e,0.03591,0,0.00%
mg2_e,EX_mg2_e,0.003784,0,0.00%
mn2_e,EX_mn2_e,3.721e-05,0,0.00%
na1_e,EX_na1_e,0.0016,0,0.00%
nh4_e,EX_nh4_e,0.4047,0,0.00%
o2_e,EX_o2_e,2.898,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
3aap_c,DM_3aap_c,-0.04342,5,6.08%
4oglu_c,DM_4oglu_c,-8.54e-05,5,0.01%
amob_m,DM_amob_m,-0.0001482,15,0.06%
dad_5_m,DM_dad_5_m,-0.0001482,10,0.04%
co2_e,EX_co2_e,-3.126,1,87.51%
fe2_e,EX_fe2_e,-0.0002626,0,0.00%
h2o_e,EX_h2o_e,-4.514,0,0.00%
h_e,EX_h_e,-0.1403,0,0.00%
hco3_e,EX_hco3_e,-0.001536,1,0.04%
zymst_e,EX_zymst_e,-0.008273,27,6.25%


In [55]:
model.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,5.186e-05,0,0.00%
cu2_e,EX_cu2_e,3.272e-05,0,0.00%
fe3_e,EX_fe3_e,0.0003845,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.84%
k_e,EX_k_e,0.03651,0,0.00%
mg2_e,EX_mg2_e,0.003847,0,0.00%
mn2_e,EX_mn2_e,3.783e-05,0,0.00%
na1_e,EX_na1_e,0.001627,0,0.00%
nh4_e,EX_nh4_e,0.4114,0,0.00%
o2_e,EX_o2_e,2.846,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
3aap_c,DM_3aap_c,-0.04415,5,6.25%
4oglu_c,DM_4oglu_c,-8.683e-05,5,0.01%
amob_m,DM_amob_m,-0.0001507,15,0.06%
dad_5_m,DM_dad_5_m,-0.0001507,10,0.04%
co2_e,EX_co2_e,-3.078,1,87.16%
fe2_e,EX_fe2_e,-0.000267,0,0.00%
h2o_e,EX_h2o_e,-4.489,0,0.00%
h_e,EX_h_e,-0.1426,0,0.00%
hco3_e,EX_hco3_e,-0.001562,1,0.04%
zymst_e,EX_zymst_e,-0.008411,27,6.43%


In [56]:
model_remove_original_gpr = model.copy()

Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpaoqfjyvs.lp
Reading time = 0.01 seconds
: 1904 rows, 4244 columns, 17194 nonzeros


##### Removing the maintained reactions that were gap-filled leads to no growth.

In [57]:
# with model_remove_original_gpr:
for r in maintained_v01_gpr:
    rxn = model_remove_original_gpr.reactions.get_by_id(r)
    model_remove_original_gpr.remove_reactions([rxn])

In [58]:
model_remove_original_gpr.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
glc__D_e,EX_glc__D_e,0.488,6,100.00%
o2_e,EX_o2_e,0.122,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
ac_e,EX_ac_e,-0.732,2,50.00%
btd_RR_e,EX_btd_RR_e,-0.244,4,33.33%
co2_e,EX_co2_e,-0.488,1,16.67%
h2o_e,EX_h2o_e,-0.244,0,0.00%
h_e,EX_h_e,-0.732,0,0.00%


In [59]:
len(model_remove_original_gpr.reactions)

2114

In [60]:
len(model.reactions)

2122

In [61]:
len(model_v01.reactions)

2083

In [62]:
len(maintained_v01_gpr)

8

In [63]:
for r in (maintained_v01_gpr):
    print(r, model.reactions.get_by_id(r).gene_reaction_rule)

H2Otm 9014
DNTPPA 15385
PMDPHT 11513
ACRNtp 10149
CRNtp 10149
FA240tp 9912
FA160COAabcp 13167 and 9637
DRTPPD 15506


Summary of the above findings:

All 8 maintained reactions were originally gap-filled so that the model could function; removing them abolishes growth.

# look for other orthologs from individual rto genes potentially left out. 

In [64]:
# this cell block looks at the rto model and determines if there are other orthologs that can be added to the lst GSM. 
# previous code only looked at reactions that were in multiple models (i.e., in the rto and ylip model). 

# counter. 
count=0

# storage variables. 
ortholog_available = []
no_ortholog = []
no_rxn = []
genes = []


# obtain list of rto, model reactions.
rto_rxns = list(rto.reactions)
rto_rxns = [x.id for x in rto_rxns]
model_rxns = list(model.reactions)
model_rxns = [x.id for x in model_rxns]

# dictionary with keys of rto genes, values of Lst genes. 
lst_dict_keys = list(lst_rto_dict.keys())

# determine how many reactions were not added to the model. 
diff_rxns = set(rto_rxns) - set(model_rxns)
print(len(diff_rxns))

# check the gpr of the reactions. 
for r in diff_rxns:
    
    # rxn is the rto object.
    rxn = rto.reactions.get_by_id(r)

    # gpr holds the rto gene objects associated with the reaction. 
    gpr = rxn.genes
    
    
    # temporary storage for new orthologs. 
    temp_gpr = []

    # tracking orthologs. 
    count_ortho=0
    
    
    # determine the number of orthologs for the GPR.
    for g in gpr:
        
        # g is an individual rto gene in the reaction. 

        # evaluate if there are any orthologs lst_orthologs. 
        if g.id in lst_dict_keys:

            # obtain the lst gene ortholog.
            value = lst_rto_dict[g.id]

            # check if the ortholog is blank. 
            if value:
                temp_gpr.append(value)
                count_ortho+=1
                    
    # if there were no orthologs. 
    if count_ortho==0:
        no_ortholog.append(r)
    
    # there was at least one ortholog for the reaction. 
    elif count_ortho>0:
        # append the reaction to this list. 
        ortholog_available.append(r)
        # append the gpr to this list. 
        genes.append(temp_gpr)
        
        # make a GPR rule for the orthologs. 
        temp = " or ".join(sub_item.replace("None", "").replace("NA"," ") for sub_item in temp_gpr)
        
#         print(temp, '\n',temp_gpr)
        
        # add the reaction and GPR to the Lst model. 
        model.add_reactions([rxn.copy()])
        model.reactions.get_by_id(r).gene_reaction_rule = temp

        # track the number of reactions added. 
        count+=1



277


In [65]:
count

26

In [66]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))

112

In [67]:
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g)))
print(len(set(wrong_rx2)))

15762 has no associated reaction
13348 has no associated reaction
14170 has no associated reaction
12742 has no associated reaction
9700 has no associated reaction
10293 has no associated reaction
13228 has no associated reaction
11362 has no associated reaction
12752 has no associated reaction
11907 has no associated reaction
13813 has no associated reaction
12570 has no associated reaction
14805 has no associated reaction
11203 has no associated reaction
8885 has no associated reaction
12116 has no associated reaction
9274 has no associated reaction
10704 has no associated reaction
10705 has no associated reaction
14070 has no associated reaction
12867 has no associated reaction
15694 has no associated reaction
16404 has no associated reaction
9113 has no associated reaction
11702 has no associated reaction
14078 has no associated reaction
10607 has no associated reaction
13017 has no associated reaction
10017 has no associated reaction
8446 has no associated reaction
8444 has no ass

In [68]:
wrong_rx2 = [x.id for x in wrong_rx2]
(set(wrong_rx2))

{'ACRNtp',
 'C30CPT1',
 'C3STKR1er',
 'C3STKR2er',
 'CRNtp',
 'CYOO6m',
 'CYOR_u9m',
 'DNTPPA',
 'DRTPPD',
 'FA160COAabcp',
 'FA183COAabcp',
 'FA240tp',
 'GAPP',
 'H2Otm',
 'OIVD2m',
 'PMDPHT',
 'SERPTer'}

## above cell added 26 reactions that had orthologs but were not previously included. 
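
The ortholog lookup used in the cell above can be summarized as a small helper. This is a minimal sketch with plain Python containers, not the notebook's exact code: `build_ortholog_rule` and the gene IDs in `example_map` are hypothetical. Like the cell above, it joins every mapped gene with `or`, which discards any `and` (complex) logic present in the scaffold GPR; unlike the cell above, it skips blank, `None`, and `NA` entries instead of replacing them.

```python
def build_ortholog_rule(scaffold_genes, ortholog_map):
    """Map scaffold gene IDs to target orthologs and join them with ' or '.

    Returns an empty string when no gene has a usable ortholog.
    """
    mapped = []
    for gene_id in scaffold_genes:
        ortholog = ortholog_map.get(gene_id)
        # skip missing or blank entries, and avoid repeating an ortholog.
        if ortholog and ortholog not in ("None", "NA") and ortholog not in mapped:
            mapped.append(ortholog)
    return " or ".join(mapped)


# hypothetical rto -> Lst ortholog table (IDs are illustrative only).
example_map = {"9014": "Lipst1_1_000001", "15385": "", "10149": "Lipst1_1_000002"}

rule = build_ortholog_rule(["9014", "15385", "10149", "99999"], example_map)
print(rule)
```

An empty return value corresponds to the `no_ortholog` case, so the caller can decide whether to skip the reaction.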

In [69]:
len(no_ortholog)

251

In [70]:
len(ortholog_available)

26

In [71]:
len(model.reactions)

2148

In [72]:
len(model_v01.reactions)

2083

In [73]:
len(model.reactions)-2083

65

In [74]:
model.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,6.035e-05,0,0.00%
cu2_e,EX_cu2_e,3.808e-05,0,0.00%
fe3_e,EX_fe3_e,0.0004475,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.66%
k_e,EX_k_e,0.04248,0,0.00%
mg2_e,EX_mg2_e,0.004477,0,0.00%
mn2_e,EX_mn2_e,4.403e-05,0,0.00%
na1_e,EX_na1_e,0.001893,0,0.00%
nh4_e,EX_nh4_e,0.4788,0,0.00%
o2_e,EX_o2_e,2.329,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
3aap_c,DM_3aap_c,-0.05137,5,8.21%
4oglu_c,DM_4oglu_c,-0.000101,5,0.02%
amob_m,DM_amob_m,-0.0001754,15,0.08%
dad_5_m,DM_dad_5_m,-0.0001754,10,0.06%
co2_e,EX_co2_e,-2.599,1,83.12%
fe2_e,EX_fe2_e,-0.0003107,0,0.00%
h2o_e,EX_h2o_e,-4.242,0,0.00%
h_e,EX_h_e,-0.1659,0,0.00%
hco3_e,EX_hco3_e,-0.001818,1,0.06%
zymst_e,EX_zymst_e,-0.009788,27,8.45%


In [75]:
model_v01.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,5.101e-05,0,0.00%
cu2_e,EX_cu2_e,3.219e-05,0,0.00%
fe3_e,EX_fe3_e,0.0003782,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.86%
k_e,EX_k_e,0.03591,0,0.00%
mg2_e,EX_mg2_e,0.003784,0,0.00%
mn2_e,EX_mn2_e,3.721e-05,0,0.00%
na1_e,EX_na1_e,0.0016,0,0.00%
nh4_e,EX_nh4_e,0.4047,0,0.00%
o2_e,EX_o2_e,2.898,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
3aap_c,DM_3aap_c,-0.04342,5,6.08%
4oglu_c,DM_4oglu_c,-8.54e-05,5,0.01%
amob_m,DM_amob_m,-0.0001482,15,0.06%
dad_5_m,DM_dad_5_m,-0.0001482,10,0.04%
co2_e,EX_co2_e,-3.126,1,87.51%
fe2_e,EX_fe2_e,-0.0002626,0,0.00%
h2o_e,EX_h2o_e,-4.514,0,0.00%
h_e,EX_h_e,-0.1403,0,0.00%
hco3_e,EX_hco3_e,-0.001536,1,0.04%
zymst_e,EX_zymst_e,-0.008273,27,6.25%


# look at sce and ylip to determine individual reactions that could be added. 

The following cell does not add reactions; it only finds individual sce reactions that have Lst orthologs.

In [76]:
# this cell now looks at the sce model to find individual reactions that have an Lst gene ortholog. 
# this cell does not modify the model but rather tracks the reactions that have potential for additions. 

# track objects. 
sce_ortholog_available = []
no_ortholog = []
no_rxn = []
genes = []


# obtain list of sce and model reactions. 
sce_rxns = list(sce.reactions)
sce_rxns = [x.id for x in sce_rxns]
model_rxns = list(model.reactions)
model_rxns = [x.id for x in model_rxns]

# obtain keys of the sce:lst dictionary. Keys are the sce genes. 
sce_dict_keys = list(lst_sce_dict.keys())

# determine the reactions that have not yet been added to the model.
diff_rxns = set(sce_rxns) - set(model_rxns)

print(len(diff_rxns))

# iterate through the not added reactions.
for r in diff_rxns:
    
    # rxn is the sce reaction object.
    rxn = sce.reactions.get_by_id(r)

    # gpr are the sce reaction genes. 
    gpr = rxn.genes
    
    # holder for the Lst orthologs. 
    temp_gpr = []


    count_ortho=0
        
    # determine the number of orthologs for the GPR.
    for g in gpr:
        
        # g is an individual sce gene in the reaction. 

        # examine lst_orthologs. 
        if g.id in sce_dict_keys:

            # obtain the lst gene ortholog.
            value = lst_sce_dict[g.id]

            # check if the ortholog is blank. 
            if value:
                temp_gpr.append(value)
                count_ortho+=1
                    
    # if there were no orthologs. 
    if count_ortho==0:
        no_ortholog.append(r)
    
    # if at least one ortholog exists for a reaction, append it to the sce_ortholog_available list. 
    elif count_ortho>0:
        sce_ortholog_available.append(r)
        genes.append(temp_gpr)
        


3234


In [77]:
len(sce_ortholog_available)

1668

In [78]:
len(model.reactions)

2148

In [79]:
2148-2083

65

There is potential to add up to 1668 more reactions from sce. 

#### ylip orthologs. 

The following cell does not add reactions; it only finds ylip reactions that have Lst orthologs.


In [80]:
# this cell now looks at the ylip model to find individual reactions that have an Lst gene ortholog. 
# this cell does not modify the model but rather tracks the reactions that have potential for additions. 

# track objects.
ylip_ortholog_available = []
no_ortholog = []
no_rxn = []
genes = []

# obtain the ylip and model reaction lists. 
ylip_rxns = list(ylip.reactions)
ylip_rxns = [x.id for x in ylip_rxns]
model_rxns = list(model.reactions)
model_rxns = [x.id for x in model_rxns]

# obtain keys of the ylip:lst dictionary. Keys are the ylip genes.
ylip_dict_keys = list(lst_ylip_dict.keys())

# determine the reactions that are in the ylip GSM but not in the Lst GSM model. 
diff_rxns = set(ylip_rxns) - set(model_rxns)
print(len(diff_rxns))

# iterate through the reactions not included. 
for r in diff_rxns:
    
    # rxn is the ylip reaction object. 
    rxn = ylip.reactions.get_by_id(r)
    
    # gpr holds the ylip reaction genes. 
    gpr = rxn.genes
    
    # list object to hold the Lst orthologs. 
    temp_gpr = []

    count_ortho=0
        
    # determine the number of orthologs for the GPR.
    for g in gpr:
        
        # g is an individual ylip gene in the reaction. 

        # examine lst_orthologs. 
        if g.id in ylip_dict_keys:

            # obtain the lst gene ortholog.
            value = lst_ylip_dict[g.id]

            # check if the ortholog is blank. 
            if value:
                temp_gpr.append(value)
                count_ortho+=1
                    
    # if there were no orthologs. 
    if count_ortho==0:
        no_ortholog.append(r)
    
    # append reactions with at least one Lst ortholog to the ylip_ortholog_available list. 
    elif count_ortho>0:
        ylip_ortholog_available.append(r)
        genes.append(temp_gpr)
        
#         temp = " or ".join(sub_item.replace("None", "").replace("NA"," ") for sub_item in temp_gpr)


633


In [81]:
len(model.reactions)-2083

65

In [82]:
len(ylip_ortholog_available)

260

In [83]:
len(sce_ortholog_available)

1668

There are 260 potential ylip reactions that can be added. 

## The following section determines whether reactions in the sce or ylip GSMs are already in the Lst draft GSM under a different ID convention. 

i.e.

- ReacR1 = a+b --> c
- ReacR_1 = a+b --> c

The two reactions are identical, but the ID-based comparison in the previous section would have treated them as different. 

Now we match on reaction names instead of IDs to identify these reactions. 

i.e., 
- ReacR1 = a+b --> c : name = Reaction 1
- ReacR_1 = a+b --> c : name = Reaction 1
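
The difference between the two comparisons can be illustrated with a toy example. This is a sketch using plain dictionaries rather than COBRApy models; the reaction IDs and names are the placeholders from the lists above plus two hypothetical extras.

```python
# two toy reaction tables: reaction ID -> reaction name (all entries hypothetical).
scaffold = {"ReacR1": "Reaction 1", "ReacR2": "Reaction 2"}
draft = {"ReacR_1": "Reaction 1", "ReacR3": "Reaction 3"}

# ID-based comparison: every scaffold reaction looks new.
missing_by_id = set(scaffold) - set(draft)

# name-based comparison: index the draft by name, then look up scaffold names.
draft_id_by_name = {name: rxn_id for rxn_id, name in draft.items()}
same_reaction = {
    rxn_id: draft_id_by_name[name]
    for rxn_id, name in scaffold.items()
    if name in draft_id_by_name
}

print(sorted(missing_by_id))
print(same_reaction)
```

Here `ReacR1` is reported as missing by the ID comparison, while the name comparison pairs it with `ReacR_1` in the draft.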


In [84]:
# This cell gathers the reactions that have orthologs from the sce model that will be used for making a dictionary. 

# list holders for reaction details. 
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []


# gather the reaction details from the possible ortholog reactions. 
for rx in sce_ortholog_available:
    r = sce.reactions.get_by_id(rx)
    _names.append(r.name)
    _gpr.append(r.gene_reaction_rule)
    _id.append(r.id)
    _stoich.append(r.reaction)
    
# store the names here. 
sce_rxns_name = _names


In [85]:
# here we obtain the model reactions and names. 
model_rxns_name = model.reactions
model_rxns_id = [x.id for x in model_rxns_name]
model_rxns_name = [x.name for x in model_rxns_name] 


Now we check how many names match between the sce and model reactions. 

In [86]:
temp = []
for r in sce_rxns_name:
    if r in model_rxns_name:
        temp.append(r)
print(len(temp))

12


In [87]:
print(temp)

['C-3 sterol dehydrogenase (4-methylzymosterol)', '3-mercaptopyruvate sulfurtransferase', '3-methyl-2-oxobutanoate hydroxymethyltransferase', '2-keto-4-methylthiobutyrate transamination', 'C-14 sterol reductase', 'Phytosphingosine phosphate lyase', 'UDP-N-acetylglucosamine diphosphorylase', 'C-4 sterol methyl oxidase (4,4-dimethylzymosterol)', 'NAD kinase', 'GTP cyclohydrolase II', 'fatty-acid--CoA ligase (decanoate)', 'NAD diphosphatase']


## match reaction names so we can add orthologs even if reaction IDs do not match.

Many reaction names follow different writing conventions (e.g., spaces, dashes, capitalization). We will use the Python `re` library to strip those differences and try to match reactions. 
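
The cleanup steps applied in the cells below can be collected into one function. This is a hedged sketch, not the notebook's code: `normalize_name` and the `COMPARTMENT_SUFFIXES` tuple are hypothetical, and the suffix list is an assumption that would need to be extended for each scaffold model.

```python
import re

# compartment words stripped from the end of a name; this list is an assumption.
COMPARTMENT_SUFFIXES = (
    "mitochondrial",
    "mitochondria",
    "cytoplasmic",
    "cytoplasm",
    "peroxisomal",
)


def normalize_name(name):
    """Lower-case a name, drop non-alphanumerics and one trailing compartment word."""
    cleaned = re.sub(r"[^a-z0-9]", "", name.lower())
    for suffix in COMPARTMENT_SUFFIXES:
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
            break
    return cleaned


print(normalize_name("NAD kinase"))
print(normalize_name("nad-kinase, mitochondrial"))
```

Applying the same function to both the scaffold names and the model names keeps the two sides consistent, which is harder to guarantee when the replacement chains are copied between cells.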

In [88]:
import re


In [89]:
# replace the model reaction special characters and make all characters lower case. 
model_rxns_name = [x.replace(" ","").replace('-','').replace(':',"").replace(',',"").replace(')',"").replace(')',"").replace('\'',"").replace('mitochondrialmembrane',"").lower() for x in model_rxns_name]
model_rxns_name = [re.sub(r'[^a-zA-Z0-9]', '', x) for x in model_rxns_name]



In [90]:
len(sce_rxns_name)

1668

In [91]:
len(sce.reactions)

4131

In [92]:
# replace the sce reaction name special characters and make all characters lower case. 
sce_rxns_name = [re.sub(r' c$',"",x) for x in sce_rxns_name]
sce_rxns_name = [re.sub(r' v$',"",x) for x in sce_rxns_name]

sce_rxns_name = [x.replace(" ","").replace('-','').replace(':',"").replace(',',"").replace(')',"").replace(')',"").replace('\'',"").replace('+',"").replace('mitochondrialmembrane',"").lower() for x in sce_rxns_name]

sce_rxns_name = [re.sub(r'[^a-zA-Z0-9]', '', x) for x in sce_rxns_name]

# the yeast-8 scaffold model has many membrane/envelope compartments. 
# Here we will replace those names for initial matching purposes.
sce_rxns_name = [x.replace('mitochondrialmembrane',"").replace('golgimembrane',"").replace('ermembrane',"") for x in sce_rxns_name]
sce_rxns_name = [x.replace('cellenvelopecytoplasm',"").replace('vacuolarmembrane',"") for x in sce_rxns_name]
sce_rxns_name = [re.sub(r'mitochondrial$',"",x) for x in sce_rxns_name]
sce_rxns_name = [re.sub(r'mitochondria$',"",x) for x in sce_rxns_name]
sce_rxns_name = [re.sub(r'cellenvelope$',"",x) for x in sce_rxns_name]
sce_rxns_name = [re.sub(r'cytoplasm$',"",x) for x in sce_rxns_name]
sce_rxns_name = [re.sub(r'cytoplasmic$',"",x) for x in sce_rxns_name]



In [93]:
for r in sce_rxns_name:
    if 'mitochondria' in r:
        print(r)

mitochondrialalcoholdehydrogenase
glutamyltrnaglnamidotransferasesubunitbmitochondrialgluadtsubunitbec635cytochromecoxidaseassemblyfactorpet112
magnesiumtransportermrs2mitochondrialrnasplicingproteinmrs2
ironsulfurclusterassemblyprotein2mitochondrialironsulfurclusterscaffoldprotein2
putativelipoateproteinligaseaec63120alteredinheritancerateofmitochondriaprotein22


In [94]:
for r in sce_rxns_name:
    if 'cytoplasm' in r:
        print(r)

In [95]:
len(sce_rxns_name)

1668

potentially 1386 reactions can be added from sce. 

In [96]:

len(set(sce_rxns_name))

1386

however, many reactions have the same name. 
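
One way to see which names repeat, and how often, is `collections.Counter`. The sketch below uses a short hypothetical list in place of `sce_rxns_name`.

```python
from collections import Counter

# hypothetical normalized names; the real list is sce_rxns_name.
names = [
    "ipsphospholipasec",
    "nadkinase",
    "ipsphospholipasec",
    "citratesynthase",
    "nadkinase",
    "ipsphospholipasec",
]

counts = Counter(names)
# keep only names that occur more than once, most frequent first.
repeated = [(name, n) for name, n in counts.most_common() if n > 1]

print(len(names), len(counts))
print(repeated)
```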

In [97]:
phrase = 'ipsphospholipasec'
try:
    position = sce_rxns_name.index(phrase)
    positions = [i for i, x in enumerate(sce_rxns_name) if x == phrase]
    if positions:
        print(f"'{phrase}' found at positions {positions} in the list.")
#     print(f"'{phrase}' found at position {position} in the list.")
except ValueError:
    print(f"'{phrase}' not found in the list.")

'ipsphospholipasec' found at positions [54, 66, 84, 119, 135, 156, 166, 235, 300, 322, 325, 421, 429, 462, 534, 536, 553, 587, 593, 595, 617, 649, 682, 690, 716, 781, 815, 827, 947, 978, 980, 982, 1177, 1186, 1189, 1194, 1198, 1200, 1202, 1203, 1212, 1220, 1245, 1255, 1264, 1299, 1324, 1328, 1339, 1369, 1410, 1425, 1435, 1470, 1512, 1520, 1552, 1580, 1587, 1654] in the list.


## but first, let's check how many reactions are already in the model. 

88 reactions are already in the model and need updated GPRs. 

In [98]:
temp = []
for r in sce_rxns_name:
    if r in model_rxns_name:
        temp.append(r)
print(len(temp))

88


88 reactions are already in model and need updated GPRs for the sce genes. 

In [99]:
## this cell removes more of the unique identifiers from the model reaction names. 
model_rxns_name = [re.sub(r' c$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r' v$',"",x) for x in model_rxns_name]

model_rxns_name = [x.replace(" ","").replace('-','').replace(':',"").replace(',',"").replace(')',"").replace(')',"").replace('\'',"").replace('+',"").replace('mitochondrialmembrane',"").lower() for x in model_rxns_name]

model_rxns_name = [re.sub(r'[^a-zA-Z0-9]', '', x) for x in model_rxns_name]

# model_rxns_name = [x.replace('mitochondrialmembrane',"").replace('golgimembrane',"").replace('ermembrane',"") for x in model_rxns_name]
model_rxns_name = [x.replace('vacuolar',"").replace('vacuolarmembrane',"") for x in model_rxns_name]
model_rxns_name = [re.sub(r'mitochondrial$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'mitochondria$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cellenvelope$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cytoplasm$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cytoplasmic$',"",x) for x in model_rxns_name]

model_rxns_name = [re.sub(r'peroxisomal$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cytoplasmic$',"",x) for x in model_rxns_name]

# sce_rxns_name = [x.replace('mitochondrialmembrane',"").replace('golgimembrane',"").replace('ermembrane',"") for x in sce_rxns_name]

Correction: after the additional name cleanup, 102 unique reaction names match and need updated GPRs.

In [100]:
temp = []
for r in sce_rxns_name:
    if r in model_rxns_name:
        temp.append(r)
print(len(temp))
print(len(set(temp)))

110
102


In [101]:
# create a dictionary of the model rxn names to the original ids. 
model_dict_sce = dict(zip(model_rxns_name,model_rxns_id))
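
Note that `dict(zip(...))` keeps only the last ID when two reactions share a normalized name. A minimal sketch of an alternative that keeps every ID is shown below; the names and IDs are hypothetical stand-ins for `model_rxns_name` and `model_rxns_id`.

```python
from collections import defaultdict

# hypothetical normalized names and reaction IDs, aligned by position.
names = ["citratesynthase", "nadkinase", "citratesynthase"]
ids = ["CSm", "NADK", "CSp"]

# dict(zip(...)) keeps only the last ID for a repeated name.
last_only = dict(zip(names, ids))

# grouping keeps every ID that shares a normalized name.
ids_by_name = defaultdict(list)
for name, rxn_id in zip(names, ids):
    ids_by_name[name].append(rxn_id)

print(last_only["citratesynthase"])
print(ids_by_name["citratesynthase"])
```

Whether the grouped form is needed depends on how many model reaction names collide after normalization, which can be checked by comparing `len(names)` with `len(set(names))`.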

In [102]:
model_dict_sce

{'aldehydedehydrogenase2phenylethanolnadp': 'ALCD25yi',
 'methenyltetrahydrifikatecyclohydrolase': 'MTHFCm',
 'ampnucleosidase': 'AMPN',
 'diacylglycerolcholinephosphotransferaseyeastspecific': 'DAGCPTer_RT',
 'pyruvatetransportinviaprotonsymport': 'PYRt2',
 'nicotinatenucleotidediphosphorylasecarboxylating': 'NNDPR',
 'hydroxymethylglutarylcoasynthase': 'HMGCOAS',
 '35cyclicnucleotidephosphodiesterase': 'PDE3',
 'phosphoadenylylsulfatereductasethioredoxin': 'PAPSR',
 'acylcoaoxidasedecanoylcoa': 'ACOAO4p',
 '3isopropylmalate3methyltransferase': '3IPM3MT',
 'udpglucose6dehydrogenase': 'UDPGD',
 'cytidinekinasegtp': 'CYTDK2',
 'fattyacidcoaligasetetradecenoate': 'FACOAL141p',
 'acetylcoaacyltransferasetetradecanoylcoa': 'ACACT7p',
 'polyamineoxidase': 'POLYAO',
 'mevalonatekinaseatp': 'MEVK1',
 'datpuridine5phosphotransferase': 'DATUP',
 'taurinedioxygenase': 'TAUDO',
 'citratesynthase': 'CSm',
 'phosphoserinetransaminase': 'PSERT',
 'guanylatekinasegmpdatp': 'GK2',
 'lcysteinereversibl

In [103]:
# here we create two lists that have the model IDs from the sce scaffold and the modified reaction names. 

# holders for the names and IDs.
temp = []
temp2 = []
model_temp_id_sce = []

# iterate through the sce modified reaction names. 
for i,r in enumerate(sce_rxns_name):
    
    # check if the modified sce reaction name matches a modified model reaction name. 
    if r in model_rxns_name:
        
        # append the modified reaction name to temp.
        temp.append(r)
        
        # append the reaction _id to temp 2. 
        temp2.append(_id[i])
        
        # append the Lst model reaction ID that corresponds to the reaction name
        model_temp_id_sce.append(model_dict_sce[r])

# output should be 110 and 102. 
print(len(temp),len(temp2))
print(len(set(temp)))

110 110
102


In [104]:
# create a dictionary of the modified reaction names to the model reaction IDs.

# dictionary of modified reaction names to the model reaction ID. 
sce_additional_rxns_toModel = dict(zip(temp,model_temp_id_sce))

# dictionary of modified reaction names to the sce GSM reaction ID. 
sce_additional_rxns = dict(zip(temp,temp2))

### Repeat this process for *Y. lipolytica* GSM. 

Determine reactions in ylip that share names with the model reactions but have different IDs. 

In [105]:
# collect details of the ylip ortholog reactions so they can be validated.
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []

# gather the reaction details from the possible ortholog reactions. 
for rx in ylip_ortholog_available:
    r = ylip.reactions.get_by_id(rx)
    _names.append(r.name)
    _gpr.append(r.gene_reaction_rule)
    _id.append(r.id)
    _stoich.append(r.reaction)

# store the ylip GSM names here. 
ylip_rxns_name = _names


In [106]:
# gather the reaction names in the model.
# note - this variable name is re-used from the sce section and will be overwritten by running this cell. 
model_rxns_name = model.reactions
model_rxns_name = [x.name for x in model_rxns_name] 

In [107]:
# replace naming conventions that can be 'messy'. 
model_rxns_name = [re.sub(r' c$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r' v$',"",x) for x in model_rxns_name]
model_rxns_name = [x.replace(" ","").replace('-','').replace(':',"").replace(',',"").replace(')',"").replace(')',"").replace('\'',"").replace('+',"").replace('mitochondrialmembrane',"").lower() for x in model_rxns_name]

model_rxns_name = [re.sub(r'[^a-zA-Z0-9]', '', x) for x in model_rxns_name]

model_rxns_name = [x.replace('vacuolar',"").replace('vacuolarmembrane',"") for x in model_rxns_name]
model_rxns_name = [re.sub(r'mitochondrial$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'mitochondria$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cellenvelope$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cytoplasm$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cytoplasmic$',"",x) for x in model_rxns_name]

model_rxns_name = [re.sub(r'peroxisomal$',"",x) for x in model_rxns_name]
model_rxns_name = [re.sub(r'cytoplasmic$',"",x) for x in model_rxns_name]

# sce_rxns_name = [x.replace('mitochondrialmembrane',"").replace('golgimembrane',"").replace('ermembrane',"") for x in sce_rxns_name]

replace the ylip naming conventions that can be 'messy'.

In [108]:
ylip_rxns_name = [re.sub(r' c$',"",x) for x in ylip_rxns_name]
ylip_rxns_name = [re.sub(r' v$',"",x) for x in ylip_rxns_name]

ylip_rxns_name = [x.replace(" ","").replace('-','').replace(':',"").replace(',',"").replace(')',"").replace(')',"").replace('\'',"").replace('+',"").replace('mitochondrialmembrane',"").lower() for x in ylip_rxns_name]

ylip_rxns_name = [re.sub(r'[^a-zA-Z0-9]', '', x) for x in ylip_rxns_name]

ylip_rxns_name = [x.replace('mitochondrialmembrane',"").replace('golgimembrane',"").replace('ermembrane',"") for x in ylip_rxns_name]
ylip_rxns_name = [x.replace('cellenvelopecytoplasm',"").replace('vacuolarmembrane',"") for x in ylip_rxns_name]
ylip_rxns_name = [re.sub(r'mitochondrial$',"",x) for x in ylip_rxns_name]
ylip_rxns_name = [re.sub(r'mitochondria$',"",x) for x in ylip_rxns_name]
ylip_rxns_name = [re.sub(r'cellenvelope$',"",x) for x in ylip_rxns_name]
ylip_rxns_name = [re.sub(r'cytoplasm$',"",x) for x in ylip_rxns_name]
ylip_rxns_name = [re.sub(r'cytoplasmic$',"",x) for x in ylip_rxns_name]

# sce_rxns_name = [x.replace('mitochondrialmembrane',"").replace('golgimembrane',"").replace('ermembrane',"") for x in sce_rxns_name]

260 reactions have orthologs in the ylip model. 

In [109]:
# length of reactions with orthologs only in the ylip model.
len(ylip_rxns_name)

260

In [110]:
# create dictionary of the replaced model reaction names to the reaction IDs. 
model_dict = dict(zip(model_rxns_name,model_rxns_id))

In [111]:
# here we create lists of the modified reaction names, the ylip reaction IDs, and the matching Lst model IDs. 

# holders for the names and IDs.
temp = []
temp2 = []
model_temp_id = []

# iterate through the ylip modified reaction names. 
for i,r in enumerate(ylip_rxns_name):

    # check if the modified ylip reaction name matches a modified model reaction name. 
    if r in model_rxns_name:

        # append the modified reaction name to temp. 
        temp.append(r)
        
        # append the reaction _id to temp 2. 
        temp2.append(_id[i])
        
        # append the Lst model reaction ID that corresponds to the reaction name
        model_temp_id.append(model_dict[r])

print(len(temp),len(temp2))
print(len(set(temp)))

108 108
101


In [112]:
# create a dictionary of the modified reaction names to the model reaction IDs. 

# dictionary of modified reaction names to the model reaction ID. 
ylip_additional_rxns_toModel = dict(zip(temp,model_temp_id))

# dictionary of modified reaction names to the ylip GSM reaction ID. 
ylip_additional_rxns = dict(zip(temp,temp2))

In [113]:
# create a dictionary of the modified reaction names to the ylip GSM reaction IDs. 

ylip_additional_rxns = dict(zip(temp,temp2))

In [114]:
ylip_additional_rxns['isocitratelyase']

'ICL'

In [115]:
# this cell contains a modified function that was previously used to compare three lists. 
# this cell instead will take two lists of genes and determine if any of the genes overlap or are unique.
# the function returns the unique genes. 

# definition to compare the genes in common reactions. 
def compare_two_lists(list1, list2):

    # Convert lists to sets.
    set1, set2 = set(list1), set(list2)
    
    # Find all intersections and differences.
    common_all = set1 & set2
        
    only_1 = set1 - set2
    only_2 = set2 - set1
    
    # note: common_all, only_1 and only_2 already cover the union, so all_diff is always empty.
    all_diff = (set1 | set2 ) - common_all - only_1 - only_2
    
    # Construct verbose explanation.
    explanation = {}

    
    explanation['common_all'] = list(common_all)

    explanation['only_1'] = list(only_1)
    explanation['only_2'] = list(only_2)
    explanation['all_diff'] = list(all_diff)
    
    
    return explanation
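
For the GPR update that follows, the comparison boils down to one question: does the ortholog list contain genes the model reaction does not have yet? The sketch below is a compact, hypothetical variant (`merged_or_rule` is not part of the notebook) that answers this and builds the merged `or` rule directly. `Lipst1_1_112377` is the ICLp gene shown earlier; `Lipst1_1_000003` is an illustrative ID.

```python
def merged_or_rule(model_genes, ortholog_genes):
    """Return an ' or ' rule of model genes plus new ortholog genes.

    Returns None when the orthologs add nothing, so the caller can keep
    the existing gene reaction rule unchanged.
    """
    current, candidates = set(model_genes), set(ortholog_genes)
    new_genes = candidates - current
    if not new_genes:
        return None
    # sort for a reproducible rule string (set order is arbitrary).
    return " or ".join(sorted(current) + sorted(new_genes))


print(merged_or_rule(["Lipst1_1_112377"], ["Lipst1_1_112377", "Lipst1_1_000003"]))
print(merged_or_rule(["Lipst1_1_112377"], ["Lipst1_1_112377"]))
```

Sorting is a design choice for readability of the resulting rule; it does not change which genes are included.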



## This section uses the collected reaction names and modifies the Lst model GPR as needed. 

The following cell updates the model GPR for reactions that overlap with ylip.

In [116]:
# this cell updates the model GPR with the overlapping ylip reactions. 

# storage lists. 
# gene overlap in the ylip orthologs and the model genes for each reaction. 
_names = []
_gpr_sce = []
_gpr_ylip = []
_gpr_model = []

_lst_model_gene = []
_lst_sce_gene = []
_lst_ylip_gene = []

_id = []
_stoich = []
_subsystem = []
_lst_gene = []
_ylip_gene = []


consensus = []
consensus_count = []
consensus_source = []
_output = []


_common_all = []
_only_1 = []
_only_2 = []
_all_diff = []

# iterate through the modified reaction names that are in ylip and the Lst model. 
for rx in list(ylip_additional_rxns.keys()):
    
    # get actual GSM IDs.
    model_rx = ylip_additional_rxns_toModel[rx]
    ylip_rx = ylip_additional_rxns[rx]
    
    # obtain the reaction objects.
    m = model.reactions.get_by_id(model_rx)
    y = ylip.reactions.get_by_id(ylip_rx)
    
    # obtain the reaction GPRs.
    _gpr_model = (m.gene_reaction_rule)
    _gpr_ylip = (y.gene_reaction_rule)
    
#     # check. 
#     if y.id == 'ICL':
#         print(_gpr_ylip,_gpr_model)

    # obtain the reaction meta data.
    _names.append(m.name)
    _id.append(m.id)
    _stoich.append(m.reaction)
    _subsystem.append(m.subsystem)
    
    ## to do - insert check to make sure not adding multiple GPRs. 
    
    # Lst model current GPR.
    temp1 = []
    for g in m.genes:
        temp1.append(g.id)
    _lst_model_gene.append(temp1)
    if y.id == 'ICL':
        print(temp1)

    
    # ylip genes. 
    temp3 = []
    gpr = y.genes
    
    # iterate over each individual gene. 
    for g in gpr:

        # examine lst_orthologs. 
        if g.id in ylip_dict_keys:

            # obtain the lst gene ortholog.
            value = lst_ylip_dict[g.id]

            # check if the ortholog is blank. 
            if value:
                temp3.append(value)
#                 count_ortho+=1

    if y.id == 'ICL':
        print(temp3)
    _lst_ylip_gene.append(temp3)
    
    # call the function to compare the genes in each GSM reaction, which are stored as list temp1 and temp3. 
    result = compare_two_lists(temp1, temp3)
    
    if y.id == 'ICL':
        print(result)

    # assign the results here. 
    _common_all = (result['common_all'])
    _only_1 = (result['only_1'])
    _only_2 = (result['only_2'])
    _all_diff = (result['all_diff'])
    
    # append them into a single list.
    trifecta = []
    trifecta.append(_common_all)
    trifecta.append(_only_1)
    trifecta.append(_only_2)
    
    if y.id =='ICL':
        print(trifecta)   

    # if there is a new gene that needs to be added. 
    if result['only_2']:
        
        # join all genes (shared, model-only, and new) as a flat 'or' GPR rule. 
        # note: any 'and' logic in the existing rule is replaced by 'or'.
        temp_gpr = ' or '.join(item for sublist in trifecta for item in sublist)
    
    # else keep the original gene reaction rule. 
    else:
        temp_gpr = m.gene_reaction_rule

    if y.id == 'ICL':
        print(temp_gpr)
        print(y.gene_reaction_rule)

    # assign new GPR. 
    m.gene_reaction_rule = temp_gpr
    


['Lipst1_1_112377']
['Lipst1_1_112377', 'Lipst1_1_75138']
{'common_all': ['Lipst1_1_112377'], 'only_1': [], 'only_2': ['Lipst1_1_75138'], 'all_diff': []}
[['Lipst1_1_112377'], [], ['Lipst1_1_75138']]
Lipst1_1_112377 or Lipst1_1_75138
YALI0C16885g or YALI0F31999g
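
Because the merged rule is built from Python sets, the gene order in the new GPR can differ between runs. A minimal sketch of an order-preserving alternative follows, assuming that a flat `or` rule is acceptable (as in the cell above). `merge_or_gpr` is a hypothetical helper, not part of COBRApy. It also skips genes that are already present, which is one way to handle the to-do about not adding a gene twice.

```python
def merge_or_gpr(existing_genes, ortholog_genes):
    """Return a flat 'or' rule: existing genes first, then new orthologs."""
    merged = []
    for gene in list(existing_genes) + list(ortholog_genes):
        if gene and gene not in merged:  # skip blanks and duplicates
            merged.append(gene)
    return ' or '.join(merged)

# Gene IDs taken from the ICL check above.
icl_rule = merge_or_gpr(['Lipst1_1_112377'],
                        ['Lipst1_1_112377', 'Lipst1_1_75138'])
print(icl_rule)  # Lipst1_1_112377 or Lipst1_1_75138
```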


In [117]:
ylip

0,1
Name,model
Memory address,15a49dd50
Number of metabolites,1121
Number of reactions,1348
Number of genes,646
Number of groups,69
Objective expression,1.0*biomass_C - 1.0*biomass_C_reverse_c1d5c
Compartments,"c, e, m, n, x, r, g, v"


In [118]:
model.reactions.get_by_id('ICLp')

0,1
Reaction identifier,ICLp
Name,Isocitrate lyase
Memory address,0x2a1395150
Stoichiometry,icit_x --> glx_x + succ_x  Isocitrate --> Glyoxylate + Succinate
GPR,Lipst1_1_112377 or Lipst1_1_75138
Lower bound,0.0
Upper bound,1000.0


In [119]:
len(model.reactions)-2083

65

In [120]:
model.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,6.035e-05,0,0.00%
cu2_e,EX_cu2_e,3.808e-05,0,0.00%
fe3_e,EX_fe3_e,0.0004475,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.66%
k_e,EX_k_e,0.04248,0,0.00%
mg2_e,EX_mg2_e,0.004477,0,0.00%
mn2_e,EX_mn2_e,4.403e-05,0,0.00%
na1_e,EX_na1_e,0.001893,0,0.00%
nh4_e,EX_nh4_e,0.4788,0,0.00%
o2_e,EX_o2_e,2.329,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
3aap_c,DM_3aap_c,-0.05137,5,8.21%
4oglu_c,DM_4oglu_c,-0.000101,5,0.02%
amob_m,DM_amob_m,-0.0001754,15,0.08%
dad_5_m,DM_dad_5_m,-0.0001754,10,0.06%
co2_e,EX_co2_e,-2.599,1,83.12%
fe2_e,EX_fe2_e,-0.0003107,0,0.00%
h2o_e,EX_h2o_e,-4.242,0,0.00%
h_e,EX_h_e,-0.1659,0,0.00%
hco3_e,EX_hco3_e,-0.001818,1,0.06%
zymst_e,EX_zymst_e,-0.009788,27,8.45%


In [121]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))

44

In [122]:

wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))

44
17


In [123]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))

44
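
The same non-Lst gene check is repeated in several cells of this notebook. A small helper like the one below could replace them. It is a sketch on plain Python data: `gene_to_reactions` is a hypothetical dict standing in for `model.genes` and each gene's `.reactions`, and the example snapshot is invented for illustration using gene and reaction IDs that appear in this notebook's outputs.

```python
def find_foreign_genes(gene_to_reactions, prefix='Lipst1'):
    """Split non-Lst genes into those without and with associated reactions."""
    orphaned, in_use = [], {}
    for gene, reactions in gene_to_reactions.items():
        if prefix in gene:
            continue  # Lst gene, nothing to report
        if reactions:
            in_use[gene] = sorted(reactions)
        else:
            orphaned.append(gene)
    return orphaned, in_use

# Toy stand-in for the model (hypothetical snapshot).
example_genes = {
    'Lipst1_1_5446': ['ACITL'],
    'COX1': ['CYOO6m'],
    'COX2': ['CYOO6m'],
    'YALI0E04312g': [],
}
orphaned, in_use = find_foreign_genes(example_genes)
print(orphaned)
print(in_use)
```

Genes returned in `orphaned` are the ones the notebook cells pass to `remove_genes`; `in_use` lists the reactions that still need an Lst gene assignment.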

## This section uses the collected reaction names and modifies the Lst model GPR as needed. 
#### sce

The following cell updates the model GPR for sce reactions that overlap by reaction name. 

In [124]:
# this cell updates the model GPR with the overlapping sce reactions. 

# storage lists. 
# gene overlap in the sce orthologs and the model genes for each reaction. 
_names = []
_gpr_sce = []
_gpr_ylip = []
_gpr_model = []

_lst_model_gene = []
_lst_sce_gene = []
_lst_ylip_gene = []

_id = []
_stoich = []
_subsystem = []
_lst_gene = []
_ylip_gene = []


consensus = []
consensus_count = []
consensus_source = []
_output = []


_common_all = []

_only_1 = []
_only_2 = []
_all_diff = []

# iterate through the sce reactions that have orthologs and are also in the model reactions (based on name matching).
for rx in list(sce_additional_rxns.keys()):
    
    # get actual GSM IDs.
    model_rx = sce_additional_rxns_toModel[rx]
    sce_rx = sce_additional_rxns[rx]
    
    # obtain the reaction objects.
    m = model.reactions.get_by_id(model_rx)
    s = sce.reactions.get_by_id(sce_rx)
    
    # obtain the reaction GPRs.
    _gpr_model.append(m.gene_reaction_rule)
    _gpr_sce.append(s.gene_reaction_rule)

    # obtain the reaction meta data.
    _names.append(m.name)
    _id.append(m.id)
    _stoich.append(m.reaction)
    _subsystem.append(m.subsystem)
    
    ## to do - insert check to make sure not adding multiple GPRs. 
    
    # model current GPR.
    temp1 = []
    for g in m.genes:
        temp1.append(g.id)
    _lst_model_gene.append(temp1)
    

    # sce genes. 
    temp3 = []
    gpr = s.genes
    
    # iterate through the genes in the sce GSM. 
    for g in gpr:

        # examine lst_orthologs. 
        if g.id in sce_dict_keys:

            # obtain the lst gene ortholog.
            value = lst_sce_dict[g.id]

            # check if the ortholog is blank. 
            if value:
                temp3.append(value)
#                 count_ortho+=1

    _lst_sce_gene.append(temp3)
    

    # call comparing function for two lists. 
    result = compare_two_lists(temp1, temp3)

    # assign the results. 
    _common_all = (result['common_all'])
    _only_1 = (result['only_1'])
    _only_2 = (result['only_2'])
    _all_diff = (result['all_diff'])
    
    # concatenate into a single list. 
    trifecta = []
    trifecta.append(_common_all)
    trifecta.append(_only_1)
    trifecta.append(_only_2)
    
    # check. 
    if s.id == 'ICL':
        print("list:",_common_all,_only_2,_only_1)
        print("trifecta",trifecta)
    
    # if there is a new gene to add. 
    if result['only_2']:
        
        # concatenate all genes into a new flat 'or' GPR rule. 
        # note: any 'and' logic in the existing rule is replaced by 'or'.
        temp_gpr = ' or '.join(item for sublist in trifecta for item in sublist)
                
    # no new genes - keep original reaction gene reaction rule. 
    else:
        temp_gpr = m.gene_reaction_rule

    # check. 
    if s.id =='ICL':
        print(temp_gpr)

    # assign new GPR. 
    m.gene_reaction_rule = temp_gpr


list: ['Lipst1_1_112377'] [] ['Lipst1_1_75138']
trifecta [['Lipst1_1_112377'], ['Lipst1_1_75138'], []]
Lipst1_1_112377 or Lipst1_1_75138


In [125]:
len(model.reactions)

2148

In [126]:
2148-2083

65

In [127]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
print(len(set(wrong)))
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))

44
44
17


## Progress status.

We:
- created a scaffold GSM originally built from rto bidirectional best hits (bbh).
- compiled orthologs using orthoMCL.

Then we: 
- obtained the reactions present in multiple GSMs.
- concatenated GPRs using the unique genes in the overlapping reaction IDs.
- updated the Lst bbh model with the multi-species GPRs. 
- added rto reactions to the Lst model that were only in rto AND had orthologs. 
- matched ylip and sce reaction names to find other overlapping reactions (multi-species).
- modified the Lst model with multi-species GPRs. 

Next we will:
- determine ylip/sce reactions that have Lst orthologs that are NOT in the GSM (i.e., genes not already in the model). 
- identify reactions that include only metabolites already in the Lst GSM (such that we do not add orphan metabolites).
- screen reactions to make sure that there are no comparable reactions already in the Lst GSM.
- update GPRs with multi-species GPRs or add reactions as necessary. 

## This section examines the ylip GSM reactions and identifies those with Lst orthologs that are not yet included in the Lst model. 
ylip - individual reactions

The following cells find ylip reactions that have orthologs not yet added to the model and collect them in a dataframe. We then check whether every metabolite in each reaction is already in the model before adding it.

In [128]:
# first, obtain the Lst genes included in the Lst model.
model_genes = model.genes
model_genes = [x.id for x in model_genes]

In [129]:
# this code finds ylip reactions whose potential Lst orthologs have not been included in the model yet. 

# list storage. 
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []
gpr_holder = []

# obtain list of reactions that had orthologs but were not added.
# (ylip_ortholog_available was obtained from the ortholog section looking at individual models above).
# (ylip_additional_rxns was obtained by matching the reaction names to the Lst model).
new_ylip_only_reactions = set(ylip_ortholog_available)-set(list(ylip_additional_rxns.values()))

# iterate through reactions with Lst ortholog gene candidates. 
for rx in new_ylip_only_reactions:
    
    # obtain reaction in ylip GSM.
    r = ylip.reactions.get_by_id(rx)
    
 
    
    # counters and list tracker for new GPR. 
    count_ortho = 0
    temp_gpr = []
    
    # iterate through the genes in the ylip reaction.
    for g in r.genes:
        
        # obtain the ylip gene ID. 
        g = g.id
        
        # check if the particular gene has an Lst ortholog. 
        try:
            value = lst_ylip_dict[g]
            
            # if there is an ortholog and
            # the gene is not in the Lst model
            # (rx is always in new_ylip_only_reactions because we iterate over that set).
            if (value) and (value not in model_genes):
                
                # count how many of the genes have orthologs.
                count_ortho+=1
        
        # there is no ortholog. 
        except KeyError:
            continue
    
    # there is an ortholog for at least one gene associated with the reaction. 
    if count_ortho>0:
        # obtain reaction metadata.
        _names.append(r.name)
        _id.append(r.id)  
        _gpr.append(r.gene_reaction_rule)
        _stoich.append(r.reaction)
        
#         temp = " or ".join(sub_item.replace("None", "").replace("NA"," ") for sub_item in temp_gpr)
#         gpr_holder.append(temp)       


# this next section now looks at the reactions that had an ortholog. 
new_ylip_only_reactions2 = set(_id)-set(list(ylip_additional_rxns.values()))                             
                             
print(len(new_ylip_only_reactions2))

# re-initialize metadata storing lists. 
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []
gpr_holder = []

# iterate through the reactions that have orthologs. 
for rx in new_ylip_only_reactions2:
    # obtain reaction. 
    r = ylip.reactions.get_by_id(rx)

    # store reaction metadata. 
    _gpr.append(r.gene_reaction_rule)
    _stoich.append(r.reaction)
    _names.append(r.name)
    _id.append(r.id)
    
    # objects for tracking and storing progress.
    count_ortho = 0
    temp_gpr = []
    
    # iterate through reaction genes and collect every Lst ortholog. 
    for g in r.genes:

        # obtain the ortholog (skip genes without one).
        try:
            value = lst_ylip_dict[g.id]
        except KeyError:
            continue
        if (value):
            count_ortho+=1
            temp_gpr.append(value)
            
    # at least one gene in the reaction has an Lst ortholog.
    if count_ortho>0:
                
        # create a new GPR rule. 
        temp = " or ".join(sub_item.replace("None", "").replace("NA"," ") for sub_item in temp_gpr)
        gpr_holder.append(temp)

# create a dataframe with the candidate reactions. 
ylip_added_orthologs_orhpans = pd.DataFrame([_id,_names,_stoich,_gpr,gpr_holder]).T

24


In [130]:
ylip_added_orthologs_orhpans

Unnamed: 0,0,1,2,3,4
0,PSP_L,phosphoserine phosphatase L serine,h2o_c + pser__L_c --> pi_c + ser__L_c,YALI0B20438g,Lipst1_1_1740
1,FAS160ACPm,fatty acyl ACP synthase n C160ACP mitochondrial,3.0 h_m + malACP_m + myrsACP_m + 2.0 nadph_m -...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
2,ATPS3g,ATP synthase Golgi Apparatus,adp_g + 3.0 h_c + pi_g --> atp_g + h2o_g + 2.0...,YALI0A09900g and YALI0A11143g and YALI0B03982g...,Lipst1_1_149291 or Lipst1_1_69438 or Lipst1_1_...
3,DOLK,Dolichol kinase,ctp_c + dolichol_c --> cdp_c + dolp_c + h_c,YALI0E18942g,Lipst1_1_55005
4,PETHCT,phosphoethanolamine cytidyltransferase,ctp_c + ethamp_c + h_c --> cdpea_c + ppi_c,YALI0C06303g,Lipst1_1_4188
5,FAS161ACPm,fatty acyl ACP synthase n C161ACP mitochondrial,4.0 h_m + malACP_m + myrsACP_m + 3.0 nadph_m +...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
6,FAS80ACPm_L,fatty acyl ACP synthase n C80ACP mitochondrial...,acACP_m + 9.0 h_m + 3.0 malACP_m + 6.0 nadph_m...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
7,FAS140ACPm,fatty acyl ACP synthase n C140ACP mitochondrial,ddcaACP_m + 3.0 h_m + malACP_m + 2.0 nadph_m -...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
8,ACt2r,acetate reversible transport via proton symport,ac_e + h_e <=> ac_c + h_c,YALI0E04312g,Lipst1_1_7130
9,HBZOPT6,Hydroxybenzoate octaprenyltransferase,4hbz_c + octdp_5_c --> 3ophb_5_c + ppi_c,(YALI0E09276g and YALI0B16126g) or (YALI0E1366...,Lipst1_1_302018 or Lipst1_1_114557 or Lipst1_1...


In [131]:
# check one of the gene dictionaries: for PETHCT.
lst_ylip_dict['YALI0C06303g']

'Lipst1_1_4188'
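
The lookups above guard `lst_ylip_dict[g]` with `try`/`except`. A sketch of the same translation using `dict.get` follows. `translate_genes` is a hypothetical helper, and the small dictionary is a stand-in for `lst_ylip_dict`: two pairs are copied from this notebook's outputs, and the blank entry is invented for illustration.

```python
def translate_genes(gene_ids, ortholog_map):
    """Map scaffold gene IDs to Lst orthologs, dropping missing or blank hits."""
    orthologs = []
    for gene_id in gene_ids:
        value = ortholog_map.get(gene_id)  # None when the gene has no entry
        if value:
            orthologs.append(value)
    return orthologs

example_map = {
    'YALI0C06303g': 'Lipst1_1_4188',
    'YALI0B20438g': 'Lipst1_1_1740',
    'YALI0X00000g': '',  # hypothetical gene with a blank ortholog
}
hits = translate_genes(['YALI0C06303g', 'YALI0X00000g', 'YALI0Z99999g'],
                       example_map)
print(hits)  # ['Lipst1_1_4188']
```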

#### Metabolite determination. This section verifies that the reaction candidates contain only metabolites that are already included in the GSM. 


In [132]:
# obtain the metabolites in the Lst model. 
mets = model.metabolites
mets = [x.id for x in mets]

In [133]:
# this cell checks the metabolites in the reactions and ensures that they are all in the GSM. 

# reaction names.
react = ylip_added_orthologs_orhpans[0].to_list()
# new GPR.
gpr = ylip_added_orthologs_orhpans[4].to_list()

# list for tracking. 
orphan_ylip_reactions_actually_added = []

# iterate through the reactions.
for u,rx in enumerate(react):
    
    # obtain the ylip GSM reaction. 
    y = ylip.reactions.get_by_id(rx)
    
    # obtain a list of the metabolites. 
    gsm_mets = list(y.metabolites.keys())
    gsm_mets = [x.id for x in gsm_mets]
    
    # counter. 
    j=0
    
    # iterate through the metabolites.
    for i in range(len(gsm_mets)):
        
        # check if metabolite is in model.
        if gsm_mets[i] in mets:
            j+=1
        else:
#             print(gsm_mets[i])
            continue
    
    # check if all the reaction metabolites were in the model. 
    if j==len(gsm_mets):
        
        print('yes - ', rx)
#         r = y.copy()
#         r.gene_reaction_rule = gpr[u]

        # add to model. 
#         model.add_reactions([r])

        # store the current ylip reaction. 
        orphan_ylip_reactions_actually_added.append(y)

yes -  PSP_L
yes -  ACt2r
yes -  CYOOm
yes -  SPRMti
yes -  ATPCitL


In [134]:
for x in orphan_ylip_reactions_actually_added:
    print(x.id, x.name, x.reaction)

PSP_L phosphoserine phosphatase L serine h2o_c + pser__L_c --> pi_c + ser__L_c
ACt2r acetate reversible transport via proton symport ac_e + h_e <=> ac_c + h_c
CYOOm cytochrome c oxidase mitochondrial 4.0 focytc_m + 6.0 h_m + o2_m --> 4.0 ficytc_m + 2.0 h2o_m + 6.0 h_c
SPRMti spermine irreversible uniport sprm_e --> sprm_c
ATPCitL ATP Citrate Lyase YL atp_c + cit_c + coa_c --> accoa_c + adp_c + oaa_c + pi_c
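
The metabolite check counts matches one at a time; the same test can be written as a set difference, which also reports which metabolites are missing. The sketch below uses plain sets. `missing_metabolites`, `candidates`, and `model_met_ids` are hypothetical names; the PSP_L and DOLK metabolite IDs come from the candidate table above, while the model metabolite list is an invented subset for illustration (which metabolites are actually missing from the real model is not shown in this notebook).

```python
def missing_metabolites(candidate_mets, model_met_ids):
    """Return the metabolite IDs a candidate reaction needs that the model lacks."""
    return sorted(set(candidate_mets) - set(model_met_ids))

# Invented subset of model metabolite IDs (assumption, for illustration only).
model_met_ids = {'h2o_c', 'pser__L_c', 'pi_c', 'ser__L_c', 'ctp_c', 'cdp_c', 'h_c'}

candidates = {
    'PSP_L': ['h2o_c', 'pser__L_c', 'pi_c', 'ser__L_c'],
    'DOLK': ['ctp_c', 'dolichol_c', 'cdp_c', 'dolp_c', 'h_c'],
}
for rx_id, met_ids in candidates.items():
    gaps = missing_metabolites(met_ids, model_met_ids)
    print(rx_id, 'ok' if not gaps else f'missing: {gaps}')
```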


#### Here we validate and potentially add reactions to the Lst model. 

candidate reactions:
- ACt2r acetate reversible transport via proton symport ac_e + h_e <=> ac_c + h_c
- CYOOm cytochrome c oxidase mitochondrial 4.0 focytc_m + 6.0 h_m + o2_m --> 4.0 ficytc_m + 2.0 h2o_m + 6.0 h_c
- PSP_L phosphoserine phosphatase L serine h2o_c + pser__L_c --> pi_c + ser__L_c
- ATPCitL ATP Citrate Lyase YL atp_c + cit_c + coa_c --> accoa_c + adp_c + oaa_c + pi_c
- SPRMti spermine irreversible uniport sprm_e --> sprm_c

reaction #1.
- ACt2r acetate reversible transport via proton symport ac_e + h_e <=> ac_c + h_c


In [135]:
ylip.reactions.ACt2r

0,1
Reaction identifier,ACt2r
Name,acetate reversible transport via proton symport
Memory address,0x17feef1d0
Stoichiometry,ac_e + h_e <=> ac_c + h_c  Acetate + H <=> Acetate + H
GPR,YALI0E04312g
Lower bound,-1000.0
Upper bound,1000.0


In [136]:
for r in model.metabolites.ac_e.reactions:
    print(r, r.name)

EX_ac_e: ac_e -->  Acetate exchange
ACtr: ac_e <=> ac_c Acetate transporter


In [137]:
model.reactions.ACtr

0,1
Reaction identifier,ACtr
Name,Acetate transporter
Memory address,0x15b813dd0
Stoichiometry,ac_e <=> ac_c  Acetate <=> Acetate
GPR,Lipst1_1_102591
Lower bound,-1000.0
Upper bound,1000.0


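There are two candidate acetate transport reactions with different Lst genes: ACtr (already in the model, Lipst1_1_102591) and ACt2r, which is specified as a proton symport (Lipst1_1_7130). Both will be included in the GSM. 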
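
Comparable reactions were spotted by eye here. One possible heuristic, sketched below on plain Python data, is to compare metabolite sets after dropping protons. `comparable_reactions`, `core_metabolites`, and `PROTONS` are hypothetical names, and the heuristic itself is an assumption of this sketch, not the screening used in this notebook.

```python
# Proton IDs ignored in the comparison (assumed list; extend per compartment).
PROTONS = {'h_c', 'h_e', 'h_m'}

def core_metabolites(met_ids):
    """Metabolite IDs of a reaction with protons removed."""
    return frozenset(met_ids) - PROTONS

def comparable_reactions(candidate_mets, model_reactions):
    """IDs of model reactions with the same non-proton metabolites as the candidate."""
    target = core_metabolites(candidate_mets)
    return sorted(rx_id for rx_id, mets in model_reactions.items()
                  if core_metabolites(mets) == target)

# Metabolite IDs copied from the reaction strings in this notebook.
model_reactions = {
    'EX_ac_e': ['ac_e'],
    'ACtr': ['ac_e', 'ac_c'],
    'SPRMt2i': ['h_c', 'sprm_e', 'h_e', 'sprm_c'],
}
act2r_mets = ['ac_e', 'h_e', 'ac_c', 'h_c']
print(comparable_reactions(act2r_mets, model_reactions))  # ['ACtr']
```

A match only flags a reaction for manual review; whether the two reactions are kept side by side (as with ACtr and ACt2r) remains a curation decision.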

In [138]:
r_copy = ylip.reactions.ACt2r.copy()
x = ylip_added_orthologs_orhpans.loc[ylip_added_orthologs_orhpans[0]=='ACt2r'][4]

In [139]:
x.values

array(['Lipst1_1_7130'], dtype=object)

In [140]:
model.add_reactions([r_copy])
model.reactions.ACt2r.gene_reaction_rule = 'Lipst1_1_7130'

reaction #2. 
- CYOOm cytochrome c oxidase mitochondrial 4.0 focytc_m + 6.0 h_m + o2_m --> 4.0 ficytc_m + 2.0 h2o_m + 6.0 h_c

In [141]:
ylip.reactions.CYOOm

0,1
Reaction identifier,CYOOm
Name,cytochrome c oxidase mitochondrial
Memory address,0x2a09cdc10
Stoichiometry,4.0 focytc_m + 6.0 h_m + o2_m --> 4.0 ficytc_m + 2.0 h2o_m + 6.0 h_c  4.0 Ferrocytochrome_c + 6.0 H + O2 --> 4.0 Ferricytochrome_c + 2.0 H2O + 6.0 H
GPR,YALI0E10144g and YALI0E12628g and YALI0E16709g and YALI0E19723g and YALI0F03201g and YALI0F03567g...
Lower bound,0.0
Upper bound,1000.0


In [142]:
ylip.reactions.CYOOm.check_mass_balance()

{}

In [143]:
for r in model.metabolites.focytc_m.reactions:
    print(r, r.name)

CYOR_u9m: 2.0 ficytc_m + 2.0 h_m + q9h2_m --> 2.0 focytc_m + 4.0 h_c + q9_m Ubiquinol 6 cytochrome c reductase
CCP2_m: 2.0 focytc_m + h2o2_m + 2.0 h_m --> 2.0 ficytc_m + 2.0 h2o_m Cytochrome c peroxidase, mitochondrial
CYOO6m: 4.0 focytc_m + 8.0 h_m + o2_m --> 4.0 ficytc_m + 2.0 h2o_m + 4.0 h_c Cytochrome c oxidase Complex IV
D_LACDcm: 2.0 ficytc_m + lac__D_c --> 2.0 focytc_m + 2.0 h_c + pyr_c D lactate dehydrogenase  cytosolicmitochondrial
L_LACDcm: 2.0 ficytc_m + lac__L_c --> 2.0 focytc_m + 2.0 h_c + pyr_c L-Lactate dehydrogenase, cytosolic/mitochondrial


one comparable reaction (CYOO6m) is already in the Lst GSM. 

In [144]:
# model.reactions.CCP2_m

In [145]:
model.reactions.CYOO6m

0,1
Reaction identifier,CYOO6m
Name,Cytochrome c oxidase Complex IV
Memory address,0x2a18c2f10
Stoichiometry,4.0 focytc_m + 8.0 h_m + o2_m --> 4.0 ficytc_m + 2.0 h2o_m + 4.0 h_c  4.0 Ferrocytochrome c C42H53FeN8O6S2 + 8.0 H+ + Oxygen --> 4.0 Ferricytochrome c C42H52FeN8O6S2 + 2.0 H2O H2O + 4.0 H+
GPR,COX1 and COX2 and COX3 and 9165 and 13628 and 10685 and 12506 and 11352 and 13275 and 11769 and...
Lower bound,0.0
Upper bound,1000.0


In [146]:
model.reactions.CYOO6m.check_mass_balance()

{}

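These two reactions encode the same step (cytochrome c oxidase) and both are mass balanced, but they differ in proton stoichiometry: CYOOm consumes 6 h_m and releases 6 h_c, whereas CYOO6m consumes 8 h_m and releases 4 h_c. We keep the original CYOO6m stoichiometry and replace its GPR with the Lst orthologs. 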
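
To make the proton difference explicit, the two stoichiometries can be compared coefficient by coefficient. The sketch below uses plain dicts copied from the reaction strings above (negative for consumed, positive for produced). `stoich_diff` is a hypothetical helper; with COBRApy objects, equivalent dicts could presumably be built from each reaction's `metabolites` mapping, keyed by metabolite ID.

```python
def stoich_diff(stoich_a, stoich_b):
    """Return {metabolite: (coeff_a, coeff_b)} where the coefficients differ."""
    diff = {}
    for met in sorted(set(stoich_a) | set(stoich_b)):
        a, b = stoich_a.get(met, 0.0), stoich_b.get(met, 0.0)
        if a != b:
            diff[met] = (a, b)
    return diff

# Coefficients copied from the CYOOm (ylip) and CYOO6m (Lst model) reactions.
cyoom = {'focytc_m': -4.0, 'h_m': -6.0, 'o2_m': -1.0,
         'ficytc_m': 4.0, 'h2o_m': 2.0, 'h_c': 6.0}
cyoo6m = {'focytc_m': -4.0, 'h_m': -8.0, 'o2_m': -1.0,
          'ficytc_m': 4.0, 'h2o_m': 2.0, 'h_c': 4.0}
proton_diff = stoich_diff(cyoom, cyoo6m)
print(proton_diff)  # {'h_c': (6.0, 4.0), 'h_m': (-6.0, -8.0)}
```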


In [147]:
# r_copy = ylip.reactions.CYOOm.copy()
x = ylip_added_orthologs_orhpans.loc[ylip_added_orthologs_orhpans[0]=='CYOOm'][4]

In [148]:
x.values

array(['Lipst1_1_5671 or Lipst1_1_55045 or Lipst1_1_3895 or Lipst1_1_54407 or Lipst1_1_65503 or Lipst1_1_67449'],
      dtype=object)

In [149]:
model.reactions.CYOO6m.gene_reaction_rule = 'Lipst1_1_54407 or Lipst1_1_65503 or Lipst1_1_67449 or Lipst1_1_3895 or Lipst1_1_55045 or Lipst1_1_5671'

reaction #3.
- PSP_L phosphoserine phosphatase L serine h2o_c + pser__L_c --> pi_c + ser__L_c


In [150]:
ylip.reactions.PSP_L

0,1
Reaction identifier,PSP_L
Name,phosphoserine phosphatase L serine
Memory address,0x16a7ad610
Stoichiometry,h2o_c + pser__L_c --> pi_c + ser__L_c  H2O + O_Phospho_L_serine --> Phosphate + L_Serine
GPR,YALI0B20438g
Lower bound,0.0
Upper bound,1000.0


In [151]:
for r in model.metabolites.pser__L_c.reactions:
    print(r, r.name)

PSERT: 3php_c + glu__L_c --> akg_c + pser__L_c Phosphoserine transaminase
PSERtr: pser__L_c <=> pser__L_e Transport of O-Phospho-L-Serine, Diffusion


In [152]:
r_copy = ylip.reactions.PSP_L.copy()
x = ylip_added_orthologs_orhpans.loc[ylip_added_orthologs_orhpans[0]=='PSP_L'][4]

In [153]:
x.values

array(['Lipst1_1_1740'], dtype=object)

In [154]:
model.add_reactions([r_copy])
model.reactions.PSP_L.gene_reaction_rule = 'Lipst1_1_1740'

reaction #4. 
- ATPCitL ATP Citrate Lyase YL atp_c + cit_c + coa_c --> accoa_c + adp_c + oaa_c + pi_c

In [155]:
ylip.reactions.ATPCitL

0,1
Reaction identifier,ATPCitL
Name,ATP Citrate Lyase YL
Memory address,0x16a4b3ed0
Stoichiometry,atp_c + cit_c + coa_c --> accoa_c + adp_c + oaa_c + pi_c  ATP + Citrate + Coenzyme_A --> Acetyl_CoA + ADP + Oxaloacetate + Phosphate
GPR,YALI0E34793g and YALI0D24431g
Lower bound,0.0
Upper bound,1000.0


In [156]:
for r in model.metabolites.cit_c.reactions:
    print(r, r.name)

ACITL: atp_c + cit_c + coa_c --> accoa_c + adp_c + oaa_c + pi_c ATP-Citrate lyase
CITtam: cit_c + mal__L_m <=> cit_m + mal__L_c Citrate transport  mitochondrial
AKGCITtm: akg_c + cit_m --> akg_m + cit_c Dicarboxylate/tricarboxylate carrier (akg:cit), mitochondrial
CITtcm: cit_c + icit_m <=> cit_m + icit_c Citrate transport  mitochondrial


one reaction already included for ATP citrate lyase. 

In [157]:
model.reactions.ACITL

0,1
Reaction identifier,ACITL
Name,ATP-Citrate lyase
Memory address,0x15b8b9510
Stoichiometry,atp_c + cit_c + coa_c --> accoa_c + adp_c + oaa_c + pi_c  ATP C10H12N5O13P3 + Citrate + Coenzyme A --> Acetyl-CoA + ADP C10H12N5O10P2 + Oxaloacetate + Phosphate
GPR,Lipst1_1_5446
Lower bound,0.0
Upper bound,1000.0


In [158]:
x = ylip_added_orthologs_orhpans.loc[ylip_added_orthologs_orhpans[0]=='ATPCitL'][4]

In [159]:
x.values

array(['Lipst1_1_5446 or Lipst1_1_5447'], dtype=object)

update the ATP-citrate lyase GPR in the model. 

In [160]:
model.reactions.ACITL.gene_reaction_rule = 'Lipst1_1_5446 or Lipst1_1_5447'

reaction #5.
- SPRMti spermine irreversible uniport sprm_e --> sprm_c

In [161]:
ylip.reactions.SPRMti

0,1
Reaction identifier,SPRMti
Name,spermine irreversible uniport
Memory address,0x15bbc4510
Stoichiometry,sprm_e --> sprm_c  Spermine --> Spermine
GPR,YALI0A07447g or YALI0F03751g or YALI0C04477g or YALI0F07062g or YALI0E10483g or YALI0B17776g or...
Lower bound,0.0
Upper bound,1000.0


In [162]:
for r in model.metabolites.sprm_e.reactions:
    print(r, r.name)

SPRMt2i: h_c + sprm_e --> h_e + sprm_c Spermine transport via proton antiport irreversible
EX_sprm_e: sprm_e -->  Spermine C10H30N4 exchange


In [163]:
x = ylip_added_orthologs_orhpans.loc[ylip_added_orthologs_orhpans[0]=='SPRMti'][4]

In [164]:
x.values

array(['Lipst1_1_61594 or Lipst1_1_6033'], dtype=object)

In [165]:
model.reactions.get_by_id('SPRMt2i')

0,1
Reaction identifier,SPRMt2i
Name,Spermine transport via proton antiport irreversible
Memory address,0x2a13cbd90
Stoichiometry,h_c + sprm_e --> h_e + sprm_c  H+ + Spermine C10H30N4 --> H+ + Spermine C10H30N4
GPR,Lipst1_1_2256 or Lipst1_1_61026 or Lipst1_1_6033 or Lipst1_1_52551 or Lipst1_1_97209
Lower bound,0.0
Upper bound,1000.0


one ortholog (Lipst1_1_61594) is not yet in the existing transporter GPR; Lipst1_1_6033 is already included. 

we add the missing gene to the existing SPRMt2i GPR rather than adding SPRMti as a new reaction. 
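
The missing gene can also be found by comparing the gene IDs in the two rules. A minimal sketch with the standard `re` module follows; `genes_in_rule` is a hypothetical helper that treats every token other than `and`, `or`, and parentheses as a gene ID. The two rule strings are copied from the outputs above.

```python
import re

def genes_in_rule(rule):
    """Extract gene IDs from a GPR string, ignoring 'and', 'or', and parentheses."""
    tokens = re.findall(r'[^\s()]+', rule)
    return [t for t in tokens if t.lower() not in ('and', 'or')]

existing_rule = ('Lipst1_1_2256 or Lipst1_1_61026 or Lipst1_1_6033 '
                 'or Lipst1_1_52551 or Lipst1_1_97209')
candidate_rule = 'Lipst1_1_61594 or Lipst1_1_6033'

existing = genes_in_rule(existing_rule)
new_genes = [g for g in genes_in_rule(candidate_rule) if g not in existing]
updated_rule = ' or '.join(existing + new_genes)
print(new_genes)  # ['Lipst1_1_61594']
```

Building `updated_rule` this way keeps the existing gene order and only appends what is new, so the hand-typed rule in the next cell can be cross-checked against it.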

In [166]:
model.reactions.get_by_id('SPRMt2i').gene_reaction_rule = 'Lipst1_1_2256 or Lipst1_1_61026 or Lipst1_1_6033 or Lipst1_1_97209 or Lipst1_1_52551 or Lipst1_1_61594'

In [167]:
len(model.reactions)

2150

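In [168]:
len(model.reactions)-2083

67

2 reactions were added from ylip (ACt2r and PSP_L); the other 3 candidates (CYOOm, ATPCitL, SPRMti) updated the GPRs of existing reactions (CYOO6m, ACITL, SPRMt2i). 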

In [169]:
model.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,6.49e-05,0,0.00%
cu2_e,EX_cu2_e,4.095e-05,0,0.00%
fe3_e,EX_fe3_e,0.0004812,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.56%
k_e,EX_k_e,0.04569,0,0.00%
mg2_e,EX_mg2_e,0.004814,0,0.00%
mn2_e,EX_mn2_e,4.735e-05,0,0.00%
na1_e,EX_na1_e,0.002036,0,0.00%
nh4_e,EX_nh4_e,0.4596,0,0.00%
o2_e,EX_o2_e,2.356,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
4oglu_c,DM_4oglu_c,-0.0001087,5,0.02%
amob_m,DM_amob_m,-0.0001886,15,0.10%
dad_5_m,DM_dad_5_m,-0.0001886,10,0.06%
co2_e,EX_co2_e,-2.619,1,89.99%
fe2_e,EX_fe2_e,-0.0003342,0,0.00%
h2o_e,EX_h2o_e,-4.275,0,0.00%
h_e,EX_h_e,-0.1232,0,0.00%
hco3_e,EX_hco3_e,-0.001955,1,0.07%
zymst_e,EX_zymst_e,-0.01053,27,9.76%


In [171]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))

11317 has no associated reaction
11352 has no associated reaction
9165 has no associated reaction
13628 has no associated reaction
COX3 has no associated reaction
COX2 has no associated reaction
12506 has no associated reaction
COX1 has no associated reaction
11769 has no associated reaction
13275 has no associated reaction
13007 has no associated reaction
10685 has no associated reaction
12210 has no associated reaction
YALI0E04312g has no associated reaction
YALI0B20438g has no associated reaction
31
16


In [172]:
wrong_rx2 = [x.id for x in wrong_rx2]
set(wrong_rx2)

{'ACRNtp',
 'C30CPT1',
 'C3STKR1er',
 'C3STKR2er',
 'CRNtp',
 'CYOR_u9m',
 'DNTPPA',
 'DRTPPD',
 'FA160COAabcp',
 'FA183COAabcp',
 'FA240tp',
 'GAPP',
 'H2Otm',
 'OIVD2m',
 'PMDPHT',
 'SERPTer'}

## This section repeats the process for sce. The code takes the sce reactions that have orthologs and checks that the reaction contains metabolites in the Lst GSM.  

In [173]:
# this code finds sce reactions whose potential Lst orthologs may have been missed. 

# list objects for storing information. 
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []
gpr_holder = []


# obtain reactions that have orthologs but have not already been added to the GSM. 
# (sce_ortholog_available) is the complete list of sce reactions that have orthologs that aren't in rto or ylip.
# (sce_additional_rxns) is the list of sce reactions that had no ID match but could be matched by reaction name. 
new_sce_only_reactions = set(sce_ortholog_available)-set(list(sce_additional_rxns.values()))

# iterate through the reactions. 
for rx in new_sce_only_reactions:
    
    # obtain the sce reaction object.
    r = sce.reactions.get_by_id(rx)
    
    
    # objects for tracking. 
    count_ortho = 0
    temp_gpr = []
    
    # iterate over the sce reaction genes. 
    for g in r.genes:
        
        g = g.id
        
        # find the ortholog.
        try:
            value = lst_sce_dict[g]
            
            # verify that the ortholog exists and
            # is not in the model genes
            # (rx is always in new_sce_only_reactions because we iterate over that set).
            if (value) and (value not in model_genes):

                count_ortho+=1
        except KeyError:
            continue
    
    # if the reaction had at least one gene ortholog in Lst. 
    if count_ortho>0:
        # append reaction metadata. 
        _gpr.append(r.gene_reaction_rule)
        _stoich.append(r.reaction)        
        _names.append(r.name)
        _id.append(r.id)  
        
# obtain the sce reactions that have orthologs.      
new_sce_only_reactions2 = set(_id)-set(list(sce_additional_rxns.values()))                             
                             
print(len(new_sce_only_reactions2))


# reset the information holding lists. 
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []
gpr_holder = []

# iterate over the reactions. 
for rx in new_sce_only_reactions2:
    
    # obtain the sce reaction object. 
    r = sce.reactions.get_by_id(rx)
    
    # store reaction metadata. 
    _gpr.append(r.gene_reaction_rule)
    _stoich.append(r.reaction)
    _names.append(r.name)
    _id.append(r.id)
    
    # objects for tracking during loop. 
    count_ortho =0
    temp_gpr = []
    
    # iterate over the sce reaction genes and collect every Lst ortholog. 
    for g in r.genes:

        # obtain the Lst ortholog (skip genes without one). 
        try:
            value = lst_sce_dict[g.id]
        except KeyError:
            continue
        if (value):
            count_ortho+=1
            temp_gpr.append(value)

    # at least one gene in the reaction has an Lst ortholog.
    if count_ortho>0:
        
        # join multiple genes for a single reaction together using 'or' rules. 
        temp = " or ".join(sub_item.replace("None", "").replace("NA"," ") for sub_item in temp_gpr)
        gpr_holder.append(temp)

# pd.DataFrame() for storing the reactions.         
sce_added_orthologs_orhpans = pd.DataFrame([_id,_names,_stoich,_gpr,gpr_holder]).T

457


Potentially 457 sce reactions to add. 
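The ortholog lookup above can be sketched in isolation as a small helper. This is a minimal sketch, not the notebook's exact logic: the function name `orthologs_to_or_rule`, the example gene ids, and the choice to drop duplicates and placeholder values (`"None"`, `"NA"`) before joining are all assumptions made for illustration.

```python
def orthologs_to_or_rule(sce_gene_ids, ortholog_map, placeholders=("", "None", "NA")):
    """Join the Lst orthologs of the given sce genes into a flat 'or' rule."""
    lst_genes = []
    for gene_id in sce_gene_ids:
        ortholog = ortholog_map.get(gene_id)
        # skip genes with no ortholog or with a placeholder value.
        if ortholog is None or ortholog.strip() in placeholders:
            continue
        # keep only the first occurrence, preserving order.
        if ortholog not in lst_genes:
            lst_genes.append(ortholog)
    return " or ".join(lst_genes)


# hypothetical ortholog table; these gene pairs are illustrative only.
example_map = {"YAAA": "Lipst1_1_0001", "YBBB": "NA", "YCCC": "Lipst1_1_0002"}
rule = orthologs_to_or_rule(["YAAA", "YBBB", "YCCC", "YDDD"], example_map)
print(rule)  # Lipst1_1_0001 or Lipst1_1_0002
```

Note that a flat 'or' rule discards any 'and' structure (enzyme complexes) present in the sce rule, which is worth keeping in mind when the result is reviewed.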

## check candidate reactions to ensure that they include only metabolites already in the GSM. 
Metabolite checking section. 

In [174]:
# this cell checks the metabolites in each candidate reaction and keeps only
# the reactions whose metabolites are all already in the GSM. 

# reaction ids.
react = sce_added_orthologs_orhpans[0].to_list()

# Lst gene reaction rules built from the orthologs.
gpr = sce_added_orthologs_orhpans[4].to_list()

# list for storing reactions. 
orphan_sce_reactions_actually_added = []

# iterate over the reactions. 
for u,rx in enumerate(react):
    
    # grab the sce GSM reaction object.
    s = sce.reactions.get_by_id(rx)
    
    # grab the ids of the metabolites in the reaction.
    gsm_mets = [x.id for x in s.metabolites]
    
    # store reactions that have every metabolite in the Lst model.
    if all(m in mets for m in gsm_mets):
        r = s.copy()
        r.gene_reaction_rule = gpr[u]

        # add to model (disabled). 
#         model.add_reactions([r])
        # note: the original sce reaction `s` is stored here, not the copy `r`
        # that carries the Lst gene reaction rule. 
        orphan_sce_reactions_actually_added.append(s)

In [175]:
len(orphan_sce_reactions_actually_added)

29
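The metabolite check can also report which metabolites are missing, which helps when deciding whether a rejected reaction is worth a closer look. The following is a minimal sketch using plain sets; `missing_metabolites`, the `model_mets` contents, and the `example_rxn` entry are hypothetical and only illustrate the idea.

```python
def missing_metabolites(reaction_met_ids, model_met_ids):
    """Return the metabolite ids of a reaction that the target model lacks."""
    return sorted(set(reaction_met_ids) - set(model_met_ids))


# hypothetical metabolite ids of a target model.
model_mets = {"h_e", "h_c", "zn2_e", "zn2_c"}

# r2073_1 is taken from the notebook; example_rxn and xyz_c are made up.
candidates = {
    "r2073_1": ["h_e", "zn2_e", "h_c", "zn2_c"],
    "example_rxn": ["h_c", "xyz_c"],
}

report = {}
for rxn_id, met_ids in candidates.items():
    report[rxn_id] = missing_metabolites(met_ids, model_mets)
    status = "portable" if not report[rxn_id] else f"missing {report[rxn_id]}"
    print(rxn_id, status)
```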

In [176]:
for x in orphan_sce_reactions_actually_added:
    print(x.id, x.name, x.reaction)

r2073_1 Zn(2+) transport h_e + zn2_e <=> h_c + zn2_c
r_4162 Probable vacuolar amino acid transporter YPQ3 (PQ-loop repeat-containing protein 3) (Protein RTC2) (Restriction of telomere capping protein 2) his__L_v <=> his__L_c
r_4331 [1,4-(N-Acetyl-beta-D-glucosaminyl)]n glycanohydrolase chtn_c + h2o_c <=> acgam_c
GTHDH glutathione:dehydroascorbate oxidoreductase dhdascb_c + 2.0 gthrd_c <=> ascb__L_c + gthox_c
r_1837 L-histidine transport, mitochondrial his__L_c --> his__L_m
NTP3 nucleoside triphosphatase gtp_c + h2o_c --> gdp_c + h_c + pi_c
MNt2 Mn(2+) transport h_e + mn2_e <=> h_c + mn2_c
ARGt2m arginine transport arg__L_c + h_c --> arg__L_m + h_m
r_4592 Mg(2+) transport h_e + mg2_e <=> h_c + mg2_c
r_0438 ferrocytochrome-c:oxygen oxidoreductase focytc_m + 1.266 h_m + 0.25 o2_m --> ficytc_m + 0.5 h2o_m + 0.633 h_c
PSP_L phosphoserine phosphatase (L-serine) h2o_c + pser__L_c --> pi_c + ser__L_c
r_1202 L-isoleucine transport ile__L_v <=> ile__L_c
r_4163 Probable vacuolar amino acid transp

In [177]:
len(model.reactions)-2083

67

28 additional reactions added from sce.

## an added reaction leads to hco3 output in place of some co2. let's find which added reaction does so. 

In [178]:
model.summary()

Uptake
Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,6.49e-05,0,0.00%
cu2_e,EX_cu2_e,4.095e-05,0,0.00%
fe3_e,EX_fe3_e,0.0004812,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.56%
k_e,EX_k_e,0.04569,0,0.00%
mg2_e,EX_mg2_e,0.004814,0,0.00%
mn2_e,EX_mn2_e,4.735e-05,0,0.00%
na1_e,EX_na1_e,0.002036,0,0.00%
nh4_e,EX_nh4_e,0.4596,0,0.00%
o2_e,EX_o2_e,2.356,0,0.00%

Secretion
Metabolite,Reaction,Flux,C-Number,C-Flux
4oglu_c,DM_4oglu_c,-0.0001087,5,0.02%
amob_m,DM_amob_m,-0.0001886,15,0.10%
dad_5_m,DM_dad_5_m,-0.0001886,10,0.06%
co2_e,EX_co2_e,-2.619,1,89.99%
fe2_e,EX_fe2_e,-0.0003342,0,0.00%
h2o_e,EX_h2o_e,-4.275,0,0.00%
h_e,EX_h_e,-0.1232,0,0.00%
hco3_e,EX_hco3_e,-0.001955,1,0.07%
zymst_e,EX_zymst_e,-0.01053,27,9.76%


In [179]:
temp_model = model.copy()    

Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpgmjymr5r.lp
Reading time = 0.01 seconds
: 1905 rows, 4300 columns, 17560 nonzeros


In [180]:
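# baseline solution: zymst, hco3 and co2 exchange fluxes and the objective value. 
sol = model.optimize()
print(sol.fluxes['EX_zymst_e'],sol.fluxes['EX_hco3_e'],sol.fluxes['EX_co2_e'],sol.objective_value)

# add each candidate reaction to the model copy, one at a time, and re-optimize. 
for x in orphan_sce_reactions_actually_added:
    print(x.id, x.name, x.reaction)

    # the context manager reverts the addition when the block exits. 
    with temp_model:
        temp_model.add_reactions([x])
        sol = temp_model.optimize()
        print(sol.fluxes['EX_zymst_e'],sol.fluxes['EX_hco3_e'],sol.fluxes['EX_co2_e'],sol.objective_value)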
        

0.010526362525668749 0.0 2.6212240608171524 0.07800483549348808
r2073_1 Zn(2+) transport h_e + zn2_e <=> h_c + zn2_c
0.010526362525668768 0.0 2.6212240608171564 0.07800483549348822
r_4162 Probable vacuolar amino acid transporter YPQ3 (PQ-loop repeat-containing protein 3) (Protein RTC2) (Restriction of telomere capping protein 2) his__L_v <=> his__L_c
0.010526362525668721 0.0 2.6212240608171467 0.0780048354934879
r_4331 [1,4-(N-Acetyl-beta-D-glucosaminyl)]n glycanohydrolase chtn_c + h2o_c <=> acgam_c
0.010526362525668754 0.0 2.6212240608171533 0.0780048354934881
GTHDH glutathione:dehydroascorbate oxidoreductase dhdascb_c + 2.0 gthrd_c <=> ascb__L_c + gthox_c
0.010526362525668775 0.0019547621750490725 2.619269298642111 0.07800483549348829
r_1837 L-histidine transport, mitochondrial his__L_c --> his__L_m
0.010526362525668746 0.00781904870019627 2.6134050121169596 0.07800483549348809
NTP3 nucleoside triphosphatase gtp_c + h2o_c --> gdp_c + h_c + pi_c
0.010526362525668796 0.0019547621750490

Ignoring reaction 'PSP_L' since it already exists.


PSP_L phosphoserine phosphatase (L-serine) h2o_c + pser__L_c --> pi_c + ser__L_c
0.010526362525668829 0.001954762175049065 2.6192692986421227 0.0780048354934887
r_1202 L-isoleucine transport ile__L_v <=> ile__L_c
0.010526362525668706 0.001954762175049053 2.6192692986420947 0.07800483549348776
r_4163 Probable vacuolar amino acid transporter YPQ3 (PQ-loop repeat-containing protein 3) (Protein RTC2) (Restriction of telomere capping protein 2) lys__L_v <=> lys__L_c
0.010526362525668704 0.0 2.621224060817142 0.07800483549348775
GLYOX3 (R)-lactate hydro-lyase h_c + lac__D_c <=> h2o_c + mthgxl_c
0.01052636252566875 0.0 2.6212240608171533 0.07800483549348809
SERAT serine O-acetyltransferase accoa_c + ser__L_c --> acser_c + coa_c
0.010526362525668744 0.0 2.6212240608171524 0.07800483549348805
r_4275 Fe(II):NADP+ oxidoreductase 2.0 fe3_e + nadph_c <=> 2.0 fe2_c + h_c + nadp_c
0.018071505434529594 0.0 0.19936681848978102 0.13391756222557039
r_4589 Cu2(+) transport cu2_e + h_e <=> cu2_c + h_c
0.01

r_4275 Fe(II):NADP+ oxidoreductase 2.0 fe3_e + nadph_c <=> 2.0 fe2_c + h_c + nadp_c

Adding this reaction sharply reduces co2 outflux (from 2.62 to 0.20) and increases both zymst outflux (from 0.0105 to 0.0181) and the objective value (from 0.078 to 0.134).
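Reading the printed fluxes by eye is error-prone, so the comparison against the baseline can be sketched as a tolerance check. The values below are rounded from the screening output above for three of the reactions; the helper name `changed_quantities` and the tolerance of 1e-6 are assumptions.

```python
import math

# (EX_zymst_e, EX_hco3_e, EX_co2_e, objective), rounded from the printed output.
labels = ("EX_zymst_e", "EX_hco3_e", "EX_co2_e", "objective")
baseline = (0.010526, 0.0, 2.621224, 0.078005)
trials = {
    "r2073_1": (0.010526, 0.0, 2.621224, 0.078005),
    "GTHDH": (0.010526, 0.001955, 2.619269, 0.078005),
    "r_4275": (0.018072, 0.0, 0.199367, 0.133918),
}


def changed_quantities(trial, reference, abs_tol=1e-6):
    """Name the quantities that differ from the reference beyond abs_tol."""
    return [
        name
        for name, new, old in zip(labels, trial, reference)
        if not math.isclose(new, old, rel_tol=0.0, abs_tol=abs_tol)
    ]


flagged = {rxn_id: changed_quantities(v, baseline) for rxn_id, v in trials.items()}
for rxn_id, names in flagged.items():
    print(rxn_id, names)
```

Because flux distributions from an LP are not unique, a shift between hco3 and co2 with an unchanged objective may be an alternative optimum rather than an effect of the added reaction.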

Let's see if we can figure out what is going on by adding that reaction. We will create an output flux file for visualization on escher.io.
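A flux file of this kind can be sketched as a JSON object mapping reaction ids to flux values, on the assumption that Escher's reaction-data loader accepts that layout. The helper `write_escher_fluxes`, the output file name, and the near-zero threshold are hypothetical; the example values come from the r_4275 line of the screening output.

```python
import json
import os
import tempfile


def write_escher_fluxes(fluxes, path, threshold=1e-9):
    """Write a reaction id -> flux mapping as JSON, dropping near-zero fluxes."""
    data = {rxn_id: float(v) for rxn_id, v in fluxes.items() if abs(v) > threshold}
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2)
    return data


# example values taken from the r_4275 line of the screening output.
example_fluxes = {
    "EX_zymst_e": 0.018071505434529594,
    "EX_hco3_e": 0.0,
    "EX_co2_e": 0.19936681848978102,
}
out_path = os.path.join(tempfile.gettempdir(), "r_4275_fluxes.json")
written = write_escher_fluxes(example_fluxes, out_path)
print(sorted(written))  # ['EX_co2_e', 'EX_zymst_e']
```

With a cobra solution, `sol.fluxes.to_dict()` would supply the mapping in place of `example_fluxes`.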

In [181]:
for x in orphan_sce_reactions_actually_added:
    print(x)

r2073_1: h_e + zn2_e <=> h_c + zn2_c
r_4162: his__L_v <=> his__L_c
r_4331: chtn_c + h2o_c <=> acgam_c
GTHDH: dhdascb_c + 2.0 gthrd_c <=> ascb__L_c + gthox_c
r_1837: his__L_c --> his__L_m
NTP3: gtp_c + h2o_c --> gdp_c + h_c + pi_c
MNt2: h_e + mn2_e <=> h_c + mn2_c
ARGt2m: arg__L_c + h_c --> arg__L_m + h_m
r_4592: h_e + mg2_e <=> h_c + mg2_c
r_0438: focytc_m + 1.266 h_m + 0.25 o2_m --> ficytc_m + 0.5 h2o_m + 0.633 h_c
PSP_L: h2o_c + pser__L_c --> pi_c + ser__L_c
r_1202: ile__L_v <=> ile__L_c
r_4163: lys__L_v <=> lys__L_c
GLYOX3: h_c + lac__D_c <=> h2o_c + mthgxl_c
SERAT: accoa_c + ser__L_c --> acser_c + coa_c
r_4275: 2.0 fe3_e + nadph_c <=> 2.0 fe2_c + h_c + nadp_c
r_4589: cu2_e + h_e <=> cu2_c + h_c
CHOLtu: chol_c <=> chol_e
r_1086: atp_c + h2o_c + h_c --> adp_c + 2.0 h_v + pi_c
r_4587: ca2_e + h_e <=> ca2_c + h_c
r_1919: lys__L_c --> lys__L_m
r_4312: amet_c + carn_c <=> ahcys_c + ans_c + h_c
r_1220: tyr__L_v <=> tyr__L_c
Clt: cl_e <=> cl_c
SERD_D: ser__D_c --> nh4_c + pyr_c
GGCLUT2: gt

We can separate the candidate reactions above into a few categories:

transport (vacuole to cytoplasm) 5 rxns:
- r_4162: his__L_v <=> his__L_c
- r_1220: tyr__L_v <=> tyr__L_c
- r_4163: lys__L_v <=> lys__L_c
- r_1208: leu__L_v <=> leu__L_c
- r_1202: ile__L_v <=> ile__L_c

transport (cytoplasm to mitochondria) 3 rxns:
- r_1919: lys__L_c --> lys__L_m
- ARGt2m: arg__L_c + h_c --> arg__L_m + h_m
- r_1837: his__L_c --> his__L_m

transport (extracellular to cytoplasm) 8 rxns:
- r_4587: ca2_e + h_e <=> ca2_c + h_c
- r2073_1: h_e + zn2_e <=> h_c + zn2_c
- MNt2: h_e + mn2_e <=> h_c + mn2_c
- Clt: cl_e <=> cl_c
- CHOLtu: chol_c <=> chol_e
- r_4589: cu2_e + h_e <=> cu2_c + h_c
- r_4592: h_e + mg2_e <=> h_c + mg2_c
- r_4275: 2.0 fe3_e + nadph_c <=> 2.0 fe2_c + h_c + nadp_c

metabolic reactions 13 rxns: 
- r_4331: chtn_c + h2o_c <=> acgam_c -- verified, consolidated into CHTNASE. 
- r_4312: amet_c + carn_c <=> ahcys_c + ans_c + h_c
- GGCLUT2: gthrd_c --> 5oxpro_c + cgly_c
- NTP3: gtp_c + h2o_c --> gdp_c + h_c + pi_c
- r_4403: h2o_c + pppi_c --> pi_c + ppi_c
- GLYOX3: h_c + lac__D_c <=> h2o_c + mthgxl_c
- GTHDH: dhdascb_c + 2.0 gthrd_c <=> ascb__L_c + gthox_c -- verified, consolidated into DHAOX_c.
- SERD_D: ser__D_c --> nh4_c + pyr_c
- r_0438: focytc_m + 1.266 h_m + 0.25 o2_m --> ficytc_m + 0.5 h2o_m + 0.633 h_c
- SERAT: accoa_c + ser__L_c --> acser_c + coa_c
- r_1086: atp_c + h2o_c + h_c --> adp_c + 2.0 h_v + pi_c
- r_1085: atp_c + h2o_c + h_c --> adp_c + 2.0 h_g + pi_c (not added to model, same genes as vacuole ATPase).
- PSP_L: h2o_c + pser__L_c --> pi_c + ser__L_c
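A first pass at the grouping above can be sketched by reading the compartment suffix of each metabolite id in the reaction string. This is only a heuristic under the assumption that ids end in `_<single letter>`; the helper name `compartments` is hypothetical, and reactions such as r_0438 span two compartments while still being listed as metabolic, so manual review remains necessary.

```python
def compartments(reaction_string):
    """Collect the single-letter compartment suffixes found in a reaction string."""
    comps = set()
    for token in reaction_string.split():
        base, sep, suffix = token.rpartition("_")
        # metabolite ids look like 'his__L_v'; coefficients and arrows are skipped.
        if sep and base and len(suffix) == 1 and suffix.isalpha():
            comps.add(suffix)
    return comps


examples = {
    "r_4162": "his__L_v <=> his__L_c",
    "r_4587": "ca2_e + h_e <=> ca2_c + h_c",
    "SERAT": "accoa_c + ser__L_c --> acser_c + coa_c",
}
for rxn_id, rxn in examples.items():
    print(rxn_id, sorted(compartments(rxn)))
```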


In [182]:
sce.reactions.r_4331

0,1
Reaction identifier,r_4331
Name,"[1,4-(N-Acetyl-beta-D-glucosaminyl)]n glycanohydrolase"
Memory address,0x16aa2e990
Stoichiometry,chtn_c + h2o_c <=> acgam_c  Chitin + H2O H2O <=> N-Acetyl-D-glucosamine
GPR,YDR371W or YLR286C
Lower bound,-1000.0
Upper bound,1000.0


In [183]:
for m in model.metabolites.acgam_c.reactions:
    print(m)

ENGASE: h2o_c + s2l2n2m2m_c --> acgam_c + s2l2n2m2mn_c
CHTNASE: chtn_c + 2.0 h2o_c --> 3.0 acgam_c
ENGASE2: h2o_c + n2m2nm_c --> acgam_c + n2m2nmn_c


In [184]:
model.reactions.CHTNASE

0,1
Reaction identifier,CHTNASE
Name,CHTNASE
Memory address,0x2a184bc90
Stoichiometry,chtn_c + 2.0 h2o_c --> 3.0 acgam_c  Chitin + 2.0 H2O H2O --> 3.0 N-Acetyl-D-glucosamine
GPR,Lipst1_1_149679 or Lipst1_1_102785
Lower bound,0.0
Upper bound,1000.0


In [185]:
model.reactions.CHTNASE.check_mass_balance()

{}

In [186]:
sce.reactions.r_4331.check_mass_balance()

{'C': -16.0, 'H': -28.0, 'N': -2.0, 'O': -11.0}

In [187]:
sce.reactions.r_4331

0,1
Reaction identifier,r_4331
Name,"[1,4-(N-Acetyl-beta-D-glucosaminyl)]n glycanohydrolase"
Memory address,0x16aa2e990
Stoichiometry,chtn_c + h2o_c <=> acgam_c  Chitin + H2O H2O <=> N-Acetyl-D-glucosamine
GPR,YDR371W or YLR286C
Lower bound,-1000.0
Upper bound,1000.0


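sce reaction r_4331 is not mass balanced: its stoichiometry treats chtn as a single acgam unit (C8), whereas chtn in the model is a trimer (C24), as in the balanced CHTNASE. Instead of adding r_4331, add its genes to CHTNASE in the model.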
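The imbalance reported for r_4331 can be reproduced by hand. The sketch below assumes the elemental compositions shown in `FORMULAS`; they were chosen to be consistent with the two `check_mass_balance()` outputs above and were not read from the model files.

```python
from collections import Counter

# assumed elemental compositions (not read from the model files).
FORMULAS = {
    "chtn_c": {"C": 24, "H": 41, "N": 3, "O": 16},
    "h2o_c": {"H": 2, "O": 1},
    "acgam_c": {"C": 8, "H": 15, "N": 1, "O": 6},
}


def mass_imbalance(stoichiometry):
    """Sum element counts over a reaction (negative coefficient = substrate)."""
    total = Counter()
    for met_id, coeff in stoichiometry.items():
        for element, count in FORMULAS[met_id].items():
            total[element] += coeff * count
    return {element: n for element, n in total.items() if n != 0}


r_4331 = {"chtn_c": -1, "h2o_c": -1, "acgam_c": 1}
chtnase = {"chtn_c": -1, "h2o_c": -2, "acgam_c": 3}
print(mass_imbalance(r_4331))   # {'C': -16, 'H': -28, 'N': -2, 'O': -11}
print(mass_imbalance(chtnase))  # {}
```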

In [188]:
sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='r_4331']

Unnamed: 0,0,1,2,3,4
50,r_4331,"[1,4-(N-Acetyl-beta-D-glucosaminyl)]n glycanoh...",chtn_c + h2o_c <=> acgam_c,YDR371W or YLR286C,Lipst1_1_102785 or Lipst1_1_42376


gene Lipst1_1_42376 is not yet in the model. 

In [189]:
model.reactions.CHTNASE.gene_reaction_rule = 'Lipst1_1_149679 or Lipst1_1_102785 or Lipst1_1_42376'
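Hard-coding the combined rule invites copy-paste mistakes, so merging genes into an existing rule can be sketched as a helper. This is a minimal sketch that only handles flat 'or' rules; the function name `add_genes_to_or_rule` is hypothetical.

```python
def add_genes_to_or_rule(existing_rule, new_genes):
    """Append genes to a flat 'or' rule, skipping genes already present."""
    genes = [g.strip() for g in existing_rule.split(" or ") if g.strip()]
    for gene in new_genes:
        if gene not in genes:
            genes.append(gene)
    return " or ".join(genes)


current = "Lipst1_1_149679 or Lipst1_1_102785"
orthologs = ["Lipst1_1_102785", "Lipst1_1_42376"]
merged = add_genes_to_or_rule(current, orthologs)
print(merged)  # Lipst1_1_149679 or Lipst1_1_102785 or Lipst1_1_42376
```

The result could then be assigned to `gene_reaction_rule`; rules containing 'and' or parentheses would need a proper boolean parser instead.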

metabolic reaction #2.
- r_4312: amet_c + carn_c <=> ahcys_c + ans_c + h_c

In [190]:
sce.reactions.r_4312

0,1
Reaction identifier,r_4312
Name,S-adenosyl-L-methionine:carnosine N-methyltransferase
Memory address,0x16aa26ed0
Stoichiometry,amet_c + carn_c <=> ahcys_c + ans_c + h_c  S-Adenosyl-L-methionine + L-Carnosine <=> S-Adenosyl-L-homocysteine + Anserine + H+
GPR,YNL092W
Lower bound,-1000.0
Upper bound,1000.0


In [191]:
for r in model.metabolites.ahcys_c.reactions:
    print(r, r.name)

NNMT: amet_c + ncam_c --> 1mncam_c + ahcys_c Nicotinamide N methyltransferase
SALMCOM2: adrnl_c + amet_c --> ahcys_c + h_c + mepi_c S-Adenosyl-L-methionine:catechol O-methyltransferase
SALMCOM: amet_c + nrpphr_c --> ahcys_c + h_c + normete__L_c S-Adenosyl-L-methionine:catechol O-methyltransferase
TM1819Z1819Z: 3.0 amet_c + dghs1819Z1819Z_c --> 3.0 ahcys_c + dgts1819Z1819Z_c + 3.0 h_c Betaine lipid synthase (trimethylase) (18:1(9Z)/18:1(9Z))
DIPS: amet_c + caphis_c --> ahcys_c + cmaphis_c + h_c Diphthine synthase
TM1601819Z: 3.0 amet_c + dghs1601819Z_c --> 3.0 ahcys_c + dgts1601819Z_c + 3.0 h_c Betaine lipid synthase (trimethylase) (16:0/18:1(9Z))
AHCYStr: ahcys_c <=> ahcys_r S-Adenosyl-L-homocysteine intracellular diffusion
MALCOAMT: amet_c + malcoa_c --> ahcys_c + malcoame_c Malonyl-CoA methyltransferase
CCOAOMT: amet_c + caffcoa_c <=> ahcys_c + ferulcoa_c + h_c Caffeoyl-CoA O-methyltransferase
UPP3MT: 2.0 amet_c + uppg3_c --> 2.0 ahcys_c + dscl_c + h_c Uroporphyrinogen methyltransfer

The reaction is not in the model, so it is added. 

In [192]:
r_copy = sce.reactions.r_4312.copy()

In [193]:
sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='r_4312']

Unnamed: 0,0,1,2,3,4
361,r_4312,S-adenosyl-L-methionine:carnosine N-methyltran...,amet_c + carn_c <=> ahcys_c + ans_c + h_c,YNL092W,Lipst1_1_71580


In [194]:
model.add_reactions([r_copy])
model.reactions.r_4312.gene_reaction_rule = 'Lipst1_1_71580'

metabolic reaction #3
- GGCLUT2: gthrd_c --> 5oxpro_c + cgly_c


In [195]:
sce.reactions.GGCLUT2

0,1
Reaction identifier,GGCLUT2
Name,Glutathione-specific gamma-glutamylcyclotransferase (Gamma-GCG) (EC 4.3.2.-)
Memory address,0x16a9ec410
Stoichiometry,gthrd_c --> 5oxpro_c + cgly_c  Glutathione --> 5-Oxoproline + Cys-Gly
GPR,YER163C
Lower bound,0.0
Upper bound,1000.0


In [196]:
for r in model.metabolites.cgly_c.reactions:
    print(r, r.name)

CGLYt3_2: cgly_e + 2.0 h_e --> cgly_c + 2.0 h_c Cys-Gly transport in via proton symport
GTHRDH_syn: gthrd_c + h2o_c --> cgly_c + glu__L_c Glutathione hydralase (periplasmic)
GTMLT: ala__L_c + gthrd_c --> cgly_c + gluala_c G glutamyltransferase
AMPTASECG: cgly_c + h2o_c --> cys__L_c + gly_c Alanyl aminopeptidase (cys-gly)


In [197]:
for r in model.metabolites.get_by_id('5oxpro_c').reactions:
    print(r, r.name)

OPAH: 5oxpro_c + atp_c + 2.0 h2o_c --> adp_c + glu__L_c + h_c + pi_c 5-Oxoproline amidohydrolase (ATP-hydrolysing) (ir)


In [198]:
r_copy = sce.reactions.GGCLUT2.copy()
sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='GGCLUT2']

Unnamed: 0,0,1,2,3,4
412,GGCLUT2,Glutathione-specific gamma-glutamylcyclotransf...,gthrd_c --> 5oxpro_c + cgly_c,YER163C,Lipst1_1_47595


In [199]:
model.add_reactions([r_copy])
model.reactions.GGCLUT2.gene_reaction_rule = 'Lipst1_1_47595'

metabolic reaction #4.
- NTP3: gtp_c + h2o_c --> gdp_c + h_c + pi_c


In [200]:
sce.reactions.NTP3

0,1
Reaction identifier,NTP3
Name,nucleoside triphosphatase
Memory address,0x16a550810
Stoichiometry,gtp_c + h2o_c --> gdp_c + h_c + pi_c  GTP C10H12N5O14P3 + H2O H2O --> GDP C10H12N5O11P2 + H+ + Phosphate
GPR,YAL035W or YLL001W or YER005W
Lower bound,0.0
Upper bound,1000.0


In [201]:
for r in model.metabolites.gdp_c.reactions:
    print(r, r.name)

DOLPMTcer: dolp_c + gdpmann_c --> dolmanp_r + gdp_c Dolichyl phosphate D mannosyltransferase
GK2: datp_c + gmp_c <=> dadp_c + gdp_c Guanylate kinase  GMPdATP 
G16MT: doldpglcnacglcnacmanman_c + gdpmann_c --> doldpglcnacglcnacman_man_man_c + gdp_c + h_c Alpha-1,6-mannosyltransferase (doldpglcnacglcnacman(man)man)
NDPK1: atp_c + gdp_c <=> adp_c + gtp_c Nucleoside-diphosphate kinase (ATP:GDP)
DOLPMT: dolp_c + gdpmann_c --> dolmanp_c + gdp_c Dolichyl-phosphate beta-D-mannosyltransferase
UMPK4: gtp_c + ump_c <=> gdp_c + udp_c UMP kinase (GTP)
GK1: atp_c + gmp_c <=> adp_c + gdp_c Guanylate kinase (GMP:ATP)
CYTDK2: cytd_c + gtp_c --> cmp_c + gdp_c + h_c Cytidine kinase (GTP)
G13MT_U: gdpmann_c + 0.1 mpdol_U_c --> gdp_c + h_c + 0.1 m1mpdol_U_c Glycolipid 1,3-alpha-D-mannosyltransferase (uterus)
ATPATF3: gdp_c + gtp_c + h_c --> gp4g_c + pi_c ATP adenylyltransferase
GTPt2m: gdp_m + gtp_c + h_c --> gdp_c + gtp_m + h_m GTPGDP translocase  mitochondrial  electroneutral 
G12MT1: doldpglcnacglcnacman

In [202]:
r_copy = sce.reactions.NTP3.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='NTP3'][4]

In [203]:
x.values

array(['Lipst1_1_124163 or Lipst1_1_73932 or Lipst1_1_5321'], dtype=object)

In [204]:
model.add_reactions([r_copy])
model.reactions.NTP3.gene_reaction_rule = 'Lipst1_1_73932 or Lipst1_1_124163 or Lipst1_1_5321'

metabolic reaction #5.
- r_4403: h2o_c + pppi_c --> pi_c + ppi_c

In [205]:
sce.reactions.r_4403

0,1
Reaction identifier,r_4403
Name,Triphosphate phosphohydrolase
Memory address,0x16a8efa50
Stoichiometry,h2o_c + pppi_c --> pi_c + ppi_c  H2O H2O + Inorganic triphosphate --> Phosphate + Diphosphate
GPR,YHR201C or YDR452W
Lower bound,0.0
Upper bound,1000.0


In [206]:
for r in model.metabolites.pppi_c.reactions:
    print(r, r.name)

PPPIt2r: h_e + pppi_e <=> h_c + pppi_c Triphosphate reversible transport via symport


In [207]:
r_copy = sce.reactions.r_4403.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='r_4403'][4]

In [208]:
x.values

array(['Lipst1_1_72748'], dtype=object)

In [209]:
model.add_reactions([r_copy])
model.reactions.r_4403.gene_reaction_rule = 'Lipst1_1_72748'

metabolic reaction #6.
- GLYOX3: h_c + lac__D_c <=> h2o_c + mthgxl_c



In [210]:
sce.reactions.GLYOX3

0,1
Reaction identifier,GLYOX3
Name,(R)-lactate hydro-lyase
Memory address,0x16a709e90
Stoichiometry,h_c + lac__D_c <=> h2o_c + mthgxl_c  H+ + D-Lactate <=> H2O H2O + Methylglyoxal
GPR,YDR533C or YMR322C or YOR391C or YPL280W
Lower bound,-1000.0
Upper bound,1000.0


In [211]:
for r in model.metabolites.mthgxl_c.reactions:
    print(r, r.name)

ALR2: h_c + mthgxl_c + nadph_c --> acetol_c + nadp_c Aldose reductase (methylglyoxal)
LGTHL: gthrd_c + mthgxl_c --> lgt__S_c Lactoylglutathione lyase
LALDO3: h_c + mthgxl_c + nadph_c --> lald__L_c + nadp_c L-Lactaldehyde:NADP+ 1-oxidoreductase
LALDO2: h_c + mthgxl_c + nadph_c --> lald__D_c + nadp_c D-Lactaldehyde:NADP+ 1-oxidoreductase


In [212]:
r_copy = sce.reactions.GLYOX3.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='GLYOX3'][4]

In [213]:
x.values

array(['Lipst1_1_3208'], dtype=object)

In [214]:
model.add_reactions([r_copy])
model.reactions.GLYOX3.gene_reaction_rule = 'Lipst1_1_3208'

metabolic reaction #7.
- GTHDH: dhdascb_c + 2.0 gthrd_c <=> ascb__L_c + gthox_c


In [215]:
sce.reactions.GTHDH

0,1
Reaction identifier,GTHDH
Name,glutathione:dehydroascorbate oxidoreductase
Memory address,0x16a9f88d0
Stoichiometry,dhdascb_c + 2.0 gthrd_c <=> ascb__L_c + gthox_c  Dehydroascorbate + 2.0 Glutathione <=> L-Ascorbate + Oxidized glutathione
GPR,YKR076W
Lower bound,-1000.0
Upper bound,1000.0


In [216]:
for r in model.metabolites.gthox_c.reactions:
    print(r, r.name)

GRXR: grxox_c + 2.0 gthrd_c --> grxrd_c + gthox_c Glutaredoxin reductase
GTHOr: gthox_c + h_c + nadph_c <=> 2.0 gthrd_c + nadp_c Glutathione oxidoreductase
GTHPi: 2.0 gthrd_c + h2o2_c --> gthox_c + 2.0 h2o_c Glutathione peridoxase
DHAOX_c: dhdascb_c + 2.0 gthrd_c --> ascb__L_c + gthox_c + h_c Dehydroascorbate oxidoreductase


In [217]:
model.reactions.DHAOX_c

0,1
Reaction identifier,DHAOX_c
Name,Dehydroascorbate oxidoreductase
Memory address,0x2a191b950
Stoichiometry,dhdascb_c + 2.0 gthrd_c --> ascb__L_c + gthox_c + h_c  Dehydroascorbate + 2.0 Glutathione --> L-Ascorbate + Oxidized glutathione + H+
GPR,Lipst1_1_5188
Lower bound,0.0
Upper bound,1000.0


In [218]:
# r_copy = sce.reactions.GTHDH.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='GTHDH'][4]

In [219]:
x.values

array(['Lipst1_1_308673'], dtype=object)

In [220]:
model.reactions.DHAOX_c.check_mass_balance()

{}

In [221]:
sce.reactions.GTHDH.check_mass_balance()

{'charge': -1.0, 'H': -1.0}

GTHDH is the same reaction as DHAOX_c, apart from the h_c product that balances DHAOX_c, so only the ortholog gene is added to the DHAOX_c gene reaction rule.
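The comparison between the two reactions can be sketched by subtracting their stoichiometries. The coefficients below are transcribed from the reaction tables above; the helper name `stoichiometry_difference` is hypothetical.

```python
def stoichiometry_difference(rxn_a, rxn_b):
    """Return the metabolites whose coefficients differ between two reactions."""
    diff = {}
    for met_id in set(rxn_a) | set(rxn_b):
        delta = rxn_a.get(met_id, 0) - rxn_b.get(met_id, 0)
        if delta != 0:
            diff[met_id] = delta
    return diff


gthdh = {"dhdascb_c": -1, "gthrd_c": -2, "ascb__L_c": 1, "gthox_c": 1}
dhaox_c = {"dhdascb_c": -1, "gthrd_c": -2, "ascb__L_c": 1, "gthox_c": 1, "h_c": 1}
print(stoichiometry_difference(dhaox_c, gthdh))  # {'h_c': 1}
```

The single extra proton matches the `{'charge': -1.0, 'H': -1.0}` imbalance reported for GTHDH. The two reactions also differ in reversibility, which this comparison does not capture.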

In [222]:
model.reactions.DHAOX_c.gene_reaction_rule = 'Lipst1_1_5188 or Lipst1_1_308673'

metabolic reaction #8.
- SERD_D: ser__D_c --> nh4_c + pyr_c



In [223]:
sce.reactions.SERD_D

0,1
Reaction identifier,SERD_D
Name,D-serine ammonia-lyase
Memory address,0x16aa0c290
Stoichiometry,ser__D_c --> nh4_c + pyr_c  D-Serine --> Ammonium + Pyruvate
GPR,YGL196W
Lower bound,0.0
Upper bound,1000.0


In [224]:
for r in model.metabolites.ser__D_c.reactions:
    print(r, r.name)

DSERDHr: nadp_c + ser__D_c <=> 2amsa_c + h_c + nadph_c D-serine dehydrogenase
SER_Dt: ser__D_e --> ser__D_c D-serine transport


In [225]:
r_copy = sce.reactions.SERD_D.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='SERD_D'][4]

In [226]:
x.values

array(['Lipst1_1_1096'], dtype=object)

In [227]:
model.add_reactions([r_copy])
model.reactions.SERD_D.gene_reaction_rule = 'Lipst1_1_1096'

metabolic reaction #9.
- r_0438: focytc_m + 1.266 h_m + 0.25 o2_m --> ficytc_m + 0.5 h2o_m + 0.633 h_c



In [228]:
sce.reactions.r_0438

0,1
Reaction identifier,r_0438
Name,ferrocytochrome-c:oxygen oxidoreductase
Memory address,0x15b04a550
Stoichiometry,focytc_m + 1.266 h_m + 0.25 o2_m --> ficytc_m + 0.5 h2o_m + 0.633 h_c  Ferrocytochrome c C42H53FeN8O6S2 + 1.266 H+ + 0.25 Oxygen --> Ferricytochrome c C42H52FeN8O6S2 + 0.5 H2O H2O + 0.633 H+
GPR,(Q0045 and Q0250 and Q0275 and YDL067C and YEL039C and YGL187C and YGL191W and YHR051W and...
Lower bound,0.0
Upper bound,1000.0


In [229]:
for r in model.metabolites.focytc_m.reactions:
    print(r, r.name)

CYOR_u9m: 2.0 ficytc_m + 2.0 h_m + q9h2_m --> 2.0 focytc_m + 4.0 h_c + q9_m Ubiquinol 6 cytochrome c reductase
CCP2_m: 2.0 focytc_m + h2o2_m + 2.0 h_m --> 2.0 ficytc_m + 2.0 h2o_m Cytochrome c peroxidase, mitochondrial
CYOO6m: 4.0 focytc_m + 8.0 h_m + o2_m --> 4.0 ficytc_m + 2.0 h2o_m + 4.0 h_c Cytochrome c oxidase Complex IV
D_LACDcm: 2.0 ficytc_m + lac__D_c --> 2.0 focytc_m + 2.0 h_c + pyr_c D lactate dehydrogenase  cytosolicmitochondrial
L_LACDcm: 2.0 ficytc_m + lac__L_c --> 2.0 focytc_m + 2.0 h_c + pyr_c L-Lactate dehydrogenase, cytosolic/mitochondrial


In [230]:
model.reactions.CYOO6m.check_mass_balance()

{}

In [231]:
sce.reactions.r_0438.check_mass_balance()

{'charge': 0.367, 'H': 0.367}

In [232]:
# r_copy = sce.reactions.SERD_D.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='r_0438'][4]

In [233]:
x.values

array(['Lipst1_1_3895 or Lipst1_1_65274 or Lipst1_1_65503 or Lipst1_1_54407 or Lipst1_1_25868 or Lipst1_1_4546 or Lipst1_1_67449 or Lipst1_1_55045 or Lipst1_1_3870 or Lipst1_1_41423 or Lipst1_1_106947'],
      dtype=object)

In [234]:
temp = x.values
temp = [t.split(' or ') for t in temp]

In [235]:
temp2 = []
for g in model.reactions.CYOO6m.genes:
    temp2.append(g.id)

In [236]:
temp=temp[0]


In [237]:
set(temp)-set(temp2)

{'Lipst1_1_106947',
 'Lipst1_1_25868',
 'Lipst1_1_3870',
 'Lipst1_1_41423',
 'Lipst1_1_4546',
 'Lipst1_1_65274'}

In [238]:
set(temp2)-set(temp)

{'Lipst1_1_5671'}

In [239]:
model.reactions.CYOO6m.gene_reaction_rule

'Lipst1_1_54407 or Lipst1_1_65503 or Lipst1_1_67449 or Lipst1_1_3895 or Lipst1_1_55045 or Lipst1_1_5671'

r_0438 and CYOO6m both describe the cytochrome c oxidase step, so the two gene lists above need to be consolidated into CYOO6m. 
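The two set differences computed above can be reproduced directly from the rule strings. This is a minimal sketch; the helper name `gene_ids` is hypothetical, and the rule strings are copied from the outputs above.

```python
import re


def gene_ids(rule):
    """Extract gene ids from a rule, ignoring 'and', 'or' and parentheses."""
    return {tok for tok in re.findall(r"[^\s()]+", rule) if tok not in ("and", "or")}


cyoo6m_rule = (
    "Lipst1_1_54407 or Lipst1_1_65503 or Lipst1_1_67449 or "
    "Lipst1_1_3895 or Lipst1_1_55045 or Lipst1_1_5671"
)
r_0438_orthologs = (
    "Lipst1_1_3895 or Lipst1_1_65274 or Lipst1_1_65503 or Lipst1_1_54407 or "
    "Lipst1_1_25868 or Lipst1_1_4546 or Lipst1_1_67449 or Lipst1_1_55045 or "
    "Lipst1_1_3870 or Lipst1_1_41423 or Lipst1_1_106947"
)

only_orthologs = gene_ids(r_0438_orthologs) - gene_ids(cyoo6m_rule)
only_model = gene_ids(cyoo6m_rule) - gene_ids(r_0438_orthologs)
print(sorted(only_orthologs))
print(sorted(only_model))  # ['Lipst1_1_5671']
```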

In [240]:
# model.add_reactions([r_copy])
model.reactions.CYOO6m.gene_reaction_rule = 'Lipst1_1_3870 or Lipst1_1_54407 or Lipst1_1_65503 or Lipst1_1_25868 or Lipst1_1_4546 or Lipst1_1_55045 or Lipst1_1_106947 or Lipst1_1_3895 or Lipst1_1_41423 or Lipst1_1_65274 or Lipst1_1_67449 or Lipst1_1_5671'

metabolic reaction #10.
- SERAT: accoa_c + ser__L_c --> acser_c + coa_c



In [241]:
sce.reactions.SERAT

0,1
Reaction identifier,SERAT
Name,serine O-acetyltransferase
Memory address,0x16a57a6d0
Stoichiometry,accoa_c + ser__L_c --> acser_c + coa_c  Acetyl-CoA + L-Serine --> O-Acetyl-L-serine + Coenzyme A
GPR,YHR013C and YDL040C and YOR253W
Lower bound,0.0
Upper bound,1000.0


In [242]:
for r in model.metabolites.acser_c.reactions:
    print(r, r.name)

CYSS: acser_c + h2s_c --> ac_c + cys__L_c + h_c Cysteine synthase


In [243]:
r_copy = sce.reactions.SERAT.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='SERAT'][4]

In [244]:
x.values

array(['Lipst1_1_4043 or Lipst1_1_164418'], dtype=object)

In [245]:
model.add_reactions([r_copy])
model.reactions.SERAT.gene_reaction_rule = 'Lipst1_1_164418 or Lipst1_1_4043'

metabolic reactions #11 and #12.
- r_1086: atp_c + h2o_c + h_c --> adp_c + 2.0 h_v + pi_c
- r_1085: atp_c + h2o_c + h_c --> adp_c + 2.0 h_g + pi_c


In [246]:
sce.reactions.r_1086

0,1
Reaction identifier,r_1086
Name,"V-ATPase, vacuole"
Memory address,0x16a593c50
Stoichiometry,atp_c + h2o_c + h_c --> adp_c + 2.0 h_v + pi_c  ATP C10H12N5O13P3 + H2O H2O + H+ --> ADP C10H12N5O10P2 + 2.0 H+ + Phosphate
GPR,(YBR127C and YDL185W and YEL027W and YEL051W and YGR020C and YHR026W and YHR039C-A and YKL080W...
Lower bound,0.0
Upper bound,1000.0


In [247]:
for r in model.metabolites.h_v.reactions:
    print(r, r.name)

GLUt7: glu__L_v + h_v --> glu__L_c + h_c L glutamate transport out via proton symport  vacuolar
PSERDv_RT: h_v + 0.01 ps_RT_v --> co2_v + 0.01 pe_RT_v Phosphatidylserine decarboxylase  yeast specific  vacuolar
TYRt6: h_v + tyr__L_c --> h_c + tyr__L_v L tyrosine transport in via proton antiport  vacuolar 
TYRt7: h_v + tyr__L_v --> h_c + tyr__L_c L tyrosine transport out via proton symport  vacuolar
ATPS2v: atp_c + h2o_c + h_c --> adp_c + 2.0 h_v + pi_c Vacuolar H+-ATPase
ILEt7: h_v + ile__L_v --> h_c + ile__L_c L isoleucine transport out via proton symport  vacuolar
LEUt7: h_v + leu__L_v --> h_c + leu__L_c L leucine transport out via proton symport  vacuolar
LYSt6: h_v + lys__L_c --> h_c + lys__L_v L lysine transport in via proton antiport  vacuolar 
GLNt6: gln__L_c + h_v --> gln__L_v + h_c L glutamine transport in via proton antiport  vacuolar 
ASPt7: asp__L_v + h_v --> asp__L_c + h_c L aspartate transport out via proton symport  vacuolar
GTHRDabcv: atp_v + gthrd_c + h2o_v --> adp_v + 

In [248]:
model.reactions.ATPS2v.gene_reaction_rule

'Lipst1_1_27636 and Lipst1_1_4281 and Lipst1_1_103286 and Lipst1_1_4502 and Lipst1_1_72599 and Lipst1_1_71403 and Lipst1_1_149291 and Lipst1_1_67780 and Lipst1_1_4471 and Lipst1_1_2781 and Lipst1_1_60868 and Lipst1_1_69438 and Lipst1_1_102936 and Lipst1_1_64406'

In [249]:
r_copy = sce.reactions.r_1086.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='r_1086'][4]

In [250]:
x.values

array(['Lipst1_1_2781 or Lipst1_1_102936 or Lipst1_1_60868 or Lipst1_1_71403 or Lipst1_1_69438 or Lipst1_1_4502 or Lipst1_1_149291 or Lipst1_1_72599 or Lipst1_1_103286 or Lipst1_1_4471 or Lipst1_1_67780 or Lipst1_1_4281 or Lipst1_1_27636 or Lipst1_1_64406 or Lipst1_1_51333'],
      dtype=object)

In [251]:
temp = x.values
temp = [t.split(' or ') for t in temp]

In [252]:
temp

[['Lipst1_1_2781',
  'Lipst1_1_102936',
  'Lipst1_1_60868',
  'Lipst1_1_71403',
  'Lipst1_1_69438',
  'Lipst1_1_4502',
  'Lipst1_1_149291',
  'Lipst1_1_72599',
  'Lipst1_1_103286',
  'Lipst1_1_4471',
  'Lipst1_1_67780',
  'Lipst1_1_4281',
  'Lipst1_1_27636',
  'Lipst1_1_64406',
  'Lipst1_1_51333']]

In [253]:
temp = set(temp[0])
temp

{'Lipst1_1_102936',
 'Lipst1_1_103286',
 'Lipst1_1_149291',
 'Lipst1_1_27636',
 'Lipst1_1_2781',
 'Lipst1_1_4281',
 'Lipst1_1_4471',
 'Lipst1_1_4502',
 'Lipst1_1_51333',
 'Lipst1_1_60868',
 'Lipst1_1_64406',
 'Lipst1_1_67780',
 'Lipst1_1_69438',
 'Lipst1_1_71403',
 'Lipst1_1_72599'}

In [254]:
temp2 = []
for g in model.reactions.ATPS2v.genes:
    temp2.append(g.id)

In [255]:
temp-set(temp2)

{'Lipst1_1_51333'}

In [256]:
set(temp2)-temp

set()

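One extra gene, Lipst1_1_51333, was not included in the ATPS2v gene reaction rule; it is added below. Note that the existing rule joins the subunits with 'and', whereas the updated rule joins them with 'or'.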

In [257]:
# model.add_reactions([r_copy])
model.reactions.ATPS2v.gene_reaction_rule = 'Lipst1_1_71403 or Lipst1_1_51333 or Lipst1_1_103286 or Lipst1_1_27636 or Lipst1_1_149291 or Lipst1_1_72599 or Lipst1_1_2781 or Lipst1_1_4471 or Lipst1_1_67780 or Lipst1_1_4502 or Lipst1_1_102936 or Lipst1_1_69438 or Lipst1_1_4281 or Lipst1_1_60868 or Lipst1_1_64406'
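The ATPS2v rule printed earlier joins its genes with 'and', while the rule assigned here joins them with 'or'. The difference matters for knockout predictions, as the following sketch shows. It evaluates flat rules only; the helper name `flat_rule_active` is hypothetical, and three of the ATPS2v genes are used for brevity.

```python
def flat_rule_active(rule, knocked_out):
    """Evaluate a flat GPR (only 'and' or only 'or') given knocked-out genes."""
    if " and " in rule and " or " in rule:
        raise ValueError("mixed rules need a full boolean parser")
    if " and " in rule:
        # complex: every subunit must be present.
        return all(g.strip() not in knocked_out for g in rule.split(" and "))
    # isozymes: any one gene is sufficient.
    return any(g.strip() not in knocked_out for g in rule.split(" or "))


subunits = ["Lipst1_1_27636", "Lipst1_1_4281", "Lipst1_1_103286"]
knockout = {"Lipst1_1_4281"}
as_complex = flat_rule_active(" and ".join(subunits), knockout)
as_isozymes = flat_rule_active(" or ".join(subunits), knockout)
print(as_complex)   # False
print(as_isozymes)  # True
```

If the V-ATPase genes encode subunits of one complex, keeping 'and' and appending the new gene may be the more faithful choice; that decision is left to curation.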

metabolic reaction #12.
- r_1085: atp_c + h2o_c + h_c --> adp_c + 2.0 h_g + pi_c


In [258]:
sce.reactions.r_1085

0,1
Reaction identifier,r_1085
Name,"V-ATPase, Golgi"
Memory address,0x16a57b850
Stoichiometry,atp_c + h2o_c + h_c --> adp_c + 2.0 h_g + pi_c  ATP C10H12N5O13P3 + H2O H2O + H+ --> ADP C10H12N5O10P2 + 2.0 H+ + Phosphate
GPR,(YPR170W-B and YBR127C and YDL185W and YOR332W and YHR026W and YEL027W and YKL080W and YLR447C...
Lower bound,0.0
Upper bound,1000.0


In [259]:
for r in model.metabolites.h_g.reactions:
    print(r, r.name)

CERGT18g: cer6_18_g + udpg_g --> glccer_18_g + h_g + udp_g Ceramide glucosyltransferase  18C
MIPCS326g_RT: gdpmann_g + 0.01 ipc326_RT_g --> gdp_g + h_g + 0.01 mipc326_RT_g Mannose inositol phosphorylceramide synthase  ceramide 3  26C   yeast specific
NAt3_1g: h_g + na1_c <=> h_c + na1_g Sodium proton antiporter (H:NA is 1:1)
PSERDg_RT: h_g + 0.01 ps_RT_g --> co2_g + 0.01 pe_RT_g Phosphatidylserine decarboxylase  yeast specific  Golgi
MIPCS324g_RT: gdpmann_g + 0.01 ipc324_RT_g --> gdp_g + h_g + 0.01 mipc324_RT_g Mannose inositol phosphorylceramide synthase  ceramide 3  24C   yeast specific
IPC324PLCg_RT: h2o_g + 0.01 ipc324_RT_g --> cer3_24_g + h_g + mi1p__D_g Inositol phosphorylceramide  ceramide 3  24C  phospholipase C
MIPCS126g_RT: gdpmann_g + 0.01 ipc126_RT_g --> gdp_g + h_g + 0.01 mipc126_RT_g Mannose inositol phosphorylceramide synthase  ceramide 1  26C   yeast specific
UDPGLDCg: h_g + udpglcur_g --> co2_g + udpxyl_g UDP-D-glucuronate decarboxylase, Golgi apparatus
MIPC124PLCg_RT:

No ATPase reaction in the Golgi. 

In [260]:
r_copy = sce.reactions.r_1085.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='r_1085'][4]

In [261]:
x.values

array(['Lipst1_1_2781 or Lipst1_1_102936 or Lipst1_1_60868 or Lipst1_1_71403 or Lipst1_1_69438 or Lipst1_1_4502 or Lipst1_1_149291 or Lipst1_1_72599 or Lipst1_1_103286 or Lipst1_1_4471 or Lipst1_1_4281 or Lipst1_1_27636 or Lipst1_1_64406 or Lipst1_1_51333'],
      dtype=object)

In [262]:
temp = x.values
temp = [t.split(' or ') for t in temp]

In [263]:
temp

[['Lipst1_1_2781',
  'Lipst1_1_102936',
  'Lipst1_1_60868',
  'Lipst1_1_71403',
  'Lipst1_1_69438',
  'Lipst1_1_4502',
  'Lipst1_1_149291',
  'Lipst1_1_72599',
  'Lipst1_1_103286',
  'Lipst1_1_4471',
  'Lipst1_1_4281',
  'Lipst1_1_27636',
  'Lipst1_1_64406',
  'Lipst1_1_51333']]

In [264]:
temp = set(temp[0])
temp

{'Lipst1_1_102936',
 'Lipst1_1_103286',
 'Lipst1_1_149291',
 'Lipst1_1_27636',
 'Lipst1_1_2781',
 'Lipst1_1_4281',
 'Lipst1_1_4471',
 'Lipst1_1_4502',
 'Lipst1_1_51333',
 'Lipst1_1_60868',
 'Lipst1_1_64406',
 'Lipst1_1_69438',
 'Lipst1_1_71403',
 'Lipst1_1_72599'}

In [265]:
temp2 = []
for g in model.reactions.ATPS2v.genes:
    temp2.append(g.id)

In [266]:
temp-set(temp2)

set()

No differences from the vacuole ATPase genes, so r_1085 is not added to the model. 

In [267]:
model.reactions.ATPS

0,1
Reaction identifier,ATPS
Name,ATPase cytosolic
Memory address,0x15b7fdb10
Stoichiometry,atp_c + h2o_c --> adp_c + h_e + pi_c  ATP C10H12N5O13P3 + H2O H2O --> ADP C10H12N5O10P2 + H+ + Phosphate
GPR,Lipst1_1_3900 or Lipst1_1_73932
Lower bound,0.0
Upper bound,1000.0


In [268]:
rto.reactions.ATPS

0,1
Reaction identifier,ATPS
Name,ATPase cytosolic
Memory address,0x15a0ef290
Stoichiometry,atp_c + h2o_c --> adp_c + h_e + pi_c  ATP C10H12N5O13P3 + H2O H2O --> ADP C10H12N5O10P2 + H+ + Phosphate
GPR,13617 or 15584
Lower bound,0.0
Upper bound,1000.0


The GPR of the cytosolic ATPase may need a closer look. 

In [269]:
# model.add_reactions([r_copy])
# model.reactions.ATPS2v.gene_reaction_rule = 'Lipst1_1_71403 or Lipst1_1_51333 or Lipst1_1_103286 or Lipst1_1_27636 or Lipst1_1_149291 or Lipst1_1_72599 or Lipst1_1_2781 or Lipst1_1_4471 or Lipst1_1_67780 or Lipst1_1_4502 or Lipst1_1_102936 or Lipst1_1_69438 or Lipst1_1_4281 or Lipst1_1_60868 or Lipst1_1_64406'

metabolic reaction #13.
- PSP_L: h2o_c + pser__L_c --> pi_c + ser__L_c


In [270]:
sce.reactions.PSP_L

0,1
Reaction identifier,PSP_L
Name,phosphoserine phosphatase (L-serine)
Memory address,0x16a4fe890
Stoichiometry,h2o_c + pser__L_c --> pi_c + ser__L_c  H2O + 3-phospho-serine --> phosphate + L-serine
GPR,YGR208W
Lower bound,0.0
Upper bound,1000.0


In [271]:
model.reactions.PSP_L

0,1
Reaction identifier,PSP_L
Name,phosphoserine phosphatase L serine
Memory address,0x2a6985910
Stoichiometry,h2o_c + pser__L_c --> pi_c + ser__L_c  H2O H2O + O-Phospho-L-serine --> Phosphate + L-Serine
GPR,Lipst1_1_1740
Lower bound,0.0
Upper bound,1000.0


In [272]:
r_copy = sce.reactions.PSP_L.copy()
x = sce_added_orthologs_orhpans.loc[sce_added_orthologs_orhpans[0]=='PSP_L'][4]

In [273]:
x.values

array(['Lipst1_1_1740'], dtype=object)

In [274]:
# model.add_reactions([r_copy])
# model.reactions.PSP_L.gene_reaction_rule = 'Lipst1_1_1740'

In [275]:
model.reactions.PSP_L

0,1
Reaction identifier,PSP_L
Name,phosphoserine phosphatase L serine
Memory address,0x2a6985910
Stoichiometry,h2o_c + pser__L_c --> pi_c + ser__L_c  H2O H2O + O-Phospho-L-serine --> Phosphate + L-Serine
GPR,Lipst1_1_1740
Lower bound,0.0
Upper bound,1000.0


In [276]:
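# collect the IDs of genes that are not L. starkeyi genes (no 'Lipst1' in the ID).
genes = [g.id for g in model.genes]
wrong = [g for g in genes if 'Lipst1' not in g]
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        # orphan gene: report it and remove it from the model.
        print(f'{g} has no associated reaction')
        cobra.manipulation.delete.remove_genes(model,[g])
    # record every reaction that still references a non-Lst gene.
    for rx in reac:
        if rx:
            wrong_g2.append(g)
            wrong_rx2.append(rx)
# number of non-Lst genes still in use, and number of reactions that use them.
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))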

YNL092W has no associated reaction
YER163C has no associated reaction
YAL035W has no associated reaction
YER005W has no associated reaction
YLL001W has no associated reaction
YHR201C has no associated reaction
YDR452W has no associated reaction
YOR391C has no associated reaction
YDR533C has no associated reaction
YMR322C has no associated reaction
YPL280W has no associated reaction
YGL196W has no associated reaction
YHR013C has no associated reaction
YDL040C has no associated reaction
YOR253W has no associated reaction
31
16


In [277]:
len(set(wrong_rx2))

16

In [278]:
wrong_rx2 = [x.id for x in wrong_rx2]
set(wrong_rx2)

{'ACRNtp',
 'C30CPT1',
 'C3STKR1er',
 'C3STKR2er',
 'CRNtp',
 'CYOR_u9m',
 'DNTPPA',
 'DRTPPD',
 'FA160COAabcp',
 'FA183COAabcp',
 'FA240tp',
 'GAPP',
 'H2Otm',
 'OIVD2m',
 'PMDPHT',
 'SERPTer'}
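The same foreign-gene check is rerun several times in this notebook. A minimal sketch of it as a reusable function follows. It works on a plain dict instead of a cobra model so that it runs standalone; `find_foreign_genes` and the toy mapping are assumptions for illustration, with gene and reaction IDs borrowed from the outputs in this notebook.

```python
def find_foreign_genes(gene_to_reactions, prefix="Lipst1"):
    """Split non-Lst genes into orphans and genes still used by reactions.

    gene_to_reactions maps a gene ID to the list of reaction IDs that use it.
    Returns (orphans, foreign_reactions), where foreign_reactions maps each
    reaction ID to the set of non-Lst genes it references.
    """
    orphans = []
    foreign_reactions = {}
    for gene, reactions in gene_to_reactions.items():
        if prefix in gene:
            continue  # L. starkeyi gene, nothing to do
        if not reactions:
            orphans.append(gene)  # candidate for removal from the model
            continue
        for rx in reactions:
            foreign_reactions.setdefault(rx, set()).add(gene)
    return orphans, foreign_reactions


# Toy stand-in for the model's gene-to-reaction associations.
toy = {
    "Lipst1_1_1740": ["PSP_L"],
    "YNL092W": [],
    "10149": ["ACRNtp", "CRNtp"],
    "9912": ["FA240tp"],
}
orphans, foreign = find_foreign_genes(toy)
print(orphans)
print(sorted(foreign))
```

With a real model, the dict could be built from `model.genes` and each gene's `reactions` attribute, as the cell above does.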

In [279]:
maintained_v01_gpr

['H2Otm',
 'DNTPPA',
 'PMDPHT',
 'ACRNtp',
 'CRNtp',
 'FA240tp',
 'FA160COAabcp',
 'DRTPPD']

## adjust the 16 reactions that still carry non-Lst genes by looking for Lst orthologs.

In [280]:

# obtain reactions to iterate over.
iterate_rxns = set(wrong_rx2)

# reset the information holding lists. 
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []
gpr_holder = []

# iterate through the reactions. 
for rx in iterate_rxns:
    
    # obtain the rto reaction object.
    r = rto.reactions.get_by_id(rx)
    
    
    # objects for tracking. 
    count_ortho = 0
    temp_gpr = []
    
    # iterate over the rto reaction genes. 
    for i in range(len(r.genes)):
        
        
        g=list(r.genes)[i]
        g = g.id
        
        # find the ortholog.
        try:
            
            # find the ortholog.
            value = lst_rto_dict[g]
            
            # count the gene only if a non-empty Lst ortholog exists.
            if (value):

                count_ortho+=1
        except:
            continue
    
    # if the reaction had at least one gene ortholog in Lst. 
    if count_ortho>0:
        # append reaction metadata. 
        _gpr.append(r.gene_reaction_rule)
        _stoich.append(r.reaction)        
        _names.append(r.name)
        _id.append(r.id)  
        
# obtain the rto reactions that have orthologs.      
new_iterate_rxns = set(_id)                            
                             
print(len(new_iterate_rxns))


# reset the information holding lists. 
_names = []
_gpr = []
_id = []
_stoich = []
_subsystem = []
gpr_holder = []

# iterate over the reactions. 
for rx in new_iterate_rxns:
    
    # obtain the rto reaction object. 
    r = rto.reactions.get_by_id(rx)
    
    # store reaction metadata. 
    _gpr.append(r.gene_reaction_rule)
    _stoich.append(r.reaction)
    _names.append(r.name)
    _id.append(r.id)
    
    # objects for tracking during loop. 
    count_ortho =0
    temp_gpr = []
    
    # iterate over the rto reaction genes. 
    for i in range(len(r.genes)):

        g=list(r.genes)[i]
        g = g.id

        try:
            
            # obtain the Lst ortholog. 
            value = lst_rto_dict[g]
            
            if (value):
                count_ortho+=1
        except:
            continue

    # at least one rto gene in this reaction has an Lst ortholog.
    if count_ortho>0:
        
        # re-run the gene iteration to capture all orthologs. 
        for i in range(len(r.genes)):
            
            g=list(r.genes)[i]
            g = g.id
            
            try:
                # obtain the Lst ortholog. 
                value = lst_rto_dict[g]
                if (value):
                    temp_gpr.append(value)
            except:
                continue
        
        # join multiple genes for a single reaction together using 'or' rules. 
        temp = " or ".join(sub_item.replace("None", "").replace("NA"," ") for sub_item in temp_gpr)
        gpr_holder.append(temp)

# pd.DataFrame() for storing the reactions.         
rto_added_orthologs_orhpans = pd.DataFrame([_id,_names,_stoich,_gpr,gpr_holder]).T

3


In [281]:
rto_added_orthologs_orhpans

Unnamed: 0,0,1,2,3,4
0,SERPTer,Palmitoyl-CoA:L-serine C-palmitoyltransferase ...,h_r + pmtcoa_r + ser__L_r --> 3dsphgn_r + co2_...,10303 and 9394 and 9425,Lipst1_1_72156 or Lipst1_1_71191
1,CYOR_u9m,Ubiquinol 6 cytochrome c reductase,2.0 ficytc_m + 2.0 h_m + q9h2_m --> 2.0 focytc...,COB and 9705 and 9198 and 11618 and 15758 and ...,Lipst1_1_70538 or Lipst1_1_112667 or Lipst1_1_...
2,OIVD2m,2-oxoisovalerate dehydrogenase (acylating; 3-m...,3mob_m + coa_m + nad_m --> co2_m + ibcoa_m + n...,(10040 and 11183 and 12566 and 15436) or (1004...,Lipst1_1_67480
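The ortholog lookup that produced column 4 can be condensed into one helper. This is a sketch under stated assumptions: `map_orthologs` is a hypothetical name, the dict stands in for `lst_rto_dict`, and the gene-to-ortholog pairings below are made up for illustration (only the IDs themselves appear in the table). Unlike the cell above, it skips `None`/`NA` entries instead of blanking them, and it uses `dict.get` rather than a bare `except`.

```python
def map_orthologs(scaffold_genes, ortholog_dict):
    """Join the Lst orthologs of the given scaffold genes into a flat 'or' rule."""
    hits = []
    for gene in scaffold_genes:
        ortholog = ortholog_dict.get(gene)  # None when the gene has no entry
        if ortholog and ortholog not in ("None", "NA"):
            hits.append(ortholog)
    return " or ".join(hits)


# Hypothetical pairings; the real mapping lives in lst_rto_dict.
toy_lst_rto_dict = {"10303": "Lipst1_1_72156", "9394": "Lipst1_1_71191", "9425": "NA"}

rule = map_orthologs(["10303", "9394", "9425"], toy_lst_rto_dict)
print(rule)
```

Note that this flattens the scaffold rule: an rto GPR such as `10303 and 9394 and 9425` becomes an `or` rule in Lst, exactly as in the table, so complexes need manual review.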


In [282]:
for r in iterate_rxns:
    print(r, model.reactions.get_by_id(r).gene_reaction_rule)

DNTPPA 15385
ACRNtp 10149
CRNtp 10149
FA240tp 9912
H2Otm 9014 or Lipst1_1_156565
SERPTer 10303 and 9394 and 9425
CYOR_u9m 15681 or 15758 or 13608 or 13614 or 15231 or COB or 9830 or 9705 or 8802 or 11618 or 12966 or 9198 or Lipst1_1_6808 or Lipst1_1_38038 or Lipst1_1_3376 or Lipst1_1_70538 or Lipst1_1_42388 or Lipst1_1_112667
PMDPHT 11513
C3STKR2er 14213 or Lipst1_1_50180
C30CPT1 9315
GAPP 8460
FA160COAabcp 13167 and 9637
C3STKR1er 14213 or Lipst1_1_50180
DRTPPD 15506
OIVD2m (10040 and 11183 and 12566 and 15436) or (10040 and 11188 and 12566 and 15436)
FA183COAabcp 13167 and 9637


In [283]:
model.reactions.H2Otm.gene_reaction_rule = 'Lipst1_1_156565'
model.reactions.C3STKR2er.gene_reaction_rule = 'Lipst1_1_50180'
model.reactions.C3STKR1er.gene_reaction_rule = 'Lipst1_1_50180'


In [284]:
model.reactions.SERPTer.gene_reaction_rule = 'Lipst1_1_71191 or Lipst1_1_72156'
model.reactions.OIVD2m.gene_reaction_rule = 'Lipst1_1_67480'
# model.reactions.C3STKR1er.gene_reaction_rule = 'Lipst1_1_50180'


In [285]:
x = rto_added_orthologs_orhpans[rto_added_orthologs_orhpans[0]=='CYOR_u9m'][4].values

15681 or 15758 or 13608 or 12966 or 15231 or 9198 or 9830 or 9705 or 11618 or COB or 8802 or 13614 or Lipst1_1_3376 or Lipst1_1_38038 or Lipst1_1_42388 or Lipst1_1_70538 or Lipst1_1_6808 or Lipst1_1_112667

In [286]:
x

array(['Lipst1_1_70538 or Lipst1_1_112667 or Lipst1_1_49747 or Lipst1_1_72848 or Lipst1_1_3376 or Lipst1_1_65274 or Lipst1_1_42388 or Lipst1_1_6808 or Lipst1_1_38038 or Lipst1_1_47735'],
      dtype=object)

In [287]:
temp = x
temp = [t.split(' or ') for t in temp]

In [288]:
temp2 = []
for g in model.reactions.CYOR_u9m.genes:
    temp2.append(g.id)

In [289]:
temp=temp[0]


In [290]:
set(temp)-set(temp2)

{'Lipst1_1_47735', 'Lipst1_1_49747', 'Lipst1_1_65274', 'Lipst1_1_72848'}

In [291]:
set(temp2)-set(temp)

{'11618',
 '12966',
 '13608',
 '13614',
 '15231',
 '15681',
 '15758',
 '8802',
 '9198',
 '9705',
 '9830',
 'COB'}

In [292]:
model.reactions.CYOR_u9m.gene_reaction_rule = 'Lipst1_1_49747 or Lipst1_1_112667 or Lipst1_1_70538 or Lipst1_1_42388 or Lipst1_1_65274 or Lipst1_1_6808 or Lipst1_1_38038 or Lipst1_1_72848 or Lipst1_1_3376 or Lipst1_1_47735'

In [293]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))


15231 has no associated reaction
15681 has no associated reaction
15758 has no associated reaction
8802 has no associated reaction
9198 has no associated reaction
9705 has no associated reaction
9830 has no associated reaction
13608 has no associated reaction
11618 has no associated reaction
COB has no associated reaction
12966 has no associated reaction
13614 has no associated reaction
9014 has no associated reaction
9394 has no associated reaction
10303 has no associated reaction
9425 has no associated reaction
11183 has no associated reaction
11188 has no associated reaction
15436 has no associated reaction
12566 has no associated reaction
10040 has no associated reaction
14213 has no associated reaction
9
10


In [294]:
wrong_rx2 = [x.id for x in wrong_rx2]
set(wrong_rx2)

{'ACRNtp',
 'C30CPT1',
 'CRNtp',
 'DNTPPA',
 'DRTPPD',
 'FA160COAabcp',
 'FA183COAabcp',
 'FA240tp',
 'GAPP',
 'PMDPHT'}

In [295]:
# lst_rto_dict['9705']

In [296]:
# set(wrong_rx2)-set(maintained_v01_gpr)

In [297]:
temp = set(wrong_rx2)
for x in temp:
    r = model.reactions.get_by_id(x)
    print(r.id)
    print(r.name)
    print(r.gene_reaction_rule)
    print('\n')

DNTPPA
Dihydroneopterin triphosphate pyrophosphatase
15385


ACRNtp
Acetylcarnitine transport out of peroxisome
10149


FA240tp
Fatty acid peroxisomal transport
9912


PMDPHT
Pyrimidine phosphatase
11513


C30CPT1
Production of propionylcarnitine
9315


GAPP
Glyceraldehyde-3-phosphate phosphatase
8460


FA160COAabcp
Fatty acyl CoA peroxisomal transport via ABC system
13167 and 9637


DRTPPD
2 5 diamino 6 ribitylamino 4 3H  pyrimidinone 5  phosphate deaminase
15506


CRNtp
Carnitine transport into peroxsiome
10149


FA183COAabcp
Fatty acid peroxisomal transport via ABC system
13167 and 9637




In [298]:
ylip_added_orthologs_orhpans

Unnamed: 0,0,1,2,3,4
0,PSP_L,phosphoserine phosphatase L serine,h2o_c + pser__L_c --> pi_c + ser__L_c,YALI0B20438g,Lipst1_1_1740
1,FAS160ACPm,fatty acyl ACP synthase n C160ACP mitochondrial,3.0 h_m + malACP_m + myrsACP_m + 2.0 nadph_m -...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
2,ATPS3g,ATP synthase Golgi Apparatus,adp_g + 3.0 h_c + pi_g --> atp_g + h2o_g + 2.0...,YALI0A09900g and YALI0A11143g and YALI0B03982g...,Lipst1_1_149291 or Lipst1_1_69438 or Lipst1_1_...
3,DOLK,Dolichol kinase,ctp_c + dolichol_c --> cdp_c + dolp_c + h_c,YALI0E18942g,Lipst1_1_55005
4,PETHCT,phosphoethanolamine cytidyltransferase,ctp_c + ethamp_c + h_c --> cdpea_c + ppi_c,YALI0C06303g,Lipst1_1_4188
5,FAS161ACPm,fatty acyl ACP synthase n C161ACP mitochondrial,4.0 h_m + malACP_m + myrsACP_m + 3.0 nadph_m +...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
6,FAS80ACPm_L,fatty acyl ACP synthase n C80ACP mitochondrial...,acACP_m + 9.0 h_m + 3.0 malACP_m + 6.0 nadph_m...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
7,FAS140ACPm,fatty acyl ACP synthase n C140ACP mitochondrial,ddcaACP_m + 3.0 h_m + malACP_m + 2.0 nadph_m -...,YALI0D14850g and YALI0D24629g and YALI0F30679g...,Lipst1_1_3273 or Lipst1_1_2955 or Lipst1_1_106903
8,ACt2r,acetate reversible transport via proton symport,ac_e + h_e <=> ac_c + h_c,YALI0E04312g,Lipst1_1_7130
9,HBZOPT6,Hydroxybenzoate octaprenyltransferase,4hbz_c + octdp_5_c --> 3ophb_5_c + ppi_c,(YALI0E09276g and YALI0B16126g) or (YALI0E1366...,Lipst1_1_302018 or Lipst1_1_114557 or Lipst1_1...


In [299]:
# sce_added_orthologs_orhpans
pd.set_option('display.max_rows', None)
sce_added_orthologs_orhpans

Unnamed: 0,0,1,2,3,4
0,r_2594,"CL (1-18:0, 2-16:1, 3-16:0, 4-18:1) phospholip...",h2o_mm + s_3250 --> h_mm + ocdca_mm + s_3275,YGR110W,Lipst1_1_102653
1,r_2852,"phosphatidylinositol 3-kinase (1-16:0, 2-16:1)...",atp_vm + h_vm + s_3340 --> adp_vm + s_3392,(YBR097W and YLR240W) or (YBR097W and YLR240W ...,Lipst1_1_68143 or Lipst1_1_5996 or Lipst1_1_6287
2,r_2586,"CL (1-16:0, 2-16:1, 3-18:0, 4-16:1) phospholip...",h2o_mm + s_3236 --> h_mm + hdca_mm + s_3273,YGR110W,Lipst1_1_102653
3,r_2719,"MLCL (1-16:0, 2-16:1, 4-16:1):PC (1-18:1, 2-16...",s_3284 + s_3302 <=> 1agpc181_mm + s_3235,YPR140W,Lipst1_1_68925
4,r_2731,"MLCL (1-16:0, 2-16:1, 4-18:1):PC (1-18:1, 2-18...",s_3285 + s_3307 <=> 1agpc181_mm + s_3314,YPR140W,Lipst1_1_68925
5,r_2632,"MLCL (2-16:1, 3-16:1, 4-16:1):PC (1-16:0, 2-18...",s_3272 + s_3304 <=> 1agpc160_mm + s_3253,YPR140W,Lipst1_1_68925
6,r_3266,"TAG lipase (1-16:0, 2-16:1, 3-18:0), lipid par...",h2o_lp + s_3067 --> h_lp + ocdca_lp + s_3043,YBR204C or YKR089C or YMR313C or YOR081C,Lipst1_1_167596 or Lipst1_1_253586 or Lipst1_1...
7,r_2764,"MLCL (1-18:1, 2-16:1, 4-16:1):PC (1-16:0, 2-16...",s_3290 + s_3296 <=> 1agpc160_mm + s_3253,YPR140W,Lipst1_1_68925
8,r_3293,"TAG lipase (1-18:1, 2-18:1, 3-16:1), lipid par...",h2o_lp + s_3066 --> 12dgr181_lp + h_lp + hdcea_lp,YBR204C or YKR089C or YMR313C or YOR081C,Lipst1_1_167596 or Lipst1_1_253586 or Lipst1_1...
9,r_2328,"acyl dhap reductase (16:0), lipid particle",h_lp + nadph_lp + s_2948 --> 1hdecg3p_lp + nad...,YIL124W,Lipst1_1_71724


In [300]:
# cobra.io.save_json_model(model,'Lst_v0.1_model_withOrthoMCLallAdded_noSCEtransporters.json')

## try removing the reactions with only rto genes.

In [301]:
temp

{'ACRNtp',
 'C30CPT1',
 'CRNtp',
 'DNTPPA',
 'DRTPPD',
 'FA160COAabcp',
 'FA183COAabcp',
 'FA240tp',
 'GAPP',
 'PMDPHT'}

In [302]:
temp = set(wrong_rx2)

print(model.optimize().objective_value,'default\n')

for x in temp:
    with model:
        r = model.reactions.get_by_id(x)
        print(r.id)
        print(r.name)
        print(r.gene_reaction_rule)
        model.remove_reactions([r])
        print(model.optimize().objective_value)
        print('\n')


0.07800483549348808 default

DNTPPA
Dihydroneopterin triphosphate pyrophosphatase
15385
0.0


ACRNtp
Acetylcarnitine transport out of peroxisome
10149
0.0


FA240tp
Fatty acid peroxisomal transport
9912
0.07800483549348884


PMDPHT
Pyrimidine phosphatase
11513
0.0


C30CPT1
Production of propionylcarnitine
9315
0.0


GAPP
Glyceraldehyde-3-phosphate phosphatase
8460
0.07800318735919652


FA160COAabcp
Fatty acyl CoA peroxisomal transport via ABC system
13167 and 9637
0.07679940446092794


DRTPPD
2 5 diamino 6 ribitylamino 4 3H  pyrimidinone 5  phosphate deaminase
15506
0.0


CRNtp
Carnitine transport into peroxsiome
10149
0.0


FA183COAabcp
Fatty acid peroxisomal transport via ABC system
13167 and 9637
0.07800483549348679




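The fatty acid transporters (FA240tp, FA160COAabcp, FA183COAabcp) can be removed with little or no loss of growth. However, they are kept and their gene_reaction_rule is cleared instead. Removing ACRNtp or CRNtp abolishes growth, so these are kept as well and their rto GPRs are also cleared.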
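The knockout screen above can be summarized by splitting reactions on their post-removal growth. The growth values are copied from the output above; `split_by_essentiality` and the `1e-6` threshold are assumptions chosen for illustration.

```python
# Growth after removing each reaction, copied from the output above.
baseline = 0.07800483549348808
knockout_growth = {
    "DNTPPA": 0.0,
    "ACRNtp": 0.0,
    "FA240tp": 0.07800483549348884,
    "PMDPHT": 0.0,
    "C30CPT1": 0.0,
    "GAPP": 0.07800318735919652,
    "FA160COAabcp": 0.07679940446092794,
    "DRTPPD": 0.0,
    "CRNtp": 0.0,
    "FA183COAabcp": 0.07800483549348679,
}


def split_by_essentiality(growth, threshold=1e-6):
    """Return (essential, dispensable) reaction IDs, each sorted alphabetically."""
    essential = sorted(r for r, v in growth.items() if v <= threshold)
    dispensable = sorted(r for r, v in growth.items() if v > threshold)
    return essential, dispensable


essential, dispensable = split_by_essentiality(knockout_growth)
print("essential:", essential)
print("dispensable:", dispensable)
# Fraction of baseline growth retained for the dispensable reactions.
for rx in dispensable:
    print(rx, round(knockout_growth[rx] / baseline, 4))
```

Only FA160COAabcp shows a visible growth penalty among the dispensable reactions, retaining roughly 98% of the baseline.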

In [303]:
model.reactions.FA183COAabcp.gene_reaction_rule = ''
model.reactions.FA240tp.gene_reaction_rule = ''
model.reactions.FA160COAabcp.gene_reaction_rule = ''


model.reactions.ACRNtp.gene_reaction_rule = ''
model.reactions.CRNtp.gene_reaction_rule = ''
# model.reactions.FA160COAabcp.gene_reaction_rule = ''


In [304]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))


9912 has no associated reaction
13167 has no associated reaction
9637 has no associated reaction
10149 has no associated reaction
5
5


In [305]:
wrong_rx2 = [x.id for x in wrong_rx2]
set(wrong_rx2)

{'C30CPT1', 'DNTPPA', 'DRTPPD', 'GAPP', 'PMDPHT'}

In [306]:
for rx in wrong_rx2:
    r = model.reactions.get_by_id(rx)
    print(r,r.id,r.name)
    for metabolites in r.metabolites:
        try:
            ylip.metabolites.get_by_id(metabolites.id)
        except:
            print(f'{metabolites} not in ylip')

DRTPPD: 25dthpp_c + h2o_c + h_c --> 5aprbu_c + nh4_c DRTPPD 2 5 diamino 6 ribitylamino 4 3H  pyrimidinone 5  phosphate deaminase
25dthpp_c not in ylip
DNTPPA: ahdt_c + h2o_c --> dhpmp_c + h_c + ppi_c DNTPPA Dihydroneopterin triphosphate pyrophosphatase
PMDPHT: 5aprbu_c + h2o_c --> 4r5au_c + pi_c PMDPHT Pyrimidine phosphatase
C30CPT1: crn_c + ppcoa_c <=> coa_c + pcrn_c C30CPT1 Production of propionylcarnitine
pcrn_c not in ylip
ppcoa_c not in ylip
GAPP: g3p_c + h2o_c --> glyald_c + pi_c GAPP Glyceraldehyde-3-phosphate phosphatase


In [326]:
with model:
    model.remove_reactions(['GAPP'])
    gapfiller = cobra.flux_analysis.gapfilling.GapFiller(model, sce, demand_reactions=False, integer_threshold=1e-10)
    gapfiller.model.solver.configuration.tolerances.feasibility = 1e-9
    gapfiller.model.solver.configuration.tolerances.integrality = 1e-10
#     gapfiller.model.solver.configuration.tolerances.optimality = 1e-9
    solution = gapfiller.fill()

for reaction in solution[0]:
    print(reaction.id)

In [327]:
with model:
    model.remove_reactions(['GAPP'])
    gapfiller = cobra.flux_analysis.gapfilling.GapFiller(model, ylip, demand_reactions=False, integer_threshold=1e-10)
    gapfiller.model.solver.configuration.tolerances.feasibility = 1e-9
    gapfiller.model.solver.configuration.tolerances.integrality = 1e-10
#     gapfiller.model.solver.configuration.tolerances.optimality = 1e-9
    solution = gapfiller.fill()

for reaction in solution[0]:
    print(reaction.id)

In [307]:
sce.reactions.G3PT

0,1
Reaction identifier,G3PT
Name,glycerol-3-phosphatase
Memory address,0x16a4f7dd0
Stoichiometry,glyc3p_c + h2o_c --> glyc_c + pi_c  glycerol 3-phosphate + H2O --> glycerol + phosphate
GPR,YER062C or YIL053W
Lower bound,0.0
Upper bound,1000.0


In [308]:
# neither sce gene is available. 
# lst_sce_dict['YER062C']
# lst_sce_dict['YIL053W']

In [309]:
ylip.reactions.GLYC3Ptm

0,1
Reaction identifier,GLYC3Ptm
Name,glycerol 3 phosphate shuttle
Memory address,0x15bb3d1d0
Stoichiometry,glyc3p_c --> glyc3p_m  Glycerol_3_phosphate --> Glycerol_3_phosphate
GPR,
Lower bound,0.0
Upper bound,1000.0


In [310]:
# JGI shows the following gene as a glycerol-1-phosphatase.
model.genes.Lipst1_1_67195

0,1
Gene identifier,Lipst1_1_67195
Name,
Memory address,0x15b7a3490
Functional,True
In 9 reaction(s),"F1PP, GNP, G6PP, R5PP, G1PP, E4PP, F6PP, 2DOXG6PP, RU5PP"


In [311]:
model.reactions.G6PP

0,1
Reaction identifier,G6PP
Name,Glucose-6-phosphate phosphatase
Memory address,0x2a150b4d0
Stoichiometry,g6p_c + h2o_c --> glc__D_c + pi_c  D-Glucose 6-phosphate + H2O H2O --> alpha-D-Glucose + Phosphate
GPR,Lipst1_1_67195
Lower bound,0.0
Upper bound,1000.0


In [338]:
for rx in wrong_rx2:
    print(rx,'sce')
    with model:
        model.remove_reactions([rx])
        gapfiller = cobra.flux_analysis.gapfilling.GapFiller(model, sce, demand_reactions=False, integer_threshold=1e-10)
        gapfiller.model.solver.configuration.tolerances.feasibility = 1e-9
        gapfiller.model.solver.configuration.tolerances.integrality = 1e-10
    #     gapfiller.model.solver.configuration.tolerances.optimality = 1e-9
        solution = gapfiller.fill()

    for reaction in solution[0]:
        print(reaction.id)

    print(rx,'ylip')
    
    if rx!='C30CPT1': # reaction fails in ylip. 
        with model:
            model.remove_reactions([rx])
            gapfiller = cobra.flux_analysis.gapfilling.GapFiller(model, ylip, demand_reactions=False, integer_threshold=1e-10)
            gapfiller.model.solver.configuration.tolerances.feasibility = 1e-9
            gapfiller.model.solver.configuration.tolerances.integrality = 1e-10
        #     gapfiller.model.solver.configuration.tolerances.optimality = 1e-9
            solution = gapfiller.fill()

        for reaction in solution[0]:
            print(reaction.id)

DRTPPD sce
DRTPPD
DRTPPD ylip
APRAUR
DHPPDA
DNTPPA sce
DNTPPA
DNTPPA ylip
AKP1
PMDPHT sce
PMDPHT
PMDPHT ylip
PMDPHT
C30CPT1 sce
r_4570
r_4571
C30CPT1 ylip
GAPP sce
GLYALDDr
GAPP ylip
GLYC3Ptm


In [335]:
with ylip, model:
    ylip.remove_reactions(['PMDPHT'])
    model.remove_reactions(['PMDPHT'])
    print("Ylip growth:", ylip.slim_optimize())
    print("Model growth:", model.slim_optimize())

  warn("need to pass in a list")


Ylip growth: 1.1398166174414033
Model growth: 0.0


In [336]:
# # becomes infeasible. 

#     with ylip,model:
#         ylip.remove_reactions(['PMDPHT'])
#         model.remove_reactions(['PMDPHT'])
        
#         gapfiller = cobra.flux_analysis.gapfilling.GapFiller(model, ylip, demand_reactions=False, integer_threshold=1e-10)
#         gapfiller.model.solver.configuration.tolerances.feasibility = 1e-9
#         gapfiller.model.solver.configuration.tolerances.integrality = 1e-10
#     #     gapfiller.model.solver.configuration.tolerances.optimality = 1e-9
#         solution = gapfiller.fill()

Infeasible: gap filling optimization failed (infeasible).

The sce replacements seem simpler to implement overall.

In [312]:
model.reactions.DRTPPD

0,1
Reaction identifier,DRTPPD
Name,2 5 diamino 6 ribitylamino 4 3H pyrimidinone 5 phosphate deaminase
Memory address,0x2a1b92f90
Stoichiometry,25dthpp_c + h2o_c + h_c --> 5aprbu_c + nh4_c  2 5 diamino 6 ribitylamino 4 3H pyrimidinone 5 phosphate C9H16N5O8P + H2O H2O + H+ --> 5-Amino-6-(5'-phosphoribitylamino)uracil + Ammonium
GPR,15506
Lower bound,0.0
Upper bound,1000.0


In [313]:
ylip.reactions.APRAUR

0,1
Reaction identifier,APRAUR
Name,5 amino 6 5 phosphoribosylamino uracil reductase
Memory address,0x17fefe090
Stoichiometry,5apru_c + h_c + nadph_c --> 5aprbu_c + nadp_c  5_Amino_6__5__phosphoribosylamino_uracil + H + Nicotinamide_adenine_dinucleotide_phosphate___reduced --> 5_Amino_6__5__phosphoribitylamino_uracil + Nicotinamide_adenine_dinucleotide_phosphate
GPR,YALI0D06501g
Lower bound,0.0
Upper bound,1000.0


In [314]:
lst_ylip_dict['YALI0D06501g']

'Lipst1_1_74905'

In [315]:
model.genes.Lipst1_1_74905

0,1
Gene identifier,Lipst1_1_74905
Name,
Memory address,0x15b78ed10
Functional,True
In 1 reaction(s),DROPPRx


In [316]:
model.reactions.DROPPRx

0,1
Reaction identifier,DROPPRx
Name,2 5 diamino 6 ribosylamino 4 3H pyrimidinone 5 phosphate reductase nadh
Memory address,0x15b9efe90
Stoichiometry,25dhpp_c + h_c + nadh_c --> 25dthpp_c + nad_c  2 5 Diamino 6 hydroxy 4 5 phosphoribosylamino pyrimidine C9H14N5O8P + H+ + Nicotinamide adenine dinucleotide - reduced --> 2 5 diamino 6 ribitylamino 4 3H pyrimidinone 5 phosphate C9H16N5O8P +...
GPR,Lipst1_1_74905
Lower bound,0.0
Upper bound,1000.0


In [317]:
ylip.reactions.DHPPDA

0,1
Reaction identifier,DHPPDA
Name,diaminohydroxyphosphoribosylaminopyrimidine deaminase
Memory address,0x16a7ad410
Stoichiometry,25dhpp_c + h2o_c + h_c --> 5apru_c + nh4_c  2_5_Diamino_6_hydroxy_4__5__phosphoribosylamino__pyrimidine + H2O + H --> 5_Amino_6__5__phosphoribosylamino_uracil + Ammonium
GPR,
Lower bound,0.0
Upper bound,1000.0


In [318]:
wrong_rx2

['DRTPPD', 'DNTPPA', 'PMDPHT', 'C30CPT1', 'GAPP']

In [319]:
sce.reactions.AKP1

0,1
Reaction identifier,AKP1
Name,alkaline phosphatase (dihydroneopterin)
Memory address,0x15b095a90
Stoichiometry,"ahdt_c + 3.0 h2o_c --> dhnpt_c + 2.0 h_c + 3.0 pi_c  7,8-dihydroneopterin 3'-triphosphate + 3.0 H2O --> 7,8-dihydroneopterin + 2.0 H+ + 3.0 phosphate"
GPR,YDR481C
Lower bound,0.0
Upper bound,1000.0


In [320]:
lst_sce_dict['YDR481C']

'Lipst1_1_2594'

In [321]:
model.genes.Lipst1_1_2594

0,1
Gene identifier,Lipst1_1_2594
Name,
Memory address,0x15b787910
Functional,True
In 2 reaction(s),"NMNHYD, ALKP"


In [322]:
model.reactions.ALKP

0,1
Reaction identifier,ALKP
Name,Alkaline phosphatase
Memory address,0x17ff19a50
Stoichiometry,dhap_c + h2o_c --> dha_c + pi_c  Dihydroxyacetone phosphate + H2O H2O --> Dihydroxyacetone + Phosphate
GPR,Lipst1_1_2594
Lower bound,0.0
Upper bound,1000.0


In [323]:
model.remove_reactions(['DRTPPD'])
r = sce.reactions.AKP1.copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model.add_reactions([r])
model.reactions.get_by_id(r.id).gene_reaction_rule

'Lipst1_1_2594'

In [324]:
model.remove_reactions(['DNTPPA'])


In [325]:
wrong_rx2

['DRTPPD', 'DNTPPA', 'PMDPHT', 'C30CPT1', 'GAPP']

PMDPHT seems to be an essential reaction with no associated gene.

In [326]:
sce.reactions.get_by_id('PMDPHT')

0,1
Reaction identifier,PMDPHT
Name,pyrimidine phosphatase
Memory address,0x16a641310
Stoichiometry,5aprbu_c + h2o_c --> 4r5au_c + pi_c  5-amino-6-(5-phosphoribitylamino)uracil + H2O --> 5-amino-6-(D-ribitylamino)uracil + phosphate
GPR,
Lower bound,0.0
Upper bound,1000.0


In [327]:
ylip.reactions.get_by_id('PMDPHT')

0,1
Reaction identifier,PMDPHT
Name,pyrimidine phosphatase
Memory address,0x15bba2410
Stoichiometry,5aprbu_c + h2o_c --> 4r5au_c + pi_c  5_Amino_6__5__phosphoribitylamino_uracil + H2O --> 4__1_D_Ribitylamino__5_aminouracil + Phosphate
GPR,
Lower bound,0.0
Upper bound,1000.0


In [328]:
model.reactions.get_by_id('PMDPHT')

0,1
Reaction identifier,PMDPHT
Name,Pyrimidine phosphatase
Memory address,0x2a1bc2350
Stoichiometry,5aprbu_c + h2o_c --> 4r5au_c + pi_c  5-Amino-6-(5'-phosphoribitylamino)uracil + H2O H2O --> 4-(1-D-Ribitylamino)-5-aminouracil + Phosphate
GPR,11513
Lower bound,0.0
Upper bound,1000.0


In [329]:
model.reactions.get_by_id('PMDPHT').gene_reaction_rule = ''

In [330]:
wrong_rx2

['DRTPPD', 'DNTPPA', 'PMDPHT', 'C30CPT1', 'GAPP']

C30CPT1: the sce gap-fill solution is r_4570 and r_4571.

In [331]:
model.reactions.get_by_id('C30CPT1')

0,1
Reaction identifier,C30CPT1
Name,Production of propionylcarnitine
Memory address,0x2a1bc32d0
Stoichiometry,crn_c + ppcoa_c <=> coa_c + pcrn_c  L-Carnitine + Propanoyl-CoA <=> Coenzyme A + Propionyl-carnitine
GPR,9315
Lower bound,-1000.0
Upper bound,1000.0


In [332]:
sce.reactions.r_4570

0,1
Reaction identifier,r_4570
Name,phosphoglycerate dehydrogenase
Memory address,0x16aa73650
Stoichiometry,2hog_c + nad_c <=> akg_c + h_c + nadh_c  2-hydroxyglutarate + NAD <=> 2-oxoglutarate + H+ + NADH
GPR,YIL074C or YER081W
Lower bound,-1000.0
Upper bound,1000.0


In [333]:
sce.reactions.r_4571

0,1
Reaction identifier,r_4571
Name,seed:rxn00681
Memory address,0x16a71bc50
Stoichiometry,2hog_c + coa_c + h_c <=> glx_c + h2o_c + ppcoa_c  2-hydroxyglutarate + coenzyme A + H+ <=> glyoxylate + H2O + propionyl-CoA
GPR,
Lower bound,-1000.0
Upper bound,1000.0


The gap-filled sce solution has no GPR for its second reaction (r_4571). C30CPT1 is kept and its GPR is removed instead.

In [334]:
model.reactions.get_by_id('C30CPT1').gene_reaction_rule = ''

In [335]:
sce.reactions.get_by_id('GLYALDDr')

0,1
Reaction identifier,GLYALDDr
Name,D-Glyceraldehyde:NAD+ oxidoreductase
Memory address,0x16aa1b990
Stoichiometry,glyald_c + h2o_c + nad_c <=> glyc__R_c + 2.0 h_c + nadh_c  D-glyceraldehyde + H2O + NAD <=> D-Glycerate + 2.0 H+ + NADH
GPR,YMR110C
Lower bound,-1000.0
Upper bound,1000.0


In [358]:
# not present in the ortholog dict. 
# lst_sce_dict['YMR110C']

KeyError: 'YMR110C'

In [359]:
rx = model.reactions.get_by_id('GAPP')
with model:
    model.remove_reactions([rx])
    gapfiller = cobra.flux_analysis.gapfilling.GapFiller(model, ylip, demand_reactions=False, integer_threshold=1e-10)
    gapfiller.model.solver.configuration.tolerances.feasibility = 1e-9
    gapfiller.model.solver.configuration.tolerances.integrality = 1e-10
#     gapfiller.model.solver.configuration.tolerances.optimality = 1e-9
    solution = gapfiller.fill()

for reaction in solution[0]:
    print(reaction.id)

APRAUR
DHPPDA


In [336]:
ylip.reactions.get_by_id('GLYC3Ptm')

0,1
Reaction identifier,GLYC3Ptm
Name,glycerol 3 phosphate shuttle
Memory address,0x15bb3d1d0
Stoichiometry,glyc3p_c --> glyc3p_m  Glycerol_3_phosphate --> Glycerol_3_phosphate
GPR,
Lower bound,0.0
Upper bound,1000.0


In [337]:
ylip.reactions.get_by_id('DHPPDA')

0,1
Reaction identifier,DHPPDA
Name,diaminohydroxyphosphoribosylaminopyrimidine deaminase
Memory address,0x16a7ad410
Stoichiometry,25dhpp_c + h2o_c + h_c --> 5apru_c + nh4_c  2_5_Diamino_6_hydroxy_4__5__phosphoribosylamino__pyrimidine + H2O + H --> 5_Amino_6__5__phosphoribosylamino_uracil + Ammonium
GPR,
Lower bound,0.0
Upper bound,1000.0


In [338]:
model.metabolites.get_by_id('5aprbu_c')

0,1
Metabolite identifier,5aprbu_c
Name,5-Amino-6-(5'-phosphoribitylamino)uracil
Memory address,0x15b7227d0
Formula,C9H15N4O9P
Compartment,c
In 1 reaction(s),PMDPHT


In [339]:
ylip.reactions.get_by_id('APRAUR')

0,1
Reaction identifier,APRAUR
Name,5 amino 6 5 phosphoribosylamino uracil reductase
Memory address,0x17fefe090
Stoichiometry,5apru_c + h_c + nadph_c --> 5aprbu_c + nadp_c  5_Amino_6__5__phosphoribosylamino_uracil + H + Nicotinamide_adenine_dinucleotide_phosphate___reduced --> 5_Amino_6__5__phosphoribitylamino_uracil + Nicotinamide_adenine_dinucleotide_phosphate
GPR,YALI0D06501g
Lower bound,0.0
Upper bound,1000.0


In [340]:
lst_ylip_dict['YALI0D06501g']

'Lipst1_1_74905'

In [341]:
model.genes.Lipst1_1_74905

0,1
Gene identifier,Lipst1_1_74905
Name,
Memory address,0x15b78ed10
Functional,True
In 1 reaction(s),DROPPRx


In [342]:
model.reactions.DROPPRx

0,1
Reaction identifier,DROPPRx
Name,2 5 diamino 6 ribosylamino 4 3H pyrimidinone 5 phosphate reductase nadh
Memory address,0x15b9efe90
Stoichiometry,25dhpp_c + h_c + nadh_c --> 25dthpp_c + nad_c  2 5 Diamino 6 hydroxy 4 5 phosphoribosylamino pyrimidine C9H14N5O8P + H+ + Nicotinamide adenine dinucleotide - reduced --> 2 5 diamino 6 ribitylamino 4 3H pyrimidinone 5 phosphate C9H16N5O8P +...
GPR,Lipst1_1_74905
Lower bound,0.0
Upper bound,1000.0


In [343]:
r = model.reactions.get_by_id('GAPP')
model.remove_reactions([r])

In [344]:
y = ylip.reactions.get_by_id('GLYC3Ptm')

In [345]:
model.add_reactions([y])

In [346]:
y = ylip.reactions.get_by_id('DHPPDA')
model.add_reactions([y])

In [347]:
y = ylip.reactions.get_by_id('APRAUR').copy()
y.gene_reaction_rule = 'Lipst1_1_74905'
model.add_reactions([y])

In [348]:
model.optimize()

Unnamed: 0,fluxes,reduced_costs
ALCD25yi,0.000000,0.000000e+00
MTHFCm,0.000000,-0.000000e+00
AMPN,0.000000,-4.022961e-03
DAGCPTer_RT,0.000000,0.000000e+00
PYRt2,0.000000,0.000000e+00
...,...,...
FA183COAabcp,0.000000,0.000000e+00
AKP1,0.000304,0.000000e+00
GLYC3Ptm,0.000140,0.000000e+00
DHPPDA,0.000136,0.000000e+00


In [349]:
genes = model.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))
wrong_rx2 = [x.id for x in wrong_rx2]
set(wrong_rx2)

15506 has no associated reaction
15385 has no associated reaction
11513 has no associated reaction
9315 has no associated reaction
8460 has no associated reaction
0
0


set()

In [350]:
model.summary()

Metabolite,Reaction,Flux,C-Number,C-Flux
ca2_e,EX_ca2_e,6.49e-05,0,0.00%
cu2_e,EX_cu2_e,4.095e-05,0,0.00%
fe3_e,EX_fe3_e,0.0004812,0,0.00%
glc__D_e,EX_glc__D_e,1.0,6,98.56%
k_e,EX_k_e,0.04569,0,0.00%
mg2_e,EX_mg2_e,0.004814,0,0.00%
mn2_e,EX_mn2_e,4.735e-05,0,0.00%
na1_e,EX_na1_e,0.002036,0,0.00%
nh4_e,EX_nh4_e,0.4596,0,0.00%
o2_e,EX_o2_e,2.356,0,0.00%

Metabolite,Reaction,Flux,C-Number,C-Flux
4oglu_c,DM_4oglu_c,-0.0001087,5,0.02%
amob_m,DM_amob_m,-0.0001886,15,0.10%
dad_5_m,DM_dad_5_m,-0.0001886,10,0.06%
co2_e,EX_co2_e,-2.619,1,89.99%
fe2_e,EX_fe2_e,-0.0003342,0,0.00%
h2o_e,EX_h2o_e,-4.275,0,0.00%
h_e,EX_h_e,-0.1232,0,0.00%
hco3_e,EX_hco3_e,-0.001955,1,0.07%
zymst_e,EX_zymst_e,-0.01053,27,9.76%
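
The C-Flux column appears to be each exchange's carbon flux (|flux| × C-Number) as a share of the table's total carbon flux. A rough check on the second table above (the negative fluxes, i.e. secreted metabolites), assuming that definition. Small differences from the printed percentages would come from the rounded fluxes shown.

```python
# Rows copied from the second summary table: (reaction, flux, carbon number).
secretion = [
    ("DM_4oglu_c", -0.0001087, 5),
    ("DM_amob_m", -0.0001886, 15),
    ("DM_dad_5_m", -0.0001886, 10),
    ("EX_co2_e", -2.619, 1),
    ("EX_fe2_e", -0.0003342, 0),
    ("EX_h2o_e", -4.275, 0),
    ("EX_h_e", -0.1232, 0),
    ("EX_hco3_e", -0.001955, 1),
    ("EX_zymst_e", -0.01053, 27),
]

# Carbon flux per exchange = |flux| * number of carbons (assumed definition).
carbon_flux = {rxn: abs(flux) * n_c for rxn, flux, n_c in secretion}
total = sum(carbon_flux.values())

# Share of secreted carbon carried by each exchange.
carbon_share = {rxn: cf / total for rxn, cf in carbon_flux.items()}
for rxn, share in carbon_share.items():
    print(f"{rxn}: {share:.2%}")
```

Under this reading, roughly 90% of secreted carbon leaves as CO2 and most of the rest as zymosterol (`EX_zymst_e`), which is worth keeping in mind when judging whether the flux distribution is physiologically plausible.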


In [351]:
model.optimize()

reaction,fluxes,reduced_costs
ALCD25yi,0.000000,0.000000e+00
MTHFCm,0.000000,-0.000000e+00
AMPN,0.000000,-4.022961e-03
DAGCPTer_RT,0.000000,0.000000e+00
PYRt2,0.000000,0.000000e+00
...,...,...
FA183COAabcp,0.000000,0.000000e+00
AKP1,0.000304,0.000000e+00
GLYC3Ptm,0.000140,0.000000e+00
DHPPDA,0.000136,0.000000e+00


## Adjust model subsystems.

In [352]:
x = []
for r in model.reactions:
    if r.subsystem in ('Cysteine and methionine Metabolism', 'Cysteine and methionine metabolism'):
        r.subsystem = 'Cysteine and Methionine Metabolism'
    if (r.subsystem == 'Glycine, serine and threonine metabolism'):
        r.subsystem = 'Glycine, Serine, and Threonine Metabolism'
    if (r.subsystem == 'Histidine metabolism'):
        r.subsystem = 'Histidine Metabolism'
    if (r.subsystem == 'Purine metabolism'):
        r.subsystem = 'Purine Metabolism'
    if (r.subsystem == 'Pyruvate metabolism'):
        r.subsystem = 'Pyruvate Metabolism'
        
    if r.subsystem in ('Transport, nucleus', 'Transport, nuclear'):
        r.subsystem = 'Transport, Nuclear'
    if (r.subsystem == 'Transport, Glyoxysome'):
        r.subsystem = 'Transport, Peroxisomal'
        print(r)
    if (r.subsystem == 'Transport, Mitochondriall'):
        r.subsystem = 'Transport, Mitochondrial'
        print(r)
    if (r.subsystem == 'Glycine, Serine, Alanine, and threonine Metabolism'):
        r.subsystem = 'Glycine, Serine, Alanine, and Threonine Metabolism'
#     if (r.subsystem == 'Glyoxylate Metabolism'):
#         print(r)
#     if (r.subsystem == 'Glyoxylate and Dicarboxylate Metabolism'):
#         r.subsystem == 'Glyoxylate Metabolism'
        
#         r.subsystem = 'Histidine Metabolism'
    x.append(r.subsystem)
    
# 'Methionine Metabolism',
#  'Methionine and cysteine Metabolism',

THRtm: thr__L_c <=> thr__L_m
GLYCLTtm: glyclt_c <=> glyclt_m
GLXtm: glx_c <=> glx_m
METtm: met__L_c <=> met__L_m
CYStm: cys__L_c <=> cys__L_m
ACACt2m: acac_c + h_c <=> acac_m + h_m
BHBtm: bhb_c + h_c <=> bhb_m + h_m
GHBtm: ghb_c + h_c <=> ghb_m + h_m
4ABUTtm: 4abut_c <=> 4abut_m
AGMt_m: agm_c <=> agm_m
NADtm: nad_c --> nad_m
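
The chain of `if` statements above could also be written as a lookup table, which keeps all spelling fixes in one place. A minimal sketch on plain strings. The name pairs are taken from the cell above, while `SUBSYSTEM_FIXES` and `normalize_subsystem` are hypothetical names introduced here.

```python
# Variant spelling -> canonical subsystem name (pairs taken from the cell above).
SUBSYSTEM_FIXES = {
    "Cysteine and methionine Metabolism": "Cysteine and Methionine Metabolism",
    "Cysteine and methionine metabolism": "Cysteine and Methionine Metabolism",
    "Glycine, serine and threonine metabolism": "Glycine, Serine, and Threonine Metabolism",
    "Histidine metabolism": "Histidine Metabolism",
    "Purine metabolism": "Purine Metabolism",
    "Pyruvate metabolism": "Pyruvate Metabolism",
    "Transport, nucleus": "Transport, Nuclear",
    "Transport, nuclear": "Transport, Nuclear",
    "Transport, Glyoxysome": "Transport, Peroxisomal",
    "Transport, Mitochondriall": "Transport, Mitochondrial",
}


def normalize_subsystem(name):
    """Return the canonical name, or the input unchanged if no fix is listed."""
    return SUBSYSTEM_FIXES.get(name, name)


names = ["Purine metabolism", "Transport, Mitochondriall", "Citric Acid Cycle"]
normalized = [normalize_subsystem(n) for n in names]
print(normalized)
```

On the model itself this would presumably be applied as `r.subsystem = normalize_subsystem(r.subsystem)` inside the loop over `model.reactions`.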


In [353]:
set(x)

{'',
 'Alanine and Aspartate Metabolism',
 'Alternate Carbon Metabolism',
 'Amino sugar and nucleotide sugar Metabolism',
 'Aminosugar Metabolism',
 'Anaplerotic Reactions',
 'Arabinose Metabolism',
 'Arginine and Proline Metabolism',
 'Ascorbate Metabolism',
 'B-Ketoadipate Pathway',
 'Bile Acid Biosynthesis',
 'Biomass and maintenance functions',
 'Biosynthesis of steroids: Terpenoid backbone synthesis',
 'Biosynthesis of steroids: Ubiquinone/Terpenoid-quinone Biosynthesis',
 'Biosynthesis of unsaturated fatty acids',
 'Biotin Metabolism',
 'Butanoate Metabolism',
 'Calvin cycle/Pentose Phosphate Pathway',
 'Carbohydrates and related molecules',
 'Carbon fixation;Glutamate Metabolism;Alanine and Aspartate Metabolism',
 'Carnitine shuttle',
 'Cell Envelope Biosynthesis',
 'Cholesterol Metabolism',
 'Citric Acid Cycle',
 'Cofactor Biosynthesis: Carnitine Biosynthesis',
 'Cofactor and Prosthetic Group Biosynthesis',
 'Cofactor recycling',
 'Complex Alcohol Metabolism',
 'Cysteine Metabo

## Let's clean the model to remove orphan metabolites.

In [354]:
# Find orphan metabolites
orphan_metabolites = [m for m in model.metabolites if len(m.reactions) == 0]

# Remove orphan metabolites from the model
model.remove_metabolites(orphan_metabolites)

In [355]:
orphan_metabolites

[]

In [357]:
cobra.io.save_json_model(model,'../models/Lst_v0.1_model_withOrthoMCLallAdded_forPUB.json')
# cobra.io.write_sbml_model(model, "Lst_v0.1_model_verify.xml")
cobra.io.save_json_model(model,'../models/Lst_v0.2_model_forPUB_copy.json')

## Finally, we will not add transporters here.


We can separate the above reactions that need to be added into a few categories:

transport (vacuole to cytoplasm) 5 rxns:
- r_4162: his__L_v <=> his__L_c
- r_1220: tyr__L_v <=> tyr__L_c
- r_4163: lys__L_v <=> lys__L_c
- r_1208: leu__L_v <=> leu__L_c
- r_1202: ile__L_v <=> ile__L_c

transport (cytoplasm to mitochondria) 3 rxns:
- r_1919: lys__L_c --> lys__L_m
- ARGt2m: arg__L_c + h_c --> arg__L_m + h_m
- r_1837: his__L_c --> his__L_m

transport (extracellular to cytoplasm) 8 rxns:
- r_4587: ca2_e + h_e <=> ca2_c + h_c
- r2073_1: h_e + zn2_e <=> h_c + zn2_c
- MNt2: h_e + mn2_e <=> h_c + mn2_c
- Clt: cl_e <=> cl_c
- CHOLtu: chol_c <=> chol_e
- r_4589: cu2_e + h_e <=> cu2_c + h_c
- r_4592: h_e + mg2_e <=> h_c + mg2_c
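
The grouping above can be reproduced from the reaction strings alone, assuming the compartment is encoded as the suffix after the last underscore of each metabolite ID (as in the IDs listed). A minimal sketch on a few of the reactions above. The helper name `compartments` is hypothetical.

```python
import re
from collections import defaultdict

# A few reaction strings copied from the list above.
reactions = {
    "r_4162": "his__L_v <=> his__L_c",
    "r_1919": "lys__L_c --> lys__L_m",
    "ARGt2m": "arg__L_c + h_c --> arg__L_m + h_m",
    "MNt2": "h_e + mn2_e <=> h_c + mn2_c",
    "Clt": "cl_e <=> cl_c",
}


def compartments(equation):
    """Collect compartment suffixes (the text after the last underscore)."""
    mets = re.findall(r"[A-Za-z0-9_]+", equation)
    return frozenset(m.rsplit("_", 1)[1] for m in mets if "_" in m)


# Group reaction IDs by the pair of compartments they connect.
by_compartments = defaultdict(list)
for rxn_id, equation in reactions.items():
    key = "/".join(sorted(compartments(equation)))
    by_compartments[key].append(rxn_id)

for key, ids in sorted(by_compartments.items()):
    print(key, ids)
```

The keys ignore direction, so a `c/m` group covers transport either way between cytoplasm and mitochondria. Direction would have to be read from the arrow (`-->` versus `<=>`).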
