# Supplemental Notebook A: Gene Annotation Correction in GSM iYLI647
This Jupyter notebook corrects minor gene-annotation errors in iYLI647, a genome-scale model (GSM) of *Yarrowia lipolytica*, so that the gene IDs in the model map accurately onto the transcriptomic data.

### Objective
The primary goal is to refine the GSM by:
- Identifying and correcting duplicate gene entries in the GSM.
- Removing genes not present in the transcriptomics dataset.
- Fixing typographical errors in gene IDs.
- Analyzing and adjusting the biomass composition in the model for consistency.

In [1]:
import cobra

### Load the genome-scale model

In [2]:
model = cobra.io.json.load_json_model("../genome_scale_models/iYLI647_corr.json")
model

0,1
Name,model
Memory address,14a397790
Number of metabolites,1121
Number of reactions,1348
Number of genes,646
Number of groups,0
Objective expression,1.0*biomass_C - 1.0*biomass_C_reverse_c1d5c
Compartments,"c, e, m, n, x, r, g, v"


### Add malate synthase reaction to the cytosol
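
Before adding a new reaction, it can help to confirm that its stoichiometry is balanced for elements and charge. The sketch below does this in plain Python for the malate synthase equation used in this section. The formulas and charges are BiGG-style values assumed for illustration; they are not read from iYLI647, and the helper names `parse_formula` and `imbalance` are hypothetical.

```python
import re
from collections import Counter

# Assumed BiGG-style formulas and charges; these are NOT read from iYLI647.
FORMULAS = {
    "accoa[c]": ("C23H34N7O17P3S", -4),
    "glx[c]": ("C2H1O3", -1),
    "h2o[c]": ("H2O", 0),
    "coa[c]": ("C21H32N7O16P3S", -4),
    "h[c]": ("H", 1),
    "mal_L[c]": ("C4H4O5", -2),
}

# Stoichiometry copied from the MALS cell (negative = consumed).
MALS = {
    "accoa[c]": -1, "glx[c]": -1, "h2o[c]": -1,
    "coa[c]": 1, "h[c]": 1, "mal_L[c]": 1,
}


def parse_formula(formula):
    """Return element counts for a simple formula such as 'C4H4O5'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoichiometry, formulas):
    """Net element and charge totals; an empty dict means balanced."""
    totals = Counter()
    for met, coeff in stoichiometry.items():
        formula, charge = formulas[met]
        for element, n in parse_formula(formula).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * charge
    return {key: value for key, value in totals.items() if value != 0}


print(imbalance(MALS, FORMULAS))  # empty dict with the assumed formulas
```

When the metabolites in a loaded model carry formulas and charges, COBRApy's `Reaction.check_mass_balance()` offers a comparable check directly on the reaction object.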

In [3]:
mals = cobra.Reaction('MALS')
mals.name = 'Malate synthase'
mals.subsystem = 'Glyoxylate metabolism'
mals.lower_bound = 0  # This is the default in COBRApy for irreversible reactions
mals.upper_bound = 1000  # This is a common upper bound for reactions in COBRApy

# Add metabolites to the reaction with their stoichiometry
mals.add_metabolites({
    model.metabolites.get_by_id('accoa[c]'): -1.0,  # Reactant, so negative stoichiometry
    model.metabolites.get_by_id('glx[c]'): -1.0,
    model.metabolites.get_by_id('h2o[c]'): -1.0,
    model.metabolites.get_by_id('coa[c]'): 1.0,  # Product, so positive stoichiometry
    model.metabolites.get_by_id('h[c]'): 1.0,
    model.metabolites.get_by_id('mal_L[c]'): 1.0
})

# Add the reaction to the model
model.add_reactions([mals])

model.reactions.get_by_id('MALS')

0,1
Reaction identifier,MALS
Name,Malate synthase
Memory address,0x1767ca350
Stoichiometry,accoa[c] + glx[c] + h2o[c] --> coa[c] + h[c] + mal_L[c]  Acetyl_CoA + Glyoxylate + H2O --> Coenzyme_A + H + L_Malate
GPR,
Lower bound,0
Upper bound,1000


### Allow for reversible flux through the carnitine shuttle
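
The two cells in this section follow the same pattern: remember the current bounds, widen them to (-1000, 1000), and report the change. A minimal sketch of that pattern as a reusable helper is shown here. The function name `make_reversible` is hypothetical, and a `SimpleNamespace` stands in for a `cobra.Reaction`, since the helper relies only on the `bounds` attribute used in the notebook.

```python
from types import SimpleNamespace


def make_reversible(reaction, limit=1000):
    """Set symmetric bounds on `reaction` and return the bounds it had before."""
    old_bounds = tuple(reaction.bounds)
    reaction.bounds = (-limit, limit)
    return old_bounds


# Stand-in for a cobra.Reaction; only the `id` and `bounds` attributes are used.
demo = SimpleNamespace(id="CRNCARtm", bounds=(0.0, 1000.0))

old = make_reversible(demo)
print(f"Old {demo.id} bounds: {old}")
print(f"New {demo.id} bounds: {demo.bounds}")
```

Returning the previous bounds from the helper keeps the "old" and "new" values in separate variables, so the report cannot accidentally print the updated bounds under the "old" label.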

In [4]:
acetylcarnitine_carrier_reaction = model.reactions.get_by_id('CRNCARtm')
old_bounds = acetylcarnitine_carrier_reaction.bounds

# set new bounds
acetylcarnitine_carrier_reaction.bounds = (-1000, 1000)

print(f'Old CRNCARtm bounds: {old_bounds}')
display(acetylcarnitine_carrier_reaction)


Old CRNCARtm bounds: (0.0, 1000.0)


0,1
Reaction identifier,CRNCARtm
Name,carnithine acetylcarnithine carrier mitochondrial
Memory address,0x176ac0730
Stoichiometry,acrn[c] + crn[m] <=> acrn[m] + crn[c]  O_Acetylcarnitine + L_Carnitine <=> O_Acetylcarnitine + L_Carnitine
GPR,YALI0C02431g
Lower bound,-1000
Upper bound,1000


In [5]:
cytosolic_acetyltransferase = model.reactions.get_by_id('CSNAT')
old_bounds = cytosolic_acetyltransferase.bounds

# set new bounds
cytosolic_acetyltransferase.bounds = (-1000, 1000)

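# NOTE: the bounds are read after the update, so this prints the new bounds
# (the previous values are kept in old_bounds)
print(f'New CSNAT bounds: {cytosolic_acetyltransferase.bounds}')
display(cytosolic_acetyltransferase)

New CSNAT bounds: (-1000, 1000)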


0,1
Reaction identifier,CSNAT
Name,carnitine O acetyltransferase
Memory address,0x176ac0c70
Stoichiometry,accoa[c] + crn[c] <=> acrn[c] + coa[c]  Acetyl_CoA + L_Carnitine <=> O_Acetylcarnitine + Coenzyme_A
GPR,
Lower bound,-1000
Upper bound,1000


### Add reactions for beta-carotene production and transport

The lumped beta-carotene production reaction added below is the sum of these four steps:
```
Farnesyltranstransferase (FRTT)      = 2 frdp[c] + 2 ipdp[c]   -> 2 ppi[c] + 2 ggdp[c]
Phytoene synthase (PSY)              = 2 ggdp[c]               -> 1 phyto[c] + 2 ppi[c]
4-step phytoene desaturase (PHYDS4S) = 1 phyto[c] + 4.0 nad[c] -> lycop[c] + 4.0 nadh[c]
Lycopene cyclase (LYC)               = 1 lycop[c]              -> 1 caro[c]
```
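
The lumped equation can be checked by summing the four steps and cancelling the intermediates (`ggdp[c]`, `phyto[c]`, `lycop[c]`). The sketch below does this with plain dictionaries. The stoichiometries are transcribed from the list above, and the helper names `lump` and `as_equation` are hypothetical.

```python
from collections import Counter

# Stoichiometries transcribed from the four steps listed above
# (negative = consumed, positive = produced).
STEPS = {
    "FRTT": {"frdp[c]": -2, "ipdp[c]": -2, "ppi[c]": 2, "ggdp[c]": 2},
    "PSY": {"ggdp[c]": -2, "phyto[c]": 1, "ppi[c]": 2},
    "PHYDS4S": {"phyto[c]": -1, "nad[c]": -4, "lycop[c]": 1, "nadh[c]": 4},
    "LYC": {"lycop[c]": -1, "caro[c]": 1},
}


def lump(steps):
    """Sum the step stoichiometries and drop metabolites that cancel out."""
    net = Counter()
    for stoich in steps.values():
        for met, coeff in stoich.items():
            net[met] += coeff
    return {met: coeff for met, coeff in net.items() if coeff != 0}


def as_equation(net):
    """Format a net stoichiometry as 'substrates -> products'."""
    left = " + ".join(f"{-c} {m}" for m, c in net.items() if c < 0)
    right = " + ".join(f"{c} {m}" for m, c in net.items() if c > 0)
    return f"{left} -> {right}"


net = lump(STEPS)
print(as_equation(net))
```

Up to the ordering of terms, the result agrees with the `caro_prod` equation defined in the next cell: 2 `frdp[c]`, 2 `ipdp[c]`, and 4 `nad[c]` are consumed, and 1 `caro[c]`, 4 `ppi[c]`, and 4 `nadh[c]` are produced.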

In [6]:
# create beta carotene metabolite
model.add_metabolites([ 
    cobra.Metabolite('caro[c]'), # Beta carotene
])

# create beta carotene production and exchange reactions
carotene_production = cobra.Reaction('caro_prod') # Beta-carotene production
EX_caro_e = cobra.Reaction('EX_caro_e') # Beta-carotene exchange

# add the reactions to the model
model.add_reactions([carotene_production, EX_caro_e])

# Define reaction equations
carotene_production.reaction = '2 frdp[c] + 2 ipdp[c] + 4 nad[c] -> 1 caro[c] + 4 ppi[c] + 4 nadh[c]'
EX_caro_e.reaction = 'caro[c] -> '

carotene_production

0,1
Reaction identifier,caro_prod
Name,
Memory address,0x16c4475e0
Stoichiometry,2.0 frdp[c] + 2.0 ipdp[c] + 4.0 nad[c] --> caro[c] + 4.0 nadh[c] + 4.0 ppi[c]  2.0 Farnesyl_diphosphate + 2.0 Isopentenyl_diphosphate + 4.0 Nicotinamide_adenine_dinucleotide --> + 4.0 Nicotinamide_adenine_dinucleotide___reduced + 4.0 Diphosphate
GPR,
Lower bound,0
Upper bound,1000.0


### Fix reaction that has a duplicate gene
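
Duplicate entries of this kind can be located by counting how often each gene ID occurs in a gene-reaction rule. The following is a minimal sketch that operates on the rule string alone. The helper name `duplicate_genes` is hypothetical, and the tokenizer simply assumes that every token other than parentheses, `and`, and `or` is a gene ID.

```python
import re
from collections import Counter

# Assumption: every token that is not a parenthesis or the words and/or is a gene ID.
TOKEN = re.compile(r"[^\s()]+")


def duplicate_genes(rule):
    """Return the gene IDs that occur more than once in a GPR rule string."""
    genes = [t for t in TOKEN.findall(rule) if t.lower() not in ("and", "or")]
    return sorted(g for g, n in Counter(genes).items() if n > 1)


print(duplicate_genes("YALI0B02728g or (YALI0D09229g and YALI0D09229g)"))
# ['YALI0D09229g']
```

A repeated ID is not always redundant: in a rule such as `(A and B) or (A and C)` the gene `A` legitimately appears twice. The output is therefore best treated as a list of candidates for manual review.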

In [7]:
# fix Diphosphoglyceromutase gene_reaction_rule
print(model.reactions.DPGM.gene_reaction_rule)

# remove duplicate genes
model.reactions.DPGM.gene_reaction_rule = 'YALI0B02728g or YALI0D09229g'

display(model.reactions.DPGM)

YALI0B02728g or (YALI0D09229g and YALI0D09229g)


0,1
Reaction identifier,DPGM
Name,Diphosphoglyceromutase
Memory address,0x176b1a560
Stoichiometry,13dpg[c] <=> 23dpg[c] + h[c]  3_Phospho_D_glyceroyl_phosphate <=> 2_3_Disphospho_D_glycerate + H
GPR,YALI0B02728g or YALI0D09229g
Lower bound,-1000.0
Upper bound,1000.0


### Remove genes that are not in the transcriptomics data set
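
Genes of this kind can be found by comparing the IDs in each gene-reaction rule with the set of IDs measured in the transcriptomics experiment. The sketch below shows the comparison for one rule. The `measured` set is a hypothetical stand-in for the real data set, and the helper name `genes_missing_from_data` is likewise an assumption.

```python
def genes_missing_from_data(rule, measured_ids):
    """Return the gene IDs in a GPR rule that are absent from `measured_ids`."""
    tokens = rule.replace("(", " ").replace(")", " ").split()
    genes = [t for t in tokens if t not in ("and", "or")]
    return [g for g in genes if g not in measured_ids]


# Hypothetical subset of transcriptomics gene IDs, for illustration only.
measured = {
    "YALI0E10144g", "YALI0E12628g", "YALI0E16709g", "YALI0E19723g",
    "YALI0F03201g", "YALI0F03567g", "YALI0F04103g",
}

# Rule as printed for CYOOm before the correction.
cyoom_rule = (
    "YALI0E10144g and YALI0E12628g and YALI0E16709g and YALI0E19723g "
    "and YALI0F03201g and YALI0F03567g and YALI0F04103g and YALI0F04"
)

print(genes_missing_from_data(cyoom_rule, measured))
# ['YALI0F04']
```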

In [8]:
# fix cytochrome c oxidase mitochondrial gene_reaction_rule
print(model.reactions.CYOOm.gene_reaction_rule)

# remove misannotated gene (YALI0F04)
model.reactions.CYOOm.gene_reaction_rule = 'YALI0E10144g and YALI0E12628g and YALI0E16709g and YALI0E19723g and YALI0F03201g and YALI0F03567g and YALI0F04103g'

display(model.reactions.CYOOm)

YALI0E10144g and YALI0E12628g and YALI0E16709g and YALI0E19723g and YALI0F03201g and YALI0F03567g and YALI0F04103g and YALI0F04


0,1
Reaction identifier,CYOOm
Name,cytochrome c oxidase mitochondrial
Memory address,0x176c78d30
Stoichiometry,4.0 focytc[m] + 6.0 h[m] + o2[m] --> 4.0 ficytc[m] + 2.0 h2o[m] + 6.0 h[c]  4.0 Ferrocytochrome_c + 6.0 H + O2 --> 4.0 Ferricytochrome_c + 2.0 H2O + 6.0 H
GPR,YALI0E10144g and YALI0E12628g and YALI0E16709g and YALI0E19723g and YALI0F03201g and YALI0F03567g...
Lower bound,0.0
Upper bound,1000.0


In [9]:
# ubiquinol 6 cytochrome c reductase gene_reaction_rule
print(model.reactions.CYOR_u6m.gene_reaction_rule)

# remove misannotated gene (YALI0F24)
model.reactions.CYOR_u6m.gene_reaction_rule = 'YALI0A02915g and YALI0A17468g and YALI0B01540g and YALI0C12210g and YALI0E34037g and YALI0F01771g and YALI0F08613g'

display(model.reactions.CYOR_u6m)

YALI0A02915g and YALI0A17468g and YALI0B01540g and YALI0C12210g and YALI0E34037g and YALI0F01771g and YALI0F08613g and YALI0F24


0,1
Reaction identifier,CYOR_u6m
Name,ubiquinol 6 cytochrome c reductase
Memory address,0x176c7a680
Stoichiometry,2.0 ficytc[m] + 1.5 h[m] + q6h2[m] --> 2.0 focytc[m] + 1.5 h[c] + q6[m]  2.0 Ferricytochrome_c + 1.5 H + Ubiquinol_6 --> 2.0 Ferrocytochrome_c + 1.5 H + Ubiquinone_6
GPR,YALI0A02915g and YALI0A17468g and YALI0B01540g and YALI0C12210g and YALI0E34037g and YALI0F01771g...
Lower bound,0.0
Upper bound,1000.0


### Fix gene ids that have typos
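
Malformed IDs such as the ones corrected in this section can be flagged with a pattern check. The sketch below assumes a locus-tag layout inferred from the IDs that appear in this notebook (`YALI0`, a chromosome letter A to F, five digits, then `g`). This layout is an assumption and should be confirmed against the genome annotation before use. The helper name `malformed_ids` is hypothetical.

```python
import re

# Assumed locus-tag layout, inferred from the IDs in this notebook:
# 'YALI0' + chromosome letter A-F + five digits + 'g'.
GENE_ID = re.compile(r"YALI0[A-F]\d{5}g")


def malformed_ids(gene_ids):
    """Return the IDs that do not match the assumed locus-tag layout."""
    return [g for g in gene_ids if not GENE_ID.fullmatch(g)]


ids = [
    "YALI0C20265g", "YALIB02544g", "YALI00F26587g",
    "YALI0F04", "YALI0F24", "YALI0B02544g",
]
print(malformed_ids(ids))
# ['YALIB02544g', 'YALI00F26587g', 'YALI0F04', 'YALI0F24']
```

The check also catches the truncated IDs removed in the previous section, since they are too short to match the pattern.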

In [10]:
# fix typo in hexadecanoate transport gene_reaction_rule
print(model.reactions.HCAt.gene_reaction_rule)

# fix typo (YALIB02544g -> YALI0B02544g)
model.reactions.HCAt.gene_reaction_rule = 'YALI0C20265g or YALI0B02544g'

display(model.reactions.HCAt)

YALI0C20265g or YALIB02544g


0,1
Reaction identifier,HCAt
Name,Hexanoate n C60 transport in via uniport YL
Memory address,0x17703da20
Stoichiometry,hdca[e] --> hdca[c]  Hexadecanoate__n_C160_ --> Hexadecanoate__n_C160_
GPR,YALI0C20265g or YALI0B02544g
Lower bound,0.0
Upper bound,1000.0


In [11]:
# fix typo in decanoate transport gene_reaction_rule
print(model.reactions.dca_t.gene_reaction_rule)

# fix typo (YALIB02544g -> YALI0B02544g)
model.reactions.dca_t.gene_reaction_rule = 'YALI0C20265g or YALI0B02544g'

display(model.reactions.dca_t)

YALI0C20265g or YALIB02544g


0,1
Reaction identifier,dca_t
Name,decanoate transport in via uniport
Memory address,0x17703e2f0
Stoichiometry,dca[e] --> dca[c]  Decanoate__n_C100_ --> Decanoate__n_C100_
GPR,YALI0C20265g or YALI0B02544g
Lower bound,0.0
Upper bound,1000.0


In [12]:
# fix typo in Acetyl-CoA:acetoacetyl-CoA transferase gene_reaction_rule
print(model.reactions.ACACCT.gene_reaction_rule)

# fix typo (YALI00F26587g -> YALI0F26587g)
model.reactions.ACACCT.gene_reaction_rule = 'YALI0F26587g'

display(model.reactions.ACACCT)

YALI00F26587g


0,1
Reaction identifier,ACACCT
Name,Acetyl-CoA:acetoacetyl-CoA transferase
Memory address,0x1770998a0
Stoichiometry,acac[c] + accoa[c] --> aacoa[c] + ac[c]  Acetoacetate + Acetyl_CoA --> Acetoacetyl_CoA + Acetate
GPR,YALI0F26587g
Lower bound,0.0
Upper bound,1000.0


### Save the updated model
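
The model summary at the top of the notebook lists 646 genes, while the summary printed after saving lists 648. One possible explanation, stated here as an assumption and not verified against the model file, is that rewriting a gene-reaction rule registers the corrected IDs as new genes while the superseded IDs remain in the gene list. The sketch below looks for such unreferenced genes in a dictionary shaped like a COBRApy JSON model. The keys `genes`, `reactions`, `id`, and `gene_reaction_rule` are assumed, and the helper name `unreferenced_genes` is hypothetical.

```python
import re


def unreferenced_genes(model_dict):
    """IDs listed under 'genes' that no reaction's gene_reaction_rule mentions."""
    used = set()
    for rxn in model_dict["reactions"]:
        used.update(re.findall(r"[^\s()]+", rxn.get("gene_reaction_rule", "")))
    used -= {"and", "or"}
    return sorted(g["id"] for g in model_dict["genes"] if g["id"] not in used)


# Toy dictionary mimicking the assumed layout of a COBRApy JSON model;
# it is not the real iYLI647 file.
toy = {
    "genes": [{"id": "YALI0F26587g"}, {"id": "YALI00F26587g"}],
    "reactions": [{"id": "ACACCT", "gene_reaction_rule": "YALI0F26587g"}],
}

print(unreferenced_genes(toy))
# ['YALI00F26587g']
```

To run the check on a saved model, the dictionary could be obtained with `json.load` from the standard library, assuming the file follows the layout described above.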

In [13]:
# save the updated model
cobra.io.save_json_model(model, "../genome_scale_models/iYLI647_corr_2.json")
model

0,1
Name,model
Memory address,14a397790
Number of metabolites,1122
Number of reactions,1351
Number of genes,648
Number of groups,0
Objective expression,1.0*biomass_C - 1.0*biomass_C_reverse_c1d5c
Compartments,"c, e, m, n, x, r, g, v"
