# GECKO

The GECKO algorithm constrains the metabolic activity within the cell so that
it respects known enzyme parameters, such as those measured by proteomics and
other methods.

GECKO was originally described by [Sánchez et al., "Improving the
phenotype predictions of a yeast genome‐scale metabolic model by incorporating
enzymatic constraints", Molecular Systems Biology,
2017](https://doi.org/10.15252/msb.20167411).

The analysis method and its implementation in COBREXA are similar to
[sMOMENT](14_smoment.md), but GECKO can represent a much wider range of
constraints: mainly, it supports multiple isozymes for each reaction, and
their gene products can be grouped into "enzyme mass groups" to simplify
the interpretation of proteomics data.
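
As a rough orientation, enzyme constraints of this kind are commonly written as below. This is only a sketch under assumed notation, not necessarily the exact formulation that COBREXA builds (unit conversion factors in particular may differ). The symbols are all assumptions introduced here: $v^{\pm}_{jk}$ is the forward or reverse flux of reaction $j$ carried by isozyme $k$, $k^{\pm}_{jk}$ the corresponding turnover numbers, $n_{ijk}$ the number of copies of gene product $i$ in that isozyme, $g_i$ the gene product amount with bounds $l_i, u_i$, $M_i$ its molar mass, and $B_G$ the mass bound of group $G$.

```math
\begin{aligned}
v_j &= \sum_{k} \left( v^{+}_{jk} - v^{-}_{jk} \right), \qquad v^{\pm}_{jk} \ge 0, \\
g_i &= \sum_{j,k} n_{ijk} \left( \frac{v^{+}_{jk}}{k^{+}_{jk}} + \frac{v^{-}_{jk}}{k^{-}_{jk}} \right), \qquad l_i \le g_i \le u_i, \\
\sum_{i \in G} M_i \, g_i &\le B_G .
\end{aligned}
```

Read this way, a flux can only grow if enough gene product is available to carry it, and the total mass of gene products in each group is capped.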

For demonstration, we will generate artificial random data in a way similar
to the [sMOMENT example](14_smoment.md):

In [1]:
!isfile("e_coli_core.json") &&
    download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")

using COBREXA, GLPK

model = load_model("e_coli_core.json")

import Random
Random.seed!(1) # repeatability

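# random molar masses for all gene products (mean 60, standard deviation 10)
gene_product_masses = Dict(genes(model) .=> randn(n_genes(model)) .* 10 .+ 60)

# keep only the reactions that have a gene association and are neither
# biomass nor exchange reactions
rxns = filter(
    x ->
        !looks_like_biomass_reaction(x) &&
            !looks_like_exchange_reaction(x) &&
            !isnothing(reaction_gene_association(model, x)),
    reactions(model),
)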

69-element Vector{String}:
 "PFK"
 "PFL"
 "PGI"
 "PGK"
 "PGL"
 "ACALD"
 "AKGt2r"
 "PGM"
 "PIt2r"
 "ALCD2x"
 ⋮
 "MALt2_2"
 "MDH"
 "ME1"
 "ME2"
 "NADH16"
 "NADTRHD"
 "NH4t"
 "O2t"
 "PDH"

The main difference from sMOMENT is that GECKO allows multiple isozymes per
reaction. Reactions with missing isozyme information are ignored and left
as they are:

In [2]:
rxn_isozymes = Dict(
    rxn => [
        Isozyme(
            Dict(isozyme_genes .=> 1),
            randn() * 100 + 600, # forward kcat
            randn() * 100 + 500, # reverse kcat
        ) for isozyme_genes in reaction_gene_association(model, rxn)
    ] for rxn in rxns
)

Dict{String, Vector{Isozyme}} with 69 entries:
  "ACALD"   => [Isozyme(Dict("b0351"=>1), 462.073, 330.922), Isozyme(Dict("b124…
  "PTAr"    => [Isozyme(Dict("b2297"=>1), 471.533, 631.132), Isozyme(Dict("b245…
  "ALCD2x"  => [Isozyme(Dict("b0356"=>1), 617.367, 537.297), Isozyme(Dict("b147…
  "PDH"     => [Isozyme(Dict("b0114"=>1, "b0115"=>1, "b0116"=>1), 670.35, 423.7…
  "PYK"     => [Isozyme(Dict("b1854"=>1), 607.735, 454.975), Isozyme(Dict("b167…
  "CO2t"    => [Isozyme(Dict("s0001"=>1), 847.517, 338.625)]
  "MALt2_2" => [Isozyme(Dict("b3528"=>1), 383.908, 397.408)]
  "CS"      => [Isozyme(Dict("b0720"=>1), 750.383, 549.73)]
  "PGM"     => [Isozyme(Dict("b3612"=>1), 586.875, 439.579), Isozyme(Dict("b439…
  "TKT1"    => [Isozyme(Dict("b2935"=>1), 650.793, 528.251), Isozyme(Dict("b246…
  "ACONTa"  => [Isozyme(Dict("b0118"=>1), 621.458, 449.334), Isozyme(Dict("b127…
  "GLNS"    => [Isozyme(Dict("b3870"=>1), 411.219, 513.669), Isozyme(Dict("b129…
  ⋮         => ⋮
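
The nested comprehension above can be hard to read at first. The following minimal sketch reproduces its structure on toy data without loading COBREXA. `ToyIsozyme`, the reaction names `R1`/`R2`, and the gene identifiers are hypothetical stand-ins, and the fixed turnover numbers replace the random ones used above.

```julia
# Hypothetical stand-in for COBREXA's `Isozyme`, used only so that this
# sketch runs without the package.
struct ToyIsozyme
    gene_product_count::Dict{String,Int}
    kcat_forward::Float64
    kcat_reverse::Float64
end

# Toy gene associations: each reaction maps to a list of isozymes, and each
# isozyme is a list of the gene products that form it.
toy_gene_associations = Dict(
    "R1" => [["g1", "g2", "g3"]],  # a single complex of three subunits
    "R2" => [["g4"], ["g5"]],      # two alternative single-gene isozymes
)

toy_isozymes = Dict(
    rxn => [
        # `subunits .=> 1` assigns a stoichiometry of 1 to every subunit
        ToyIsozyme(Dict(subunits .=> 1), 600.0, 500.0) for subunits in complexes
    ] for (rxn, complexes) in toy_gene_associations
)

for (rxn, isozymes) in toy_isozymes
    println(rxn, ": ", length(isozymes), " isozyme(s)")
end
```

Each reaction ends up with one entry per alternative enzyme, and each entry records which gene products are needed to build that enzyme.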

We also set bounds on the amount of each gene product; here every gene product gets the same bounds:

In [3]:
gene_product_bounds = Dict(genes(model) .=> Ref((0.0, 10.0)))

Dict{String, Tuple{Float64, Float64}} with 137 entries:
  "b4301" => (0.0, 10.0)
  "b1602" => (0.0, 10.0)
  "b4154" => (0.0, 10.0)
  "b3236" => (0.0, 10.0)
  "b1621" => (0.0, 10.0)
  "b1779" => (0.0, 10.0)
  "b3951" => (0.0, 10.0)
  "b1676" => (0.0, 10.0)
  "b3114" => (0.0, 10.0)
  "b1241" => (0.0, 10.0)
  "b2276" => (0.0, 10.0)
  "b1761" => (0.0, 10.0)
  "b3925" => (0.0, 10.0)
  "b3493" => (0.0, 10.0)
  "b3733" => (0.0, 10.0)
  "b2926" => (0.0, 10.0)
  "b0979" => (0.0, 10.0)
  "b4015" => (0.0, 10.0)
  "b2296" => (0.0, 10.0)
  ⋮       => ⋮
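
The `Ref` in the cell above matters: it makes broadcasting treat the tuple as a single value instead of iterating over its elements. A small sketch with hypothetical gene identifiers shows the difference, and how a single entry might be overridden afterwards, for example with a measured limit (the value `0.5` is made up).

```julia
gene_ids = ["g1", "g2"]  # hypothetical identifiers

# With `Ref`, every gene is paired with the whole (lower, upper) tuple.
toy_bounds = Dict(gene_ids .=> Ref((0.0, 10.0)))

# Without `Ref`, the tuple is broadcast element-wise, which is not intended:
# the first gene is paired with 0.0 and the second with 10.0.
elementwise = Dict(gene_ids .=> (0.0, 10.0))

# Individual bounds can be tightened afterwards.
toy_bounds["g2"] = (0.0, 0.5)

println(toy_bounds)
println(elementwise)
```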

With this, the construction of the model constrained by all enzymatic
information is straightforward:

In [4]:
gecko_model =
    model |> with_gecko(;
        reaction_isozymes = rxn_isozymes,
        gene_product_bounds,
        gene_product_molar_mass = gene_product_masses,
        gene_product_mass_group = _ -> "uncategorized", # all products belong to the same "uncategorized" category
        gene_product_mass_group_bound = _ -> 100.0, # the total limit of mass in the single category
    )

Metabolic model of type GeckoModel
sparse([9, 51, 55, 64, 65, 73, 9, 51, 55, 64  …  200, 201, 202, 203, 204, 205, 206, 207, 208, 209], [1, 1, 1, 1, 1, 1, 2, 2, 2, 2  …  325, 326, 327, 328, 329, 330, 331, 332, 333, 334], [1.0, 1.0, -1.0, -1.0, 1.0, -0.001546133025849219, 1.0, 1.0, -1.0, -1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], 209, 334)
Number of reactions: 334
Number of metabolites: 209


(Alternatively, you may use `make_gecko_model`, which does the same
without piping by `|>`.)
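
The example uses a single mass group, but the two `gene_product_mass_group*` arguments are ordinary functions, so several groups can be described in the same way. The sketch below is an assumption-laden illustration: the group names, gene identifiers, and bounds are invented, and the functions are only defined here, not passed to COBREXA.

```julia
# Hypothetical split of gene products into two mass groups
membrane_gene_products = Set(["g1", "g2"])

# Maps a gene product identifier to the name of its mass group
mass_group(gid) = gid in membrane_gene_products ? "membrane" : "cytosol"

# Maps a mass group name to the total mass allowed in that group
mass_group_bounds = Dict("membrane" => 30.0, "cytosol" => 70.0)
mass_group_bound(group) = mass_group_bounds[group]

for gid in ["g1", "g2", "g3"]
    group = mass_group(gid)
    println(gid, " -> ", group, " (limit ", mass_group_bound(group), ")")
end
```

Functions of this shape could then take the place of `_ -> "uncategorized"` and `_ -> 100.0` in the call above.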

The stoichiometry and coupling in the GECKO model are noticeably more complex;
you may notice new "reactions" that have been added to simulate the gene
product utilization:

In [5]:
[stoichiometry(gecko_model); coupling(gecko_model)]

262×334 SparseArrays.SparseMatrixCSC{Float64, Int64} with 1386 stored entries:
⠄⠦⠤⠤⠄⠶⠶⠦⠇⠶⡴⠠⠀⠀⢧⠀⠢⣲⠼⠿⠶⣶⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⡎⢶⡉⣁⡄⣬⣠⡅⠔⠰⡈⠀⠀⠐⡄⡀⠄⣫⡉⡉⠩⠝⠇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠭⠰⠀⠒⣆⣀⣊⣬⢗⣾⢠⣵⣆⠆⠀⡁⠨⠁⠈⠀⠂⠥⠁⠅⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣧⡣⠁⠤⠟⠒⠻⠿⡗⠆⠠⣧⣤⢵⠀⢠⣄⡀⠘⠋⠚⠠⠂⠆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⢶⣂⠀⠀⠀⠀⠀⠀⠃⠀⠀⠉⠉⠉⠁⠈⠉⠉⠓⠀⠀⠀⠀⠰⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠈⠙⠺⠥⣀⡀⠀⠤⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠤⠀⠠⠄⠀⠱⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠘⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠑⢆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠉⠳⢄⡀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠑⢆⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠲⠄⠀⠠⣄⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠙⢦⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠧⣜⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢦⡀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢦⡀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢦⡀
⠒⠰⠤⣄⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙
⠀⠀⠀⠀⠀⠉⠁⠲⠤⢤⡀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠀⠀⠐⠲⠤⢠⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠘⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒

Again, the resulting model can be used in any type of analysis. For example, flux balance analysis:

In [6]:
opt_model = flux_balance_analysis(gecko_model, GLPK.Optimizer)

A JuMP Model
Maximization problem with:
Variables: 334
Objective function type: JuMP.AffExpr
`JuMP.AffExpr`-in-`MathOptInterface.EqualTo{Float64}`: 209 constraints
`JuMP.AffExpr`-in-`MathOptInterface.LessThan{Float64}`: 774 constraints
Model mode: AUTOMATIC
CachingOptimizer state: ATTACHED_OPTIMIZER
Solver name: GLPK
Names registered in the model: c_lbs, c_ubs, lbs, mb, ubs, x

The fluxes of the original reactions can be extracted from the solution:

In [7]:
flux_sol = flux_dict(gecko_model, opt_model)

Dict{String, Float64} with 95 entries:
  "ACALD"       => 0.0
  "PTAr"        => 6.34803
  "ALCD2x"      => 0.0
  "PDH"         => 10.4628
  "PYK"         => 2.52634
  "CO2t"        => -16.3044
  "EX_nh4_e"    => -3.97209
  "MALt2_2"     => 0.0
  "CS"          => 1.38468
  "PGM"         => -14.9919
  "TKT1"        => 1.85156
  "EX_mal__L_e" => 0.0
  "ACONTa"      => 1.38468
  "EX_pi_e"     => -2.67974
  "GLNS"        => 0.186264
  "ICL"         => 0.0
  "EX_o2_e"     => -15.4623
  "FBA"         => 7.29353
  "EX_gln__L_e" => 0.0
  ⋮             => ⋮

The gene product concentrations are available as well:

In [8]:
gp_concs = gene_product_dict(gecko_model, opt_model)

Dict{String, Float64} with 137 entries:
  "b4301" => -3.40857e-19
  "b1602" => 0.0
  "b4154" => 0.0
  "b3236" => 0.000804895
  "b1621" => 0.0144236
  "b3951" => 0.0
  "b1779" => 0.0252316
  "b1676" => 0.0
  "b3114" => 0.0
  "b1241" => 0.0
  "b2276" => 0.0499511
  "b1761" => 0.00565514
  "b3925" => 0.0
  "b3493" => 0.00349559
  "b3733" => 0.0550335
  "b2926" => 0.0332496
  "b0979" => 0.0581609
  "b4015" => 0.0
  "b2296" => 0.0170712
  ⋮       => ⋮
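
Values such as `-3.40857e-19` in the output above are numerical noise from the solver rather than real negative concentrations. A minimal sketch of cleaning them up follows; it uses three values copied from the output, and the tolerance is an assumed threshold that should be matched to the solver's own tolerances.

```julia
# A few values copied from the output above
noisy_concs = Dict("b4301" => -3.40857e-19, "b1621" => 0.0144236, "b1602" => 0.0)

tolerance = 1e-9  # assumed threshold, not a COBREXA default

# Snap everything within the tolerance of zero to exactly zero
cleaned_concs = Dict(g => (abs(c) < tolerance ? 0.0 : c) for (g, c) in noisy_concs)

# Gene products that are actually used in the solution
used_gene_products = sort([g for (g, c) in cleaned_concs if c > 0])

println(cleaned_concs)
println(used_gene_products)
```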

The total mass assigned to each mass group can be computed in the same way:

In [9]:
gene_product_mass_group_dict(gecko_model, opt_model)

Dict{String, Float64} with 1 entry:
  "uncategorized" => 100.0
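
The reported total equals the group bound of `100.0` set earlier, which suggests that the mass constraint is active in this solution. Assuming the group total is the sum of concentration times molar mass over the group members, the calculation can be sketched by hand on toy numbers (all identifiers and values below are made up):

```julia
# Hypothetical concentrations and molar masses of three gene products
toy_group_concs = Dict("g1" => 0.5, "g2" => 0.25, "g3" => 0.0)
toy_group_masses = Dict("g1" => 60.0, "g2" => 80.0, "g3" => 55.0)

# Total mass in the group: sum of concentration * molar mass
toy_group_total =
    sum(toy_group_concs[g] * toy_group_masses[g] for g in keys(toy_group_concs))

println(toy_group_total)  # 0.5 * 60 + 0.25 * 80 + 0 * 55 = 50.0
```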

Flux variability analysis works the same way:

In [10]:
flux_variability_analysis(gecko_model, GLPK.Optimizer, bounds = gamma_bounds(0.95))

95×2 Matrix{Float64}:
   5.94319       11.2165
   0.0           14.5575
  -0.0321351      9.85813
 -18.1604       -14.1881
   0.0            9.89027
  -3.45412        0.0
  -1.18575        0.0
 -17.1251       -13.1528
   2.54576        2.67974
  -3.45412        0.0
   ⋮            
   0.0            0.0
  -2.77471        8.56368
   0.0            3.01162
   0.0            4.05986
  27.1611        31.3545
   0.0            7.91762
   3.77348        4.79778
  14.092         17.1644
   1.60011e-15   15.6326
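
Each row of the result holds the minimum and maximum of one flux, computed while the objective is kept within the bounds produced by `gamma_bounds(0.95)`. The sketch below shows the idea behind such a bound generator, namely requiring at least 95% of the optimal objective value. `toy_gamma_bounds` is a hypothetical re-implementation for illustration only, and the optimum of `2.0` is invented.

```julia
# Hypothetical stand-in: given the optimal objective value, return the
# (lower, upper) interval that the objective must stay in.
toy_gamma_bounds(gamma) = optimum -> (gamma * optimum, Inf)

bounds_for = toy_gamma_bounds(0.95)
lower, upper = bounds_for(2.0)

println("objective must stay within [", lower, ", ", upper, "]")
```

Lowering the factor widens the feasible region and therefore usually the reported flux ranges.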

The model can also be sampled:

In [11]:
affine_hit_and_run(gecko_model, warmup_from_variability(gecko_model, GLPK.Optimizer))' *
reaction_flux(gecko_model)

3340×95 Matrix{Float64}:
 5.79321  7.53434  4.341     -9.82259  …  2.58742  0.178106  4.65941  1.68849
 5.97172  7.69418  4.47547  -10.2705      2.82836  0.191991  5.18346  1.86908
 5.88919  7.08271  4.19867   -9.8486      2.55031  0.2609    4.80288  1.79194
 5.78822  7.59651  4.33514   -9.84917     2.56176  0.188864  4.66782  1.61288
 5.38645  7.6815   4.22388   -9.25145     2.24728  0.14908   4.24294  1.36614
 5.79294  7.4851   4.27372   -9.70463  …  2.58035  0.180579  4.68954  1.63211
 5.85428  7.62246  4.38253   -9.86583     2.55356  0.171362  4.67094  1.72747
 5.38753  7.2377   4.08927   -9.7917      2.9556   0.206229  5.38608  1.98014
 5.70746  7.3966   4.24649   -9.70319     2.63334  0.178416  4.66491  1.69637
 5.69817  7.69786  4.2997    -9.98324     3.02154  0.186499  4.66858  1.4801
 ⋮                                     ⋱                              
 6.41219  7.55886  4.77719  -10.7465      2.51659  0.190647  4.86579  2.17292
 6.09411  7.09449  4.54019   -9.93358     2.309
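
In the resulting matrix each row is one sample and each column is one reaction flux, so per-reaction summaries are column statistics. The sketch below uses a small invented matrix instead of the real samples.

```julia
using Statistics

# Toy stand-in for the sample matrix: 4 samples (rows) of 3 fluxes (columns)
toy_samples = [
    1.0 2.0 3.0
    1.2 2.0 2.8
    0.8 2.0 3.2
    1.0 2.0 3.0
]

# Reduce over the rows to get one mean and one standard deviation per flux
flux_means = vec(mean(toy_samples, dims = 1))
flux_stds = vec(std(toy_samples, dims = 1))

println(flux_means)
println(flux_stds)
```

A flux with a standard deviation of zero, like the second column here, did not vary across the samples.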

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*