# Gene knockouts

Here we use the `knockout` function to modify the optimization model
before solving, which simulates the effect of knocking out genes. The
`knockout` modification can be passed to any analysis function that
supports the `modifications` parameter, including `flux_balance_analysis`,
`flux_variability_analysis`, and others.

## Deleting a single gene

In [1]:
!isfile("e_coli_core.xml") &&
    download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")

using COBREXA, GLPK

model = load_model("e_coli_core.xml")

Metabolic model of type SBMLModel
sparse([41, 23, 51, 67, 61, 65, 1, 7, 19, 28  …  72, 3, 8, 33, 57, 66, 31, 45, 46, 57], [1, 2, 2, 2, 3, 3, 4, 4, 4, 4  …  93, 94, 94, 94, 94, 94, 95, 95, 95, 95], [-1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0  …  1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0], 72, 95)
Number of reactions: 95
Number of metabolites: 72


First, let's compute the "original" flux, with no knockouts.

In [2]:
original_flux = flux_balance_analysis_dict(model, GLPK.Optimizer);

The IDs of genes that can be knocked out are listed by the `genes`
function; `gene_name` returns the human-readable name for a given gene ID:

In [3]:
genes(model)

137-element Vector{String}:
 "G_b1818"
 "G_b2278"
 "G_b3213"
 "G_b0811"
 "G_b3736"
 "G_b1479"
 "G_b3919"
 "G_b2281"
 "G_b4090"
 "G_b2914"
 ⋮
 "G_b4301"
 "G_b4152"
 "G_b4395"
 "G_b0727"
 "G_b1723"
 "G_b3403"
 "G_b3386"
 "G_b1819"
 "G_b1702"

It is possible to sort the genes by gene name to allow easier lookups:

In [4]:
sort(gene_name.(Ref(model), genes(model)) .=> genes(model))

137-element Vector{Pair{String, String}}:
 "aceA" => "G_b4015"
 "aceB" => "G_b4014"
 "aceE" => "G_b0114"
 "aceF" => "G_b0115"
 "ackA" => "G_b2296"
 "acnA" => "G_b1276"
 "acnB" => "G_b0118"
 "adhE" => "G_b1241"
 "adhP" => "G_b1478"
  "adk" => "G_b0474"
        ⋮
 "talB" => "G_b0008"
 "tdcD" => "G_b3115"
 "tdcE" => "G_b3114"
 "tktA" => "G_b2935"
 "tktB" => "G_b2465"
 "tpiA" => "G_b3919"
 "ydjI" => "G_b1773"
 "ytjC" => "G_b4395"
  "zwf" => "G_b1852"

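For repeated lookups, the name/ID pairs can be collected into a dictionary. The following is a minimal sketch that uses a few pairs copied from the output above in place of the full vector, and assumes gene names are unique; the variable names `name_id_pairs` and `id_by_name` are illustrative and not part of COBREXA.

```julia
# A few name => ID pairs copied from the sorted output above; with a loaded
# model, the full vector would come from the `sort(...)` expression instead.
name_id_pairs = [
    "aceA" => "G_b4015",
    "adk" => "G_b0474",
    "tpiA" => "G_b3919",
    "zwf" => "G_b1852",
]

# `Dict` accepts a collection of pairs, which gives direct lookup by gene name.
id_by_name = Dict(name_id_pairs)

id_by_name["tpiA"]  # "G_b3919"
```
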
Compute the flux with a single gene knocked out:

In [5]:
flux_with_knockout =
    flux_balance_analysis_dict(model, GLPK.Optimizer, modifications = [knockout("G_b3236")])

Dict{String, Float64} with 95 entries:
  "R_EX_fum_e"    => 0.0
  "R_ACONTb"      => 8.33976
  "R_TPI"         => 9.17856
  "R_SUCOAS"      => -7.44879
  "R_GLNS"        => 0.211162
  "R_EX_pi_e"     => -3.03794
  "R_PPC"         => 9.81525
  "R_O2t"         => 23.9021
  "R_G6PDH2r"     => 5.90233e-15
  "R_TALA"        => -0.147739
  "R_PPCK"        => -4.78184e-15
  "R_EX_lac__D_e" => 8.50462e-15
  "R_PGL"         => 7.80602e-15
  "R_H2Ot"        => -30.8724
  "R_GLNabc"      => 0.0
  "R_EX_co2_e"    => 24.8568
  "R_EX_gln__L_e" => 0.0
  "R_EX_nh4_e"    => -4.50303
  "R_MALt2_2"     => -1.1582e-14
  ⋮               => ⋮

We can see that knocking out the gene causes a small decrease in biomass production (about 5.5%):

In [6]:
biomass_id = "R_BIOMASS_Ecoli_core_w_GAM"
flux_with_knockout[biomass_id] / original_flux[biomass_id]

0.9449581959158885

Similarly, we can explore how the flux variability has changed once the gene
is knocked out:

In [7]:
variability_with_knockout =
    flux_variability_analysis(model, GLPK.Optimizer, modifications = [knockout("G_b3236")])

95×2 Matrix{Float64}:
   0.0           -1.03853e-13
   8.33976        8.33976
   9.17856        9.17856
  -7.44879       -7.44879
   0.211162       0.211162
  -3.03794       -3.03794
   9.81525        9.81525
  23.9021        23.9021
  -9.42733e-16   -7.16365e-12
  -0.147739      -0.147739
   ⋮            
   0.0           -2.89533e-11
  30.8724        30.8724
   0.0           -1.44766e-11
  -8.88178e-16    0.737991
   3.56283e-15   -8.14164e-13
 -23.9021       -23.9021
   4.02126e-14    0.0
   1.09592e-13    0.0
   3.03794        3.03794
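
One way to read the variability matrix is to look at the width of each reaction's flux range. The sketch below uses four rows copied from the output above instead of the full 95×2 matrix. The tolerance is an arbitrary choice for filtering out solver noise, and the names `variability_sample`, `widths`, and `flexible_rows` are illustrative.

```julia
# Rows copied from the variability output above; the two columns hold the
# two extremes of each reaction's flux.
variability_sample = [
    8.33976       8.33976
    -0.147739     -0.147739
    -8.88178e-16  0.737991
    4.02126e-14   0.0
]

# Assumed tolerance for separating real flexibility from numerical noise.
tolerance = 1e-6

# Width of the flux range of each reaction (absolute value, so that tiny
# sign flips caused by noise do not matter).
widths = abs.(variability_sample[:, 2] .- variability_sample[:, 1])

# Indices of the sample rows whose flux can vary noticeably.
flexible_rows = findall(>(tolerance), widths)
```

Applied to `variability_with_knockout`, the same expressions would return indices into the rows of that matrix.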

## Knocking out multiple genes

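Multiple genes can be knocked out by passing a vector of gene IDs to the
`knockout` modification. As an example, we look up the genes associated with
the `R_FBA` reaction (fructose-bisphosphate aldolase, not to be confused with
flux balance analysis) and then knock out all of them: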

In [8]:
reaction_gene_association(model, "R_FBA")

3-element Vector{Vector{String}}:
 ["G_b2097"]
 ["G_b1773"]
 ["G_b2925"]

In [9]:
flux_with_double_knockout = flux_balance_analysis_dict(
    model,
    GLPK.Optimizer,
    modifications = [knockout(["G_b2097", "G_b1773", "G_b2925"])],
)

Dict{String, Float64} with 95 entries:
  "R_EX_fum_e"    => 0.0
  "R_ACONTb"      => 0.759585
  "R_TPI"         => 0.0
  "R_SUCOAS"      => -9.87092e-15
  "R_GLNS"        => 0.180022
  "R_EX_pi_e"     => -2.58994
  "R_PPC"         => 2.01749
  "R_O2t"         => 27.5263
  "R_G6PDH2r"     => 27.8991
  "R_TALA"        => 9.17374
  "R_PPCK"        => -2.30181e-14
  "R_EX_lac__D_e" => -5.41005e-15
  "R_PGL"         => 27.8991
  "R_H2Ot"        => -31.7697
  "R_GLNabc"      => 0.0
  "R_EX_co2_e"    => 26.6412
  "R_EX_gln__L_e" => 0.0
  "R_EX_nh4_e"    => -3.83897
  "R_MALt2_2"     => -7.3294e-15
  ⋮               => ⋮

In [10]:
flux_with_double_knockout[biomass_id] / original_flux[biomass_id]

0.8056066159777677
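
To illustrate why all three genes are listed, the following sketch evaluates the gene association printed above. It assumes that each inner vector is a group of genes that together suffice to run the reaction, so the reaction is lost only when every group has lost at least one gene. The helper `reaction_disabled` is hypothetical and only mirrors this logic; it is not the COBREXA implementation.

```julia
# Gene association of R_FBA as printed above: three alternative
# single-gene groups.
fba_association = [["G_b2097"], ["G_b1773"], ["G_b2925"]]

# Hypothetical helper: the reaction counts as disabled only if every group
# contains at least one knocked-out gene.
reaction_disabled(association, knocked_out) =
    all(group -> any(in(knocked_out), group), association)

# Removing a single gene leaves two alternative groups intact.
single = reaction_disabled(fba_association, ["G_b2097"])

# Removing all three genes breaks every group.
triple = reaction_disabled(fba_association, ["G_b2097", "G_b1773", "G_b2925"])
```

Under this assumption, `single` is `false` and `triple` is `true`, which is consistent with `R_TPI` dropping to zero flux only in the three-gene knockout shown above.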

## Processing all single gene knockouts

The `screen` function provides a parallelizable and extensible way to run
flux balance analysis with each gene knocked out in turn. The analysis
function below returns `nothing` for knockouts where no solution is found:

In [11]:
knockout_fluxes = screen(
    model,
    args = tuple.(genes(model)),
    analysis = (m, gene) -> begin
        res = flux_balance_analysis_dict(m, GLPK.Optimizer, modifications = [knockout(gene)])
        if !isnothing(res)
            res[biomass_id]
        end
    end,
)

137-element Vector{Union{Nothing, Float64}}:
 0.8739215069684418
 0.21166294973531138
 0.8739215069684418
 0.8739215069684418
 0.37422987493310794
 0.8739215069684418
 0.7040369478590376
 0.21166294973531138
 0.8739215069684418
 0.8739215069684418
 ⋮
 0.8739215069684418
 0.8739215069684317
 0.8739215069684418
 0.8583074080226936
 0.8739215069684418
 0.8739215069684316
 0.8739215069684418
 0.8739215069684418
 0.8739215069684418

It is useful to display the biomass growth rates of the knockout models
together with the gene names:

In [12]:
sort(gene_name.(Ref(model), genes(model)) .=> knockout_fluxes, by = first)

137-element Vector{Pair{String}}:
 "aceA" => 0.8739215069684418
 "aceB" => 0.8739215069684418
 "aceE" => 0.7966959254309692
 "aceF" => 0.7966959254309692
 "ackA" => 0.8739215069684418
 "acnA" => 0.8739215069684418
 "acnB" => 0.8739215069684418
 "adhE" => 0.8739215069684418
 "adhP" => 0.8739215069684418
  "adk" => 0.8739215069684344
        ⋮
 "talB" => 0.8739215069684418
 "tdcD" => 0.8739215069684418
 "tdcE" => 0.8739215069684418
 "tktA" => 0.8739215069684418
 "tktB" => 0.8739215069684418
 "tpiA" => 0.7040369478590376
 "ydjI" => 0.8739215069684418
 "ytjC" => 0.8739215069684418
  "zwf" => 0.8638133095040021

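The screening result can be filtered to find genes whose knockout reduces growth. The following is a minimal sketch on a few values copied from the output above. The reference growth rate is assumed to be the most common value in that output, the tolerance is an arbitrary choice, and the `nothing` entry is a made-up example of a knockout with no solution; `growth_or_zero` and `affected` are illustrative names.

```julia
# A few results copied from the output above; the last entry is a made-up
# example of a knockout for which no solution was found.
sample_results = [
    "aceA" => 0.8739215069684418,
    "aceE" => 0.7966959254309692,
    "adk" => 0.8739215069684344,
    "tpiA" => 0.7040369478590376,
    "hypothetical_gene" => nothing,
]

# Assumed reference growth rate: the most common value in the output above.
reference_growth = 0.8739215069684418

# Treat a missing solution as zero growth.
growth_or_zero(x) = isnothing(x) ? 0.0 : x

# Keep genes whose knockout lowers growth by more than a small tolerance,
# so that differences in the last digits (as for "adk") are ignored.
affected = [
    name for (name, growth) in sample_results if
    growth_or_zero(growth) < reference_growth - 1e-6
]
```

With a loaded model, `original_flux[biomass_id]` would be the natural choice for the reference value.
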
## Processing all multiple-gene deletions

### Double gene knockouts

Since `screen` can process an argument array of any shape, it is
straightforward to generate the matrix of all gene pairs and let the
function process it. The following computes the biomass production of all
double-gene knockouts; entries are `nothing` where no solution was found:

In [13]:
gene_groups = [[g1, g2] for g1 in genes(model), g2 in genes(model)];
double_knockout_fluxes = screen(
    model,
    args = tuple.(gene_groups),
    analysis = (m, gene_groups) -> begin
        res = flux_balance_analysis_dict(
            m,
            GLPK.Optimizer,
            modifications = [knockout(gene_groups)],
        )
        if !isnothing(res)
            res[biomass_id]
        end
    end,
)

137×137 Matrix{Union{Nothing, Float64}}:
 0.873922  0.211663  0.873922  0.873922  …  0.873922  0.873922  0.873922
 0.211663  0.211663  0.211663  0.211663     0.211663  0.211663  0.211663
 0.873922  0.211663  0.873922  0.873922     0.873922  0.873922  0.873922
 0.873922  0.211663  0.873922  0.873922     0.873922  0.873922  0.873922
 0.37423   0.202612  0.37423   0.37423      0.37423   0.37423   0.37423
 0.873922  0.211663  0.873922  0.873922  …  0.873922  0.873922  0.873922
 0.704037   nothing  0.704037  0.704037     0.704037  0.704037  0.481972
 0.211663  0.211663  0.211663  0.211663     0.211663  0.211663  0.211663
 0.873922  0.211663  0.873922  0.873922     0.873922  0.873922  0.873922
 0.873922  0.211663  0.873922  0.873922     0.873922  0.873922  0.873922
 ⋮                                       ⋱            ⋮         
 0.873922  0.211663  0.873922  0.873922     0.865716  0.873922  0.873922
 0.873922  0.211663  0.873922  0.873922     0.873922  0.873922  0.873922
 0.873922  0.211663

The results can be converted to a form that is easier to inspect, as follows:

In [14]:
reshape([gene_name.(Ref(model), p) for p in gene_groups] .=> double_knockout_fluxes, :)

18769-element Vector{Pair{Vector{String}}}:
 ["manY", "manY"] => 0.8739215069684418
 ["nuoL", "manY"] => 0.21166294973531138
 ["gltD", "manY"] => 0.8739215069684418
 ["glnH", "manY"] => 0.8739215069684418
 ["atpF", "manY"] => 0.37422987493310794
 ["maeA", "manY"] => 0.8739215069684418
 ["tpiA", "manY"] => 0.7040369478590376
 ["nuoI", "manY"] => 0.21166294973531138
 ["rpiB", "manY"] => 0.8739215069684418
 ["rpiA", "manY"] => 0.8739215069684418
                  ⋮
 ["sgcE", "ppsA"] => 0.8739215069684418
 ["frdC", "ppsA"] => 0.8739215069684317
 ["ytjC", "ppsA"] => 0.8739215069684418
 ["sucB", "ppsA"] => 0.8583074080226936
 ["pfkB", "ppsA"] => 0.8739215069684418
  ["pck", "ppsA"] => 0.8739215069684316
  ["rpe", "ppsA"] => 0.8739215069684418
 ["manZ", "ppsA"] => 0.8739215069684418
 ["ppsA", "ppsA"] => 0.8739215069684418
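
The full matrix lists every pair in both orders and repeats the single knockouts on the diagonal (for example `["manY", "manY"]`). Assuming the knockout result does not depend on the order of the genes, roughly half of the work could be saved by generating each unordered pair once. The sketch below shows this on a short stand-in gene list; `gene_ids` and `unique_pairs` are illustrative names.

```julia
# Stand-in gene list (first four IDs from the `genes(model)` output); with a
# loaded model this would be `genes(model)`.
gene_ids = ["G_b1818", "G_b2278", "G_b3213", "G_b0811"]
n = length(gene_ids)

# Each unordered pair exactly once, skipping the diagonal.
unique_pairs = [[gene_ids[i], gene_ids[j]] for i in 1:n for j in (i+1):n]

# The number of pairs matches the binomial coefficient "n choose 2".
n_pairs = length(unique_pairs)
```

For the 137 genes of this model, this gives `binomial(137, 2) = 9316` pairs instead of the 18769 entries of the full matrix.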

### Triple gene knockouts (and others)

We can extend the same analysis to triple or higher-order gene knockouts by
generating a different array of gene groups. For example, one can generate
`gene_groups` for triple gene deletion screening as follows:

In [15]:
gene_groups = [[g1, g2, g3] for g1 in genes(model), g2 in genes(model), g3 in genes(model)];
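
Before running such a screen, it may help to estimate its size. The sketch below compares the number of entries in the full three-dimensional array with the number of distinct unordered triples for 137 genes, and builds a lazy generator of distinct triples over a small stand-in list. All names are illustrative, and whether `screen` accepts a generator directly is not assumed here.

```julia
n_genes = 137  # number of genes reported for the model above

# Entries in the full 3-D array of all ordered triples.
full_array_size = n_genes^3

# Unordered triples of three distinct genes.
distinct_triples = binomial(n_genes, 3)

# Lazy generator of distinct triples for a small stand-in gene list; nothing
# is materialized until the generator is iterated or collected.
demo_genes = ["g1", "g2", "g3", "g4", "g5"]
m = length(demo_genes)
lazy_triples = (
    [demo_genes[i], demo_genes[j], demo_genes[k]] for i in 1:m for
    j in (i+1):m for k in (j+1):m
)

first_triple = first(lazy_triples)
```

The full array has 2,571,353 entries, while only 419,220 triples are distinct, so restricting the screen to distinct triples would cut the number of optimizations by about a factor of six.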

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*