In [1]:
using COBREXA

Let's download and open the big model again:

In [2]:
import Downloads
Downloads.download("http://bigg.ucsd.edu/static/models/iJO1366.json", "ecoli.json")
m = load_model(StandardModel, "ecoli.json");

The main reference for this section is the COBREXA gene deletion example:
https://lcsb-biocore.github.io/COBREXA.jl/stable/examples/07_gene_deletion/

Each reaction has a gene association, which dictates the gene products that
need to be available so that the reaction can "run".

In [3]:
genes(m)

grr = reaction_gene_association(m, "PFK")

2-element Vector{Vector{String}}:
 ["b1723"]
 ["b3916"]
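The nested vector can be read as an OR over its inner vectors, where each inner vector is an AND over the genes it lists. As a minimal sketch of how such a rule might be evaluated against a set of knocked-out genes (the helper `reaction_active` and the rule `two_subunit` are made up for illustration and are not part of COBREXA):

```julia
# Hypothetical helper (not COBREXA API): a reaction can still run if at least
# one alternative (inner vector) has none of its genes knocked out.
reaction_active(grr, knocked_out) =
    any(all(g -> !(g in knocked_out), alternative) for alternative in grr)

pfk = [["b1723"], ["b3916"]]           # the PFK rule shown above
two_subunit = [["g1", "g2"], ["g3"]]   # made-up rule: (g1 && g2) || (g3)

reaction_active(pfk, Set(["b1723"]))            # true: b3916 still covers PFK
reaction_active(pfk, Set(["b1723", "b3916"]))   # false: no alternative is left
reaction_active(two_subunit, Set(["g2"]))       # true: g3 remains
reaction_active(two_subunit, Set(["g2", "g3"])) # false
```

Note that this sketch returns `false` for an empty rule; a reaction without any gene association would need separate handling.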

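The result of reaction_gene_association is in disjunctive normal form (DNF)
for computational simplicity: the outer vector lists alternatives (OR), and
each inner vector lists genes that are all required together (AND). The rules
can be converted, e.g., to Strings for reading: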

In [4]:
COBREXA._unparse_grr(String, grr)

"(b1723) || (b3916)"

We could knock out genes manually by going through the reactions and
evaluating each DNF rule. For convenience, the knockout is available as a
modification:

In [5]:
gene_name(m, "b0720")

using GLPK
sol = flux_balance_analysis_dict(m, GLPK.Optimizer, modifications = [knockout("b0720")])
sol["BIOMASS_Ec_iJO1366_core_53p95M"]

4.7967818666792276e-32

...the model is still feasible, but growth is essentially zero.
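Solvers rarely return an exact zero, so it can help to compare against a small tolerance rather than against `0.0`. A minimal sketch, where the helper `no_growth` and the tolerance `1e-9` are assumptions chosen for illustration:

```julia
# Hypothetical helper: treat values within `atol` of zero as "no growth".
# The default tolerance is an assumption; choose one that fits your solver.
no_growth(x; atol = 1e-9) = abs(x) <= atol

no_growth(4.7967818666792276e-32)  # true: the value reported above is numerical noise
no_growth(0.5)                     # false: a clearly nonzero growth rate
```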

We can screen through all genes. One could simply write:

In [6]:
[
    flux_balance_analysis_dict(m, GLPK.Optimizer, modifications = [knockout(g)]) for
    g in genes(m)[1:10]
]

10-element Vector{Dict{String, Float64}}:
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0

...but that would run for quite a long time, and it does not always return a
solution: for some knockouts, there is no feasible solution at all.

First, let's use COBREXA's parallelization capabilities to make this bearably
fast. We use the Distributed package to run the analysis over multiple
processes:

In [7]:
using Distributed
addprocs(8)  # you may add more depending on your machine or cluster size

Load the required packages on all workers of the small cluster:

In [8]:
@everywhere using COBREXA, GLPK

The screen function allows us to run many analyses on a model in parallel,
with many optimizations for distributed processing (e.g., the data are only
moved once).
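As a rough mental model only, the argument handling resembles a `pmap` over a vector of tuples, where each tuple is splatted into the analysis function after the model. The sketch below uses plain Distributed with a toy "model" and toy analysis; `toy_analysis` and `toy_screen` are hypothetical names, and this is not how COBREXA implements `screen` (in particular, it does not reproduce the data-movement optimizations):

```julia
using Distributed

# Hypothetical stand-in for an expensive per-gene analysis; @everywhere makes
# it available on all workers (and also works when no workers were added).
@everywhere toy_analysis(model, gene) = gene => length(model) + length(gene)

# Toy analogue of the screening loop: one analysis call per argument tuple.
toy_screen(model, args) = pmap(a -> toy_analysis(model, a...), args)

# Broadcasting `tuple` wraps each gene into a 1-tuple: [("b0001",), ("b0002",), ...]
args = tuple.(["b0001", "b0002", "b0003"])

results = toy_screen("toy", args)  # ["b0001" => 8, "b0002" => 8, "b0003" => 8]
```

Wrapping every item in a tuple is what allows an analysis to take more than one extra argument per run.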

In [9]:
knockout_fluxes = screen(
    m, # the model which we process
    args = tuple.(genes(m)[1:10]), # all argument lists for the analyses
    # the analysis function ("lambda") that we want to run on the cluster
    # for each item from the argument list
    analysis = (m, gene) ->
        flux_balance_analysis_dict(m, GLPK.Optimizer, modifications = [knockout(gene)]),
    workers = workers(), # this gives it the list of worker nodes to use
)

10-element Vector{Dict{String, Float64}}:
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0, "FACOAL180t2pp" => 0.0, "METSOXR1" => 0.0, "LIPOtex" => 0.0, "NTD11" => 0.0, "GLUNpp" => 0.0…)
 Dict("Zn2tex" => 0.00033498878813989854, "GUI1" => 0.0, "DXYLK" => 0.0, "CBL1tonex" => 0.0, "FE3DCITtonex" => 0.0

Let's preprocess the results a little, and add more genes:

In [10]:
knockout_fluxes = screen(
    m,
    args = tuple.(genes(m)[1:50]),
    analysis = (m, gene) -> begin
        res = flux_balance_analysis_dict(m, GLPK.Optimizer, modifications = [knockout(gene)])
        if !isnothing(res)
            gene => res["BIOMASS_Ec_iJO1366_core_53p95M"]
        else
            gene => 0.0
        end
    end,
    workers = workers(),
)

50-element Vector{Pair{String, Float64}}:
 "b1377" => 0.9823718127269752
 "b0241" => 0.9823718127269752
 "b0929" => 0.9823718127269752
 "b2215" => 0.9823718127269752
 "b0653" => 0.9823718127269752
 "b0655" => 0.9823718127269752
 "b0118" => 0.9823718127269752
 "b1276" => 0.9823718127269752
 "b4032" => 0.9823718127269772
 "b3359" => -1.3672895417877006e-17
         ⋮
 "b3737" => 0.4024773014907232
 "b3733" => 0.4024773014907232
 "b3734" => 0.4024773014907232
 "b3738" => 0.4024773014907232
 "b1009" => 0.9823718127269752
 "b1812" => -1.1020296920314708e-30
 "b0180" => 0.0
 "b3360" => -1.1020296920314708e-30
 "b3731" => 0.4024773014907232

After debugging, you can remove the limit to the first 50 genes.

Create a CSV with a report:

In [11]:
using DataFrames, CSV

df = DataFrame(gene = first.(knockout_fluxes))
df.name = gene_name.(Ref(m), df.gene)
df.fluxes = last.(knockout_fluxes)

best_result = maximum(last.(knockout_fluxes))
essential_threshold = 0.01 * best_result
df.essential = df.fluxes .<= essential_threshold
df.interesting = (df.fluxes .< best_result * 0.999) .&& .!df.essential

CSV.write("ko_report.csv", df)

df

 Row │ gene    name    fluxes        essential  interesting
     │ String  String  Float64       Bool       Bool
─────┼──────────────────────────────────────────────────────
   1 │ b1377   ompN    0.982372      false      false
   2 │ b0241   phoE    0.982372      false      false
   3 │ b0929   ompF    0.982372      false      false
   4 │ b2215   ompC    0.982372      false      false
   5 │ b0653   gltK    0.982372      false      false
   6 │ b0655   gltI    0.982372      false      false
   7 │ b0118   acnB    0.982372      false      false
   8 │ b1276   acnA    0.982372      false      false
   9 │ b4032   malG    0.982372      false      false
  10 │ b3359   argD    -1.36729e-17  true       false
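The two thresholds used for the report can be checked on a few values without DataFrames. In this sketch, the function `classify` and the label `:neutral` are made up for illustration; the cutoffs (1% and 99.9% of the best growth) and the three growth values are taken from the cells above:

```julia
# Hypothetical helper mirroring the report's thresholds:
#   essential   : growth is at most 1% of the best observed growth
#   interesting : growth is reduced below 99.9% of the best, but not essential
function classify(growth, best)
    growth <= 0.01 * best && return :essential
    growth < 0.999 * best && return :interesting
    return :neutral
end

growths = [0.9823718127269752, 0.4024773014907232, -1.3672895417877006e-17]
best = maximum(growths)

labels = classify.(growths, best)  # [:neutral, :interesting, :essential]
```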


---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*