## Tutorial: Using `StandardModel`

In this tutorial we will use `COBREXA`'s `StandardModel` and functions that specifically operate on it. As usual we will use the toy model of *E. coli* for demonstration.

Let's first load the model.

### Loading a model

In [67]:
# download file if it is not already present
!isfile("e_coli_core.json") && download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")

using COBREXA

model = load_model(StandardModel, "e_coli_core.json") # we specifically want to load a StandardModel from the model file

Metabolic model of type StandardModel
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⢄⠀⠀⠀⠀⠀⠀⠀⠈⠶⠴⡆⠀⠀⠀⠀⠀⠀
⡀⢐⣀⢀⡀⡒⢒⣐⠀⣂⣂⠀⣂⣂⢂⠀⢀⠀⠀⠀⠀⠀⢀⠄⠀⠀⠀⢂⠀⢂⣀⣐⡒⡀⠆⢙⣀⠀⡀⠀
⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠰⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠈⢑⣀⣀⠀⠀
⠀⠀⠃⠀⠃⠀⠀⠀⠘⠀⡇⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⡜⠀⡄⣤⢠⠘⠙⢣⡇⠘
⠀⠐⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠐⠁⠉⠀⠀⠀⠀⠀⠘⠄
⠀⢐⠀⠂⠀⠄⠠⠠⠀⠠⠆⠀⠄⠀⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠠⠀⠠⠀⠀⢀⠀⠀⠠⠀⠀⠁
⢀⠐⠀⠨⢀⠁⠈⣈⠀⢁⣁⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠁⢀⠀⢊⠉⠀⠀⠀⢀⠀⣀⠀⢀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡈⠀⡀⠆⠀⠆⠀⡀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠆⠀
⠀⠀⠂⠀⡂⠀⠀⠁⠀⠀⠀⠈⠁⠀⠀⠀⠄⠄⢁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀
⠈⠀⠁⠀⠀⢀⡀⠀⠠⠁⠁⠀⠑⠀⠐⠲⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂⠀⠂⠀⠀⠀⠀⠀⠀⠊⠀⠀⠀⠈
⠄⠠⢠⠀⠰⠀⠠⠀⠤⠦⠄⠈⠀⠀⠀⠠⠀⠁⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠤⠄⠄⠠⠀⠀⠀⠀⠀
⠂⠐⠀⠀⠐⡠⢐⠘⢃⠒⠂⡀⠄⠀⠀⠐⠀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠒⠀⢀⢀⠀⠀⣀⠀⢀
⠈⠀⠁⠀⡀⠀⠀⠀⠈⠁⠅⠀⠁⠀⢀⠈⠄⠔⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠈
⠣⠁⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠈⠀⠁⠁⠀⠈⡀⠀⠀⠀⠀⠀⠐⢣⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⡄⠀⠀⠀⠀⠂⠄⠤⠀⠀⠈⠂⠀⠀⠀⠀⠠⠀⠊⠒⣠⠀⠀⠀⠀⠀⠀⠀⠀⠀
Number of reactions: 95
Number of metabolites: 72


### Basic analysis

As before, optimization-based analysis requires a solver. Here we will use [`Tulip.jl`](https://github.com/ds4dm/Tulip.jl) to solve the linear programs of this tutorial. Refer to the flux balance analysis tutorial if any function in this section is unfamiliar.

All the normal analysis functions work on `StandardModel` because it implements the same generic accessor interface as all the other model types.

In [68]:
using Tulip

dict_sol = flux_balance_analysis_dict(
    model,
    Tulip.Optimizer;
    modifications = [
        change_objective("BIOMASS_Ecoli_core_w_GAM"),
        change_constraint("EX_glc__D_e", -12, -12),
        change_constraint("EX_o2_e", 0, 0),
    ],
)

Dict{String, Float64} with 95 entries:
  "ACALD"       => -9.78427
  "PTAr"        => 10.0729
  "ALCD2x"      => -9.78427
  "PDH"         => 1.98388e-9
  "PYK"         => 9.94501
  "CO2t"        => 0.487021
  "EX_nh4_e"    => -1.48633
  "MALt2_2"     => -0.0
  "CS"          => 0.294088
  "PGM"         => -22.8676
  "TKT1"        => -0.0487648
  "EX_mal__L_e" => -0.0
  "ACONTa"      => 0.294088
  "EX_pi_e"     => -1.00274
  "GLNS"        => 0.069699
  "ICL"         => 5.34951e-11
  "EX_o2_e"     => -0.0
  "FBA"         => 11.7289
  "EX_gln__L_e" => -0.0
  "EX_glc__D_e" => -12.0
  "SUCCt3"      => 9.36957e-10
  "FORt2"       => 6.1847e-10
  "G6PDH2r"     => 4.23233e-9
  "AKGDH"       => 5.31373e-11
  "TKT2"        => -0.147167
  ⋮             => ⋮

This is not very exciting yet, since every other model type can do this. However, deeper inspection of flux results is possible when using `StandardModel`. 

### Inspecting the flux solution: `atom_exchange`

It is sometimes interesting to keep track of the atoms entering and leaving the system. This can be inspected by calling `atom_exchange`.

In [69]:
?atom_exchange

search: atom_exchange



```
atom_exchange(flux_dict::Dict{String, Float64}, model::StandardModel)
```

Return a dictionary mapping the flux of atoms across the boundary of the model  given `flux_dict` (the solution of a constraint based analysis) of reactions in `model`.


In [70]:
atom_exchange(dict_sol, model) # flux of individual atoms entering and leaving the system through boundary reactions (e.g. exchange reactions) based on flux_dict

Dict{String, Float64} with 5 entries:
  "C" => -11.5998
  "N" => -1.48633
  "P" => -1.00274
  "H" => -20.7086
  "O" => -12.995
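To make the carbon entry above easier to interpret, here is a minimal, self-contained sketch that recomputes it from the carbon-carrying exchange fluxes printed in this tutorial. It does not call `COBREXA`; the helpers `count_atoms` and `atom_balance_sketch` are hypothetical, the metabolite formulas are assumptions typed in by hand, and the sign convention (negative means net uptake) is inferred from the printed result rather than taken from the library source. Only carbon-containing exchanges are listed, so only the `"C"` total is comparable to the output above.

```julia
# Hypothetical helper: count atoms in a simple formula such as "C6H12O6".
# Assumes a flat formula without parentheses or charge symbols.
function count_atoms(formula::AbstractString)
    atoms = Dict{String,Int}()
    for m in eachmatch(r"([A-Z][a-z]*)(\d*)", formula)
        element = String(m.captures[1])
        n = isempty(m.captures[2]) ? 1 : parse(Int, m.captures[2])
        atoms[element] = get(atoms, element, 0) + n
    end
    return atoms
end

# Exchange fluxes copied from the outputs shown in this tutorial.
exchange_fluxes = Dict(
    "EX_glc__D_e" => -12.0,
    "EX_for_e" => 21.172843,
    "EX_ac_e" => 10.072906,
    "EX_etoh_e" => 9.78427,
    "EX_co2_e" => -0.487021,
)

# Assumed formulas of the exchanged metabolites (not read from the model).
formulas = Dict(
    "EX_glc__D_e" => "C6H12O6",
    "EX_for_e" => "CH1O2",
    "EX_ac_e" => "C2H3O2",
    "EX_etoh_e" => "C2H6O",
    "EX_co2_e" => "CO2",
)

# Sum flux * atom count over all listed exchange reactions.
function atom_balance_sketch(fluxes, formulas)
    totals = Dict{String,Float64}()
    for (rxn_id, flux) in fluxes
        for (element, n) in count_atoms(formulas[rxn_id])
            totals[element] = get(totals, element, 0.0) + flux * n
        end
    end
    return totals
end

carbon = atom_balance_sketch(exchange_fluxes, formulas)["C"]
println("Net carbon exchange: ", round(carbon; digits = 4))
```

Under these assumptions the net carbon exchange is negative, meaning more carbon enters through the exchanges than leaves; the difference is the carbon drawn into biomass.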

### Inspecting the flux solution: `exchange_reactions`

It is also sometimes useful to inspect the exchange reactions used by a flux solution. The function `exchange_reactions` fulfills this purpose.

In [71]:
?exchange_reactions

search: exchange_reactions find_exchange_reactions



```
get_exchanges(flux_dict::Dict{String, Float64}; top_n=Inf, ignorebound=_constants.default_reaction_bound, verbose=true)
```

Display the `top_n` producing and consuming exchange fluxes. If `top_n` is not specified (by an integer), then all are displayed. Ignores infinite (problem upper/lower bound) fluxes (set with ignorebound). When `verbose` is false, the output is not printed out. Return these reactions (id => ) in two dictionaries: `consuming`, `producing`


In [72]:
consuming, producing = exchange_reactions(dict_sol, model; top_n = 4);

Consuming fluxes: 
EX_glc__D_e = -12.0
EX_h2o_e = -8.285701
EX_nh4_e = -1.48633
EX_pi_e = -1.002744
EX_co2_e = -0.487021
Producing fluxes: 
EX_h_e = 36.713726
EX_for_e = 21.172843
EX_ac_e = 10.072906
EX_etoh_e = 9.78427
EX_succ_e = 0.0


### Inspecting the flux solution: `metabolite_fluxes`

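Another useful function for analysing flux results is `metabolite_fluxes`, which reports, for each metabolite, the reactions that consume or produce it in a given flux solution.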

In [95]:
?metabolite_fluxes

search: metabolite_fluxes metabolite_formula



```
metabolite_fluxes(flux_dict::Dict{String, Float64}, model::StandardModel)
```

Return two dictionaries of metabolite `id`s mapped to reactions that consume or  produce them given the flux distribution supplied in `fluxdict`.


In [74]:
consuming, producing = metabolite_fluxes(dict_sol, model)

consuming["atp_c"] # try producing["atp_c"]

Dict{String, Float64} with 5 entries:
  "PFK"                      => -11.7289
  "BIOMASS_Ecoli_core_w_GAM" => -16.3031
  "GLNS"                     => -0.069699
  "ATPM"                     => -8.39
  "ATPS4r"                   => -6.80168
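The returned dictionaries are ordinary Julia `Dict`s, so they can be post-processed with standard functions. The sketch below copies the values printed above into a plain dictionary (so it runs without `COBREXA`) and ranks the ATP consumers by magnitude; the variable names are illustrative only.

```julia
# Values copied from the `consuming["atp_c"]` output above.
atp_consumers = Dict(
    "PFK" => -11.7289,
    "BIOMASS_Ecoli_core_w_GAM" => -16.3031,
    "GLNS" => -0.069699,
    "ATPM" => -8.39,
    "ATPS4r" => -6.80168,
)

# Sort so that the largest consumer (most negative flux) comes first.
ranked = sort(collect(atp_consumers); by = last)

# Total ATP consumption flux over all listed reactions.
total = sum(values(atp_consumers))

for (rxn_id, flux) in ranked
    share = round(100 * flux / total; digits = 1)
    println(rpad(rxn_id, 26), lpad(flux, 10), "  (", share, " %)")
end
```

In the tutorial session the same steps would apply directly to `consuming["atp_c"]`.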

### Internals of `StandardModel`

Another benefit of `StandardModel` is that it supports a richer infrastructure of types that can be used to manipulate internal model attributes, like the genes, reactions, and metabolites of a model. This is particularly useful when modifying or even constructing a model from scratch.

Let's investigate the structure of a `StandardModel`.

### `Gene`s, `Reaction`s, and `Metabolite`s

`StandardModel` is composed of ordered dictionaries of `Gene`s, `Metabolite`s and `Reaction`s. Ordered dictionaries are used because the order of the reactions and metabolites is important when constructing a stoichiometric matrix: its rows and columns correspond to the order of the metabolites and reactions returned by the accessors `metabolites` and `reactions`.
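To illustrate why the ordering matters, the following self-contained sketch assembles a small stoichiometric matrix from ordered id lists. It is not how `COBREXA` builds the matrix internally; the function `stoichiometric_matrix_sketch` is hypothetical, and the two toy reactions and their coefficients are assumptions chosen for illustration.

```julia
# Ordered ids, standing in for what `metabolites(model)` and `reactions(model)` return.
met_ids = ["3pg_c", "2pg_c", "pep_c", "h2o_c"]
rxn_ids = ["PGM", "ENO"]

# Reaction id => (metabolite id => stoichiometric coefficient); toy values.
stoichiometry = Dict(
    "PGM" => Dict("2pg_c" => -1.0, "3pg_c" => 1.0),
    "ENO" => Dict("2pg_c" => -1.0, "pep_c" => 1.0, "h2o_c" => 1.0),
)

# Rows follow the order of `met_ids`, columns follow the order of `rxn_ids`.
function stoichiometric_matrix_sketch(met_ids, rxn_ids, stoichiometry)
    row = Dict(id => i for (i, id) in enumerate(met_ids))
    S = zeros(length(met_ids), length(rxn_ids))
    for (j, rxn_id) in enumerate(rxn_ids)
        for (met_id, coeff) in stoichiometry[rxn_id]
            S[row[met_id], j] = coeff
        end
    end
    return S
end

S = stoichiometric_matrix_sketch(met_ids, rxn_ids, stoichiometry)
display(S)
```

Permuting `met_ids` or `rxn_ids` permutes the rows or columns of `S`, which is why a container with a stable order is needed.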

Each `StandardModel` is composed of the following fields:

In [75]:
fieldnames(StandardModel) # fields of a StandardModel

(:id, :reactions, :metabolites, :genes)

The `genes` field of a `StandardModel` contains an ordered dictionary of gene ids mapped to `Gene`s. 

In [76]:
model.genes

OrderedCollections.OrderedDict{String, Gene} with 137 entries:
  "b1241" => Gene("b1241", nothing, Dict("original_bigg_ids"=>["b1241"]), Dict(…
  "b0351" => Gene("b0351", nothing, Dict("original_bigg_ids"=>["b0351"]), Dict(…
  "s0001" => Gene("s0001", nothing, Dict("original_bigg_ids"=>["s0001"]), Dict(…
  "b1849" => Gene("b1849", nothing, Dict("original_bigg_ids"=>["b1849"]), Dict(…
  "b3115" => Gene("b3115", nothing, Dict("original_bigg_ids"=>["b3115"]), Dict(…
  "b2296" => Gene("b2296", nothing, Dict("original_bigg_ids"=>["b2296"]), Dict(…
  "b1276" => Gene("b1276", nothing, Dict("original_bigg_ids"=>["b1276"]), Dict(…
  "b0118" => Gene("b0118", nothing, Dict("original_bigg_ids"=>["b0118"]), Dict(…
  "b0474" => Gene("b0474", nothing, Dict("original_bigg_ids"=>["b0474"]), Dict(…
  "b0116" => Gene("b0116", nothing, Dict("original_bigg_ids"=>["b0116"]), Dict(…
  "b0727" => Gene("b0727", nothing, Dict("original_bigg_ids"=>["b0727"]), Dict(…
  "b0726" => Gene("b0726", nothing, Dict("orig

The `Gene` type is a struct that can be used to store information about genes in a `StandardModel`. The keys of the ordered dictionary `model.genes` are the ids returned by the generic accessor `genes`. `Gene`s have pretty printing, as demonstrated below for a random gene drawn from the model.

In [77]:
random_gene_id = genes(model)[rand(1:n_genes(model))]
model.genes[random_gene_id]

Gene.id: b2415
Gene.name: ---
Gene.notes: 
	original_bigg_ids: ["b2415"]
Gene.annotations: 
	sbo: ["SBO:0000243"]
	uniprot: ["P0AA04"]
	ecogene: ["EG10788"]
	ncbigene: ["946886"]
	ncbigi: ["16130341"]
	refseq_locus_tag: ["b2415"]
	refseq_name: ["ptsH"]
	asap: ["ABE-0007962"]
	refseq_synonym: ctr, ..., iex?


The same idea holds for both metabolites (stored as `Metabolite`s) and reactions (stored as `Reaction`s). This is demonstrated below.

In [78]:
random_metabolite_id = metabolites(model)[rand(1:n_metabolites(model))]
model.metabolites[random_metabolite_id]

Metabolite.id: fum_c
Metabolite.name: ---
Metabolite.formula: C4H2O4
Metabolite.charge: -2
Metabolite.compartment: c
Metabolite.notes: 
	original_bigg_ids: ["fum_c"]
Metabolite.annotations: 
	envipath: 32de3cf4-e3e6-4168-9...
	kegg.drug: ["D02308"]
	kegg.compound: ["C00122"]
	sbo: ["SBO:0000247"]
	sabiork: ["1910"]
	biocyc: ["META:FUM"]
	chebi: CHEBI:36180, ..., CHEBI:24122
	metanetx.chemical: ["MNXM93"]
	inchi_key: VZCYOOQTPOCHFL-OWOJB...
	hmdb: ["HMDB00134"]
	bigg.metabolite: ["fum"]
	seed.compound: ["cpd00106"]
	reactome.compound: ["113588", "29586"]


In [79]:
random_reaction_id = reactions(model)[rand(1:n_reactions(model))]
model.reactions[random_reaction_id]

Reaction.id: G6PDH2r
Reaction.name: ---
Reaction.metabolites: 1.0 nadp_c + 1.0 g6p_c ⟷  1.0 nadph_c + 1.0 6pgl_c + 1.0 h_c
Reaction.lb: -1000.0
Reaction.ub: 1000.0
Reaction.grr: (b1852)
Reaction.subsystem: Pentose Phosphate Pathway
Reaction.notes: 
	original_bigg_ids: ["G6PDH2r"]
Reaction.annotations: 
	bigg.reaction: ["G6PDH2r"]
	sabiork: ["1176", "6509"]
	metanetx.reaction: ["MNXR99907"]
	rhea: 15842, ..., 15844
	sbo: ["SBO:0000176"]
	seed.reaction: ["rxn00604"]
	kegg.reaction: ["R00835"]
	biocyc: META:GLU6PDEHYDROG-R...
	ec-code: ["1.1.1.363", "1.1.1.49"]
Reaction.objective_coefficient: 0.0


### Using the internals of `StandardModel`s: `check_duplicate_annotations`

`StandardModel` can be used to build your own metabolic model or modify an existing one. One of its main use cases is merging multiple models together. Since the internals are uniform inside each `StandardModel`, attributes of other model types are squashed into the required format. This ensures that the internals of all `StandardModel`s are the same, which allows easy systematic evaluation.

For example, when models are automatically reconstructed, duplicate genes, reactions, or metabolites often end up in the model. `COBREXA` exports `check_duplicate_annotations` to check for cases where the ids differ but the annotations are the same (possibly suggesting a duplication).

In [80]:
?check_duplicate_annotations

search: check_duplicate_annotations



```
check_duplicate_annotations(gene::Gene, genes::Dict{String, Gene}; inspect_annotations=_constants.gene_annotation_checks)
```

Determine if `gene` has any overlapping annotations in `genes`. The annotations checked are listed in `COBREXA._constants.gene_annotation_checks`. Return the `id` of the gene with duplicate annotations in `genes`, otherwise `nothing`.

---

```
check_duplicate_annotations(met::Metabolite, mets::OrderedDict{String, Metabolite}; inspect_annotations=_constants.metabolite_annotation_checks)
```

Check if a metabolite `met` has overlapping annotations with metabolites in `mets`. The annotations checked are listed in `COBREXA._constants.metabolite_annotation_checks`. Return id of the first hit, otherwise `nothing`.

See also: [`check_same_formula`](@ref), [`get_atoms`](@ref)

---

```
check_duplicate_annotations(rxn::Reaction, rxns::OrderedDict{String, Reaction}; inspect_annotations=_constants.reaction_annotation_checks)
```

Determine if a `rxn` has overlapping annotations in `rxns`. The annotations checked are listed in `COBREXA._constants.reaction_annotation_checks`. Return the `id` of the first hit, otherwise `nothing`.


For example, suppose we want to check if a metabolite already exists in the model (but has another id). Checking for unique formulas is not a good way to do this, since many metabolites share the same formula (although their bonds may differ). Checking annotation details, e.g. inchi_keys, is a more robust way of identifying overlapping metabolites.

Here we will check whether a dummy metabolite already exists in the model by checking only whether any annotation details overlap.

In [92]:
new_metabolite = Metabolite() # construct a dummy metabolite
new_metabolite.id = "nh4_c_dummy"
new_metabolite.compartment = "c" # note, the compartment MUST be the same to prevent false positives of metabolites in different compartments
new_metabolite.annotations["inchi_key"] = ["QGZKDVFQNNGYKY-UHFFFAOYSA-O"]
new_metabolite.annotations["hmdb"] = ["1234"]
new_metabolite

Metabolite.id: nh4_c_dummy
Metabolite.name: ---
Metabolite.formula: ---
Metabolite.charge: ---
Metabolite.compartment: c
Metabolite.notes: ---
Metabolite.annotations: 
	inchi_key: QGZKDVFQNNGYKY-UHFFF...
	hmdb: ["1234"]


In [93]:
overlap_id = check_duplicate_annotations(new_metabolite, model.metabolites) # overlap detected!

The `check_duplicate_annotations` function can also be used on reactions and genes.
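The idea behind an annotation overlap check can be sketched in plain Julia. The snippet below is only an illustration under stated assumptions: `first_annotation_overlap` is a hypothetical helper, not the `COBREXA` implementation, the inspected keys are chosen arbitrarily, and the candidate entries are toy data (it does not compare compartments, which the real check may do).

```julia
# Hypothetical helper: return the id of the first candidate that shares at
# least one value under any inspected annotation key, otherwise `nothing`.
function first_annotation_overlap(query, candidates; inspect = ["inchi_key", "hmdb"])
    for (id, annotations) in candidates
        for key in inspect
            # Skip keys that are missing on either side.
            (haskey(query, key) && haskey(annotations, key)) || continue
            isempty(intersect(query[key], annotations[key])) || return id
        end
    end
    return nothing
end

# Annotations of the dummy metabolite constructed above.
query = Dict(
    "inchi_key" => ["QGZKDVFQNNGYKY-UHFFFAOYSA-O"],
    "hmdb" => ["1234"],
)

# Toy candidates kept in a vector of pairs so the search order is fixed.
candidates = [
    "fum_c" => Dict("hmdb" => ["HMDB00134"]),
    "nh4_c" => Dict("inchi_key" => ["QGZKDVFQNNGYKY-UHFFFAOYSA-O"]),
]

overlap = first_annotation_overlap(query, candidates)
println(overlap)
```

A single shared value under any inspected key is treated as a hit here, which is deliberately permissive: a hit flags a candidate for manual inspection rather than proving a duplicate.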

### Using the internals of `StandardModel`s: `check_duplicate_reaction`

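Another useful function is `check_duplicate_reaction`, which checks whether a reaction equation already exists in a collection of reactions under a different id.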

In [96]:
?check_duplicate_reaction

search: check_duplicate_reaction



```
check_duplicate_reaction(rxn::Reaction, rxns::Dict{String, Reaction})
```

Check if `rxn` already exists in `rxns` but has another `id`. Looks through all the reaction equations of `rxns` and compares metabolite `id`s and their stoichiometric coefficients to those of `rxn`. If `rxn` has the same reaction equation as another reaction in `rxns`, then return the `id`. Otherwise return `nothing`.

See also: [`is_mass_balanced`](@ref)


For example, suppose a model is reconstructed from a database that has the same reaction listed twice but under different ids. The function `check_duplicate_reaction` can be used to identify these cases.

In [103]:
pgm_duplicate = Reaction()
pgm_duplicate.id = "pgm2" # Phosphoglycerate mutase
pgm_duplicate.metabolites = Dict{String, Float64}("3pg_c" => 1, "2pg_c" => -1)
pgm_duplicate

Reaction.id: pgm2
Reaction.name: ---
Reaction.metabolites: 1.0 2pg_c ⟷  1.0 3pg_c
Reaction.lb: -1000.0
Reaction.ub: 1000.0
Reaction.grr: ---
Reaction.subsystem: ---
Reaction.notes: ---
Reaction.annotations: ---
Reaction.objective_coefficient: 0.0


In [104]:
check_duplicate_reaction(pgm_duplicate, model.reactions)

"PGM"
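Conceptually, such a check compares stoichiometry dictionaries. The following sketch shows the simplest version of that comparison with toy data; `first_same_equation` is a hypothetical helper and only tests for exact equality of metabolite ids and coefficients. Whether the library also matches reactions written in the reverse direction is not covered here.

```julia
# Hypothetical helper: return the id of the first reaction whose equation
# (metabolite id => coefficient) equals the query exactly, else `nothing`.
function first_same_equation(query, reactions)
    for (id, equation) in reactions
        equation == query && return id
    end
    return nothing
end

# Toy reaction set; coefficients are assumptions for illustration.
toy_reactions = [
    "ENO" => Dict("2pg_c" => -1.0, "pep_c" => 1.0, "h2o_c" => 1.0),
    "PGM" => Dict("2pg_c" => -1.0, "3pg_c" => 1.0),
]

# Same equation as `pgm_duplicate` above.
query = Dict("3pg_c" => 1.0, "2pg_c" => -1.0)

println(first_same_equation(query, toy_reactions))
```

Dictionary equality in Julia ignores insertion order, so the order in which the metabolites are written does not affect the comparison.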

### Using the internals of `StandardModel`s: `is_mass_balanced`

Finally, `is_mass_balanced` can be used to check if a reaction is mass balanced.

In [107]:
?is_mass_balanced

search: is_mass_balanced



```
is_mass_balanced(rxn::Reaction, model::StandardModel)
```

Checks if `rxn` is atom balanced. Returns a boolean for whether the reaction is balanced, and the associated balance of atoms for convenience (useful if not balanced).

See also: [`get_atoms`](@ref), [`check_duplicate_reaction`](@ref)


In [108]:
pgm_duplicate.metabolites = Dict{String, Float64}("3pg_c" => 1, "2pg_c" => -1, "h2o_c"=>1) # not mass balanced now

is_mass_balanced(pgm_duplicate, model)

(false, Dict("C" => 0.0, "P" => 0.0, "H" => 2.0, "O" => 1.0))
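The balance reported above can be reproduced by hand: multiply each metabolite's atom counts by its stoichiometric coefficient and sum per element. The sketch below does this without `COBREXA`; `count_atoms` and `reaction_atom_balance` are hypothetical helpers, and the formulas are assumptions entered manually rather than read from the model.

```julia
# Hypothetical helper: count atoms in a flat formula such as "C3H4O7P".
function count_atoms(formula::AbstractString)
    atoms = Dict{String,Int}()
    for m in eachmatch(r"([A-Z][a-z]*)(\d*)", formula)
        element = String(m.captures[1])
        n = isempty(m.captures[2]) ? 1 : parse(Int, m.captures[2])
        atoms[element] = get(atoms, element, 0) + n
    end
    return atoms
end

# Assumed formulas for the metabolites in the modified reaction.
formulas = Dict("3pg_c" => "C3H4O7P", "2pg_c" => "C3H4O7P", "h2o_c" => "H2O")

# Same stoichiometry as the modified `pgm_duplicate` above.
equation = Dict("3pg_c" => 1.0, "2pg_c" => -1.0, "h2o_c" => 1.0)

# Sum coefficient * atom count per element over all metabolites.
function reaction_atom_balance(equation, formulas)
    balance = Dict{String,Float64}()
    for (met_id, coeff) in equation
        for (element, n) in count_atoms(formulas[met_id])
            balance[element] = get(balance, element, 0.0) + coeff * n
        end
    end
    return balance
end

balance = reaction_atom_balance(equation, formulas)
balanced = all(iszero, values(balance))
println((balanced, balance))
```

The surplus of two hydrogens and one oxygen is exactly the water molecule that was added to the product side.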