## Tutorial: Using `StandardModel`

In this tutorial we will use `COBREXA`'s `StandardModel` and functions that specifically operate on it. As usual we will use the toy model of *E. coli* for demonstration.

Let's first load the model.

In [3]:
# download file if it is not already present
!isfile("e_coli_core.json") && download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")

using COBREXA

model = load_model(StandardModel, "e_coli_core.json") # we specifically want to load a StandardModel

Metabolic model of type StandardModel
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⢄⠀⠀⠀⠀⠀⠀⠀⠈⠶⠴⡆⠀⠀⠀⠀⠀⠀
⡀⢐⣀⢀⡀⡒⢒⣐⠀⣂⣂⠀⣂⣂⢂⠀⢀⠀⠀⠀⠀⠀⢀⠄⠀⠀⠀⢂⠀⢂⣀⣐⡒⡀⠆⢙⣀⠀⡀⠀
⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠰⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠈⢑⣀⣀⠀⠀
⠀⠀⠃⠀⠃⠀⠀⠀⠘⠀⡇⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⡜⠀⡄⣤⢠⠘⠙⢣⡇⠘
⠀⠐⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠐⠁⠉⠀⠀⠀⠀⠀⠘⠄
⠀⢐⠀⠂⠀⠄⠠⠠⠀⠠⠆⠀⠄⠀⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠠⠀⠠⠀⠀⢀⠀⠀⠠⠀⠀⠁
⢀⠐⠀⠨⢀⠁⠈⣈⠀⢁⣁⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠁⢀⠀⢊⠉⠀⠀⠀⢀⠀⣀⠀⢀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡈⠀⡀⠆⠀⠆⠀⡀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠆⠀
⠀⠀⠂⠀⡂⠀⠀⠁⠀⠀⠀⠈⠁⠀⠀⠀⠄⠄⢁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀
⠈⠀⠁⠀⠀⢀⡀⠀⠠⠁⠁⠀⠑⠀⠐⠲⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂⠀⠂⠀⠀⠀⠀⠀⠀⠊⠀⠀⠀⠈
⠄⠠⢠⠀⠰⠀⠠⠀⠤⠦⠄⠈⠀⠀⠀⠠⠀⠁⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠤⠄⠄⠠⠀⠀⠀⠀⠀
⠂⠐⠀⠀⠐⡠⢐⠘⢃⠒⠂⡀⠄⠀⠀⠐⠀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠒⠀⢀⢀⠀⠀⣀⠀⢀
⠈⠀⠁⠀⡀⠀⠀⠀⠈⠁⠅⠀⠁⠀⢀⠈⠄⠔⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠈
⠣⠁⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠈⠀⠁⠁⠀⠈⡀⠀⠀⠀⠀⠀⠐⢣⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⡄⠀⠀⠀⠀⠂⠄⠤⠀⠀⠈⠂⠀⠀⠀⠀⠠⠀⠊⠒⣠⠀⠀⠀⠀⠀⠀⠀⠀⠀
Number of reactions: 95
Number of metabolites: 72


### Basic analysis

As before, for optimization-based analysis we need to load a solver. Here we will use [`Tulip.jl`](https://github.com/ds4dm/Tulip.jl) to solve linear programs. Refer to the flux balance analysis tutorial if you are confused by any functions in this section.

All the normal analysis functions work on `StandardModel`, because it has the same generic accessor interface as all the other model types.

In [7]:
using Tulip

dict_sol = flux_balance_analysis_dict(
    model,
    Tulip.Optimizer;
    modifications = [
        change_objective("BIOMASS_Ecoli_core_w_GAM"),
        change_constraint("EX_glc__D_e", -12, -12),
        change_constraint("EX_o2_e", 0, 0),
    ],
)

Dict{String, Float64} with 95 entries:
  "ACALD"       => -9.78427
  "PTAr"        => 10.0729
  "ALCD2x"      => -9.78427
  "PDH"         => 1.98388e-9
  "PYK"         => 9.94501
  "CO2t"        => 0.487021
  "EX_nh4_e"    => -1.48633
  "MALt2_2"     => -0.0
  "CS"          => 0.294088
  "PGM"         => -22.8676
  "TKT1"        => -0.0487648
  "EX_mal__L_e" => -0.0
  "ACONTa"      => 0.294088
  "EX_pi_e"     => -1.00274
  "GLNS"        => 0.069699
  "ICL"         => 5.34951e-11
  "EX_o2_e"     => -0.0
  "FBA"         => 11.7289
  "EX_gln__L_e" => -0.0
  "EX_glc__D_e" => -12.0
  "SUCCt3"      => 9.36957e-10
  "FORt2"       => 6.1847e-10
  "G6PDH2r"     => 4.23233e-9
  "AKGDH"       => 5.31373e-11
  "TKT2"        => -0.147167
  ⋮             => ⋮

This is not very exciting yet, since every other model type can do this. However, deeper inspection of flux results is possible when using `StandardModel`. 

It is sometimes interesting to keep track of the atoms entering and leaving the system. This can be inspected by calling `atom_exchange`.

In [48]:
?atom_exchange

search: atom_exchange



```
atom_exchange(flux_dict::Dict{String, Float64}, model::StandardModel)
```

Return a dictionary mapping the flux of atoms across the boundary of the model given `flux_dict` of reactions in `model`. Here `flux_dict` is a mapping of reaction `id`s to fluxes, e.g. from FBA.


In [52]:
atom_exchange(dict_sol, model) # flux of individual atoms entering and leaving the system through boundary reactions (e.g. exchange reactions)

Dict{String, Float64} with 5 entries:
  "C" => -11.5998
  "N" => -1.48633
  "P" => -1.00274
  "H" => -20.7086
  "O" => -12.995
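
The bookkeeping behind such a result can be sketched in plain Julia. This is a minimal illustration, not `COBREXA`'s implementation: the function `toy_atom_exchange`, the reaction ids, the fluxes, and the sign convention (negative flux means uptake) are all assumptions made for the example.

```julia
# Hypothetical toy data, NOT taken from e_coli_core: atom counts of the
# metabolite carried by each boundary reaction.
boundary_atoms = Dict(
    "EX_glc" => Dict("C" => 6, "H" => 12, "O" => 6), # glucose, C6H12O6
    "EX_co2" => Dict("C" => 1, "O" => 2),            # carbon dioxide, CO2
)

# Hypothetical fluxes; assumed convention: negative = uptake, positive = secretion.
fluxes = Dict("EX_glc" => -1.0, "EX_co2" => 2.0)

# Sum flux * atom count over all boundary reactions (hypothetical helper).
function toy_atom_exchange(fluxes, boundary_atoms)
    totals = Dict{String,Float64}()
    for (rxn_id, atoms) in boundary_atoms
        flux = get(fluxes, rxn_id, 0.0)
        for (atom, count) in atoms
            totals[atom] = get(totals, atom, 0.0) + flux * count
        end
    end
    return totals
end

atom_totals = toy_atom_exchange(fluxes, boundary_atoms)
```

Under these assumptions, a negative total for an atom means that more of it enters the system than leaves it through the boundary reactions.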

It is also sometimes useful to inspect the exchange reactions used by a flux solution. The function `exchange_reactions` fulfills this purpose.

In [56]:
?exchange_reactions

search: exchange_reactions find_exchange_reactions



```
get_exchanges(rxndict::Dict{String, Float64}; top_n=Inf, ignorebound=_constants.default_reaction_bound, verbose=true)
```

Display the `top_n` producing and consuming exchange fluxes. If `top_n` is not specified (by an integer), then all are displayed. Ignores infinite (problem upper/lower bound) fluxes (set with `ignorebound`). When `verbose` is false, the output is not printed out. Return these reactions (id => ) in two dictionaries: `consuming`, `producing`


In [55]:
consuming, producing = exchange_reactions(dict_sol, model; top_n = 4)

Consuming fluxes: 


LoadError: UndefVarError: v not defined
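
The idea of splitting exchange fluxes into consuming and producing groups can be sketched in plain Julia. This is a hedged illustration only: `toy_exchange_split`, the reaction ids, the flux values, and the rule that exchange reactions are recognised by an `EX_` prefix are assumptions for the example, not the behaviour of `exchange_reactions`.

```julia
# Hypothetical flux values; the ids are illustrative only.
toy_fluxes = Dict(
    "EX_a_e" => -12.0,
    "EX_b_e" => -3.0,
    "EX_c_e" => 9.5,
    "EX_d_e" => 0.0,
    "R_internal" => 11.0,
)

# Hypothetical helper: split exchange fluxes by sign and keep the largest `top_n`.
function toy_exchange_split(fluxes; top_n = Inf)
    # Assumption: exchange reactions are recognised by the "EX_" id prefix.
    exchanges = [(id, v) for (id, v) in fluxes if startswith(id, "EX_")]
    # Most negative first for uptake, most positive first for secretion.
    consuming = sort(filter(p -> p[2] < 0, exchanges); by = last)
    producing = sort(filter(p -> p[2] > 0, exchanges); by = last, rev = true)
    keep_top(v) = Dict(v[1:Int(min(top_n, length(v)))])
    return keep_top(consuming), keep_top(producing)
end

toy_consuming, toy_producing = toy_exchange_split(toy_fluxes; top_n = 1)
```

Zero fluxes and non-exchange reactions are dropped, so each returned dictionary holds at most `top_n` entries.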

Another useful flux result analysis function is `metabolite_fluxes`. 

In [57]:
?metabolite_fluxes

search: metabolite_fluxes metabolite_formula



```
metabolite_fluxes(fluxdict::Dict{String, Float64}, model::StandardModel)
```

Return two dictionaries of metabolite `id`s mapped to reactions that consume or produce them given the flux distribution supplied in `fluxdict`.


In [62]:
consuming, producing = metabolite_fluxes(dict_sol, model)

consuming["atp_c"] # try producing["atp_c"]

Dict{String, Float64} with 5 entries:
  "PFK"                      => -11.7289
  "BIOMASS_Ecoli_core_w_GAM" => -16.3031
  "GLNS"                     => -0.069699
  "ATPM"                     => -8.39
  "ATPS4r"                   => -6.80168
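
How a flux distribution is turned into per-metabolite consumption and production can be sketched as follows. The network, the fluxes, and the helper `toy_metabolite_fluxes` are hypothetical; the sketch assumes that each metabolite flux is the reaction flux multiplied by the stoichiometric coefficient, which may differ in detail from what `metabolite_fluxes` does internally.

```julia
# Hypothetical two-reaction network; coefficients < 0 mark substrates.
toy_stoichiometry = Dict(
    "R1" => Dict("atp" => -1.0, "adp" => 1.0),
    "R2" => Dict("adp" => -1.0, "atp" => 1.0),
)
toy_fluxes = Dict("R1" => 2.0, "R2" => 5.0)

# Hypothetical helper: group flux * coefficient by metabolite and by sign.
function toy_metabolite_fluxes(fluxes, stoichiometry)
    consuming = Dict{String,Dict{String,Float64}}()
    producing = Dict{String,Dict{String,Float64}}()
    for (rxn_id, mets) in stoichiometry
        flux = get(fluxes, rxn_id, 0.0)
        for (met_id, coeff) in mets
            met_flux = flux * coeff
            met_flux == 0 && continue # reaction does not touch the metabolite
            target = met_flux < 0 ? consuming : producing
            get!(target, met_id, Dict{String,Float64}())[rxn_id] = met_flux
        end
    end
    return consuming, producing
end

toy_consuming, toy_producing = toy_metabolite_fluxes(toy_fluxes, toy_stoichiometry)
```

In this toy network `R1` consumes `atp` and `R2` produces it, so `atp` appears in both dictionaries with opposite signs.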

Another benefit of `StandardModel` is that it supports a richer infrastructure of types that can be used to manipulate internal model attributes, like the genes, reactions, and metabolites of a model. This is particularly useful when modifying or even constructing a model from scratch.

Let's investigate the structure of a `StandardModel`.

### Internals of `StandardModel`

`StandardModel` is composed of ordered dictionaries of `Gene`s, `Metabolite`s and `Reaction`s. Ordered dictionaries are used because the order of the reactions and metabolites is important for constructing a stoichiometric matrix, whose rows and columns correspond to the order of the metabolites and reactions returned by the accessors `metabolites` and `reactions`.
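
To see why the ordering matters, here is a minimal sketch in plain Julia that builds a stoichiometric matrix from ordered id lists. The ids, coefficients, and the helper `toy_stoichiometric_matrix` are hypothetical, and the two vectors merely stand in for the ordered keys of a model.

```julia
# Hypothetical ids; the vectors play the role of the ordered dictionary keys.
met_ids = ["A", "B", "C"]
rxn_ids = ["R1", "R2"]
toy_stoichiometry = Dict(
    "R1" => Dict("A" => -1.0, "B" => 1.0), # A -> B
    "R2" => Dict("B" => -1.0, "C" => 2.0), # B -> 2 C
)

# Row i belongs to met_ids[i], column j belongs to rxn_ids[j].
function toy_stoichiometric_matrix(met_ids, rxn_ids, stoichiometry)
    row = Dict(id => i for (i, id) in enumerate(met_ids))
    S = zeros(length(met_ids), length(rxn_ids))
    for (j, rxn_id) in enumerate(rxn_ids)
        for (met_id, coeff) in stoichiometry[rxn_id]
            S[row[met_id], j] = coeff
        end
    end
    return S
end

S = toy_stoichiometric_matrix(met_ids, rxn_ids, toy_stoichiometry)
```

Reordering `met_ids` or `rxn_ids` permutes the rows or columns of `S`, which is why a stable order is needed to interpret the matrix.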

Each `StandardModel` is composed of the following fields:

In [13]:
fieldnames(StandardModel) # fields of a StandardModel

(:id, :reactions, :metabolites, :genes)

The `genes` field of a `StandardModel` contains an ordered dictionary of gene ids mapped to `Gene`s. 

In [15]:
model.genes

OrderedCollections.OrderedDict{String, Gene} with 137 entries:
  "b1241" => Gene("b1241", nothing, Dict("original_bigg_ids"=>["b1241"]), Dict(…
  "b0351" => Gene("b0351", nothing, Dict("original_bigg_ids"=>["b0351"]), Dict(…
  "s0001" => Gene("s0001", nothing, Dict("original_bigg_ids"=>["s0001"]), Dict(…
  "b1849" => Gene("b1849", nothing, Dict("original_bigg_ids"=>["b1849"]), Dict(…
  "b3115" => Gene("b3115", nothing, Dict("original_bigg_ids"=>["b3115"]), Dict(…
  "b2296" => Gene("b2296", nothing, Dict("original_bigg_ids"=>["b2296"]), Dict(…
  "b1276" => Gene("b1276", nothing, Dict("original_bigg_ids"=>["b1276"]), Dict(…
  "b0118" => Gene("b0118", nothing, Dict("original_bigg_ids"=>["b0118"]), Dict(…
  "b0474" => Gene("b0474", nothing, Dict("original_bigg_ids"=>["b0474"]), Dict(…
  "b0116" => Gene("b0116", nothing, Dict("original_bigg_ids"=>["b0116"]), Dict(…
  "b0727" => Gene("b0727", nothing, Dict("original_bigg_ids"=>["b0727"]), Dict(…
  "b0726" => Gene("b0726", nothing, Dict("orig

The `Gene` type is a struct that can be used to store information about genes in a `StandardModel`. The keys used in the ordered dictionary `model.genes` are the ids returned by the generic accessor `genes`. `Gene`s have pretty printing, as demonstrated below for a random gene drawn from the model.

In [18]:
random_gene_id = genes(model)[rand(1:n_genes(model))]
model.genes[random_gene_id]

Gene.id: b2278
Gene.name: ---
Gene.notes:
	original_bigg_ids: ["b2278"]
Gene.annotations:
	sbo: ["SBO:0000243"]
	uniprot: ["P33607"]
	ecogene: ["EG12092"]
	ncbigene: ["945540"]
	ncbigi: ["16130213"]
	refseq_locus_tag: ["b2278"]
	refseq_name: ["nuoL"]
	asap: ["ABE-0007532"]
	refseq_synonym: ["ECK2272", "JW2273"]


The same idea holds for both metabolites (stored as `Metabolite`s) and reactions (stored as `Reaction`s). This is demonstrated below.

In [19]:
random_metabolite_id = metabolites(model)[rand(1:n_metabolites(model))]
model.metabolites[random_metabolite_id]

Metabolite.id: xu5p__D_c
Metabolite.name: ---
Metabolite.formula: C5P1H9O8
Metabolite.charge: -2
Metabolite.compartment: c
Metabolite.notes:
	original_bigg_ids: ["xu5p_D_c"]
Metabolite.annotations:
	sabiork: ["1317"]
	kegg.compound: ["C00231"]
	sbo: ["SBO:0000247"]
	biocyc: META:XYLULOSE-5-PHOS...
	chebi: CHEBI:13036, ..., CHEBI:21121
	metanetx.chemical: ["MNXM186"]
	inchi_key: FNZLKVNUWIIPSJ-RFZPG...
	hmdb: ["HMDB06212", "HMDB00868"]
	bigg.metabolite: ["xu5p__D"]
	seed.compound: ["cpd00198"]
	reactome.compound: ["29790"]


In [21]:
random_reaction_id = reactions(model)[rand(1:n_reactions(model))]
model.reactions[random_reaction_id]

Reaction.id: PPC
Reaction.name: ---
Reaction.metabolites: 1.0 h2o_c + 1.0 co2_c + 1.0 pep_c ⟶  1.0 oaa_c + 1.0 pi_c + 1.0 h_c
Reaction.lb: 0.0
Reaction.ub: 1000.0
Reaction.grr: (b3956)
Reaction.subsystem: Anaplerotic reactions
Reaction.notes:
	original_bigg_ids: ["PPC"]
Reaction.annotations:
	bigg.reaction: ["PPC"]
	sabiork: ["150"]
	metanetx.reaction: ["MNXR103096"]
	rhea: 23073, ..., 23074
	sbo: ["SBO:0000176"]
	seed.reaction: ["rxn00251"]
	kegg.reaction: ["R00345"]
	ec-code: ["4.1.1.31"]
Reaction.objective_coefficient: 0.0


### Using the internals of `StandardModel`s

`StandardModel` can be used to build your own metabolic model or to modify an existing one. One of its main use cases is merging multiple models together. Since the internals are uniform inside each `StandardModel`, attributes of other model types are squashed into the required format. This ensures that the internals of all `StandardModel`s are the same, which allows easy systematic evaluation.

For example, when models are automatically reconstructed, duplicate genes, reactions, or metabolites often end up in a model. `COBREXA` exports `check_duplicate_annotations` to check for cases where the ids differ but the annotations are the same.

In [22]:
?check_duplicate_annotations

search: check_duplicate_annotations



```
check_duplicate_annotations(gene::Gene, genes::Dict{String, Gene}; inspect_annotations...)
```

Determine if `gene` has any overlapping annotations in `genes`. The annotations checked are: `inspect_annotations = ["ncbigene", "ncbigi", "refseq_locus_tag",  "refseq_name", "refseq_synonym", "uniprot"]`. Return the `id` of the gene with duplicate annotations in `genes`. If no annotation overlap is found, return `nothing`.

---

```
check_duplicate_annotations(met::Metabolite, mets::Vector{Metabolite}; inspect_annotations=...)
```

Check if a metabolite `met` has overlapping annotations with metabolites in `mets`. If the annotations overlap, then check if they share a compartment to determine if it is a true duplicate. The annotations checked are: ["kegg.compound", "bigg.metabolite", "chebi", "inchi_key", "sabiork", "hmdb",  "seed.compound", "metanetx.chemical", "reactome.compound", "biocyc"]. Return index of the first hit, otherwise `nothing`.

See also: [`check_same_formula`](@ref), [`get_atoms`](@ref)

---

```
check_duplicate_annotations(rxn::Reaction, rxns::Dict{String, Reaction})
```

Determine if `rxn` has overlapping annotations in `rxns`. The annotations checked are: ["bigg.reaction", "biocyc", "ec-code", "kegg.reaction", "metanetx.reaction", "rhea", "sabiork", "seed.reaction"]. Return true and the `id` of the first hit, otherwise false and "".


For example, suppose we want to check if a metabolite already exists in the model (but has another id). Checking for unique formulas is not a good way to do this, since many metabolites share the same formula (although their bonds may differ). Checking annotation details, e.g. `inchi_key`s, is a more robust way of identifying overlapping metabolites.

Here we will check if a newly created dummy metabolite already exists in the model by only checking if any annotation details overlap.

In [42]:
new_metabolite = Metabolite() # construct a dummy metabolite
new_metabolite.id = "nh4_c_dummy"
new_metabolite.compartment = "c" # note, the compartment MUST be the same to prevent false positives of metabolites in different compartments
new_metabolite.annotations["inchi_key"] = ["QGZKDVFQNNGYKY-UHFFFAOYSA-O"]
new_metabolite.annotations["hmdb"] = ["1234"]
new_metabolite

Metabolite.id: nh4_c_dummy
Metabolite.name: ---
Metabolite.formula: ---
Metabolite.charge: ---
Metabolite.compartment: c
Metabolite.notes: ---
Metabolite.annotations:
	inchi_key: QGZKDVFQNNGYKY-UHFFF...
	hmdb: ["1234"]


In [44]:
check_duplicate_annotations(new_metabolite, model.metabolites) # overlap detected!

"nh4_c"
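
The kind of comparison involved can be sketched in plain Julia. Everything here is hypothetical: the named tuples stand in for `Metabolite`s, the annotation values are made up, and `toy_find_duplicate` only illustrates the idea of matching on shared annotation entries within the same compartment. It is not the library's code.

```julia
# Hypothetical records standing in for metabolites already in a model.
existing = Dict(
    "m1" => (compartment = "c", annotations = Dict("inchi_key" => ["KEY-ONE"])),
    "m2" => (compartment = "e", annotations = Dict("inchi_key" => ["KEY-TWO"])),
)

# Hypothetical candidate: shares one inchi_key with "m1", hmdb entry is unique.
candidate = (
    compartment = "c",
    annotations = Dict("inchi_key" => ["KEY-ONE"], "hmdb" => ["H9"]),
)

# Return the id of the first record that shares a compartment and at least
# one annotation entry with `candidate`, otherwise `nothing`.
function toy_find_duplicate(candidate, existing; inspect = ["inchi_key", "hmdb"])
    for id in sort(collect(keys(existing)))
        other = existing[id]
        other.compartment == candidate.compartment || continue
        for key in inspect
            a = get(candidate.annotations, key, String[])
            b = get(other.annotations, key, String[])
            isempty(intersect(a, b)) || return id
        end
    end
    return nothing
end

duplicate_id = toy_find_duplicate(candidate, existing)
```

A single shared entry is enough for a hit in this sketch, which mirrors the example above where only the `inchi_key` overlaps and the `hmdb` entry does not.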

The `check_duplicate_annotations` function can also be used on reactions and genes.

Another useful function is `check_duplicate_reaction`.

In [45]:
?check_duplicate_reaction

search: check_duplicate_reaction



```
check_duplicate_reaction(rxn::Reaction, rxns::Dict{String, Reaction})
```

Check if `rxn` already exists in `rxns` but has another `id`. Looks through all the reaction equations of `rxns` and compares metabolite `id`s and their stoichiometric coefficients to those of `rxn`. If `rxn` has the same reaction equation as another reaction in `rxns`, then return the `id`. Otherwise return `nothing`.

See also: [`is_mass_balanced`](@ref)
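
The comparison described in this docstring can be illustrated with a small sketch in plain Julia, assuming a reaction equation is represented as a dictionary from metabolite id to stoichiometric coefficient. The reactions and the helper `toy_duplicate_reaction` are hypothetical.

```julia
# Hypothetical reaction equations: metabolite id => stoichiometric coefficient.
toy_reactions = Dict(
    "R1" => Dict("A" => -1.0, "B" => 1.0), # A -> B
    "R2" => Dict("B" => -1.0, "C" => 2.0), # B -> 2 C
)

# Return the id of a stored reaction with an identical equation, else `nothing`.
function toy_duplicate_reaction(equation, reactions)
    for id in sort(collect(keys(reactions)))
        # Dictionaries compare equal when they hold the same key => value pairs.
        reactions[id] == equation && return id
    end
    return nothing
end

hit = toy_duplicate_reaction(Dict("B" => 1.0, "A" => -1.0), toy_reactions)
```

Because dictionaries are unordered, the order in which metabolites are listed in the query does not affect the result.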


For example

search: is_mass_balanced



```
is_mass_balanced(rxn::Reaction, model::StandardModel)
```

Checks if `rxn` is atom balanced. Returns a boolean for whether the reaction is balanced, and the associated balance of atoms for convenience (useful if not balanced).

See also: [`get_atoms`](@ref), [`check_duplicate_reaction`](@ref)
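
An atom balance check of this kind can be sketched in plain Julia. The formulas, the reaction, and the helper `toy_is_mass_balanced` are assumptions made for illustration; like the function described above, the sketch returns a boolean together with the per-atom balance.

```julia
# Hypothetical atom counts per metabolite.
toy_formulas = Dict(
    "glc" => Dict("C" => 6, "H" => 12, "O" => 6),
    "o2" => Dict("O" => 2),
    "co2" => Dict("C" => 1, "O" => 2),
    "h2o" => Dict("H" => 2, "O" => 1),
)

# glc + 6 o2 -> 6 co2 + 6 h2o
respiration = Dict("glc" => -1.0, "o2" => -6.0, "co2" => 6.0, "h2o" => 6.0)

# Sum coefficient * atom count per atom; balanced when every sum is zero.
function toy_is_mass_balanced(equation, formulas)
    balance = Dict{String,Float64}()
    for (met_id, coeff) in equation
        for (atom, count) in formulas[met_id]
            balance[atom] = get(balance, atom, 0.0) + coeff * count
        end
    end
    return all(iszero, values(balance)), balance
end

balanced, balance = toy_is_mass_balanced(respiration, toy_formulas)
```

For an unbalanced reaction the returned dictionary shows which atoms are in excess or missing, which helps locate the error.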
