# Basic usage of `StandardModel`

In this tutorial we will use `COBREXA`'s `StandardModel` and the functions that
operate specifically on it. As usual, we will use the toy model of *E. coli*
for demonstration.

In [1]:
!isfile("e_coli_core.json") &&
    download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")

using COBREXA

## Loading a model in the `StandardModel` format

In [2]:
model = load_model(StandardModel, "e_coli_core.json") # we specifically want to load a StandardModel from the model file

Metabolic model of type StandardModel
sparse([9, 51, 55, 64, 65, 34, 44, 59, 66, 64  …  20, 22, 23, 25, 16, 17, 34, 44, 57, 59], [1, 1, 1, 1, 1, 2, 2, 2, 2, 3  …  93, 93, 94, 94, 95, 95, 95, 95, 95, 95], [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0  …  1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0], 72, 95)
Number of reactions: 95
Number of metabolites: 72


When using `load_model(StandardModel, file_location)` the model at
`file_location` is first loaded into its inferred format and is then
converted to a `StandardModel` using the generic accessor interface.
Thus, data loss may occur. Always check your model to ensure that
nothing important has been lost.

## Internals of `StandardModel`

A benefit of `StandardModel` is that it supports a richer internal
infrastructure that can be used to manipulate internal model attributes in a
systematic way. Specifically, the genes, reactions, and metabolites of a
model each have their own type. This is particularly useful when modifying a
model or constructing one from scratch.

## `Gene`s, `Reaction`s, and `Metabolite`s

`StandardModel` is composed of ordered dictionaries of `Gene`s, `Metabolite`s
and `Reaction`s. Ordered dictionaries are used because the order of the
reactions and metabolites matters when constructing a stoichiometric
matrix: its rows and columns should correspond to the order of the metabolites
and reactions returned by the accessors `metabolites` and `reactions`.
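
To illustrate why a stable order matters, here is a minimal sketch that uses only Julia's standard library. It is not how `COBREXA` builds its matrix: the ids and reactions are hypothetical toy data, and a vector of pairs stands in for an ordered dictionary.

```julia
using SparseArrays

# Hypothetical toy data; vectors keep insertion order like an ordered dictionary.
toy_metabolite_ids = ["A", "B", "C"]
toy_reactions = [
    "R1" => Dict("A" => -1.0, "B" => 1.0),
    "R2" => Dict("B" => -1.0, "C" => 2.0),
]

# Row i corresponds to toy_metabolite_ids[i], column j to the j-th reaction.
row_of = Dict(id => i for (i, id) in enumerate(toy_metabolite_ids))
S = spzeros(length(toy_metabolite_ids), length(toy_reactions))
for (j, (_, stoich)) in enumerate(toy_reactions)
    for (met, coeff) in stoich
        S[row_of[met], j] = coeff
    end
end

Matrix(S) # dense view of the toy stoichiometric matrix
```

If the iteration order of the containers could change between calls, the entry `S[i, j]` could no longer be traced back reliably to a particular metabolite and reaction id.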

Each `StandardModel` is composed of the following fields:

In [3]:
fieldnames(StandardModel) # fields of a StandardModel

(:id, :reactions, :metabolites, :genes)

The `:genes` field of a `StandardModel` contains an ordered dictionary of gene ids mapped to `Gene`s.

In [4]:
model.genes # the keys of this dictionary are the same as genes(model)

OrderedCollections.OrderedDict{String, Gene} with 137 entries:
  "b1241" => Gene("b1241", "adhE", Dict("original_bigg_ids"=>["b1241"]), Dict("…
  "b0351" => Gene("b0351", "mhpF", Dict("original_bigg_ids"=>["b0351"]), Dict("…
  "s0001" => Gene("s0001", "", Dict("original_bigg_ids"=>["s0001"]), Dict("sbo"…
  "b1849" => Gene("b1849", "purT", Dict("original_bigg_ids"=>["b1849"]), Dict("…
  "b3115" => Gene("b3115", "tdcD", Dict("original_bigg_ids"=>["b3115"]), Dict("…
  "b2296" => Gene("b2296", "ackA", Dict("original_bigg_ids"=>["b2296"]), Dict("…
  "b1276" => Gene("b1276", "acnA", Dict("original_bigg_ids"=>["b1276"]), Dict("…
  "b0118" => Gene("b0118", "acnB", Dict("original_bigg_ids"=>["b0118"]), Dict("…
  "b0474" => Gene("b0474", "adk", Dict("original_bigg_ids"=>["b0474"]), Dict("s…
  "b0116" => Gene("b0116", "lpd", Dict("original_bigg_ids"=>["b0116"]), Dict("s…
  "b0727" => Gene("b0727", "sucB", Dict("original_bigg_ids"=>["b0727"]), Dict("…
  "b0726" => Gene("b0726", "sucA", Dict("origi

The `Gene` type is a struct that can be used to store information about genes
in a `StandardModel`. Each `Gene` is composed of the following fields:

In [5]:
fieldnames(Gene)

(:id, :name, :notes, :annotations)

Use `<tab>` to quickly explore the fields of a struct. For example,
`Gene.<tab>` will list all the fields shown above.

The keys of the ordered dictionary `model.genes` are the ids returned by the
generic accessor `genes`. `Gene`s have pretty printing, as demonstrated below
for a random gene drawn from the model:

In [6]:
random_gene_id = genes(model)[rand(1:n_genes(model))]
model.genes[random_gene_id]

Gene.id: b1603
Gene.name: pntA
Gene.notes: 
	original_bigg_ids: ["b1603"]
Gene.annotations: 
	sbo: ["SBO:0000243"]
	uniprot: ["P07001"]
	ecogene: ["EG10744"]
	ncbigene: ["946628"]
	ncbigi: ["16129561"]
	refseq_locus_tag: ["b1603"]
	refseq_name: ["pntA"]
	asap: ["ABE-0005354"]
	refseq_synonym: ["JW1595", "ECK1598"]


The same idea holds for both metabolites (stored as `Metabolite`s) and
reactions (stored as `Reaction`s). This is demonstrated below.

In [7]:
random_metabolite_id = metabolites(model)[rand(1:n_metabolites(model))]
model.metabolites[random_metabolite_id]

Metabolite.id: akg_c
Metabolite.name: 2-Oxoglutarate
Metabolite.formula: C5H4O5
Metabolite.charge: -2
Metabolite.compartment: c
Metabolite.notes: 
	original_bigg_ids: ["akg_c"]
Metabolite.annotations: 
	envipath: 32de3cf4-e3e6-4168-9...
	sabiork: ["1922"]
	kegg.compound: ["C00026"]
	sbo: ["SBO:0000247"]
	biocyc: ["META:2-KETOGLUTARATE"]
	chebi: CHEBI:40661, ..., CHEBI:19748
	metanetx.chemical: ["MNXM20"]
	inchi_key: KPGXRSRHYNQIFN-UHFFF...
	hmdb: HMDB00208, ..., HMDB62781
	bigg.metabolite: ["akg"]
	seed.compound: ["cpd00024"]
	reactome.compound: 113594, ..., 389537


In [8]:
random_reaction_id = reactions(model)[rand(1:n_reactions(model))]
model.reactions[random_reaction_id]

Reaction.id: GLUSy
Reaction.name: Glutamate synthase (NADPH)
Reaction.metabolites: 1.0 nadph_c + 1.0 gln__L_c + 1.0 akg_c + 1.0 h_c →  2.0 glu__L_c + 1.0 nadp_c
Reaction.lb: 0.0
Reaction.ub: 1000.0
Reaction.grr: (b3212 && b3213)
Reaction.subsystem: Glutamate Metabolism
Reaction.notes: 
	original_bigg_ids: ["GLUSy"]
Reaction.annotations: 
	bigg.reaction: ["GLUSy"]
	sabiork: ["694"]
	metanetx.reaction: ["MNXR100291"]
	rhea: 15503, ..., 15502
	sbo: ["SBO:0000176"]
	seed.reaction: ["rxn00085"]
	kegg.reaction: ["R00114"]
	biocyc: META:GLUTAMATESYN-RX...
	ec-code: ["1.4.1.13"]
Reaction.objective_coefficient: 0.0


`StandardModel` can be used to build your own metabolic model or modify an
existing one. One of its main use cases is merging multiple models, or parts
of multiple models, together. Since the internals are uniform inside each
`StandardModel`, attributes of other model types are converted into the
required format using the generic accessors. This ensures that all
`StandardModel`s share the same internal structure, which makes systematic
evaluation easy.

## Checking the internals of `StandardModel`s: `annotation_index`

When models are reconstructed automatically, duplicate genes, reactions
or metabolites often end up in the model. `COBREXA` exports `annotation_index` to
check for cases where the ids of two structs differ but their annotations
are the same (possibly suggesting a duplication). `annotation_index` builds a
dictionary mapping annotation features to the ids of whatever struct you are
inspecting. This makes it easy to find structs that share certain annotation features.

In [9]:
rxn_annotations = annotation_index(model.reactions)

Dict{String, Dict{String, Set{String}}} with 10 entries:
  "ec-code"           => Dict("3.6.3.37"=>Set(["ATPM"]), "3.6.3.42"=>Set(["ATPM…
  "sabiork"           => Dict("109"=>Set(["PGL"]), "762"=>Set(["GLUN"]), "155"=…
  "metanetx.reaction" => Dict("MNXR104869"=>Set(["TKT2"]), "MNXR99715"=>Set(["E…
  "rhea"              => Dict("27626"=>Set(["TKT2"]), "10229"=>Set(["ACONTa"]),…
  "sbo"               => Dict("SBO:0000627"=>Set(["EX_for_e", "EX_nh4_e", "EX_p…
  "seed.reaction"     => Dict("rxn05297"=>Set(["GLUt2r"]), "rxn09717"=>Set(["PY…
  "kegg.reaction"     => Dict("R00114"=>Set(["GLUSy"]), "R00199"=>Set(["PPS"]),…
  "biocyc"            => Dict("META:TRANS-RXN-121B"=>Set(["FUMt2_2"]), "META:PE…
  "reactome.reaction" => Dict("R-TGU-71397"=>Set(["PDH"]), "R-XTR-70449"=>Set([…
  "bigg.reaction"     => Dict("ACALD"=>Set(["ACALD"]), "PTAr"=>Set(["PTAr"]), "…

In [10]:
rxn_annotations["ec-code"]

Dict{String, Set{String}} with 141 entries:
  "3.6.3.37" => Set(["ATPM"])
  "3.6.3.42" => Set(["ATPM"])
  "3.6.3.38" => Set(["ATPM"])
  "3.6.3.19" => Set(["ATPM"])
  "2.3.3.1"  => Set(["CS"])
  "1.6.1.2"  => Set(["NADTRHD"])
  "3.6.3.35" => Set(["ATPM"])
  "6.2.1.5"  => Set(["SUCOAS"])
  "6.3.5.4"  => Set(["GLUN"])
  "3.6.3.49" => Set(["ATPM"])
  "3.6.3.51" => Set(["ATPM"])
  "1.2.1.12" => Set(["GAPD"])
  "3.6.3.32" => Set(["ATPM"])
  "2.3.3.3"  => Set(["CS"])
  "2.7.4.3"  => Set(["ADK1"])
  "6.3.5.5"  => Set(["GLUN"])
  "3.5.1.2"  => Set(["GLUN"])
  "1.1.1.49" => Set(["G6PDH2r"])
  "5.3.1.9"  => Set(["PGI"])
  ⋮          => ⋮

The `annotation_index` function can also be used on `Metabolite`s and
`Gene`s in the same way.
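
The idea behind such an index can be sketched with plain Julia dictionaries. The function `toy_annotation_index` and the ids below are hypothetical and only mimic the shape of the output shown above; this is not the `COBREXA` implementation.

```julia
# Hypothetical toy data: item id => annotation kind => annotation values.
toy_annotations = Dict(
    "rxn_a" => Dict("ec-code" => ["1.1.1.1"], "kegg.reaction" => ["R00001"]),
    "rxn_b" => Dict("ec-code" => ["1.1.1.1"]),
    "rxn_c" => Dict("ec-code" => ["2.2.2.2"]),
)

# Invert the mapping: annotation kind => annotation value => set of item ids.
function toy_annotation_index(items)
    index = Dict{String,Dict{String,Set{String}}}()
    for (id, annotations) in items
        for (kind, vals) in annotations
            by_value = get!(index, kind, Dict{String,Set{String}}())
            for val in vals
                push!(get!(by_value, val, Set{String}()), id)
            end
        end
    end
    return index
end

toy_index = toy_annotation_index(toy_annotations)

# Annotation values shared by more than one id hint at possible duplicates.
shared = filter(p -> length(p.second) > 1, toy_index["ec-code"])
```

The same `filter` call should work on a dictionary of the type shown for `rxn_annotations["ec-code"]`, since it only relies on the values being sets of ids.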

## Checking the internals of `StandardModel`s: `check_duplicate_reaction`

Another useful function is `check_duplicate_reaction`, which checks for
reactions that have duplicate (or similar) reaction equations.
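
Conceptually, such a check compares the metabolites (and optionally the stoichiometric coefficients) of a candidate reaction against every reaction already present. The sketch below shows one plausible way to do this on toy data; `toy_find_duplicate` and its matching rule (comparing coefficient magnitudes) are assumptions made for illustration, not a description of `COBREXA`'s internals.

```julia
# Hypothetical toy reactions: reaction id => (metabolite id => coefficient).
toy_model_reactions = [
    "PGM" => Dict("2pg_c" => -1.0, "3pg_c" => 1.0),
    "ENO" => Dict("2pg_c" => -1.0, "pep_c" => 1.0, "h2o_c" => 1.0),
]

# Return the id of the first reaction matching `candidate`, or `nothing`.
function toy_find_duplicate(candidate, reactions; only_metabolites)
    for (id, stoich) in reactions
        # The metabolite sets must agree in either mode.
        issetequal(keys(candidate), keys(stoich)) || continue
        only_metabolites && return id
        # Assumption: compare magnitudes so a reversed equation also matches.
        all(abs(candidate[m]) == abs(c) for (m, c) in stoich) && return id
    end
    return nothing
end

exact = Dict("3pg_c" => 1.0, "2pg_c" => -1.0)
scaled = Dict("3pg_c" => 2.0, "2pg_c" => -1.0)

toy_find_duplicate(exact, toy_model_reactions; only_metabolites = false)  # "PGM"
toy_find_duplicate(scaled, toy_model_reactions; only_metabolites = false) # nothing
toy_find_duplicate(scaled, toy_model_reactions; only_metabolites = true)  # "PGM"
```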

In [11]:
pgm_duplicate = Reaction()
pgm_duplicate.id = "pgm2" # Phosphoglycerate mutase
pgm_duplicate.metabolites = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1)
pgm_duplicate

Reaction.id: pgm2
Reaction.name: ---
Reaction.metabolites: 1.0 2pg_c ↔  1.0 3pg_c
Reaction.lb: -1000.0
Reaction.ub: 1000.0
Reaction.grr: ---
Reaction.subsystem: ---
Reaction.notes: ---
Reaction.annotations: ---
Reaction.objective_coefficient: 0.0


In [12]:
check_duplicate_reaction(pgm_duplicate, model.reactions; only_metabolites = false) # with only_metabolites = true, reactions with the same metabolites but different stoichiometry also match

"PGM"

## Checking the internals of `StandardModel`s: `reaction_mass_balanced`

Finally, `reaction_mass_balanced` can be used to check if a reaction is mass
balanced, based on the formulas of the metabolites in the reaction equation.

In [13]:
rxn_dict = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1, "h2o_c" => 1)
reaction_mass_balanced(model, rxn_dict)

false

To determine which atoms are unbalanced, you can use `reaction_atom_balance`:

In [14]:
reaction_atom_balance(model, rxn_dict)

Dict{String, Float64} with 4 entries:
  "C" => 0.0
  "P" => 0.0
  "H" => 2.0
  "O" => 1.0

Note that since the reaction described by `rxn_dict` is not in the model, we cannot
use the other variants of these functions, because they look up the reaction
equation stored inside the `model`.
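
To make the balance computation concrete, here is a minimal standalone sketch of the arithmetic. The helpers `toy_atom_counts` and `toy_atom_balance` are hypothetical, the formulas are assumed toy values, and the parser only handles simple formulas without parentheses or charges; it is not the `COBREXA` implementation.

```julia
# Count atoms in a simple formula such as "C3H4O7P".
function toy_atom_counts(formula::AbstractString)
    counts = Dict{String,Float64}()
    for m in eachmatch(r"([A-Z][a-z]*)(\d*)", formula)
        atom = String(m.captures[1])
        # A missing count means a single atom, e.g. the "O" in "H2O".
        n = isempty(m.captures[2]) ? 1.0 : parse(Float64, m.captures[2])
        counts[atom] = get(counts, atom, 0.0) + n
    end
    return counts
end

# Sum coefficient * atom count over all metabolites in the reaction.
function toy_atom_balance(formulas, stoichiometry)
    balance = Dict{String,Float64}()
    for (met, coeff) in stoichiometry
        for (atom, n) in toy_atom_counts(formulas[met])
            balance[atom] = get(balance, atom, 0.0) + coeff * n
        end
    end
    return balance
end

# Assumed formulas for illustration only.
toy_formulas = Dict("3pg_c" => "C3H4O7P", "2pg_c" => "C3H4O7P", "h2o_c" => "H2O")
toy_rxn = Dict("3pg_c" => 1.0, "2pg_c" => -1.0, "h2o_c" => 1.0)

toy_balance = toy_atom_balance(toy_formulas, toy_rxn)
toy_is_balanced = all(iszero, values(toy_balance))
```

A reaction is mass balanced exactly when every entry of the balance is zero; here the extra water leaves a surplus of hydrogen and oxygen.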

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*