Merge 342bfcc into 5ca0648

alanderos91 · Nov 12, 2018 · bd4e6cb · bd4e6cb
2 parents 5ca0648 + 342bfcc
commit bd4e6cb
Show file tree

Hide file tree

Showing 91 changed files with 13,130 additions and 5 deletions.
diff --git a/benchmarks/.DS_Store b/benchmarks/.DS_Store
diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -0,0 +1,165 @@
+# Benchmarks
+
+This folder contains the benchmark code used in **BioSimulator.jl: Stochastic simulation in Julia**.
+It compares BioSimulator.jl, [StochPy](http://stochpy.sourceforge.net/), [StochKit2](https://sourceforge.net/projects/stochkit/), and [Gillespie.jl](https://github.com/sdwfrost/Gillespie.jl) across a few examples. These benchmarks are not exhaustive.
+
+# What are the benchmarks?
+
+These scripts measure the performance of Gillespie's Direct method (SSA) across each software tool.
+In addition, the BioSimulator.jl benchmarks include additional results for the algorithms currently implemented (SSA, FRM, NRM, OTL, and SAL) using full simulation output, fixed-interval output, and multi-threading.
+There are five basic models included in this suite (see [model definitions](#Model-definitons)).
+
+The *task* that is benchmarked is
+
+```
+run-ssa model t_final n_saves n_trial
+```
+
+where `run-ssa` is the command that runs SSA for a given software package.
+Here `t_final` specifies a time span for an individual realization of the stochastic process, `n_saves` is the number of time points saved for a single realization, and `n_trial` is the number of realizations to simulate.
+The time to complete this task is sampled several times to obtain reasonable estimates of summary statistics.
+See the section on [benchmark parameters](#Benchmark-parameters) for additional details.
+
+**The benchmark task we have defined reflects how quickly a given tool generates multiple time trajectories of a given stochastic process using SSA.**
+"Good performance" depends on how well the following interact:
+
+- internal model representation/construction
+- algorithm implementation
+- simulation engine(s)
+- individual output generation/storage
+- ensemble output generation/storage
+
+With the exception of `StochPy`, performance is measured using [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl).
+
+# Minimal requirements
+
+| Software        | Version | Dependencies               |
+|-----------------|---------|----------------------------|
+| BioSimulator.jl | 0.4     | Julia v0.6.*               |
+| StochPy         | 2.3     | Python 2.6+ or 3.4+, NumPy |
+| StochKit2       | 2.0.13  |                            |
+| Gillespie.jl    | 0.0.2   |                            |
+
+**Note**: The StochPy scripts use `python3` by default.
+
+# Running the benchmarks
+
+## Setup
+
+The main script is contained in `examples.sh`.
+It requires `julia`, `python3`, and `ssa` (for StochKit2) to be visible in your `PATH`.
+
+## Individual scripts
+
+BioSimulator.jl, StochPy, and Gillespie.jl each have an associated script for running a benchmark from the command line.
+Running BioSimulator.jl with the multithreading option requires setting the `JULIA_NUM_THREADS` environment variable (see `examples.sh` for an example).
+It uses the following scripts to run the suite:
+
+#### BioSimulator.jl
+
+Usage: `julia benchmarks_biosimulator.jl [model1] [model2] ...`
+
+#### StochPy
+
+Usage: `python3 stochpy_bench.py [model] [t_final] [n_saves] [seed] [n_trial]`
+
+#### StochKit
+
+Usage: `julia benchmarks_stochkit.jl [model1] [model2] ...`
+
+#### Gilespie.jl
+
+Usage: `julia benchmarks_gillespie.jl [model1] [model2] ...`
+
+## Model definitions
+
+Directories:
+- `autoreg`
+- `dimer-decay`
+- `kendall`
+- `mmek`
+- `yeast`
+
+Files (within each model directory):
+
+- `biosimulator.jl`
+- `gillespie.jl`
+- `stochkit2.xml`
+- `stochpy.psc`
+
+## Benchmark parameters
+
+With the exception of the StochPy script, all the benchmark scripts source benchmark parameters from `benchmark_parameters.jl`.
+This file simply stores the arguments passed to each tool's simulation command.
+Parameters are stored in a `ParamSet` which is a 5-tuple with the following structure:
+
+- `t_final`: Each simulation command realizes the simulated stochastic process up to this value.
+- `n_saves`: The number of time points to save for each realization. Used only for fixed-interval output.
+- `n_trial`: The number of realizations to simulate.
+- `n_sample`: The number of times to sample a simulation *task*, as defined above ([What are the benchmarks?](What-are-the-benchmarks?))
+- `t_limit`: A time limit for individual benchmarks. A benchmark task will terminate if this limit is exceeded, even if the number of samples is less than `n_sample`.
+
+In addition, a `SEED` value is used to seed each benchmark task.
+The `MODELS` variable stores the name of each model included in this benchmark suite.
+The `parameters` dictionary is used to associate `MODELS[i]` to a `ParamSet`.
+
+## Results
+
+Files:
+
+- `biosimulator-serial.json`
+- `biosimulator-parallel.json`
+- `stochkit.json`
+- `gillespie.json`
+- `stochpy/*.txt`
+
+The BenchmarkTools.jl output is saved in JSON files.
+These files, along with the output from StochPy, are summarized using
+
+```
+# for JSON output
+julia analysis.jl <filename>.json >> <another-filename>.txt
+
+# for StochPy output
+julia analysis.jl ./stochpy/<model>.txt >> summary-stochpy.txt
+```
+The summary files are structured as follows:
+
+```
+----- <model-name-here> -----
+trials: <value>       # no. times the task was measured
+  mean: <estimate> ms # mean based on <value>
+   std: <estimate> ms # standard deviation based on <value>
+median: <estimate> ms # median based on <value>
+    Q1: <estimate> ms # first quartile based on <value>
+    Q3: <estimate> ms # third quartile based on <value>
+```
+
+# Important caveats for consideration
+
+The tools selected here vary in output saving strategies, algorithm selection, and model representation.
+These notes should inform the interpretation of the benchmark results.
+Bold type is used to emphasize a point that is likely to have a large impact on performance.
+Please feel free to file an issue if an important point is missing. 
+
+### BioSimulator.jl
+
+- Uses fixed-interval output.
+- Generates a dependency graph even if it is not used.
+
+### StochPy
+
+- Saves state after each simulation step. Fixed-interval output is available but it uses StochKit2 internally.
+- Saves additional information about reaction propensities.
+
+### StochKit2
+
+- Uses fixed-interval output.
+- Uses parallelism by default.
+- Selects a variant of SSA based on model information. In our benchmarks, `odm_ssa_small` is used.
+- **The `ssa` command is invoked from within Julia.** The benchmark results use a small adjustment to account for any overhead, which varies depending on the machine.
+
+### Gillespie.jl
+
+- Saves state after each simulation step.
+- Does not generate trajectories in an ensemble like the other tools. **We implement a simple wrapper that uses a `for` loop to mimic this feature.** The output of each run is effectively ignored.
diff --git a/benchmarks/analysis.jl b/benchmarks/analysis.jl
@@ -0,0 +1,64 @@
+using BenchmarkTools, StatsBase
+
+##### function definitions #####
+
+# for BenchmarkTools.jl results
+function summarize(::Val{true}, fname)
+  # retrieve the Trial objects for the test suite
+  results = BenchmarkTools.load(fname)[1]
+
+  for (b_info, trial) in leaves(results)
+    # convert results from ns to ms
+    times = [ t * 1e-6 for t in trial.times ]
+
+    b_mean   = mean(times)
+    b_std    = std(times)
+    b_median = median(times)
+    b_q1     = quantile(times, 0.25)
+    b_q3     = quantile(times, 0.75)
+
+    # benchmark info string
+    str = [ "$(s) " for s in b_info ]
+
+    println("----- $(str...)-----")
+    println("trials: $(length(times))")
+    println("  mean: $(b_mean) ms")
+    println("   std: $(b_std) ms")
+    println("median: $(b_median) ms")
+    println("    Q1: $(b_q1) ms")
+    println("    Q3: $(b_q3) ms")
+    println()
+  end
+end
+
+function summarize(::Val{false}, fname)
+  # retrieve data from Python benchmarks
+  results = map(x -> parse(Float64, x), readlines(fname))
+
+  # convert results from s to ms
+  times = [ t * 1e3 for t in results ]
+
+  b_mean   = mean(times)
+  b_std    = std(times)
+  b_median = median(times)
+  b_q1     = quantile(times, 0.25)
+  b_q3     = quantile(times, 0.75)
+
+  println("----- $(fname) ------")
+  println("trials: $(length(results))")
+  println("  mean: $(b_mean) ms")
+  println("   std: $(b_std) ms")
+  println("median: $(b_median) ms")
+  println("    Q1: $(b_q1) ms")
+  println("    Q3: $(b_q3) ms")
+  println()
+end
+
+##### command-line tool #####
+
+fname = ARGS[1]
+ftype = split(fname, ".")[end]
+
+is_json = ftype == "json"
+
+summarize(Val(is_json), fname)
diff --git a/benchmarks/autoreg/biosimulator.jl b/benchmarks/autoreg/biosimulator.jl
@@ -0,0 +1,34 @@
+"""
+  Prokaryotic auto-regulation network
+"""
+function autoreg(;
+  gene = 10, P2_gene = 0, RNA = 0, P = 0, P2 = 0,
+  k1  :: Float64 = 0.01,
+  k2  :: Float64 = 10.0,
+  k3  :: Float64 = 1.0,
+  k3r :: Float64 = 1.0,
+  k4  :: Float64 = 1.0,
+  k4r :: Float64 = 10.0,
+  k5  :: Float64 = 0.1,
+  k6  :: Float64 = 0.01)
+  m = Network("auto-regulation")
+
+  m <= Species("gene",    gene)
+  m <= Species("P2_gene", P2_gene)
+  m <= Species("RNA",     RNA)
+  m <= Species("P",       P)
+  m <= Species("P2",      P2)
+
+  m <= Reaction("transcription",              k1,  "gene --> gene + RNA")
+  m <= Reaction("translation",                k2,  "RNA --> RNA + P")
+  m <= Reaction("dimerization",               k3,  "P + P --> P2")
+  m <= Reaction("dissociation",               k3r, "P2 --> P + P")
+  m <= Reaction("repression binding",         k4,  "gene + P2 --> P2_gene")
+  m <= Reaction("reverse repression binding", k4r, "P2_gene --> gene + P2")
+  m <= Reaction("RNA degradation",            k5,  "RNA --> 0")
+  m <= Reaction("protein degradation",        k6,  "P --> 0")
+
+  return m
+end
+
+model = autoreg()
diff --git a/benchmarks/autoreg/gillespie.jl b/benchmarks/autoreg/gillespie.jl
@@ -0,0 +1,39 @@
+function F_autoreg(x, parms)
+  gene, P2_gene, RNA, P, P2 = x
+  k1, k1r, k2, k3, k4, k4r, k5, k6 = parms
+
+  repression_binding = k1 * gene * P2
+  reverse_repression = k1r * P2_gene
+  transcription = k2 * gene
+  translation = k3 * RNA
+  dimerization = k4 * P * (P - 1) / 2
+  dissociation = k4r * P2
+  RNA_degradation = k5 * RNA
+  protein_degradation = k6 * P
+
+  return [repression_binding, reverse_repression, transcription, translation, dimerization, dissociation, RNA_degradation, protein_degradation]
+end
+
+function gillespie_autoreg(;
+  gene = 10, P2_gene = 0, RNA = 0, P = 0, P2 = 0,
+  k1 = 1.0, k1r = 10.0, k2 = 0.01, k3 = 10.0, k4 = 1.0, k4r = 1.0, k5 = 0.1, k6 = 0.01)
+  # initial conditions
+  x0 = [gene, P2_gene, RNA, P, P2]
+  # stoichiometries
+  nu = [
+    [-1  1  0  0 -1]; # gene + P2 --> P2_gene
+    [ 1 -1  0  0  1]; # P2_gene --> gene + P2
+    [ 0  0  1  0  0]; # gene --> gene + RNA
+    [ 0  0  0  1  0]; # RNA --> RNA + P
+    [ 0  0  0 -2  1]; # P + P --> P2
+    [ 0  0  0  2 -1]; # P2 --> P + P
+    [ 0  0 -1  0  0]; # RNA --> 0
+    [ 0  0  0 -1  0]  # P --> 0
+  ]
+  # parameter vector
+  parms = [k1, k1r, k2, k3, k4, k4r, k5, k6]
+  return x0, nu, parms
+end
+
+x0, nu, parms = gillespie_autoreg()
+F = F_autoreg