Skip to content

Commit

Permalink
Merge 342bfcc into 5ca0648
Browse files Browse the repository at this point in the history
  • Loading branch information
alanderos91 committed Nov 12, 2018
2 parents 5ca0648 + 342bfcc commit bd4e6cb
Show file tree
Hide file tree
Showing 91 changed files with 13,130 additions and 5 deletions.
Binary file added benchmarks/.DS_Store
Binary file not shown.
165 changes: 165 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,165 @@
# Benchmarks

This folder contains the benchmark code used in **BioSimulator.jl: Stochastic simulation in Julia**.
It compares BioSimulator.jl, [StochPy](http://stochpy.sourceforge.net/), [StochKit2](https://sourceforge.net/projects/stochkit/), and [Gillespie.jl](https://github.com/sdwfrost/Gillespie.jl) across a few examples. These benchmarks are not exhaustive.

# What are the benchmarks?

These scripts measure the performance of Gillespie's Direct method (SSA) across each software tool.
In addition, the BioSimulator.jl benchmarks include additional results for the algorithms currently implemented (SSA, FRM, NRM, OTL, and SAL) using full simulation output, fixed-interval output, and multi-threading.
There are five basic models included in this suite (see [model definitions](#Model-definitons)).

The *task* that is benchmarked is

```
run-ssa model t_final n_saves n_trial
```

where `run-ssa` is the command that runs SSA for a given software package.
Here `t_final` specifies a time span for an individual realization of the stochastic process, `n_saves` is the number of time points saved for a single realization, and `n_trial` is the number of realizations to simulate.
The time to complete this task is sampled several times to obtain reasonable estimates of summary statistics.
See the section on [benchmark parameters](#Benchmark-parameters) for additional details.

**The benchmark task we have defined reflects how quickly a given tool generates multiple time trajectories of a given stochastic process using SSA.**
"Good performance" depends on how well the following interact:

- internal model representation/construction
- algorithm implementation
- simulation engine(s)
- individual output generation/storage
- ensemble output generation/storage

With the exception of `StochPy`, performance is measured using [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl).

# Minimal requirements

| Software | Version | Dependencies |
|-----------------|---------|----------------------------|
| BioSimulator.jl | 0.4 | Julia v0.6.* |
| StochPy | 2.3 | Python 2.6+ or 3.4+, NumPy |
| StochKit2 | 2.0.13 | |
| Gillespie.jl | 0.0.2 | |

**Note**: The StochPy scripts use `python3` by default.

# Running the benchmarks

## Setup

The main script is contained in `examples.sh`.
It requires `julia`, `python3`, and `ssa` (for StochKit2) to be visible in your `PATH`.

## Individual scripts

BioSimulator.jl, StochPy, and Gillespie.jl each have an associated script for running a benchmark from the command line.
Running BioSimulator.jl with the multithreading option requires setting the `JULIA_NUM_THREADS` environment variable (see `examples.sh` for an example).
It uses the following scripts to run the suite:

#### BioSimulator.jl

Usage: `julia benchmarks_biosimulator.jl [model1] [model2] ...`

#### StochPy

Usage: `python3 stochpy_bench.py [model] [t_final] [n_saves] [seed] [n_trial]`

#### StochKit

Usage: `julia benchmarks_stochkit.jl [model1] [model2] ...`

#### Gilespie.jl

Usage: `julia benchmarks_gillespie.jl [model1] [model2] ...`

## Model definitions

Directories:
- `autoreg`
- `dimer-decay`
- `kendall`
- `mmek`
- `yeast`

Files (within each model directory):

- `biosimulator.jl`
- `gillespie.jl`
- `stochkit2.xml`
- `stochpy.psc`

## Benchmark parameters

With the exception of the StochPy script, all the benchmark scripts source benchmark parameters from `benchmark_parameters.jl`.
This file simply stores the arguments passed to each tool's simulation command.
Parameters are stored in a `ParamSet` which is a 5-tuple with the following structure:

- `t_final`: Each simulation command realizes the simulated stochastic process up to this value.
- `n_saves`: The number of time points to save for each realization. Used only for fixed-interval output.
- `n_trial`: The number of realizations to simulate.
- `n_sample`: The number of times to sample a simulation *task*, as defined above ([What are the benchmarks?](What-are-the-benchmarks?))
- `t_limit`: A time limit for individual benchmarks. A benchmark task will terminate if this limit is exceeded, even if the number of samples is less than `n_sample`.

In addition, a `SEED` value is used to seed each benchmark task.
The `MODELS` variable stores the name of each model included in this benchmark suite.
The `parameters` dictionary is used to associate `MODELS[i]` to a `ParamSet`.

## Results

Files:

- `biosimulator-serial.json`
- `biosimulator-parallel.json`
- `stochkit.json`
- `gillespie.json`
- `stochpy/*.txt`

The BenchmarkTools.jl output is saved in JSON files.
These files, along with the output from StochPy, are summarized using

```
# for JSON output
julia analysis.jl <filename>.json >> <another-filename>.txt
# for StochPy output
julia analysis.jl ./stochpy/<model>.txt >> summary-stochpy.txt
```
The summary files are structured as follows:

```
----- <model-name-here> -----
trials: <value> # no. times the task was measured
mean: <estimate> ms # mean based on <value>
std: <estimate> ms # standard deviation based on <value>
median: <estimate> ms # median based on <value>
Q1: <estimate> ms # first quartile based on <value>
Q3: <estimate> ms # third quartile based on <value>
```

# Important caveats for consideration

The tools selected here vary in output saving strategies, algorithm selection, and model representation.
These notes should inform the interpretation of the benchmark results.
Bold type is used to emphasize a point that is likely to have a large impact on performance.
Please feel free to file an issue if an important point is missing.

### BioSimulator.jl

- Uses fixed-interval output.
- Generates a dependency graph even if it is not used.

### StochPy

- Saves state after each simulation step. Fixed-interval output is available but it uses StochKit2 internally.
- Saves additional information about reaction propensities.

### StochKit2

- Uses fixed-interval output.
- Uses parallelism by default.
- Selects a variant of SSA based on model information. In our benchmarks, `odm_ssa_small` is used.
- **The `ssa` command is invoked from within Julia.** The benchmark results use a small adjustment to account for any overhead, which varies depending on the machine.

### Gillespie.jl

- Saves state after each simulation step.
- Does not generate trajectories in an ensemble like the other tools. **We implement a simple wrapper that uses a `for` loop to mimic this feature.** The output of each run is effectively ignored.
64 changes: 64 additions & 0 deletions benchmarks/analysis.jl
@@ -0,0 +1,64 @@
using BenchmarkTools, StatsBase

##### function definitions #####

# for BenchmarkTools.jl results
function summarize(::Val{true}, fname)
# retrieve the Trial objects for the test suite
results = BenchmarkTools.load(fname)[1]

for (b_info, trial) in leaves(results)
# convert results from ns to ms
times = [ t * 1e-6 for t in trial.times ]

b_mean = mean(times)
b_std = std(times)
b_median = median(times)
b_q1 = quantile(times, 0.25)
b_q3 = quantile(times, 0.75)

# benchmark info string
str = [ "$(s) " for s in b_info ]

println("----- $(str...)-----")
println("trials: $(length(times))")
println(" mean: $(b_mean) ms")
println(" std: $(b_std) ms")
println("median: $(b_median) ms")
println(" Q1: $(b_q1) ms")
println(" Q3: $(b_q3) ms")
println()
end
end

function summarize(::Val{false}, fname)
# retrieve data from Python benchmarks
results = map(x -> parse(Float64, x), readlines(fname))

# convert results from s to ms
times = [ t * 1e3 for t in results ]

b_mean = mean(times)
b_std = std(times)
b_median = median(times)
b_q1 = quantile(times, 0.25)
b_q3 = quantile(times, 0.75)

println("----- $(fname) ------")
println("trials: $(length(results))")
println(" mean: $(b_mean) ms")
println(" std: $(b_std) ms")
println("median: $(b_median) ms")
println(" Q1: $(b_q1) ms")
println(" Q3: $(b_q3) ms")
println()
end

##### command-line tool #####

fname = ARGS[1]
ftype = split(fname, ".")[end]

is_json = ftype == "json"

summarize(Val(is_json), fname)
34 changes: 34 additions & 0 deletions benchmarks/autoreg/biosimulator.jl
@@ -0,0 +1,34 @@
"""
Prokaryotic auto-regulation network
"""
function autoreg(;
gene = 10, P2_gene = 0, RNA = 0, P = 0, P2 = 0,
k1 :: Float64 = 0.01,
k2 :: Float64 = 10.0,
k3 :: Float64 = 1.0,
k3r :: Float64 = 1.0,
k4 :: Float64 = 1.0,
k4r :: Float64 = 10.0,
k5 :: Float64 = 0.1,
k6 :: Float64 = 0.01)
m = Network("auto-regulation")

m <= Species("gene", gene)
m <= Species("P2_gene", P2_gene)
m <= Species("RNA", RNA)
m <= Species("P", P)
m <= Species("P2", P2)

m <= Reaction("transcription", k1, "gene --> gene + RNA")
m <= Reaction("translation", k2, "RNA --> RNA + P")
m <= Reaction("dimerization", k3, "P + P --> P2")
m <= Reaction("dissociation", k3r, "P2 --> P + P")
m <= Reaction("repression binding", k4, "gene + P2 --> P2_gene")
m <= Reaction("reverse repression binding", k4r, "P2_gene --> gene + P2")
m <= Reaction("RNA degradation", k5, "RNA --> 0")
m <= Reaction("protein degradation", k6, "P --> 0")

return m
end

model = autoreg()
39 changes: 39 additions & 0 deletions benchmarks/autoreg/gillespie.jl
@@ -0,0 +1,39 @@
function F_autoreg(x, parms)
gene, P2_gene, RNA, P, P2 = x
k1, k1r, k2, k3, k4, k4r, k5, k6 = parms

repression_binding = k1 * gene * P2
reverse_repression = k1r * P2_gene
transcription = k2 * gene
translation = k3 * RNA
dimerization = k4 * P * (P - 1) / 2
dissociation = k4r * P2
RNA_degradation = k5 * RNA
protein_degradation = k6 * P

return [repression_binding, reverse_repression, transcription, translation, dimerization, dissociation, RNA_degradation, protein_degradation]
end

function gillespie_autoreg(;
gene = 10, P2_gene = 0, RNA = 0, P = 0, P2 = 0,
k1 = 1.0, k1r = 10.0, k2 = 0.01, k3 = 10.0, k4 = 1.0, k4r = 1.0, k5 = 0.1, k6 = 0.01)
# initial conditions
x0 = [gene, P2_gene, RNA, P, P2]
# stoichiometries
nu = [
[-1 1 0 0 -1]; # gene + P2 --> P2_gene
[ 1 -1 0 0 1]; # P2_gene --> gene + P2
[ 0 0 1 0 0]; # gene --> gene + RNA
[ 0 0 0 1 0]; # RNA --> RNA + P
[ 0 0 0 -2 1]; # P + P --> P2
[ 0 0 0 2 -1]; # P2 --> P + P
[ 0 0 -1 0 0]; # RNA --> 0
[ 0 0 0 -1 0] # P --> 0
]
# parameter vector
parms = [k1, k1r, k2, k3, k4, k4r, k5, k6]
return x0, nu, parms
end

x0, nu, parms = gillespie_autoreg()
F = F_autoreg

0 comments on commit bd4e6cb

Please sign in to comment.