Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
bgroenks96 committed Jun 5, 2024
1 parent 87f9afa commit fcd1379
Showing 1 changed file with 109 additions and 14 deletions.
123 changes: 109 additions & 14 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
[docs-dev-img]: https://img.shields.io/badge/docs-latest-blue.svg
[docs-dev-url]: https://bgroenks96.github.io/SimulationBasedInference.jl/dev/

`SimulationBasedInference.jl` aims to bring together a variety of different methods for *simulation-based inference*, i.e. statistical inference with simulator-like models, in the Julia programming language.
`SimulationBasedInference.jl` aims to bring together a variety of different methods for *simulation-based inference* (SBI), i.e. statistical inference with simulator-like models, in the Julia programming language. Although SBI can be applied to both Bayesian and Frequentist inference frameworks, this pckage focuses on the Bayesian approach.

Please note that this package is still very much under construction and things may break or change without prior notice.

Expand All @@ -14,7 +14,7 @@ If you would like to use this package in your work, please let us know by creati
## Introduction
Simulator-type models are ubiquitous in science and engineering.

Most (all?) models require some kind of input, e.g boundary conditions (forcings), physical properties or constants, etc.
Most simulator-type models require some kind of input, e.g boundary conditions (forcings), physical properties or constants, etc.

Often, these parameters are not fully known *a priori*... but usually we know something!

Expand All @@ -26,9 +26,7 @@ $$

The **posterior distribution** $p(\boldsymbol{\boldsymbol{\theta}} | \mathbf{y})$ represents our best estimate (with uncertainty) of the unknown parameters $\boldsymbol{\theta}$ after observing $\mathbf{y}$.

## Simulation-based inference

Simulation-based inference (SBI) [1] refers to the problem of performing **statistical inference** (Bayesian or otherwise) of unknown parameters $\boldsymbol{\theta}$ where the forward model $\mathcal{M}$:
**Simulation-based inference** (SBI) [1] refers to the problem of performing **statistical inference** (Bayesian or otherwise) of unknown parameters $\boldsymbol{\theta}$ where the forward model $\mathcal{M}$:

$$
y = \mathcal{M}(\boldsymbol{\theta}) + \epsilon
Expand All @@ -37,24 +35,121 @@ $$
is a dynamical model or physics-based *simulator* mapping from parameters to noisy ($\epsilon$) observations.

There are two fundamental challenges with this problem:
1. The forward model $\mathcal{M}$ is very often **nonlinear** and, in the case of dynamical models, **intractable** (i.e. we cannot write down the solution in analytical form).
1. The forward model $\mathcal{M}$ is very often **nonlinear** and typically has no closed-form solution.
2. Evaluating the forward map $\mathcal{M}(\boldsymbol{\theta})$ is usually non-trivial, i.e. **computationally expensive** or at least inconvenient.

Thus, classical statistical methods that rely on either analytical or numerical methods to derive the posterior distribution are generally difficult (or impossible) to apply.

### Methods

Although the term "simulation-based inference" is relatively new, the basic problem of statistical inference and uncertainty quantification with dynamical "simulator" models is not. In the field of geoscientific modeling in general and weather forecasting in particular, the problem is often referred to as *data assimilation* [2].

The following is a list of methods which are planned to be included in this package:
## Getting started

`SimulationBasedInference` extends the basic [SciML interface](https://docs.sciml.ai/SciMLBase/stable/interfaces/Problems/) to SBI problems.

The first basic building block is `SimulatorForwardProblem`, which wraps either an arbitrary function `f(x)`, where `x` are the input parameters, or another `SciMLProblem`.

The purpose of the `SimulatorForwardProblem` abstraction is to provide a general interface for defining observables and interacting with the simulator.

To illustrate this, we start with the [linear ODE example](examples/linearode/linearode.jl). We first define an `ODEProblem` describing the dynamical model:

```julia
ode_func(u,p,t) = -p[1]*u;
α_true = 0.2
ode_p = [α_true];
tspan = (0.0,10.0);
odeprob = ODEProblem(ode_func, [1.0], tspan, ode_p)
```
We can then define *observables* of the system which should be sampled over the course of the simulation. In this case, we will define a trivial observable that just returns the state of the ODE integrator:
```julia
tsave = tspan[1]+0.1:0.2:tspan[end];
n_obs = length(tsave);
observable = SimulatorObservable(:y, integrator -> integrator.u, tspan[1], tsave, size(odeprob.u0), samplerate=0.01);
```
We are now ready to construct a forward problem from `odeprob` and `observable`:
```julia
forward_prob = SimulatorForwardProblem(odeprob, observable)
```
which can then be solved normally using the `solve` or `init` interface from SciML:
```julia
forward_sol = solve(forward_prob, ode_solver);
y_pred = get_observable(forward_sol, :y)
```
```
╭────────────────────────────────╮
│ 50-element DimArray{Float64,1} │
├────────────────────────────────┴────────────────────────── dims ┐
↓ Ti Sampled{Float64} 0.1:0.2:9.9 ForwardOrdered Regular Points
└─────────────────────────────────────────────────────────────────┘
0.1 0.991057
0.3 0.961815
0.5 0.924101
0.7 0.887867
9.5 0.152753
9.7 0.146763
9.9 0.141009
```

For the purposes of demonstration, we construct artificial noisy observations from this forward solution:
```julia
true_obs = get_observable(forward_sol, :y)
noisy_obs = true_obs .+ 0.05*randn(rng, n_obs);
```

The next step is defining a `SimulatorInferenceProblem`. We first define priors using [Distributions.jl](https://juliastats.org/Distributions.jl/stable/):
```julia
model_prior = prior=Beta(2,2));
noise_scale_prior = prior=Exponential(0.1));
```

then we assign a likelihood:
```julia
lik = IsotropicGaussianLikelihood(observable, noisy_obs, noise_scale_prior);
```

and finally construct the inference problem:
```julia
inference_prob = SimulatorInferenceProblem(forward_prob, ode_solver, model_prior, lik);
```

We can solve the inference problem with one of the simplest methods, *ensemble importance sampling*:
```julia
const rng = Random.MersenneTwister(1234);
enis_sol = solve(inference_prob, EnIS(), ensemble_size=128, rng=rng);
# note that for EnIS, the primary output of inference is the importance weights, and the ensemble reflects only draws from the prior.
importance_weights = get_weights(enis_sol);
prior_ens = get_transformed_ensemble(enis_sol);
```

Note that all ensemble algorithms default to using the `EnsembleThreads` parallelization strategy. We could also use `EnsembleDistributed` or `EnsembleSerial` like so:
```julia
enis_sol = solve(inference_prob, EnIS(), EnsembleDistributed(), ensemble_size=128, rng=rng);
```

To apply a different inference algorithm, such as [EKS](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/ensemble_kalman_sampler/), we need only change the inference solver!
```julia
eks_sol = solve(inference_prob, EKS(), ensemble_size=128, rng=rng);
```

If the `PythonCall` package is installed, we can also use one of the sequential neural density estimator methods from [sbi](https://sbi-dev.github.io/sbi/):
```julia
# this defaults to the SNPE-C method, but this can be changed in the PySNE arguments.
snpe_sol = solve(inference_prob, PySNE(), num_simulations=1000, rng=rng);

# we can also re-use simulations from before instead of generating new ones:
simdata = SimulationBasedInference.sample_ensemble_predictive(enis_sol);
snpe_sol = solve(inference_prob, PySNE(), simdata, rng=rng);
```

## Roadmap

The following is a list of inference methods and/or probablisitc programming frameworks which are planned to be integrated into this package:

#### Ensemble/particle algorithms

- [x] Importance sampling, a.k.a particle batch smoothing [3] (PBS) and generalized likelihood uncertainty estimation [4] (GLUE)
- [ ] Particle Filtering/Sequential Monte Carlo [5,6] (PF/SMC)
- [ ] Particle Filtering and Sequential Monte Carlo [5,6] (PF/SMC)
- [x] Ensemble Kalman sampling and inversion (EKS/EKI) via [EnsembleKalmanProcesses.jl](https://github.com/CliMA/EnsembleKalmanProcesses.jl) [7]
- [x] Ensemble smoother with multiple data assimilation [8] (ES-MDA)
- [ ] Adaptive multiple importance sampling [9] (AMIS)
- [ ] Adaptive multiple importance sampling [9] (AMIS)
- [ ] Particle flow filters [10] (PFF)

#### Density estimation
Expand All @@ -69,7 +164,7 @@ The following is a list of methods which are planned to be included in this pack
#### Package integration
- [x] [Turing](https://github.com/TuringLang/Turing.jl)
- [x] [DynamicHMC](https://github.com/tpapp/DynamicHMC.jl)
- [ ] [Gen](https://github.com/probcomp/Gen.jl)
- [x] [Gen](https://github.com/probcomp/Gen.jl)

## References
[1] Cranmer, Kyle, Johann Brehmer, and Gilles Louppe. "The frontier of simulation-based inference." Proceedings of the National Academy of Sciences 117.48 (2020): 30055-30062.
Expand Down

0 comments on commit fcd1379

Please sign in to comment.