Commit cd3b8ad: Merge 6a9b558 into 66a8420

JamesYang007 committed Aug 13, 2020
2 parents 66a8420 + 6a9b558
Showing 111 changed files with 5,088 additions and 5,230 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ build
lib/*
!lib/CMakeLists.txt
CMakeSettings.json
test/mcmc/hmc/nuts/ref/*_stan*
!test/mcmc/hmc/nuts/ref/*_stan.py
!test/mcmc/hmc/nuts/ref/*_stan.stan
16 changes: 6 additions & 10 deletions CMakeLists.txt
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.7)
project("autoppl"
VERSION 0.8
VERSION 1.1
DESCRIPTION "C++ template library for probabilistic programming."
LANGUAGES C CXX)

@@ -32,15 +32,11 @@ target_compile_features(${PROJECT_NAME} INTERFACE cxx_std_17)
# Add AutoPPL header-only library to variable
set(AUTOPPL_LIBS ${PROJECT_NAME})

# Find Armadillo C++
find_package(Armadillo CONFIG REQUIRED
HINTS ${CMAKE_CURRENT_SOURCE_DIR}/lib/armadillo)
if (Armadillo_FOUND)
message(STATUS "Found Armadillo config at: ${Armadillo_DIR}")
add_compile_options(-DARMA_DONT_USE_WRAPPER)
set(AUTOPPL_INCLUDE_DIRS ${AUTOPPL_INCLUDE_DIRS} ${ARMADILLO_INCLUDE_DIRS})
set(AUTOPPL_LIBS ${AUTOPPL_LIBS} ${ARMADILLO_LIBRARIES})
endif()
# Find Eigen3
find_package(Eigen3 3.3 CONFIG REQUIRED NO_MODULE
HINTS ${CMAKE_CURRENT_SOURCE_DIR}/lib/FastAD/libs/eigen-3.3.7/build/share)
set(AUTOPPL_LIBS ${AUTOPPL_LIBS} Eigen3::Eigen)
message(STATUS "Eigen3 found at ${EIGEN3_INCLUDE_DIR}")

# Find FastAD
find_package(FastAD CONFIG REQUIRED
164 changes: 99 additions & 65 deletions README.md
@@ -8,6 +8,8 @@
## Table of Contents

- [Overview](#overview)
- [Who should use AutoPPL?](#who-should-use-autoppl)
- [How is AutoPPL different from existing PPLs like STAN and PyMC3?](#how-is-autoppl-different-from-existing-ppls-like-stan-and-pymc3)
- [Features](#features)
- [Distributions](#distributions)
- [Sampling Algorithms](#sampling-algorithms)
@@ -34,12 +36,35 @@ generic framework for specifying probabilistic models and applying inference alg

The library is still at an experimental stage.

### Who should use AutoPPL?

The goal of this project is to provide a framework for practitioners, students, and researchers.
It is often desirable to specify a probabilistic model separately from the inference algorithms applied to it.
While AutoPPL does provide a few inference algorithms such as NUTS and Metropolis-Hastings,
it also allows users to write their _own_ sampling algorithms and even add new distributions.

### How is AutoPPL different from existing PPLs like STAN and PyMC3?

AutoPPL can be thought of as a hybrid of STAN and PyMC3.
It is similar to STAN in that it is highly optimized for performance (see [Benchmarks](#benchmarks))
and uses much of the same logic discussed in the STAN reference guide.
It is similar to PyMC3 in that it is a library rather than a separate domain-specific language.

However, it is unique in that it is purely a C++ library.
While STAN requires the user to write a separate STAN program, which is translated into C++ code
and then compiled into a binary, AutoPPL only requires the user to write C++ code directly.
Some benefits include the following:
- it eliminates the extra layer of abstraction of a separate domain-specific language
- users can use native C++ tools to clean and prepare data, and to examine the posterior samples (see the sketch below)
- users can easily extend the library, e.g. by adding new distributions or sampling algorithms
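As a small illustration of the second point, data can be prepared with standard C++ tools and handed straight to AutoPPL. This is a minimal sketch: the file name is hypothetical, and the `ppl::Data` constructor and `.get()` usage mirror the examples later in this README.

```cpp
#include <fstream>
#include <vector>
// (plus the AutoPPL headers, omitted here as in the other snippets)

std::vector<double> raw;
std::ifstream in("data.txt");               // hypothetical input file
for (double v; in >> v; ) raw.push_back(v); // standard-library data cleaning

ppl::Data<double, ppl::vec> x(raw.size());  // size is fixed at construction
for (size_t i = 0; i < raw.size(); ++i) {
    x.get()(i) = raw[i];                    // .get() exposes the underlying Eigen vector
}
```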

## Features

### Distributions
- [Uniform](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous))
- [Multivariate Normal (up to diagonal covariance matrix)](https://en.wikipedia.org/wiki/Multivariate_normal_distribution)
- [Multivariate Normal](https://en.wikipedia.org/wiki/Multivariate_normal_distribution)
- [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution)
- [Wishart](https://en.wikipedia.org/wiki/Wishart_distribution)

### Sampling Algorithms
- [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm)
@@ -60,8 +85,7 @@ theta ~ Uniform(-1, 1)
```

Given its simplicity and expressiveness, we wanted to mimic this compact notation as much
as possible for users of our library.
For the model specified above, the code in AutoPPL could look like the following:
```cpp
ppl::Data<double, ppl::vec> X({...}); // load data
@@ -73,25 +97,25 @@ auto model = ( // specify model
```

A `ppl::Param<T>` represents a scalar parameter in our model,
while `ppl::Data<T, ppl::vec>` holds observed data.
The model can be specified by listing expressions
that use `operator|=` to associate a variable with a distribution.
while `ppl::Data<T, ppl::vec>` holds observed data as a vector.
The model can be specified by listing expressions using
`operator|=` to associate a variable with a distribution
and `operator,` to "glue" those associations.

### Efficient Memory Usage

We make the assumption that users are able to specify the probabilistic model at compile-time.
As a result, AutoPPL can construct all model expressions at compile-time
simply from the type information.
A model object makes no heap-allocations and is minimal in size.
It is simply a small, contiguous slab of memory representing the binary tree.
The `model` object in the [previous section](#intuitive-model-specification)
is about `120 bytes` on `x86_64-apple-darwin17.7.0` using `clang-11.0.3`.
is about `88 bytes` on `x86_64-apple-darwin17.7.0` using `clang-11.0.3`.
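A quick way to see this for a given model is `sizeof` (a minimal sketch; the exact number depends on platform, compiler, and library version):

```cpp
#include <iostream>

// `model` is the expression object from the previous section.
// Its size is a compile-time constant and involves no heap allocation.
std::cout << sizeof(model) << " bytes\n"; // ~88 on x86_64-apple-darwin17.7.0 with clang-11.0.3
```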

For a more complicated model such as the following:
```cpp
ppl::Data<double> X;
ppl::Param<double, ppl::vec> theta(6);
std::array<ppl::Param<double>, 6> theta;
auto model = (
theta[0] |= ppl::uniform(-1., 1.),
theta[1] |= ppl::uniform(theta[0], theta[0] + 2.),
@@ -102,8 +126,7 @@ auto model = (
X |= ppl::normal(theta[5], 1.)
);
```

The size of the model is `616 bytes` on the same machine and compiler.
The size of the model is `440 bytes` on the same machine and compiler.

A model expression simply references the variables used in the expression
such as `theta[0], theta[1], ..., theta[5], X`, i.e. it does not copy any data.
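One practical consequence worth noting (a cautionary sketch inferred from the no-copy design above, not a documented API rule): since a model only stores references, every variable used in it must outlive the model object.

```cpp
// BAD: theta dies when the function returns, so the returned
// expression holds a dangling reference.
auto make_model() {
    ppl::Param<double> theta;
    return (theta |= ppl::normal(0., 1.));
}

// GOOD: keep the variables alive for as long as the model is used.
ppl::Param<double> theta;
auto model = (theta |= ppl::normal(0., 1.));
```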
@@ -132,10 +155,10 @@ with automatic differentiation and handling model expressions.
First, clone the repository:

```
git clone --recurse-submodules https://github.com/JamesYang007/autoppl
git clone https://github.com/JamesYang007/autoppl ~/autoppl
```

To build and run tests, run the following:
To build and run tests, change directory into the cloned directory and run the following:
```
./setup.sh
./clean-build.sh debug
@@ -149,36 +172,53 @@ ctest

Although AutoPPL was designed to perform inference on posterior distributions,
one can certainly use it to sample from any joint distribution defined by the priors and conditional distributions.
As an example, we can sample `1000` points from a
As an example, we can sample `1000` points with `1000` warmup iterations from a
standard normal distribution using Metropolis-Hastings in the following way:

```cpp
std::array<double, 1000> samples;
ppl::Param<double> theta {samples.data()};
ppl::Param<double> theta;
auto model = (
theta |= ppl::normal(0., 1.)
);
ppl::mh(model, 1000);
ppl::MHConfig config;
config.warmup = 1000;
config.samples = 1000;
auto res = ppl::mh(model, config);
```

The scalar parameter `theta` is bound to the storage `samples` at construction.
After calling `mh`, the `1000` samples are placed into `samples`.
`res` is a `ppl::MCMCResult` object,
which contains the output of the sampling algorithm.
For example, `res.cont_samples` and `res.disc_samples` contain the continuous
and discrete samples, respectively, where each row is a sample.
A row consists of sampled parameter values in the same order as the
priors in the model.
As an example, for the following model
```cpp
auto model = (
t1 |= ...,
t2 |= ...,
t3 |= ...,
...
);
```
a sample row would consist of values for `t1`, `t2`, `t3` in that order.
If the parameters are multi-dimensional, their values are flattened assuming column-major format.
So if `t1` is a vector with 4 elements, the first four elements of a row will be values for `t1`.
If `t2` is a 2x2 matrix, the next four elements of a row will be values for `t2(0,0), t2(1,0), t2(0,1), t2(1,1)`.
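As a sketch of how one might consume these rows (this assumes `res.cont_samples` is an Eigen matrix with one sample per row, consistent with the library's use of Eigen; the parameter layout below reuses the `t1`/`t2` example above):

```cpp
#include <Eigen/Dense>
#include <iostream>

// Posterior mean of every flattened parameter coordinate (one per column):
Eigen::RowVectorXd post_mean = res.cont_samples.colwise().mean();

// With t1 a length-4 vector and t2 a 2x2 matrix as above, columns 0..3
// hold t1 and columns 4..7 hold t2 flattened in column-major order.
Eigen::RowVectorXd row0 = res.cont_samples.row(0).segment(4, 4);
Eigen::Map<const Eigen::Matrix2d> t2_first(row0.data()); // column-major reshape
std::cout << "first sample of t2:\n" << t2_first << '\n';
```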

In general, so long as the joint PDF is known,
or equivalently and more commonly if the conditional and prior PDFs are known,
one can sample from the distribution.
As another example, we may sample from a more complicated joint distribution:
```cpp
std::array<double, 1000> theta1_samples;
std::array<double, 1000> theta2_samples;
ppl::Param<double> theta1 {theta1_samples.data()};
ppl::Param<double> theta2 {theta2_samples.data()};
ppl::Param<double> theta1;
ppl::Param<double> theta2;
auto model = (
theta1 |= ppl::uniform(-1., 1.),
theta2 |= ppl::normal(theta1, 1.)
);
ppl::mh(model, 1000);
auto res = ppl::mh(model); // by default, config is default-constructed
// default number of warmup and samples is 1000
```
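For reference, the defaulted call above is equivalent to spelling out the documented defaults:

```cpp
ppl::MHConfig config;   // default-constructed
config.warmup = 1000;   // documented default
config.samples = 1000;  // documented default
auto res = ppl::mh(model, config);
```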

Here, `theta2` depends on `theta1` and hence defines
@@ -195,34 +235,36 @@ it is recommended to use the state-of-the-art NUTS sampler to sample
from the posterior distribution.

```cpp
std::array<double, 1000> mu_samples, sigma_samples;
ppl::Data<double, ppl::vec> x(5);
ppl::Param<double> mu;
ppl::Param<double> sigma;

ppl::Data<double, ppl::vec> x {1.0, 1.5, 1.7, 1.2, 1.5};
ppl::Param<double> mu {mu_samples.data()};
ppl::Param<double> sigma {sigma_samples.data()};
x.get() << 1.0, 1.5, 1.7, 1.2, 1.5; // load data

auto model = (
mu |= ppl::normal(0., 3.),
sigma |= ppl::uniform(0., 2.),
x |= ppl::normal(mu, sigma)
);

ppl::nuts(model);
auto res = ppl::nuts(model);
```

By default, NUTS will run 1000 iterations each for warmup and actual sampling
(total of 2000 iterations).
Note that `.get()` on a Data object returns a reference to the underlying `Eigen::Matrix<...>` object,
which contains the actual data.
Users can use this helper function to load data, but note that the size must be known when
the Data variable is constructed and cannot be changed afterwards.
The default adaptation method is to estimate the diagonal precision matrix.
To change the configuration, the user must pass in a config object:

```cpp
NUTSConfig<...> config; // replace ... with ppl::unit_var or ppl::diag_var
config.n_samples = ...; // number of samples
config.samples = ...; // number of samples
config.warmup = ...; // number of warmup iters
config.seed = ...; // set seed (default: random)
config.max_depth = ...; // max depth of tree built during NUTS (default: 10)

ppl::nuts(model, config);
auto res = ppl::nuts(model, config);
```
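For instance, a fully specified configuration might look like the following (the field values here are illustrative, not recommendations):

```cpp
ppl::NUTSConfig<ppl::diag_var> config; // diagonal precision adaptation (the default, per above)
config.samples = 2000;                 // number of samples
config.warmup = 1000;                  // number of warmup iterations
config.seed = 42;                      // fixed seed for reproducibility
config.max_depth = 10;                 // max depth of the NUTS tree
auto res = ppl::nuts(model, config);
```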

### Bayesian Linear Regression
@@ -241,13 +283,13 @@ b ~ Uniform(0, 2)

In AutoPPL, we can write the following code and sample from the posterior:
```cpp
std::array<double, 1000> w_storage;
std::array<double, 1000> b_storage;
ppl::Data<double> x(6);
ppl::Data<double> y(6);
ppl::Param<double> w;
ppl::Param<double> b;

ppl::Data<double> x{2.4, 3.1, 3.6, 4, 4.5, 5.};
ppl::Data<double> y{3.5, 4, 4.4, 5.01, 5.46, 6.1};
ppl::Param<double> w {w_storage.data()};
ppl::Param<double> b {b_storage.data()};
x.get() << 2.4, 3.1, 3.6, 4, 4.5, 5.;
y.get() << 3.5, 4, 4.4, 5.01, 5.46, 6.1;

auto model = (
w |= ppl::uniform(0., 2.),
@@ -303,10 +345,10 @@ and perform NUTS.
The runtimes have similar log-like behavior,
but it is clear that STAN (dotted lines) takes far longer
in both sampling and warmup times.
in both sampling and warmup times by a factor of about 5.
As for ESS/s, upon comparing by colors (corresponding to a parameter)
between dotted (STAN) and solid (AutoPPL) lines,
we see that AutoPPL has uniformly larger ESS/s.
we see that AutoPPL has uniformly larger ESS/s by a factor of 5 as well.
This difference quickly becomes larger as sample size grows.
These plots, together with the fact that the sampling results were identical,
show that the drastic difference in ESS/s comes simply from faster
@@ -324,27 +366,21 @@ The full code can be found
[here](benchmark/regression_autoppl.cpp):
```cpp
auto X = ppl::make_data_view<ppl::mat>(X_data);
auto y = ppl::make_data_view<ppl::vec>(y_data);
DataView<double, mat> X(X_data.data(), X_data.rows(), X_data.cols());
DataView<double, vec> y(y_data.data(), y_data.rows());
ppl::Param<double, ppl::vec> w(3);
ppl::Param<double> b;
ppl::Param<double> s;
arma::mat storage(num_samples, w.size() + b.size() + s.size());
for (size_t i = 0; i < w.size(); ++i) {
w.storage(i) = storage.colptr(i);
}
b.storage() = storage.colptr(w.size());
s.storage() = storage.colptr(w.size() + b.size());
auto model = (s |= ppl::uniform(0.5, 8.),
b |= ppl::normal(0., 5.),
w |= ppl::normal(0., 5.),
y |= ppl::normal(ppl::dot(X, w) + b, s * s + 2.));
NUTSConfig<> config;
config.warmup = num_samples;
config.n_samples = num_samples;
config.samples = num_samples;
auto res = ppl::nuts(model, config);
```

@@ -377,11 +413,12 @@ means, standard deviation, and ESS.
The runtimes have a similar linear trend,
and it is clear that STAN (dotted lines) takes far longer
in both sampling and warmup times.
Comparing the sampling times, for example, we see about a 20-fold improvement.
The ESS/s curves for `l1` and `l2` (red and blue) overlap completely in both STAN and AutoPPL,
and this is expected since the two parameters are symmetric in the model specification.
With the exception of the two smallest sample sizes (100, 500),
ESS/s is fairly constant as sample size varies.
It is quite evident that AutoPPL (solid) has a larger ESS/s by a significant factor.
It is quite evident that AutoPPL (solid) has a larger ESS/s by a factor of 20.

The reason for this difference is in how we handle expressions
where data vector elements are iid (independent and identically distributed).
Expand All @@ -391,26 +428,23 @@ However, it is worth noting that this optimization does not apply when
the data are simply independent but not identically distributed
(as in the [linear regression](#benchmarks-bayesian-linear-regression) case),
or when the variable is a parameter, not data.
Nonetheless, our AD is extremely fast due to vectorization and is in general faster than STAN's.
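To make the distinction concrete, here are the two kinds of expressions side by side (both taken from the models in this README):

```cpp
// iid: every element of y shares the same mean and scale expressions,
// so the vectorized log-pdf optimization above applies.
y |= ppl::normal(lambda1 + lambda2, sigma);

// Independent but NOT identically distributed: each element of y gets
// its own mean from ppl::dot(X, w) + b, so the optimization does not apply.
y |= ppl::normal(ppl::dot(X, w) + b, s * s + 2.);
```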

The following is the AutoPPL code without data generation.
The full code can be found
[here](benchmark/normal_two_prior_distribution.cpp).

```cpp
ppl::Param<double> l1, l2, s;
ppl::Data<double, ppl::vec> y(n_data);
ppl::Param<double> lambda1, lambda2, sigma;

auto model = (
s |= ppl::uniform(0.0, 20.0),
l1 |= ppl::normal(0.0, 10.0),
l2 |= ppl::normal(0.0, 10.0),
y |= ppl::normal(l1 + l2, s)
sigma |= ppl::uniform(0.0, 20.0),
lambda1 |= ppl::normal(0.0, 10.0),
lambda2 |= ppl::normal(0.0, 10.0),
y |= ppl::normal(lambda1 + lambda2, sigma)
);

arma::mat storage(n_samples, 3);
l1.storage() = storage.colptr(0);
l2.storage() = storage.colptr(1);
s.storage() = storage.colptr(2);

ppl::NUTSConfig<> config;
config.n_samples = n_samples;
config.warmup = n_samples;
@@ -439,11 +473,11 @@ auto res = ppl::nuts(model, config);
## Third Party Tools

Many thanks to the following third party tools used in this project:
- [Armadillo](http://arma.sourceforge.net/): matrix library used for inference algorithms.
- [Clang](https://clang.llvm.org/): one of the main compilers used.
- [CMake](https://cmake.org/): build system.
- [Coveralls](https://coveralls.io/): check test coverage.
- [Cpp Coveralls](https://github.com/eddyxu/cpp-coveralls): check test coverage specifically for C++ code.
- [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page): matrix library.
- [FastAD](https://github.com/JamesYang007/FastAD): automatic differentiation library.
- [GCC](https://gcc.gnu.org/): one of the main compilers used.
- [Google Benchmark](https://github.com/google/benchmark): library used to benchmark algorithms.
