Commit cd3b8ad: Merge 6a9b558 into 66a8420

JamesYang007 committed Aug 13, 2020
2 parents 66a8420 + 6a9b558
Showing 111 changed files with 5,088 additions and 5,230 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ build
lib/*
!lib/CMakeLists.txt
CMakeSettings.json
test/mcmc/hmc/nuts/ref/*_stan*
!test/mcmc/hmc/nuts/ref/*_stan.py
!test/mcmc/hmc/nuts/ref/*_stan.stan
16 changes: 6 additions & 10 deletions CMakeLists.txt
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.7)
project("autoppl"
VERSION 0.8
VERSION 1.1
DESCRIPTION "C++ template library for probabilistic programming."
LANGUAGES C CXX)

@@ -32,15 +32,11 @@ target_compile_features(${PROJECT_NAME} INTERFACE cxx_std_17)
# Add AutoPPL header-only library to variable
set(AUTOPPL_LIBS ${PROJECT_NAME})

# Find Armadillo C++
find_package(Armadillo CONFIG REQUIRED
HINTS ${CMAKE_CURRENT_SOURCE_DIR}/lib/armadillo)
if (Armadillo_FOUND)
message(STATUS "Found Armadillo config at: ${Armadillo_DIR}")
add_compile_options(-DARMA_DONT_USE_WRAPPER)
set(AUTOPPL_INCLUDE_DIRS ${AUTOPPL_INCLUDE_DIRS} ${ARMADILLO_INCLUDE_DIRS})
set(AUTOPPL_LIBS ${AUTOPPL_LIBS} ${ARMADILLO_LIBRARIES})
endif()
# Find Eigen3
find_package(Eigen3 3.3 CONFIG REQUIRED NO_MODULE
HINTS ${CMAKE_CURRENT_SOURCE_DIR}/lib/FastAD/libs/eigen-3.3.7/build/share)
set(AUTOPPL_LIBS ${AUTOPPL_LIBS} Eigen3::Eigen)
message(STATUS "Eigen3 found at ${EIGEN3_INCLUDE_DIR}")

# Find FastAD
find_package(FastAD CONFIG REQUIRED
164 changes: 99 additions & 65 deletions README.md
@@ -8,6 +8,8 @@
## Table of Contents

- [Overview](#overview)
- [Who should use AutoPPL?](#who-should-use-autoppl)
- [How is AutoPPL different from existing PPLs like STAN and PyMC3?](#how-is-autoppl-different-from-existing-ppls-like-stan-and-pymc3)
- [Features](#features)
- [Distributions](#distributions)
- [Sampling Algorithms](#sampling-algorithms)
@@ -34,12 +36,35 @@ generic framework for specifying probabilistic models and applying inference alg

The library is still at an experimental stage.

### Who should use AutoPPL?

The goal of this project is to provide a framework for practitioners, students, and researchers.
It is often desirable to specify a probabilistic model separately from the inference algorithms applied to it.
While AutoPPL does provide a few inference algorithms such as NUTS and Metropolis-Hastings,
it also allows users to write their _own_ sampling algorithms and even add new distributions.

### How is AutoPPL different from existing PPLs like STAN and PyMC3?

AutoPPL can be thought of as a hybrid of STAN and PyMC3.
It is similar to STAN in that it is highly optimized for performance (see [Benchmarks](#benchmarks))
and uses much of the same logic discussed in the STAN reference guide.
It is similar to PyMC3 in that it is a library rather than a separate domain-specific language.

However, it is unique in that it is purely a C++ library.
While STAN requires the user to write a separate STAN program, which is translated into C++ code
and then compiled into a binary, AutoPPL only requires the user to write C++ code directly.
Some benefits include the following:
- it eliminates the extra layer of abstraction of a separate domain-specific language
- users can use native C++ tools to clean and prepare data, and to examine the posterior samples (see the sketch below)
- users can easily extend the library, e.g. by adding new distributions or sampling algorithms
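As a small illustration of the second point, data can be prepared with standard C++ tools and handed straight to AutoPPL. This is a minimal sketch: the file name is hypothetical, and the `ppl::Data` constructor and `.get()` usage mirror the examples later in this README.

```cpp
#include <fstream>
#include <vector>
// (plus the AutoPPL headers, omitted here as in the other snippets)

std::vector<double> raw;
std::ifstream in("data.txt");               // hypothetical input file
for (double v; in >> v; ) raw.push_back(v); // standard-library data cleaning

ppl::Data<double, ppl::vec> x(raw.size());  // size is fixed at construction
for (size_t i = 0; i < raw.size(); ++i) {
    x.get()(i) = raw[i];                    // .get() exposes the underlying Eigen vector
}
```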

## Features

### Distributions
- [Uniform](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous))
- [Multivariate Normal (up to diagonal covariance matrix)](https://en.wikipedia.org/wiki/Multivariate_normal_distribution)
- [Multivariate Normal](https://en.wikipedia.org/wiki/Multivariate_normal_distribution)
- [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution)
- [Wishart](https://en.wikipedia.org/wiki/Wishart_distribution)

### Sampling Algorithms
- [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm)
@@ -60,8 +85,7 @@ theta ~ Uniform(-1, 1)
```

Given its simplicity and expressiveness, we wanted to mimic this compact notation as much
as possible for users of our library.
For the model specified above, the code in AutoPPL could look like the following:
```cpp
ppl::Data<double, ppl::vec> X({...}); // load data
@@ -73,25 +97,25 @@ auto model = ( // specify model
```

A `ppl::Param<T>` represents a scalar parameter in our model,
while `ppl::Data<T, ppl::vec>` holds observed data.
The model can be specified by listing expressions
that use `operator|=` to associate a variable with a distribution.
while `ppl::Data<T, ppl::vec>` holds observed data as a vector.
The model can be specified by listing expressions using
`operator|=` to associate a variable with a distribution
and `operator,` to "glue" those associations.

### Efficient Memory Usage

We make the assumption that users are able to specify the probabilistic model at compile-time.
As a result, AutoPPL can construct all model expressions at compile-time
simply from the type information.
A model object makes no heap-allocations and is minimal in size.
It is simply a small, contiguous slab of memory representing the binary tree.
The `model` object in the [previous section](#intuitive-model-specification)
is about `120 bytes` on `x86_64-apple-darwin17.7.0` using `clang-11.0.3`.
is about `88 bytes` on `x86_64-apple-darwin17.7.0` using `clang-11.0.3`.
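A quick way to see this for a given model is `sizeof` (a minimal sketch; the exact number depends on platform, compiler, and library version):

```cpp
#include <iostream>

// `model` is the expression object from the previous section.
// Its size is a compile-time constant and involves no heap allocation.
std::cout << sizeof(model) << " bytes\n"; // ~88 on x86_64-apple-darwin17.7.0 with clang-11.0.3
```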

For a more complicated model such as the following:
```cpp
ppl::Data<double> X;
ppl::Param<double, ppl::vec> theta(6);
std::array<ppl::Param<double>, 6> theta;
auto model = (
theta[0] |= ppl::uniform(-1., 1.),
theta[1] |= ppl::uniform(theta[0], theta[0] + 2.),
@@ -102,8 +126,7 @@ auto model = (
X |= ppl::normal(theta[5], 1.)
);
```

The size of the model is `616 bytes` on the same machine and compiler.
The size of the model is `440 bytes` on the same machine and compiler.

A model expression simply references the variables used in the expression
such as `theta[0], theta[1], ..., theta[5], X`, i.e. it does not copy any data.
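One practical consequence worth noting (a cautionary sketch inferred from the no-copy design above, not a documented API rule): since a model only stores references, every variable used in it must outlive the model object.

```cpp
// BAD: theta dies when the function returns, so the returned
// expression holds a dangling reference.
auto make_model() {
    ppl::Param<double> theta;
    return (theta |= ppl::normal(0., 1.));
}

// GOOD: keep the variables alive for as long as the model is used.
ppl::Param<double> theta;
auto model = (theta |= ppl::normal(0., 1.));
```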
@@ -132,10 +155,10 @@ with automatic differentiation and handling model expressions.
First, clone the repository:

```
git clone --recurse-submodules https://github.com/JamesYang007/autoppl
git clone https://github.com/JamesYang007/autoppl ~/autoppl
```

To build and run tests, run the following:
To build and run tests, change directory into the cloned directory and run the following:
```
./setup.sh
./clean-build.sh debug
@@ -149,36 +172,53 @@ ctest

Although AutoPPL was designed to perform inference on posterior distributions,
one can certainly use it to sample from any joint distribution defined by the priors and conditional distributions.
As an example, we can sample `1000` points from a
As an example, we can sample `1000` points with `1000` warmup iterations from a
standard normal distribution using Metropolis-Hastings in the following way:

```cpp
std::array<double, 1000> samples;
ppl::Param<double> theta {samples.data()};
ppl::Param<double> theta;
auto model = (
theta |= ppl::normal(0., 1.)
);
ppl::mh(model, 1000);
ppl::MHConfig config;
config.warmup = 1000;
config.samples = 1000;
auto res = ppl::mh(model, config);
```

The scalar parameter `theta` is bound to the storage `samples` at construction.
After calling `mh`, the `1000` samples are placed into `samples`.
`res` is a `ppl::MCMCResult` object,
which contains the output of the sampling algorithm.
For example, `res.cont_samples` and `res.disc_samples` contain the continuous
and discrete samples, respectively, where each row is a sample.
A row consists of sampled parameter values in the same order as the
priors in the model.
As an example, for the following model
```cpp
auto model = (
t1 |= ...,
t2 |= ...,
t3 |= ...,
...
);
```
a sample row would consist of values for `t1`, `t2`, `t3` in that order.
If the parameters are multi-dimensional, their values are flattened assuming column-major format.
So if `t1` is a vector with 4 elements, the first four elements of a row will be values for `t1`.
If `t2` is a 2x2 matrix, the next four elements of a row will be values for `t2(0,0), t2(1,0), t2(0,1), t2(1,1)`.
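As a sketch of how one might consume these rows (this assumes `res.cont_samples` is an Eigen matrix with one sample per row, consistent with the library's use of Eigen; the parameter layout below reuses the `t1`/`t2` example above):

```cpp
#include <Eigen/Dense>
#include <iostream>

// Posterior mean of every flattened parameter coordinate (one per column):
Eigen::RowVectorXd post_mean = res.cont_samples.colwise().mean();

// With t1 a length-4 vector and t2 a 2x2 matrix as above, columns 0..3
// hold t1 and columns 4..7 hold t2 flattened in column-major order.
Eigen::RowVectorXd row0 = res.cont_samples.row(0).segment(4, 4);
Eigen::Map<const Eigen::Matrix2d> t2_first(row0.data()); // column-major reshape
std::cout << "first sample of t2:\n" << t2_first << '\n';
```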

In general, so long as the joint PDF is known,
or equivalently and more commonly if the conditional and prior PDFs are known,
one can sample from the distribution.
As another example, we may sample from a more complicated joint distribution:
```cpp
std::array<double, 1000> theta1_samples;
std::array<double, 1000> theta2_samples;
ppl::Param<double> theta1 {theta1_samples.data()};
ppl::Param<double> theta2 {theta2_samples.data()};
ppl::Param<double> theta1;
ppl::Param<double> theta2;
auto model = (
theta1 |= ppl::uniform(-1., 1.),
theta2 |= ppl::normal(theta1, 1.)
);
ppl::mh(model, 1000);
auto res = ppl::mh(model); // by default, config is default-constructed
// default number of warmup and samples is 1000
```
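For reference, the defaulted call above is equivalent to spelling out the documented defaults:

```cpp
ppl::MHConfig config;   // default-constructed
config.warmup = 1000;   // documented default
config.samples = 1000;  // documented default
auto res = ppl::mh(model, config);
```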

Here, `theta2` depends on `theta1` and hence defines
@@ -195,34 +235,36 @@ it is recommended to use the state-of-the-art NUTS sampler to sample
from the posterior distribution.

```cpp
std::array<double, 1000> mu_samples, sigma_samples;
ppl::Data<double, ppl::vec> x(5);
ppl::Param<double> mu;
ppl::Param<double> sigma;

ppl::Data<double, ppl::vec> x {1.0, 1.5, 1.7, 1.2, 1.5};
ppl::Param<double> mu {mu_samples.data()};
ppl::Param<double> sigma {sigma_samples.data()};
x.get() << 1.0, 1.5, 1.7, 1.2, 1.5; // load data

auto model = (
mu |= ppl::normal(0., 3.),
sigma |= ppl::uniform(0., 2.),
x |= ppl::normal(mu, sigma)
);

ppl::nuts(model);
auto res = ppl::nuts(model);
```

By default, NUTS will run 1000 iterations each for warmup and actual sampling
(total of 2000 iterations).
Note that `.get()` on a Data object returns a reference to the underlying `Eigen::Matrix<...>` object,
which contains the actual data.
Users can use this helper function to load data, but note that the size must be known when
the Data variable is constructed and cannot be changed afterwards.
The default adaptation method is to estimate the diagonal precision matrix.
To change the configuration, the user must pass in a config object:

```cpp
NUTSConfig<...> config; // replace ... with ppl::unit_var or ppl::diag_var
config.n_samples = ...; // number of samples
config.samples = ...; // number of samples
config.warmup = ...; // number of warmup iters
config.seed = ...; // set seed (default: random)
config.max_depth = ...; // max depth of tree built during NUTS (default: 10)

ppl::nuts(model, config);
auto res = ppl::nuts(model, config);
```
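For instance, a fully specified configuration might look like the following (the field values here are illustrative, not recommendations):

```cpp
ppl::NUTSConfig<ppl::diag_var> config; // diagonal precision adaptation (the default, per above)
config.samples = 2000;                 // number of samples
config.warmup = 1000;                  // number of warmup iterations
config.seed = 42;                      // fixed seed for reproducibility
config.max_depth = 10;                 // max depth of the NUTS tree
auto res = ppl::nuts(model, config);
```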

### Bayesian Linear Regression
@@ -241,13 +283,13 @@ b ~ Uniform(0, 2)

In AutoPPL, we can write the following code and sample from the posterior:
```cpp
std::array<double, 1000> w_storage;
std::array<double, 1000> b_storage;
ppl::Data<double> x(6);
ppl::Data<double> y(6);
ppl::Param<double> w;
ppl::Param<double> b;

ppl::Data<double> x{2.4, 3.1, 3.6, 4, 4.5, 5.};
ppl::Data<double> y{3.5, 4, 4.4, 5.01, 5.46, 6.1};
ppl::Param<double> w {w_storage.data()};
ppl::Param<double> b {b_storage.data()};
x.get() << 2.4, 3.1, 3.6, 4, 4.5, 5.;
y.get() << 3.5, 4, 4.4, 5.01, 5.46, 6.1;

auto model = (
w |= ppl::uniform(0., 2.),
@@ -303,10 +345,10 @@ and perform NUTS.
The runtimes have similar log-like behavior,
but it is clear that STAN (dotted lines) takes far longer
in both sampling and warmup times.
in both sampling and warmup times by a factor of about 5.
As for ESS/s, upon comparing by colors (corresponding to a parameter)
between dotted (STAN) and solid (AutoPPL) lines,
we see that AutoPPL has uniformly larger ESS/s.
we see that AutoPPL has uniformly larger ESS/s by a factor of 5 as well.
This difference quickly becomes larger as sample size grows.
These plots, together with the fact that the sampling results were identical,
show that the drastic difference in ESS/s comes simply from faster
@@ -324,27 +366,21 @@ The full code can be found
[here](benchmark/regression_autoppl.cpp):
```cpp
auto X = ppl::make_data_view<ppl::mat>(X_data);
auto y = ppl::make_data_view<ppl::vec>(y_data);
DataView<double, mat> X(X_data.data(), X_data.rows(), X_data.cols());
DataView<double, vec> y(y_data.data(), y_data.rows());
ppl::Param<double, ppl::vec> w(3);
ppl::Param<double> b;
ppl::Param<double> s;
arma::mat storage(num_samples, w.size() + b.size() + s.size());
for (size_t i = 0; i < w.size(); ++i) {
w.storage(i) = storage.colptr(i);
}
b.storage() = storage.colptr(w.size());
s.storage() = storage.colptr(w.size() + b.size());
auto model = (s |= ppl::uniform(0.5, 8.),
b |= ppl::normal(0., 5.),
w |= ppl::normal(0., 5.),
y |= ppl::normal(ppl::dot(X, w) + b, s * s + 2.));
NUTSConfig<> config;
config.warmup = num_samples;
config.n_samples = num_samples;
config.samples = num_samples;
auto res = ppl::nuts(model, config);
```

@@ -377,11 +413,12 @@ means, standard deviation, and ESS.
The runtimes have a similar linear trend,
and it is clear that STAN (dotted lines) takes far longer
in both sampling and warmup times.
Comparing the sampling times, for example, we see about a 20-fold improvement.
The ESS/s curves for `l1` and `l2` (red and blue) overlap completely in both STAN and AutoPPL,
and this is expected since the two parameters are symmetric in the model specification.
With the exception of the two smallest sample sizes (100, 500),
ESS/s is fairly constant as sample size varies.
It is quite evident that AutoPPL (solid) has a larger ESS/s by a significant factor.
It is quite evident that AutoPPL (solid) has a larger ESS/s by a factor of 20.

The reason for this difference is in how we handle expressions
where data vector elements are iid (independent and identically distributed).
Expand All @@ -391,26 +428,23 @@ However, it is worth noting that this optimization does not apply when
the data are simply independent but not identically distributed
(as in the [linear regression](#benchmarks-bayesian-linear-regression) case),
or when the variable is a parameter, not data.
Nonetheless, our AD is extremely fast due to vectorization and is in general faster than STAN's.
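To make the distinction concrete, here are the two kinds of expressions side by side (both taken from the models in this README):

```cpp
// iid: every element of y shares the same mean and scale expressions,
// so the vectorized log-pdf optimization above applies.
y |= ppl::normal(lambda1 + lambda2, sigma);

// Independent but NOT identically distributed: each element of y gets
// its own mean from ppl::dot(X, w) + b, so the optimization does not apply.
y |= ppl::normal(ppl::dot(X, w) + b, s * s + 2.);
```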

The following is the AutoPPL code without data generation.
The full code can be found
[here](benchmark/normal_two_prior_distribution.cpp).

```cpp
ppl::Param<double> l1, l2, s;
ppl::Data<double, ppl::vec> y(n_data);
ppl::Param<double> lambda1, lambda2, sigma;

auto model = (
s |= ppl::uniform(0.0, 20.0),
l1 |= ppl::normal(0.0, 10.0),
l2 |= ppl::normal(0.0, 10.0),
y |= ppl::normal(l1 + l2, s)
sigma |= ppl::uniform(0.0, 20.0),
lambda1 |= ppl::normal(0.0, 10.0),
lambda2 |= ppl::normal(0.0, 10.0),
y |= ppl::normal(lambda1 + lambda2, sigma)
);

arma::mat storage(n_samples, 3);
l1.storage() = storage.colptr(0);
l2.storage() = storage.colptr(1);
s.storage() = storage.colptr(2);

ppl::NUTSConfig<> config;
config.n_samples = n_samples;
config.warmup = n_samples;
@@ -439,11 +473,11 @@ auto res = ppl::nuts(model, config);
## Third Party Tools

Many thanks to the following third party tools used in this project:
- [Armadillo](http://arma.sourceforge.net/): matrix library used for inference algorithms.
- [Clang](https://clang.llvm.org/): one of the main compilers used.
- [CMake](https://cmake.org/): build system.
- [Coveralls](https://coveralls.io/): check test coverage.
- [Cpp Coveralls](https://github.com/eddyxu/cpp-coveralls): check test coverage specifically for C++ code.
- [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page): matrix library.
- [FastAD](https://github.com/JamesYang007/FastAD): automatic differentiation library.
- [GCC](https://gcc.gnu.org/): one of the main compilers used.
- [Google Benchmark](https://github.com/google/benchmark): library used to benchmark algorithms.
