<a href="https://colab.research.google.com/github/annariha/StanCon-2024-BO-Stan/blob/main/template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Bayesian optimisation using Stan @ StanCon 2024

### Setup

In [None]:
# Use a repository of pre-built package binaries to speed up installation
download.file("https://github.com/eddelbuettel/r2u/raw/master/inst/scripts/add_cranapt_jammy.sh",
              "add_cranapt_jammy.sh")
Sys.chmod("add_cranapt_jammy.sh", "0755")
system("./add_cranapt_jammy.sh")

# Install the R Packages we'll be using
# (khroma is included because it is loaded below for the colour scales)
install.packages(c("here", "tidyverse", "bayesplot", "khroma", "cmdstanr"),
                  repos = c("https://stan-dev.r-universe.dev", getOption("repos")))


In this tutorial, we use the [cmdstanr](https://mc-stan.org/cmdstanr/articles/cmdstanr.html) R interface to CmdStan. We install and set up CmdStan as follows:

In [None]:
# Install and set up CmdStan
download.file("https://github.com/stan-dev/cmdstan/releases/download/v2.35.0/colab-cmdstan-2.35.0.tgz",
              "cmdstan-2.35.0.tgz")
utils::untar("cmdstan-2.35.0.tgz")
cmdstanr::set_cmdstan_path("cmdstan-2.35.0")

Now, we load all required libraries and set a seed: 

In [None]:
library(cmdstanr)
library(here)
library(tidyverse)
library(khroma)

set.seed(424242)

### Icebreaker 

Idea: show function evaluations and ask where to sample next to find the global minimum

### Introduction to Bayesian optimisation (BO)

 - goal of BO
 - where is it used in practice (use cases) → add model selection perspective
 - components of BO
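
The components listed above (a surrogate model, an acquisition function, and a loop that queries the objective) can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions: the GP hyperparameters `alpha`, `rho` and `sigma` are fixed by hand rather than inferred with Stan, the acquisition function is a lower confidence bound chosen for brevity, and all helper names (`objective`, `sq_exp`, `gp_posterior`) are hypothetical.

```r
# Objective, treated as a black box (the Forrester function used in this notebook)
objective <- function(x) (6 * x - 2)^2 * sin(12 * x - 4)

# Squared-exponential kernel, written as in the surrogate model of this notebook
sq_exp <- function(x1, x2, alpha, rho) {
  alpha^2 * exp(-outer(x1, x2, "-")^2 / rho^2)
}

# GP posterior mean and sd at x_new for fixed (assumed) hyperparameters
gp_posterior <- function(x_obs, y_obs, x_new, alpha = 5, rho = 0.2, sigma = 1e-2) {
  K   <- sq_exp(x_obs, x_obs, alpha, rho) + diag(sigma^2, length(x_obs))
  K_s <- sq_exp(x_new, x_obs, alpha, rho)
  mu  <- mean(y_obs)                      # constant mean, set to the sample mean
  post_mean <- mu + as.vector(K_s %*% solve(K, y_obs - mu))
  post_var  <- alpha^2 - rowSums(K_s * t(solve(K, t(K_s))))
  list(mean = post_mean, sd = sqrt(pmax(post_var, 0)))
}

x_cand <- seq(0, 1, length.out = 200)     # candidate query locations
x_obs  <- c(0.1, 0.5, 0.9)                # initial design
y_obs  <- objective(x_obs)
y_init_best <- min(y_obs)

for (iter in 1:10) {
  # 1. Surrogate: update the GP posterior given all evaluations so far
  post <- gp_posterior(x_obs, y_obs, x_cand)
  # 2. Acquisition: lower confidence bound (we minimise, so smaller is better)
  lcb <- post$mean - 2 * post$sd
  # 3. Query the objective where the acquisition is most promising
  x_next <- x_cand[which.min(lcb)]
  x_obs  <- c(x_obs, x_next)
  y_obs  <- c(y_obs, objective(x_next))
}

x_best <- x_obs[which.min(y_obs)]
y_best <- min(y_obs)
```

In the tutorial itself, step 1 would be replaced by fitting the GP in Stan, so that the uncertainty in the hyperparameters is propagated into the acquisition function.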

### Surrogate models

GP as a surrogate model
- GP as a prior (and posterior) over functions 
- Exercise: implement your own kernel
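
As a possible starting point for the kernel exercise, the sketch below implements a squared-exponential kernel in base R and uses it to draw functions from a zero-mean GP prior. The helper names `kernel_sq_exp` and `draw_gp_prior`, the default hyperparameter values, and the jitter term are assumptions made for illustration. The kernel is written in the same form as the one used for the surrogate model later in this notebook, $k(x_i, x_j) = \alpha^2 \exp(-(x_i - x_j)^2 / \rho^2)$.

```r
# Hypothetical helper: squared-exponential kernel between two sets of inputs
kernel_sq_exp <- function(x1, x2, alpha = 1, rho = 0.3) {
  sq_dist <- outer(x1, x2, "-")^2
  alpha^2 * exp(-sq_dist / rho^2)
}

# Hypothetical helper: draw functions from a zero-mean GP prior on a grid
draw_gp_prior <- function(x, n_draws = 5, alpha = 1, rho = 0.3, jitter = 1e-6) {
  # A small jitter on the diagonal keeps the Cholesky factorisation stable
  K <- kernel_sq_exp(x, x, alpha, rho) + diag(jitter, length(x))
  L <- t(chol(K))                               # lower factor, K = L %*% t(L)
  z <- matrix(rnorm(length(x) * n_draws), nrow = length(x))
  t(L %*% z)                                    # one prior draw per row
}

x_demo <- seq(0, 1, length.out = 50)
K_demo <- kernel_sq_exp(x_demo, x_demo)
prior_draws <- draw_gp_prior(x_demo, n_draws = 5)

# Each line is one function drawn from the prior
matplot(x_demo, t(prior_draws), type = "l", lty = 1, xlab = "x", ylab = "g(x)")
```

To try a different kernel, replace the body of `kernel_sq_exp` and check that the resulting matrix is still symmetric and positive definite.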


### Acquisition functions
-> if I create a figure, put the code in the template such that participants can try
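
To make the idea of an acquisition function concrete, here is a minimal sketch of expected improvement (EI) for minimisation. EI is one common choice and not necessarily the one used later in this tutorial. The function name `expected_improvement` and the toy inputs are hypothetical: `mu` and `sd` stand for the surrogate's posterior mean and standard deviation at a set of candidate points, and `f_best` is the smallest function value observed so far.

```r
# Expected improvement for minimisation, given the posterior mean and sd
# of the surrogate at each candidate point
expected_improvement <- function(mu, sd, f_best) {
  improvement <- f_best - mu
  z <- improvement / sd
  ei <- improvement * pnorm(z) + sd * dnorm(z)
  # Without posterior uncertainty, EI reduces to the plain improvement (if any)
  ei[sd <= 0] <- pmax(improvement[sd <= 0], 0)
  ei
}

# Toy inputs (placeholders, not results from a fitted model)
mu_demo <- c(0.0, -0.5, 0.2)
sd_demo <- c(0.1, 0.5, 0.0)
ei_demo <- expected_improvement(mu_demo, sd_demo, f_best = -0.3)

# The next query is the candidate with the largest expected improvement
next_idx <- which.max(ei_demo)
```

EI trades off exploitation (a low posterior mean) against exploration (a large posterior standard deviation), so a candidate can score highly for either reason.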


### (?) Computational Tricks for GPs 

- mention GP tricks: Kronecker, HSGP
- Thompson sampling
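
Thompson sampling can be sketched very compactly once posterior draws of the surrogate are available on a grid of candidate points: pick one posterior draw at random and query the point where that draw attains its minimum. In the sketch below, `thompson_next_x` is a hypothetical helper, and `toy_draws` is simulated placeholder data standing in for posterior draws of `g` (one draw per row), not output from a fitted Stan model.

```r
# Thompson sampling step for minimisation:
# choose one posterior draw at random and return the minimiser of that draw
thompson_next_x <- function(posterior_draws, x_candidates) {
  stopifnot(ncol(posterior_draws) == length(x_candidates))
  draw_idx <- sample.int(nrow(posterior_draws), size = 1)
  x_candidates[which.min(posterior_draws[draw_idx, ])]
}

set.seed(1)
x_cand <- seq(0, 1, length.out = 11)

# Placeholder "posterior draws": noisy parabolas with a minimum between 0.6 and 0.8
toy_draws <- t(replicate(20, (x_cand - runif(1, 0.6, 0.8))^2 + rnorm(11, sd = 0.01)))

x_next <- thompson_next_x(toy_draws, x_cand)
```

Because each step uses a different random draw, repeated calls explore different plausible minimisers in proportion to how often they are optimal under the posterior.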

### Cost- and response propensity-aware BO 

#### Varying cost of queries  

#### Propensity of response 

-> possibly provide samples via GitHub & then ask to visualise 
-> Aalto file sharing 

## Wrap-up 

Same function as in the icebreaker. Your turn: query function evaluations from us and find the maximum of the function, but this time use the science you've learnt.

To illustrate the different steps of Bayesian optimisation, assume that the unknown function is $f(x) = (6x - 2)^2 \sin(12x - 4)$, also known as the Forrester function.

In [None]:
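# Forrester function, used here as the "unknown" objective
true_f <- function(x) (6 * x - 2)^2 * sin(12 * x - 4)

x_grid <- seq(0, 1, length.out = 100)
f_evals <- true_f(x_grid)
data_plot <- data.frame(x_grid, f_evals)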

# Plot the objective function 
plot <- ggplot(data = data_plot, aes(x = x_grid, y = f_evals)) +
  geom_line() +
  labs(y = "f(x)") +
  theme_bw()

plot

We can set up a GP surrogate for $f(x)$ as follows: 

$$
\begin{aligned}
y &\sim \text{N}(g(x), \sigma) \ \text{with} \ \sigma \sim \text{N}^+(0,1),\\
\\
g(x) &\sim GP(\mu, K),  \text{with} \ \mu \sim \text{N}(0,1),\\ 
K_{i,j} &= k (x_i, x_j) = \alpha^2  \exp \left(- \frac{(x_i - x_j)^2}{\rho^2} \right),\\
\\
\alpha &\sim \text{N}^+(0,1),\\ 
\rho &\sim \text{N}(0.3,0.1).
\end{aligned}
$$

In Stan, we can implement the model like this: 

In [None]:
cat(readLines(here::here("Stan", "fit_gauss_3.stan")), sep = "\n")

### Prior predictive checks 

Before we have collected any data, we can sample from our priors and check the predictions we would obtain. 

In [None]:
# Get samples using chosen priors 

model_sim <- cmdstanr::cmdstan_model(stan_file = here::here("Stan", "sim_gauss.stan"))

n_draws <- 15
samples <- matrix(NA, nrow = n_draws, ncol=length(x_grid))

for (i in 1:n_draws){
    # 1. create data input, sample from the chosen priors for fit_gauss_3.stan
    stan_dat <- list(N = length(x_grid),
                     x = x_grid,
                     alpha = abs(rnorm(1)),
                     rho = rnorm(1, 0.3, 0.1),
                     mu = rnorm(1),
                     sigma = abs(rnorm(1, 0, 1))) # half-normal N+(0, 1), as in the model above

    # 2. sample from model_sim using one chain and one iteration, no warmup
    gp_priors <- model_sim$sample(data = stan_dat,
                                  seed = 424242 + i, # vary the seed so each draw uses a different random stream
                                  iter_sampling = 1, 
                                  iter_warmup = 0, 
                                  chains = 1,
                                  adapt_engaged=FALSE) # to sample without warmup
    # 3. extract corresponding samples
    samples[i,] <- gp_priors$draws("g") 
}

In [None]:
# Visualise the prior predictive results 


# Extract and reformat samples
data <- data.frame(samples[,1:NCOL(samples)])
colnames(data) <- sub("^X", "", colnames(data))

data <- data |>
  mutate(draw_id = as.factor(row_number())) |>
  pivot_longer(cols = -draw_id, names_to = "n_evals", values_to = "evaluations") |>
  mutate(n_evals = as.integer(n_evals)) |>
  nest(data = c(draw_id, evaluations)) |>
  mutate(x_grid = x_grid) |>
  unnest(cols = c(data))

# Plot prior predictives 
plot_prior_pred <- plot +
  # alpha is a fixed value, so it is set outside aes() rather than mapped
  geom_line(data = data, aes(x = x_grid, y = evaluations, group = draw_id, color = draw_id), alpha = 0.4) + 
  scale_colour_discreterainbow() + 
  theme(legend.position = "none")

plot_prior_pred