# Variational Laplace and Sampling

This demo illustrates sampling-based and Laplace-approximated message passing, two techniques that generalize ForneyLab beyond analytical message updates. It shows the versatility of these approaches, especially when they are combined, in a number of example applications:

1. Inference in non-conjugate models (univariate)
2. Inference in non-conjugate models (multivariate)
3. Sampling-based approximations of the posterior
4. Hierarchical models and nonlinear functions
5. Hybrid models
6. Nonlinear functions with multiple arguments


## 1. Inference in non-conjugate models (univariate)

In the first example, we assume an observation $y$ that is drawn from a Gaussian distribution whose mean $m$ is known to be positive. We are interested in inferring a posterior belief over the mean. In this model, we enforce the positivity of the mean through a non-conjugate Gamma prior.

We approximate the posterior belief with a Gaussian through a Laplace approximation. ForneyLab fully automates this procedure as a gradient ascent on the log-pdf of the posterior, where the gradients are obtained by automatic differentiation. Step sizes are also adjusted automatically according to the goodness of the fit.

In [1]:
using ForneyLab, LinearAlgebra

# Build a non-conjugate model
g = FactorGraph()

@RV m ~ Gamma(0.5, 0.2) # Choose a Gamma prior for the mean
@RV y ~ GaussianMeanVariance(m, 1.0)

placeholder(y, :y)
;

In [2]:
# Construct an algorithm that infers a belief for the mean
algo = sumProductAlgorithm(m)
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code));
# println(source_code) # Uncomment to inspect source code

In [3]:
# Execute the algorithm
data = Dict(:y => 2.2)
marginals = step!(data)
marginals[:m] # Inspect the resulting belief

𝒩(m=1.71, w=0.83)
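
As a rough check on the result above, the Laplace procedure can be sketched in plain Julia without ForneyLab. This sketch assumes that `Gamma(0.5, 0.2)` is parameterized by shape and rate, replaces automatic differentiation with hand-derived derivatives, and uses a fixed step size instead of an adaptive one. The names `laplace_mode`, `grad` and `hess` are illustrative and are not part of ForneyLab.

```julia
# Assumed model: Gamma(shape a, rate b) prior on m, unit-variance Gaussian likelihood
a, b, y = 0.5, 0.2, 2.2

# Unnormalized log-posterior and its first and second derivatives (derived by hand)
log_post(m) = (a - 1) * log(m) - b * m - 0.5 * (y - m)^2
grad(m) = (a - 1) / m - b + (y - m)
hess(m) = -(a - 1) / m^2 - 1.0

# Plain gradient ascent with a fixed step size (ForneyLab adapts the step size instead)
function laplace_mode(gradient_fn, m0; step=0.1, n_its=500)
    m = m0
    for _ in 1:n_its
        m += step * gradient_fn(m)
    end
    return m
end

m_hat = laplace_mode(grad, 1.0)   # mode of the log-posterior: Gaussian mean
w_hat = -hess(m_hat)              # negative curvature at the mode: Gaussian precision

println("mode = ", round(m_hat, digits=2), ", precision = ", round(w_hat, digits=2))
```

Under these assumptions, the mode and the negative curvature at the mode agree with the mean and precision of the belief printed above.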


## 2. Inference in non-conjugate models (multivariate)

The above example can also be extended to a multivariate model. Here we consider a three-dimensional Gaussian observation $y$, for which the elements of the mean $m$ must sum to unity. This constraint can be enforced by a Dirichlet prior on the mean.

In [4]:
# Build a non-conjugate model
g = FactorGraph()

@RV m ~ Dirichlet([2.0, 1.0, 3.4])
@RV y ~ GaussianMeanVariance(m, diageye(3))

placeholder(y, :y, dims=(3,))
;

In [5]:
# Infer an algorithm
algo = sumProductAlgorithm(m)
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code));
# println(source_code) # Uncomment to inspect source code

In [6]:
# Execute the algorithm
data = Dict(:y => [3.2, 3.2, 3.2])
marginals = step!(data)
marginals[:m] # Inspect the result

𝒩(m=[3.49, 3.20, 3.83], w=[[1.08, -0.00, -0.00][-0.00, 1.00, -0.00][-0.00, -0.00, 1.16]])
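
The multivariate case follows the same recipe, with a gradient vector and a Hessian matrix. The sketch below is a hand-written illustration, not ForneyLab's implementation: it assumes the unnormalized Dirichlet log-density is evaluated on an unconstrained positive vector, and the helper names `ascend`, `grad` and `hess` are hypothetical.

```julia
using LinearAlgebra

a = [2.0, 1.0, 3.4]   # Dirichlet concentration parameters
y = [3.2, 3.2, 3.2]   # observation with identity covariance

# Derivatives of the unnormalized log-posterior
#   sum((a .- 1) .* log.(m)) - 0.5 * sum((y .- m) .^ 2)
grad(m) = (a .- 1) ./ m .+ (y .- m)
hess(m) = Diagonal(-(a .- 1) ./ m .^ 2 .- 1.0)

# Fixed-step gradient ascent on the vector m
function ascend(gradient_fn, m0; step=0.1, n_its=500)
    m = copy(m0)
    for _ in 1:n_its
        m .+= step .* gradient_fn(m)
    end
    return m
end

m_hat = ascend(grad, y)   # mode: Gaussian mean
W_hat = -hess(m_hat)      # negative Hessian at the mode: Gaussian precision matrix

println("mode = ", round.(m_hat, digits=2))
println("precision diagonal = ", round.(diag(W_hat), digits=2))
```

Because the log-posterior in this sketch separates over the elements of `m`, the Hessian is diagonal, which is consistent with the (numerically) diagonal precision matrix printed above.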


## 3. Sampling-based approximations of the posterior

The above examples automatically approximated the posterior belief with a Gaussian through a Laplace approximation. However, a Gaussian belief might not always be the right choice of approximation, and often more flexibility is required.

In this example, we use importance sampling to approximate the posterior belief with a set of samples and corresponding weights. Here, one message is designated as the sampling distribution, and the other message is used to determine the importance weights.

We again consider a non-conjugate model, as defined below.

In [7]:
# Define a model
g = FactorGraph()

@RV l ~ Beta(2.0, 5.0)
@RV y ~ Poisson(l)

placeholder(y, :y)
;

In [8]:
# Infer an algorithm
algo = sumProductAlgorithm(l)
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code));
# println(source_code) # Uncomment to inspect source code

In [9]:
# Execute the algorithm
data = Dict(:y => 7.0)
marginals = step!(data)
marg_l = marginals[:l]
println("The marginal for l is a $(typeof(marg_l)) with mean $(round(mean(marg_l), digits=3)) and variance $(round(var(marg_l), digits=3))")

The marginal for l is a ProbabilityDistribution{Univariate,SampleList} with mean 0.626 and variance 0.018
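
The importance sampling step described in this section can be sketched in plain Julia. The sketch assumes that the Beta prior acts as the sampling distribution and that the Poisson likelihood supplies the importance weights; which message ForneyLab actually designates as the proposal is not stated here. The sample size and seed are arbitrary choices.

```julia
using Random

rng = MersenneTwister(1)
n_samples = 100_000
y = 7

# Proposal: the Beta(2, 5) prior. For integer parameters, the 2nd smallest
# of 6 independent uniforms is Beta(2, 5) distributed (order-statistic identity).
samples = [sort(rand(rng, 6))[2] for _ in 1:n_samples]

# Importance weights: Poisson likelihood of y given each sample.
# The factor 1/y! is constant and cancels in the normalization.
log_w = y .* log.(samples) .- samples
w = exp.(log_w .- maximum(log_w))   # subtract the maximum for numerical stability
w ./= sum(w)                        # self-normalize the weights

# Weighted estimates of the posterior mean and variance
post_mean = sum(w .* samples)
post_var = sum(w .* (samples .- post_mean) .^ 2)

println("mean ≈ ", round(post_mean, digits=3), ", variance ≈ ", round(post_var, digits=3))
```

The weighted samples play the role of the `SampleList` marginal above; the estimates vary with the seed and the number of samples.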


## 4. Hierarchical models and nonlinear functions

A hierarchical model is a model where higher-layer beliefs constrain the statistics of lower-layer beliefs. In this example we consider a model where a top-level variable $z$ constrains the precision of a lower-level variable $x$, which in turn controls the mean of an observed variable $y$. We are interested in obtaining posterior beliefs for both hierarchical layers.

In [10]:
# Build a hierarchical, nonlinear model
g = FactorGraph()

f(z) = exp(-z) # Nonlinear mapping between layers

@RV z ~ GaussianMeanVariance(0.0, 1.0) # Higher layer
@RV w ~ Nonlinear{Sampling}(z, g=f, n_samples=1000) # Connect layers
@RV x ~ GaussianMeanPrecision(0.0, w) # Lower layer with controlled precision
@RV y ~ GaussianMeanPrecision(x, 1.0) # Observation model

placeholder(y, :y)
;

The nonlinear mapping between layers renders belief propagation intractable, even in this simple model. However, messages can still be computed through an importance sampling procedure. Samples from the higher layer are first passed through the nonlinearity, which yields transformed samples with accompanying weights. These weights are then adjusted by the message from the lower layer. Finally, a Laplace approximation again ensures that the resulting beliefs are Gaussian.
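
A minimal sketch of the sample-and-reweight idea, under stated assumptions: samples of $z$ are drawn from its Gaussian prior, pushed through `f`, and reweighted by a backward message towards the precision $w$. The backward message used here is the standard variational form for a zero-mean Gaussian with precision $w$, and the second moment `E_x2` is a hypothetical placeholder value; neither is taken from ForneyLab's generated code.

```julia
using Random

rng = MersenneTwister(42)
n_samples = 1000
f(z) = exp(-z)                    # nonlinear mapping between layers

# Forward pass: sample the higher layer and transform the samples
z_samples = randn(rng, n_samples)              # draws from N(0, 1)
w_samples = f.(z_samples)                      # transformed samples (precisions)
prior_weights = fill(1 / n_samples, n_samples) # uniform weights before adjustment

# Backward pass (assumption): log-message towards w, up to a constant,
# given a hypothetical second moment E[x^2] of the lower layer
E_x2 = 1.0
log_backward(w) = 0.5 * log(w) - 0.5 * w * E_x2

# Importance adjustment: multiply the weights by the backward message and renormalize
log_wt = log.(prior_weights) .+ log_backward.(w_samples)
post_weights = exp.(log_wt .- maximum(log_wt))
post_weights = post_weights ./ sum(post_weights)

# Weighted moments of z, which a Gaussian approximation could then be fitted to
z_mean = sum(post_weights .* z_samples)
z_var = sum(post_weights .* (z_samples .- z_mean) .^ 2)
println("weighted mean of z ≈ ", round(z_mean, digits=2))
```

In the demo itself, the Gaussian belief over $z$ is obtained by a Laplace approximation rather than by the weighted moments computed at the end of this sketch.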

In [11]:
# Infer an algorithm
q = PosteriorFactorization(z, x, ids=[:Z, :X])
algo = variationalAlgorithm(q, free_energy=true)
source_code = algorithmSourceCode(algo, free_energy=true)
eval(Meta.parse(source_code));
# println(source_code) # Uncomment to inspect source code

In [12]:
# Execute algorithm
n_its = 5
marginals = Dict()
F = zeros(n_its)
data = Dict(:y => 1.4)

marginals[:z] = ProbabilityDistribution(Univariate, GaussianMeanVariance, m=0.0, v=1.0)
marginals[:x] = ProbabilityDistribution(Univariate, GaussianMeanVariance, m=0.0, v=1.0)
marginals[:w] = vague(SampleList)

for i = 1:n_its
    stepX!(data, marginals)
    stepZ!(data, marginals)
    
    F[i] = freeEnergy(data, marginals)
end

In [13]:
marginals[:z] # Inspect higher-layer belief

𝒩(m=-0.08, w=1.42)


In [14]:
marginals[:x] # Inspect lower-layer belief

𝒩(xi=1.40, w=2.34)


In [15]:
print("Free energy per iteration: ", join(round.(F, digits=3), ", ")) # Inspect free energy

Free energy per iteration: 2.041, 1.951, 1.945, 1.949, 1.946

Because the sampling procedure is stochastic, the free energy is not guaranteed to decrease on every iteration.

## 5. Hybrid models

The nonlinear mapping allows us to define customized relations between random variables that may include conditional statements, loops, etc. We can then build highly flexible models, as exemplified below.

In [16]:
# Build a model
g = FactorGraph()

# Nonlinear mapping with included control flow
function f(x)
    if x[1] == 1
        w = 0.1
    elseif x[2] == 1
        w = 1
    elseif x[3] == 1
        w = 10
    end
    
    return w
end

@RV z ~ Dirichlet([2.1, 4.4, 3.2])
@RV x ~ Categorical(z)
@RV w ~ Nonlinear{Sampling}(x, g=f, n_samples=1000)
@RV y ~ GaussianMeanPrecision(0, w)

placeholder(y, :y)
;

In [17]:
# Infer algorithm
q = PosteriorFactorization(z, x, ids=[:Z, :X])
algo = variationalAlgorithm(q, free_energy=true)
source_code = algorithmSourceCode(algo, free_energy=true)
eval(Meta.parse(source_code));
# println(source_code) # Uncomment to inspect source code

In [18]:
# Execute algorithm
n_its = 5
marginals = Dict()
F = zeros(n_its)
data = Dict(:y => 1.4)

marginals[:z] = ProbabilityDistribution(Dirichlet, a=[1.0, 1.0, 1.0])
marginals[:x] = ProbabilityDistribution(Categorical, p=[0.3, 0.3, 0.4])
marginals[:w] = vague(SampleList)

for i = 1:n_its
    stepZ!(data, marginals)
    stepX!(data, marginals)
    
    F[i] = freeEnergy(data, marginals)
end

In [19]:
marginals[:z] # Inspect higher-level belief

Dir(a=[2.32, 5.18, 3.20])


In [20]:
# Inspect lower-level belief
marg_x = marginals[:x]
println("The marginal for x is a $(typeof(marg_x)) with mean vector entries\n$(mean(marg_x))")

The marginal for x is a ProbabilityDistribution{Univariate,SampleList} with mean vector entries
  [1]  =  0.233642
  [2]  =  0.766144
  [3]  =  0.000213632


In [21]:
print("Free energy per iteration: ", join(round.(F, digits=3), ", ")) # Inspect free energy

Free energy per iteration: 2.477, 2.455, 2.456, 2.422, 2.454

During the free energy calculation, the differential entropy of the belief for $x$ is required. Because this belief is represented by a sample list (see above), the differential entropy is computed by Monte Carlo summation.
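
To illustrate what Monte Carlo summation means for an entropy term, the sketch below estimates the differential entropy $-\mathbb{E}[\log q(x)]$ of a Gaussian from samples and compares it with the closed-form value. The Gaussian is only a stand-in chosen because its entropy is known exactly; this is not the procedure ForneyLab applies to a `SampleList`, and all names are illustrative.

```julia
using Random

rng = MersenneTwister(0)
m, v = 0.0, 2.0                                   # mean and variance of the stand-in belief
log_q(x) = -0.5 * log(2π * v) - 0.5 * (x - m)^2 / v

# Monte Carlo summation: average -log q over samples drawn from q itself
samples = m .+ sqrt(v) .* randn(rng, 100_000)
H_mc = -sum(log_q.(samples)) / length(samples)

# Closed-form differential entropy of a Gaussian, for comparison
H_exact = 0.5 * (log(2π * v) + 1)

println("Monte Carlo: ", round(H_mc, digits=3), ", exact: ", round(H_exact, digits=3))
```

The estimate fluctuates with the samples, which is one reason the free energy values above do not change monotonically.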

## 6. Nonlinear functions with multiple arguments

Lastly, we introduce an inference mechanism for nonlinear functions with multiple arguments. A joint Gaussian belief over the input arguments is obtained through a Laplace approximation. Backward Gaussian messages towards the individual input arguments are then computed by marginalizing this joint belief and making use of the incoming messages. A simple example is given below.

In [22]:
# Build model
g = FactorGraph()

f(x, z) = x^2 + z^2

@RV x ~ GaussianMeanVariance(1.0, 2.0)
@RV z ~ GaussianMeanVariance(2.0, 1.0)
@RV m ~ Nonlinear{Sampling}(x, z, g=f, n_samples=1000)
@RV y ~ GaussianMeanVariance(m, 1.0)

placeholder(y, :y)
;

In [23]:
# Infer algorithm
algo = sumProductAlgorithm([x, z, m])
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code));
# println(source_code) # Uncomment to inspect source code

In [24]:
# Execute algorithm
data = Dict(:y => 4.2)
marginals = step!(data)
;

In [25]:
marginals[:x] # Inspect results

𝒩(xi=0.91, w=0.99)


In [26]:
marginals[:z]

𝒩(xi=6.75, w=3.63)


In [27]:
marg_m = marginals[:m]
println("The marginal for m is a $(typeof(marg_m)) with mean $(round(mean(marg_m), digits=3)) and variance $(round(var(marg_m), digits=3))")

The marginal for m is a ProbabilityDistribution{Univariate,SampleList} with mean 4.194 and variance 0.94
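
The joint Laplace approximation and the marginalization step described in this section can be sketched by hand for this model. The sketch assumes a fixed-step gradient ascent on the unnormalized log-joint of $(x, z)$ with hand-derived derivatives; it yields the marginals of the joint Gaussian belief, not ForneyLab's messages, so the numbers need not coincide with the outputs above. The helper names are hypothetical.

```julia
using LinearAlgebra

y = 4.2
# Unnormalized log-joint of u = [x, z]:
#   priors N(x | mean 1, variance 2) and N(z | mean 2, variance 1),
#   likelihood N(y | x^2 + z^2, variance 1)
resid(u) = y - u[1]^2 - u[2]^2
grad(u) = [-(u[1] - 1.0) / 2.0 + 2 * u[1] * resid(u),
           -(u[2] - 2.0) / 1.0 + 2 * u[2] * resid(u)]
function hess(u)
    hxx = -0.5 + 2 * resid(u) - 4 * u[1]^2
    hzz = -1.0 + 2 * resid(u) - 4 * u[2]^2
    hxz = -4 * u[1] * u[2]
    return [hxx hxz; hxz hzz]
end

# Fixed-step gradient ascent, started at the prior means
function ascend(gradient_fn, u0; step=0.01, n_its=5000)
    u = copy(u0)
    for _ in 1:n_its
        u .+= step .* gradient_fn(u)
    end
    return u
end

u_hat = ascend(grad, [1.0, 2.0])   # joint mode: mean of the joint Gaussian
Σ = inv(-hess(u_hat))              # covariance: inverse of the negative Hessian

# Marginalizing a joint Gaussian keeps the matching mean entry and diagonal covariance entry
println("x: mean ", round(u_hat[1], digits=2), ", variance ", round(Σ[1, 1], digits=2))
println("z: mean ", round(u_hat[2], digits=2), ", variance ", round(Σ[2, 2], digits=2))
```

The off-diagonal entry of `Σ` captures the dependence between $x$ and $z$ induced by the observation, which is lost once each argument is marginalized separately.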
