# Classification Using Gaussian Processes

This Turing tutorial is ported from the Edward tutorial on the same topic: http://edwardlib.org/tutorials/supervised-classification

In supervised learning, the task is to infer hidden structure from
labeled data, consisting of training examples $\{(x_n, y_n)\}$.
Classification means the output $y$ takes discrete values.


```julia
# Import Turing and Distributions.
using Turing, Distributions

# Import RDatasets.
using RDatasets

# StatsFuns provides the logit function.
using StatsFuns: logit

# LinearAlgebra provides the Cholesky decomposition.
using LinearAlgebra

# MLKernels provides kernel functions such as the radial basis kernel.
using MLKernels

# Import MCMCChain, Plots, and StatsPlots for visualizations and diagnostics.
using MCMCChain, Plots, StatsPlots
```

## Data

We use the
[crabs data set](https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/crabs.html) from the RDatasets package,
which consists of morphological measurements on a crab species. We
are interested in predicting whether a given crab has the color form
blue (encoded as 0) or orange (encoded as 1). As features we use the five
numeric morphological measurements in the dataset (FL, RW, CL, CW and BD).

```julia
data = dataset("MASS", "crabs")
first(data, 6)
```

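| Row | Sp | Sex | Index | FL | RW | CL | CW | BD |
|-----|----|-----|-------|----|----|----|----|----|
| | Categorical… | Categorical… | Int32 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | B | M | 1 | 8.1 | 6.7 | 16.1 | 19.0 | 7.0 |
| 2 | B | M | 2 | 8.8 | 7.7 | 18.1 | 20.8 | 7.4 |
| 3 | B | M | 3 | 9.2 | 7.8 | 19.0 | 22.4 | 7.7 |
| 4 | B | M | 4 | 9.6 | 7.9 | 20.1 | 23.1 | 8.2 |
| 5 | B | M | 5 | 9.8 | 8.0 | 20.3 | 23.0 | 8.2 |
| 6 | B | M | 6 | 10.8 | 9.0 | 23.0 | 26.5 | 9.8 |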


```julia
size(data)
```

```
(200, 8)
```

```julia
# Create the target column: 0.0 for blue ("B") crabs, 1.0 for orange ("O") crabs.
data.Colour = [sp == "O" ? 1.0 : 0.0 for sp in data.Sp]
```

```julia
# The first 100 rows of the crabs data set are blue crabs and the last 100 are
# orange, so we take every other row to keep both classes in the training data.

# Create our labels. These are the values we are trying to predict.
y_train = data[1:2:end, :Colour]

# The numeric morphological measurements used as features.
feature_names = [:FL, :RW, :CL, :CW, :BD]

# Build a concretely typed Float64 feature matrix (an abstractly typed
# Matrix{Real} is slow in Julia).
X_train = Matrix{Float64}(data[1:2:end, feature_names])

N, D = size(X_train)

println("Number of data points: ", N)
println("Number of features: ", D)
```

```
Number of data points: 100
Number of features: 5
```
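To see why taking every other row matters, here is a small illustration on hypothetical labels that only mimic the row order of the crabs data (100 blue crabs followed by 100 orange crabs); it does not load the real data set. Slicing with `1:2:end` yields a balanced set, while the first 100 rows alone would contain a single class.

```julia
# Hypothetical labels mimicking the row order of the crabs data:
# 100 blue crabs (0.0) followed by 100 orange crabs (1.0).
labels = vcat(zeros(100), ones(100))

# Every other row: indices 1, 3, 5, ..., 199.
alternate = labels[1:2:end]

# The first 100 rows, for comparison.
first_half = labels[1:100]

n_blue = count(iszero, alternate)
n_orange = count(isone, alternate)
println("every other row: ", n_blue, " blue, ", n_orange, " orange")  # every other row: 50 blue, 50 orange
println("first 100 rows: ", count(isone, first_half), " orange")      # first 100 rows: 0 orange
```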

## Model

A Gaussian process is a powerful object for modeling nonlinear
relationships between pairs of random variables. It defines a distribution over
(possibly nonlinear) functions, which can be used to represent
our uncertainty about the true functional relationship.
Here we define a Gaussian process model for classification
(Rasmussen & Williams, 2006).

Formally, a distribution over functions $f:\mathbb{R}^D\to\mathbb{R}$ can be specified
by a Gaussian process
$$
\begin{align*}
  p(f)
  &=
  \mathcal{GP}(f\mid \mathbf{0}, k(\mathbf{x}, \mathbf{x}^\prime)),
\end{align*}
$$
whose mean function is the zero function and whose covariance
function is a kernel $k$ that describes the dependence between
any pair of inputs to the function.

Given a set of input-output pairs
$\{\mathbf{x}_n\in\mathbb{R}^D,y_n\in\mathbb{R}\}$,
the likelihood can be written as a multivariate normal

$$
\begin{align*}
  p(\mathbf{y})
  &=
  \text{Normal}(\mathbf{y} \mid \mathbf{0}, \mathbf{K})
\end{align*}
$$

where $\mathbf{K}$ is a covariance matrix given by evaluating
$k(\mathbf{x}_n, \mathbf{x}_m)$ for each pair of inputs in the data
set.
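To make the definition of $\mathbf{K}$ concrete, here is a minimal sketch that evaluates a squared exponential kernel $k(\mathbf{a}, \mathbf{b}) = \sigma^2 \exp(-\lVert \mathbf{a} - \mathbf{b} \rVert^2 / (2\ell^2))$ over every pair of rows of an input matrix. The kernel choice, the helper names `sq_exp_kernel` and `kernel_matrix`, and the default `lengthscale` and `variance` values are assumptions for illustration.

```julia
using LinearAlgebra

# Squared exponential (RBF) kernel between two input vectors a and b.
# `lengthscale` and `variance` are hypothetical defaults for illustration.
sq_exp_kernel(a, b; lengthscale = 1.0, variance = 1.0) =
    variance * exp(-sum(abs2, a .- b) / (2 * lengthscale^2))

# Evaluate k(x_n, x_m) for every pair of rows of X, giving the N×N matrix K.
function kernel_matrix(X::AbstractMatrix; kwargs...)
    N = size(X, 1)
    K = Matrix{Float64}(undef, N, N)
    for m in 1:N, n in 1:N
        K[n, m] = sq_exp_kernel(X[n, :], X[m, :]; kwargs...)
    end
    return K
end

X = [0.0 0.0;
     1.0 0.0;
     3.0 4.0]          # three inputs in R^2, one per row
K = kernel_matrix(X)
```

Nearby inputs (rows 1 and 2) get a covariance close to the variance, while distant inputs (rows 1 and 3) get a covariance close to zero; this is what lets the process share information between neighbouring points.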

The above applies directly to regression, where $\mathbf{y}$ is a
real-valued response, but not to (binary) classification, where each $y_n$
is a label in $\{0,1\}$. To deal with classification, we interpret the
response as latent variables that are squashed into $[0,1]$. We then
draw each label from a Bernoulli distribution whose probability is given
by the squashed value.
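The generative story for the labels can be sketched in a few lines. This is an illustration only: the latent values `z` are hypothetical numbers standing in for a draw from the Gaussian process, and the logistic function is used as the squashing function, matching the inverse logit in the likelihood.

```julia
using Random

# Logistic (inverse-logit) function: maps a real number into (0, 1).
logistic(z) = 1 / (1 + exp(-z))

Random.seed!(1)

# Hypothetical latent values z_n; in the model these are draws from the GP prior.
z = [-3.0, -0.5, 0.0, 0.5, 3.0]

# Squash each latent value into (0, 1) to get a class-1 probability ...
p = logistic.(z)

# ... then draw each label from a Bernoulli distribution with that probability.
y = [rand() < pn ? 1 : 0 for pn in p]
```

A latent value of 0 gives probability 0.5; large positive values make label 1 almost certain and large negative values make label 0 almost certain.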

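Define the likelihood of an observation $(\mathbf{x}_n, y_n)$ as

$$
\begin{align*}
  p(y_n \mid \mathbf{z}, \mathbf{x}_n)
  &=
  \text{Bernoulli}(y_n \mid \text{logit}^{-1}(z_n)),
\end{align*}
$$

where $z_n$ is the $n$-th entry of the latent vector $\mathbf{z}$, i.e. the latent function value at input $\mathbf{x}_n$.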
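Because the likelihood factorizes over observations, its logarithm is a sum. Using $\log \text{logit}^{-1}(z) = z - \log(1 + e^{z})$ and $\log(1 - \text{logit}^{-1}(z)) = -\log(1 + e^{z})$, each term simplifies to $y_n z_n - \log(1 + e^{z_n})$, which can be evaluated without overflow. The sketch below assumes labels coded as 0/1; the function names are hypothetical.

```julia
# Numerically stable softplus: log(1 + exp(z)) without overflow for large z.
softplus(z) = max(z, zero(z)) + log1p(exp(-abs(z)))

# Bernoulli log-likelihood with success probability logit⁻¹(z_n):
#   log p(y | z) = Σₙ [ yₙ zₙ - log(1 + exp(zₙ)) ]
bernoulli_logit_loglik(y, z) = sum(y .* z .- softplus.(z))

# Direct (less stable) form for comparison: Σₙ [yₙ log pₙ + (1 - yₙ) log(1 - pₙ)].
function naive_loglik(y, z)
    p = 1 ./ (1 .+ exp.(-z))
    return sum(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))
end

y = [1, 0, 1, 0]
z = [2.0, -1.0, 0.5, 0.3]
println(bernoulli_logit_loglik(y, z), " ≈ ", naive_loglik(y, z))

# For a very confident latent value the direct form hits 0 * log(0) = NaN,
# while the stable form returns the correct value 0.0.
println(bernoulli_logit_loglik([1], [800.0]), " vs ", naive_loglik([1], [800.0]))
```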

Define the prior to be a multivariate normal

$$
\begin{align*}
  p(\mathbf{z})
  &=
  \text{Normal}(\mathbf{z} \mid \mathbf{0}, \mathbf{K}),
\end{align*}
$$

where $\mathbf{K}$ is the kernel matrix defined above, with entries $k(\mathbf{x}_n, \mathbf{x}_m)$.
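The prior density can be evaluated through a Cholesky factor $\mathbf{K} = \mathbf{L}\mathbf{L}^\top$ instead of an explicit inverse: $\log \text{Normal}(\mathbf{z} \mid \mathbf{0}, \mathbf{K}) = -\tfrac{1}{2}\lVert \mathbf{L}^{-1}\mathbf{z} \rVert^2 - \sum_i \log L_{ii} - \tfrac{N}{2}\log 2\pi$. A minimal sketch using only the standard library; the function name is hypothetical.

```julia
using LinearAlgebra

# Log density of a zero-mean multivariate normal N(z | 0, K), evaluated via
# the Cholesky factor K = L * L' instead of an explicit inverse.
function mvnormal_logpdf_zero_mean(z::AbstractVector, K::AbstractMatrix)
    C = cholesky(Symmetric(K))
    α = C.L \ z                          # α = L⁻¹ z, so z' K⁻¹ z = α' α
    logdetK = 2 * sum(log, diag(C.L))    # log det K = 2 Σᵢ log Lᵢᵢ
    return -0.5 * dot(α, α) - 0.5 * logdetK - 0.5 * length(z) * log(2π)
end

K = [2.0 0.5;
     0.5 1.0]
z = [0.3, -0.2]
mvnormal_logpdf_zero_mean(z, K)
```

Solving a triangular system and summing the log-diagonal costs less and is more stable than forming `inv(K)` and `det(K)`.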

Let's build the model in Turing. We use a radial basis function (RBF)
kernel, also known as the squared exponential or exponentiated
quadratic. It returns the kernel matrix evaluated over all pairs of
data points; we then Cholesky decompose the matrix to parameterize the
multivariate normal distribution.
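The Cholesky step can be illustrated outside Turing. This is a minimal sketch under assumptions chosen for illustration (1-D inputs on a grid, an RBF kernel with unit length scale, and a diagonal jitter of `1e-6`): if $\mathbf{v} \sim \text{Normal}(\mathbf{0}, \mathbf{I})$ and $\mathbf{L}$ is the lower Cholesky factor of $\mathbf{K}$, then $\mathbf{L}\mathbf{v} \sim \text{Normal}(\mathbf{0}, \mathbf{K})$.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(42)

# 1-D inputs on a grid; RBF kernel with unit length scale (an assumption).
xs = range(-3, stop = 3, length = 50)
K = [exp(-(a - b)^2 / 2) for a in xs, b in xs]

# Add a small jitter to the diagonal: K is positive semi-definite in theory but
# numerically close to singular, which can make the Cholesky decomposition fail.
jitter = 1e-6
L = cholesky(Symmetric(K + jitter * I)).L

# If v ~ N(0, I), then f = L * v ~ N(0, L * L') = N(0, K + jitter * I).
v = randn(length(xs))
f = L * v                      # one draw of the latent function at the inputs

# Sanity check: the empirical covariance of many draws approaches K.
F = L * randn(length(xs), 20_000)
maxdiff = maximum(abs.(cov(F; dims = 2) .- K))
```

Writing the latent vector as a deterministic transform of standard normal draws is often called a non-centred parameterization; samplers such as NUTS tend to explore it more easily than a correlated Gaussian.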

```julia
# Logistic sigmoid (inverse logit), applied elementwise to a vector of logits.
function sigmoid(z)
  return 1.0 ./ (1.0 .+ exp.(-z))
end
```

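```julia
# Squared exponential (RBF) kernel matrix over the rows of x:
#   K[i, j] = exp(-‖x_i - x_j‖² / (2ℓ²)).
# Written by hand here; the unit length scale ℓ = 1 is an assumption.
function rbf_kernel(x::AbstractMatrix; lengthscale = 1.0)
    n = size(x, 1)
    K = Matrix{Float64}(undef, n, n)
    for j in 1:n, i in 1:n
        K[i, j] = exp(-sum(abs2, x[i, :] .- x[j, :]) / (2 * lengthscale^2))
    end
    return K
end

# Gaussian Process classification
@model gp_classification(x, y, n) = begin
    # Kernel matrix over all pairs of inputs, with a small jitter on the diagonal
    # so that the Cholesky decomposition is numerically stable.
    K = rbf_kernel(x) + 1e-6 * I
    L = cholesky(Symmetric(K)).L

    # Prior logits ~ N(0, K), written via the Cholesky factor:
    # v ~ N(0, I) and logits = L * v.
    v ~ MvNormal(zeros(n), Matrix(1.0I, n, n))
    logits = L * v

    # Squash the latent values into [0, 1] to get class probabilities.
    probs = sigmoid(logits)

    # Each label is a Bernoulli draw with its own probability.
    for i = 1:n
        y[i] ~ Bernoulli(probs[i])
    end
end;
```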

## Inference

Perform approximate inference using the No-U-Turn Sampler (NUTS).

```julia
n_obs, n_vars = size(X_train)
model = gp_classification(X_train, y_train, n_obs)

# NUTS with 1500 iterations, 200 adaptation steps and target acceptance rate 0.65.
chain = sample(model, NUTS(1500, 200, 0.65));
```


```julia
plot(chain)
```


```julia
describe(chain)
```

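Once posterior draws are available, class predictions can be summarized by averaging the squashed probabilities over draws. The sketch below assumes you have already arranged posterior draws of the latent logits at the training points into a matrix (one row per draw, one column per data point); how to extract them from the chain depends on the MCMCChain version and is not shown. The numbers in `logit_samples` and `y_train` are hypothetical.

```julia
using Statistics

# Hypothetical posterior draws of the latent logits: one row per MCMC draw,
# one column per training point.
logit_samples = [ 2.1 -1.5  0.3;
                  1.8 -0.9 -0.4;
                  2.6 -1.2  0.9]
y_train = [1.0, 0.0, 1.0]

logistic(z) = 1 / (1 + exp(-z))

# Average the class-1 probability over draws (posterior predictive mean),
# rather than squashing the averaged logits.
p_mean = vec(mean(logistic.(logit_samples); dims = 1))

# Classify with a 0.5 threshold and compare with the labels.
y_pred = Float64.(p_mean .> 0.5)
accuracy = mean(y_pred .== y_train)
```

Averaging probabilities keeps the posterior uncertainty in the prediction: the third point has draws on both sides of zero, and its mean probability (about 0.56) reflects that.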