# Machine Learning Exercises 11
This set of exercises is about the Bayes classifier for generative models. LDA and QDA are both generative
models that assume Gaussian distributions for the features conditional on the class. Of course,
unless we know and use the exact distribution of the data, the classifier will only be an approximation to
the Bayes classifier.

## Useful information
**Estimated covariance matrices in LDA and QDA with multiple features.** In lectures and in
ISLwR we have seen the (bias-corrected) MLE of the variance for LDA and QDA with a single feature. Here are the
corresponding results for multiple features.  

The covariance matrices needed for QDA with multiple features are obtained for each class $k$ as
$$
\hat{\Sigma}_{k}=\frac{1}{n_{k}-1}\sum_{i:y_{i}=k}(\boldsymbol{x}_{i}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{i}-\hat{\boldsymbol{\mu}}_{k})^\text{T}
$$
Here, $\boldsymbol{x}_i$ is the feature vector for observation $i$, $\hat{\boldsymbol{\mu}}_k$ is the sample mean of the feature vectors in class $k$, and $n_k$ is the number of observations in class $k$.  

The shared covariance matrix for LDA is estimated as
$$
\hat{\Sigma}=\frac{1}{n-K}\sum_{k=1}^K\sum_{i:y_i=k}(\boldsymbol{x}_i-\hat{\boldsymbol{\mu}}_k)(\boldsymbol{x}_i-\hat{\boldsymbol{\mu}}_k)^\text{T}
$$  
where $n$ is the total number of observations and $K$ is the number of classes.
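
The two estimators above can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference implementation: the function names `class_covariances` and `pooled_covariance` and the toy data are assumptions made for the example.

```python
import numpy as np

def class_covariances(X, y):
    """Per-class (QDA) covariance estimates with the 1/(n_k - 1) correction."""
    covs = {}
    for k in np.unique(y):
        Xk = X[y == k]
        centered = Xk - Xk.mean(axis=0)          # subtract the class sample mean
        covs[k] = centered.T @ centered / (len(Xk) - 1)
    return covs

def pooled_covariance(X, y):
    """Shared (LDA) covariance estimate with the 1/(n - K) correction."""
    classes = np.unique(y)
    p = X.shape[1]
    scatter = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        centered = Xk - Xk.mean(axis=0)
        scatter += centered.T @ centered         # within-class scatter
    return scatter / (len(X) - len(classes))

# Toy data (hypothetical): 3 classes, 20 observations each, 2 features
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.repeat([0, 1, 2], 20)

qda_covs = class_covariances(X, y)
lda_cov = pooled_covariance(X, y)
```

Each per-class estimate agrees with `np.cov(X[y == k], rowvar=False)`, which uses the same $n_k-1$ denominator.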

**Multivariate normal distribution in Python.** A multivariate random variable is implemented as
`multivariate_normal` from `scipy.stats`. The pdf for the multivariate normal distribution can be found
as method `pdf`, and random samples can be drawn using method `rvs`.
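
A short sketch of both methods, assuming an arbitrary mean vector and covariance matrix chosen only for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (not taken from the exercises)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])

# A "frozen" distribution object with fixed parameters
dist = multivariate_normal(mean=mean, cov=cov)

# Density evaluated at a single point
density_at_mean = dist.pdf(mean)

# Five random draws; each row is one 2-dimensional sample
samples = dist.rvs(size=5, random_state=0)
```

`pdf` also accepts an array with one point per row and then returns one density value per point.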

## LDA decision regions
Let us warm up by sketching decision boundaries for LDA as we did in lectures.

**Exercise 1.** Below you see three examples of contour curves for a specific value of the three discriminant
functions in a three-class classification with LDA,  
$$
g_k(\boldsymbol{x})=2\log\pi_k-(\boldsymbol{x}-\boldsymbol{\mu}_k)^\text{T}\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_k),\;k=1,2,3.
$$
![image.png](attachment:image.png)

For each example, sketch the decision boundaries and decision regions for the LDA classifier as follows:  
  
a) First sketch the decision boundary between each pair of classes.  

b) Then decide the winning colour in each of the six resulting regions.  

In the third example, how would the boundaries look if the priors were instead equal?

*Solution*:  
![image.png](attachment:image.png)

## LDA – shared covariance matrix for all classes
**Exercise 2.** Imagine that you have a binary classification problem, and that you know that data comes
from a model, where class probabilities are $P(Y = \text{black}) = 0.4$ and $P(Y = \text{red}) = 0.6$ and class
conditionals are multivariate normal with parameters $\boldsymbol{\mu}_\text{black}=(2,1),\boldsymbol{\mu}_\text{red}=(4,2)$, and a shared covariance matrix $\Sigma=\begin{bmatrix}3&-1\\-1&2\end{bmatrix}$.  
  
(a) Explain how to derive the Bayes classifier for this model via two discriminant functions, where classification is to the class with the highest discriminant function value.  
*You do not have to derive the linear discriminant functions as we have done in lectures, but rather you can use directly the pdf for the multivariate normal distribution in the two discriminant functions.*  

(b) Classify a new data point with $X_1 = 3$ and $X_2 = 1$. Compute the posterior probability for each
class.  

(c) Create a plot that shows the decision regions for Bayes classifier.  

(d) Explain how you would simulate from the model.  

(e) Simulate a training set and use it to estimate the model parameters.  

(f) Visualise the decision regions for the LDA classifier trained on the training set and compare to the
decision regions for the Bayes classifier.  

(g) Simulate a test set of 1000 observations and use it to compute both the Bayes error (the minimum
possible error rate, as obtained with Bayes classifier) and the misclassification error for the LDA
classifier trained on data.

*Solution*:  
  
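a) By Bayes' theorem, the posterior probability of class $k$ is  
$$
\mathrm{P}(Y=k|X=x)=\frac{\pi_{k}f_{k}(x)}{\sum_{l=1}^{K}\pi_{l}f_{l}(x)}
$$  
where $f_k$ is the multivariate normal pdf with mean $\mu_k$ and covariance $\Sigma$. The denominator is the same for every class, so it is enough to compare the numerators: use $\pi_k f_k(x)$ (or equivalently its logarithm) as the discriminant function for class $k$ and classify to the class with the highest value. Since the model meets the LDA assumptions exactly and the parameters are known, this Bayes classifier coincides with LDA evaluated at the true parameters.  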
  
b) With $\log$ denoting the natural logarithm, the linear discriminant functions give  
$$
\begin{align*}
\delta_{k}(x)&=x^{T}\mathbf{\Sigma}^{-1}\mu_{k}-\frac{1}{2}\mu_{k}^{T}\mathbf{\Sigma}^{-1}\mu_{k}+\log \pi_{k} \\
\delta_{\text{black}}\left(\begin{bmatrix}3\\1\end{bmatrix}\right)&=\begin{bmatrix}3&1\end{bmatrix}\begin{bmatrix}3&-1\\-1&2\end{bmatrix}^{-1}\begin{bmatrix}2\\1\end{bmatrix}-\frac{1}{2}\begin{bmatrix}2&1\end{bmatrix}\begin{bmatrix}3&-1\\-1&2\end{bmatrix}^{-1}\begin{bmatrix}2\\1\end{bmatrix}+\log 0.4 \\
&=4-1.5+\log 0.4\approx 1.5837 \\
\delta_{\text{red}}\left(\begin{bmatrix}3\\1\end{bmatrix}\right)&=\begin{bmatrix}3&1\end{bmatrix}\begin{bmatrix}3&-1\\-1&2\end{bmatrix}^{-1}\begin{bmatrix}4\\2\end{bmatrix}-\frac{1}{2}\begin{bmatrix}4&2\end{bmatrix}\begin{bmatrix}3&-1\\-1&2\end{bmatrix}^{-1}\begin{bmatrix}4\\2\end{bmatrix}+\log 0.6 \\
&=8-6+\log 0.6\approx 1.4892 \\
\end{align*}
$$
Since $\delta_{\text{black}}>\delta_{\text{red}}$, the point is classified as black. Each $\delta_k(x)$ equals $\log(\pi_k f_k(x))$ up to a term that is the same for both classes, so the posteriors are $e^{\delta_k}/(e^{\delta_\text{black}}+e^{\delta_\text{red}})$:  
Posterior probability for black: 52.4%  
Posterior probability for red:   47.6%  
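
As a numerical cross-check, the posteriors can be computed both directly from $\pi_k f_k(x)$ using the `pdf` method and from the linear discriminants. The sketch below uses only the parameters given in the exercise; the variable names are choices made for this example.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Model parameters from Exercise 2
priors = {"black": 0.4, "red": 0.6}
means = {"black": np.array([2.0, 1.0]), "red": np.array([4.0, 2.0])}
Sigma = np.array([[3.0, -1.0],
                  [-1.0, 2.0]])
x = np.array([3.0, 1.0])

# Route 1: Bayes' theorem with the multivariate normal pdf
joint = {k: priors[k] * multivariate_normal(means[k], Sigma).pdf(x) for k in priors}
total = sum(joint.values())
posterior = {k: joint[k] / total for k in joint}

# Route 2: linear discriminants with the natural logarithm
Sigma_inv = np.linalg.inv(Sigma)
delta = {
    k: x @ Sigma_inv @ means[k]
    - 0.5 * means[k] @ Sigma_inv @ means[k]
    + np.log(priors[k])
    for k in priors
}
weights = {k: np.exp(delta[k]) for k in delta}
posterior_from_delta = {k: weights[k] / sum(weights.values()) for k in weights}

# Classify to the class with the highest posterior
predicted = max(posterior, key=posterior.get)
print(delta, posterior, predicted)
```

Both routes give the same posteriors because the discriminants differ from $\log(\pi_k f_k(x))$ only by a term shared by the two classes.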
  
c)  
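No plot here, but the regions are easy to describe. Setting $\delta_\text{black}(x)=\delta_\text{red}(x)$ gives the straight-line boundary $x_1+x_2=4.5+\log(2/3)\approx 4.09$: points with $x_1+x_2$ below this value are classified as black, points above it as red.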
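
One possible way to produce such a plot is to classify every point of a grid with the Bayes rule and colour the grid by the result. The following is a sketch under assumptions: the grid limits, colours, and the helper names `bayes_labels` and `plot_regions` are arbitrary choices, and plotting relies on `matplotlib` being installed.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Model parameters from Exercise 2
mu_black, mu_red = np.array([2.0, 1.0]), np.array([4.0, 2.0])
Sigma = np.array([[3.0, -1.0],
                  [-1.0, 2.0]])
pi_black, pi_red = 0.4, 0.6

def bayes_labels(points):
    """Return 0 (black) or 1 (red) for each row of `points`."""
    points = np.atleast_2d(points)
    score_black = pi_black * np.atleast_1d(multivariate_normal(mu_black, Sigma).pdf(points))
    score_red = pi_red * np.atleast_1d(multivariate_normal(mu_red, Sigma).pdf(points))
    return (score_red > score_black).astype(int)

# Grid over the feature space (limits chosen by eye, an assumption)
x1, x2 = np.meshgrid(np.linspace(-3, 9, 200), np.linspace(-4, 7, 200))
grid = np.column_stack([x1.ravel(), x2.ravel()])
labels = bayes_labels(grid).reshape(x1.shape)

def plot_regions():
    """Draw the two decision regions and mark the class means."""
    import matplotlib.pyplot as plt
    plt.contourf(x1, x2, labels, levels=[-0.5, 0.5, 1.5],
                 colors=["lightgrey", "lightcoral"])
    plt.scatter(*np.array([mu_black, mu_red]).T, c=["black", "red"], marker="x")
    plt.xlabel("$X_1$")
    plt.ylabel("$X_2$")
    plt.show()
```

Calling `plot_regions()` in a notebook cell draws the figure; the grey region is classified as black and the light red region as red.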

d)  
1. Choose randomly between red and black using the class priors.  
2. Draw the feature vector from the Gaussian class-conditional distribution $N(\boldsymbol{\mu}_k,\Sigma)$ of the chosen class $k$.  
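
The two steps can be sketched as below. The function name `simulate` and its `seed` argument are hypothetical; the parameters are those given in the exercise.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Model parameters from Exercise 2
priors = {"black": 0.4, "red": 0.6}
means = {"black": np.array([2.0, 1.0]), "red": np.array([4.0, 2.0])}
Sigma = np.array([[3.0, -1.0],
                  [-1.0, 2.0]])

def simulate(n, seed=None):
    """Draw n labelled observations (X, y) from the two-class Gaussian model."""
    rng = np.random.default_rng(seed)
    classes = list(priors)
    # Step 1: draw the class labels according to the priors
    y = rng.choice(classes, size=n, p=[priors[k] for k in classes])
    # Step 2: draw each feature vector from the Gaussian of its class
    X = np.empty((n, 2))
    for k in classes:
        idx = y == k
        draws = multivariate_normal(means[k], Sigma).rvs(size=idx.sum(), random_state=rng)
        X[idx] = np.reshape(draws, (-1, 2))
    return X, y

X, y = simulate(5000, seed=1)
```

Drawing all labels first and then filling in the features class by class avoids a Python loop over individual observations.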
  
e, f, g)  
I did that in [exercise 9](../9/Solutions_9.ipynb).  
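
For reference, a compact sketch of parts (e) and (g) is given here. It is not the solution from the linked notebook: the training set size of 200, the seed, and all function names are assumptions, and the LDA fit uses the pooled covariance estimate from the "Useful information" section rather than a library classifier.

```python
import numpy as np
from scipy.stats import multivariate_normal

# True model parameters; index 0 = black, index 1 = red
PRIORS = np.array([0.4, 0.6])
MEANS = np.array([[2.0, 1.0], [4.0, 2.0]])
SIGMA = np.array([[3.0, -1.0], [-1.0, 2.0]])

def simulate(n, rng):
    """Draw labels from the priors, then features from the class Gaussians."""
    y = rng.choice(2, size=n, p=PRIORS)
    X = np.empty((n, 2))
    for k in range(2):
        idx = y == k
        draws = multivariate_normal(MEANS[k], SIGMA).rvs(size=idx.sum(), random_state=rng)
        X[idx] = np.reshape(draws, (-1, 2))
    return X, y

def fit_lda(X, y, K=2):
    """Estimate priors, class means, and the pooled covariance (1/(n - K))."""
    priors = np.array([np.mean(y == k) for k in range(K)])
    means = np.array([X[y == k].mean(axis=0) for k in range(K)])
    centered = X - means[y]                      # subtract each row's class mean
    cov = centered.T @ centered / (len(y) - K)
    return priors, means, cov

def classify(X, priors, means, cov):
    """Assign each row to the class with the largest pi_k * f_k(x)."""
    scores = np.column_stack([
        priors[k] * multivariate_normal(means[k], cov).pdf(X)
        for k in range(len(priors))
    ])
    return scores.argmax(axis=1)

rng = np.random.default_rng(11)
X_train, y_train = simulate(200, rng)            # training size is an assumption
X_test, y_test = simulate(1000, rng)

estimates = fit_lda(X_train, y_train)

# Test-set estimate of the Bayes error (true parameters) and the LDA error
bayes_error = np.mean(classify(X_test, PRIORS, MEANS, SIGMA) != y_test)
lda_error = np.mean(classify(X_test, *estimates) != y_test)
print(f"Bayes error: {bayes_error:.3f}, LDA error: {lda_error:.3f}")
```

Both numbers are estimates based on the same 1000 test points, so on a particular test set the LDA error can come out slightly below the estimated Bayes error even though the true Bayes error is the minimum.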