# Generative Gaussian Models

We again use the *iris* dataset and solve the iris classification problem using Gaussian classifiers.

In [13]:
import numpy as np
import matplotlib.pyplot as plt
from load_dataset import loadDataSet                                #for loading the dataset
from train_validation_split import splitTrainingValidation          #for splitting the dataset into training and validation sets
from mean_covariance import vcol, vrow, compute_mu_C              #for computing the empirical mean and the empirical covariance of the dataset 

In [7]:
numFeatures = 4

#load the iris dataset
D, L = loadDataSet('iris.csv', numFeatures)
print("Data shape: ", D.shape)
print("Labels shape: ", L.shape)

Data shape:  (4, 150)
Labels shape:  (150,)


In [55]:
#split the dataset into training and validation sets
#DTR and LTR are the training data and labels, DVAL and LVAL are the evaluation (or more precisely validation) data and labels
(DTR, LTR), (DVAL, LVAL) = splitTrainingValidation(2/3, D, L)
print("Training data shape: ", DTR.shape)
print("Training labels shape: ", LTR.shape)
print("Evaluation data shape: ", DVAL.shape)
print("Evaluation labels shape: ", LVAL.shape)

Training data shape:  (4, 100)
Training labels shape:  (100,)
Evaluation data shape:  (4, 50)
Evaluation labels shape:  (50,)


We use 100 samples for training and 50 samples for evaluation.

## Multivariate Gaussian Classifier
The optimal Bayes decision is to assign each test point $\mathbf{x}_t$ to the class $c$ with the highest **posterior probability**:
$$
c_{t}^{*} = \arg\max_{c} P(C_{t} = c \mid \mathbf{X}_t = \mathbf{x}_t)
$$
We assume that the samples are independent and identically distributed (*i.i.d.*) according to $(\mathbf{X}_t, C_{t}) \sim (\mathbf{X}, C)$. <br>
Let $f_{\mathbf{X},C}$ be the joint density of $(\mathbf{X}, C)$. The joint likelihood of the hypothesized class $c$ and the observed test sample $\mathbf{x}_t$ is $f_{\mathbf{X},C}(\mathbf{x}_t, c)$, and **Bayes' rule** then gives the class posterior probability:
$$
P(C_t = c \mid \mathbf{X}_t = \mathbf{x}_t) = \frac{f_{\mathbf{X},C}(\mathbf{x}_t, c)}{\sum_{c' \in C} f_{\mathbf{X},C}(\mathbf{x}_t, c')}
$$
We can factorize the joint density as:
$$
f_{\mathbf{X},C}(\mathbf{x}_t, c) = f_{\mathbf{X} \mid C}(\mathbf{x}_t \mid c) P(c)
$$
where:
- $f_{\mathbf{X} \mid C}(\mathbf{x}_t \mid c)$ is the class-conditional density, i.e. the likelihood of class $c$ for the sample $\mathbf{x}_t$
- $P(c)$ is the *prior probability*: it is application-dependent and describes the probability of the class being $c$ **before** we observe $\mathbf{x}_t$
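
As a minimal sketch of how the factorization and Bayes' rule fit together, the snippet below turns class-conditional densities and priors into posteriors. The numbers and the names `likelihoods`, `priors`, `joint` and `posteriors` are illustrative assumptions, not values from the iris data.

```python
import numpy as np

# Hypothetical class-conditional densities f(x_t | c): 3 classes (rows),
# 2 test samples (columns). The values are made up for illustration.
likelihoods = np.array([[2.0, 0.1],
                        [1.0, 0.3],
                        [1.0, 0.6]])

# Hypothetical uniform priors P(c), stored as a column so they broadcast over samples.
priors = np.array([1/3, 1/3, 1/3]).reshape(-1, 1)

# Joint density: f(x_t, c) = f(x_t | c) * P(c)
joint = likelihoods * priors

# Bayes' rule: normalize each column by the sum of the joint over all classes.
posteriors = joint / joint.sum(axis=0, keepdims=True)

# Optimal Bayes decision: the class with the highest posterior for each sample.
predicted = posteriors.argmax(axis=0)

print(posteriors)
print(predicted)
```

With uniform priors the posterior is simply the normalized likelihood, so the first sample is assigned to class 0 and the second to class 2.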

In this specific case, we assume that our data, given the class, can be described by a **Gaussian distribution**:
$$
(\mathbf{X}_t \mid C_{t} = c) \sim (\mathbf{X} \mid C = c) \sim \mathcal{N}(\boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)
$$
If we knew $\boldsymbol{\mu}_c$ and $\boldsymbol{\Sigma}_c$, we could compute the class-conditional density as:
$$
f_{\mathbf{X} \mid C}(\mathbf{x}_t \mid c) = \mathcal{N}(\mathbf{x}_t \mid \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)
$$
The problem is that we do not know the **model parameters** $\theta = [(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1), \ldots, (\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)]$, where $k$ is the number of classes. <br>
However, we have a *labeled dataset* at our disposal, so we can estimate them from the training data under two assumptions:
- $\mathbf{X} \mid C$ follows a Gaussian distribution
- given the model parameters $\theta$, all the observed samples are *i.i.d.*

After (and only after) making these assumptions, we can plug in the **Maximum Likelihood Estimates** (*MLE*) of the parameters, which, for a multivariate Gaussian (**MVG**) distribution, are the empirical mean and covariance matrix of each class:
$$
\mu^*_{c} = \frac{1}{N_c} \sum_{i=1}^{N_c} x_{c,i}, \quad 
\Sigma^*_{c} = \frac{1}{N_c} \sum_{i=1}^{N_c} (x_{c,i} - \mu^*_c)(x_{c,i} - \mu^*_c)^T
$$
where $x_{c,i}$ is the $i$-th training sample of class $c$ and $N_c$ is the number of training samples of class $c$.
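
The implementation of `compute_mu_C` is not shown in this notebook, so the following is only a sketch of what such a helper might look like, assuming the data is stored as a `(numFeatures, numSamples)` matrix. The function name `empirical_mean_cov` and the random data are hypothetical.

```python
import numpy as np

def empirical_mean_cov(X):
    """Hypothetical helper: ML estimates of mean and covariance.

    X has shape (numFeatures, numSamples), one sample per column.
    """
    N = X.shape[1]
    mu = X.mean(axis=1).reshape(-1, 1)   # column vector of shape (numFeatures, 1)
    Xc = X - mu                          # center the data (broadcast over columns)
    C = (Xc @ Xc.T) / N                  # ML estimate divides by N, not N - 1
    return mu, C

# Synthetic data, used only to exercise the function.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 30))
mu, C = empirical_mean_cov(X)
print(mu.shape, C.shape)
```

Note that the ML estimate of the covariance divides by $N_c$, which corresponds to `np.cov(X, bias=True)` rather than NumPy's default unbiased estimate.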


In [18]:
#Compute the MLE parameters of an MVG distribution for each class: the empirical mean and covariance of that class's training data
mu_0, C_0 = compute_mu_C(DTR[:, LTR == 0])
mu_1, C_1 = compute_mu_C(DTR[:, LTR == 1])
mu_2, C_2 = compute_mu_C(DTR[:, LTR == 2])

print(f"mu_0:\n{mu_0}\nShape: {mu_0.shape}")
print(f"mu_1:\n{mu_1}\nShape: {mu_1.shape}")
print(f"mu_2:\n{mu_2}\nShape: {mu_2.shape}")
print(f"C_0:\n{C_0}\nShape: {C_0.shape}")
print(f"C_1:\n{C_1}\nShape: {C_1.shape}")
print(f"C_2:\n{C_2}\nShape: {C_2.shape}")

mu_0:
[[4.96129032]
 [3.42903226]
 [1.46451613]
 [0.2483871 ]]
Shape: (4, 1)
mu_1:
[[5.91212121]
 [2.78484848]
 [4.27272727]
 [1.33939394]]
Shape: (4, 1)
mu_2:
[[6.45555556]
 [2.92777778]
 [5.41944444]
 [1.98888889]]
Shape: (4, 1)
C_0:
[[0.13140479 0.11370447 0.02862643 0.01187305]
 [0.11370447 0.16270552 0.01844953 0.01117586]
 [0.02862643 0.01844953 0.03583767 0.00526535]
 [0.01187305 0.01117586 0.00526535 0.0108845 ]]
Shape: (4, 4)
C_1:
[[0.26470156 0.09169881 0.18366391 0.05134068]
 [0.09169881 0.10613407 0.08898072 0.04211203]
 [0.18366391 0.08898072 0.21955923 0.06289256]
 [0.05134068 0.04211203 0.06289256 0.03208448]]
Shape: (4, 4)
C_2:
[[0.30080247 0.08262346 0.18614198 0.04311728]
 [0.08262346 0.08533951 0.06279321 0.05114198]
 [0.18614198 0.06279321 0.18434414 0.04188272]
 [0.04311728 0.05114198 0.04188272 0.0804321 ]]
Shape: (4, 4)


Given the estimated model, we now turn to inference for a test sample $x$. As we
have seen, the final goal is to compute the class posterior probabilities $P(c \mid \mathbf{x})$. We split the process into three
stages:

*Stage 1*: For each sample we compute the likelihoods, i.e. the class-conditional densities:
$$
f_{X|C} (x_t | c) = \mathcal{N} (x_t | \mu^*_c, \Sigma^*_c)
$$

**Beware**: the model parameters were estimated using the *training samples*, whereas the densities are computed on the *validation samples*!
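
The density above is usually evaluated in the log domain, using $\log \mathcal{N}(x \mid \mu, \Sigma) = -\frac{M}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)$, where $M$ is the number of features. Since `logpdf_GAU_ND` is imported from a module that is not shown here, the function below is a hypothetical stand-in that sketches one possible way to compute it, not the notebook's actual implementation.

```python
import numpy as np

def log_gaussian_density(X, mu, C):
    """Hypothetical sketch of a multivariate Gaussian log-density.

    X: (numFeatures, numSamples), mu: (numFeatures, 1), C: (numFeatures, numFeatures).
    Returns an array of shape (numSamples,).
    """
    M = X.shape[0]
    Xc = X - mu
    # slogdet returns (sign, log|C|) and is more stable than log(det(C)).
    _, logdet = np.linalg.slogdet(C)
    # Solve C z = (x - mu) instead of forming the inverse of C explicitly.
    Z = np.linalg.solve(C, Xc)
    # Quadratic form (x - mu)^T C^{-1} (x - mu), one value per sample.
    quad = (Xc * Z).sum(axis=0)
    return -0.5 * (M * np.log(2 * np.pi) + logdet + quad)

# Two 2-D samples (columns) under a standard Gaussian.
X = np.array([[0.0, 1.0],
              [0.0, 0.0]])
mu = np.zeros((2, 1))
C = np.eye(2)
log_dens = log_gaussian_density(X, mu, C)
print(log_dens)
```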

In [56]:
from logpdf_loglikelihood_GAU import logpdf_GAU_ND

#For each class, compute the log-pdf of the validation data given the MLE parameters of the MVG distribution
#It's better to work with the log-pdf than with the pdf, because the pdf can be very small and cause numerical problems (underflow)
#The log-pdf is exponentiated only afterwards, when the density values themselves are needed
logpdf_0 = logpdf_GAU_ND(DVAL, mu_0, C_0)
logpdf_1 = logpdf_GAU_ND(DVAL, mu_1, C_1)
logpdf_2 = logpdf_GAU_ND(DVAL, mu_2, C_2)

print(f"logpdf_0 Shape: {logpdf_0.shape}")
print(f"logpdf_1 Shape: {logpdf_1.shape}")
print(f"logpdf_2 Shape: {logpdf_2.shape}")


logpdf_0 Shape: (50,)
logpdf_1 Shape: (50,)
logpdf_2 Shape: (50,)


In [57]:
#To obtain the pdfs, exponentiate the log-pdfs
pds_0 = np.exp(logpdf_0)
pds_1 = np.exp(logpdf_1)
pds_2 = np.exp(logpdf_2)
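
A word of caution on exponentiating: if the log-densities are very negative, `np.exp` underflows to zero and any later normalization breaks. The sketch below uses made-up log-joint values (not taken from the iris data) to show the issue, together with the usual log-sum-exp shift as one possible remedy.

```python
import numpy as np

# Hypothetical log-joint values for two classes and a single sample.
log_joint = np.array([-800.0, -801.0])

# Naive approach: both values underflow to exactly 0.0 in double precision,
# so normalizing them would give 0/0.
naive = np.exp(log_joint)

# Log-sum-exp trick: subtract the maximum before exponentiating.
m = log_joint.max()
log_norm = m + np.log(np.exp(log_joint - m).sum())

# Posteriors computed in the log domain and exponentiated only at the end.
post = np.exp(log_joint - log_norm)

print(naive)
print(post)
```

The densities seen in this notebook are still representable as doubles, so direct exponentiation works here, but the log-domain route is safer in higher dimensions.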

In [58]:
#Sum the class-conditional densities of the validation samples, separately for each class
#Note: this is a sum of pdf values, not a joint likelihood (for i.i.d. samples that would be the product of the pdfs)
likelihood_0 = np.sum(pds_0)
likelihood_1 = np.sum(pds_1)
likelihood_2 = np.sum(pds_2)
print(f"Sum of densities for class 0: {likelihood_0}")
print(f"Sum of densities for class 1: {likelihood_1}")
print(f"Sum of densities for class 2: {likelihood_2}")


Sum of densities for class 0: 92.71859585946842
Sum of densities for class 1: 21.00076276268103
Sum of densities for class 2: 7.586133742614116
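
For reference, under the i.i.d. assumption the likelihood of a whole set of samples is the product of the per-sample densities, so its logarithm is the sum of the per-sample log-densities. A minimal sketch with made-up log-density values (the array `log_dens` is hypothetical):

```python
import numpy as np

# Hypothetical per-sample log-densities under one Gaussian model.
log_dens = np.array([-1.2, -0.7, -3.1])

# Joint log-likelihood of i.i.d. samples: sum of the log-densities.
log_likelihood = log_dens.sum()

# Equivalent (but numerically fragile) computation in the linear domain.
likelihood = np.prod(np.exp(log_dens))

print(log_likelihood, likelihood)
```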


In [76]:
def scoreMatrix_LogPdf_GAU(D, params):
    """
    Compute the class-conditional densities of the data for each class and
    store them in the score matrix S.
    Each density is obtained by exponentiating the log-pdf of the corresponding Gaussian,
    so S[i, j] is the pdf (not the log-pdf) of the j-th sample given the i-th class.

    Parameters:
    - D: the data matrix of shape (numFeatures, numSamples)
    - params: the model parameters, i.e. a list of tuples (mu, C), one per class, where mu is the mean vector and C is the covariance matrix of that class

    Returned Values:
    - S: the score matrix of shape (numClasses, numSamples), where row i holds the densities of all samples under class i

    """
    numClasses = len(params) #number of classes, since for each class we have a tuple (mu, C)
    S = np.zeros((numClasses, D.shape[1]))
    for label, (mu, C) in enumerate(params):
        #logpdf_GAU_ND returns log-densities: exponentiate them to obtain densities
        S[label, :] = np.exp(logpdf_GAU_ND(D, mu, C))

    return S

In [77]:
#Compute the score matrix S of likelihoods for each class and sample (despite the variable name, the entries are densities, not log-densities)
S_logLikelihoods = scoreMatrix_LogPdf_GAU(DVAL, [(mu_0, C_0), (mu_1, C_1), (mu_2, C_2)])
print(f"Score matrix shape: {S_logLikelihoods.shape}")

Score matrix shape: (3, 50)


*Stage 2*: We multiply the class-conditional densities computed in Stage 1 by the class *prior* probabilities. In
the following we assume that the three classes have the same prior probability $P(c) = 1/3$. We can thus
compute the joint density of samples and classes as:
$$
f_{X,C}(x_t, c) = f_{X|C}(x_t | c) P_C(c)
$$


In [78]:
def computeSJoint(S, Priors):
    """
    Compute the joint densities by multiplying each row of the score matrix S by the prior of the corresponding class

    Parameters:
    - S: the score matrix of shape (numClasses, numSamples), where row i holds the class-conditional densities of all samples under class i
    - Priors: the priors of the classes, so a list of length numClasses

    Returned Values:
    - SJoint: the joint densities of shape (numClasses, numSamples), where SJoint[i, j] is the joint density of the j-th sample and the i-th class
    """
    numClasses = len(Priors) #number of classes, since we have 1 prior for each class

    #work on a copy, so that the caller's score matrix S is not modified in place
    SJoint = S.copy()
    for classIndex in range(numClasses):
        #multiply each row (where 1 row corresponds to a class) by the prior of the class
        SJoint[classIndex, :] *= Priors[classIndex]

    return SJoint

In [79]:
SJoint_MVG = computeSJoint(S_logLikelihoods, [1/3, 1/3, 1/3])
print(f"Joint densities shape: {SJoint_MVG.shape}")

SJoint_MVG_Sol = np.load("./solutions/SJoint_MVG.npy")

#Check if the joint densities are equal to the solution



Joint densities shape: (3, 50)
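
One possible way to carry out such a comparison is `np.allclose`, sketched below on synthetic arrays (the names `reference` and `candidate` are hypothetical and stand in for the loaded solution and the computed matrix). Keep in mind that the default absolute tolerance of `np.allclose` makes any pair of very small densities compare as equal, so a stricter check may set `atol=0`.

```python
import numpy as np

# Synthetic stand-ins for a reference solution and a computed result.
reference = np.array([[1.5, 2.0e-62],
                      [0.3, 4.0e-10]])
candidate = reference * (1 + 1e-12)   # tiny relative perturbation

# Element-wise comparison within default tolerances (rtol=1e-5, atol=1e-8).
same = np.allclose(candidate, reference)
max_abs_diff = np.abs(candidate - reference).max()

# Caveat: with the default atol, a tiny density is "close" to zero.
tiny_vs_zero_default = np.allclose(2.0e-62, 0.0)
tiny_vs_zero_strict = np.allclose(2.0e-62, 0.0, atol=0.0)

print(same, max_abs_diff, tiny_vs_zero_default, tiny_vs_zero_strict)
```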


In [80]:
SJoint_MVG_Sol

array([[1.58575943e+000, 1.04243514e+000, 2.72957564e-062,
        2.24806026e-182, 1.41705326e-209, 3.71159641e+000,
        2.73554357e+000, 1.77783391e+000, 1.88494475e-075,
        1.38281331e-073, 6.75473622e-001, 1.25923986e-001,
        3.27963237e-067, 2.60765569e-001, 4.86691243e-246,
        1.59659471e-034, 6.14615463e-184, 5.56896774e-073,
        1.52018824e-001, 1.47475180e-106, 5.73724955e-003,
        2.69557088e-213, 2.93005929e+000, 2.04486323e+000,
        4.46535964e-173, 3.57224919e+000, 6.01202795e-177,
        1.73618815e-061, 1.22661255e-052, 3.04373991e-037,
        5.93105804e-156, 1.88023605e-152, 5.65586443e-100,
        4.20695998e-108, 8.00340945e-001, 1.16366505e-043,
        2.36030153e-176, 1.06685624e-127, 3.61775980e+000,
        5.06643905e-078, 7.20278540e-061, 3.35571731e-106,
        1.18718880e-076, 7.59266808e-002, 1.64162425e+000,
        1.05273322e+000, 1.23817483e-141, 2.02173642e-053,
        1.05499498e-212, 3.09755432e+000],
       [6.086

In [81]:
SJoint_MVG

array([[1.58575943e+000, 1.04243514e+000, 2.72957564e-062,
        2.24806026e-182, 1.41705326e-209, 3.71159641e+000,
        2.73554357e+000, 1.77783391e+000, 1.88494475e-075,
        1.38281331e-073, 6.75473622e-001, 1.25923986e-001,
        3.27963237e-067, 2.60765569e-001, 4.86691243e-246,
        1.59659471e-034, 6.14615463e-184, 5.56896774e-073,
        1.52018824e-001, 1.47475180e-106, 5.73724955e-003,
        2.69557088e-213, 2.93005929e+000, 2.04486323e+000,
        4.46535964e-173, 3.57224919e+000, 6.01202795e-177,
        1.73618815e-061, 1.22661255e-052, 3.04373991e-037,
        5.93105804e-156, 1.88023605e-152, 5.65586443e-100,
        4.20695998e-108, 8.00340945e-001, 1.16366505e-043,
        2.36030153e-176, 1.06685624e-127, 3.61775980e+000,
        5.06643905e-078, 7.20278540e-061, 3.35571731e-106,
        1.18718880e-076, 7.59266808e-002, 1.64162425e+000,
        1.05273322e+000, 1.23817483e-141, 2.02173642e-053,
        1.05499498e-212, 3.09755432e+000],
       [6.086