# Overview:
We will be running Gaussian Discriminant Analysis (GDA) on scikit-learn's breast cancer dataset. This demonstration shows how the GDA formulae apply to binary classification, with GDA serving as an example of a generative learning algorithm.

# Generative algorithms - Differences in working:
1. Discriminative methods, such as GLMs like Logistic Regression (whose parameters can be fitted with Newton's method), aim to learn a mapping $f: X \rightarrow Y$ directly, where X denotes the input features and Y the output labels. We do this by estimating the parameters $\theta$ of $P(Y = y^{(i)} \, | \, X = x^{(i)})$ such that the conditional log-likelihood $l(\theta) = \sum_{i = 1}^{m} \log P(Y = y^{(i)} \, | \, X = x^{(i)} \, ; \theta)$ is maximised with respect to $\theta$.

2. Generative methods, on the other hand, model how the inputs are generated from the labels (loosely, a mapping $f : Y \rightarrow X$). We estimate the parameters of $P(X = x^{(i)} \, | \, Y = y^{(i)})$ and of the prior $P(Y = y^{(i)})$ by maximising the joint log-likelihood $l = \sum_{i = 1}^{m} \log \left( P(X = x^{(i)} \, | \, Y = y^{(i)}) \, P(Y = y^{(i)}) \right)$. Predictions are then made with Bayes' rule, $P(Y = y \, | \, X = x) = \frac{P(X = x \, | \, Y = y) \, P(Y = y)}{P(X = x)}$, where, assuming Y is a binary variable, $P(X = x) = P(X = x \, | \, Y = 1)P(Y = 1) + P(X = x \, | \, Y = 0)P(Y = 0)$.

3. As a design choice, we assume $x^{(i)} \in \mathbb{R}^{n}$ with $x^{(i)} \, | \, y^{(i)} \sim \mathcal{N}(\vec{\mu}_{y^{(i)}}, \Sigma)$ (a multivariate Gaussian, where $\Sigma \in \mathbb{R}^{n \times n}$ and $\vec{\mu}_{0}, \vec{\mu}_{1} \in \mathbb{R}^{n}$), and $y^{(i)} \sim \text{Bernoulli}(\phi)$ (where $\phi \in (0, 1)$). Their distributions (for X, strictly a probability density) are then given by: $$P(X = x \, | \, Y = y) = \frac{1}{(2 \pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x - \vec{\mu}_{y})^{T} \Sigma^{-1} (x - \vec{\mu}_{y})\right)$$ where $\Sigma$ is the covariance matrix shared by both classes and $\vec{\mu}_{y}$ is the mean of class y. For Y: $$P(Y = y) = \phi^{y} (1 - \phi)^{1 - y}$$ where $\phi$ is the probability that y = 1.

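4. The Gaussian assumption for continuous x is commonly motivated by the Central Limit Theorem: quantities that arise as sums of many small, independent effects tend to be approximately normally distributed. It remains a modelling assumption, though, and individual features need not be Gaussian.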
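
The formulas in points 2 and 3 can be sketched in a few lines of NumPy. This is a minimal illustration on a made-up 2-D problem; the helper names (`gaussian_pdf`, `bernoulli_pmf`, `posterior_y1`) and the toy parameter values are hypothetical and are not taken from the notebook cells below.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(mu, sigma) evaluated at one point x of shape (n,)."""
    n = x.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff rather than forming sigma^{-1} explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm


def bernoulli_pmf(y, phi):
    """P(Y = y) = phi^y * (1 - phi)^(1 - y) for y in {0, 1}."""
    return phi ** y * (1 - phi) ** (1 - y)


def posterior_y1(x, phi, mu_0, mu_1, sigma):
    """P(Y = 1 | X = x) by Bayes' rule, with P(x) from the law of total probability."""
    joint_1 = gaussian_pdf(x, mu_1, sigma) * bernoulli_pmf(1, phi)
    joint_0 = gaussian_pdf(x, mu_0, sigma) * bernoulli_pmf(0, phi)
    return joint_1 / (joint_0 + joint_1)


# Toy parameters, chosen only for illustration.
phi = 0.6
mu_0 = np.array([-1.0, 0.0])
mu_1 = np.array([1.0, 0.0])
sigma = np.eye(2)

origin = np.zeros(2)
print(gaussian_pdf(origin, np.zeros(2), sigma))  # 1 / (2 * pi), about 0.159
print(posterior_y1(origin, phi, mu_0, mu_1, sigma))  # about 0.6: equal densities, so the prior decides
print(posterior_y1(np.array([1.0, 0.0]), phi, mu_0, mu_1, sigma))  # closer to mu_1, so above 0.6
```

At the origin both class densities are equal, so the posterior reduces to the prior $\phi$; moving towards $\vec{\mu}_{1}$ pushes it above $\phi$.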

In [41]:
#Importing all the main libraries + dataset.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
import math

In [42]:
#Fetching the values from the dataset.
data = load_breast_cancer()
X_data = data.data
Y_data = data.target.reshape(-1, 1)
#Verifying shapes.
print(X_data.shape)
print(Y_data.shape)

(569, 30)
(569, 1)


In [43]:
#Slicing the dataset into two parts.
#We will use the first 500 samples to train our model's parameters, and the remaining 69 as queries to be used on the model.
#Scaling the features; the scaler is fitted on the training rows only, so the queries do not influence the scaling statistics.
scaler = StandardScaler()
X = scaler.fit_transform(X_data[:500, :])
X_queries = scaler.transform(X_data[500:, :])
Y = Y_data[:500, :]
Y_queries = Y_data[500:, :]
#Verifying the shapes.
print(X.shape)
print(X_queries.shape)
print(Y.shape)
print(Y_queries.shape)

(500, 30)
(69, 30)
(500, 1)
(69, 1)


In [44]:
#Now, we fetch all samples with outputs 0 and 1 respectively.
#We need to convert Y into a 1D array for boolean indexing.
y = Y.flatten()
X_0 = X[y == 0]
X_1 = X[y == 1]
#Verifying the shapes.
print(X_0.shape)
print(X_1.shape)

(195, 30)
(305, 30)


In [45]:
#Now, finding the parameters u_1 and u_0, i.e. the class means (each of shape (1, 30)):
u_1 = np.mean(X_1, axis = 0, keepdims = True)
u_0 = np.mean(X_0, axis = 0, keepdims = True)
#Now finding phi, the fraction of training samples with label 1:
phi = X_1.shape[0] / X.shape[0]
#Now finding the shared covariance matrix.
#Note: this centres every sample on the overall mean u_data, so sigma is the total covariance of X.
#The GDA maximum-likelihood estimate instead centres each sample on its own class mean (u_0 or u_1).
u_data = np.mean(X, axis = 0, keepdims = True)
X_dev = X - u_data
sigma = np.zeros((X.shape[1], X.shape[1]))
#Verifying shapes.
print(X_dev.shape)
print(u_data.shape)
print(sigma.shape)
#Summing the outer product of each sample's deviation with itself.
for i in range(X.shape[0]):
    #Fetching the ith row of X_dev, kept 2D with shape (1, 30).
    X_dev_i = X_dev[[i], :]
    #Outer product (30, 1) @ (1, 30) -> (30, 30), added to the running sum.
    sigma += X_dev_i.T @ X_dev_i
#Dividing by the number of samples to get the mean outer product.
sigma /= X.shape[0]
#Verifying shape.
print(sigma.shape)

(500, 30)
(1, 30)
(30, 30)
(30, 30)
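
For reference, the maximum-likelihood estimates for GDA with a shared covariance are $\phi = \frac{1}{m}\sum_{i=1}^{m} 1\{y^{(i)} = 1\}$, $\vec{\mu}_{k}$ equal to the mean of the samples in class k, and $\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)} - \vec{\mu}_{y^{(i)}})(x^{(i)} - \vec{\mu}_{y^{(i)}})^{T}$, which centres each sample on its own class mean rather than on the overall mean. A vectorised sketch of that estimator follows; `fit_gda` is a hypothetical helper, it takes 1D labels and returns 1D means, and the toy data is made up.

```python
import numpy as np


def fit_gda(X, y):
    """Maximum-likelihood GDA parameters with one covariance matrix shared by both classes.

    X: (m, n) inputs; y: (m,) labels in {0, 1}.
    Returns phi, mu_0 (n,), mu_1 (n,), sigma (n, n).
    """
    m = X.shape[0]
    phi = np.mean(y == 1)
    mu_0 = X[y == 0].mean(axis=0)
    mu_1 = X[y == 1].mean(axis=0)
    # Pick each sample's own class mean, then centre the sample on it.
    mu_per_sample = np.where((y == 1)[:, None], mu_1, mu_0)
    dev = X - mu_per_sample
    # Average of the outer products dev_i dev_i^T, computed as one matrix product.
    sigma = dev.T @ dev / m
    return phi, mu_0, mu_1, sigma


# Toy data: two well-separated classes with two samples each.
X_toy = np.array([[0.0, 1.0], [2.0, 0.0], [10.0, 0.0], [12.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])

phi, mu_0, mu_1, sigma = fit_gda(X_toy, y_toy)
print(phi, mu_0, mu_1)
print(sigma)  # class-centred (pooled) covariance

# Centring on the overall mean instead gives the total covariance.
dev_total = X_toy - X_toy.mean(axis=0)
sigma_total = dev_total.T @ dev_total / X_toy.shape[0]
print(sigma_total)
```

On this toy data the variance of the first feature is 1 under the class-centred estimate but 26 under the total covariance, because the gap between the two class means gets absorbed into the total covariance.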


In [46]:
#Now, using these parameters, we define the probability densities required.
#X_q is the qth query, a (1, 30) row from the query matrix.
def gauss(X_q, u):
    #Multivariate Gaussian density N(u, sigma) at X_q; both classes share sigma.
    diff = X_q - u
    exponent = -(diff @ np.linalg.pinv(sigma) @ diff.T) / 2
    norm = ((2 * math.pi) ** (X.shape[1] / 2)) * (np.linalg.det(sigma) ** 0.5)
    return np.exp(exponent) / norm
def gauss_0(X_q):
    #Density for samples with output label 0: P(X = x | Y = 0).
    return gauss(X_q, u_0)
def gauss_1(X_q):
    #Density for samples with output label 1: P(X = x | Y = 1).
    return gauss(X_q, u_1)
def p_of_x(X_q):
    #We return P(x) here, expanded with the law of total probability over y.
    return gauss_0(X_q)*(1 - phi) + gauss_1(X_q)*phi
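
The densities above are computed directly with `np.exp`. In 30 dimensions, a query far from both means can make both exponentials underflow to 0, in which case `p_of_x` returns 0 and the posterior becomes NaN. A common remedy, sketched here under the assumption that `sigma` is positive definite, is to work with log densities and combine them with `np.logaddexp`; the function names are hypothetical.

```python
import numpy as np


def log_gaussian_pdf(x, mu, sigma):
    """log N(x; mu, sigma) for one point x of shape (n,); sigma is assumed positive definite."""
    n = x.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    sign, logdet = np.linalg.slogdet(sigma)  # log|sigma| without computing |sigma| itself
    if sign <= 0:
        raise ValueError("sigma must have a positive determinant")
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


def posterior_y1_log(x, phi, mu_0, mu_1, sigma):
    """P(Y = 1 | X = x) computed from log joint probabilities, so it does not underflow."""
    log_joint_1 = log_gaussian_pdf(x, mu_1, sigma) + np.log(phi)
    log_joint_0 = log_gaussian_pdf(x, mu_0, sigma) + np.log(1 - phi)
    # log P(x) = log(exp(log_joint_0) + exp(log_joint_1)), evaluated stably.
    log_px = np.logaddexp(log_joint_0, log_joint_1)
    return np.exp(log_joint_1 - log_px)


mu_0 = np.array([-1.0, 0.0])
mu_1 = np.array([1.0, 0.0])
sigma = np.eye(2)
x_far = np.array([40.0, 0.0])  # squared distances to the means: 1681 and 1521

print(np.exp(-0.5 * 1681.0))  # 0.0: the plain exponential underflows
print(posterior_y1_log(x_far, 0.6, mu_0, mu_1, sigma))  # close to 1.0
```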

In [47]:
#Now, we iterate through the query space and record the accuracy of our trained model.
#We use a counter to record the number of correct predictions.
acc_cnt = 0
for q in range(X_queries.shape[0]):
    #Slicing the qth query, kept 2D with shape (1, 30).
    X_q = X_queries[[q], :]
    #Posterior probability P(Y = 1 | X = x) by Bayes' rule; .item() turns the (1, 1) array into a float.
    p_y_1 = ((gauss_1(X_q) * phi) / p_of_x(X_q)).item()
    #Since these are complementary probabilities, p_y_0 = 1 - p_y_1.
    p_y_0 = 1 - p_y_1
    #Predict the more probable label (ties go to 1) and compare it with the true label.
    prediction = 1 if p_y_1 >= p_y_0 else 0
    if prediction == Y_queries[q, 0]:
        acc_cnt += 1
#Printing the number of correct predictions and the accuracy percentage.
print(acc_cnt)
print((acc_cnt / X_queries.shape[0]) * 100)

67
97.10144927536231
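
The query loop can also be written without a Python loop. This sketch (a hypothetical `predict_gda`, run on toy data drawn with a fixed seed) compares $\log P(x \, | \, y) + \log P(y)$ for both classes over all rows at once, using `np.einsum` for the row-wise quadratic forms and `np.linalg.pinv` as the notebook does.

```python
import numpy as np


def predict_gda(X_q, phi, mu_0, mu_1, sigma):
    """Predict a label for every row of X_q (shape (k, n)) at once.

    Compares log P(x | y) + log P(y) for y = 0 and y = 1. Both classes share
    sigma, so the Gaussian normalising constant is identical and is dropped.
    """
    sigma_inv = np.linalg.pinv(sigma)
    d0 = X_q - mu_0
    d1 = X_q - mu_1
    # Row-wise quadratic forms: quad[i] = d[i] @ sigma_inv @ d[i].
    quad_0 = np.einsum("ij,jk,ik->i", d0, sigma_inv, d0)
    quad_1 = np.einsum("ij,jk,ik->i", d1, sigma_inv, d1)
    score_0 = -0.5 * quad_0 + np.log(1 - phi)
    score_1 = -0.5 * quad_1 + np.log(phi)
    # Ties go to label 1, like the p_y_1 >= p_y_0 check in the loop above.
    return (score_1 >= score_0).astype(int)


# Toy data drawn with a fixed seed: 50 samples around each class mean.
rng = np.random.default_rng(0)
mu_0 = np.array([-2.0, 0.0])
mu_1 = np.array([2.0, 0.0])
sigma = np.eye(2)
X_demo = np.vstack([rng.normal(mu_0, 1.0, size=(50, 2)),
                    rng.normal(mu_1, 1.0, size=(50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)

pred = predict_gda(X_demo, 0.5, mu_0, mu_1, sigma)
accuracy = np.mean(pred == y_demo) * 100
print(accuracy)
```

With the notebook's variables, the call would be `predict_gda(X_queries, phi, u_0, u_1, sigma)`, compared against `Y_queries.flatten()`; the (1, 30) means broadcast against the (69, 30) query matrix.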


Since we need both $P(X = x \, | \, Y = 1)$ and $P(X = x \, | \, Y = 0)$, we create two different PDFs with separate mean parameters: $$ X \, | \, Y = 1 \sim \mathcal{N}(\vec{\mu}_{1}, \Sigma) $$ for Y = 1 and $$ X \, | \, Y = 0 \sim \mathcal{N}(\vec{\mu}_{0}, \Sigma) $$ for Y = 0. Together, these two PDFs model the class-conditional density $P(X = x \, | \, Y = y)$.

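We assume that both PDFs share the same covariance matrix $\Sigma$, which need not hold for real data. This simplifying design choice makes the decision boundary linear in x; giving each class its own covariance matrix leads to Quadratic Discriminant Analysis, whose decision boundary is quadratic.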
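
With a shared $\Sigma$, the quadratic term $x^{T}\Sigma^{-1}x$ appears in both class log-densities and cancels in their difference, so $P(Y = 1 \, | \, X = x) = \frac{1}{1 + e^{-(w^{T}x + b)}}$ with $w = \Sigma^{-1}(\vec{\mu}_{1} - \vec{\mu}_{0})$ and $b = -\frac{1}{2}\vec{\mu}_{1}^{T}\Sigma^{-1}\vec{\mu}_{1} + \frac{1}{2}\vec{\mu}_{0}^{T}\Sigma^{-1}\vec{\mu}_{0} + \log\frac{\phi}{1 - \phi}$. This is the same functional form that logistic regression fits directly. The sketch below checks the identity numerically on made-up parameters; all names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gda_linear_params(phi, mu_0, mu_1, sigma):
    """Return (w, b) such that P(Y = 1 | x) = sigmoid(w @ x + b) when sigma is shared."""
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu_1 - mu_0)
    b = (-0.5 * mu_1 @ sigma_inv @ mu_1
         + 0.5 * mu_0 @ sigma_inv @ mu_0
         + np.log(phi / (1 - phi)))
    return w, b


def bayes_posterior_y1(x, phi, mu_0, mu_1, sigma):
    """P(Y = 1 | x) from Bayes' rule; the shared normalising constant cancels."""
    sigma_inv = np.linalg.inv(sigma)
    g0 = np.exp(-0.5 * (x - mu_0) @ sigma_inv @ (x - mu_0))
    g1 = np.exp(-0.5 * (x - mu_1) @ sigma_inv @ (x - mu_1))
    return g1 * phi / (g0 * (1 - phi) + g1 * phi)


# Made-up parameters with a non-diagonal shared covariance.
phi = 0.3
mu_0 = np.array([0.0, 1.0])
mu_1 = np.array([1.5, -0.5])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
w, b = gda_linear_params(phi, mu_0, mu_1, sigma)

points = [np.array([0.7, 0.2]), np.array([-1.0, 2.0]), np.array([2.0, -1.0])]
for x in points:
    # The two values agree: the posterior is a sigmoid of a linear function of x.
    print(sigmoid(w @ x + b), bayes_posterior_y1(x, phi, mu_0, mu_1, sigma))
```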

Anyways, thank you for sticking around :)