<table style="background-color:#F5F5F5;" width="100%">
<tr><td style="background-color:#F5F5F5;"><img src="../images/logo.png" width="100" align='right'/></td></tr>     <tr><td>
            <h1><center>Aprendizagem Automática em Engenharia Biomédica</center></h1>
            <h3><center>1st Semester - 2022/2023</center></h3>
            <h4><center>Universidade Nova de Lisboa - Faculdade de Ciências e Tecnologia</center></h4>

</td></tr>
    <tr><td><h2><b><center>Lab 6 - Statistical Machine Learning</center></b></h2>
    <h4><i><b><center>Bayes Rule, Naive Bayes and Breast Cancer</center></b></i></h4></td></tr>
</table>

## 1. Classification Approaches

A common way to categorize classification models is by what they learn from the data. In this sense, there are two main types of models:

* __Discriminative Models__: These models try to model the conditional probability $P(Y | X)$ directly by looking at the distinctive factors between classes, and learning decision boundaries.

* __Generative Models__: In contrast, these models describe how the data are distributed within each class, namely the class-conditional probability $P(X|Y)$ together with the class prior $P(Y)$. Bayes' theorem is then used to obtain $P(Y | X)$.

<div>
<img src="attachment:disc_.png" width="500"/>
</div>
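
To make the distinction concrete, here is a minimal sketch that fits one model of each type on toy data. The toy data, the variable names and the choice of `LogisticRegression` as the discriminative example are assumptions made for illustration only; they are not part of this lab.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Toy 1-D data (illustrative assumption): unit-variance Gaussians centred at -2 and +2.
y_toy = rng.integers(0, 2, size=2000)
X_toy = (rng.normal(size=2000) + 4.0 * y_toy - 2.0).reshape(-1, 1)

# Discriminative: models P(Y|X) directly.
disc = LogisticRegression().fit(X_toy, y_toy)

# Generative: models P(X|Y) with one Gaussian per class plus class priors,
# then obtains P(Y|X) through Bayes' theorem.
gen = GaussianNB().fit(X_toy, y_toy)

x_query = np.array([[0.5]])
p_disc = disc.predict_proba(x_query)[0]
p_gen = gen.predict_proba(x_query)[0]

print("Logistic regression  P(Y|x=0.5):", p_disc)
print("Gaussian naive Bayes P(Y|x=0.5):", p_gen)
print("Class means stored by the generative model:", gen.theta_.ravel())
```

Both models return class probabilities, but only the generative one stores a description of each class (here, its mean and variance).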


## 2. A Statistical Approach Towards Machine Learning

In [None]:
# %matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

Let us consider a classification problem with two classes and one feature: 

C1 with $$\mu_1 = -2, \sigma_1 = 1$$   
C2 with $$\mu_2 = 2, \sigma_2 = 1$$ 

and assume the classes to be equiprobable: $$P(w_1) = P(w_2) = 0.5$$

__The Bayes Rule:__

$$P(w_i|x) = \frac{p(x|w_i)P(w_i)}{p(x)} = \frac{p(x|w_i)P(w_i)}{\sum_{j=1}^{C} p(x|w_j)P(w_j)}$$

with $P(w_i|x)$ being the *a posteriori* class probability for class $w_i$ defined in terms of the class conditional pdf of observations and the *a priori* class probability, $P(w_i)$. 
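
As a quick numeric illustration of the rule, the sketch below evaluates it at a single point. The point `x0 = 0.5` and the helper name `gaussian_density` are arbitrary choices for this example, not values used in the steps of the lab.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Normal pdf with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x0 = 0.5                          # arbitrary query point (assumption)
priors = np.array([0.5, 0.5])     # P(w_1), P(w_2)
likelihoods = np.array([
    gaussian_density(x0, -2.0, 1.0),   # p(x0|w_1)
    gaussian_density(x0, 2.0, 1.0),    # p(x0|w_2)
])

joint = likelihoods * priors      # numerator of the Bayes rule, one entry per class
evidence = joint.sum()            # denominator: sum over all classes
posteriors = joint / evidence     # P(w_i|x0)

print("Posteriors at x0:", posteriors)
```

The posteriors always sum to one because the evidence is exactly the sum of the numerators.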

__Step 0__: Assign the class *priors* to `pw1` and `pw2` variables

In [None]:
pw1 = 0.5
pw2 = 0.5

__Step 1__: Define a function to calculate the probability density function of a normal distribution.

$f(x) =\frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$
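
One possible way to write and sanity-check such a function is sketched below. The name `normal_pdf` and the checking grid are assumptions for this example; the exercise cell that follows is left for you to complete.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Evaluate the normal pdf with mean mu and standard deviation sigma at x."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * sigma ** 2 * np.pi)

# Sanity check 1: the peak of a standard normal is 1 / sqrt(2 * pi).
peak = normal_pdf(0.0, 0.0, 1.0)

# Sanity check 2: the density integrates to (approximately) one.
step = 0.01
grid = np.arange(-8, 8, step)
area = np.sum(normal_pdf(grid, 0.0, 1.0)) * step

print("Peak value:", peak)
print("Area under the curve:", area)
```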

In [None]:
def npdf(x,u,s):
    return None # implement here

__Step 2__: Call the previous function to plot the likelihood of each class, $p(x|w_i)$.

In [None]:
# x range
dx = 0.1
x = np.arange(-7, 7, dx)

# Class 1
u1 = None
s1 = None

p1 = None 

# Class 2
u2 = None
s2 = None

p2 = None 

# Plot
plt.figure(figsize=(8, 5))
plt.plot(x, p1, label='$p(x|w_1)$')
plt.plot(x, p2, label='$p(x|w_2)$')
plt.xlabel('Feature values')
plt.ylabel('Density')
plt.title('Likelihood')
plt.legend()

__Step 3__: Calculate the evidence, $p(x) = \sum_{j} p(x|w_j)P(w_j)$, i.e. the marginal density of the feature, over the defined feature vector `x`. Plot the result.
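
A minimal sketch of this computation follows, assuming the class parameters and grid defined earlier in this section. The names `normal_pdf`, `grid` and `evidence` are illustrative and independent of the notebook variables.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

step = 0.1
grid = np.arange(-7, 7, step)

# Evidence: prior-weighted sum of the class likelihoods (a two-component mixture).
evidence = 0.5 * normal_pdf(grid, -2.0, 1.0) + 0.5 * normal_pdf(grid, 2.0, 1.0)

# Being a density, the evidence should integrate to roughly one over the grid.
area = np.sum(evidence) * step
print("Area under p(x):", area)
```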

In [None]:
# Evidence
px = None

# Plot
plt.figure(figsize=(8, 5))
plt.plot(x, px, label='$p(x)$')
plt.xlabel('Feature values')
plt.ylabel('Density')
plt.legend()

__Step 4__: Calculate the *a posteriori* class probability for each class according to the Bayes rule, i.e. $P(w_i|x)$.
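
The sketch below computes both posterior curves on a grid and checks them against a closed form. For this particular problem (equal priors, unit variances, means at $\pm 2$) the log-likelihood ratio is $4x$, so $P(w_2|x)$ reduces to a sigmoid, $1/(1+e^{-4x})$. Variable names are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

grid = np.arange(-7, 7, 0.1)

joint1 = 0.5 * normal_pdf(grid, -2.0, 1.0)   # p(x|w_1) P(w_1)
joint2 = 0.5 * normal_pdf(grid, 2.0, 1.0)    # p(x|w_2) P(w_2)
evidence = joint1 + joint2                   # p(x)

post1 = joint1 / evidence                    # P(w_1|x)
post2 = joint2 / evidence                    # P(w_2|x)

# Closed form for this specific problem: a sigmoid in 4x.
sigmoid = 1.0 / (1.0 + np.exp(-4.0 * grid))
print("Max deviation from the sigmoid:", np.max(np.abs(post2 - sigmoid)))
```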

In [None]:
# A posteriori probabilities
prob1 = None
prob2 = None

# Plot
plt.figure(figsize=(8, 5))
plt.plot(x, prob1, label='$P(w_1|x)$')
plt.plot(x, prob2, label='$P(w_2|x)$')
plt.xlabel('Feature values')
plt.ylabel('Probability')
plt.legend();

__Step 5__: What is the probability of being class 1 and 2 given `x1 = -1`?

In [None]:
x1 = -1

prob_c1 = None
prob_c2 = None

print('Probability of Class 1:', prob_c1)
print('Probability of Class 2:', prob_c2)

__Step 6__: What is the probability of being class 1 and 2 given `x2 = 0`?

In [None]:
x2 = 0

prob_c1 = None
prob_c2 = None

print('Probability of Class 1:', prob_c1)
print('Probability of Class 2:', prob_c2)

__Step 7__: Plot both examples overlaid on the plot from __Step 4__.

In [None]:
# Plot


__Step 8__: Define the discriminant functions for this problem.

In [None]:
g1 = None
g2 = None

__Step 9__: What is the separation surface between class 1 and 2? Plot the result using the discriminant functions. Assign the result to the `x_sep` variable.
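
For two Gaussian classes with a shared standard deviation, setting $g_1(x) = g_2(x)$ with $g_i(x) = p(x|w_i)P(w_i)$ gives a single boundary point, $x_{sep} = \frac{\mu_1+\mu_2}{2} + \frac{\sigma^2}{\mu_2-\mu_1}\ln\frac{P(w_1)}{P(w_2)}$. The sketch below evaluates it; the function name `separation_point` and the unequal-prior example are assumptions for illustration.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def separation_point(mu1, mu2, sigma, p1, p2):
    """Point where p1 * N(x; mu1, sigma) equals p2 * N(x; mu2, sigma)."""
    return (mu1 + mu2) / 2 + sigma ** 2 / (mu2 - mu1) * np.log(p1 / p2)

# The problem from this section: equal priors put the boundary midway between the means.
x_equal = separation_point(-2.0, 2.0, 1.0, 0.5, 0.5)

# Hypothetical unequal priors: the boundary shifts towards the less likely class.
x_unequal = separation_point(-2.0, 2.0, 1.0, 0.8, 0.2)

print("Boundary with equal priors:", x_equal)
print("Boundary with priors 0.8 / 0.2:", x_unequal)
```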

In [None]:
sep_surf = None

x_sep = None

# Plot
plt.figure(figsize=(8, 5))
plt.plot(x, g1, label='$g_1(x)$', color='C0')
plt.plot(x, g2, label='$g_2(x)$', color='C1')
plt.vlines(x_sep, 0, np.max([g1, g2]), color='k')
plt.xlabel('Feature values')
plt.ylabel('Density')
plt.legend();

__Step 10__: What is the error of this classification problem?

Derive the error formula by integrating the discriminant functions of __Step 8__.

Compute the error as $P_e = 1 - \int_{-\infty}^{\infty}\max_i g_i(x)\,dx$, with $g_i(x) = p(x|w_i)P(w_i)$.
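
A numerical version of this integral can be sketched with a simple Riemann sum over the feature grid. The sketch assumes $g_i(x) = p(x|w_i)P(w_i)$ and uses illustrative names (`joint1`, `joint2`, `best`, `error_numeric`).

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

step = 0.1
grid = np.arange(-7, 7, step)

joint1 = 0.5 * normal_pdf(grid, -2.0, 1.0)
joint2 = 0.5 * normal_pdf(grid, 2.0, 1.0)

# At each x the Bayes classifier picks the class with the larger g_i(x),
# so the probability of a correct decision is the area under the pointwise maximum.
best = np.maximum(joint1, joint2)
error_numeric = 1.0 - np.sum(best) * step

print("Numerical error estimate:", error_numeric)
```

The result depends slightly on the grid step; a finer grid brings it closer to the analytical value.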

In [None]:
ac = None

error_g = None

print('Error:', error_g)

plt.figure(figsize=(8, 5))
plt.plot(x, ac, color='gray', lw=10, alpha=0.5)
plt.plot(x, g1, label='$g1$', color='C0')
plt.plot(x, g2, label='$g2$', color='C1')
plt.xlabel('Feature values')
plt.ylabel('Density')
plt.legend();

__Step 11__: Compare the error from __Step 10__ with the analytical error.


Calculate the error analytically: $ e =  2 \int_{-\infty}^{x_{sep}} g_2(x)\,dx = 2 \int_{x_{sep}}^{\infty} g_1(x)\,dx$ 

The integral of a Gaussian pdf is its cdf, so $\int_{-\infty}^{x} g_i(t)\,dt = P(w_i)\,\frac12\left[1 + \operatorname{erf}\left( \frac{x-\mu_i}{\sigma_i\sqrt{2}}\right)\right]$

_Hint:_ use the [erf()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.erf.html) function from scipy.
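
A minimal sketch of the analytical computation is shown below. To stay dependency-free it uses `math.erf` from the standard library, which gives the same value as `scipy.special.erf` for scalar inputs; the helper name `weighted_cdf` is an assumption for this example.

```python
import math

def weighted_cdf(x, mu, sigma, prior):
    """Integral of prior * N(t; mu, sigma) for t from -infinity to x."""
    return prior * 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

boundary = 0.0  # separation point for equal priors and symmetric means

# Mass of class 2 that falls on the class-1 side of the boundary.
tail_class2 = weighted_cdf(boundary, 2.0, 1.0, 0.5)

# By symmetry, class 1 contributes the same amount on the other side.
error_analytic = 2.0 * tail_class2

print("Analytical error:", error_analytic)
```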

In [None]:
from scipy.special import erf

error_a = None

print('Error:', error_a)

__Step 12__: Use sklearn to learn and test the classification. Compare the obtained error with those from __Step 10__ and __Step 11__.

##### 1. Data generation.

In [None]:
n_points = 1e6

y = np.random.randint(2, size=int(n_points))
X = np.random.randn(int(n_points)) + y.astype('float') * 4. - 2

plt.figure()
plt.hist(X,100);

X = X.reshape(-1, 1)

##### 2. Divide the data into train and test sets.

Use a 50-50 split.

In [None]:

X_train, X_test, y_train, y_test = None

##### 3. Fit and classify. Use the GaussianNB classifier.

The [GaussianNB()](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html) classifier also uses a Gaussian to model the likelihood of each feature:

$$P(X_n|Y) =\frac{1}{\sqrt{2\pi\sigma_Y^2}}\, \exp\left(-\frac{(X_n - \mu_Y)^2}{2 \sigma_Y^2}\right)$$

In [None]:

gnb = None

y_pred = None

##### 4. Measure the error.

In [None]:
error_gnb = None

print('Error:', error_gnb)

##### 5. Observe the trained parameters.

In [None]:
gnb.var_

In [None]:
gnb.theta_

## 3. Training a Naive Bayes on the Breast Cancer Dataset

##### 3.1. Loading the Breast Cancer Data

Let us load the [Breast Cancer Dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data) from the UCI Repository.

The following function applies all loading and preprocessing steps and returns the ready-to-use train and test sets:

In [None]:
from lab6_code import load_process_breast_cancer_dataset

X_train, X_test, y_train, y_test, features_name = load_process_breast_cancer_dataset()

In [None]:
import pandas as pd
import numpy as np

pd.DataFrame(X_train, columns=features_name).head()

##### 3.2. Training the Naive Bayes Classifier

__Exercise 1__: Using the Breast Cancer Dataset, train a Naive Bayes classifier and evaluate the results with appropriate metrics.
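
One possible shape for such an evaluation is sketched below. Since `lab6_code` is not available outside this lab, the sketch uses the Wisconsin Diagnostic dataset bundled with scikit-learn (`load_breast_cancer`) as a stand-in; it is a related but different dataset from the UCI file linked above, so the numbers will not match those obtained with the lab data. The split ratio, the seed and the `_bc` variable names are also assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset (assumption): not the file loaded by lab6_code.
data = load_breast_cancer()
X_tr_bc, X_te_bc, y_tr_bc, y_te_bc = train_test_split(
    data.data, data.target, test_size=0.3, stratify=data.target, random_state=0
)

clf = GaussianNB().fit(X_tr_bc, y_tr_bc)
y_pred_bc = clf.predict(X_te_bc)

# Accuracy alone can hide class-specific errors, so also report the
# confusion matrix and the per-class precision and recall.
acc_bc = accuracy_score(y_te_bc, y_pred_bc)
cm_bc = confusion_matrix(y_te_bc, y_pred_bc)

print("Accuracy:", acc_bc)
print("Confusion matrix:\n", cm_bc)
print(classification_report(y_te_bc, y_pred_bc, target_names=data.target_names))
```

In a medical setting the recall on the malignant class is usually the metric to watch, because a missed malignant case is more costly than a false alarm.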