<center>
<h1> Naive Bayes Classifier </h1>
</center>

### Classification
The idea of any classification task is to discriminate between the datapoints of a set by assigning each of them a class. This discrimination is based on the given characteristics or attributes. <br>
In other words, given a training set and a set of hypotheses, we want to find the most probable hypothesis. <br>
In mathematical terms, given an example described by a set of $p$ attributes $\{a_1,...,a_p\}$, we want to find the optimal class:

\begin{align}
    c_{opt} = argmax_{c_j \in C}(P(c_j|a_1,...,a_p))
    \label{eq1}
    \tag{1}
\end{align}
Where: <br>
$C$ is the set of possible classes $c_j$

### Naive Bayes
Bayes' theorem provides a formula to calculate the posterior probabilities of the different hypotheses.

\begin{align}
    c_{opt} = argmax_{c_j \in C}\frac{P(a_1,...,a_p|c_j)P(c_j)}{P(a_1,...,a_p)}
    \label{eq2}
    \tag{2}
\end{align}

As we only care about finding the maximizing argument, we can drop the denominator, which does not depend on $c_j$.

\begin{align}
    c_{opt} = argmax_{c_j \in C}P(a_1,...,a_p|c_j)P(c_j)
    \label{eq3}
    \tag{3}
\end{align}
  
This *naive* Bayes classifier assumes that the values of the attributes are conditionally independent, given the class. In mathematical terms: 
\begin{align}
    P(a_1,...,a_p|c_j)=\prod_iP(a_i|c_j)
    \label{eq4}
    \tag{4}
\end{align}
  

Thus, replacing \ref{eq4} in \ref{eq3}, we obtain:
\begin{align}
    c_{opt} = argmax_{c_j \in C}P(c_j)\prod_iP(a_i|c_j)
    \label{eq5}
    \tag{5}
\end{align}

Where:<br>
$P(c_j)$ is the prior (or "a priori") probability of class $c_j$ <br>
$P(a_i|c_j)$ is the conditional probability of attribute value $a_i$, given the class $c_j$.
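
The decision rule in equation (5) can be sketched in a few lines of Python. This is only an illustration: the function name `naive_bayes_argmax`, the class labels `A`/`B`, and all the probabilities are made up for this sketch. It sums logarithms instead of multiplying probabilities, which gives the same argmax and avoids very small products when there are many attributes.

```python
import math

def naive_bayes_argmax(priors, likelihoods, example):
    """Return the class maximising P(c) * prod_i P(a_i | c), plus the log scores.

    priors:      dict mapping class -> P(c)
    likelihoods: dict mapping class -> list (one entry per attribute) of
                 dicts mapping attribute value -> P(a_i = value | c)
    example:     list with the attribute values a_1, ..., a_p
    """
    log_scores = {}
    for c, prior in priors.items():
        # log(P(c)) + sum_i log(P(a_i | c)) has the same argmax as the product
        log_score = math.log(prior)
        for i, value in enumerate(example):
            log_score += math.log(likelihoods[c][i][value])
        log_scores[c] = log_score
    return max(log_scores, key=log_scores.get), log_scores

# Hypothetical numbers, chosen only to illustrate the rule
priors = {"A": 0.6, "B": 0.4}
likelihoods = {
    "A": [{0: 0.7, 1: 0.3}, {0: 0.5, 1: 0.5}],
    "B": [{0: 0.2, 1: 0.8}, {0: 0.1, 1: 0.9}],
}
best, log_scores = naive_bayes_argmax(priors, likelihoods, [1, 1])
print(best)
```

With these made-up numbers the unnormalised scores are $0.6 \cdot 0.3 \cdot 0.5 = 0.09$ for `A` and $0.4 \cdot 0.8 \cdot 0.9 = 0.288$ for `B`, so `B` is selected.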

## Objective
The objective is to implement a naive Bayes classifier from scratch that predicts whether a person is English or Scottish, based on his/her preferences. <br>
The independent variables are the preferences:<br>
$scones \in \{0,1\}$<br>
$beer \in \{0,1\}$<br>
$whisky \in \{0,1\}$<br>
$oatmeal \in \{0,1\}$<br>
$football \in \{0,1\}$<br>
Where 0 indicates that the person dislikes that attribute and 1 indicates that the person does like it. <br>

The hypothesis space $C$ includes the two classes (the nationalities) that can be predicted: English ($E$) or Scottish ($S$): <br>
$C=\{E,S\}$

Throughout this notebook, the terms 'independent variables', 'attributes', and 'preferences' are used to refer to the same concept. With regard to the hypothesis space, the terms 'dependent variable', 'output', 'class' and 'nationality' are used interchangeably.

In [7]:
import pandas as pd
import numpy as np

## Data
A simple dataset was created for training the classifier.<br>
Each column except the last represents one attribute (independent variable).<br>
The last column represents the class/nationality (dependent variable).<br>
Each row corresponds to one training datapoint. <br>

As the aim of this notebook is to understand how a naive Bayes classifier works, only a small, invented dataset is used. In order to test the performance of our classifier, a larger dataset would be needed.

In [8]:
# Read an Excel file into a pandas DataFrame.
data = pd.read_excel("preferences.xlsx") 
# Dependent variable name 
hypothesis = list(data.columns)[-1]
# List of independent variable names
attributes = list(data.columns)[:-1] 
print(data)

    scones  beer  whisky  oatmeal  football nationality
0        0     0       1        1         1           E
1        1     0       1        1         0           E
2        1     1       0        0         1           E
3        1     1       0        0         0           E
4        0     1       0        0         1           E
5        0     0       0        1         0           E
6        1     0       0        1         1           S
7        1     1       0        0         1           S
8        1     1       1        1         0           S
9        1     1       0        1         0           S
10       1     1       0        1         1           S
11       1     0       1        1         0           S
12       1     0       1        0         0           S
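
If `preferences.xlsx` is not at hand, the same table can be rebuilt from the printed output above. A minimal sketch, assuming the printout shows the whole file; the variable name `data_from_printout` is introduced here only for illustration.

```python
import pandas as pd

columns = ["scones", "beer", "whisky", "oatmeal", "football", "nationality"]
# One tuple per row of the printed table, in the same order
rows = [
    (0, 0, 1, 1, 1, "E"),
    (1, 0, 1, 1, 0, "E"),
    (1, 1, 0, 0, 1, "E"),
    (1, 1, 0, 0, 0, "E"),
    (0, 1, 0, 0, 1, "E"),
    (0, 0, 0, 1, 0, "E"),
    (1, 0, 0, 1, 1, "S"),
    (1, 1, 0, 0, 1, "S"),
    (1, 1, 1, 1, 0, "S"),
    (1, 1, 0, 1, 0, "S"),
    (1, 1, 0, 1, 1, "S"),
    (1, 0, 1, 1, 0, "S"),
    (1, 0, 1, 0, 0, "S"),
]
data_from_printout = pd.DataFrame(rows, columns=columns)
print(data_from_printout.shape)
```

Assigning this frame to `data` should let the remaining cells run without the Excel file.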


We transform our data into numpy arrays, to follow the format that most machine learning libraries use.<br>
$n$: number of datapoints in our training set.<br>
$p$: number of attributes

In [9]:
# Define some variables to better explore the dataset
X_train = np.array(data[attributes])
y_train = np.array(data[hypothesis])

classes = np.unique(y_train)
p = len(data.columns)-1
n = len(data.index)  

## A Priori Class Probabilities
In this case, the "a priori" probability of belonging to a specific class (nationality) is estimated by counting the number of samples of that class and dividing it by the total number of samples in our training set. <br>
However, in another problem, these probabilities could be known in advance (with no need to estimate them from the dataset).

In [10]:
def prior_probs(data):
    ''' Given a training dataset, return the count and the 'a priori' probabilities for each class. '''
    # Array to keep track of the counts
    class_count = np.zeros((1,len(classes)))
    class_prior = np.zeros((1,len(classes)))
    for j in range(len(classes)):
        class_count[0][j] = len([i for i in data[hypothesis] if i == classes[j]])
        class_prior[0][j] = class_count[0][j]/n
    return class_count, class_prior

In [11]:
class_count, class_prior = prior_probs(data)
print(f"The 'a priori' probabilities are P(E) = {class_prior[0][0]:.3f} and P(S) = {class_prior[0][1]:.3f}")

The 'a priori' probabilities are P(E) = 0.462 and P(S) = 0.538
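
The same counts can also be obtained without an explicit loop. The following is a sketch of one possible alternative, not the notebook's own method; it uses `np.unique` with `return_counts=True`, and the label array `y` is typed in by hand from the table (6 English rows followed by 7 Scottish rows).

```python
import numpy as np

# Class labels copied from the 'nationality' column of the table
y = np.array(["E"] * 6 + ["S"] * 7)

# np.unique returns the sorted class labels and how often each one occurs
labels, counts = np.unique(y, return_counts=True)
priors = counts / counts.sum()

for label, prior in zip(labels, priors):
    print(f"P({label}) = {prior:.3f}")
# P(E) = 0.462
# P(S) = 0.538
```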


## Conditional Attribute Probabilities
##### For the English class
- `eprob0`: The probabilities of not liking each attribute, given that the person is English.
- `eprob1`: The probabilities of liking each attribute, given that the person is English.

Laplace correction: we add 1 to the numerator and 2 (the number of values each binary attribute can take) to the denominator, so that no probability is estimated as 0.
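
Written out, the corrected estimate for a binary attribute is shown below. The symbols $n_{c_j}$ (number of training examples of class $c_j$) and $n_{a_i=v,c_j}$ (number of those examples whose attribute $a_i$ takes the value $v$) are notation introduced here for illustration:

\begin{align}
    P(a_i=v|c_j) = \frac{n_{a_i=v,c_j}+1}{n_{c_j}+2}
\end{align}

For instance, 4 of the 6 English examples in the table dislike whisky, so $P(whisky=0|E) = \frac{4+1}{6+2} = 0.625$.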

In [16]:
eprob0 = []   
for col in range(0, p):
    ecount0 = 0
    for row in range(0,n):
        if X_train[row,col] == 0 and y_train[row] == classes[0]:
            ecount0 += 1
    # Append to list with Laplace correction
    eprob0.append((ecount0+1)/(class_count[0][0]+2))
    
eprob1 = [(1-i) for i in eprob0] 
print(eprob0)
print(eprob1)

[0.5, 0.5, 0.625, 0.5, 0.5]
[0.5, 0.5, 0.375, 0.5, 0.5]


##### For the Scottish class
- `sprob0`: The probabilities of not liking each attribute, given that the person is Scottish.
- `sprob1`: The probabilities of liking each attribute, given that the person is Scottish.

In [17]:
sprob0 = []   
for col in range(0, p):
    scount0 = 0
    for row in range(0,n):
        if X_train[row,col] == 0 and y_train[row] == classes[1]:
            scount0 += 1
    # Append to list with Laplace correction
    sprob0.append((scount0+1)/(class_count[0][1]+2))
    
sprob1 = [(1-i) for i in sprob0] 
print(sprob0)
print(sprob1)

[0.1111111111111111, 0.4444444444444444, 0.5555555555555556, 0.3333333333333333, 0.5555555555555556]
[0.8888888888888888, 0.5555555555555556, 0.4444444444444444, 0.6666666666666667, 0.4444444444444444]
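
The two loops above differ only in the class they filter on, so they could be folded into one helper. A possible vectorised sketch, assuming the training data matches the printed table; the function name `conditional_probs` and the hand-typed arrays `X` and `y` are introduced here for illustration.

```python
import numpy as np

# Training data copied from the table (columns: scones, beer, whisky, oatmeal, football)
X = np.array([
    [0, 0, 1, 1, 1], [1, 0, 1, 1, 0], [1, 1, 0, 0, 1], [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1], [0, 0, 0, 1, 0],                      # English rows
    [1, 0, 0, 1, 1], [1, 1, 0, 0, 1], [1, 1, 1, 1, 0], [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1], [1, 0, 1, 1, 0], [1, 0, 1, 0, 0],     # Scottish rows
])
y = np.array(["E"] * 6 + ["S"] * 7)

def conditional_probs(X, y, label):
    """Return (P(a_i=0 | label), P(a_i=1 | label)) with the Laplace correction."""
    rows = X[y == label]                  # keep only the examples of this class
    zeros = (rows == 0).sum(axis=0)       # per-attribute count of dislikes
    prob0 = (zeros + 1) / (len(rows) + 2)
    return prob0, 1 - prob0

for label in np.unique(y):
    prob0, prob1 = conditional_probs(X, y, label)
    print(label, prob0.round(3), prob1.round(3))
```

The values should agree with `eprob0`/`eprob1` and `sprob0`/`sprob1` printed above.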


These probabilities can be visualized in the following table for better understanding.

<img src="probs.png" width=400/>

For example:<br>
$P(o=1|S)$ is read: The probability that the person likes oatmeal, given that he/she is Scottish.<br>
$P(f=0|E)$ is read: The probability that the person doesn't like football, given that he/she is English.

## Apply Bayes' Formula

In [18]:
def bayes(input_vector):
    ''' Print the most probable class for a vector of 0/1 preferences (equation 5). '''
    prob_vector_e = 1
    prob_vector_s = 1
    for i in range(0,len(input_vector)):
        if input_vector[i] == 0:
            eprob = eprob0[i]
            sprob = sprob0[i]
        elif input_vector[i] == 1:
            eprob = eprob1[i]
            sprob = sprob1[i]
        else:
            raise ValueError("Each attribute must be 0 or 1")
        prob_vector_e = prob_vector_e*eprob
        prob_vector_s = prob_vector_s*sprob
    # Weight the products by the 'a priori' class probabilities
    prob_vector_e = prob_vector_e*class_prior[0][0]
    prob_vector_s = prob_vector_s*class_prior[0][1]
    
    #  Optimal Class  
    if prob_vector_e > prob_vector_s:
        print("Given the attributes, the person is English")
    elif prob_vector_s > prob_vector_e:
        print("Given the attributes, the person is Scottish")
    else:
        print("Given the attributes, both classes are equally probable")

## Make prediction!

In [19]:
inputVector = [1, 1, 1, 1, 1]
bayes(inputVector)

Given the attributes, the person is Scottish
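
The prediction only reports the winning class. To see how confident the classifier is, the two scores can be normalised by their sum, which plays the role of the denominator in equation (2) under the naive independence assumption. A minimal sketch: the helper `posteriors` and the dictionaries are hypothetical names, and the probabilities are copied from the outputs printed earlier.

```python
# Conditional probabilities P(a_i=1 | class) taken from the printed outputs
likes_given_class = {
    "E": [0.5, 0.5, 0.375, 0.5, 0.5],
    "S": [8/9, 5/9, 4/9, 6/9, 4/9],
}
priors = {"E": 6/13, "S": 7/13}

def posteriors(input_vector):
    """Return P(class | attributes) for a vector of 0/1 preferences."""
    joint = {}
    for label, likes in likes_given_class.items():
        score = priors[label]
        for value, p_like in zip(input_vector, likes):
            # P(a_i=0 | class) is the complement of P(a_i=1 | class)
            score *= p_like if value == 1 else 1 - p_like
        joint[label] = score
    # Sum over the classes: the evidence P(a_1,...,a_p) under the naive model
    evidence = sum(joint.values())
    return {label: score / evidence for label, score in joint.items()}

result = posteriors([1, 1, 1, 1, 1])
print({label: round(prob, 3) for label, prob in result.items()})
```

For the input `[1, 1, 1, 1, 1]` this gives roughly 0.76 for Scottish and 0.24 for English, consistent with the prediction above.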


## Naive Bayes' Main Disadvantages

- It assumes that the attributes are conditionally independent given the class, which rarely holds exactly in practice.
- The computational cost of calculating all the probabilities might be high.
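
The independence assumption matters because correlated attributes are counted as separate evidence. A small sketch of that effect, assuming (purely for illustration) that the scones attribute appears twice in the input; `posterior_scottish` is a hypothetical helper, and the numbers are the priors and $P(scones=1|c_j)$ estimated in this notebook.

```python
# Values from this notebook: class priors and P(scones=1 | class)
priors = {"E": 6/13, "S": 7/13}
likes_scones = {"E": 0.5, "S": 8/9}

def posterior_scottish(copies):
    """P(S | scones=1) when the same attribute is counted `copies` times."""
    joint = {label: priors[label] * likes_scones[label] ** copies for label in priors}
    return joint["S"] / sum(joint.values())

once = posterior_scottish(1)    # the attribute is used a single time
twice = posterior_scottish(2)   # a perfectly correlated duplicate is added
print(f"{once:.3f} {twice:.3f}")
```

The duplicate adds no new information, yet the posterior for Scottish increases, so the classifier becomes overconfident.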