<h2>Introduction to k-Nearest Neighbours and Naive Bayes</h2>

In [1]:
import warnings
warnings.filterwarnings('ignore')

<h2>1. The k-Nearest Neighbors algorithm</h2>

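The k-nearest neighbours (kNN) algorithm is a non-parametric, instance-based method used for both classification and regression. It stores the training data and predicts the output for a new point from its k closest training points. For classification it takes a majority vote, and for regression it averages their target values.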

<img src="Media/knn-illustration.png" width="600px"/>


<h3>1.1. kNN - Classification</h3>

In a classification problem, the kNN prediction is the majority vote among the k training points closest to the new point:

$$
\hat{y} = f(x) = \text{argmax}_{y \in Y} \sum_{i=1}^{k} I(y_i = y)
$$


where $I(y_i = y)$ is the indicator function that equals 1 if $y_i = y$ and 0 otherwise, 
$\hat{y}$ is the predicted class label for the new data point $x$, and $y_i$ are the class labels of the 
k nearest neighbors of $x$. <span style="color:blue;">The predicted class label $\hat{y}$ is the class label that occurs most frequently among the k nearest neighbors of the new data point $x$.</span>
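The voting rule can be written out in a few lines. Below is a minimal sketch with NumPy. It assumes Euclidean distance and uses hypothetical toy data. `knn_classify` is an illustrative helper, not a library function. scikit-learn's `KNeighborsClassifier` implements the same idea with tree-based or brute-force neighbour search.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Majority vote among the k training points closest to x_new (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k nearest neighbours
    votes = Counter(y_train[nearest].tolist())   # votes[y] = sum_i I(y_i = y)
    # argmax over classes; with tied votes, Counter keeps first-seen order,
    # so the class of the nearer neighbour wins
    return votes.most_common(1)[0][0]

# Hypothetical 2-D toy data: class 0 near the origin, class 1 near (1, 1)
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_toy = np.array([0, 0, 0, 1, 1])

print(knn_classify(X_toy, y_toy, np.array([0.15, 0.15]), k=3))  # 0
print(knn_classify(X_toy, y_toy, np.array([0.95, 1.0]), k=3))   # 1
```

For the second query, the third-nearest point belongs to class 0. The vote still goes to class 1, two votes to one.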

<h3>1.2. kNN - Regression</h3>

In a regression problem, the kNN prediction is the mean target value of the k training points closest to the point of interest.

$$
\hat{y} = f(x) = \frac{1}{k}\sum_{j \in N_k(x)} y_j
$$

where $N_k(x)$ is the set of indices of the k nearest neighbors of $x$ and $y_j$ are their target values.
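The averaging formula can be sketched in the same way. This is a minimal NumPy version that assumes Euclidean distance and uses hypothetical toy data. `knn_regress` is an illustrative helper, not a library function.

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k=3):
    """Mean target value of the k training points closest to x_new (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    neighbours = np.argsort(distances)[:k]   # N_k(x): indices of the k nearest points
    return y_train[neighbours].mean()        # (1/k) * sum of y_j over N_k(x)

# Hypothetical 1-D toy data with one distant outlier
X_toy = np.array([[1.0], [2.0], [3.0], [10.0]])
y_toy = np.array([1.5, 2.5, 3.5, 20.0])

print(knn_regress(X_toy, y_toy, np.array([2.1]), k=3))  # 2.5
```

The outlier at $x = 10$ does not affect the prediction, because it is not among the three nearest neighbours of $x = 2.1$.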

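The kNN algorithm often yields good predictive results in supervised learning problems. <span style="color:blue;">Its success depends strongly on the choice of k. A small k follows the training data closely (low bias, high variance). A large k averages over many points and smooths the predictions (higher bias, lower variance).</span>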
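The two extremes of k can be checked directly. Below is a minimal sketch with scikit-learn's `KNeighborsRegressor` on hypothetical synthetic data, a noisy sine curve.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical noisy 1-D regression data
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.3, size=50)

# k = 1: each training point is its own nearest neighbour, so the noise is reproduced exactly
knn_1 = KNeighborsRegressor(n_neighbors=1).fit(X_demo, y_demo)
# k = n: every prediction averages all training targets, i.e. a constant model
knn_n = KNeighborsRegressor(n_neighbors=len(X_demo)).fit(X_demo, y_demo)

print(np.allclose(knn_1.predict(X_demo), y_demo))         # True
print(np.allclose(knn_n.predict(X_demo), y_demo.mean()))  # True
```

Neither extreme generalises well, which is why k is usually chosen by cross-validation.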

<h3>1.3. kNN - Python Implementation</h3>

In [21]:
import pandas as pd

df = pd.read_csv('../../datasets/Healthcare-Diabetes.csv')
df.head()

   Id  Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0   1            6      148             72             35        0  33.6                     0.627   50        1
1   2            1       85             66             29        0  26.6                     0.351   31        0
2   3            8      183             64              0        0  23.3                     0.672   32        1
3   4            1       89             66             23       94  28.1                     0.167   21        0
4   5            0      137             40             35      168  43.1                     2.288   33        1


In [22]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

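X = df.iloc[:, 1:9]  # features: Pregnancies ... Age (skips Id and Outcome)
y = df.iloc[:, -1]   # target variable: Outcome, as a 1-D Series

#---Data Scaling
# Note: fitting the scaler on the full dataset lets test-set statistics leak into training;
# fitting it on the training split only avoids this.
scaler = StandardScaler()
X_d = scaler.fit_transform(X)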
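`StandardScaler` standardises each feature to zero mean and unit variance. This matters for kNN because unscaled distances are dominated by features with large ranges, such as `Glucose` compared with `DiabetesPedigreeFunction`. The quick check below is a sketch on a few values copied from the `Pregnancies` and `Glucose` columns of `df.head()`. It shows that the scaler matches the manual z-score $(x - \mu)/\sigma$ computed with the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Pregnancies and Glucose values from the first five rows shown above
X_head = np.array([[6.0, 148.0], [1.0, 85.0], [8.0, 183.0], [1.0, 89.0], [0.0, 137.0]])

z_sklearn = StandardScaler().fit_transform(X_head)
# Manual z-score: subtract each column's mean, divide by its population std (ddof=0)
z_manual = (X_head - X_head.mean(axis=0)) / X_head.std(axis=0)

print(np.allclose(z_sklearn, z_manual))  # True
```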

In [23]:
#--Train - Test Split
X_train, X_test,y_train,y_test = train_test_split(X_d,y, test_size=0.2, random_state=1234)
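Fitting the scaler after the split keeps test-set statistics out of training. Below is a minimal sketch using scikit-learn's `make_pipeline`. A synthetic dataset from `make_classification` stands in for the diabetes data, which is an assumption made so the snippet runs without the CSV file. The pipeline refits the scaler on whatever data `fit` receives, so it also stays leak-free inside `GridSearchCV`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data: 8 features, binary target (an assumption)
X_raw, y_raw = make_classification(n_samples=500, n_features=8, random_state=1234)

# Split the unscaled data first ...
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.2, random_state=1234)

# ... then let the pipeline learn the scaler's mean/std from the training rows only
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)   # test rows are scaled with training statistics
print(test_accuracy)
```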

In [24]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

k = 3 #number of neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=k) #kNN model configuration

knn_classifier.fit(X_train,y_train)
y_pred_test = knn_classifier.predict(X_test)
class_acc_knn = metrics.accuracy_score(y_test,y_pred_test)#get classification accuracy

In [25]:
from sklearn.metrics import classification_report
targets = ['no-diabetes','has-diabetes']

print("kNN (%d) - Test Performance: \n"%k,classification_report(y_test,y_pred_test,target_names=targets))

kNN (3) - Test Performance: 
               precision    recall  f1-score   support

 no-diabetes       0.91      0.97      0.94       349
has-diabetes       0.93      0.84      0.88       205

    accuracy                           0.92       554
   macro avg       0.92      0.90      0.91       554
weighted avg       0.92      0.92      0.92       554



<h3>1.3.1 kNN - Hyperparameter tuning</h3>

In [26]:
from sklearn.model_selection import GridSearchCV

parameters = {'n_neighbors':(2,3,5,6,7)}

knn_clf = KNeighborsClassifier()
clf = GridSearchCV(knn_clf, parameters, cv=5)  # 5-fold CV on the training set: each fold serves once as the validation set
clf.fit(X_train,y_train)
clf.best_params_

{'n_neighbors': 2}
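`GridSearchCV` with `cv=5` does not need a separate validation set. For each candidate k, it splits the training data into 5 folds, trains on 4 and scores on the held-out fold, then averages the 5 scores. The sketch below writes the same search out with `cross_val_score` on hypothetical synthetic data. It should reproduce the grid's mean scores, because for classifiers both default to the same stratified 5-fold split and to accuracy as the score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data standing in for the scaled training set
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
candidate_ks = (2, 3, 5, 6, 7)

# Manual search: mean 5-fold cross-validated accuracy for each candidate k
manual_means = np.array([
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X_demo, y_demo, cv=5).mean()
    for k in candidate_ks
])
best_k = candidate_ks[int(np.argmax(manual_means))]

# The same search done by GridSearchCV
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': candidate_ks}, cv=5).fit(X_demo, y_demo)
print(best_k, grid.best_params_['n_neighbors'])
```

An even k such as 2 allows tied votes in binary classification, which is one reason odd values of k are often preferred.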

In [27]:
optimal_k = clf.best_params_['n_neighbors']
knn_c = KNeighborsClassifier(n_neighbors=optimal_k)
knn_c.fit(X_train, y_train)

y_pred_test_c = knn_c.predict(X_test)
print("kNN (%d) - Test Performance: \n"%optimal_k,classification_report(y_test,y_pred_test_c,target_names=targets))

kNN (2) - Test Performance: 
               precision    recall  f1-score   support

 no-diabetes       0.91      0.99      0.95       349
has-diabetes       0.98      0.84      0.90       205

    accuracy                           0.93       554
   macro avg       0.94      0.91      0.93       554
weighted avg       0.94      0.93      0.93       554



<h2>2. The Naive Bayes Classifier</h2>

The Naive Bayes classifier applies Bayes' theorem under a simplifying ("naive") assumption: the features are conditionally independent given the class. This assumption makes the class probabilities cheap to estimate.

<img src="Media/naive_bayes-768x419.png"/>

<h3>2.1 Bayes' Theorem</h3>

$$
P(B|A) = \frac{P(A \cap B) }{P(A)} = \frac{P(A|B)P(B)}{P(A)}
$$
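Here is a worked example with hypothetical screening-test numbers, not taken from the diabetes data. $B$ is "has the disease" and $A$ is "tests positive". $P(A)$ comes from the law of total probability.

```python
# Hypothetical screening-test numbers (not from the diabetes dataset)
p_disease = 0.01               # P(B): prior probability of the disease
p_pos_given_disease = 0.90     # P(A|B): test sensitivity
p_pos_given_healthy = 0.05     # P(A|not B): false-positive rate

# P(A): total probability of a positive test
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.154
```

Even with a fairly accurate test, a positive result implies only about a 15% chance of disease here, because the prior $P(B)$ is small.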

In the probabilistic view, classification amounts to estimating the probability that a feature vector belongs to a given class $k$. This probability is modelled as a function $f$ of the features and some parameters $\theta$:

$$
P(Y=k|x_1,x_2,x_3,...x_n) = f(\theta,x_1,x_2,x_3,...x_n)
$$

<h3>2.2 Building a Naive Bayes Classifier</h3>

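$$
P(Y=k \mid X=[x_1, x_2, \ldots, x_n]) = \frac{P(x_1 \cap x_2 \cap \ldots \cap x_n \cap Y=k)}{P(x_1 \cap x_2 \cap \ldots \cap x_n)} = \frac{P(x_1 \mid x_2, x_3, \ldots, x_n, Y=k)\,P(x_2 \cap \ldots \cap x_n \cap Y=k)}{P(x_1 \cap x_2 \cap \ldots \cap x_n)}
$$

Applying the same rule, $P(A \cap B) = P(A \mid B)\,P(B)$, to the remaining joint probability in the numerator gives:

$$
P(Y=k \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1 \mid x_2, x_3, \ldots, x_n, Y=k)\,P(x_2 \mid x_3, \ldots, x_n, Y=k)\,P(x_3 \cap \ldots \cap x_n \cap Y=k)}{P(x_1 \cap x_2 \cap \ldots \cap x_n)}
$$

$$
= \frac{P(x_1 \mid x_2, x_3, \ldots, x_n, Y=k)\,P(x_2 \mid x_3, \ldots, x_n, Y=k)\,P(x_3 \mid x_4, \ldots, x_n, Y=k)\,P(x_4 \cap \ldots \cap x_n \cap Y=k)}{P(x_1 \cap x_2 \cap \ldots \cap x_n)}
$$

Repeating until only $P(x_n \mid Y=k)\,P(Y=k)$ is left:

$$
P(Y=k \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1 \mid x_2, x_3, \ldots, x_n, Y=k)\,P(x_2 \mid x_3, \ldots, x_n, Y=k)\,P(x_3 \mid x_4, \ldots, x_n, Y=k) \cdots P(x_n \mid Y=k)\,P(Y=k)}{P(x_1 \cap x_2 \cap \ldots \cap x_n)}
$$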
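The expansion above is repeated use of $P(A \cap B) = P(A \mid B)\,P(B)$. The numerical sketch below checks it for $n = 3$ binary features, using a hypothetical randomly generated joint distribution. It confirms that the chain-rule factors multiply back to the joint probability. It also confirms that dividing by $P(x_1, x_2, x_3)$ gives class probabilities that sum to 1. The helper `marginal` is illustrative.

```python
import numpy as np

# Hypothetical joint distribution P(x1, x2, x3, Y) over four binary variables
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()

def marginal(p, keep):
    """Sum out every axis not in `keep` (axes: 0 = x1, 1 = x2, 2 = x3, 3 = Y)."""
    drop = tuple(axis for axis in range(p.ndim) if axis not in keep)
    return p.sum(axis=drop, keepdims=True)

p_x2_x3_y = marginal(joint, (1, 2, 3))   # P(x2, x3, Y)
p_x3_y = marginal(joint, (2, 3))         # P(x3, Y)
p_y = marginal(joint, (3,))              # P(Y)

# Chain-rule factors from the derivation (with n = 3)
p_x1_given_rest = joint / p_x2_x3_y      # P(x1 | x2, x3, Y)
p_x2_given_rest = p_x2_x3_y / p_x3_y     # P(x2 | x3, Y)
p_x3_given_y = p_x3_y / p_y              # P(x3 | Y)

numerator = p_x1_given_rest * p_x2_given_rest * p_x3_given_y * p_y
evidence = marginal(joint, (0, 1, 2))    # P(x1, x2, x3)
posterior = numerator / evidence         # P(Y | x1, x2, x3)

print(np.allclose(numerator, joint))          # True
print(np.allclose(posterior.sum(axis=3), 1))  # True
```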

<b>Assumption 1: conditional independence in features</b>

$$
P(x_i \mid Y=k, \{x_j\}_{j \neq i}) = P(x_i \mid Y=k)
$$

$$
P(Y=k|x_1,x_2,..x_n)  =\frac{P(x_1|Y=k)P(x_2|Y=k)P(x_3 |Y=k)...P(x_n| Y=k)P(Y=k)}{P(x_1 \cap x_2 ...\cap x_n)}
$$

This relaxation drastically simplifies the computation of the conditional probability $P(Y=k \mid x_1, \ldots, x_n)$. It now depends only on the per-feature conditional probabilities $P(x_i \mid Y=k)$, the class prior $P(Y=k)$, and the joint probability of the features.

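<b>Simplification 2: disregard the joint probability of features $P(x_1, x_2, \ldots, x_n)$</b>

The denominator does not depend on $k$. It is therefore the same for every class and does not change which class has the highest posterior: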

$$
P(Y=k \mid x_1, x_2, \ldots, x_n) \propto P(x_1 \mid Y=k)\,P(x_2 \mid Y=k)\,P(x_3 \mid Y=k) \cdots P(x_n \mid Y=k)\,P(Y=k)
$$

Because the denominator is dropped, the Naive Bayes classifier no longer computes a true (normalized) probability. It computes a probability-based score instead, and each data point is assigned to the class with the highest score.

$$
\text{Naive Bayes Score }(Y=k|x_1,x_2,..x_n) = P(x_1|Y=k)P(x_2|Y=k)P(x_3 |Y=k)...P(x_n| Y=k)P(Y=k)
$$
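To make the score concrete, here is a minimal sketch on a hypothetical toy dataset with two binary features. Each $P(x_i \mid Y=k)$ and $P(Y=k)$ is estimated by a relative frequency, with no smoothing. `naive_bayes_scores` is an illustrative helper, not a library function. Normalizing the scores so they sum to 1 recovers the posterior under the independence assumption.

```python
import numpy as np

def naive_bayes_scores(X, y, x_new):
    """Unnormalised score P(Y=k) * prod_i P(x_i | Y=k) for every class k.

    Probabilities are plain relative frequencies (no smoothing), so a feature value
    never seen with class k makes that class's score 0.
    """
    scores = {}
    for k in np.unique(y):
        X_k = X[y == k]
        prior = len(X_k) / len(X)                     # P(Y=k)
        likelihoods = (X_k == x_new).mean(axis=0)     # P(x_i = x_new[i] | Y=k), per feature
        scores[int(k)] = prior * np.prod(likelihoods)
    return scores

# Hypothetical toy data: two binary features, three samples per class
X_cat = np.array([[1, 0], [1, 1], [0, 1],    # class 1
                  [0, 0], [1, 1], [0, 1]])   # class 0
y_cat = np.array([1, 1, 1, 0, 0, 0])

scores = naive_bayes_scores(X_cat, y_cat, np.array([1, 1]))
predicted = max(scores, key=scores.get)       # class with the highest score
posterior = {k: s / sum(scores.values()) for k, s in scores.items()}

print(predicted)               # 1
print(round(posterior[1], 3))  # 0.667
```

The class scores here are $1/9$ for class 0 and $2/9$ for class 1. In practice, additive (Laplace) smoothing is usually applied to avoid zero frequencies. scikit-learn's `MultinomialNB` and `CategoricalNB` expose it as the `alpha` parameter.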

For a generic implementation with continuous features, the conditional probabilities $P(x_i \mid Y=k)$ are estimated from the training data with a probability density function. The most common choice is a normal distribution per feature and class (Gaussian Naive Bayes). Another option is a kernel density estimate (Kernel Naive Bayes), which can fit non-normal feature distributions more closely at a higher computational cost.
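Below is a minimal Gaussian Naive Bayes sketch, assuming one normal density per feature and class. The helpers `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical. Their small variance floor mirrors the `var_smoothing` parameter of scikit-learn's `GaussianNB`, which adds a fraction of the largest feature variance. Scores are summed in log space to avoid underflow from multiplying many small densities. The predictions are compared with `GaussianNB` on the Iris data bundled with scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and a per-class, per-feature normal density."""
    classes = np.unique(y)
    # Small variance floor, mirroring GaussianNB: a fraction of the largest feature variance
    epsilon = var_smoothing * np.var(X, axis=0).max()
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    variances = np.array([X[y == k].var(axis=0) for k in classes]) + epsilon
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Return the class with the highest log score: log P(Y=k) + sum_i log N(x_i; mu_ki, var_ki)."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]          # shape (n_samples, n_classes, n_features)
    log_densities = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                            + diff ** 2 / variances[None, :, :])
    log_scores = np.log(priors)[None, :] + log_densities.sum(axis=2)
    return classes[np.argmax(log_scores, axis=1)]

X_iris, y_iris = load_iris(return_X_y=True)
ours = predict_gaussian_nb(fit_gaussian_nb(X_iris, y_iris), X_iris)
theirs = GaussianNB().fit(X_iris, y_iris).predict(X_iris)
print(np.mean(ours == theirs))  # fraction of samples where the two agree
```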

<h3>2.3 Python Implementation - NB</h3>

In [28]:
from sklearn.naive_bayes import GaussianNB

nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)  # train the classifier
# get model predictions on the test set
y_pred_test_nb = nb_clf.predict(X_test)

In [20]:
print("Naive Bayes - Test Performance: \n",classification_report(y_test,y_pred_test_nb,target_names=targets))

Naive Bayes - Test Performance: 
               precision    recall  f1-score   support

 no-diabetes       0.78      0.87      0.82       349
has-diabetes       0.72      0.58      0.64       205

    accuracy                           0.76       554
   macro avg       0.75      0.72      0.73       554
weighted avg       0.75      0.76      0.75       554

