## Nearest Neighbor Classification

<br>

<div style="text-align: justify">Nearest neighbor (NN) classification is a non-parametric,
lazy learning algorithm. It is non-parametric because it makes no
assumptions about the distribution of the underlying data.</div>

<br>

<div style="text-align: justify">The structure of the model is determined directly from the
dataset. This is helpful for real-world datasets that
do not follow simple assumptions. It is called a lazy learner because it
does not build a model from the training data points in advance.
Instead, it uses all of the training data in the testing phase, which
makes the testing phase slower and more memory-intensive.</div>

<div style="text-align: justify">The most widely used algorithm of this family of classifiers is
k-nearest neighbor (KNN), where K represents the number of
nearest neighbors to consider for classification. K is usually
taken as a small odd number so that a majority vote in a two-class
problem cannot end in a tie.</div>

<br>

<div style="text-align: justify">For instance, in the two-class problem shown in the figure below, we
want to predict the label for the test point marked with a question
mark (?). If we take K = 1, we have to find the one training example
closest to this test point. To classify, we assign the label of this
nearest point to the test point.</div>

<br>

<img src="Images/knn.png" style="margin:auto"/> 

<div style="text-align: justify">For any other value of K, we find the K closest points
to the test point and then classify the test point by a majority
vote of its K neighbors. Each training point among the K closest
points votes for its class. The class with the most votes in the
neighborhood is taken as the predicted class for the test point.
To find the points closest or most similar to the test point, we
compute the distance between points. The steps to classify a test point by
KNN are as follows:</div>

* Calculate distance
* Find closest neighbors
* Vote for labels
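
<div style="text-align: justify">The three steps above can be sketched from scratch as follows. This is a minimal illustration, not the implementation used by any library: the function name <code>knn_predict</code>, the toy data, and the choice of Euclidean distance are assumptions made for the example.</div>

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test point by majority vote of its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)

    # Step 1: Euclidean distance from the test point to every training point.
    distances = np.linalg.norm(X_train - x_test, axis=1)

    # Step 2: indices of the k training points with the smallest distances.
    nearest = np.argsort(distances)[:k]

    # Step 3: each neighbor votes for its own label; the most common label wins.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy two-class data (made up for illustration).
X_train = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]]
y_train = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(X_train, y_train, [2.0, 2.0], k=3))  # all three neighbors are "red"
print(knn_predict(X_train, y_train, [6.0, 7.0], k=1))  # the single nearest point is "blue"
```

<div style="text-align: justify">Every prediction computes the distance to all training points, which is why the testing phase becomes slower as the training set grows.</div>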



<div style="text-align: justify">To implement a KNN classifier in Python, we first import the
libraries and packages.</div>

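In [None]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Load the Digits dataset and split it into training and test sets.
# Note that we split the dataset in a 75:25 ratio for training:test sets.
# random_state=0 is an assumed value, fixed only to make the split repeatable.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Train the KNN model by specifying the input and output variables
# of the training set. n_neighbors=3 is an assumed value.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict output
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))

# Display the results using the confusion matrix
# (actual labels on the y-axis, predicted labels on the x-axis).
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()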





<div style="text-align: justify">The predicted and actual labels are shown on the x-axis and y-axis
of the confusion matrix, respectively. The diagonal entries of
the confusion matrix represent correct classification results.
It can be observed that most digits are correctly classified
by the model, while the occasional misclassifications appear
in the off-diagonal entries of the matrix. The output of
the model shows an accuracy of 98.67 percent; the exact figure
can vary with the random train/test split.</div>



<div style="text-align: justify"><b>Advantages and Applicability:</b> Nearest neighbor classification
is very easy to implement: we only specify the number of
neighbors, K, and a suitable distance function, e.g., Euclidean
distance.</div>

<br>

<div style="text-align: justify">The nearest neighbor classifier does not learn anything in the
training period. This is referred to as instance-based learning.
Classification using the nearest neighbor is accomplished
by storing the whole training dataset. This makes the training
phase of nearest neighbor classifiers much faster than that of
logistic regression and other classification models. Since no
training is required before making predictions, new data can be
added easily to the algorithm without retraining a model.</div>

<div style="text-align: justify"><b>Limitations:</b> It is very slow in the testing phase because
every prediction requires calculating the distance
between the test point and each training point. It is therefore
not well suited to large datasets.</div>

<br>

<div style="text-align: justify">Furthermore, algorithms based on the nearest neighbor are
sensitive to noise, outliers, and missing feature values. We
have to remove outliers and impute missing values before
applying this class of algorithms to our dataset.</div>
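
<div style="text-align: justify">As a minimal sketch of the imputation step, the example below chains a mean imputer and a KNN classifier using scikit-learn. The toy data, the mean strategy, and <code>n_neighbors=3</code> are assumptions chosen for illustration; outlier removal is not shown.</div>

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy data (made up); np.nan marks missing feature values.
X = np.array([[1.0, 2.0], [1.5, np.nan], [2.0, 1.0],
              [6.0, 7.0], [np.nan, 6.5], [7.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Replace each missing value with the mean of its column, then fit KNN.
model = make_pipeline(SimpleImputer(strategy="mean"),
                      KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)

# Test points pass through the same imputer before the distance computation.
predictions = model.predict([[1.2, np.nan], [6.5, 6.8]])
print(predictions)
```

<div style="text-align: justify">Putting the imputer inside the pipeline means the column means are learned from the training data only and then reused for test points.</div>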

## Naïve Bayes’ Classification

<br>

<div style="text-align: justify">Naive Bayes is one of the most fundamental classification
algorithms. It is based on Bayes’ theorem of probability.
The basic assumption used by a Naive Bayes’ classifier is that the
features are independent of one another given the class. This
assumption is called naive because it rarely holds exactly in
practice, but it greatly simplifies computations. In terms of probability,
this assumption is called class conditional independence.</div>
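
<div style="text-align: justify">Written out, class conditional independence means that the likelihood of the features factorizes into a product of per-feature terms. The symbols y (the class) and x<sub>1</sub>, …, x<sub>n</sub> (the features) are introduced here for illustration only:</div>

$$ P(x_1, \dots, x_n | y) = \prod_{i=1}^{n} P(x_i | y) $$

<div style="text-align: justify">Under this assumption, the classifier predicts the class y that maximizes the product of the class prior and the per-feature likelihoods:</div>

$$ \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i | y) $$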

<div style="text-align: justify">To understand Bayes’ theorem, we first describe conditional
probability, which is the likelihood of an event occurring
given that another event has occurred. Conditional
probability is calculated by dividing the probability that both
events occur by the probability of the event that is known to have
occurred. For example, let us define events A and B:</div>

* Event A: It is raining outside. Suppose there is a 0.4 (40 percent) chance of rain today, so the probability of event A is P(A) = 0.4.
* Event B: A person needs to go outside. Suppose this has a probability of P(B) = 0.3 (30 percent).


<div style="text-align: justify"><b>Joint probability:</b> Suppose the probability that both events happen
simultaneously is 0.2, or 20 percent. It is written as P(A and B)
or P(A⋂B) and is known as the joint probability of A and B.
The symbol ⋂ denotes the intersection of the events.</div>


<div style="text-align: justify"><b>Conditional probability:</b> Now, we are interested in the
probability of rain given that the person
has gone out. This is the conditional probability P(A|B),
which is given as</div>

$$ P(A|B) = \frac{P(A\cap B)}{P(B)} = \frac{0.2}{0.3} \approx 0.667 = 66.7\% $$

<div style="text-align: justify">Besides P(A|B), there is another conditional probability related
to these events: the probability of occurrence of event B given that A
has already occurred, P(B|A). Bayes’ theorem converts one
conditional probability to the other conditional probability. It
is given as</div>

$$ P(B|A) = \frac{P(A|B)\,P(B)}{P(A)} = \frac{(0.2/0.3)(0.3)}{0.4} = \frac{0.2}{0.4} = 0.5 = 50\% $$

<div style="text-align: justify">It is evident from this example that generally, P(A|B) is not
equal to P(B|A).</div>
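
<div style="text-align: justify">The arithmetic of the rain example can be checked with a few lines of Python. The variable names are chosen here for illustration; the probabilities are the ones given in the example. Computed without intermediate rounding, P(B|A) is exactly 0.5; rounding P(A|B) down to 0.66 before multiplying gives 0.495 instead.</div>

```python
# Values taken from the rain example in the text.
p_a = 0.4        # P(A): it rains
p_b = 0.3        # P(B): the person goes outside
p_a_and_b = 0.2  # P(A and B): both events happen

# Conditional probability from its definition.
p_a_given_b = p_a_and_b / p_b

# P(B|A) computed two ways: directly, and through Bayes' theorem.
p_b_given_a_direct = p_a_and_b / p_a
p_b_given_a_bayes = p_a_given_b * p_b / p_a

print(f"P(A|B) = {p_a_given_b:.3f}")
print(f"P(B|A) = {p_b_given_a_bayes:.3f}")
```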

<div style="text-align: justify">To implement the Naïve Bayes’ algorithm in Python, we may
write the following script.</div>

In [None]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB


#Note:
#Suppose we want to predict whether we should play based on
#weather conditions and temperature readings. Weather and
#temperature become our features, and the decision to play
#becomes the target variable or the output. We assign features
#and label variables as follows.

# Assigning features and label variables
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny',
'Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']

play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']


#Note:
#Since weather and temperature are given as strings, it is difficult
#to train our model on strings. We transform our features and
#the target variable to numeric data as follows.

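#creating label Encoder
from sklearn import preprocessing
le = preprocessing.LabelEncoder()

# Converting string labels into numbers.
weather_encoded = le.fit_transform(weather)

# Encode temp and play columns. Converting string labels into numbers
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)

#combining features weather and temp in a single variable (list of tuples).
features = list(zip(weather_encoded, temp_encoded))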






<div style="text-align: justify">We convert the string feature values and output labels into integers
    using the label encoding class <b>preprocessing.LabelEncoder()</b>.
    The function <b>fit_transform(weather)</b> converts the string labels
for weather conditions into numbers.</div>
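
<div style="text-align: justify">To see which integer each string receives, the sketch below inspects the fitted encoder. The variable name <code>mapping</code> is introduced here for illustration. The encoder sorts the distinct strings and numbers them from zero, which is why Overcast is encoded as 0.</div>

```python
from sklearn import preprocessing

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

le = preprocessing.LabelEncoder()
weather_encoded = le.fit_transform(weather)

# classes_ holds the distinct strings in sorted order;
# the position of each string is its integer code.
mapping = {str(name): code for code, name in enumerate(le.classes_)}
print(mapping)
print(weather_encoded.tolist())
```

<div style="text-align: justify">Applying the same steps to the temperature list gives Cool = 0, Hot = 1, and Mild = 2, so the test point [0, 2] stands for Overcast weather and Mild temperature.</div>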

<br>

<div style="text-align: justify">We generate a model using Naive Bayes’ classifier by the
following steps:</div>

1. Create naive Bayes’ classifier
2. Fit the dataset on classifier
3. Perform prediction.

In [None]:
from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
model = GaussianNB()

# Train the model using the training sets
model.fit(features,label)

#Predict Output
predicted= model.predict([[0,2]]) # 0:Overcast, 2:Mild
print ('Predicted Value:', predicted)

In [None]:
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)

# Train the model using the training sets
model.fit(features,label)

#Predict Output
predicted= model.predict([[0,2]]) # 0:Overcast, 2:Mild
print(predicted)

<div style="text-align: justify">To predict the label of a test point, we use the function <b>predict()</b>.</div>

<br>

<div style="text-align: justify"><b>Advantages and Applicability:</b> Naïve Bayes’ is easy to
implement and interpret. It performs well compared to
other similar models when the input features are independent
of each other. A small amount of training data is sufficient for
this model to estimate its parameters.</div>

<div style="text-align: justify"><b>Limitations:</b> The main limitation of Naïve Bayes’ is the
assumption of independence between the input
features. If the features are strongly dependent, the assumption is
violated and the predictions can become unreliable. Dimensionality
reduction techniques can be used to transform the features into a
less correlated set before applying the Naïve Bayes’ classifier.</div>