# Naive Bayes

### Introduction

Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that, within a given class, the presence of a particular feature is unrelated to the presence of any other feature.

A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, it can be competitive with more sophisticated classification methods on some tasks, such as text classification.

It is a probabilistic classifier, which means it predicts the class of an object from the estimated probability that the object belongs to each class.

Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and article classification.

### Why is it called Naïve Bayes?

The name Naïve Bayes combines two words, Naïve and Bayes, which can be described as:

    Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence
    of other features.
    
    Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

### Bayes Theorem

Bayes' theorem, also known as Bayes' rule or Bayes' law, is the basis of Bayesian reasoning. It determines the probability of an event under uncertain knowledge.

In probability theory, it relates the conditional probability and marginal probabilities of two random events.

It is a way to calculate the value of P(A|B) with the knowledge of P(B|A).

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$


    P(A|B) is the posterior probability, which we want to calculate: the probability of hypothesis A given that
    evidence B has occurred.

    P(B|A) is the likelihood: the probability of the evidence B, assuming that hypothesis A is true.

    P(A) is the prior probability: the probability of the hypothesis before considering the evidence.
    
    P(B) is the marginal probability: the probability of the evidence on its own.
    
In general, for mutually exclusive and exhaustive hypotheses A1, ..., Ak, we can write P(B) = P(A1)P(B|A1) + ... + P(Ak)P(B|Ak), hence Bayes' rule can be written as:

$$ P(A_{i}|B) = \frac{P(B|A_{i})P(A_{i})}{\sum_{j=1}^{k} P(A_{j})P(B|A_{j})} $$
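
As a quick numeric illustration of the rule above, the sketch below applies it to a spam-filter scenario with two hypotheses (spam, not spam). The probabilities are hypothetical values chosen for the example, not data from this notebook.

```python
# Hypothetical numbers, chosen only for illustration.
p_spam = 0.2               # prior P(A): fraction of emails that are spam
p_word_given_spam = 0.6    # likelihood P(B|A): the word "offer" appears in a spam email
p_word_given_ham = 0.05    # P(B|not A): the word "offer" appears in a non-spam email

# Marginal P(B): sum of prior * likelihood over both hypotheses.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(A|B) from Bayes' theorem.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))
```

Even with a low prior, a word that is much more likely under one hypothesis than the other moves the posterior strongly towards that hypothesis.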

### Working of Naive Bayes

1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
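
The three steps above can be sketched on a tiny categorical dataset. The weather/play records below are hypothetical and invented for illustration; with a single feature the calculation reduces to simple counting, but the same tables are what a Naive Bayes model multiplies together when there are several features.

```python
from collections import Counter

# Hypothetical toy data: (weather, played) pairs, invented for illustration.
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

# Step 1: frequency tables for the classes and for (feature value, class) pairs.
class_counts = Counter(label for _, label in data)
pair_counts = Counter(data)

# Step 2: likelihood table P(weather | class).
likelihood = {
    (weather, label): count / class_counts[label]
    for (weather, label), count in pair_counts.items()
}

# Step 3: posterior P(class | weather = "Sunny") via Bayes' theorem.
total = len(data)
unnormalised = {
    label: likelihood.get(("Sunny", label), 0.0) * class_counts[label] / total
    for label in class_counts
}
evidence = sum(unnormalised.values())
posterior = {label: value / evidence for label, value in unnormalised.items()}
print(posterior)
```

A feature value that never occurs with a class gets a likelihood of zero here; practical implementations usually add a small smoothing count to avoid that.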

### Limitation

Naive Bayes assumes that all features are independent of one another given the class, so it cannot learn relationships between features.

### Types of Naïve Bayes Model

Three commonly used types of Naive Bayes model are given below:

1. Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution within each class.

2. Multinomial: The Multinomial Naïve Bayes classifier is used when the data follows a multinomial distribution. It is primarily used for document classification problems, that is, deciding which category (such as Sports, Politics, or Education) a particular document belongs to. The classifier uses word frequencies as the predictors.

3. Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also popular for document classification tasks.
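
To make the Gaussian variant more concrete, here is a minimal from-scratch sketch using only the standard library. It is not how scikit-learn implements GaussianNB; the helper names (`fit_gaussian_nb`, `log_gaussian`, `predict_one`) and the toy data are hypothetical and serve only to show the idea: one mean and variance per feature per class, combined with the class prior.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance for each class."""
    params = {}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        columns = list(zip(*rows))
        params[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            # Small constant avoids division by zero for constant features.
            "vars": [pvariance(col) + 1e-9 for col in columns],
        }
    return params

def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict_one(params, x):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for label, p in params.items():
        # The naive assumption: per-feature log likelihoods are simply summed.
        scores[label] = math.log(p["prior"]) + sum(
            log_gaussian(xi, mu, var)
            for xi, mu, var in zip(x, p["means"], p["vars"])
        )
    return max(scores, key=scores.get)

# Hypothetical toy data: [age, salary in thousands] -> purchased (0 or 1).
X_toy = [[22, 25], [25, 30], [28, 35], [45, 90], [50, 110], [55, 120]]
y_toy = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict_one(model, [24, 28]), predict_one(model, [52, 100]))
```

Working with log probabilities instead of multiplying raw densities avoids numerical underflow when there are many features.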

## Importing the Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the Dataset

In [None]:
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, -1].values

In [None]:
print(X)

In [None]:
print(Y)

## Splitting Dataset into Training and Test Set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

In [None]:
print(X_train)

In [None]:
print(X_test)

In [None]:
print(Y_train)

In [None]:
print(Y_test)

## Feature Scaling 

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
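
As a rough sketch of what the scaling step does: each feature is shifted by its training-set mean and divided by its training-set standard deviation, and the same statistics are then reused for the test set so that no information from the test data leaks into training. The ages below are hypothetical values for a single feature.

```python
from statistics import mean, pstdev

# Hypothetical single feature (ages), invented for illustration.
train_ages = [20, 30, 40, 50]
test_ages = [35, 60]

# "fit": learn the mean and population standard deviation from the training data only.
mu = mean(train_ages)
sigma = pstdev(train_ages)

# "transform": apply the same statistics to both sets.
train_scaled = [(a - mu) / sigma for a in train_ages]
test_scaled = [(a - mu) / sigma for a in test_ages]

print(train_scaled)
print(test_scaled)
```

After scaling, the training feature has mean 0 and standard deviation 1; the test feature generally does not, because it is scaled with the training statistics.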

In [None]:
print(X_train)

In [None]:
print(X_test)

## Training the Naive Bayes Model on Training set

To build the Naive Bayes model we use the GaussianNB class from the naive_bayes module of the sklearn library.

In [None]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, Y_train)

## Predicting a new result

As usual, the predict method is used to predict the value of the target for the given feature values. The input is scaled with the same scaler that was fitted on the training set.

In [None]:
classifier.predict(sc.transform([[30,87000]]))

## Predicting the Test set result

To predict the test set results we call the predict method of our trained model, which takes the test feature matrix and returns a vector of predicted target values for the test data.

We store this vector in a variable, which holds the predictions of the target vector for the supplied test feature matrix.

In [None]:
Y_pred = classifier.predict(X_test)

Now we will place the predicted values of the target vector (Y_pred) next to the real values from the test set (Y_test) using the concatenate function of numpy.

The concatenate function takes a tuple containing the arrays to be merged and the axis along which to join them; axis 1 joins the arrays side by side as columns.

We also apply the reshape method to each array to make it vertical instead of horizontal, which makes the two columns easier to compare.

The reshape method takes the new shape as arguments, here the number of rows (the length of the array) and the number of columns (1).


In [None]:
print(np.concatenate((Y_pred.reshape(len(Y_pred),1), Y_test.reshape(len(Y_test),1)),1))
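
The reshape and concatenate steps can be seen more clearly on a small example. The arrays below are hypothetical stand-ins for Y_pred and Y_test, invented for illustration.

```python
import numpy as np

# Hypothetical predictions and true labels, invented for illustration.
pred_demo = np.array([0, 1, 1, 0])
actual_demo = np.array([0, 1, 0, 0])

# reshape(n, 1) turns each 1-D array into a column with n rows.
pred_col = pred_demo.reshape(len(pred_demo), 1)
actual_col = actual_demo.reshape(len(actual_demo), 1)

# axis=1 joins the two columns side by side: [predicted, actual] per row.
side_by_side = np.concatenate((pred_col, actual_col), axis=1)
print(side_by_side.shape)
print(side_by_side)
```

Each row of the result pairs one prediction with the corresponding true label, so mismatches are rows whose two entries differ.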

## Making the Confusion Matrix

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It allows the visualization of the performance of an algorithm.

A confusion matrix is also known as an error matrix.

To make the confusion matrix we use the confusion_matrix function of the metrics module of the sklearn library.
The confusion_matrix function takes the real target vector from the test set (Y_test) and the predicted target vector (Y_pred) as arguments.

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(Y_test, Y_pred)

In [None]:
print(cm)

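To get the accuracy of our model we use the accuracy_score function of the metrics module of the sklearn library. It returns the fraction of correct predictions, a value between 0 and 1, which can be multiplied by 100 to get a percentage.

The accuracy_score function takes the real target vector from the test set (Y_test) and the predicted target vector (Y_pred) as arguments.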

In [None]:
accuracy_score(Y_test, Y_pred)
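
To show what these two metrics compute, the sketch below builds a 2x2 confusion matrix and the accuracy by hand for a binary problem. The labels are hypothetical, and the layout (rows for true classes, columns for predicted classes) follows the convention used by scikit-learn's confusion_matrix.

```python
# Hypothetical labels, invented for illustration.
y_true_demo = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred_demo = [0, 1, 1, 1, 0, 0, 0, 1]

# cm_demo[t][p] counts samples whose true class is t and predicted class is p.
cm_demo = [[0, 0], [0, 0]]
for t, p in zip(y_true_demo, y_pred_demo):
    cm_demo[t][p] += 1

# Accuracy is the fraction of samples on the diagonal (correct predictions).
accuracy_demo = (cm_demo[0][0] + cm_demo[1][1]) / len(y_true_demo)
print(cm_demo, accuracy_demo)
```

The off-diagonal entries are the errors: cm_demo[0][1] counts false positives and cm_demo[1][0] counts false negatives.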

## Visualizing the Training set results

In [None]:
from matplotlib.colors import ListedColormap
X_set, Y_set = sc.inverse_transform(X_train), Y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(Y_set)):
    plt.scatter(X_set[Y_set == j, 0], X_set[Y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

## Visualizing the Test set results

In [None]:
from matplotlib.colors import ListedColormap
X_set, Y_set = sc.inverse_transform(X_test), Y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(Y_set)):
    plt.scatter(X_set[Y_set == j, 0], X_set[Y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()