
**Naïve Bayes** algorithms are classification techniques based on applying Bayes’ theorem with a strong assumption that all the predictors are independent of each other. <br>
In other words, the assumption is that, within a given class, the presence of one feature is independent of the presence of any other feature (the features are *conditionally independent* given the class).
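In symbols (a standard formulation, written here with $y$ for the class and $x_1, \dots, x_n$ for the features), Bayes' theorem combined with the independence assumption lets the class-conditional likelihood factor into one term per feature:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator $P(x_1, \dots, x_n)$ is the same for every class, so it can be dropped when only the most probable class is needed.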

For example:<br> 
>a phone may be considered smart if it has a touch screen, internet access, a camera, and so on. Although these features may depend on each other, the model treats each of them as contributing *independently* to the probability that the phone is a smartphone.
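The smartphone example can be worked through numerically. The sketch below uses made-up priors and likelihoods (all numbers are hypothetical, chosen only for illustration) and multiplies in one likelihood per observed feature, as the independence assumption allows. For simplicity it only multiplies in features that are present; a full Bernoulli model would also include terms for absent features.

```python
# Hypothetical numbers for illustration only: class priors and
# P(feature present | class) for two classes, "smart" and "not_smart".
priors = {"smart": 0.6, "not_smart": 0.4}
likelihoods = {
    "smart":     {"touch_screen": 0.95, "internet": 0.90, "camera": 0.85},
    "not_smart": {"touch_screen": 0.10, "internet": 0.30, "camera": 0.50},
}

def naive_bayes_posterior(observed):
    """Return P(class | observed features) under the naive independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            # Each feature is multiplied in on its own: this is the "naive" step.
            score *= likelihoods[cls][feature]
        scores[cls] = score
    total = sum(scores.values())  # normalizing constant P(x_1, ..., x_n)
    return {cls: s / total for cls, s in scores.items()}

posterior = naive_bayes_posterior(["touch_screen", "internet", "camera"])
print(round(posterior["smart"], 3))  # 0.986
```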

**Pros**<br>
The following are some pros of using Naïve Bayes classifiers:<br>

- Naïve Bayes classification is easy to implement and fast.<br>

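- It can converge faster (that is, reach its best achievable error with fewer training examples) than discriminative models such as logistic regression.<br>

- As a consequence, it often requires less training data.<br>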

- It is highly scalable: training and prediction time grow linearly with the number of predictors and data points.<br>

- It can make probabilistic predictions and can handle continuous as well as discrete data.<br>

- Naïve Bayes classification algorithms can be used for both binary and multi-class classification problems.
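As a minimal sketch of the last few points, assuming scikit-learn's `GaussianNB`, `MultinomialNB` and `BernoulliNB` and synthetic data generated only for illustration, each variant below is fitted to a different type of feature (continuous, counts, binary) and returns class probabilities through `predict_proba`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)

# Continuous features: class 1 is shifted by +2 in every feature -> GaussianNB
X_cont = rng.normal(loc=2.0 * y[:, None], scale=1.0, size=(100, 3))
# Non-negative count features with class-specific Poisson rates -> MultinomialNB
rates = np.array([[4, 1, 1], [1, 1, 4]])
X_counts = rng.poisson(lam=rates[y])
# Binary (present/absent) features derived from the counts -> BernoulliNB
X_bin = (X_counts > 2).astype(int)

probas = {}
for name, model, X in [("GaussianNB", GaussianNB(), X_cont),
                       ("MultinomialNB", MultinomialNB(), X_counts),
                       ("BernoulliNB", BernoulliNB(), X_bin)]:
    # Each row is a probability distribution over the two classes
    probas[name] = model.fit(X, y).predict_proba(X[:3])
    print(name, probas[name].round(3))
```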



**Cons**<br>
The following are some cons of using Naïve Bayes classifiers:<br>

- One of the most important cons of Naïve Bayes classification is its strong feature-independence assumption, because *in real life it is almost impossible to have a set of features that are completely independent of each other*.<br>

- Zero frequency: if a categorical variable has a category that was not observed in the training data set, the model assigns it a zero probability. Because the per-feature probabilities are multiplied together, that single zero drives the whole class probability to zero, so the model cannot make a meaningful prediction.
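A common remedy for the zero-frequency problem is Laplace (additive) smoothing, which adds a pseudo-count $\alpha$ to every category count before normalizing. The sketch below uses a hypothetical `color` feature and a hypothetical helper `category_prob`; in scikit-learn, `MultinomialNB` and `CategoricalNB` expose the same idea through their `alpha` parameter.

```python
from collections import Counter

# Hypothetical training data: values of a "color" feature observed for one class
observed_colors = ["red", "red", "blue", "red", "blue"]
categories = ["red", "blue", "green"]  # "green" never appears in training

def category_prob(value, observations, categories, alpha=0.0):
    """Estimate P(feature = value | class) with additive (Laplace) smoothing alpha."""
    counts = Counter(observations)  # missing keys count as 0
    return (counts[value] + alpha) / (len(observations) + alpha * len(categories))

print(category_prob("green", observed_colors, categories))           # 0.0 (zero frequency)
print(category_prob("green", observed_colors, categories, alpha=1))  # 0.125, i.e. 1/8
```

With `alpha=1`, every category gets a small non-zero probability and the smoothed estimates still sum to 1 over all categories.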

**Applications**<br>
The following are some common applications of Naïve Bayes classification:<br>

- Real-time prediction: because it is easy to implement and fast to compute, it can be used to make predictions in real time.

- Multi-class prediction: a Naïve Bayes classifier predicts the posterior probability of every class of the target variable, so it handles multiple classes naturally.

- Text classification: Naïve Bayes classifiers (particularly the multinomial variant) work well with the high-dimensional, sparse word-count features produced from text, which is why they are used for problems like spam filtering and sentiment analysis.

- Recommendation system: together with algorithms like collaborative filtering, Naïve Bayes can be used in a recommendation system to filter unseen information and predict whether a user would like a given resource.
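As a minimal text-classification sketch, assuming scikit-learn's `CountVectorizer` and `MultinomialNB`, the snippet below trains a spam filter on a tiny made-up corpus. The six sentences and their labels are hypothetical and far too few for a real filter; they only show the shape of the workflow.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A tiny, made-up corpus for illustration only
train_texts = [
    "win a free prize now",
    "claim your free money",
    "cheap prize offer win",
    "meeting moved to monday",
    "please review the report",
    "lunch with the team on monday",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each text into word counts; MultinomialNB models those counts
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

predictions = clf.predict(["free prize money", "team meeting on monday"])
print(predictions)  # ['spam' 'ham']
```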

**Gaussian Naïve Bayes**<br>
Gaussian Naïve Bayes is one of the simplest Naïve Bayes classifiers: it assumes that, within each class (label), every feature is drawn from a Gaussian (normal) distribution.

Install the required libraries

In [None]:
!pip install -U matplotlib

In [None]:
# Data sets and model selection
from sklearn.datasets import make_classification, load_digits
from sklearn.model_selection import cross_val_score, train_test_split

# Models
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.linear_model import LogisticRegression

# Metrics
from sklearn.metrics import roc_curve, auc

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Create a data set and plot it

In [None]:
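from sklearn.datasets import make_blobs

# 2000 points with 2 features, drawn from 2 Gaussian blobs (one blob per class)
X, y = make_blobs(n_samples=2000, n_features=2, centers=2, random_state=2, cluster_std=1.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='RdBu');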

In [None]:
# splitting X and y into training and testing sets 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1) 

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X_train[:,0],y_train, marker='o', c=X_train[:,1],s=25, cmap='RdBu')
plt.xlabel("X_train[:,0]")
plt.ylabel("y_train")
plt.colorbar(fig)
plt.show()

Create and train the model

In [None]:
model = GaussianNB()
# Fit on the training split only, so the test set stays unseen
model.fit(X_train, y_train);

In [None]:
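# Class probabilities, shape (n_samples, n_classes)
ypred = model.predict_proba(X_test)
# Predicted class label for each test point (the class with the highest probability)
ypredClass = model.predict(X_test)
print(ypredClass)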
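The cells in this notebook print and plot predictions but do not score them. A minimal evaluation sketch, assuming the same data and split, uses `accuracy_score` and `confusion_matrix`, plus `roc_curve` and `auc` with the class-1 probability from `predict_proba` as the ROC score.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_blobs(n_samples=2000, n_features=2, centers=2, random_state=2, cluster_std=1.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = GaussianNB().fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

print("accuracy:", acc)
print("confusion matrix:\n", cm)
print("ROC AUC:", roc_auc)
```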

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X_train[:,0],y_train, marker='o', c=X_train[:,1],s=25, cmap='RdBu')
plt.xlabel("X_train[:,0]")
plt.ylabel("y_train")
plt.colorbar(fig)
plt.plot(X_test[:,0], model.predict(X_test), 'r+')  # predicted class of each test point
plt.show()

In [None]:
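# Cross-validate a Gaussian Naïve Bayes model on the handwritten-digits data set
# (cross_val_score clones the estimator, so the model fitted above is not reused)
digits = load_digits()
print(cross_val_score(model, digits.data, digits.target, scoring='accuracy', cv=10).mean())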
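Because the digit images are small non-negative integer pixel intensities (0 to 16), the multinomial variant is also applicable. The sketch below compares `GaussianNB` and `MultinomialNB` with the same 10-fold cross-validation; which variant scores higher is left for the run to show.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB

digits = load_digits()

scores = {}
for name, estimator in [("GaussianNB", GaussianNB()), ("MultinomialNB", MultinomialNB())]:
    # Mean accuracy over 10 stratified folds for each Naïve Bayes variant
    scores[name] = cross_val_score(estimator, digits.data, digits.target,
                                   scoring="accuracy", cv=10).mean()
    print(f"{name}: mean 10-fold accuracy = {scores[name]:.3f}")
```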