
**Naïve Bayes** algorithms are a family of classification techniques based on applying Bayes’ theorem with a strong assumption that all the predictors are independent of each other. <br>
In other words, the assumption is that, given the class, the presence of one feature is independent of the presence of any other feature.

For example:<br>
>a phone may be considered smart if it has a touch screen, internet access, a camera, etc. Although these features may depend on each other, Naïve Bayes treats each of them as contributing *independently* to the probability that the phone is a smartphone.
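
Bayes’ theorem combined with this independence assumption gives the rule a Naïve Bayes classifier applies. It is written here for a class $y$ and features $x_1, \dots, x_n$; this notation is introduced for illustration:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)} \propto P(y)\prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y}\; P(y)\prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator is the same for every class, so it can be ignored when choosing the most probable class. In practice the product is usually computed as a sum of logarithms to avoid numerical underflow.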

**Pros:**<br>
The following are some pros of using Naïve Bayes classifiers:<br>

- Naïve Bayes classification is easy to implement and fast.<br>

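- It typically converges faster than discriminative models such as logistic regression: it approaches its best accuracy with fewer training examples, although logistic regression is often more accurate once plenty of data is available.<br>

- It often needs less training data, because it only has to estimate a few parameters per feature and per class.<br>

- It is highly scalable: training time grows linearly with the number of predictors and data points.<br>

- It produces probabilistic predictions and, with the appropriate variant, can handle continuous as well as discrete data.<br>

- The Naïve Bayes classification algorithm can be used for both binary and multi-class classification problems.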
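
The following is a minimal sketch of the continuous/discrete point above. It assumes scikit-learn is installed. `GaussianNB` models real-valued features, while `MultinomialNB` models non-negative counts. The iris and digits datasets (both bundled with scikit-learn) are used only as convenient examples. The digit pixels are small integer intensities from 0 to 16, which `MultinomialNB` can treat as counts. Both problems are multi-class.

```python
from sklearn.datasets import load_iris, load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Continuous features (flower measurements in cm) -> Gaussian likelihoods, 3 classes
X_iris, y_iris = load_iris(return_X_y=True)
iris_acc = cross_val_score(GaussianNB(), X_iris, y_iris, cv=5).mean()

# Discrete, non-negative features (pixel intensities 0-16) -> multinomial likelihoods, 10 classes
X_dig, y_dig = load_digits(return_X_y=True)
digits_acc = cross_val_score(MultinomialNB(), X_dig, y_dig, cv=5).mean()

print(f"GaussianNB on iris (5-fold CV accuracy):      {iris_acc:.3f}")
print(f"MultinomialNB on digits (5-fold CV accuracy): {digits_acc:.3f}")
```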



**Cons:**<br>
The following are some cons of using Naïve Bayes classifiers:<br>

- One of the most important cons of Naïve Bayes classification is its strong feature-independence assumption, because *in real life it is almost impossible to have a set of features which are completely independent of each other*.<br>

- ‘Zero frequency’: if a categorical variable has a category that was not observed with a given class in the training data set, the model assigns that category a zero probability for that class. The class score is a product of the per-feature probabilities, so this single zero makes the whole product zero and the model cannot make a meaningful prediction.
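
Zero frequency is usually handled with additive (Laplace) smoothing: a pseudo-count `alpha` is added to every category count before normalising. In scikit-learn, `MultinomialNB`, `CategoricalNB` and `BernoulliNB` expose this as the `alpha` parameter. The sketch below computes the smoothing by hand for one categorical feature within a single class. The category names and counts are made up for illustration.

```python
import numpy as np

# Hypothetical counts of a categorical feature ("colour") within ONE class
# of the training set; "blue" never occurred together with this class.
categories = ["red", "green", "blue"]
counts = np.array([8, 2, 0])

# Maximum-likelihood estimate: the unseen category gets probability 0
p_mle = counts / counts.sum()

# Laplace (additive) smoothing: add alpha to every count, then renormalise
alpha = 1.0
p_smoothed = (counts + alpha) / (counts.sum() + alpha * len(categories))

for cat, p0, p1 in zip(categories, p_mle, p_smoothed):
    print(f"{cat:>5}: unsmoothed={p0:.3f}  smoothed={p1:.3f}")
```

With `alpha = 1.0` the unseen category gets 1/13 instead of 0, so one missing category no longer wipes out the whole product.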

**Applications:**<br>
The following are some common applications of Naïve Bayes classification:<br>

- Real-time prediction: its ease of implementation and fast computation mean it can be used to make predictions in real time.

- Multi-class prediction: Naïve Bayes can predict the probability of each class of a multi-class target variable.

- Text classification: because it handles multi-class prediction and very large numbers of features efficiently, Naïve Bayes is well suited to text classification. It is widely used for spam filtering and sentiment analysis.

- Recommendation systems: combined with algorithms such as collaborative filtering, Naïve Bayes can be used to build a recommendation system that filters unseen information and predicts whether a user would like a given suggestion.
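
The following is a minimal spam-filtering sketch of the text-classification use case. It assumes scikit-learn's `CountVectorizer` for word counts and `MultinomialNB` as the classifier. The six-message corpus and its labels are hypothetical toy data, not a real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, hypothetical training corpus (made up for illustration only)
texts = [
    "win free money now",
    "claim your free prize",
    "cheap money offer win",
    "meeting moved to monday",
    "lunch with the team tomorrow",
    "please review the project report",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Bag-of-words counts -> multinomial Naive Bayes (alpha=1.0 is Laplace smoothing)
clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(texts, labels)

new_texts = ["free money prize", "team meeting tomorrow"]
print(clf.predict(new_texts))                 # one label per message
print(clf.predict_proba(new_texts).round(3))  # columns follow clf.classes_ (['ham', 'spam'])
```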

Gaussian Naïve Bayes
It is **one of the simplest Naïve Bayes classifiers**: it assumes that, within each class, the values of each feature are drawn from a simple Gaussian (normal) distribution.
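
To make the Gaussian assumption concrete, here is a from-scratch sketch in NumPy. Each class gets a prior $P(y)$, and each feature gets a per-class mean $\mu$ and variance $\sigma^2$. The likelihood is $P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$, and the log-likelihoods are summed over features. The helper names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical. The small variance floor mimics scikit-learn's `var_smoothing` so that the result can be checked against `GaussianNB`. This is an illustration, not the library's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Variance floor: a tiny fraction of the largest feature variance (avoids division by 0)
    eps = var_smoothing * X.var(axis=0).max()
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gaussian_nb(params, X):
    classes, priors, means, variances = params
    # Gaussian log-density of every sample under every class, summed over features
    # (summing per-feature log-likelihoods is the "naive" independence assumption)
    diff_sq = (X[:, None, :] - means[None, :, :]) ** 2            # (n, k, d)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff_sq / variances[None, :, :]).sum(axis=2)  # (n, k)
    log_post = np.log(priors)[None, :] + log_lik                  # unnormalised log posterior
    return classes[np.argmax(log_post, axis=1)]

X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
mine = predict_gaussian_nb(fit_gaussian_nb(X, y), X)
ref = GaussianNB().fit(X, y).predict(X)
agreement = np.mean(mine == ref)
print("agreement with sklearn GaussianNB:", agreement)
```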

https://machinelearningmastery.com/naive-bayes-for-machine-learning/


Install the required libraries

In [None]:
#!pip install -U matplotlib

In [None]:
# Data generation and model selection
from sklearn.datasets import make_classification, make_blobs, load_digits
from sklearn.model_selection import train_test_split, cross_val_score
# Models: GaussianNB is used below; MultinomialNB and LogisticRegression are handy for comparisons
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.linear_model import LogisticRegression
# Evaluation metrics
from sklearn.metrics import confusion_matrix, roc_curve, auc

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The original notebook pinned an old matplotlib release. The pin is normally
# unnecessary (and may fail to install on recent Python versions), so it is
# left commented out:
# !pip install matplotlib==3.1.3

Create a data set and plot it

**make_blobs and make_classification** create multiclass datasets by allocating each class one or more normally-distributed clusters of points.<br><br>

make_blobs:<br>
- provides greater control over the centers and standard deviations of each cluster<br>
- is commonly used to demonstrate clustering.<br><br>
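
The sketch below shows the control `make_blobs` offers. It passes an explicit list of centres and a separate standard deviation for each cluster, then checks that the generated clusters match those settings. The centre coordinates and standard deviations are arbitrary illustrative values.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Explicit centres and a different standard deviation for each cluster
centers = [[-5, 0], [0, 5], [5, 0]]
cluster_std = [0.5, 1.0, 2.0]
X_b, y_b = make_blobs(n_samples=3000, centers=centers, cluster_std=cluster_std,
                      random_state=0)

# Label k corresponds to centers[k]; the empirical mean/std should be close to the settings
for k in range(len(centers)):
    pts = X_b[y_b == k]
    print(k, pts.mean(axis=0).round(2), pts.std(axis=0).round(2))
```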

make_classification specialises in introducing noise by way of:<br>
- correlated, redundant and uninformative features;<br>
- multiple Gaussian clusters per class;<br>
- linear transformations of the feature space.
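
The sketch below illustrates the "redundant features" point. With `shuffle=False`, `make_classification` keeps its column order: informative features first, then redundant linear combinations of them, then repeated copies, then pure noise. A matrix-rank check exposes that structure. The parameter values are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification

# shuffle=False keeps the column order: informative, redundant, repeated, then noise
X_c, y_c = make_classification(n_samples=500, n_features=6, n_informative=2,
                               n_redundant=2, n_repeated=0, n_classes=2,
                               n_clusters_per_class=2, shuffle=False, random_state=0)

# Redundant columns are linear combinations of the informative ones,
# so the first 4 columns only span a 2-dimensional space
rank_first4 = np.linalg.matrix_rank(X_c[:, :4])
# The last 2 columns are independent noise, so the full matrix has rank 2 + 2 = 4
rank_all = np.linalg.matrix_rank(X_c)
print("rank of informative + redundant columns:", rank_first4)
print("rank of all 6 columns:", rank_all)
```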

In [None]:
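# 100 samples, 10 features, 3 classes. With shuffle=False the feature columns keep their
# order, so the 2 informative features are columns 0 and 1 (the ones plotted below).
X, y = make_classification(n_samples=100, n_features=10, n_informative=2,
                           n_classes=3, class_sep=1.0, n_clusters_per_class=1,
                           shuffle=False, random_state=1)
#X, y = make_blobs(2000, 2, centers=2, random_state=2, cluster_std=1.5)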

In [None]:
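# Plot the first two features, coloured by class label
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', s=20)
plt.xlabel("X[:, 0]")
plt.ylabel("X[:, 1]")
plt.show()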

In [None]:
# splitting X and y into training and testing sets 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1) 

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X_train[:,0],y_train, marker='o', c=X_train[:,1],s=25, cmap='RdBu')
plt.xlabel("X_train[:,0]")
plt.ylabel("y_train")
plt.colorbar(fig)
plt.show()

Create and train the model

In [None]:
model = GaussianNB()
# Fit on the training split only, so the test score below is an honest estimate
model.fit(X_train, y_train);

In [None]:
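# Class-membership probabilities: one row per test sample, one column per class
ypred = model.predict_proba(X_test)
# Predicted label = class with the highest probability (same as model.predict(X_test))
ypredClass = model.classes_[np.argmax(ypred, axis=1)]
#print(ypredClass)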
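
How do `predict_proba` and `predict` relate? The sketch below shows that each row of probabilities sums to 1 and that the arg-max column, mapped through `classes_`, gives the predicted label. Rounding the probabilities instead produces a 0/1 matrix, with an all-zero row whenever no class exceeds 0.5, rather than labels. The data is regenerated with a fixed `random_state` so the cell is self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=100, n_features=10, n_classes=3,
                           class_sep=1.0, n_clusters_per_class=1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = GaussianNB().fit(X_train, y_train)

proba = model.predict_proba(X_test)               # shape (n_test, n_classes)
print(proba.shape, proba.sum(axis=1).round(6)[:5])  # each row sums to 1

# Arg-max column -> class label; this matches model.predict
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels_from_proba, model.predict(X_test)))
```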

In [None]:
model.score(X_test,y_test)
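
`model.score` reports plain accuracy. The sketch below uses a confusion matrix from `sklearn.metrics.confusion_matrix` to show which classes are confused with each other, and recovers the same accuracy from its diagonal. The data is regenerated with a fixed `random_state` so the cell runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=100, n_features=10, n_classes=3,
                           class_sep=1.0, n_clusters_per_class=1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = GaussianNB().fit(X_train, y_train)

y_hat = model.predict(X_test)
cm = confusion_matrix(y_test, y_hat)   # rows = true class, columns = predicted class
print(cm)

# Accuracy = correct predictions (the diagonal) / all predictions
acc = np.trace(cm) / cm.sum()
print(acc, model.score(X_test, y_test))
```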

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X_train[:,0],y_train, marker='o', c=X_train[:,1],s=25, cmap='RdBu')
plt.xlabel("X_train[:,0]")
plt.ylabel("y_train")
plt.colorbar(fig)
plt.plot(X_test[:,0],ypredClass,'r+')
plt.show()

Assignment<br>
Change the dataset:<br>
>fewer data instances<br>
>more data instances<br>
>a different number of clusters per class<br>
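
The following is one possible way to approach the assignment, offered as a sketch. The hypothetical helper `nb_test_accuracy` regenerates the data with different sizes and numbers of clusters per class and reports the `GaussianNB` test accuracy for each setting. `make_classification` requires `n_classes * n_clusters_per_class <= 2**n_informative`, so `n_informative` is raised to 3 here to allow 2 clusters for each of the 3 classes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def nb_test_accuracy(n_samples, n_clusters_per_class, random_state=0):
    """Test accuracy of GaussianNB on one synthetic dataset (hypothetical helper)."""
    X, y = make_classification(n_samples=n_samples, n_features=10, n_informative=3,
                               n_classes=3, n_clusters_per_class=n_clusters_per_class,
                               random_state=random_state)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=random_state, stratify=y)
    return GaussianNB().fit(X_tr, y_tr).score(X_te, y_te)

results = {}
for n_samples in (50, 100, 1000, 5000):      # fewer and more data instances
    for n_clusters in (1, 2):                # number of clusters per class
        acc = nb_test_accuracy(n_samples, n_clusters)
        results[(n_samples, n_clusters)] = acc
        print(f"n_samples={n_samples:5d}  clusters/class={n_clusters}  test accuracy={acc:.3f}")
```

Because each dataset is random, a single run per setting is noisy. Averaging over several `random_state` values gives a steadier picture.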