<a href="https://colab.research.google.com/github/cagBRT/Machine-Learning/blob/master/NaiveBayesBernoulli_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Another important Naive Bayes model is Bernoulli Naive Bayes, in which every feature is assumed to be binary (0 or 1). <br>
Text classification with a ‘bag of words’ model, where each feature records whether a word occurs in a document, is a typical application of Bernoulli Naive Bayes.
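
As a rough illustration of the bag-of-words use case, the sketch below turns a few sentences into binary word-presence features and fits a Bernoulli Naive Bayes model on them. The sentences, the labels and the names `docs`, `labels` and `clf` are made up for this example and are not part of the notebook's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy corpus (hypothetical): 1 = spam, 0 = not spam
docs = [
    "free prize money now",
    "meeting schedule for monday",
    "win free money",
    "project meeting notes",
]
labels = [1, 0, 1, 0]

# binary=True records only whether a word is present (1) or absent (0)
vectorizer = CountVectorizer(binary=True)
X_words = vectorizer.fit_transform(docs)

clf = BernoulliNB()
clf.fit(X_words, labels)

# Classify a new sentence using the vocabulary learned above
prediction = clf.predict(vectorizer.transform(["free money"]))
print(prediction)
```

With `binary=True` the vectorizer ignores how often a word occurs, which matches the binary-feature assumption of the model.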

In [None]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/Machine-Learning.git cloned-repo
%cd cloned-repo

If X is a Bernoulli-distributed random variable, it can take only two values (let’s call them 0 and 1).
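
A Bernoulli variable is described by a single parameter p: P(X = 1) = p and P(X = 0) = 1 - p. The short sketch below, with an arbitrarily chosen p = 0.3 and a helper `bernoulli_pmf` written only for this illustration, draws samples and checks that only the values 0 and 1 appear.

```python
import numpy as np

def bernoulli_pmf(x, p):
    """Return P(X = x) for a Bernoulli variable with P(X = 1) = p."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0  # any other value has probability zero

p = 0.3  # illustrative value, not taken from the notebook
rng = np.random.default_rng(0)

# A binomial draw with n=1 trial is a Bernoulli draw
samples = rng.binomial(n=1, p=p, size=10_000)

values = np.unique(samples)    # the distinct values that were drawn
empirical_p = samples.mean()   # fraction of ones, should be close to p
print(values, empirical_p)
```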

In [None]:
from sklearn.datasets import make_classification
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split

from IPython.display import Image
import matplotlib.pyplot as plt

Bernoulli Naive Bayes expects binary feature vectors. However, the `BernoulliNB` class has a `binarize` parameter that specifies a threshold used internally to transform the features: values greater than the threshold become 1 and all other values become 0.
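
To make the role of `binarize` concrete, the following sketch compares two routes that should give the same model: letting `BernoulliNB` apply the threshold itself, and thresholding the data by hand before fitting with `binarize=None`. The data and the names `X_demo`, `y_demo`, `internal` and `manual` are invented for this example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Synthetic continuous features (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)

# Route 1: BernoulliNB thresholds the features internally at 0.0
internal = BernoulliNB(binarize=0.0).fit(X_demo, y_demo)

# Route 2: threshold by hand, then tell BernoulliNB the input is already binary
X_bin = (X_demo > 0.0).astype(int)
manual = BernoulliNB(binarize=None).fit(X_bin, y_demo)

same_predictions = np.array_equal(internal.predict(X_demo), manual.predict(X_bin))
print(same_predictions)
```

Passing `binarize=None` tells the estimator to assume the input already consists of binary vectors.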

**Generate a random dataset**

In [None]:
nb_samples = 600
X, Y = make_classification(n_samples=nb_samples, n_clusters_per_class=1, n_features=2, n_informative=1, n_redundant=0)

We will use 0.0 as the binarization threshold, so each point is characterized by the quadrant in which it lies.
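
The quadrant idea can be checked on a handful of hand-picked points. In the sketch below the four points are arbitrary examples, one per quadrant, and each is reduced to a pair of bits by comparing its coordinates with the threshold.

```python
import numpy as np

# One hypothetical point per quadrant
points = np.array([
    [ 1.5,  0.2],   # first feature > 0, second feature > 0
    [-0.7,  2.1],   # first feature <= 0, second feature > 0
    [-1.0, -0.3],   # first feature <= 0, second feature <= 0
    [ 0.4, -1.8],   # first feature > 0, second feature <= 0
])

threshold = 0.0

# Each coordinate becomes 1 if it is strictly greater than the threshold, else 0
binary = (points > threshold).astype(int)
print(binary)
# [[1 1]
#  [0 1]
#  [0 0]
#  [1 0]]
```

After binarization only four distinct feature vectors are possible, so the model can distinguish points only by quadrant.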

**Scatter plot the dataset**

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X[:,0],X[:,1], marker='o', c=X[:,1],
            s=25)
plt.xlabel("X[:,0]")
plt.ylabel("X[:,1]")
plt.colorbar(fig)
plt.show()

**Split the data into training and test sets**

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)

Plot the training data: the x-axis is the first feature of the training set and the y-axis is the class label (0 or 1).

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X_train[:,0],Y_train, marker='o', c=X_train[:,0], s=25)
plt.xlabel("X_train[:,0]")
plt.ylabel("y_train")
plt.colorbar(fig)
plt.show()

Plot the test data: the x-axis is the first feature of the test set and the y-axis is the class label (0 or 1).

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X_test[:,0],Y_test, marker='o', c=X_test[:,0],s=25)
plt.xlabel("X_test[:,0]")
plt.ylabel("y_test")
plt.colorbar(fig)
plt.show()

**Create and train the model**

In [None]:
bnb = BernoulliNB(binarize=0.0)
bnb.fit(X_train, Y_train)

To understand how the classifier works, it is useful to look at how the data are binarized internally. With `binarize=0.0`, every feature value greater than 0.0 is treated as 1 and every other value as 0.
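
One way to look inside the fitted model is to binarize the data by hand and recompute the per-class feature probabilities that `BernoulliNB` stores in `feature_log_prob_`. The sketch below does this on synthetic data. The names `X_demo`, `y_demo`, `model` and `manual` are assumptions for illustration, and the formula uses the default Laplace smoothing (`alpha=1.0`).

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import binarize

# Synthetic data (illustration only): the label depends noisily on the first feature
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 2))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

model = BernoulliNB(binarize=0.0, alpha=1.0).fit(X_demo, y_demo)

# The same thresholding the model applies internally
X_bin = binarize(X_demo, threshold=0.0)

# Smoothed estimate of P(feature = 1 | class):
# (number of ones in the class + alpha) / (class size + 2 * alpha)
manual = np.array([
    (X_bin[y_demo == c].sum(axis=0) + 1.0) / ((y_demo == c).sum() + 2.0)
    for c in model.classes_
])

learned = np.exp(model.feature_log_prob_)
print(np.allclose(manual, learned))
```

Each row of `learned` corresponds to a class and each column to a feature.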

**Determine model performance**

In [None]:
bnb.score(X_test, Y_test)
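
For a classifier, `score` returns the mean accuracy on the given data. The sketch below rebuilds a comparable dataset with a fixed `random_state` (an assumption added here for reproducibility, so the numbers will differ from the cells above) and checks that the score equals the fraction of correct predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Same dataset shape as in the notebook, but seeded for repeatability
X_demo, y_demo = make_classification(
    n_samples=600, n_features=2, n_informative=1,
    n_redundant=0, n_clusters_per_class=1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

model = BernoulliNB(binarize=0.0).fit(X_tr, y_tr)

score = model.score(X_te, y_te)
# Accuracy computed by hand: share of test points predicted correctly
manual_accuracy = np.mean(model.predict(X_te) == y_te)
print(score, manual_accuracy)
```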

**Make a prediction with the model**

In [None]:
plt.figure(figsize=(10,5))
fig = plt.scatter(X_test[:,0],Y_test, marker='o', c=X_test[:,0],s=25)
# Overlay red '+' markers at (0, 0), (0, 1), (1, 0) and (1, 1)
plt.plot([0,0,1,1],[0,1,0,1], 'r+')
plt.xlabel("X_test[:,0]")
plt.ylabel("y_test")
plt.colorbar(fig)
plt.show()

In [None]:
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
bnb.predict(data)
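
Besides hard labels, `BernoulliNB` can report class probabilities through `predict_proba`. The sketch below fits a model on synthetic data in which the label is determined by the sign of the first feature (an assumption made only for this illustration) and queries the same four binary points.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Synthetic data (illustration only): class 1 when the first feature is positive
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(300, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)

model = BernoulliNB(binarize=0.0).fit(X_demo, y_demo)

# The four possible binary feature vectors
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

predictions = model.predict(data)          # hard class labels
probabilities = model.predict_proba(data)  # one column per class, rows sum to 1
print(predictions)
print(probabilities.round(3))
```

Note that a value of exactly 0 is not greater than the threshold 0.0, so it is binarized to 0.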