# Naive Bayes Classifier Implementation in Python

Naive Bayes is a machine learning method you can use to predict the likelihood that an event will occur, given evidence that's present in your data.
In the world of statistics it is called 
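To make "likelihood of an event given evidence" concrete, here is a minimal sketch of Bayes' rule for a single piece of evidence. All probabilities are hypothetical numbers chosen for illustration; they are not taken from the dataset used later.

```python
# Hypothetical numbers chosen only for illustration (not from the spambase data).
p_spam = 0.4                 # prior: P(spam)
p_word_given_spam = 0.5      # likelihood: P(word appears | spam)
p_word_given_ham = 0.05      # likelihood: P(word appears | not spam)

# Total probability of seeing the word (the evidence).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' rule: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 3))
```

With these made-up inputs, seeing the word raises the probability of spam from the prior of 0.4 to roughly 0.87. Naive Bayes applies the same rule to many features at once by assuming they are independent given the class.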

## Types of Naive Bayes models

**Multinomial**: good when your features (categorical or continuous) describe discrete frequency counts (e.g. word counts)

**Bernoulli**: good for making predictions from binary features

**Gaussian**: good for making predictions from normally distributed features
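As a rough illustration of which kind of feature each variant expects, the sketch below fits all three scikit-learn classes on tiny made-up arrays. The data and the variable names (`X_counts`, `X_binary`, `X_real`, `y_toy`) are assumptions for demonstration only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Tiny made-up datasets; labels 0 and 1 are arbitrary classes.
y_toy = np.array([0, 0, 1, 1])

# Multinomial: non-negative counts (e.g. how often two words appear).
X_counts = np.array([[3, 0], [4, 1], [0, 3], [1, 4]])
multi_pred = MultinomialNB().fit(X_counts, y_toy).predict([[5, 0], [0, 5]])

# Bernoulli: binary presence/absence features.
X_binary = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
bern_pred = BernoulliNB().fit(X_binary, y_toy).predict([[1, 0], [0, 1]])

# Gaussian: continuous measurements clustered around a per-class mean.
X_real = np.array([[1.0], [1.2], [5.0], [5.2]])
gauss_pred = GaussianNB().fit(X_real, y_toy).predict([[1.1], [5.1]])

print(multi_pred, bern_pred, gauss_pred)
```

On data this cleanly separated, each model should assign the first query to class 0 and the second to class 1. The three models differ only in the distribution they assume for each feature within a class.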


## Use Cases

- Spam detection
- Customer classification
- Credit risk prediction
- Health risk prediction

## Naive Bayes Classifiers

In [3]:
import numpy as np
import pandas as pd
import urllib.request as ur


import sklearn
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB

from sklearn.model_selection import train_test_split

from sklearn import metrics
from sklearn.metrics import accuracy_score

## Using Naive Bayes to predict spam

In [4]:
url="https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
raw_data = ur.urlopen(url)
dataset = np.loadtxt(raw_data, delimiter=',')
print(dataset[0])


[  0.      0.64    0.64    0.      0.32    0.      0.      0.      0.
   0.      0.      0.64    0.      0.      0.      0.32    0.      1.29
   1.93    0.      0.96    0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.778   0.      0.
   3.756  61.    278.      1.   ]


In [5]:
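# Columns 0-47 hold the word-frequency features; the other feature columns are not used here
X = dataset[:, 0:48]
# The last column is the label (1 = spam, 0 = not spam)
y = dataset[:, -1]
# Hold out 33% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.33, random_state=17)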

In [6]:
# binarize is a numeric threshold for mapping features to 0/1; True acts as 1.0 here
BernNB = BernoulliNB(binarize = True)
BernNB.fit(X_train, y_train)
print(BernNB)

y_expect = y_test
y_pred = BernNB.predict(X_test)
print("Accuracy Score: ", accuracy_score(y_expect, y_pred))

BernoulliNB(alpha=1.0, binarize=True, class_prior=None, fit_prior=True)
Accuracy Score:  0.8558262014483212
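The `binarize` argument of `BernoulliNB` is a threshold: feature values above it become 1 and the rest become 0 before the model is fit. The sketch below mimics that step with plain NumPy on a few (non-contiguous) word-frequency values copied from the first row printed earlier. The helper `binarize_row` is hypothetical, and treating `binarize=True` as a threshold of 1.0 is an assumption based on how Python compares `True` with numbers.

```python
import numpy as np

# A few word-frequency values copied from the first row printed earlier.
row = np.array([0.0, 0.64, 0.64, 0.0, 0.32, 1.29, 1.93, 0.96])


def binarize_row(values, threshold):
    """Map values strictly above `threshold` to 1 and everything else to 0."""
    return (values > threshold).astype(int)


at_one = binarize_row(row, 1.0)    # assumed equivalent of binarize=True
at_tenth = binarize_row(row, 0.1)  # a lower threshold, for comparison

print(at_one)
print(at_tenth)
```

At a threshold of 1.0, only the two values above 1 survive as ones, so most of the non-zero word frequencies are discarded. A lower threshold keeps every word that appears at all.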


In [7]:
MultiNB = MultinomialNB()

MultiNB.fit(X_train, y_train)
print(MultiNB)

y_pred = MultiNB.predict(X_test)
print("MultiNB Accuracy score: ", accuracy_score(y_expect, y_pred))

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
MultiNB Accuracy score:  0.8736010533245556


**Notice that MultinomialNB's accuracy (about 0.874) is slightly better than BernoulliNB's (about 0.856).**
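For reference, the accuracy figure being compared is the fraction of test samples whose predicted label equals the expected label. A minimal NumPy sketch with made-up labels (the `_demo` names are illustrative, not from the notebook):

```python
import numpy as np

# Made-up labels for illustration only.
y_true_demo = np.array([1, 0, 1, 1])
y_pred_demo = np.array([1, 0, 0, 1])

# Accuracy = number of matching labels / total number of labels.
accuracy_demo = np.mean(y_true_demo == y_pred_demo)
print(accuracy_demo)
```

Three of the four labels match here, so the result is 0.75.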

In [10]:
GausNB = GaussianNB()
GausNB.fit(X_train, y_train)
print(GausNB)

y_pred = GausNB.predict(X_test)
print(accuracy_score(y_expect, y_pred))


GaussianNB(priors=None, var_smoothing=1e-09)
0.8130348913759052


Let's see if we can improve BernoulliNB by trial and error, this time with a lower `binarize` threshold.

In [13]:
BernNB = BernoulliNB(binarize = 0.1)
BernNB.fit(X_train, y_train)
print(BernNB)

y_expect = y_test
y_pred = BernNB.predict(X_test)
print("Accuracy Score: ", accuracy_score(y_expect, y_pred))

BernoulliNB(alpha=1.0, binarize=0.1, class_prior=None, fit_prior=True)
Accuracy Score:  0.8953258722843976


With `binarize=0.1`, BernoulliNB reaches about 0.895 accuracy on this test split, higher than both MultinomialNB (about 0.874) and GaussianNB (about 0.813).
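The trial-and-error search over `binarize` can also be written as a loop. The sketch below is one possible way to do it, not the notebook's method: the helper `sweep_binarize`, the candidate thresholds, and the synthetic stand-in data are all assumptions, used so the example runs without downloading the spam dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB


def sweep_binarize(X_train, y_train, X_test, y_test, thresholds):
    """Return {threshold: accuracy} for a BernoulliNB fit at each threshold."""
    scores = {}
    for t in thresholds:
        model = BernoulliNB(binarize=t)
        model.fit(X_train, y_train)
        scores[t] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data: non-negative features, label driven by the first one.
rng = np.random.default_rng(0)
X_demo = rng.exponential(scale=0.5, size=(200, 5))
y_demo = (X_demo[:, 0] > 0.3).astype(int)

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=17
)

scores = sweep_binarize(Xtr, ytr, Xte, yte, thresholds=[0.0, 0.1, 0.3, 1.0])
best = max(scores, key=scores.get)  # threshold with the highest accuracy
print(scores)
print("best threshold:", best)
```

To apply this to the spam data, pass the notebook's `X_train`, `y_train`, `X_test` and `y_test` instead of the synthetic arrays. In practice a separate validation split would be a safer place to choose the threshold than the test set.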