# Naive Bayes Classifier

Naive Bayes is a machine learning method you can use to predict the likelihood that an event will occur, given evidence that's present in your data. It is built on conditional probability:

    Conditional Probability = P(B|A) = P(A and B)/P(A)
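
As a small worked example of the formula above, the sketch below estimates P(B|A) from raw counts. The counts are made up for illustration (they do not come from the Spambase data), and the variable names are assumptions: A is "the email contains the word free", B is "the email is spam".

```python
from fractions import Fraction

# Hypothetical counts, chosen only to illustrate the formula
total_emails = 100
emails_with_free = 40   # event A: email contains "free"
spam_with_free = 30     # event A and B: contains "free" and is spam

p_a = Fraction(emails_with_free, total_emails)      # P(A)
p_a_and_b = Fraction(spam_with_free, total_emails)  # P(A and B)

# P(B|A) = P(A and B) / P(A)
p_spam_given_free = p_a_and_b / p_a
print(p_spam_given_free)
```

With these counts, three out of every four emails containing "free" are spam, so P(B|A) is 3/4.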
    

## Types of Naive Bayes Classifier


    1. Multinomial - good for when your features describe discrete frequency counts (e.g. word counts)
    2. Bernoulli - good for making predictions from binary features
    3. Gaussian - good for making predictions from continuous features that are approximately normally distributed
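
The pairing of feature type and classifier listed above can be sketched on toy data. This is a minimal illustration, not a benchmark: the arrays and labels are invented, and each model is fit on only four rows.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy labels (hypothetical): 1 = spam, 0 = not spam
y = np.array([1, 1, 0, 0])

# Discrete counts (e.g. word counts) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 2, 3], [0, 3, 1]])
multi_pred = MultinomialNB().fit(X_counts, y).predict(np.array([[4, 0, 0]]))

# Binary present/absent flags -> BernoulliNB
X_binary = (X_counts > 0).astype(int)
bern_pred = BernoulliNB().fit(X_binary, y).predict(np.array([[1, 0, 0]]))

# Continuous measurements -> GaussianNB
X_cont = np.array([[5.1, 0.2], [4.9, 0.3], [1.0, 2.1], [1.2, 1.9]])
gauss_pred = GaussianNB().fit(X_cont, y).predict(np.array([[5.0, 0.25]]))

print(multi_pred, bern_pred, gauss_pred)
```

Each query point is built to resemble the class 1 rows, so all three models should assign it to class 1.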
    
## Use Cases


    A. Spam Detection
    B. Customer classification
    C. Credit Risk Prediction
    D. Health Risk Prediction
    
## Assumptions


    1. Predictors are independent of each other
    2. A priori assumption: we assume that past conditions still hold true. When we make predictions from historical values, we will get incorrect results if present circumstances have changed.
    3. All regression models maintain an a priori assumption as well
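
To make the independence assumption concrete, the sketch below computes a posterior by hand: because predictors are treated as independent, the per-word likelihoods are simply multiplied together. The priors, likelihoods, and the `posterior` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical class priors and per-word likelihoods P(word | class)
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

def posterior(words):
    """Return P(class | words), assuming words are independent given the class."""
    scores = {}
    for label in prior:
        score = prior[label]
        for word in words:
            # Independence assumption: multiply the individual likelihoods
            score *= likelihood[label][word]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

one_word = posterior(["free"])
two_words = posterior(["free", "meeting"])
print(one_word)
print(two_words)
```

With these numbers, "free" alone points to spam, while adding "meeting" tips the result toward ham. If the two words were strongly correlated in real data, multiplying their likelihoods would count the same evidence twice, which is the weakness the first assumption describes.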

In [5]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from pylab import rcParams
import seaborn as sb

import urllib.request

import scipy
from scipy.stats import spearmanr

import sklearn
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB


# train_test_split moved from sklearn.cross_validation to sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score

### Using Naive Bayes to predict Spam

In [6]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
raw_data = urllib.request.urlopen(url)
dataset = np.loadtxt(raw_data, delimiter = ",")
print(dataset[0])

[   0.       0.64     0.64     0.       0.32     0.       0.       0.       0.
    0.       0.       0.64     0.       0.       0.       0.32     0.
    1.29     1.93     0.       0.96     0.       0.       0.       0.       0.
    0.       0.       0.       0.       0.       0.       0.       0.       0.
    0.       0.       0.       0.       0.       0.       0.       0.       0.
    0.       0.       0.       0.       0.       0.       0.       0.778
    0.       0.       3.756   61.     278.       1.   ]


In [7]:
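# Columns 0-47 hold the 48 word-frequency features; the last column is the label (1 = spam, 0 = not spam)
X = dataset[:, 0:48]
y = dataset[:, -1]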

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.33, random_state = 17)

In [10]:
# binarize is a numeric threshold, so True acts as 1.0 here
BernNB = BernoulliNB(binarize = True)
BernNB.fit(X_train, y_train)
print(BernNB)

y_expect = y_test
y_pred = BernNB.predict(X_test)
print(accuracy_score(y_expect, y_pred))

BernoulliNB(alpha=1.0, binarize=True, class_prior=None, fit_prior=True)
0.855826201448


In [11]:
MultiNB = MultinomialNB()
MultiNB.fit(X_train, y_train)
print(MultiNB)

y_pred = MultiNB.predict(X_test)
print(accuracy_score(y_test, y_pred))

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
0.873601053325


In [12]:
GausNB = GaussianNB()
GausNB.fit(X_train, y_train)
print(GausNB)

y_pred = GausNB.predict(X_test)
print(accuracy_score(y_test, y_pred))

GaussianNB(priors=None)
0.813034891376


In [13]:
BernNB = BernoulliNB(binarize = 0.1)
BernNB.fit(X_train, y_train)
print(BernNB)

y_expect = y_test
y_pred = BernNB.predict(X_test)
print(accuracy_score(y_expect, y_pred))

BernoulliNB(alpha=1.0, binarize=0.1, class_prior=None, fit_prior=True)
0.895325872284
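
One plausible reason for the difference between the two Bernoulli runs is what the `binarize` threshold does to the features. As I understand the scikit-learn behaviour, values strictly greater than the threshold become 1 and everything else becomes 0, and `True` acts as the number 1.0. The sketch below mimics that step with plain NumPy on a few values from the first row printed earlier; `binarize_features` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

# A few word-frequency values taken from the first row of the dataset
row = np.array([0.0, 0.64, 0.32, 1.29, 1.93, 0.96])

def binarize_features(values, threshold):
    """Map values strictly greater than threshold to 1, everything else to 0."""
    return (values > float(threshold)).astype(int)

at_true = binarize_features(row, True)   # True behaves like a threshold of 1.0
at_tenth = binarize_features(row, 0.1)

print(at_true)
print(at_tenth)
```

With a threshold of 1.0, most word frequencies in this row collapse to 0, so the classifier sees little signal. A threshold of 0.1 keeps nearly every word that appears at all, which is consistent with the higher accuracy reported for `binarize = 0.1`.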
