# Naive Bayes Lecture Notes

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.naive_bayes import GaussianNB
%matplotlib inline

### Intuition Behind Naive Bayes

Bayes' theorem is named after Thomas Bayes, an 18th-century English minister who, by some accounts, was trying to prove the existence of the Christian God.

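Naive Bayes algorithms apply Bayes' theorem to classification: a prior belief about each class is combined with the evidence in the observed features to give an updated (posterior) probability for that class. This is made tractable by the "naive" assumption that every feature is independent of every other feature, given the class. Because the fitted model is just a set of per-class statistics, information from new data can be folded back into it after the initial fit, essentially a feedback loop.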

These notes explore the Gaussian Naive Bayes algorithm, which is used for classification. In scikit-learn, updating an already fitted model with new data is done via the .partial_fit() method.
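
For reference, the rule behind the classifier can be written out as below. The notation ($y$ for the class, $x_1, \dots, x_n$ for the features, $\mu_{y,i}$ and $\sigma^2_{y,i}$ for the per-class mean and variance of feature $i$) is an assumption of this sketch and is not used elsewhere in these notes.

```latex
% Bayes' theorem for a class y and features x_1, ..., x_n
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

% Naive assumption: features are independent given the class
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)

% Gaussian Naive Bayes: each per-feature likelihood is a 1-D Gaussian
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_{y,i}}}
                \exp\!\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma^2_{y,i}}\right)
```

The predicted class is the $y$ with the largest value of the second expression.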

#### Applications

- Spam filtering
- Text prediction
- Document Classification

#### Benefits

- Needs only a relatively small training sample, and is easy to implement
- Extremely fast to train compared to other ML methods
- "The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one dimensional distribution. This in turn helps to alleviate problems stemming from the curse of dimensionality." (scikit-learn documentation)
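
The "independently estimated" point in the quote above can be checked with a minimal sketch. The toy data is illustrative only; the sketch estimates the mean of every (class, feature) pair separately with NumPy and compares the result with the `theta_` attribute of a fitted `GaussianNB`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (illustrative only): six 2-D points in two classes
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])

# Estimate each (class, feature) pair on its own, as a 1-D problem
manual_means = np.array([
    [X[y == c, j].mean() for j in range(X.shape[1])]
    for c in np.unique(y)
])

clf = GaussianNB().fit(X, y)

# theta_ holds the per-class, per-feature means learned by GaussianNB
print(manual_means)
print(clf.theta_)
```

Both arrays have one row per class and one column per feature, and no estimate depends on any other feature.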

#### Downsides

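- Bad probability estimator: it can classify well, but the probabilities it outputs (e.g. from predict_proba) should not be taken at face value
- Limited applications
- Can break down on phrases, since it only takes individual word frequencies into account, not how words combine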
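
One way to see why the probability estimates are unreliable is the sketch below, which uses made-up data. It copies the same feature into three perfectly correlated columns; because Naive Bayes counts each copy as independent evidence, the predicted probability becomes more extreme even though no new information was added.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# One informative feature, two balanced classes (illustrative data)
x = np.array([[-1.0], [-2.0], [-3.0], [1.0], [2.0], [3.0]])
y = np.array([1, 1, 1, 2, 2, 2])
query = np.array([[0.3]])

clf_single = GaussianNB().fit(x, y)
p_single = clf_single.predict_proba(query)[0]

# The same information copied into three perfectly correlated columns
x_dup = np.hstack([x, x, x])
clf_dup = GaussianNB().fit(x_dup, y)
p_dup = clf_dup.predict_proba(np.hstack([query, query, query]))[0]

print("one feature: ", p_single)
print("three copies:", p_dup)
```

The predicted class is the same in both cases; only the confidence changes.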

#### Varieties

- GaussianNB: classification on continuous features, where the likelihood of each feature is assumed to be Gaussian
- MultinomialNB: "implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification"
- BernoulliNB: "implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features but each one is assumed to be a binary-valued (Bernoulli, boolean) variable."
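
As a rough illustration of the text-classification use of MultinomialNB, here is a minimal spam-filter sketch. The four training sentences and their labels are invented for this example, and `CountVectorizer` is one possible way (an assumption here, not something these notes prescribe) to turn text into word counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus: two spam messages, two normal ("ham") messages
docs = ["win money now", "free money offer", "meeting at noon", "lunch at noon tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

# Convert each document into a vector of word counts
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Fit the multinomial model on the count vectors
clf_text = MultinomialNB()
clf_text.fit(counts, labels)

# Classify a new message using the same vocabulary
new_counts = vectorizer.transform(["free money"])
prediction = clf_text.predict(new_counts)
print(prediction)
```

BernoulliNB could be swapped in when only the presence or absence of each word matters rather than its count.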

#### Why Naive?

Take a text example: the model ignores word order. Each word's frequency within a class is treated as an independent piece of evidence, and these frequencies are multiplied together (along with the class prior) to score each possible class. In other words, the only thing taken into consideration is the frequency of each piece of evidence, not how the pieces relate to one another.
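
The multiplication described above can be sketched by hand. The word probabilities and priors below are hypothetical numbers chosen for illustration, not values estimated from data; the point is only that reordering the words leaves the score unchanged.

```python
import math

# Hypothetical per-class word probabilities, made up for illustration
word_probs = {
    "spam": {"free": 0.30, "money": 0.40, "meeting": 0.05},
    "ham": {"free": 0.05, "money": 0.10, "meeting": 0.40},
}
priors = {"spam": 0.5, "ham": 0.5}

def class_scores(words):
    """Unnormalised score per class: prior times the product of word probabilities."""
    return {
        label: priors[label] * math.prod(word_probs[label][w] for w in words)
        for label in priors
    }

scores_a = class_scores(["free", "money"])
scores_b = class_scores(["money", "free"])  # same words, different order

print(scores_a)
print(scores_b)
```

In practice, implementations usually sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow on long documents.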


### Creating Sample Gaussian NB Classifier

In [2]:
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
clf = GaussianNB()
clf.fit(X, Y)
# Output of fit(): GaussianNB(priors=None)
print(clf.predict([[-0.8, -1]]))


[1]


#### partial_fit() allows the model (including the class priors) to be updated incrementally

In [3]:
clf_pf = GaussianNB()
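# The first call to partial_fit() must be given the full list of classes
clf_pf.partial_fit(X, Y, classes=np.unique(Y))
# Output of partial_fit(): GaussianNB(priors=None)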
print(clf_pf.predict([[-0.8, -1]]))

[1]
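
The single call above does not yet show any updating. A minimal sketch of the incremental behaviour, assuming the same toy data is split into two arbitrary batches, is to feed the batches one after the other and compare the result with a model fitted on everything at once:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Y = np.array([1, 1, 1, 2, 2, 2])

# Reference model: fitted on all six points at once
clf_full = GaussianNB().fit(X, Y)

# Incremental model: sees the data in two batches (arbitrary split)
batch_1 = [0, 1, 3]
batch_2 = [2, 4, 5]
clf_inc = GaussianNB()
clf_inc.partial_fit(X[batch_1], Y[batch_1], classes=np.unique(Y))
clf_inc.partial_fit(X[batch_2], Y[batch_2])  # classes only needed on the first call

print(clf_full.theta_)      # per-class feature means from the full fit
print(clf_inc.theta_)       # the same means, reached incrementally
print(clf_inc.class_prior_)
print(clf_inc.predict([[-0.8, -1]]))
```

The per-class means and priors end up the same either way, which is what makes partial_fit() useful for data that arrives in chunks or does not fit in memory.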


### Creating an Accuracy Function

In [5]:
def NBAccuracy(features_train, labels_train, features_test, labels_test):
    """Fit a GaussianNB classifier and return its accuracy on the test data."""
    # Imported locally so the function is self-contained
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    # Create the classifier and fit it on the training features and labels
    clf = GaussianNB()
    clf.fit(features_train, labels_train)

    # Use the trained classifier to predict labels for the test features
    pred = clf.predict(features_test)

    # Metrics version: accuracy_score expects (y_true, y_pred)
    accuracy = accuracy_score(labels_test, pred)

    # Built-in GaussianNB alternative, which gives the same number:
    # accuracy = clf.score(features_test, labels_test)
    return accuracy
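
To try the two accuracy routes side by side, here is a hedged sketch on synthetic data. The two Gaussian blobs, the random seed and the 75/25 split are all assumptions made for illustration; any labelled dataset would work the same way.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data: two well-separated 2-D Gaussian blobs (illustrative only)
rng = np.random.default_rng(0)
class_a = rng.normal(loc=-2.0, scale=1.0, size=(100, 2))
class_b = rng.normal(loc=2.0, scale=1.0, size=(100, 2))
features = np.vstack([class_a, class_b])
labels = np.array([1] * 100 + [2] * 100)

# Hold out a quarter of the data for testing
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = GaussianNB().fit(features_train, labels_train)
pred = clf.predict(features_test)

# The metrics version and the built-in score() should agree
acc_metrics = accuracy_score(labels_test, pred)
acc_builtin = clf.score(features_test, labels_test)
print(acc_metrics, acc_builtin)
```

Passing the same four arrays to NBAccuracy should return the same value as `acc_metrics`.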