# Gaussian Naive Bayes

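Naive Bayes uses Bayes' theorem to model the conditional relationship of each attribute to the class variable. It makes the "naive" assumption that the attributes are conditionally independent given the class. Gaussian Naive Bayes further assumes that each attribute is normally (Gaussian) distributed within each class.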
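The decision rule can be written out as below. This is a sketch of the standard Gaussian Naive Bayes formulation, not notation taken from this recipe. The symbols are introduced here for illustration:

- d is the number of features.
- mu_{y,j} and sigma^2_{y,j} are the mean and variance of feature j within class y, estimated from the training data.

The prediction is the class with the largest log posterior. The normalizing constant P(x) is dropped because it is the same for every class.

```latex
\begin{aligned}
P(y \mid x_1, \dots, x_d) &\propto P(y) \prod_{j=1}^{d} P(x_j \mid y) \\
P(x_j \mid y) &= \frac{1}{\sqrt{2\pi\sigma_{y,j}^2}} \exp\!\left(-\frac{(x_j - \mu_{y,j})^2}{2\sigma_{y,j}^2}\right) \\
\hat{y} &= \arg\max_{y} \Big[ \log P(y) + \sum_{j=1}^{d} \log P(x_j \mid y) \Big]
\end{aligned}
```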

This recipe shows how to fit a Gaussian Naive Bayes model to the Iris dataset.

In [1]:
from sklearn import datasets 
from sklearn import metrics 
from sklearn.naive_bayes import GaussianNB

Load the Iris dataset

Iris flower dataset (150 samples x 4 real-valued features, multi-class classification with 3 classes)

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class (the target variable):
-- Iris Setosa = 0
-- Iris Versicolour = 1
-- Iris Virginica = 2
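The layout described above can be checked directly on the object returned by `load_iris`. This is a minimal sketch that assumes the same scikit-learn loader used in the recipe, plus NumPy for counting labels.

```python
import numpy as np
from sklearn import datasets

dataset = datasets.load_iris()

# dataset.data: one row per sample, one column per feature
print(dataset.data.shape)            # (150, 4)
print(dataset.feature_names)
# dataset.target: one integer class label (0, 1 or 2) per sample
print(dataset.target_names)
# The three classes are perfectly balanced
print(np.bincount(dataset.target))   # [50 50 50]
```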

In [2]:
dataset = datasets.load_iris()
# Show the first 10 samples (features) and their class labels
print(dataset.data[0:10])
print(dataset.target[0:10])

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]
 [ 5.4  3.9  1.7  0.4]
 [ 4.6  3.4  1.4  0.3]
 [ 5.   3.4  1.5  0.2]
 [ 4.4  2.9  1.4  0.2]
 [ 4.9  3.1  1.5  0.1]]
[0 0 0 0 0 0 0 0 0 0]


Fit a Gaussian Naive Bayes model to the data

In [3]:
model = GaussianNB() 
model.fit(dataset.data, dataset.target) 
print(model)

GaussianNB(priors=None)
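Fitting a Gaussian Naive Bayes model amounts to estimating three things:

- A prior for each class.
- A mean for each feature within each class.
- A variance for each feature within each class.

The sketch below inspects the priors (`class_prior_`) and means (`theta_`) on the fitted model. It confirms that the means are plain per-class averages of the training data. The per-class variances are stored in `var_` in recent scikit-learn releases (`sigma_` in older ones), so they are left out here to stay version-neutral.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()
model = GaussianNB()
model.fit(dataset.data, dataset.target)

# Class priors are estimated from class frequencies (50 of 150 samples each)
print(model.class_prior_)

# theta_ holds one mean per (class, feature) pair: 3 classes x 4 features
print(model.theta_.shape)            # (3, 4)

# Recompute the per-class feature means directly from the data
manual_means = np.array([dataset.data[dataset.target == c].mean(axis=0)
                         for c in model.classes_])
print(np.allclose(manual_means, model.theta_))   # True
```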


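Make predictions on the same data used for training (the results below therefore measure how well the model fits the training set, not how well it generalizes to unseen data)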

In [4]:
expected = dataset.target 
predicted = model.predict(dataset.data) 
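`model.predict` assigns each sample to the class with the highest posterior. The from-scratch sketch below illustrates that computation. `gaussian_nb_predict` is a hypothetical helper, not part of scikit-learn.

The helper assumes a small variance floor. It is modeled on GaussianNB's `var_smoothing` parameter: a fraction of the largest feature variance is added to every per-class variance. It should therefore agree with the fitted model on this data, but it is an illustration, not the library's implementation.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()
X, y = dataset.data, dataset.target
model = GaussianNB().fit(X, y)


def gaussian_nb_predict(X_train, y_train, X_new, var_smoothing=1e-9):
    """Hypothetical from-scratch Gaussian Naive Bayes predictor (illustration only)."""
    classes = np.unique(y_train)
    # Assumed variance floor: a fraction of the largest feature variance
    eps = var_smoothing * X_train.var(axis=0).max()
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        log_prior = np.log(len(Xc) / len(X_train))
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + eps
        # Naive step: the joint log-likelihood is a sum of per-feature Gaussian log-densities
        log_lik = (-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X_new - mean) ** 2 / var, axis=1))
        scores.append(log_prior + log_lik)
    # Rows are samples, columns are classes; pick the highest unnormalized log posterior
    return classes[np.argmax(np.column_stack(scores), axis=1)]


manual = gaussian_nb_predict(X, y, X)
print("agreement with model.predict:", np.mean(manual == model.predict(X)))
```

Working with log probabilities, rather than multiplying raw densities, avoids numerical underflow when there are many features.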

Summarize the fit of the model 

In [5]:
print(metrics.classification_report(expected, predicted)) 
print(metrics.confusion_matrix(expected, predicted)) 

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       0.94      0.94      0.94        50
          2       0.94      0.94      0.94        50

avg / total       0.96      0.96      0.96       150

[[50  0  0]
 [ 0 47  3]
 [ 0  3 47]]
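In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, so correct predictions lie on the diagonal. For the matrix above, accuracy is (50 + 47 + 47) / 150 = 144 / 150 = 0.96. The sketch below, assuming the same recipe setup, computes accuracy from the matrix and checks it against `metrics.accuracy_score`.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()
model = GaussianNB().fit(dataset.data, dataset.target)
expected = dataset.target
predicted = model.predict(dataset.data)

# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(expected, predicted)
# Accuracy is the diagonal (correct predictions) divided by the total count
accuracy = np.trace(cm) / cm.sum()
print(accuracy, metrics.accuracy_score(expected, predicted))
```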


The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
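For a multi-class problem, each class is scored one-vs-rest, and tp, fp and fn can be read off the confusion matrix:

- tp for a class is its diagonal entry.
- fp is the rest of that class's column.
- fn is the rest of that class's row.

For class 1 in the matrix above, tp = 47, fp = 3 and fn = 3, so precision = recall = 47 / 50 = 0.94. The sketch below, assuming the recipe's setup, computes both metrics this way and compares them with scikit-learn's `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()
model = GaussianNB().fit(dataset.data, dataset.target)
expected = dataset.target
predicted = model.predict(dataset.data)

cm = metrics.confusion_matrix(expected, predicted)
tp = np.diag(cm)                  # correctly predicted samples of each class
fp = cm.sum(axis=0) - tp          # predicted as class k, but truly another class
fn = cm.sum(axis=1) - tp          # truly class k, but predicted as another class

# One-vs-rest precision and recall per class (assumes every class is predicted at least once)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, metrics.precision_score(expected, predicted, average=None))
print(recall, metrics.recall_score(expected, predicted, average=None))
```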

The F-beta score is a weighted harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0. The f1-score column in the report is the F-beta score with beta = 1.

The beta parameter sets the relative weight of recall: recall is treated as beta times as important as precision. As a result, beta > 1 favors recall, beta < 1 favors precision, and beta = 1.0 weights them equally.
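The usual definition is F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below implements it with a hypothetical helper `f_beta` and checks it against `metrics.fbeta_score` for a few values of beta, assuming the recipe's setup.

In this report every class has equal precision and recall. When P = R, the formula reduces to P, so each class's F-beta equals its precision for any beta.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()
model = GaussianNB().fit(dataset.data, dataset.target)
expected = dataset.target
predicted = model.predict(dataset.data)


def f_beta(precision, recall, beta):
    """Hypothetical helper: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


p = metrics.precision_score(expected, predicted, average=None)
r = metrics.recall_score(expected, predicted, average=None)
for beta in (0.5, 1.0, 2.0):
    print(beta, f_beta(p, r, beta),
          metrics.fbeta_score(expected, predicted, beta=beta, average=None))
```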

The support is the number of occurrences of each class in the true labels (y_true in scikit-learn's terminology; the expected array in this recipe).
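Support depends only on the true labels, not on the predictions, so a simple count reproduces the support column. The sketch below, assuming the recipe's setup, reads support from `metrics.precision_recall_fscore_support` (whose default `average=None` returns per-class arrays) and compares it with `np.bincount`.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()
model = GaussianNB().fit(dataset.data, dataset.target)
expected = dataset.target
predicted = model.predict(dataset.data)

# The fourth returned array is the per-class support
_, _, _, support = metrics.precision_recall_fscore_support(expected, predicted)
print(support)
# Counting the true labels directly gives the same numbers
print(np.bincount(expected))
```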