# Bayes Classifier


# The Technique (Bayes Classifier)

Naive Bayes is a simple yet often effective classification algorithm. Its models train quickly and make fast predictions. It is a probabilistic classifier: it estimates the probability of each class given an object's features and predicts the most probable class.
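
To make the probabilistic step concrete, here is a minimal sketch of the Gaussian variant computed by hand with NumPy. It assumes each feature is normally distributed within a class and multiplies the class prior by the per-feature likelihoods (summed in log space). The function name `gaussian_nb_posteriors` and the toy data are made up for illustration; this is not how scikit-learn is implemented internally.

```python
import numpy as np

def gaussian_nb_posteriors(X_train, y_train, x):
    """Return the class labels and P(class | x) under the naive independence assumption."""
    classes = np.unique(y_train)
    log_scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + 1e-9  # small constant guards against zero variance
        # The "naive" step: per-feature Gaussian log-densities are simply summed
        log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        log_scores.append(np.log(prior) + log_likelihood)
    log_scores = np.array(log_scores)
    # Normalise in log space for numerical stability
    probs = np.exp(log_scores - log_scores.max())
    return classes, probs / probs.sum()

# Toy data (hypothetical): two features, two classes
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [3.0, 0.5], [3.2, 0.7], [2.8, 0.3]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

classes, posteriors = gaussian_nb_posteriors(X_toy, y_toy, np.array([1.1, 1.9]))
predicted = classes[np.argmax(posteriors)]
print("Predicted class:", predicted)
print("Posteriors:", posteriors)
```

The query point lies close to the class 0 samples, so nearly all of the posterior mass goes to class 0.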

# The Problem
Naive Bayes can be applied to a wide variety of classification tasks. Typical applications include spam filtering, document classification, and sentiment prediction.
This notebook applies it to the scikit-learn wine dataset, which has 13 features and three classes.

# Code

In [None]:
# Import scikit-learn dataset library
from sklearn import datasets

# Load dataset
wine = datasets.load_wine()

# print the names of the 13 features
print("Features: ", wine.feature_names)

# print the label type of wine(class_0, class_1, class_2)
print("Labels: ", wine.target_names)

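# Feature matrix shape, (178, 13). A bare expression in the middle of a cell
# is not echoed; wrap it in print() to display it.
wine.data.shape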

# print the wine data features (top 5 records)
print(wine.data[0:5])
# print(wine)

# print the wine labels (0: class_0, 1: class_1, 2: class_2)
# print(wine.target)

# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3,
                                                    random_state=109)  # 70% training and 30% test

# Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

# Create a Gaussian Classifier
gnb = GaussianNB()

# Train the model using the training sets
gnb.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = gnb.predict(X_test)

print('y_pred', y_pred.shape)
print('y_test', y_test.shape)


# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score

Features:  ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Labels:  ['class_0' 'class_1' 'class_2']
[[1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00 3.060e+00
  2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 1.120e+01 1.000e+02 2.650e+00 2.760e+00
  2.600e-01 1.280e+00 4.380e+00 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 1.860e+01 1.010e+02 2.800e+00 3.240e+00
  3.000e-01 2.810e+00 5.680e+00 1.030e+00 3.170e+00 1.185e+03]
 [1.437e+01 1.950e+00 2.500e+00 1.680e+01 1.130e+02 3.850e+00 3.490e+00
  2.400e-01 2.180e+00 7.800e+00 8.600e-01 3.450e+00 1.480e+03]
 [1.324e+01 2.590e+00 2.870e+00 2.100e+01 1.180e+02 2.800e+00 2.690e+00
  3.900e-01 1.820e+00 4.320e+00 1.040e+00 2.930e+00 7.350e+02]]
y_pred (54,)
y_test (54,)
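
Because the classifier is probabilistic, the fitted model can report class probabilities as well as hard labels. The sketch below repeats the same split and fit so that it runs on its own, then uses `predict_proba`; the variable name `probabilities` is introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109)

gnb = GaussianNB().fit(X_train, y_train)

# One row per test sample, one column per class; each row sums to 1
probabilities = gnb.predict_proba(X_test)
print(np.round(probabilities[:5], 3))

# predict() returns the class with the highest posterior probability
most_probable = gnb.classes_[np.argmax(probabilities, axis=1)]
print("Most probable classes for the first five samples:", most_probable[:5])
```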


# Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

array([[20,  1,  0],
       [ 2, 15,  2],
       [ 0,  0, 14]])

# Result

In [None]:
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print('Confusion Matrix is given by:\n', metrics.confusion_matrix(y_test, y_pred))
print('Precision: ', metrics.precision_score(y_test, y_pred, average=None))
print('Recall:', metrics.recall_score(y_test, y_pred, average=None))
print('F1 Score: ', metrics.f1_score(y_test, y_pred, average='weighted'))

Accuracy: 0.9074074074074074
Confusion Matrix is given by:
 [[20  1  0]
 [ 2 15  2]
 [ 0  0 14]]
Precision:  [0.90909091 0.9375     0.875     ]
Recall: [0.95238095 0.78947368 1.        ]
F1 Score:  0.9053197161724292
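
The metrics above can be recomputed directly from the confusion matrix, which helps show what each one measures. This is a minimal sketch using only NumPy; the matrix is copied from the output above, with rows as true classes and columns as predicted classes.

```python
import numpy as np

# Confusion matrix from the output above (rows: true class, columns: predicted class)
cm = np.array([[20, 1, 0],
               [2, 15, 2],
               [0, 0, 14]])

true_positives = np.diag(cm)
support = cm.sum(axis=1)                     # number of true samples per class
precision = true_positives / cm.sum(axis=0)  # column sums: samples predicted as each class
recall = true_positives / support            # row sums: samples that belong to each class
f1_per_class = 2 * precision * recall / (precision + recall)

accuracy = true_positives.sum() / cm.sum()
f1_weighted = np.sum(f1_per_class * support) / support.sum()

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Weighted F1:", f1_weighted)
```

The lowest recall belongs to class_1: 4 of its 19 test samples are assigned to the other two classes.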


# Conclusion

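Gaussian Naive Bayes reaches about 90.7% accuracy on the 54-sample test set. To improve on this, check for correlated features and try removing the highly correlated ones. Naive Bayes assumes that the features are conditionally independent given the class, so strongly correlated features are effectively counted more than once.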
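
One way to look for such features is to scan the pairwise Pearson correlations of the 13 columns. The sketch below is only a starting point: the 0.8 cut-off is an assumed threshold, not a rule, and the variable names are illustrative.

```python
import numpy as np
from sklearn import datasets

wine = datasets.load_wine()

# Pearson correlation between feature columns (rowvar=False: columns are variables)
corr = np.corrcoef(wine.data, rowvar=False)

threshold = 0.8  # assumed cut-off; adjust for your data
high_pairs = []
n_features = corr.shape[0]
for i in range(n_features):
    for j in range(i + 1, n_features):
        if abs(corr[i, j]) > threshold:
            high_pairs.append((wine.feature_names[i], wine.feature_names[j], corr[i, j]))

for name_a, name_b, r in high_pairs:
    print(f"{name_a} / {name_b}: r = {r:.2f}")
```

For each reported pair, dropping one of the two features and retraining shows whether the redundancy was hurting the model.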

Feature engineering may also help: combining existing features (for example, as a product) into new ones that make intuitive sense can give the model more informative inputs.
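
As a rough sketch of how to test such an idea, the code below appends one product feature and compares 5-fold cross-validated accuracy with and without it. The choice of `alcohol` times `color_intensity` is purely hypothetical and is not claimed to help; the point is the comparison procedure.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

wine = datasets.load_wine()
X, y = wine.data, wine.target

# Hypothetical engineered feature: product of two existing columns
i = wine.feature_names.index("alcohol")
j = wine.feature_names.index("color_intensity")
X_extended = np.column_stack([X, X[:, i] * X[:, j]])

# Mean accuracy over 5 stratified folds, with and without the new column
baseline = cross_val_score(GaussianNB(), X, y, cv=5).mean()
extended = cross_val_score(GaussianNB(), X_extended, y, cv=5).mean()
print(f"Baseline: {baseline:.3f}, with product feature: {extended:.3f}")
```

Keep an engineered feature only if it improves the cross-validated score; a product of two existing columns is correlated with both of them, which works against the independence assumption.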