Naive Bayes Machine Learning Implementation

Naive Bayes is a probabilistic classifier based on Bayes' theorem, with a strong (naive) assumption that the features are conditionally independent given the class. Each variant models the per-feature likelihood differently, so the right choice depends on the type of input data.
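
The independence assumption can be written out as a decision rule. In the sketch below, the symbols $x_1, \dots, x_n$ (feature values) and $y$ (class label) are notation introduced here for illustration; they are not taken from the notebook.

```latex
\begin{aligned}
P(y \mid x_1, \dots, x_n)
  &= \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \\
  &\propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
  \qquad \text{(features independent given } y\text{)} \\
\hat{y} &= \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{aligned}
```

The variants differ only in how they model each factor $P(x_i \mid y)$, for example as a Gaussian, multinomial, or Bernoulli distribution.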



| Algorithm      | Input Type      | Ideal For                        |
| -------------- | --------------- | -------------------------------- |
| Gaussian NB    | Continuous      | Numeric features (normal dist.)  |
| Multinomial NB | Discrete counts | Text classification (count data) |
| Bernoulli NB   | Binary          | Presence/absence features        |
| Complement NB  | Discrete counts | Text with imbalanced labels      |
| Categorical NB | Categorical     | Finite labeled categories        |


In [36]:
from sklearn.datasets import load_iris

In [37]:
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB, CategoricalNB


In [38]:
from sklearn.model_selection import train_test_split

In [39]:
# Multi Class Classification

In [40]:
X,y = load_iris(return_X_y=True)

In [41]:
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [42]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [43]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=0)

In [44]:
from sklearn.naive_bayes import GaussianNB

In [45]:
# 1. Gaussian Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred_gnb = gnb.predict(X_test)
y_pred_gnb

array([2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1,
       0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1, 1, 1, 2, 0, 2, 0,
       0])

In [46]:
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

In [47]:
print("Naive Bayes Classifier Results:\n")
print("GaussianNB Accuracy:", accuracy_score(y_test, y_pred_gnb))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred_gnb))
print("Classification report: \n", classification_report(y_test, y_pred_gnb))

Naive Bayes Classifier Results:

GaussianNB Accuracy: 1.0
Confusion matrix: 
 [[16  0  0]
 [ 0 18  0]
 [ 0  0 11]]
Classification report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        18
           2       1.00      1.00      1.00        11

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
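
The figures in the classification report can be derived from the confusion matrix alone. The sketch below re-enters the matrix printed above as a literal array and recomputes accuracy, precision, recall, and support with NumPy. It relies on scikit-learn's convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted).
cm = np.array([[16, 0, 0],
               [0, 18, 0],
               [0, 0, 11]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)  # per class: TP / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # per class: TP / all true members of that class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)                  # number of true samples per class

print("accuracy :", accuracy)
print("precision:", precision)
print("recall   :", recall)
print("f1       :", f1)
print("support  :", support)
```

A perfect score on a 45-sample test split says little about generalization; cross-validation would give a steadier estimate.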



In [49]:
# 2. Multinomial Naive Bayes (expects non-negative, count-like features;
#    the measurements are floored to integers here to mimic counts)
import numpy as np  # needed for np.floor; not imported in any earlier cell shown

X_train_mnb = np.floor(X_train).astype(int)
X_test_mnb = np.floor(X_test).astype(int)

from sklearn.naive_bayes import MultinomialNB

mnb = MultinomialNB()

mnb.fit(X_train_mnb, y_train)
y_pred_mnb = mnb.predict(X_test_mnb)
print("MultinomialNB Accuracy:", accuracy_score(y_test, y_pred_mnb))

MultinomialNB Accuracy: 0.6
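
MultinomialNB is designed for count data such as word frequencies, which floored iris measurements only loosely resemble. As a rough illustration of the intended use, here is a sketch on a tiny made-up corpus. The documents, labels, and variable names are hypothetical and exist only for this example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam.
docs = [
    "free prize money",
    "win money now",
    "meeting schedule today",
    "project meeting notes",
]
labels = [1, 1, 0, 0]

# Turn each document into a vector of word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

text_mnb = MultinomialNB()  # default Laplace smoothing, alpha=1.0
text_mnb.fit(X_counts, labels)

# New documents must go through the same fitted vocabulary.
new_docs = ["free money", "meeting notes"]
new_pred = text_mnb.predict(vectorizer.transform(new_docs))
print(dict(zip(new_docs, new_pred.tolist())))
```

The smoothing parameter `alpha` keeps a word that never appeared in one class from forcing that class's probability to zero.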



In [50]:
# 3. Bernoulli Naive Bayes (expects binary features; each feature is
#    binarized against its training-set mean, which is reused for the test set)
train_means = np.mean(X_train, axis=0)
X_train_bnb = (X_train > train_means).astype(int)
X_test_bnb = (X_test > train_means).astype(int)

from sklearn.naive_bayes import BernoulliNB

bnb = BernoulliNB()
bnb.fit(X_train_bnb, y_train)

y_pred_bnb = bnb.predict(X_test_bnb)
print("BernoulliNB Accuracy:", accuracy_score(y_test, y_pred_bnb))

BernoulliNB Accuracy: 0.7111111111111111
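
BernoulliNB can also threshold its input internally through its `binarize` parameter, which takes a single scalar applied to every feature. The sketch below checks that this matches manual thresholding followed by `binarize=None`. The threshold value `3.0` is an arbitrary choice for illustration, not a recommended setting.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

threshold = 3.0  # arbitrary illustrative value, shared by all features

# Option A: let BernoulliNB binarize internally (values > threshold become 1).
bnb_internal = BernoulliNB(binarize=threshold).fit(X_train, y_train)
pred_internal = bnb_internal.predict(X_test)

# Option B: binarize by hand and tell BernoulliNB the input is already binary.
bnb_manual = BernoulliNB(binarize=None).fit(
    (X_train > threshold).astype(int), y_train
)
pred_manual = bnb_manual.predict((X_test > threshold).astype(int))

print("Same predictions:", np.array_equal(pred_internal, pred_manual))
```

Because `binarize` accepts only one scalar, per-feature thresholds such as the training means still have to be applied by hand.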



In [51]:
# 4. Complement Naive Bayes (also expects count-like features, like MultinomialNB;
#    reuses the floored arrays X_train_mnb / X_test_mnb)
from sklearn.naive_bayes import ComplementNB

cnb = ComplementNB()

cnb.fit(X_train_mnb, y_train)
y_pred_cnb = cnb.predict(X_test_mnb)
print("ComplementNB Accuracy:", accuracy_score(y_test, y_pred_cnb))

ComplementNB Accuracy: 0.6
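
ComplementNB differs from MultinomialNB in that each class's feature weights are estimated from the samples that do not belong to that class, which tends to help when classes are imbalanced. The sketch below recomputes those weights by hand on a hypothetical toy count matrix and compares them with the fitted `feature_log_prob_`. It assumes the default `norm=False` and reflects my reading of scikit-learn's implementation rather than a documented guarantee.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Hypothetical toy count data: 5 samples, 3 count features, imbalanced labels.
X_toy = np.array([[3, 0, 1],
                  [2, 1, 0],
                  [4, 0, 2],
                  [3, 1, 1],
                  [0, 5, 2]])
y_toy = np.array([0, 0, 0, 0, 1])
alpha = 1.0  # additive smoothing

toy_cnb = ComplementNB(alpha=alpha).fit(X_toy, y_toy)

# For each class c, total the feature counts over all samples NOT in c, then smooth.
comp_counts = np.array(
    [X_toy[y_toy != c].sum(axis=0) for c in toy_cnb.classes_]
) + alpha
theta = comp_counts / comp_counts.sum(axis=1, keepdims=True)

# With norm=False the stored weights are the negated log complement probabilities.
manual_weights = -np.log(theta)

print("Matches feature_log_prob_:",
      np.allclose(manual_weights, toy_cnb.feature_log_prob_))
```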



In [52]:
# 5. Categorical Naive Bayes (convert continuous features to categories)
# One shared set of 10 bin edges, spanning the global min/max of the training data,
# is applied to every feature; test values are binned with the same training edges.
bins = np.linspace(np.min(X_train), np.max(X_train), 10)
X_train_cat = np.digitize(X_train, bins=bins)
X_test_cat = np.digitize(X_test, bins=bins)

from sklearn.naive_bayes import CategoricalNB

cat_nb = CategoricalNB()

cat_nb.fit(X_train_cat, y_train)
y_pred_cat = cat_nb.predict(X_test_cat)
print("CategoricalNB Accuracy:", accuracy_score(y_test, y_pred_cat))

CategoricalNB Accuracy: 0.9555555555555556
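
An alternative way to discretize continuous features for CategoricalNB is to bin each feature over its own range, for instance with scikit-learn's `KBinsDiscretizer`. The sketch below is one possible setup, not the notebook's method. The choice of `n_bins = 5` is arbitrary, and `min_categories` is set so that a bin that happens to be empty in the training data still counts as a known category.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

n_bins = 5  # arbitrary illustrative value

# Equal-width bins computed separately for each feature on the training data only.
disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="uniform")
X_train_kb = disc.fit_transform(X_train).astype(int)
X_test_kb = disc.transform(X_test).astype(int)  # out-of-range values fall in the edge bins

kb_nb = CategoricalNB(min_categories=n_bins).fit(X_train_kb, y_train)
kb_acc = accuracy_score(y_test, kb_nb.predict(X_test_kb))
print("CategoricalNB (per-feature bins) accuracy:", kb_acc)
```

Per-feature bins avoid spending most of the shared bin edges on ranges that a narrow feature, such as petal width, never reaches.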

