<h1>Naive Bayes Classifier</h1>

Naive Bayes is a simple yet powerful probabilistic classifier based on Bayes' theorem, with the assumption that features are conditionally independent given the class. Despite its simplicity, it often performs surprisingly well in practice, particularly in text classification and other domains where the independence assumption holds reasonably well.
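
As a brief sketch of what the independence assumption buys, the decision rule can be written as follows. The symbols are assumptions introduced here for illustration: c ranges over the classes, x_1, ..., x_n are the feature values of one sample, and the logarithmic form is the one usually computed in practice to avoid numerical underflow.

```latex
\hat{y}
  = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
  = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c) \right]
```

The denominator P(x_1, ..., x_n) from Bayes' theorem is omitted because it is the same for every class and does not change which class attains the maximum.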

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

Data source: GitHub

In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/illinois-cse/data-fa14/gh-pages/data/iris.csv')
df.head(1)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa


In [3]:
X = df[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
y = df['species']

In [4]:
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [6]:
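class NaiveBayes:
    """Naive Bayes with additive (Laplace) smoothing.

    Feature values are treated as non-negative weights, as in the multinomial variant.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha             # smoothing strength
        self.classes = None
        self.class_prior_probs = {}    # class label -> smoothed prior P(c)
        self.feature_probs = {}        # class label -> per-feature probability vector

    def fit(self, X, y):
        self.classes = np.unique(y)
        for c in self.classes:
            X_c = X[y == c]  # rows that belong to class c
            # Smoothed prior: (samples in c + alpha) / (all samples + alpha * number of classes)
            self.class_prior_probs[c] = (len(X_c) + self.alpha) / (len(X) + self.alpha * len(self.classes))
            # Smoothed share of each feature in the total feature mass of class c
            self.feature_probs[c] = (np.sum(X_c, axis=0) + self.alpha) / (np.sum(X_c) + self.alpha * X.shape[1])

    def _predict_single(self, x):
        # Unnormalized log-posterior per class: log P(c) + sum_i x_i * log P(feature i | c)
        posteriors = {c: np.log(self.class_prior_probs[c]) +
                          np.sum(np.log(self.feature_probs[c]) * x) for c in self.classes}
        # Pick the class with the highest score
        return max(posteriors, key=posteriors.get)

    def predict(self, X):
        # Classify each row of X independently
        predictions = [self._predict_single(x) for x in X]
        return predictions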
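
To make the smoothed estimate in `fit` concrete, here is a minimal sketch on made-up data. The arrays `X_toy` and `y_toy` and the dictionary `toy_feature_probs` are hypothetical names used only for this illustration; the formula is the one used for `feature_probs` in the class above.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 3 non-negative features, 2 classes.
X_toy = np.array([[2.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 2.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])
alpha = 1.0  # same default smoothing strength as NaiveBayes(alpha=1.0)

toy_feature_probs = {}
for c in np.unique(y_toy):
    X_c = X_toy[y_toy == c]  # rows of class c
    # (per-feature sum + alpha) / (total sum over the class + alpha * number of features)
    toy_feature_probs[int(c)] = (X_c.sum(axis=0) + alpha) / (X_c.sum() + alpha * X_toy.shape[1])

for c, probs in toy_feature_probs.items():
    print(c, probs, probs.sum())
```

For class 0 the per-feature sums are 3, 1, and 1, so the smoothed estimates are 4/8, 2/8, and 2/8. In class 1 the first feature is never observed, yet its estimate is 1/11 rather than zero, which keeps the logarithm in `_predict_single` finite. Each vector sums to 1.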


In [7]:
# Train Naive Bayes classifier
nb_classifier = NaiveBayes()
nb_classifier.fit(X_train.to_numpy(), y_train)

# Make predictions
predictions = nb_classifier.predict(X_test.to_numpy())
print(predictions)

[1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 1, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 1, 0, 2, 1, 2, 2, 2, 0, 0]


In [8]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)

# Calculate precision
precision = precision_score(y_test, predictions, average='weighted')

# Calculate recall
recall = recall_score(y_test, predictions, average='weighted')

# Calculate F1-score
f1 = f1_score(y_test, predictions, average='weighted')

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)


Accuracy: 0.9
Precision: 0.925
Recall: 0.9
F1-score: 0.899248120300752


Based on the evaluation metrics:

* Accuracy: The classifier achieves an accuracy of 90%, meaning that 27 of the 30 samples in the test set are classified correctly.

* Precision: The weighted precision of 92.5% is the average of the per-class precision scores, weighted by the number of test samples in each class. Per-class precision is the fraction of predictions of that class that are correct.

* Recall: The weighted recall of 90% is the average of the per-class recall scores, weighted in the same way. Per-class recall is the fraction of that class's samples that the classifier identifies correctly. With this weighting, recall always equals accuracy.

* F1-score: The weighted F1-score of 89.9% is the weighted average of the per-class F1-scores, each of which is the harmonic mean of that class's precision and recall.
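
The weighted averages can be reproduced by hand from a confusion matrix. The sketch below uses only NumPy and made-up labels (`y_true` and `y_pred` are hypothetical and are not the notebook's `y_test` and `predictions`); it assumes every class is predicted at least once, otherwise the precision division would hit zero.

```python
import numpy as np

# Hypothetical labels for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])

classes = np.unique(y_true)
# Confusion matrix: rows are true classes, columns are predicted classes.
cm = np.array([[np.sum((y_true == t) & (y_pred == p)) for p in classes]
               for t in classes])

support = cm.sum(axis=1)           # number of true samples per class
tp = np.diag(cm)                   # correctly classified samples per class
precision = tp / cm.sum(axis=0)    # per-class precision
recall = tp / support              # per-class recall
f1 = 2 * precision * recall / (precision + recall)  # per-class harmonic mean

weights = support / support.sum()  # class weights used by average='weighted'
accuracy = tp.sum() / cm.sum()
weighted_precision = np.sum(weights * precision)
weighted_recall = np.sum(weights * recall)
weighted_f1 = np.sum(weights * f1)

print("Accuracy:", accuracy)
print("Weighted precision:", weighted_precision)
print("Weighted recall:", weighted_recall)
print("Weighted F1:", weighted_f1)
```

On these toy labels the weighted recall comes out equal to the accuracy (both 0.8), while the weighted precision is higher (0.88), the same pattern as in the results above.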

<b>Conclusion:</b> Overall, the Naive Bayes classifier performs well on the test data, with all four metrics close to or above 90%. However, the test set contains only 30 samples, and aggregate scores do not show which classes are confused with one another. Further analysis, such as a per-class breakdown, may be required to identify class imbalances or misclassifications concentrated in certain classes.