# Exercise 4: Implementing a Naïve Bayes classifier from scratch that handles continuous-valued input attributes using the Gaussian distribution. Uses the Iris dataset from sklearn.

Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem, with a strong (naïve) assumption: all features are conditionally independent given the class.

Gaussian Naïve Bayes is the variant used when the input features are continuous (numerical), such as height, weight, or petal length in the Iris dataset.

It assumes that, within each class, every feature is normally (Gaussian) distributed.

That is why we use the Gaussian (bell curve) formula to calculate the likelihood of a feature value given a class. The mean μ and standard deviation σ of each feature are estimated separately for each class from the training data.
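The per-class estimation step described above can be sketched in plain Python for a single feature. The toy values, the labels `"a"` and `"b"`, and the helper name `fit_gaussians` are hypothetical and chosen only for illustration; they are not taken from Iris.

```python
from statistics import fmean, pstdev

# Hypothetical toy data: (feature_value, class_label) pairs, not taken from Iris.
samples = [(1.4, "a"), (1.5, "a"), (1.3, "a"), (4.7, "b"), (4.5, "b"), (4.9, "b")]

def fit_gaussians(samples):
    """Return {label: (mean, std)} for a single continuous feature."""
    by_class = {}
    for value, label in samples:
        by_class.setdefault(label, []).append(value)
    # pstdev is the population standard deviation (divides by n),
    # which matches NumPy's default behaviour for ndarray.std().
    return {label: (fmean(vals), pstdev(vals)) for label, vals in by_class.items()}

params = fit_gaussians(samples)
for label, (mu, sigma) in params.items():
    print(f"class {label}: mean={mu:.3f}, std={sigma:.3f}")
```

Each class ends up with its own bell curve for the feature; a full classifier repeats this for every feature.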

Compute Likelihood Using Gaussian PDF
Formula: $$P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \cdot \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

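Why? This gives us the probability density of a feature value x under the Gaussian fitted to a given class, which we use as the likelihood of x given that class. Because it is a density rather than a probability, it can be greater than 1 when σ is small.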
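As a minimal sketch of the Gaussian likelihood for one feature, the function below evaluates the density at a single value. The name `gaussian_pdf` and the parameters `mu = 1.5`, `sigma = 0.2` are assumptions made for illustration, not values estimated from Iris.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x (sigma > 0)."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical class parameters for one feature (not estimated from Iris).
mu, sigma = 1.5, 0.2

at_mean = gaussian_pdf(1.5, mu, sigma)   # peak of the bell curve
off_mean = gaussian_pdf(2.5, mu, sigma)  # five standard deviations from the mean

print(at_mean, off_mean)
```

With a small standard deviation the value at the mean exceeds 1, which is expected for a density, while the value far from the mean is close to zero.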

Apply Bayes’ Theorem for Posterior Probability
$$P(\text{class} \mid \text{features}) \propto P(\text{class}) \cdot \prod_i P(\text{feature}_i \mid \text{class})$$

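Multiply all the per-feature likelihoods together with the class prior (assumed uniform here, i.e. 1 divided by the number of classes).

Why? This gives a score proportional to the posterior probability, i.e. the probability of the class given the input features. The normalizing constant is the same for every class, so the class with the highest score is the prediction.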
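The scoring rule can be illustrated with made-up numbers. In the sketch below the per-feature likelihood values and the class names are hypothetical. It also shows the equivalent log form, a common variant (not used in this exercise's code) that replaces the product with a sum to avoid underflow when there are many features.

```python
import math

# Hypothetical per-feature likelihoods for one sample under three classes.
likelihoods = {
    "class_0": [1.2, 0.8, 0.05, 0.01],
    "class_1": [0.9, 1.1, 1.5, 0.7],
    "class_2": [0.4, 0.6, 0.9, 1.3],
}
prior = 1.0 / len(likelihoods)  # uniform prior, as assumed in the text

# Direct form: prior times the product of the feature likelihoods.
scores = {c: prior * math.prod(vals) for c, vals in likelihoods.items()}

# Log form: sums of logs instead of products; the ranking is unchanged.
log_scores = {
    c: math.log(prior) + sum(math.log(v) for v in vals)
    for c, vals in likelihoods.items()
}

# Dividing by the total turns the scores into posterior probabilities.
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}

best = max(scores, key=scores.get)
best_log = max(log_scores, key=log_scores.get)
print(best, best_log, posteriors)
```

Both forms select the same class; normalizing is only needed when the actual posterior probabilities are wanted, not for picking the winner.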

In [3]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load Iris and hold out 20% of the samples for testing.
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

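class GaussianNaiveBayes:
    """Gaussian Naive Bayes with a uniform class prior."""

    def fit(self, X, y):
        # Estimate a per-class mean and standard deviation for every feature.
        self.classes = np.unique(y)
        self.parameters = {}
        for c in self.classes:
            X_c = X[y == c]
            self.parameters[c] = {
                'mean': X_c.mean(axis=0),
                # The small constant guards against division by zero when a
                # feature is constant within a class.
                'std': X_c.std(axis=0) + 1e-9,
            }
        return self

    def _calculate_likelihood(self, mean, std, x):
        # Gaussian PDF evaluated element-wise: one density value per feature.
        exponent = np.exp(-((x - mean) ** 2) / (2 * std ** 2))
        return (1 / (np.sqrt(2 * np.pi) * std)) * exponent

    def _calculate_posterior(self, x):
        # Score each class as prior * product of per-feature likelihoods,
        # then return the class with the highest score.
        posteriors = []
        prior = 1 / len(self.classes)  # uniform prior
        for c in self.classes:
            params = self.parameters[c]
            likelihood = np.prod(self._calculate_likelihood(params['mean'], params['std'], x))
            posteriors.append(prior * likelihood)
        return self.classes[np.argmax(posteriors)]

    def predict(self, X):
        # Classify each row of X independently.
        return np.array([self._calculate_posterior(x) for x in X])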

model = GaussianNaiveBayes()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

1.0
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
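With only 30 test samples, a perfect score is worth cross-checking. One possible sanity check, sketched below, fits scikit-learn's `GaussianNB` on the same split with uniform priors to mirror the assumption made in the from-scratch class. The variable names `reference` and `reference_accuracy` are illustrative. scikit-learn also applies a small variance smoothing term, so its predictions are not guaranteed to match the from-scratch class exactly.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as in the exercise.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Uniform priors mirror the assumption made in the from-scratch class.
reference = GaussianNB(priors=[1 / 3, 1 / 3, 1 / 3])
reference.fit(X_train, y_train)
reference_pred = reference.predict(X_test)

reference_accuracy = accuracy_score(y_test, reference_pred)
print(reference_accuracy)
```

If the two accuracies differ noticeably, the per-class means and standard deviations are a reasonable first place to look.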

