# Naive Bayes 

## Classes in the Iris Dataset

In the Iris dataset, the target variable we are trying to predict is the species of the Iris flower. There are three possible classes:

1. **Iris-setosa**: Represented by class label `0`.
2. **Iris-versicolor**: Represented by class label `1`.
3. **Iris-virginica**: Represented by class label `2`.

### Features:
Each flower is described by four features:
- **Sepal length** (cm)
- **Sepal width** (cm)
- **Petal length** (cm)
- **Petal width** (cm)

### Goal:
The Naive Bayes classifier will use these four features to predict which of the three classes a given flower belongs to.


# Naive Bayes: Explanation and Math

## 1. Overview of Naive Bayes
Naive Bayes is a **probabilistic classifier** based on **Bayes' Theorem**. It is called "Naive" because it makes a strong assumption: it assumes that all features are **conditionally independent** given the class label.

The goal of Naive Bayes is to classify a new observation $X$ into one of several classes $C_k$ by finding the class that maximizes the **posterior probability** $P(C_k | X)$.

According to **Bayes' Theorem**:

$$
P(C_k | X) = \frac{P(X | C_k) \cdot P(C_k)}{P(X)}
$$

Where:
- $P(C_k | X)$ is the **posterior probability**, i.e., the probability of class $C_k$ given the feature vector $X$.
- $P(X | C_k)$ is the **likelihood**, i.e., the probability of the feature vector $X$ given the class $C_k$.
- $P(C_k)$ is the **prior probability** of class $C_k$.
- $P(X)$ is the **evidence**, which is the total probability of the features $X$.

Since $P(X)$ is the same for all classes, we don’t need to compute it to make predictions. Instead, we only need to maximize the numerator, $P(X | C_k) \cdot P(C_k)$.
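To see why the evidence can be dropped, here is a small sketch with made-up numbers (the priors and likelihoods below are illustrative assumptions, not values estimated from the Iris data). Dividing every numerator by the same $P(X)$ rescales the scores but does not change which class is largest.

```python
# Hypothetical priors P(C_k) and likelihoods P(X | C_k) for three classes.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.02, 0.10, 0.05]

# Numerators of Bayes' Theorem: P(X | C_k) * P(C_k)
numerators = [p * l for p, l in zip(priors, likelihoods)]

# The evidence P(X) is the sum of the numerators over all classes.
evidence = sum(numerators)
posteriors = [n / evidence for n in numerators]

# The winning class is the same with or without the division by P(X).
best_by_numerator = max(range(len(numerators)), key=lambda k: numerators[k])
best_by_posterior = max(range(len(posteriors)), key=lambda k: posteriors[k])

print(best_by_numerator, best_by_posterior)
```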

---

## 2. Gaussian Naive Bayes

In the case of **Gaussian Naive Bayes**, we assume that the features follow a **Gaussian (Normal) distribution** for each class. For a given feature $X_i$ and class $C_k$, the likelihood $P(X_i | C_k)$ is modeled by the **Gaussian probability density function (PDF)**:

$$
P(X_i | C_k) = \frac{1}{\sqrt{2\pi \sigma_k^2}} \exp\left( -\frac{(X_i - \mu_k)^2}{2\sigma_k^2} \right)
$$

Where:
- $\mu_k$ is the **mean** of feature $X_i$ for class $C_k$.
- $\sigma_k^2$ is the **variance** of feature $X_i$ for class $C_k$.

For each class $C_k$, the algorithm computes:
- The **prior probability** $P(C_k)$, which is the fraction of samples belonging to class $C_k$.
- The **mean** $\mu_k$ and **variance** $\sigma_k^2$ of each feature for class $C_k$ based on the training data.
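As a minimal sketch, the density formula above can be written as a plain Python function. The function name `gaussian_pdf` and the example mean and variance are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var)
    exponent = -((x - mean) ** 2) / (2.0 * var)
    return coeff * math.exp(exponent)

# Illustrative values (assumed): class mean 5.0 and variance 0.25 for one feature.
at_mean = gaussian_pdf(5.0, mean=5.0, var=0.25)
one_sd_away = gaussian_pdf(5.5, mean=5.0, var=0.25)  # one standard deviation from the mean

print(at_mean, one_sd_away)
```

Note that the result is a density rather than a probability, so values above 1 are possible when the variance is small.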

---

## 3. Bayes' Theorem in Action

To make a prediction, the Naive Bayes algorithm computes the **posterior probability** for each class $C_k$. This is done by combining the prior probability $P(C_k)$ and the likelihoods $P(X_i | C_k)$ for each feature $X_i$.

### Posterior Probability:

$$
P(C_k | X) \propto P(C_k) \cdot \prod_{i=1}^{d} P(X_i | C_k)
$$

Where $d$ is the number of features in the feature vector $X$.

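However, to avoid **numerical underflow** when multiplying many small probabilities, we usually take the **logarithm** of the probabilities. The constant term $-\log P(X)$ is dropped because it is the same for every class, so the expression below is the log-posterior up to that additive constant: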

$$
\log(P(C_k | X)) = \log(P(C_k)) + \sum_{i=1}^{d} \log(P(X_i | C_k))
$$

This turns the product into a sum, which is easier to compute and avoids underflow issues.
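The underflow problem is easy to reproduce. The sketch below uses 400 hypothetical likelihood values of 0.01 each (an assumption chosen to force the effect); the product collapses to zero in double precision, while the sum of logs stays an ordinary number.

```python
import math

# 400 hypothetical per-feature likelihoods, each equal to 0.01.
likelihoods = [0.01] * 400

# Multiplying directly: the true value 1e-800 is far below the smallest
# positive float64, so the running product ends up as exactly 0.0.
product = 1.0
for p in likelihoods:
    product *= p

# Summing logs instead keeps the result representable.
log_sum = sum(math.log(p) for p in likelihoods)

print(product, log_sum)
```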

---

## 4. Summary of the Algorithm:

### Training (fitting the model):
1. **Compute class priors**: For each class $C_k$, compute $P(C_k)$, the proportion of training examples in class $C_k$.
2. **Compute the mean and variance** for each feature in each class. For each class $C_k$, compute $\mu_k$ and $\sigma_k^2$ for each feature.

### Prediction:
1. For each new observation (test sample):
   - Compute the **log posterior probability** for each class:
     $$
     \log(P(C_k | X)) = \log(P(C_k)) + \sum_{i=1}^{d} \log(P(X_i | C_k))
     $$
   - Choose the class $C_k$ that maximizes the posterior probability.

---

## 5. Example Walkthrough:

Let’s walk through a simple example to illustrate how the Naive Bayes classifier works:

### Training Data:
We have 4 samples with 2 features and two possible classes, labeled `0` and `1`:

| Feature 1 | Feature 2 | Class |
|-----------|-----------|-------|
| 5.0       | 3.5       | 0     |
| 6.0       | 3.0       | 0     |
| 5.5       | 2.5       | 1     |
| 6.5       | 3.5       | 1     |

- Compute the **mean** and **variance** for each feature in each class.
- Compute the **prior** for each class.

### Prediction:
Given a new sample with feature values $X_1 = 5.5$ and $X_2 = 3.0$, we compute the **posterior probability** for each class:

- Compute the **log prior** for each class.
- Compute the **Gaussian likelihood** for each feature given each class.
- Add the log prior and the sum of the log likelihoods to compute the log posterior for each class.
- **Classify** the new sample based on the class with the highest log posterior probability.
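The steps above can be carried out by hand for the four training samples in the table. The following is a sketch in plain Python; the helper name `log_gaussian` is an assumption, and the variance is the population variance (dividing by the number of samples in the class).

```python
import math

# Training data from the table above: (feature 1, feature 2, class)
rows = [(5.0, 3.5, 0), (6.0, 3.0, 0), (5.5, 2.5, 1), (6.5, 3.5, 1)]
new_sample = (5.5, 3.0)

def log_gaussian(x, mean, var):
    """Log of the Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

log_posteriors = {}
for c in (0, 1):
    samples = [r[:2] for r in rows if r[2] == c]
    n_c = len(samples)
    log_post = math.log(n_c / len(rows))  # log prior
    for j in range(2):
        values = [s[j] for s in samples]
        mean = sum(values) / n_c
        var = sum((v - mean) ** 2 for v in values) / n_c  # population variance
        log_post += log_gaussian(new_sample[j], mean, var)
    log_posteriors[c] = log_post

# Pick the class with the highest log-posterior.
predicted = max(log_posteriors, key=log_posteriors.get)
print(log_posteriors, predicted)
```

With these four samples the sketch assigns the new sample to class `0`: its log-posterior is about -0.95, against about -1.64 for class `1`.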

---

## 6. Why is it "Naive"?
The key assumption in Naive Bayes is that all features are **conditionally independent** given the class. This means that the value of one feature does not influence the value of another feature within the same class. This assumption is rarely true in real-world data, but Naive Bayes still performs surprisingly well, especially for **text classification** and **spam filtering**.

---

## 7. Advantages and Limitations:
- **Advantages**:
  - Simple and easy to implement.
  - Works well with small datasets.
  - Particularly effective for **text classification**.
  - Computationally efficient: training only requires per-class counts, means, and variances, which can be accumulated in a single pass over the training data.

- **Limitations**:
  - Assumes **conditional independence** of features, which may not hold in real data.
  - Can struggle with datasets where features are highly correlated.
  - For continuous features, Gaussian Naive Bayes assumes a **normal distribution**, which may not be accurate in some cases.

---

### Conclusion:
Naive Bayes is a simple yet powerful classifier that makes predictions based on the application of Bayes' Theorem, using strong (naive) independence assumptions. Despite these assumptions, Naive Bayes often works well in practice and is particularly useful for high-dimensional problems like text classification.


# Naive Bayes - Step-by-step Implementation

## 1. `fit` Method:
The `fit` method computes the **mean**, **variance**, and **prior** for each class.

- **Steps:**
  - Identify unique classes in `y` (target labels).
  - For each class:
    - Extract the subset of the training data corresponding to that class.
    - Compute the mean and variance of each feature within that subset.
    - Compute the prior for that class (i.e., the proportion of data points in that class).

- **Math:**
  - Mean for class $C_k$:
    $$
    \mu_k = \frac{1}{n_k} \sum_{i \in C_k} X_i
    $$
  - Variance for class $C_k$:
    $$
    \sigma_k^2 = \frac{1}{n_k} \sum_{i \in C_k} (X_i - \mu_k)^2
    $$
  - Prior for class $C_k$:
    $$
    P(C_k) = \frac{n_k}{n}
    $$
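  where $n_k$ is the number of training samples for class $C_k$, and $n$ is the total number of samples. In these formulas $i$ indexes the training samples that belong to class $C_k$ (not the features), and the mean and variance are computed separately for each feature.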

**What to do:**
- Store the computed means, variances, and priors for each class in class attributes, such as `self.mean`, `self.var`, and `self.priors`.
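For comparison with a hand-coded loop, the same three quantities can be computed with NumPy reductions. This is a sketch under the assumption that `X` is a 2-D array and `y` a 1-D label array; the function name `fit_gaussian_nb` is hypothetical. The data are the four samples from the example walkthrough.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Return (classes, means, variances, priors) estimated from X and y."""
    classes = np.unique(y)
    # One row per class, one column per feature.
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # np.var defaults to ddof=0, i.e. it divides by n_k as in the formula above.
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    # Fraction of samples in each class.
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors

X = np.array([[5.0, 3.5], [6.0, 3.0], [5.5, 2.5], [6.5, 3.5]])
y = np.array([0, 0, 1, 1])

classes, means, variances, priors = fit_gaussian_nb(X, y)
print(means, variances, priors)
```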

---

## 2. `predict` Method:
The `predict` method uses the trained model to predict the class labels for a set of test samples.

- **Steps:**
  - For each sample in the test set, compute the posterior probability for each class using the `_predict` method.
  - Select the class with the highest posterior probability.

- **Math:**
  For each class $C_k$, calculate:
  $$
  P(C_k | X) \propto P(C_k) \prod_{i=1}^{d} P(X_i | C_k)
  $$
  You’ll sum the log of these probabilities to avoid underflow.

**What to do:**
- Iterate over each test sample, call `_predict` for each, and return the predicted class for each sample.
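Looping over samples is the approach described here. As an alternative sketch, the log-posteriors of all samples and classes can be computed at once with NumPy broadcasting. The function name `log_posteriors` and the parameter values below are assumptions for illustration, not part of the class described in this section.

```python
import numpy as np

def log_posteriors(X, means, variances, priors):
    """Unnormalized log-posterior of every class for every row of X.

    X: (n_samples, n_features); means, variances: (n_classes, n_features);
    priors: (n_classes,). Returns an array of shape (n_samples, n_classes).
    """
    diff = X[:, None, :] - means[None, :, :]  # (n_samples, n_classes, n_features)
    log_lik = -0.5 * np.log(2 * np.pi * variances) - diff ** 2 / (2 * variances)
    return np.log(priors) + log_lik.sum(axis=2)

# Hypothetical fitted parameters for two classes and two features.
means = np.array([[5.5, 3.25], [6.0, 3.0]])
variances = np.array([[0.25, 0.0625], [0.25, 0.25]])
priors = np.array([0.5, 0.5])

X_test = np.array([[5.5, 3.0], [6.4, 3.1]])
scores = log_posteriors(X_test, means, variances, priors)
predictions = scores.argmax(axis=1)  # index of the best class per sample
print(predictions)
```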

---

## 3. `_predict` Method:
This method computes the **posterior probability** for each class given a test sample, and then selects the class with the highest probability.

- **Steps:**
  - For each class, compute the logarithm of the prior probability $P(C_k)$.
  - Add the sum of the logarithms of the Gaussian probability densities for each feature using the `_pdf` method.
  - Return the class with the highest posterior probability.

- **Math:**
  Posterior probability for class $C_k$:
  $$
  \log(P(C_k | X)) = \log(P(C_k)) + \sum_{i=1}^{d} \log(P(X_i | C_k))
  $$

**What to do:**
- Store these posterior probabilities in a list, and use `np.argmax` to return the class with the highest value.

---

## 4. `_pdf` Method:
The `_pdf` method evaluates the **Gaussian probability density function** for the feature values of a sample, given a class. The implementation below returns the **log** of this density, so that `_predict` can sum the values directly.

- **Steps:**
  - For each feature in the sample, compute the Gaussian density using the class-specific mean and variance.
  - Return the log-density for each feature.

- **Math:**
  The Gaussian probability density function is:
  $$
  P(X_i | C_k) = \frac{1}{\sqrt{2\pi \sigma_k^2}} \exp\left( -\frac{(X_i - \mu_k)^2}{2\sigma_k^2} \right)
  $$

**What to do:**
- Use the mean and variance for the class (calculated in the `fit` method) to compute the likelihood of the feature value using the above formula.
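Taking the logarithm of the density gives a form that is convenient to implement, because the exponential disappears and the result can be summed across features directly. A short derivation, using the same symbols as the formula above:

$$
\log P(X_i | C_k) = \log\left( \frac{1}{\sqrt{2\pi \sigma_k^2}} \right) - \frac{(X_i - \mu_k)^2}{2\sigma_k^2} = -\frac{1}{2} \log\left( 2\pi \sigma_k^2 \right) - \frac{(X_i - \mu_k)^2}{2\sigma_k^2}
$$

The first term comes from the normalization constant; the second is the squared distance from the class mean, scaled by the variance.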

---

## Step 1: Import the necessary libraries

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

## Step 2: Load the Iris Dataset

We'll load the Iris dataset into a pandas DataFrame. The dataset comes from the UCI Machine Learning Repository; if you have the file locally, you can read it directly.


```python
# Load the dataset from the local file path (iris.data has no header row)
df = pd.read_csv(r'C:\Users\Machine-Learning\Downloads\iris\iris.data', header=None)

# Assign column names to the dataset
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Display the first few rows of the dataset
df.head()
```

|   | sepal_length | sepal_width | petal_length | petal_width | class       |
|---|--------------|-------------|--------------|-------------|-------------|
| 0 | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa |
| 1 | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa |
| 2 | 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa |
| 3 | 4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa |
| 4 | 5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa |


## Step 3: Prepare the Data

Convert the class labels into numerical values and split the data into training and testing sets.

```python
# Convert class labels to numerical values
df['class'] = df['class'].map({'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2})

# Separate features and target variable
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

## Step 4: Implement the Gaussian Naive Bayes Classifier

Now, let's implement the Naive Bayes classifier. We'll create a class that fits the model to the data and makes predictions.


```python
class GaussianNaiveBayes:
    def fit(self, X, y):
        """
        Fit the Naive Bayes model according to the training data.
        """
        self.classes = np.unique(y)  # The labels 0, 1, 2 stored as [0, 1, 2]
        n_samples, n_features = X.shape

        means = []
        variances = []
        priors = []

        for c in self.classes:
            # Pick the rows of X that belong to class c
            class_idx = np.where(y == c)
            X_c = X[class_idx[0], :]
            n_c = X_c.shape[0]

            # Average of each feature. np.average(X_c, axis=0) would also work,
            # but the sums are hand-coded here on purpose.
            feature_sums = [0.0] * n_features
            for i in range(n_c):  # Loop over the rows, i = index of sample
                for j in range(n_features):  # Loop over the columns, j = index of feature
                    feature_sums[j] += X_c[i][j]
            X_c_avg = [total / n_c for total in feature_sums]

            # Variance of each feature: sum of squared deviations divided by n_c
            squared_dev_sums = [0.0] * n_features
            for i in range(n_c):
                for j in range(n_features):
                    squared_dev_sums[j] += (X_c[i][j] - X_c_avg[j]) ** 2
            X_c_var = [total / n_c for total in squared_dev_sums]

            means.append(X_c_avg)
            variances.append(X_c_var)

            # Prior probability of the class: fraction of training samples in it
            priors.append(n_c / n_samples)

        self.mean = np.array(means)
        self.var = np.array(variances)
        self.priors = tuple(priors)

    def predict(self, X):
        """
        Perform classification on an array of test vectors X.
        """
        # This will store the predicted class for each sample
        predicted_classes = []

        # Loop through each test sample 'x' in the input 'X'
        for x in X:
            # Call self._predict(x) and add the predicted class to the list
            predicted_classes.append(self._predict(x))

        # Return the list of all predicted classes
        return predicted_classes

    def _predict(self, x):
        """
        Compute the log-posterior of each class and return the class with the highest value.
        """
        posterior_probabilities = []  # This will hold the log-posteriors for each class

        # Loop over each class
        for idx, c in enumerate(self.classes):
            log_prior = np.log(self.priors[idx])  # Get log-prior for the class

            # Get the log-likelihoods for all features
            log_likelihoods = self._pdf(idx, x)  # Returns log-likelihoods

            # Sum the log-likelihoods for all features
            total_log_likelihood = np.sum(log_likelihoods)

            # Calculate the log-posterior for this class
            log_posterior = log_prior + total_log_likelihood

            # Append the log-posterior for this class to the list
            posterior_probabilities.append(log_posterior)

        # Find the class with the highest log-posterior
        best_class = self.classes[np.argmax(posterior_probabilities)]
        return best_class

    def _pdf(self, class_idx, x):
        """
        Compute the log of the probability density function of a Gaussian distribution.
        """
        # Retrieve the mean and variance for the class
        mean = self.mean[class_idx]
        var = self.var[class_idx] + 1e-9  # Add epsilon to variance to prevent division by zero

        # Compute the log of the Gaussian probability density function
        numerator = - ((x - mean) ** 2) / (2 * var)
        denominator = - 0.5 * np.log(2 * np.pi * var)
        return numerator + denominator  # Return the log-likelihood
```



## Step 5: Train and Evaluate the Model

Now, train the Naive Bayes classifier using the training data and evaluate its performance on the test set.

Explanation:

- `fit` method: Computes the mean, variance, and prior probability for each class.
- `predict` method: Uses the fitted parameters to compute the log-posterior of each class for a given test point and predicts the class with the highest value.
- `_pdf` method: Calculates the log of the Gaussian probability density function.



```python
# Load the dataset (this cell reloads Iris from scikit-learn instead of reusing the CSV above)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and test datasets
# (no random_state is set, so the split and the resulting accuracy vary between runs)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Standardize data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train your model
nb = GaussianNaiveBayes()
nb.fit(X_train, y_train)

# Make predictions
y_pred = nb.predict(X_test)

# Evaluate model performance (accuracy_score was imported in Step 1)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.2f}%")
```


Model accuracy: 93.33%


# Clarifying `x` in `_predict`

In your `_predict` method, the input `x` represents **a single test sample**.

## What is `x`?
- The variable `x` contains the **features of one test sample**.
- For example, in the Iris dataset, `x` will have 4 feature values: sepal length, sepal width, petal length, and petal width.

For instance, if the sample `x` looks like this:

`x = [5.1, 3.5, 1.4, 0.2]`


This means:
- Sepal length = 5.1
- Sepal width = 3.5
- Petal length = 1.4
- Petal width = 0.2

## What do you do with `x` in `_predict`?
In `_predict`, every feature in `x` contributes one term to the score of each class.
- For each feature, the **log-likelihood** is calculated from the Gaussian PDF.
- This is done for **each class** (0, 1, and 2) using the `_pdf` method.

### Example of Feature Looping in `_predict`
- Conceptually, you loop through the features in `x` (sepal length, sepal width, etc.) and calculate the **log-likelihood** of each one for the current class.
- In the implementation above, this loop is implicit: `_pdf` receives the whole vector `x`, and NumPy evaluates all four features in one vectorized operation.

## Where does `X` come into play?
- In the `predict` method (not `_predict`), `X` is the **entire test dataset** containing all test samples.
- `X` contains many samples, and you will loop through them in `predict`.

For example:

```python
X = [
    [5.1, 3.5, 1.4, 0.2],  # Sample 1
    [7.0, 3.2, 4.7, 1.4],  # Sample 2
    ...
]
```

## How do `x` and `X` differ?
- `X` contains **all test samples**.
- `x` is a **single test sample** with its features.

### Process:
1. In `predict`, you loop through all samples in `X`.
2. For each sample `x`, you call `_predict`.
3. Inside `_predict`, `x` is just one sample; the log-likelihoods of all its features are computed (in a single vectorized call to `_pdf`) and summed for each class.

---

# Step-by-Step Explanation of `_predict`

The goal of `_predict` is to compute the **posterior probability** for each class (0, 1, or 2) and return the class with the **highest probability** for a given test sample `x`.

---

## Step 1: Compute the Log of the Prior Probability

For each class $C_k$, start by calculating the **log of the prior**. The prior represents how common the class is in the training data.

```python
log_prior = np.log(self.priors[idx])
```
This gives the initial log-prior probability for the current class.

---
## Step 2: Calculate the Log-Likelihood for Each Feature

For each feature in the test sample `x`, calculate how likely that feature value is for the current class $C_k$. This is the likelihood, and it is computed using the Gaussian PDF.

`_pdf` takes the whole sample and returns one value per feature. It already returns the **log** of the density, so do not apply `np.log` again:

```python
log_likelihoods = self._pdf(idx, x)  # One log-likelihood per feature
```
The result is an array of log-likelihoods, one for each feature.

---

## Step 3: Sum the Log-Likelihoods

You need to sum the log-likelihoods for all the features in the sample. This gives you the total log-likelihood for that class.

An explicit loop over the features with a running total is not needed here, because the vectorized result can be summed in one call:

```python
log_likelihood_sum = np.sum(log_likelihoods)  # Sum of log-likelihoods over all features
```

This accumulates the log-likelihoods for all the features in the sample.

---

## Step 4: Calculate the Log-Posterior for Each Class

Once you have summed the log-likelihoods for all the features, add this sum to the log-prior to get the log-posterior for the current class.

```python
log_posterior = log_prior + log_likelihood_sum
```

The log-posterior gives you a total score for each class, which represents how likely the sample belongs to that class.
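These scores are unnormalized, which is all that classification needs. If actual posterior probabilities are wanted, one common option is the log-sum-exp trick sketched below. It is not part of the implementation in this document; the function name and the example scores are assumptions.

```python
import math

def normalize_log_posteriors(log_scores):
    """Turn unnormalized log-posteriors into probabilities that sum to 1."""
    m = max(log_scores)  # subtract the maximum so exp() cannot overflow
    log_evidence = m + math.log(sum(math.exp(s - m) for s in log_scores))
    return [math.exp(s - log_evidence) for s in log_scores]

# Hypothetical log-posterior scores for classes 0, 1, and 2.
log_scores = [-2.1, -15.7, -40.3]
probs = normalize_log_posteriors(log_scores)
print(probs)
```

The class with the highest score also has the highest probability, so the prediction is unchanged.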

---

## Step 5: Return the Best Class

Once you have computed the log-posterior for all classes, choose the class with the highest score.

To do this, store the log-posterior of each class in a list called `posterior_probabilities` inside the loop over classes. After the loop, use `np.argmax` to find the class with the highest log-posterior:

```python
posterior_probabilities.append(log_posterior)  # Inside the loop over classes
best_class = self.classes[np.argmax(posterior_probabilities)]  # After the loop
```
Finally, return the class with the highest log-posterior.

---

## Summary

- Log of the Prior: Start with the log-prior for each class.
- Log-Likelihood: For each feature in `x`, compute the log-likelihood using `_pdf`.
- Sum the Log-Likelihoods: Accumulate the log-likelihoods for each feature.
- Log-Posterior: Add the sum of log-likelihoods to the log-prior.
- Return the Best Class: Select the class with the highest log-posterior.

# Summary of `_predict` and `_pdf` Methods

In implementing the Gaussian Naive Bayes classifier, two crucial methods are `_predict` and `_pdf`. These methods are responsible for computing the posterior probabilities and the likelihoods, respectively. Below is a comprehensive summary of these methods, including the challenges faced during implementation and how they were resolved.

---

## `_predict` Method

### Purpose

The `_predict` method computes the posterior probability for each class and returns the class with the highest probability for a given test sample.

### Challenges and Solutions

- Issue: Taking the logarithm of log-likelihoods resulted in `nan` values. The values returned by `_pdf` are already in log-space and are often negative, and `np.log` of a negative number is `nan`.

- Solution: Removed the unnecessary `np.log` in the `_predict` method when summing log-likelihoods.

- Issue: Incorrect looping over features individually, leading to shape mismatches.

- Solution: Passed the entire feature vector `x` to the `_pdf` method and handled all features at once, leveraging NumPy's vectorization.
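The first issue can be reproduced in a few lines. The log-likelihood values below are hypothetical; the point is only that applying `np.log` to negative entries yields `nan`, while summing the values as they are works.

```python
import numpy as np

# Hypothetical log-likelihoods as returned by a log-space _pdf: mostly negative.
log_likelihoods = np.array([-0.23, -1.75, 0.47, -3.10])

with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for this demo
    double_logged = np.log(log_likelihoods)  # wrong: log of a negative number is nan

total = np.sum(log_likelihoods)  # right: the values are already in log-space
print(double_logged, total)
```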

## `_pdf` Method

### Purpose

The `_pdf` method computes the log of the Gaussian probability density function (PDF) for each feature given a class. Computing in log-space enhances numerical stability and prevents underflow issues with very small probability values.

### Challenges and Solutions

- Issue: Variance values being zero or very small, leading to division by zero or extremely large negative numbers.
 
- Solution: Added a small epsilon `(1e-9)` to the variance to prevent division by zero.

- Issue: Computing probabilities in the original scale led to underflow issues with very small numbers.

- Solution: Performed computations in log-space by directly computing the log of the Gaussian PDF.

- Issue: Shape mismatches when subtracting arrays.

- Solution: Ensured that `x`, `mean`, and `var` are NumPy arrays of the same shape (converted lists to arrays when necessary).
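The zero-variance issue can be illustrated with a feature that is constant within a class. The statistics below are made up for the demonstration, and `log_gaussian` is a hypothetical stand-alone helper rather than the `_pdf` method itself.

```python
import numpy as np

def log_gaussian(x, mean, var):
    """Log of the Gaussian density, evaluated element-wise."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

# Hypothetical class statistics: the second feature is constant, so its variance is 0.
mean = np.array([5.0, 0.2])
var = np.array([0.12, 0.0])
x = np.array([5.1, 0.2])

with np.errstate(divide="ignore", invalid="ignore"):  # silence warnings for the demo
    without_eps = log_gaussian(x, mean, var)  # second entry is not finite

with_eps = log_gaussian(x, mean, var + 1e-9)  # finite for both features
print(without_eps, with_eps)
```

A fixed `1e-9` is the choice made in this document. For reference, scikit-learn's `GaussianNB` exposes a `var_smoothing` parameter that scales the added term by the largest feature variance instead.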

## Key Takeaways

- Compute in Log-Space: Performing calculations in log-space improves numerical stability, especially when dealing with very small probabilities.
- Avoid Redundant Operations: Do not take the logarithm of values that are already in log-space.
- Handle Edge Cases: Add small constants (epsilon) to denominators to avoid division by zero.
- Use Vectorization: Leveraging NumPy's vectorized operations eliminates the need for explicit loops over features, reducing the potential for errors and improving performance.
- Debugging Shape Mismatches: Always check the shapes of arrays when performing element-wise operations to prevent broadcasting errors.

# Conclusion

Implementing the `_predict` and `_pdf` methods was a significant learning experience that reinforced the importance of numerical stability and careful handling of mathematical operations in machine learning algorithms. By addressing the challenges faced, such as handling small variances and avoiding invalid logarithmic operations, the Gaussian Naive Bayes classifier was implemented successfully and reached 93.33% accuracy on the Iris test split in the run shown above.

This detailed explanation should serve as a valuable resource for understanding the inner workings of these methods and as a guide for future projects involving probabilistic models.