# Naive Bayes Classifier Implementation and Testing
This notebook tests each method of the `NaiveBayes` class step by step and then demonstrates the classifier on the Iris dataset.

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from naivebayesfromscratch import NaiveBayes

## Step 1: Load and Prepare Data
We load the Iris dataset, hold out 20% of the samples as a test set, and print the array shapes and class labels.

In [3]:
data = load_iris()
X, y = data.data, data.target

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Labels: {np.unique(y)}")

Training data shape: (120, 4)
Test data shape: (30, 4)
Labels: [0 1 2]


## Step 2: Initialize and Train the Naive Bayes Classifier
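We initialize the Naive Bayes classifier and call `fit()` to compute the per-feature mean and variance of each class, along with each class's prior probability. Iris has 50 samples per class, but the priors printed below are close to 1/3 rather than exactly equal because the split above is random, not stratified.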

In [6]:
model = NaiveBayes()
model.fit(X_train, y_train)

# Display computed means, variances, and priors
print(f"Shape of means: {model._mean.shape}")
print('Class Means:\n', model._mean)
print('\nClass Variances:\n', model._var)
print('\nClass Priors:', model._priors)

Shape of means: (3, 4)
Class Means:
 [[4.99       3.4525     1.45       0.245     ]
 [5.9195122  2.77073171 4.24146341 1.32195122]
 [6.53333333 2.96666667 5.52051282 2.        ]]

Class Variances:
 [[0.1239     0.15249375 0.033      0.010975  ]
 [0.28693635 0.10011898 0.22584176 0.04122546]
 [0.4165812  0.0991453  0.28573307 0.08205128]]

Class Priors: [0.33333333 0.34166667 0.325     ]
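
The source of `naivebayesfromscratch` is not shown in this notebook, so the following is only a minimal sketch of how a `fit()` step like this could compute the three quantities. The function name `fit_gaussian_nb`, the toy arrays, and the use of the population variance (`ddof=0`, NumPy's default) are assumptions for illustration, not the class's confirmed implementation.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class feature means, variances, and priors (hypothetical helper)."""
    classes = np.unique(y)
    n_samples, n_features = X.shape
    means = np.zeros((len(classes), n_features))
    variances = np.zeros((len(classes), n_features))
    priors = np.zeros(len(classes))
    for idx, c in enumerate(classes):
        X_c = X[y == c]                      # rows belonging to class c
        means[idx] = X_c.mean(axis=0)        # one mean per feature
        variances[idx] = X_c.var(axis=0)     # population variance (ddof=0), an assumption
        priors[idx] = X_c.shape[0] / n_samples  # relative class frequency
    return classes, means, variances, priors

# Toy data (made up for this sketch): 4 samples, 2 features, 2 classes
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [12.0, 24.0]])
y_toy = np.array([0, 0, 1, 1])
classes, means, variances, priors = fit_gaussian_nb(X_toy, y_toy)
print(means.shape)  # one row per class, one column per feature
```

Each row of `means` and `variances` corresponds to one class, which is the same layout as the `(3, 4)` arrays printed above for the three Iris classes and four features.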


## Step 3: Test the PDF Function (`_pdf`)
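This method evaluates the probability density of each feature of a sample under the given class's fitted distribution, returning one value per feature. The output below is consistent with a Gaussian density using the class means and variances from Step 2. These are densities rather than probabilities, so a value can exceed 1 when the variance is small, as the last feature shows.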

In [8]:
x_sample = X_train[0]
class_idx = 0

pdf_values = model._pdf(class_idx, x_sample)

print(f'PDF values for class {class_idx} and sample {x_sample}:', pdf_values)

PDF values for class 0 and sample [4.6 3.6 1.  0.2]: [0.61348483 0.95126977 0.10213125 3.47249724]
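
As a cross-check, the sketch below evaluates the Gaussian density directly from the class 0 means and variances printed in Step 2. The helper name `gaussian_pdf` is hypothetical, and whether `_pdf` is written this way internally is an assumption; the point is only that the formula should reproduce the values above.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Per-feature Gaussian density N(x | mean, var) (hypothetical helper)."""
    numerator = np.exp(-((x - mean) ** 2) / (2.0 * var))
    denominator = np.sqrt(2.0 * np.pi * var)
    return numerator / denominator

# Class 0 statistics and the sample, copied from the notebook output
mean_0 = np.array([4.99, 3.4525, 1.45, 0.245])
var_0 = np.array([0.1239, 0.15249375, 0.033, 0.010975])
x_sample = np.array([4.6, 3.6, 1.0, 0.2])

pdf_values = gaussian_pdf(x_sample, mean_0, var_0)
print(pdf_values)  # approximately [0.6135 0.9513 0.1021 3.4725]
```

The last value is greater than 1 because the variance of that feature in class 0 is only about 0.011, which makes the density curve tall and narrow.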


## Step 4: Test the Predict Function (`predict`)
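We use the trained classifier to predict the classes of the test set and compute accuracy as the fraction of predictions that match the true labels. The test set has only 30 samples, so the score is a rough estimate.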

In [9]:
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = np.mean(y_pred == y_test)

print(f'Predictions: {y_pred}')
print(f'True labels: {y_test}')
print(f'Accuracy: {accuracy:.2f}')

Predictions: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
True labels: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
Accuracy: 1.00
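
For reference, a common way to turn the fitted statistics into predictions is to pick the class with the largest log posterior: the log prior plus the sum of per-feature log densities. The sketch below follows that rule. The function name `predict_gaussian_nb` and the toy parameters are assumptions for illustration, and the notebook does not show whether `NaiveBayes.predict` works in log space.

```python
import numpy as np

def predict_gaussian_nb(X, classes, means, variances, priors):
    """Assign each row of X to the class with the highest log posterior (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Difference to every class mean: shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # Log of the Gaussian density for each sample, class, and feature
    log_pdf = -0.5 * np.log(2.0 * np.pi * variances) - diff ** 2 / (2.0 * variances)
    # Naive independence assumption: sum the log densities over features
    log_posterior = np.log(priors) + log_pdf.sum(axis=2)
    return classes[np.argmax(log_posterior, axis=1)]

# Toy parameters (made up for this sketch): two well-separated classes
classes = np.array([0, 1])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])
priors = np.array([0.5, 0.5])

X_new = np.array([[0.2, -0.1], [4.8, 5.3], [1.0, 1.5]])
y_true = np.array([0, 1, 0])
y_hat = predict_gaussian_nb(X_new, classes, means, variances, priors)
accuracy = np.mean(y_hat == y_true)
print(y_hat, accuracy)
```

Summing log densities instead of multiplying raw densities avoids numerical underflow when there are many features, which is why the sketch works in log space.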
