# Benchmark

### Explanation of the Code

1. **Initialization**: The `NaiveBayes` class initializes dictionaries to hold the means, variances, and prior probabilities for each class.

2. **Fit Method**:
   - The `fit` method calculates the mean and variance of each feature for each class and the prior probabilities based on the class distribution in the training data.

3. **Predict Method**:
   - The `predict` method scores each class by adding the log prior to the Gaussian log-likelihood summed over all features. This score is the log of the unnormalized posterior, and the method returns the class with the highest score.

4. **Example Usage**:
   - A simple dataset with two features and a binary label is created. The model is trained on it and then predicts labels for the same six samples.
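
As a rough illustration of what the fit step estimates, the sketch below recomputes the per-class means, variances, and priors for the toy dataset from the example usage, using only the standard library. The names `rows`, `labels`, and `stats` are chosen here for illustration and do not appear in the notebook.

```python
from statistics import fmean, pvariance

# Toy data copied from the example usage: two features per row, binary label
rows = [(1.0, 1.0), (2.0, 1.5), (1.5, 1.2), (2.5, 1.8), (3.0, 2.0), (3.5, 2.5)]
labels = [0, 0, 0, 1, 1, 1]

stats = {}
for c in sorted(set(labels)):
    members = [r for r, lab in zip(rows, labels) if lab == c]
    columns = list(zip(*members))  # one tuple of values per feature
    stats[c] = {
        "mean": [fmean(col) for col in columns],
        # Population variance (divide by N), which is also NumPy's default for var()
        "var": [pvariance(col) for col in columns],
        "prior": len(members) / len(rows),
    }

for c, s in stats.items():
    print(c, s)
```

Both classes have three samples each, so both priors come out as 0.5.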

In [12]:
# Goal: the posterior probability of each target class c1, c2, ... given input features f1, f2, ...
#
# Data layout: one row per sample, with columns f1 f2 f3 ... and target t (one of c1, c2, ...)

Naive Bayes applies Bayes' theorem to calculate the posterior probability of a class given a set of features. The sections below build up the formula step by step.

### Bayes' Theorem

The core of the Naive Bayes classifier is Bayes' theorem, which states:

$$
P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}
$$

Where:
- $$ P(A|B) $$ is the **posterior probability**: the probability of class $$ A $$ given the feature $$ B $$.
- $$ P(B|A) $$ is the **likelihood**: the probability of feature $$ B $$ given class $$ A $$.
- $$ P(A) $$ is the **prior probability**: the probability of class $$ A $$ occurring.
- $$ P(B) $$ is the **evidence**: the total probability of feature $$ B $$.
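
A small worked example may make the four terms concrete. The probabilities below are hypothetical numbers picked for illustration, and the evidence is expanded with the law of total probability over "A" and "not A".

```python
# Hypothetical inputs, not taken from any dataset
p_a = 0.3              # prior P(A)
p_b_given_a = 0.8      # likelihood P(B|A)
p_b_given_not_a = 0.1  # likelihood of B under the alternative class

# Evidence P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B) from Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(B) = {p_b:.3f}, P(A|B) = {p_a_given_b:.3f}")
```

Observing B raises the probability of A from the prior of 0.3 to roughly 0.77.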

### Naive Assumption

In Naive Bayes, the "naive" assumption is that all features are conditionally independent given the class label. When $$ B $$ consists of the features $$ x_1, x_2, \ldots, x_n $$, the likelihood then factorizes into a product:

$$
P(B|A) = P(x_1|A) \times P(x_2|A) \times \ldots \times P(x_n|A)
$$

Where $$ x_1, x_2, \ldots, x_n $$ are the features.
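
The factorization can be sketched for Gaussian features as follows. The helper `gaussian_pdf` and the parameter values are assumptions made for this illustration; they are not part of the notebook's code.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical per-class parameters for two features, and one sample to score
means = [1.5, 1.2]
variances = [0.2, 0.05]
x = [1.8, 1.1]

# Per-feature likelihoods P(x_i | A)
per_feature = [gaussian_pdf(xi, m, v) for xi, m, v in zip(x, means, variances)]

# Naive assumption: the joint likelihood is the product of the per-feature terms
likelihood = math.prod(per_feature)
print(per_feature, likelihood)
```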

### Putting It All Together

The formula for the Naive Bayes classifier can thus be summarized as:

$$
P(A|x_1, x_2, \ldots, x_n) = \frac{P(A) \times P(x_1|A) \times P(x_2|A) \times \ldots \times P(x_n|A)}{P(x_1, x_2, \ldots, x_n)}
$$

In practice, since $$ P(x_1, x_2, \ldots, x_n) $$ is constant for all classes during classification, it can be ignored when comparing probabilities across classes. Therefore, the classification rule becomes:

$$
\text{Class} = \arg\max_{A} \left( P(A) \times \prod_{i=1}^{n} P(x_i|A) \right)
$$

This means you choose the class $$ A $$ that maximizes the product of the prior probability and the likelihood of the features given that class.
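
In practice the product is usually evaluated in log space, which is why the implementation in this notebook sums logarithms. The sketch below uses made-up likelihood values to show the motivation: with many features the direct product underflows to zero, while the sum of logs still ranks the classes.

```python
import math

# Hypothetical setup: two classes, 400 features, small per-feature likelihoods
likelihoods = {"a": [1e-3] * 400, "b": [2e-3] * 400}
priors = {"a": 0.5, "b": 0.5}

# The direct product underflows to 0.0 for both classes, so they cannot be compared
direct = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}

# Summing logs keeps the scores finite and preserves the ordering of the classes
log_scores = {
    c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
    for c in priors
}
best = max(log_scores, key=log_scores.get)
print(direct, best)
```

Because the logarithm is monotonic, the class that maximizes the log score is the same class that would maximize the original product.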


In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [5]:
class NaiveBayes:
    def __init__(self):
        self.means = {}
        self.variances = {}
        self.priors = {}
        self.classes = None

    def fit(self, X, y):
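        # Estimate the per-class feature means, variances, and prior probabilities
        self.classes = np.unique(y)
        for c in self.classes:
            X_c = X[y == c]  # rows that belong to class c
            self.means[c] = X_c.mean(axis=0)
            # Population variance; the small constant guards against division by
            # zero in predict() when a feature is constant within a class
            self.variances[c] = X_c.var(axis=0) + 1e-9
            # Prior = fraction of training samples in class c
            self.priors[c] = X_c.shape[0] / X.shape[0]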

    def predict(self, X):
        # Score each class with its log prior plus the Gaussian log-likelihood;
        # log space avoids underflow from multiplying many small densities
        log_posteriors = []
        for c in self.classes:
            log_prior = np.log(self.priors[c])
            log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * self.variances[c])) \
                             - 0.5 * np.sum(((X - self.means[c]) ** 2) / self.variances[c], axis=1)
            log_posteriors.append(log_prior + log_likelihood)

        # Rows are classes and columns are samples: pick the best class per sample
        return self.classes[np.argmax(log_posteriors, axis=0)]


In [6]:
# Create a simple dataset
data = {
    'feature1': [1.0, 2.0, 1.5, 2.5, 3.0, 3.5],
    'feature2': [1.0, 1.5, 1.2, 1.8, 2.0, 2.5],
    'label': [0, 0, 0, 1, 1, 1]
}

df = pd.DataFrame(data)
X = df[['feature1', 'feature2']].values
y = df['label'].values

# Initialize and train the Naive Bayes classifier
nb = NaiveBayes()
nb.fit(X, y)

# Make predictions
predictions = nb.predict(X)
print("Predictions:", predictions)


Predictions: [0 0 0 1 1 1]


# Custom

In [7]:
from sklearn import datasets
# Load the Iris dataset
iris = datasets.load_iris()

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()

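# Features and labels for the Iris dataset (4 features, 3 classes)
X_iris, y_iris = iris.data, iris.target

# Features and labels for the Breast Cancer dataset (30 features, 2 classes).
# The cells below use X and y, so this is the dataset the model is trained on.
X = cancer.data
y = cancer.target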


In [8]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
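
The fitted model is not scored anywhere in the notebook, so here is a minimal sketch of how it could be evaluated on the held-out split. It reloads the Breast Cancer data and repeats the same split so that it runs on its own; the names `X_tr`, `X_te`, `y_tr`, `y_te`, `clf`, and `test_accuracy` are chosen here to avoid overwriting the notebook's variables.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same dataset and split parameters as the cells above
cancer = datasets.load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42
)

# Fit on all 30 features and score on the 20% held-out samples
clf = GaussianNB().fit(X_tr, y_tr)
test_accuracy = accuracy_score(y_te, clf.predict(X_te))
print(f"Test accuracy: {test_accuracy:.3f}")
```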


In [10]:
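import numpy as np

# The decision surface is drawn over the first two features only, so fit a
# separate 2-feature model; the 30-feature model above cannot score a 2-D grid
model_2d = GaussianNB().fit(X_train[:, :2], y_train)

# 200 points per axis keeps the grid small regardless of the feature scale
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 200)
x2 = np.linspace(X[:, 1].min(), X[:, 1].max(), 200)
XX, YY = np.meshgrid(x1, x2)

In [11]:
# Class probabilities at every grid point, shape (n_grid_points, n_classes)
Z = model_2d.predict_proba(np.c_[XX.ravel(), YY.ravel()])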

In [None]:
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
plt.figure()
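# Filled contours of the predicted probability of class 0 over the grid
plt.contourf(XX, YY, Z[:, 0].reshape(XX.shape), cmap=cmap_light)
# Overlay the training points, coloured by their true label
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
plt.xlabel(cancer.feature_names[0])
plt.ylabel(cancer.feature_names[1])
plt.show()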