# The Principle of Naive Bayes

Naive Bayes algorithms rely on Bayes' theorem, so let's recall it quickly. The theorem gives the probability of an event conditioned on prior knowledge of a related event. It is written as:

$P(A|B) = \dfrac{P(B|A)\cdot P(A)}{P(B)}$

Here $P(A|B)$ is the posterior probability of the class ($A$) given the predictor ($B$); this is the quantity we want to compute. $P(B|A)$ is the likelihood, the probability of the predictor given the class. $P(B)$ is the marginal probability of the predictor, and $P(A)$ is the prior probability of the class. This formula forms the backbone of the Naive Bayes classifier.
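As a quick sanity check of the formula, the sketch below plugs in counts taken from the weather table used later in this document, with $A$ = Sunny and $B$ = Hot. The variable names are illustrative only. It compares the result of Bayes' theorem with a direct count.

```python
from fractions import Fraction

# Counts taken from the weather table used later in this document:
# 7 days in total, 3 sunny days, 3 hot days, 2 days that are both hot and sunny.
total, sunny, hot, hot_and_sunny = 7, 3, 3, 2

p_a = Fraction(sunny, total)                  # prior P(A) = P(Sunny)
p_b = Fraction(hot, total)                    # marginal P(B) = P(Hot)
p_b_given_a = Fraction(hot_and_sunny, sunny)  # likelihood P(B|A) = P(Hot|Sunny)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

# Direct count for comparison: the share of hot days that are sunny
direct = Fraction(hot_and_sunny, hot)

print(p_a_given_b, direct)  # both are 2/3
```

Both routes give the same posterior, which is what the theorem guarantees.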

The term 'naive' refers to the assumption that the features are conditionally independent of each other given the class, which is rarely exactly true in real-life data. Nonetheless, the classifier often performs well in practice and is simple to implement.

# Naive Bayes Classifier: Derivation and Example

## 1. Bayes' Theorem in Classification
In **Naive Bayes Classification**, we want to compute the posterior probability:

$
P(Y=y \mid X_1=x_1, X_2=x_2, \dots, X_n=x_n)
$

Using Bayes' theorem:

$
P(Y \mid X) = \frac{P(X \mid Y) \, P(Y)}{P(X)}
$

- $P(Y)$: **Prior**, the probability of the class before seeing the data.
- $P(X \mid Y)$: **Likelihood**, the probability of the features given the class.
- $P(X)$: **Evidence**, the denominator, which is the same for all classes.
- $P(Y \mid X)$: **Posterior**, the probability of the class given the features.

Since the denominator $P(X)$ is constant across classes, the classifier maximizes:

$
P(Y, X) = P(X \mid Y) \, P(Y)
$

This is the foundation of **Naive Bayes Classification**.
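The 'naive' conditional-independence assumption is what makes the likelihood $P(X \mid Y)$ easy to estimate. Written out in the notation above (this is the standard formulation, stated here for completeness), the likelihood factorizes over the features:

$$
P(X_1=x_1, \dots, X_n=x_n \mid Y=y) = \prod_{i=1}^{n} P(X_i=x_i \mid Y=y)
$$

so the predicted class is the one that maximizes the prior times the product of per-feature likelihoods:

$$
\hat{y} = \arg\max_{y} \; P(Y=y) \prod_{i=1}^{n} P(X_i=x_i \mid Y=y)
$$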


## 2. Example Dataset

| Temperature | Humidity | Weather |
|-------------|----------|---------|
| Hot         | High     | Sunny   |
| Hot         | High     | Sunny   |
| Cold        | Normal   | Snowy   |
| Hot         | Normal   | Rainy   |
| Cold        | High     | Snowy   |
| Cold        | Normal   | Snowy   |
| Cold        | Normal   | Sunny   |

Classes for `Weather`: **Sunny (3)**, **Rainy (1)**, **Snowy (3)**.  
Total = 7 instances.


## 3. Prior Probabilities

* $P(Sunny) = \frac{3}{7} \approx 0.43$

* $P(Rainy) = \frac{1}{7} \approx 0.14$

* $P(Snowy) = \frac{3}{7} \approx 0.43$
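The priors above are just class frequencies. A minimal sketch that reproduces them with the standard library, assuming the labels are listed in the same row order as the table (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction

# Weather labels copied from the example dataset, in row order
weather = ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']

# Prior of a class = number of rows with that class / total number of rows
counts = Counter(weather)
priors = {label: Fraction(n, len(weather)) for label, n in counts.items()}

for label, p in priors.items():
    print(f"P({label}) = {p} ≈ {float(p):.2f}")
```

Using `Fraction` keeps the values exact, which makes it easy to compare them with the hand-computed fractions.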

## 4. Likelihoods

We compute conditional probabilities of features given the class.

#### For **Sunny** (3 instances):
* $P(Hot \mid Sunny) = \frac{2}{3} \approx 0.67$

* $P(Cold \mid Sunny) = \frac{1}{3} \approx 0.33$
* $P(High \mid Sunny) = \frac{2}{3} \approx 0.67$

* $P(Normal \mid Sunny) = \frac{1}{3} \approx 0.33$



#### For **Rainy** (1 instance):
* $P(Hot \mid Rainy) = 1.00$

* $P(Cold \mid Rainy) = 0.00$

* $P(High \mid Rainy) = 0.00$

* $P(Normal \mid Rainy) = 1.00$



#### For **Snowy** (3 instances):
* $P(Hot \mid Snowy) = 0.00$

* $P(Cold \mid Snowy) = 1.00$

* $P(High \mid Snowy) = \frac{1}{3} \approx 0.33$

* $P(Normal \mid Snowy) = \frac{2}{3} \approx 0.67$
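The same likelihood tables can be cross-checked with pandas. The sketch below is one possible way to do it, using `pd.crosstab` with `normalize='index'` so that each row (class) sums to 1; it is not the implementation used later in this document.

```python
import pandas as pd

# Example dataset copied from the table above
df = pd.DataFrame({
    'Temperature': ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Cold'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal'],
    'Weather': ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny'],
})

# One table per feature: rows are classes, columns are feature values,
# and normalize='index' makes each row sum to 1, i.e. P(value | class)
tables = {
    feature: pd.crosstab(df['Weather'], df[feature], normalize='index')
    for feature in ['Temperature', 'Humidity']
}

for feature, table in tables.items():
    print(table.round(2), end="\n\n")
```

Each cell `tables[feature].loc[class_, value]` corresponds to one of the bullet points above, including the zero entries.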



## 5. Summary
- **Prior probabilities** represent how frequent each weather condition is overall.  
- **Likelihoods** represent how features (Temperature, Humidity) distribute **within each class**.  
- Together, they let us compute the posterior for a new case and classify it by picking the class with the highest posterior probability.
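To make the last point concrete, here is a small worked sketch for a hypothetical new observation (Temperature = Cold, Humidity = Normal), using the unsmoothed priors and likelihoods computed above. The dictionary names are illustrative.

```python
from fractions import Fraction as F

# Priors and likelihoods copied from the sections above
priors = {'Sunny': F(3, 7), 'Rainy': F(1, 7), 'Snowy': F(3, 7)}
p_cold = {'Sunny': F(1, 3), 'Rainy': F(0), 'Snowy': F(1)}       # P(Cold | class)
p_normal = {'Sunny': F(1, 3), 'Rainy': F(1), 'Snowy': F(2, 3)}  # P(Normal | class)

# Unnormalized score per class: prior times the product of feature likelihoods
scores = {c: priors[c] * p_cold[c] * p_normal[c] for c in priors}

# Dividing by the evidence (sum of the scores) gives proper posteriors
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

best = max(posteriors, key=posteriors.get)
print(posteriors, best)
```

The scores are 1/21 for Sunny, 0 for Rainy, and 2/7 for Snowy, so after normalization Snowy has posterior 6/7 and is the predicted class. Note that Rainy is ruled out entirely by a single zero likelihood.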


# Implementing Naive Bayes Classifier

In [1]:
import pandas as pd

def calculate_prior_probabilities(y):
    #  Calculate prior probabilities for each class
    return y.value_counts(normalize=True)


def calculate_likelihoods(X, y):
    # likelihoods[feature][class] is a Series mapping each feature value
    # seen in that class to its relative frequency P(value | class)
    likelihoods = {}
    for column in X.columns:
        likelihoods[column] = {}
        for class_ in y.unique():
            # Select this feature's values for the rows belonging to the class
            class_data = X[y == class_][column]
            counts = class_data.value_counts()
            total_count = len(class_data)  # number of rows in the current class
            likelihoods[column][class_] = counts / total_count
    return likelihoods

In [2]:
def naive_bayes_classifier(X_test, priors, likelihoods):
    predictions = []
    for _, data_point in X_test.iterrows():
        class_probabilities = {}
        for class_ in priors.index:
            class_probabilities[class_] = priors[class_]
            for feature in X_test.columns:
                # Series.get returns the stored probability; for a value not stored for this class,
                # fall back to 1 / (number of stored values + 1) so the product is never zeroed out
                feature_probs = likelihoods[feature][class_]
                class_probabilities[class_] *= feature_probs.get(data_point[feature], 1 / (len(feature_probs) + 1))

        # Predict class with maximum posterior probability
        predictions.append(max(class_probabilities, key=class_probabilities.get))

    return predictions

## Understanding and Handling Data Issues in Naive Bayes
A recurring challenge in Naive Bayes is zero probabilities. When a category never appears together with a given class in the training data, its estimated likelihood is zero, and because the class score is a product, that single zero wipes out the score for the class regardless of the other features. A well-known fix is Laplace (add-one) smoothing, which adds 1 to every category count so that no likelihood is exactly zero.

You can integrate Laplace smoothing into the `calculate_likelihoods` function as follows:

In [3]:
def calculate_likelihoods_with_smoothing(X, y):
    likelihoods = {}
    for column in X.columns:
        likelihoods[column] = {}
        for class_ in y.unique():
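            class_data = X[y == class_][column]
            # Count every category seen anywhere in this column, using 0 for
            # categories absent from this class, so that each one gets smoothed
            categories = X[column].unique()
            counts = class_data.value_counts().reindex(categories, fill_value=0)
            total_count = len(class_data) + len(categories)  # class size + number of categories
            likelihoods[column][class_] = (counts + 1) / total_count  # add-1 smoothing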
    return likelihoods

Each category count in the numerator is increased by 1, and the denominator is increased by the number of distinct categories of the feature, which accounts for the 1 added to each category.
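As a small worked example of the add-1 formula, consider Temperature within the Snowy class of the example dataset, where Hot never occurs. The sketch below applies the formula by hand; the variable names are illustrative.

```python
from fractions import Fraction

# Temperature counts within the Snowy class, from the example dataset
counts = {'Hot': 0, 'Cold': 3}
n_class = sum(counts.values())  # 3 Snowy rows
k = len(counts)                 # 2 distinct Temperature values

# Add-1 smoothing: (count + 1) / (class size + number of categories)
smoothed = {value: Fraction(c + 1, n_class + k) for value, c in counts.items()}

print(smoothed)
```

The unseen value Hot moves from 0 to 1/5 and Cold moves from 1 to 4/5, so the two smoothed probabilities still sum to 1.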

## Using Naive Bayes Classifier
Here is a short example of predicting weather with our classifier:

In [4]:
data = {
    'Temperature': ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Cold'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal'],
    'Weather': ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']
}
df = pd.DataFrame(data)

# Split features and labels
X = df[['Temperature', 'Humidity']]
y = df['Weather']

# Calculate prior probabilities
priors = calculate_prior_probabilities(y)

# Calculate likelihoods with smoothing
likelihoods = calculate_likelihoods_with_smoothing(X, y)

# New observation
X_test = pd.DataFrame([{'Temperature': 'Cold', 'Humidity': 'Normal'}])

# Make prediction
prediction = naive_bayes_classifier(X_test, priors, likelihoods)
print("Predicted Weather: ", prediction[0])  # Output: Predicted Weather:  Snowy

Predicted Weather:  Snowy
