# Naive Bayes

Naive Bayes is a simple yet effective algorithm for classification tasks in machine learning. It is based on Bayes' theorem, which gives the probability of an event in light of prior knowledge of conditions related to that event. The "naive" part is the assumption that the features are conditionally independent of one another given the class, which rarely holds exactly in real-world data.

## Dataset

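Our dataset has 14 rows, each with four categorical features (Outlook, Temperature, Humidity, Windy) and a binary class label (Play):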

| Outlook | Temperature | Humidity | Windy | Play |
| :--- | :--- | :--- | :--- | :--- |
| sunny | hot | high | false | no |
| sunny | hot | high | true | no |
| overcast | hot | high | false | yes |
| rainy | mild | high | false | yes |
| rainy | cool | normal | false | yes |
| rainy | cool | normal | true | no |
| overcast | cool | normal | true | yes |
| sunny | mild | high | false | no |
| sunny | cool | normal | false | yes |
| rainy | mild | normal | false | yes |
| sunny | mild | normal | true | yes |
| overcast | mild | high | true | yes |
| overcast | hot | normal | false | yes |
| rainy | mild | high | true | no |

## Theory

### Bayes' Theorem

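To classify a sample, we want the posterior probability $P(y | E)$ of each class $y$ given the evidence $E$ (the observed feature values). According to Bayes' theorem, this is proportional to the **prior** $P(y)$ multiplied by the **likelihood** $P(E | y)$; the denominator $P(E)$ is the same for every class, so it can be dropped when comparing classes: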

$$
P(y | E) \propto P(y) \cdot P(E | y)
$$
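
As a small illustration of the prior term, the sketch below estimates $P(y)$ as the fraction of rows with each label. The `labels` list is simply the Play column of the table copied by hand, and the variable names are illustrative rather than part of the notebook's code.

```python
from fractions import Fraction

# Play column from the table above, in row order.
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

# Prior P(y) = count(y) / total number of rows.
priors = {y: Fraction(labels.count(y), len(labels)) for y in sorted(set(labels))}

for y, p in priors.items():
    print(f"P({y}) = {p} = {float(p):.3f}")
# P(no) = 5/14 = 0.357
# P(yes) = 9/14 = 0.643
```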

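The "naive" assumption is that the features are conditionally independent given the class, so the likelihood factors into one term per feature. For the sample $E = (\text{sunny}, \text{cool}, \text{high}, \text{true})$: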

$$
P(E | y) = P(\text{sunny} | y) \cdot P(\text{cool} | y) \cdot P(\text{high} | y) \cdot P(\text{true} | y)
$$
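
As a worked instance, assume each factor is estimated by its raw relative frequency in the table (no smoothing yet). There are 9 "yes" rows and 5 "no" rows, which gives:

$$
P(E | \text{yes}) = \frac{2}{9} \cdot \frac{3}{9} \cdot \frac{3}{9} \cdot \frac{3}{9} = \frac{2}{243} \approx 0.0082
$$

$$
P(E | \text{no}) = \frac{3}{5} \cdot \frac{1}{5} \cdot \frac{4}{5} \cdot \frac{3}{5} = \frac{36}{625} = 0.0576
$$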

This gives the decision rule: compute the bracketed score for each class and predict the class with the larger score:

$$
Y_{predicted} = \arg\max_{y \in \{\text{yes, no}\}} \left[ P(y) \cdot P(\text{sunny} | y) \cdot P(\text{cool} | y) \cdot P(\text{high} | y) \cdot P(\text{true} | y) \right]
$$
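
The decision rule can be sketched directly from the table. This is a minimal illustration that uses raw relative frequencies (no smoothing) and exact fractions; `TABLE`, `score`, and the other names are hypothetical and separate from the implementation further down.

```python
from fractions import Fraction

# The 14 rows of the table: outlook, temperature, humidity, windy, play.
TABLE = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
rows = [line.split() for line in TABLE.strip().splitlines()]
sample = ["sunny", "cool", "high", "true"]

def score(label):
    """Unsmoothed P(y) * product of P(x_i | y) for the sample."""
    in_class = [r for r in rows if r[-1] == label]
    s = Fraction(len(in_class), len(rows))          # prior P(y)
    for i, value in enumerate(sample):
        matches = sum(1 for r in in_class if r[i] == value)
        s *= Fraction(matches, len(in_class))       # P(x_i | y)
    return s

scores = {label: score(label) for label in ("yes", "no")}
predicted = max(scores, key=scores.get)             # arg max over classes

for label, s in scores.items():
    print(f"{label}: {s} = {float(s):.6f}")
print("predicted:", predicted)
# yes: 1/189 = 0.005291
# no: 18/875 = 0.020571
# predicted: no
```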

### Likelihood Calculations (with Laplace Smoothing)

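Our code uses **Laplace (or add-1) smoothing**. Without it, a feature value that never occurs with a class (for example, overcast with no) would get probability zero and wipe out the whole product. The formula for each conditional probability is: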

$$
P(x_i | y) = \frac{\text{count}(x_i, y) + 1}{\text{count}(y) + k}
$$

Where:
* $\text{count}(x_i, y)$ is the number of times the feature value $x_i$ appears with class $y$.
* $\text{count}(y)$ is the total number of times class $y$ appears.
* $k$ is the total number of unique values for that feature (e.g., $k=3$ for Outlook, $k=2$ for Windy).
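
A minimal sketch of the formula, with counts read off the table by hand. The helper name `laplace` is illustrative. It also shows why $k$ is the right term in the denominator: the smoothed probabilities over all values of a feature still sum to 1.

```python
from fractions import Fraction

def laplace(count_xy, count_y, k):
    """Add-1 smoothed estimate of P(x_i | y)."""
    return Fraction(count_xy + 1, count_y + k)

# Outlook has k = 3 values. Class 'no' has 5 rows: sunny 3, overcast 0, rainy 2.
p_sunny = laplace(3, 5, 3)
p_overcast = laplace(0, 5, 3)   # would be 0 without smoothing
p_rainy = laplace(2, 5, 3)

print(p_sunny, p_overcast, p_rainy)       # 1/2 1/8 3/8
print(p_sunny + p_overcast + p_rainy)     # 1
```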


## Implementation

In [None]:
dataset = [
    ['sunny', 'hot', 'high', 'false', 'no'],
    ['sunny', 'hot', 'high', 'true', 'no'],
    ['overcast', 'hot', 'high', 'false', 'yes'],
    ['rainy', 'mild', 'high', 'false', 'yes'],
    ['rainy', 'cool', 'normal', 'false', 'yes'],
    ['rainy', 'cool', 'normal', 'true', 'no'],
    ['overcast', 'cool', 'normal', 'true', 'yes'],
    ['sunny', 'mild', 'high', 'false', 'no'],
    ['sunny', 'cool', 'normal', 'false', 'yes'],
    ['rainy', 'mild', 'normal', 'false', 'yes'],
    ['sunny', 'mild', 'normal', 'true', 'yes'],
    ['overcast', 'mild', 'high', 'true', 'yes'],
    ['overcast', 'hot', 'normal', 'false', 'yes'],
    ['rainy', 'mild', 'high', 'true', 'no']
]

In [7]:
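def train_naive_bayes(data):
    """Count class labels and per-class feature values for Naive Bayes."""
    label_counts = {}    # label -> number of rows with that label
    feature_counts = {}  # label -> feature name -> value -> count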

    for row in data:
        outlook, temp, humidity, windy, label = row

        label_counts[label] = label_counts.get(label, 0) + 1

        if label not in feature_counts:
            feature_counts[label] = {"Outlook": {}, "Temp": {}, "Humidity": {}, "Windy": {}}

        feature_counts[label]["Outlook"][outlook] = feature_counts[label]["Outlook"].get(outlook, 0) + 1
        feature_counts[label]["Temp"][temp] = feature_counts[label]["Temp"].get(temp, 0) + 1
        feature_counts[label]["Humidity"][humidity] = feature_counts[label]["Humidity"].get(humidity, 0) + 1
        feature_counts[label]["Windy"][windy] = feature_counts[label]["Windy"].get(windy, 0) + 1

    return label_counts, feature_counts

label_counts, feature_counts = train_naive_bayes(dataset)
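
For comparison, the same counting step can be sketched with `collections.Counter`, which returns 0 for keys it has never seen. This is an alternative layout, not the notebook's method: `count_table`, `flat_label_counts`, and `flat_feature_counts` are hypothetical names, and the counts are keyed by `(label, feature, value)` tuples instead of nested dictionaries.

```python
from collections import Counter

# Same 14 rows as the table: outlook, temperature, humidity, windy, play.
TABLE = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
rows = [line.split() for line in TABLE.strip().splitlines()]
FEATURES = ["Outlook", "Temp", "Humidity", "Windy"]

def count_table(rows):
    """Return (label counts, flat feature counts) as Counter objects."""
    label_counts = Counter(row[-1] for row in rows)
    # zip stops after the four feature columns, so the label is not counted.
    feature_counts = Counter(
        (row[-1], name, value)
        for row in rows
        for name, value in zip(FEATURES, row)
    )
    return label_counts, feature_counts

flat_label_counts, flat_feature_counts = count_table(rows)
print(dict(flat_label_counts))                            # {'no': 5, 'yes': 9}
print(flat_feature_counts[("no", "Outlook", "sunny")])    # 3
print(flat_feature_counts[("no", "Outlook", "overcast")]) # 0
```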

In [None]:
def predict_naive_bayes(x, label_counts, feature_counts):
    """Return the label with the highest Laplace-smoothed Naive Bayes score for x."""
    total = sum(label_counts.values())
    probs = {}
    
    feature_names = ["Outlook", "Temp", "Humidity", "Windy"]

    for label in label_counts:
        probs[label] = label_counts[label] / total

        for i, feature in enumerate(feature_names):
            value = x[i] # Get 'sunny', then 'cool', etc.
            
            count = feature_counts[label][feature].get(value, 0)
            
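            # k: distinct values of this feature across ALL classes, so a value
            # unseen with this label (e.g. 'overcast' with 'no') still counts.
            num_options = len({v for counts in feature_counts.values()
                               for v in counts[feature]})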

            probs[label] *= (count + 1) / (label_counts[label] + num_options)

    return max(probs, key=probs.get)


test_sample = ['sunny', 'cool', 'high', 'true']
prediction = predict_naive_bayes(test_sample, label_counts, feature_counts)

print("Test Sample:", test_sample)
print("Predicted Class:", prediction)
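
The product of many small probabilities can underflow when there are many features, so a common variation is to sum logarithms instead. The sketch below hard-codes the smoothed factors for the test sample, assuming $k = 3, 3, 2, 2$ for the four features as defined in the theory section, and also normalises the scores into posteriors. The names are illustrative.

```python
import math

# Smoothed factors for ['sunny', 'cool', 'high', 'true'], read off the table:
# prior first, then one (count + 1) / (count(y) + k) factor per feature.
factors = {
    "yes": [9/14, 3/12, 4/12, 4/11, 4/11],
    "no":  [5/14, 4/8, 2/8, 5/7, 4/7],
}

# Summing logs gives the same ranking as multiplying probabilities.
log_scores = {label: sum(math.log(f) for f in fs) for label, fs in factors.items()}
prediction = max(log_scores, key=log_scores.get)

# Normalise so the two posteriors sum to 1.
total = sum(math.exp(s) for s in log_scores.values())
posteriors = {label: math.exp(s) / total for label, s in log_scores.items()}

print("Predicted Class:", prediction)
for label, p in posteriors.items():
    print(f"P({label} | sample) = {p:.2f}")
# Predicted Class: no
# P(yes | sample) = 0.28
# P(no | sample) = 0.72
```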