# Naive Bayesian Algorithm

Naive Bayes is a classification technique based on __Bayes’ Theorem__ with an independence assumption
among predictors. A __Naive Bayes__ classifier assumes that, given the class, the presence of a
particular feature is unrelated to the presence of any other feature.

Bayes’ theorem provides a way of computing the posterior probability P(c|x) from P(c), P(x), and
P(x|c):

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

**In this equation:**
- P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
- P(c) is the prior probability of the class.
- P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
- P(x) is the prior probability of the predictor (the evidence).
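
As a worked example of the four quantities, the sketch below estimates P(Play = yes | Outlook = Sunny) by counting rows. It assumes the `(Outlook, Play)` pairs from the 15-row table shown in the Dataset section further down; the `posterior` function is a hypothetical helper written for illustration, not part of the notebook code.

```python
from fractions import Fraction

# (Outlook, Play) pairs copied from the 15-row table in the Dataset section.
rows = [
    ("Rainy", "no"), ("Rainy", "no"), ("Overcast", "yes"), ("Sunny", "yes"),
    ("Sunny", "yes"), ("Sunny", "no"), ("Overcast", "yes"), ("Rainy", "no"),
    ("Rainy", "yes"), ("Sunny", "yes"), ("Rainy", "yes"), ("Overcast", "yes"),
    ("Overcast", "yes"), ("Sunny", "no"), ("Overcast", "no"),
]

def posterior(rows, x, c):
    """Return P(c|x) = P(x|c) * P(c) / P(x), estimated from row counts.

    Assumes both x and c occur at least once in rows.
    """
    n = len(rows)
    n_c = sum(1 for _, play in rows if play == c)          # rows in class c
    n_x = sum(1 for outlook, _ in rows if outlook == x)    # rows with value x
    n_xc = sum(1 for outlook, play in rows if outlook == x and play == c)

    prior = Fraction(n_c, n)           # P(c)
    likelihood = Fraction(n_xc, n_c)   # P(x|c)
    evidence = Fraction(n_x, n)        # P(x)
    return likelihood * prior / evidence

print(posterior(rows, "Sunny", "yes"))  # 3/5
```

Here P(yes) = 9/15, P(Sunny | yes) = 3/9 and P(Sunny) = 5/15, so the posterior is 3/5.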

## Naive Bayes Code from Scratch

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score

### Dataset

In [2]:
dataset = pd.read_csv('dataset.csv')
print("Loaded Data: \n",dataset)

Loaded Data: 
      Outlook  Temp Humidity Windy Play
0      Rainy   Hot     High     f   no
1      Rainy   Hot     High     t   no
2   Overcast   Hot     High     f  yes
3      Sunny  Mild     High     f  yes
4      Sunny  Cool   Normal     f  yes
5      Sunny  Cool   Normal     t   no
6   Overcast  Cool   Normal     t  yes
7      Rainy  Mild     High     f   no
8      Rainy  Cool   Normal     f  yes
9      Sunny  Mild   Normal     f  yes
10     Rainy  Mild   Normal     t  yes
11  Overcast  Mild     High     t  yes
12  Overcast   Hot   Normal     f  yes
13     Sunny  Mild     High     t   no
14  Overcast  Cool     High     t   no


In [3]:
# Split data into features and labels
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

# Split data into training dataset and test dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [4]:
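# Helper function: relative frequency of each distinct value in one column.
# `data` must be a 2-D NumPy array (e.g. dataset.to_numpy()), because it is
# indexed as data[:, label_column].
def calculate_prob(data, label_column):
    prob = {}
    labels, counts = np.unique(data[:, label_column], return_counts=True)
    total_samples = len(data)

    for label, count in zip(labels, counts):
        prob[label] = count / total_samples

    return prob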
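
Applied to the `Play` column, a relative-frequency helper like this yields the class priors P(c). The sketch below is a dependency-free equivalent that shows the values to expect for the table above; `class_priors` is a hypothetical name, and the label list is copied from the `Play` column.

```python
from collections import Counter

def class_priors(labels):
    """Return {label: relative frequency} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Play column copied from the 15-row table above
play = ["no", "no", "yes", "yes", "yes", "no", "yes", "no",
        "yes", "yes", "yes", "yes", "yes", "no", "no"]

priors = class_priors(play)
print(priors)  # {'no': 0.4, 'yes': 0.6}
```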

In [5]:
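def train_nb(X, y):
    """For each class, record the distinct values seen in every feature.

    Note: this stores which values occur per class, not how often they
    occur, and it does not store class priors.
    """
    unique_labels = np.unique(y)
    num_features = X.shape[1]

    prob = {}

    for label in unique_labels:
        # Positional indices of the rows that belong to this class
        label_indices = np.where(y == label)[0]
        label_data = X.iloc[label_indices]
        prob[label] = []

        for i in range(num_features):
            # Distinct values of feature i within this class
            feature_values = np.unique(label_data.iloc[:, i])
            prob[label].append(feature_values)

    return prob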
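
`train_nb` keeps only the set of values observed per class. A more conventional categorical Naive Bayes would estimate each likelihood P(x_i = v | c) from counts. The sketch below shows one way this might look, using add-alpha (Laplace) smoothing, which is an assumption on my part and not something the notebook uses. `fit_likelihoods` and its `alpha` parameter are hypothetical, and only the `Outlook` and `Humidity` columns are included to keep the example short.

```python
from collections import Counter

def fit_likelihoods(rows, labels, alpha=1.0):
    """Estimate P(feature_i = value | label) from counts with add-alpha smoothing.

    Hypothetical helper for illustration; not part of the notebook code.
    """
    num_features = len(rows[0])
    # Distinct values of each feature across the whole dataset
    vocab = [sorted({row[i] for row in rows}) for i in range(num_features)]
    class_counts = Counter(labels)

    # counts[label][i][value] = rows of class `label` whose feature i equals `value`
    counts = {label: [Counter() for _ in range(num_features)] for label in class_counts}
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[label][i][value] += 1

    likelihoods = {}
    for label, n_c in class_counts.items():
        likelihoods[label] = [
            {
                value: (counts[label][i][value] + alpha) / (n_c + alpha * len(vocab[i]))
                for value in vocab[i]
            }
            for i in range(num_features)
        ]
    return likelihoods

# (Outlook, Humidity) and Play, copied from the 15-row table above
rows = [
    ("Rainy", "High"), ("Rainy", "High"), ("Overcast", "High"), ("Sunny", "High"),
    ("Sunny", "Normal"), ("Sunny", "Normal"), ("Overcast", "Normal"), ("Rainy", "High"),
    ("Rainy", "Normal"), ("Sunny", "Normal"), ("Rainy", "Normal"), ("Overcast", "High"),
    ("Overcast", "Normal"), ("Sunny", "High"), ("Overcast", "High"),
]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no",
          "yes", "yes", "yes", "yes", "yes", "no", "no"]

tables = fit_likelihoods(rows, labels)
print(round(tables["yes"][0]["Sunny"], 3))   # 0.333  -> (3 + 1) / (9 + 3)
print(round(tables["no"][1]["Normal"], 3))   # 0.25   -> (1 + 1) / (6 + 2)
```

With `alpha=0` the same function returns the raw relative frequencies, at the cost of zero probabilities for value and class pairs that never co-occur.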

In [6]:
def predict_nb(instance, prob):
    """Return the label with the highest score, or None if every score is 0."""
    max_prob = 0
    predicted_label = None

    for label, label_prob in prob.items():
        instance_prob = 1.0

        for i, value in enumerate(instance):
            if value in label_prob[i]:
                # Uniform weight over the distinct values seen for
                # feature i in this class
                instance_prob *= 1 / len(label_prob[i])
            else:
                # Value never seen with this class: rule the class out
                instance_prob = 0
                break

        # Strict '>' means a tie keeps the label that was checked first
        if instance_prob > max_prob:
            max_prob = instance_prob
            predicted_label = label

    return predicted_label
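
For comparison, the usual Naive Bayes decision rule picks the class that maximizes P(c) multiplied by the product of the per-feature likelihoods P(x_i | c). The sketch below applies that rule in log space, which avoids underflow when many features are multiplied. `predict_log` is a hypothetical function, and the prior and likelihood tables are unsmoothed relative frequencies counted by hand from the `Outlook`, `Humidity` and `Play` columns of the table above.

```python
import math

def predict_log(instance, priors, likelihoods):
    """Return argmax over labels of log P(c) + sum_i log P(x_i | c).

    Returns None if every label has zero probability for the instance.
    """
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        score = math.log(prior)
        for i, value in enumerate(instance):
            p = likelihoods[label][i].get(value, 0.0)
            if p == 0.0:
                # Value never seen with this class: rule the class out
                score = -math.inf
                break
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Relative frequencies counted from the 15-row table (9 "yes", 6 "no")
priors = {"no": 6 / 15, "yes": 9 / 15}
likelihoods = {
    "no": [
        {"Sunny": 2 / 6, "Rainy": 3 / 6, "Overcast": 1 / 6},   # Outlook
        {"High": 5 / 6, "Normal": 1 / 6},                      # Humidity
    ],
    "yes": [
        {"Sunny": 3 / 9, "Rainy": 2 / 9, "Overcast": 4 / 9},   # Outlook
        {"High": 3 / 9, "Normal": 6 / 9},                      # Humidity
    ],
}

print(predict_log(("Sunny", "High"), priors, likelihoods))       # no
print(predict_log(("Overcast", "Normal"), priors, likelihoods))  # yes
```

For `("Sunny", "High")` the unnormalized scores are 0.4 × 1/3 × 5/6 ≈ 0.111 for `no` and 0.6 × 1/3 × 1/3 ≈ 0.067 for `yes`, so `no` wins.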

### Train & Test the Naive Bayes Classifier

In [7]:
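# Train on the full dataset (the train/test split from In [3] is not used here)
probabilities = train_nb(X, y)

# Example instance: Outlook, Temp, Humidity. Windy is omitted, so only
# the first three features are scored.
test_instance = pd.Series(['Sunny', 'Hot', 'High'])
# Make prediction
prediction = predict_nb(test_instance, probabilities)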

print(f'The predicted label for the instance {test_instance.tolist()} is: {prediction}')

The predicted label for the instance ['Sunny', 'Hot', 'High'] is: no
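
The notebook imports `accuracy_score` and creates `X_test` and `y_test`, but the cells above do not use them. As a minimal sketch of how a held-out evaluation might be scored, the function below computes plain accuracy, the same quantity `accuracy_score` reports with its default arguments. The `accuracy` function is hypothetical, and the label lists are made up for illustration; they are not results from this notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels, for illustration only (not results from this notebook)
y_true = ["yes", "no", "yes", "yes"]
y_pred = ["yes", "yes", "yes", "no"]

print(accuracy(y_true, y_pred))  # 0.5
```

In the notebook itself, one option would be to train on `X_train` and `y_train`, call `predict_nb` on each row of `X_test`, and pass the resulting list together with `y_test` to `accuracy_score`.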
