# Naive Bayes (22MF10006)
---
# Subtask 1: Mathematical Understanding

- Naive Bayes is a supervised, probabilistic classification algorithm based on Bayes' theorem, combined with the 'naive' assumption that the input features are independent of each other given the class label.

### Principles and Assumptions:

- Naive Bayes is built on Bayes' theorem, which gives the probability of an event given that another event has already occurred (a conditional probability).

- It assumes that the features are conditionally independent given the class label: once the class is known, the presence, absence, or value of one feature does not change the probability of any other feature.

### Mathematical Explanation:

- Bayes' theorem is expressed mathematically as follows:

            P(class | features) = (P(features | class) * P(class)) / P(features)

     - where P(class | features) is the posterior probability of the class label given the observed features
     - P(features | class) is the likelihood: the probability of observing the features given the class label
     - P(class) is the prior probability of the class label, i.e. its probability before any features are observed (in practice, estimated from how often the label occurs in the training data)
     - P(features) is the evidence: the probability of observing the features under any class, which acts as a normalizing factor




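- Since P(features) does not depend on the class, it is the same for every class label when we compare them for a given input, so it can be dropped. Combined with the independence assumption, this makes the posterior proportional to the prior multiplied by the product of the per-feature conditional probabilities:

            P(class | features) ∝ P(class) * P(feature_1 | class) * P(feature_2 | class) * ... * P(feature_n | class)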
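As a quick check of the formulas above, the sketch below plugs hypothetical numbers (two classes, two observed features; none of these values come from the dataset) into Bayes' theorem. It computes the evidence P(features) explicitly as the sum of the numerators over classes, then shows that dropping it does not change which class wins.

```python
# Hypothetical priors and per-feature likelihoods for one observed input (f1, f2)
priors = {"A": 0.6, "B": 0.4}
likelihoods = {
    "A": [0.2, 0.5],  # P(f1 | A), P(f2 | A)
    "B": [0.7, 0.3],  # P(f1 | B), P(f2 | B)
}

# Unnormalized scores: P(class) * prod_i P(feature_i | class)
scores = {}
for c in priors:
    score = priors[c]
    for p in likelihoods[c]:
        score *= p
    scores[c] = score

# Evidence P(features) under the naive model: sum of the scores over all classes
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

print(scores)
print(posteriors)
# Dividing by the evidence rescales every class by the same factor,
# so the most probable class is the same with or without it
print(max(scores, key=scores.get), max(posteriors, key=posteriors.get))
```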

### How model learns and makes predictions:

- During training, the model estimates the likelihood P(features | class), the probability of observing the features given a specific class label. Because the features are assumed to be conditionally independent, this joint likelihood factorizes into a product of per-feature terms, shown below. For categorical features, each term P(feature_i | class) is estimated as the relative frequency of that feature value among the training samples of the class.

       P(features | class) = P(feature_1 | class) * P(feature_2 | class) * P(feature_3 | class)...P(feature_n | class)

For continuous features, the likelihood is usually modeled with a probability density function; for example, Gaussian Naive Bayes assumes that each feature follows a Gaussian (normal) distribution within each class.
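For categorical features, the frequency-based estimate of P(feature_i | class) described above, together with a count-based prior, can be sketched in plain Python. The toy dataset and the helper names `fit_categorical_nb` and `predict_categorical_nb` are hypothetical; no smoothing is applied, so a feature value never seen with a class gets probability zero.

```python
from collections import Counter, defaultdict

def fit_categorical_nb(X, y):
    """Estimate P(class) and P(feature_j = v | class) by counting."""
    class_counts = Counter(y)
    n = len(y)
    priors = {c: class_counts[c] / n for c in class_counts}
    # counts[c][j][v] = number of class-c samples whose feature j equals v
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
    # Relative frequency of each value within each class
    likelihoods = {
        c: {j: {v: k / class_counts[c] for v, k in counts[c][j].items()} for j in counts[c]}
        for c in class_counts
    }
    return priors, likelihoods

def predict_categorical_nb(x, priors, likelihoods):
    """Return the class maximizing P(class) * prod_j P(x_j | class)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            score *= likelihoods[c][j].get(v, 0.0)  # unseen value -> 0 (no smoothing)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (weather, wind) -> play?
X = [("sunny", "weak"), ("sunny", "strong"), ("rainy", "weak"),
     ("rainy", "strong"), ("sunny", "weak"), ("rainy", "weak")]
y = ["yes", "no", "yes", "no", "yes", "no"]

priors, likelihoods = fit_categorical_nb(X, y)
print(priors)
print(predict_categorical_nb(("sunny", "weak"), priors, likelihoods))
```

Note how the unseen pair ("strong" with class "yes") zeroes out the whole product; this is the problem that Laplace smoothing of the counts is designed to fix.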

- The prior probability P(class) is the probability of a particular class label in the dataset. It is estimated as the number of training samples with that label divided by the total number of training samples.

- During training, the model is given labeled data (features together with their class labels) and uses it to estimate the prior and likelihood probabilities.

- To make predictions on test data, the model applies Bayes' theorem to compute the posterior probability P(class | features) for each class label. The class with the highest posterior probability is the predicted class for that set of input features.

- The given dataset has numerical features, so we can use Gaussian Naive Bayes, a variant of Naive Bayes that assumes the features follow a Gaussian (normal) distribution within each class.

- To estimate the likelihoods, we compute the mean (μ) and standard deviation (σ) of each numerical feature within each class; these two parameters define a Gaussian distribution for every (feature, class) pair.
For a class label c and a numerical feature x, the likelihood P(x | c) is given by the probability density function of the Gaussian distribution:

              P(x | c) = (1 / (√(2π) * σ_c)) * exp(-(x - μ_c)^2 / (2 * σ_c^2))

        where μ_c is the mean of feature x for class c
        σ_c is the standard deviation of feature x for class c.
        π is pi (mathematical constant)
        exp() is the exponential function

- As in standard Naive Bayes, we also estimate the prior probability P(c) for each class c. To make predictions, we compute the posterior probability P(c | x) for each class label c using Bayes' theorem with the estimated likelihoods and priors (for a vector of features x, P(x | c) is the product of the per-feature Gaussian densities):

       P(c | x) = (P(x | c) * P(c)) / P(x)

       where P(c | x) is the posterior probability of class c given the numerical features x.
       P(x | c) is the probability of observing features x given class c (calculated using the Gaussian distribution).
       P(c) is the prior probability of class c
       P(x) is the probability of observing the numerical features x (a normalization factor)

- The class label with the highest posterior probability is considered the predicted class for the given set of numerical features.
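The Gaussian variant described above can be sketched in a few lines of NumPy. The sketch assumes a tiny hypothetical two-feature dataset, implements the density formula for P(x | c) as written, and divides by P(x) (the sum of the numerators over all classes) so that the printed posteriors add up to 1.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    # P(x | c) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2)), element-wise
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Hypothetical training data: two numerical features, labels 0 and 1
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [3.0, 4.0], [3.2, 3.9], [2.9, 4.2]])
y = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])        # P(c)
mu = np.array([X[y == c].mean(axis=0) for c in classes])      # mu_c for each feature
sigma = np.array([X[y == c].std(axis=0) for c in classes])    # sigma_c for each feature

x_new = np.array([1.1, 2.1])
# Numerator of Bayes' theorem: P(c) * prod_j P(x_j | c)
numerators = priors * np.array(
    [gaussian_pdf(x_new, mu[k], sigma[k]).prod() for k in range(len(classes))]
)
posteriors = numerators / numerators.sum()  # divide by the evidence P(x)
print(posteriors, classes[np.argmax(posteriors)])
```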



----
# Subtask 2: Training and Prediction

```python
import numpy as np
import pandas as pd
```

```python
import numpy as np

class NaiveBayes:
    """Gaussian Naive Bayes with an additive variance-smoothing term."""

    def fit(self, X, y, alpha=1.0, class_priors=None):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)

        # Prior probabilities: the supplied priors, otherwise the class frequencies in y
        if class_priors is not None:
            self._priors = np.asarray(class_priors, dtype=np.float64)
        else:
            self._priors = np.array([np.mean(y == c) for c in self._classes], dtype=np.float64)

        # Variance-smoothing term. This is not Laplace smoothing (which adds pseudo-counts
        # to discrete frequencies); alpha is added to every variance so none is zero.
        self._alpha = alpha

        # Per-class mean and variance of every feature
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        for idx, c in enumerate(self._classes):
            X_c = X[y == c]
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)

        self._var += alpha
        return self

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        # Log posterior up to the constant log P(x): log prior + sum of log likelihoods.
        # Working in log space avoids underflow when many small densities are multiplied.
        posteriors = [
            np.log(self._priors[idx]) + np.sum(self._log_pdf(idx, x))
            for idx in range(len(self._classes))
        ]
        # Return the class with the highest posterior
        return self._classes[np.argmax(posteriors)]

    def _log_pdf(self, class_idx, x):
        # Log of the Gaussian density, computed directly instead of log(exp(...))
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        return -0.5 * np.log(2 * np.pi * var) - ((x - mean) ** 2) / (2 * var)
```
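The `_predict` method above adds log-probabilities instead of multiplying probabilities. The sketch below, using hypothetical per-feature densities, shows why: a product of many small densities underflows to zero in float64, so every class would tie, while the sum of their logarithms stays an ordinary finite number. Computing the log-density directly, rather than taking the log of an already-underflowed density, keeps each term finite as well.

```python
import numpy as np

# Hypothetical per-feature densities: 400 features, each with density 0.01 under some class
densities = np.full(400, 0.01)

product = np.prod(densities)          # true value 1e-800, far below the smallest float64
log_sum = np.sum(np.log(densities))   # 400 * log(0.01), an ordinary finite number

print(product)  # underflows to 0.0
print(log_sum)
```

With only two features (as in this dataset) underflow is unlikely, but the log-space form costs nothing and scales to many features.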


```python
import numpy as np

class MyGaussianNaiveBayes:

    def __init__(self, smoothing=1.0):
        # smoothing: Laplace (additive) smoothing applied to the class priors only
        self.smoothing = smoothing
        self.means = None
        self.variances = None
        self.class_priors = None

    def fit(self, X, y):
        samples_count, features_count = X.shape
        self.classes = np.unique(y)
        classes_count = len(self.classes)

        # class_priors[i]: smoothed prior of class i
        # means[i, j] / variances[i, j]: mean / variance of feature j within class i
        self.class_priors = np.zeros(classes_count)
        self.means = np.zeros((classes_count, features_count))
        self.variances = np.zeros((classes_count, features_count))

        for i, target_class in enumerate(self.classes):
            X_c = X[y == target_class]
            # Laplace-smoothed prior: (n_c + alpha) / (N + K * alpha)
            self.class_priors[i] = (len(X_c) + self.smoothing) / (samples_count + classes_count * self.smoothing)
            self.means[i] = X_c.mean(axis=0)
            self.variances[i] = X_c.var(axis=0)
        return self

    def _log_likelihood(self, x, i):
        # Per-feature log Gaussian density for class i; the same 1e-8 guards both terms
        var = self.variances[i] + 1e-8
        return -0.5 * np.log(2 * np.pi * var) - 0.5 * ((x - self.means[i]) ** 2) / var

    def _predict_one_sample(self, x):
        # log P(c) + sum_j log P(x_j | c); P(x) is the same for every class and is dropped
        log_posteriors = [
            np.log(self.class_priors[i]) + self._log_likelihood(x, i).sum()
            for i in range(len(self.classes))
        ]
        return self.classes[np.argmax(log_posteriors)]

    def predict(self, X):
        return np.array([self._predict_one_sample(x) for x in X])
```
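In `MyGaussianNaiveBayes`, the `smoothing` parameter enters only the class priors, through (n_c + alpha) / (N + K * alpha); it does not touch the means or variances. The sketch below uses hypothetical class counts (the helper `smoothed_priors` is illustrative) to show how increasing alpha pulls the priors toward the uniform value 1/K.

```python
import numpy as np

def smoothed_priors(class_counts, alpha):
    # Laplace (additive) smoothing of class priors: (n_c + alpha) / (N + K * alpha)
    counts = np.asarray(class_counts, dtype=np.float64)
    return (counts + alpha) / (counts.sum() + len(counts) * alpha)

counts = [60, 40]  # hypothetical class counts: N = 100, K = 2
for alpha in [0.0001, 2, 100, 1000]:
    print(alpha, smoothed_priors(counts, alpha))
```

Because the prior only adds log P(c) to each class score, changing alpha moves the decision threshold between classes rather than changing the per-class Gaussians.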


```python
# Load the train/test splits: two numerical features (x_1, x_2) and a class label (y)
train = pd.read_csv('ds2_train.csv')
X_train = train[['x_1', 'x_2']].to_numpy()
y_train = train['y'].to_numpy()

test = pd.read_csv('ds2_test.csv')
X_test = test[['x_1', 'x_2']].to_numpy()
y_test = test['y'].to_numpy()

def accuracy_score(y_true, y_pred):
    # Fraction of samples whose predicted label equals the true label
    correct_predictions = 0
    total_samples = len(y_true)

    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            correct_predictions += 1

    return correct_predictions / total_samples

# Sweep the prior-smoothing parameter of MyGaussianNaiveBayes.
# Note: alpha is selected by test-set accuracy here, so best_accuracy is an
# optimistic estimate; a separate validation split would avoid this bias.
best_accuracy = 0.0
best_alpha = None
alphas = [2, 10.0, 100.0, 0.0001, 0.02, 1000]

for alpha in alphas:
    model = MyGaussianNaiveBayes(smoothing=alpha)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    # Strict ">" keeps the first alpha that reaches the best accuracy
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_alpha = alpha

print("Best alpha:", best_alpha)
print("Best accuracy:", best_accuracy)
```

```
Best alpha: 2
Best accuracy: 0.92
```
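Because `best_alpha` above is chosen by test-set accuracy, the reported accuracy is an optimistic estimate. A common remedy is to carve a validation split out of the available data, choose alpha on it, and evaluate on the test set only once. The sketch below illustrates that protocol on synthetic Gaussian data (a stand-in that does not read `ds2_train.csv`/`ds2_test.csv`); `fit_gnb` and `predict_gnb` are hypothetical helpers that mirror the prior smoothing of `MyGaussianNaiveBayes`.

```python
import numpy as np

def fit_gnb(X, y, alpha):
    # Gaussian NB parameters with Laplace-smoothed priors (same scheme as MyGaussianNaiveBayes)
    classes = np.unique(y)
    counts = np.array([np.sum(y == c) for c in classes], dtype=np.float64)
    priors = (counts + alpha) / (len(y) + len(classes) * alpha)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-8
    return classes, priors, means, variances

def predict_gnb(model, X):
    classes, priors, means, variances = model
    # Per-feature log densities broadcast to shape (samples, classes, features), summed over features
    log_lik = (-0.5 * np.log(2 * np.pi * variances)[None, :, :]
               - 0.5 * (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]).sum(axis=2)
    return classes[np.argmax(np.log(priors)[None, :] + log_lik, axis=1)]

# Hypothetical two-class data, shuffled and split into train / validation / test
rng = np.random.default_rng(0)
X_all = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)), rng.normal(2.0, 1.0, size=(300, 2))])
y_all = np.array([0] * 300 + [1] * 300)
perm = rng.permutation(len(y_all))
X_all, y_all = X_all[perm], y_all[perm]
X_train, y_train = X_all[:300], y_all[:300]
X_val, y_val = X_all[300:450], y_all[300:450]
X_test, y_test = X_all[450:], y_all[450:]

# Choose alpha on the validation split only
best_alpha, best_val_acc = None, -1.0
for alpha in [0.0001, 0.02, 2, 10.0, 100.0, 1000]:
    val_acc = np.mean(predict_gnb(fit_gnb(X_train, y_train, alpha), X_val) == y_val)
    if val_acc > best_val_acc:
        best_alpha, best_val_acc = alpha, val_acc

# The test set is used once, after alpha has been fixed
test_acc = np.mean(predict_gnb(fit_gnb(X_train, y_train, best_alpha), X_test) == y_test)
print("alpha chosen on validation:", best_alpha, "test accuracy:", test_acc)
```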
