## Implementation of Naive Bayes Classifier from Scratch

### Objective: 

Develop a Naive Bayes classifier for classification problems without using external ML libraries. This improves the understanding of probabilistic modeling approaches.

#### Methodology:  

- NaiveBayes class implements classifier functionality in an object-oriented manner.
- The fit() method calculates class priors, mean, and variance to learn Gaussian distributions for each feature.  
- The predict() method evaluates log posteriors by multiplying log priors and log PDF per class to return the predicted class label.
- Data is generated using scikit-learn for evaluation on 1000 samples with 2 classes and 10 features.

#### Evaluation:

- Train-test split is done to validate on held-out data. 
- Accuracy metric is used with 96.5% achieved, showing algorithm effectiveness.

#### Skills:

- Probabilistic modeling concepts
- Implementation of Bayes' theorem for classification  
- NumPy for numeric computing
- Data generation and preprocessing 
- Model evaluation best practices

#### Outcome:
- Intuitive and rigorous implementation establishes a solid understanding of probabilistic algorithms.  
- Successful evaluation of simulated data proves the viability of the from-scratch approach.
- Key probability concepts like density functions, priors, and posteriors are seamlessly integrated.


In conclusion, this project demonstrates capabilities in algorithms, probabilistic modeling, and the ability to build classifiers from the ground up without libraries. A clean, modularized code structure and end-to-end validation approach are showcased. Developing basic machine learning techniques from first principles aids conceptual clarity.

### Importing the libraries

In [2]:
import numpy as np

### Naive Bayes Classifier Architecture 

In [3]:
class NaiveBayes:
    def fit(self, X, y):
        N, n = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)

        self.mean = np.zeros((n_classes, n), dtype=np.float64)
        self.var = np.zeros((n_classes, n), dtype=np.float64)
        self.priors = np.zeros(n_classes, dtype=np.float64)

        for idx, c in enumerate(self.classes):
            X_c = X[ y==c ]
            self.mean[idx, :] = X_c.mean(axis=0)
            self.var[idx, :] = X_c.var(axis=0)
            self.priors[idx] = len(X_c) / N

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = []
        
        for idx, c in enumerate(self.classes):
            prior = self.priors[idx]
            posterior = np.sum( np.log(self._pdf(idx, x)) )
            posterior = posterior + prior
            posteriors.append(posterior)

        return self.classes[np.argmax(posteriors)]

   
    def _pdf(self, idx, x):
        mean = self.mean[idx, :]
        var = self.var[idx, :]

        numerator = np.exp(-(x-mean)**2/(2*var**2))
        denuminator = np.sqrt(2 * np.pi * var**2) 
        return numerator / denuminator



### Importing the dataset

In [4]:
from sklearn import datasets
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=123)

### Splitting the dataset into the Training set and Test set

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

### Training the KNN

In [7]:
nb = NaiveBayes()
nb.fit(X_train, y_train)

### Predicting the Test set & Evaluation  

In [8]:
predictions = nb.predict(X_test)

In [9]:
 def accuracy(y_true, y_pred):
     accuracy = np.sum(y_true == y_pred) / len(y_true)
     return accuracy

print("Naive Bayes classification accuracy", accuracy(y_test, predictions))

Naive Bayes classification accuracy 0.965
