# Naive Bayes classifier implementation:

The Naive Bayes algorithm is a simple but powerful probabilistic classifier based on applying Bayes' theorem. This implementation builds the classifier **from scratch** in Python.

### Key aspects:

- **NaiveBayes class** contains methods for fitting the model from training data and predicting new samples.

- fit() method calculates the mean and variance of every feature within each class, plus one prior probability per class. These statistics define the per-class probability distributions.

- predict() calls _predict() on every sample and returns the predicted class labels as a NumPy array.

- _predict() scores each class by its log posterior (up to a constant): the log prior plus the sum of the log PDF (probability density function) values of the features. It returns the class with the highest score via argmax().

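- PDF is a Gaussian distribution parameterized by the learned means and variances. Features are treated as conditionally independent given the class (the naive assumption), which is why the per-feature log densities can simply be summed.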

- **Data** for evaluation is generated using sklearn make_classification() with 1000 samples, 2 classes and 10 features. 

- A train/test split holds out part of the data so the model is evaluated on samples it has not seen.

- Predictions on the held-out set reach ~96.5% accuracy, which supports the approach.
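
The scoring rule that _predict() applies can be written out as follows. The symbols are notation chosen here for illustration, not names from the code: $\mu_{c,j}$ and $\sigma_{c,j}^2$ are the mean and variance of feature $j$ in class $c$ learned by fit(), $P(c)$ is the class prior, and $n$ is the number of features.

$$
\hat{y} = \arg\max_{c} \left[ \log P(c) + \sum_{j=1}^{n} \log \mathcal{N}\left(x_j \mid \mu_{c,j}, \sigma_{c,j}^2\right) \right]
$$

$$
\mathcal{N}\left(x_j \mid \mu_{c,j}, \sigma_{c,j}^2\right) = \frac{1}{\sqrt{2\pi\sigma_{c,j}^2}} \exp\left(-\frac{(x_j - \mu_{c,j})^2}{2\sigma_{c,j}^2}\right)
$$

The sum over features is where the independence assumption enters: the joint density of $x$ given the class factorizes into a product of per-feature densities, and the log turns that product into a sum.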

This provides an intuitive understanding of the Naive Bayes algorithm and how its components (priors, posterior calculation, fitting, and predicting) fit together.

The from-scratch implementation, which relies only on NumPy for array math rather than on a machine-learning library, helps cement the conceptual and implementation aspects of probabilistic classification. Evaluating on held-out data shows that this simple algorithm is viable for the task.

## The fit method:

This method extracts the mean and variance of each feature within each class, plus one prior per class, from the raw training data. These learned statistics are then used during prediction to score each class under the Naive Bayes assumptions. Here is a detailed line-by-line explanation of the fit method:


def fit(self, X, y):
- Fits the NB model to the training data X and labels y

N, n = X.shape  
- Gets the number of samples N and features n from X shape

self.classes = np.unique(y)
- Finds unique class labels present in y

n_classes = len(self.classes)
- Counts number of classes 

self.mean = np.zeros((n_classes, n))
self.var = np.zeros((n_classes, n))
- Initializes mean and variance arrays to store per-class stats

self.priors = np.zeros(n_classes)
- Initializes priors array 

for idx, c in enumerate(self.classes):
- Loops through each class

X_c = X[y==c]  
- Filters X to only samples of class c

self.mean[idx, :] = X_c.mean(axis=0)
- Takes the mean of each feature in X_c and stores it in row idx of mean

self.var[idx, :] = X_c.var(axis=0)
- Takes the variance of each feature in X_c and stores it in row idx of var (NumPy's var computes the population variance by default, i.e. ddof=0)

self.priors[idx] = len(X_c) / N
- Calculates the prior as the fraction of training samples that belong to class c
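
As a quick check of the statistics described above, here is a minimal sketch that runs the same computation on a tiny hand-made dataset. The function name `class_statistics` and the toy arrays are hypothetical and only serve the illustration; the body mirrors the steps of fit() as a standalone function.

```python
import numpy as np


def class_statistics(X, y):
    """Return classes, per-class feature means, variances, and priors.

    Hypothetical standalone version of the fit() steps, for illustration.
    """
    n_samples, n_features = X.shape
    classes = np.unique(y)
    mean = np.zeros((len(classes), n_features))
    var = np.zeros((len(classes), n_features))
    priors = np.zeros(len(classes))
    for idx, c in enumerate(classes):
        X_c = X[y == c]                     # rows belonging to class c
        mean[idx, :] = X_c.mean(axis=0)     # per-feature mean
        var[idx, :] = X_c.var(axis=0)       # population variance (ddof=0)
        priors[idx] = len(X_c) / n_samples  # fraction of samples in class c
    return classes, mean, var, priors


# Toy data: 5 samples, 2 features, 2 classes (made up for this example)
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]])
y_toy = np.array([0, 0, 1, 1, 1])

classes, mean, var, priors = class_statistics(X_toy, y_toy)
print(mean)    # class 0 -> [2, 3], class 1 -> [7, 8]
print(priors)  # [0.4, 0.6]
```

Note that a feature with zero variance inside a class would later cause a division by zero in the Gaussian PDF. A common remedy is to add a small constant to the variances, though the walkthrough above does not do so.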

## The prediction method:

These methods show how the Naive Bayes model makes predictions: for each sample, a log posterior score is computed class by class and the class with the largest score is returned. Here's a detailed line-by-line explanation of the predict method and its helper _predict:


def predict(self, X):
- This method takes new input data X (one sample per row) and predicts a class label for each sample, using the statistics learned in fit().

y_pred = [self._predict(x) for x in X]
- A list comprehension builds the list of predicted class labels, one per sample in X.
- The _predict helper method is called on each sample x to get its prediction.

return np.array(y_pred)
- The list of predictions is converted to a NumPy array and returned.

def _predict(self, x):
- Helper method that takes a single sample x and predicts its class.

posteriors = []
- An empty list is used to store the log posterior score calculated for each class.

for idx, c in enumerate(self.classes):
- Loops through each unique class learned during fit().

prior = np.log(self.priors[idx])
- Takes the log of the prior probability of this class stored during fit(), so that it can be added to the log-likelihood computed next.

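posterior = np.sum(np.log(self._pdf(idx, x)))
- Calls _pdf to get the Gaussian density of each feature of x under this class, takes the log of each value, and sums them
- Summing the logs is equivalent to taking the log of the product of the per-feature densities, and it avoids numerical underflow when many small values are multiplied

posterior = posterior + prior
- Adds the log prior to the summed log-likelihood, giving the log posterior up to a constant that is the same for every class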

posteriors.append(posterior)
- Appends the posterior to the list

return self.classes[np.argmax(posteriors)]
- Returns the class label with the highest posterior probability via argmax
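
Putting the pieces together, the class could look roughly like the sketch below. Two parts are assumptions rather than content from the walkthrough: the body of `_pdf`, which is never shown above and is written here as a standard Gaussian density, and the evaluation data, which uses two NumPy-generated Gaussian blobs as a stand-in for the make_classification() dataset so that the example depends only on NumPy. The accuracy on this stand-in data is therefore not comparable to the ~96.5% figure quoted earlier.

```python
import numpy as np


class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        self.mean = np.zeros((n_classes, n_features))
        self.var = np.zeros((n_classes, n_features))
        self.priors = np.zeros(n_classes)
        for idx, c in enumerate(self.classes):
            X_c = X[y == c]
            self.mean[idx, :] = X_c.mean(axis=0)
            self.var[idx, :] = X_c.var(axis=0)
            self.priors[idx] = len(X_c) / n_samples

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        posteriors = []
        for idx, c in enumerate(self.classes):
            log_prior = np.log(self.priors[idx])
            log_likelihood = np.sum(np.log(self._pdf(idx, x)))
            posteriors.append(log_prior + log_likelihood)
        return self.classes[np.argmax(posteriors)]

    def _pdf(self, class_idx, x):
        # Assumed implementation: per-feature Gaussian density using the
        # mean and variance learned for this class.
        mean = self.mean[class_idx]
        var = self.var[class_idx]
        numerator = np.exp(-((x - mean) ** 2) / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator


# Stand-in data (assumption): two well-separated Gaussian blobs, 3 features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-2.0, scale=1.0, size=(100, 3)),
    rng.normal(loc=2.0, scale=1.0, size=(100, 3)),
])
y = np.array([0] * 100 + [1] * 100)

# Shuffle, then hold out the last 50 samples for testing.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = NaiveBayes()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = np.mean(y_pred == y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Working in log space keeps the class scores finite even when individual densities are very small. If a density underflows to exactly zero, `np.log` returns `-inf` with a warning, which still ranks that class last in the argmax.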