# Gaussian Naive Bayes
Gaussian Naive Bayes is a probabilistic machine learning algorithm used for classification tasks, particularly well-suited for datasets with continuous features.
## Key Concepts and Assumptions
The algorithm is a variant of the Naive Bayes classifier, which is based on Bayes' Theorem. It makes two key simplifying assumptions:
1. Naive Independence: It assumes that all features are conditionally independent of each other, given the class label ($P(x_i | y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i | y)$). This is the "naive" part of the name: the assumption is often violated in real-world data, yet the classifier can still perform surprisingly well.
2. Gaussian Distribution: It assumes that, within each class, the values of each continuous feature follow a Gaussian (Normal) distribution. This is the "Gaussian" part of the name.
## How it Works
The goal is to find the class $C_k$ that maximizes the posterior probability $P(C_k | \mathbf{X})$, where $\mathbf{X}$ is the feature vector of a new data point.
1. Bayes' Theorem is applied to calculate the posterior probability:
   $$P(C_k | \mathbf{X}) = \frac{P(C_k) \cdot P(\mathbf{X} | C_k)}{P(\mathbf{X})}$$
2. Under the Naive Independence assumption, the likelihood $P(\mathbf{X} | C_k)$ factorizes into the product of individual feature likelihoods:
   $$P(\mathbf{X} | C_k) = \prod_{i=1}^{n} P(x_i | C_k)$$
   so the posterior satisfies
   $$P(C_k | \mathbf{X}) \propto P(C_k) \cdot \prod_{i=1}^{n} P(x_i | C_k)$$
   The denominator $P(\mathbf{X})$ is dropped because it is the same for every class and does not affect which class scores highest.
3. The Gaussian Distribution assumption is used to calculate the individual feature likelihood $P(x_i | C_k)$ using the Gaussian Probability Density Function (PDF):
   $$P(x_i | C_k) = \frac{1}{\sqrt{2\pi\sigma_{k,i}^2}} e^{-\frac{(x_i - \mu_{k,i})^2}{2\sigma_{k,i}^2}}$$
   where $\mu_{k,i}$ (mean) and $\sigma_{k,i}^2$ (variance) are estimated from the training data for feature $x_i$ among the samples belonging to class $C_k$.

The classifier predicts the class $C_k$ that yields the highest value of $P(C_k) \cdot \prod_{i=1}^{n} P(x_i | C_k)$.
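
The three steps above can be sketched from scratch in NumPy. This is a minimal illustration rather than a reference implementation. The function names `fit_gaussian_nb` and `predict_gaussian_nb`, the `eps` variance floor, and the toy dataset are all assumptions made for this example. The sketch sums log-probabilities instead of multiplying raw densities, which leaves the arg max unchanged but avoids numerical underflow when there are many features.

```python
import numpy as np


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate priors P(C_k) and per-class, per-feature mean and variance."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    # eps is a hypothetical safeguard against zero variance (division by zero).
    return classes, priors, means, variances + eps


def predict_gaussian_nb(X, classes, priors, means, variances):
    """Return the class maximizing log P(C_k) + sum_i log P(x_i | C_k)."""
    X = np.asarray(X, dtype=float)
    # diff has shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    # Log of the Gaussian PDF, evaluated per sample, class, and feature.
    log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    log_joint = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(log_joint, axis=1)]


# Toy data (made up for illustration): two well-separated classes.
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

classes, priors, means, variances = fit_gaussian_nb(X_train, y_train)
X_new = np.array([[1.1, 2.0], [5.1, 6.1]])
predictions = predict_gaussian_nb(X_new, classes, priors, means, variances)
print(predictions)  # [0 1]
```

Each new point is assigned to the class whose per-feature Gaussians explain it best, weighted by the class prior.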
## Python Implementation
In Python, the scikit-learn library provides an easy-to-use implementation via the `GaussianNB` class in the `sklearn.naive_bayes` module.
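
A short usage sketch, assuming scikit-learn and NumPy are installed, might look like the following. The toy dataset and variable names are made up for illustration. `fit`, `predict`, and `predict_proba` are the standard scikit-learn estimator methods, and `theta_` and `class_prior_` expose the fitted per-class feature means and class priors.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (made up for illustration): two continuous features, two classes.
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB()
model.fit(X_train, y_train)  # estimates priors, means, and variances per class

X_new = np.array([[1.1, 2.0], [5.1, 6.1]])
predicted = model.predict(X_new)            # most probable class per sample
probabilities = model.predict_proba(X_new)  # posterior P(C_k | X) per class

print(predicted)
print(model.theta_)        # per-class feature means, shape (n_classes, n_features)
print(model.class_prior_)  # estimated class priors P(C_k)
```

Unlike the unnormalized score used for prediction, the rows of `predict_proba` are normalized so that they sum to 1 across classes.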
