# Naive Bayes 
Naive Bayes is a classification algorithm based on Bayes' Theorem with the "naive" assumption of conditional independence between every pair of features given the class value.
```text
Bayes' Theorem Foundation
Bayes' Theorem:
P(A|B) = [P(B|A) × P(A)] / P(B)

In classification context:

P(Class|Features) = [P(Features|Class) × P(Class)] / P(Features)

The "Naive" Assumption
The algorithm assumes that all features are conditionally independent given the class:

P(Feature₁, Feature₂, ..., Featureₙ|Class) = P(Feature₁|Class) × P(Feature₂|Class) × ... × P(Featureₙ|Class)

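This replaces one joint distribution over all n features with n separate per-feature distributions for each class, which are far easier to estimate. The assumption is rarely true in real data (hence "naive"), but the classifier often works well in practice anyway.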

Mathematical Formulation
For an instance X with features x₁, x₂, ..., xₙ, we predict the class C with the highest posterior. P(X) is the same for every class, so it can be dropped, and we pick the C that maximizes:

P(C|X) ∝ P(C) × Π P(xᵢ|C)

Where:

P(C) = Prior probability of class C

P(xᵢ|C) = Likelihood of feature xᵢ given class C

Π = Product over all features

Types of Naive Bayes Classifiers
1. Gaussian Naive Bayes
Assumes each continuous feature follows a normal (Gaussian) distribution within each class

Used for continuous data

2. Multinomial Naive Bayes
Used for discrete counts (e.g., word counts in text)

Common for text classification

3. Bernoulli Naive Bayes
Used for binary/boolean features

Common for document classification with word presence/absence

Step-by-Step Example 1: Weather Classification
Problem: Predict whether to play tennis based on weather conditions

Training Data:
Outlook   Temperature  Humidity  Wind    Play Tennis
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No

Classify: (Sunny, Cool, High, Strong)

Step 1: Calculate Priors

P(Yes) = 9/14 ≈ 0.643

P(No) = 5/14 ≈ 0.357

Step 2: Calculate Likelihoods for "Yes"

P(Sunny|Yes) = 2/9 ≈ 0.222

P(Cool|Yes) = 3/9 ≈ 0.333

P(High|Yes) = 3/9 ≈ 0.333

P(Strong|Yes) = 3/9 ≈ 0.333

Step 3: Calculate Likelihoods for "No"

P(Sunny|No) = 3/5 = 0.6

P(Cool|No) = 1/5 = 0.2

P(High|No) = 4/5 = 0.8

P(Strong|No) = 3/5 = 0.6

Step 4: Calculate Posterior Probabilities

P(Yes|X) ∝ 0.643 × 0.222 × 0.333 × 0.333 × 0.333 ≈ 0.0053

P(No|X) ∝ 0.357 × 0.6 × 0.2 × 0.8 × 0.6 ≈ 0.0206

Step 5: Normalize

P(Yes|X) = 0.0053 / (0.0053 + 0.0206) ≈ 0.205

P(No|X) = 0.0206 / (0.0053 + 0.0206) ≈ 0.795

Prediction: NO (don't play tennis)

Step-by-Step Example 2: Text Classification (Spam Detection)
Problem: Classify emails as "Spam" or "Not Spam"

Training Data:
Email Text          Label
"win money now"     Spam
"meeting tomorrow"  Not Spam
"free lottery"      Spam
"project meeting"   Not Spam
"buy now win free"  Spam
"team lunch"        Not Spam

Classify: "free meeting"

Step 1: Calculate Priors

P(Spam) = 3/6 = 0.5

P(Not Spam) = 3/6 = 0.5

Step 2: Build Vocabulary & Calculate Likelihoods
Vocabulary: {win, money, now, meeting, tomorrow, free, lottery, project, buy, team, lunch} (11 unique words)

Laplace smoothing: P(word|Class) = (count of word in Class + 1) / (total words in Class + vocabulary size)

For Spam:

Total words in spam: 9

P("free"|Spam) = (2 + 1) / (9 + 11) = 3/20 = 0.15

P("meeting"|Spam) = (0 + 1) / (9 + 11) = 1/20 = 0.05

For Not Spam:

Total words in not spam: 6

P("free"|Not Spam) = (0 + 1) / (6 + 11) = 1/17 ≈ 0.0588

P("meeting"|Not Spam) = (2 + 1) / (6 + 11) = 3/17 ≈ 0.1765

Step 3: Calculate Posterior Probabilities

P(Spam|"free meeting") ∝ 0.5 × 0.15 × 0.05 = 0.00375

P(Not Spam|"free meeting") ∝ 0.5 × 0.0588 × 0.1765 ≈ 0.00519

Prediction: NOT SPAM

Step-by-Step Example 3: Gaussian Naive Bayes (Continuous Data)
Problem: Classify flowers based on petal length and width

Training Data (simplified):
Petal Length  Petal Width  Species
1.4           0.2          Setosa
1.3           0.2          Setosa
4.7           1.4          Versicolor
4.5           1.5          Versicolor
6.0           2.5          Virginica
5.9           1.8          Virginica

Classify: (Petal Length = 5.1, Petal Width = 1.9)

Step 1: Calculate Priors

P(Setosa) = 2/6 ≈ 0.333

P(Versicolor) = 2/6 ≈ 0.333

P(Virginica) = 2/6 ≈ 0.333

Step 2: Calculate Gaussian Parameters (μ = mean, σ = sample standard deviation)
For Versicolor:

Petal Length: μ = 4.6, σ = 0.141

Petal Width: μ = 1.45, σ = 0.071

For Virginica:

Petal Length: μ = 5.95, σ = 0.071

Petal Width: μ = 2.15, σ = 0.495

Step 3: Calculate Gaussian Probabilities
Gaussian PDF: P(x|μ,σ) = (1/(σ√(2π))) × exp(-(x-μ)²/(2σ²))

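For Versicolor:

P(Length=5.1|Versicolor) ≈ 0.0054

P(Width=1.9|Versicolor) ≈ 9.1e-9

For Virginica:

P(Length=5.1|Virginica) ≈ 2.4e-31

P(Width=1.9|Virginica) ≈ 0.71

These values are probability densities, not probabilities, so they can be extremely small or exceed 1.

Step 4: Calculate Posteriors

P(Versicolor|X) ∝ 0.333 × 0.0054 × 9.1e-9 ≈ 1.6e-11

P(Virginica|X) ∝ 0.333 × 2.4e-31 × 0.71 ≈ 5.7e-32

Setosa is left out: its petal lengths (1.3 and 1.4) are far from 5.1, and its petal width has zero variance, so its Gaussian is undefined without variance smoothing.

Prediction: VERSICOLOR

Note: the length 5.1 lies about 12 standard deviations from the Virginica mean only because σ = 0.071 was estimated from two samples. With this little data the result is highly sensitive to the variance estimates, so real use needs more training data or variance smoothing.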
```
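
The weather example can be checked with a few lines of plain Python. The sketch below is a minimal illustration, assuming unsmoothed frequency estimates exactly as in the worked steps; the names `ROWS` and `posterior` are made up for this example and do not come from any library.

```python
from collections import Counter

# Training rows from the weather table: (Outlook, Temperature, Humidity, Wind, Play)
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def posterior(query, rows=ROWS):
    """Return normalized P(class | query) from unsmoothed frequency estimates."""
    class_counts = Counter(row[-1] for row in rows)
    # value_counts[(class, feature_index, value)] = number of matching rows
    value_counts = Counter()
    for *features, label in rows:
        for i, value in enumerate(features):
            value_counts[(label, i, value)] += 1

    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class)
        for i, value in enumerate(query):
            score *= value_counts[(label, i, value)] / n_label  # P(x_i | class)
        scores[label] = score

    total = sum(scores.values())  # normalize so the posteriors sum to 1
    return {label: score / total for label, score in scores.items()}


probs = posterior(("Sunny", "Cool", "High", "Strong"))
print(probs)
```

Without smoothing, a feature value never seen with a class gives that class a probability of zero, and the normalization fails if every class scores zero. A real implementation would likely add Laplace smoothing, as the spam example does.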
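
The spam example follows the multinomial procedure with Laplace smoothing, and it can be sketched as below. This is an illustrative sketch, not a production classifier: the function names `train` and `score` and the `alpha` parameter are assumptions for this example, and the vocabulary is taken to be every distinct word in the six training emails.

```python
from collections import Counter

TRAIN = [
    ("win money now", "Spam"),
    ("meeting tomorrow", "Not Spam"),
    ("free lottery", "Spam"),
    ("project meeting", "Not Spam"),
    ("buy now win free", "Spam"),
    ("team lunch", "Not Spam"),
]


def train(examples):
    """Count documents per class and word occurrences per class."""
    doc_counts = Counter()
    word_counts = {}
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    # Vocabulary: every distinct word seen in any training email
    vocab = {word for counts in word_counts.values() for word in counts}
    return doc_counts, word_counts, vocab


def score(text, doc_counts, word_counts, vocab, alpha=1.0):
    """Return unnormalized P(class) * prod P(word | class) with add-alpha smoothing."""
    n_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        s = doc_counts[label] / n_docs  # prior P(class)
        for word in text.split():
            # Counter returns 0 for unseen words, so smoothing keeps this nonzero
            s *= (counts[word] + alpha) / (total_words + alpha * len(vocab))
        scores[label] = s
    return scores


doc_counts, word_counts, vocab = train(TRAIN)
scores = score("free meeting", doc_counts, word_counts, vocab)
prediction = max(scores, key=scores.get)
print(len(vocab), scores, prediction)
```

Printing `len(vocab)` and the per-class word totals is a quick check on hand-counted values, which are easy to get wrong.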
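
The Gaussian example can be evaluated directly from the six training rows. The sketch below is a minimal version under two assumptions that are not in the worked steps: it sums log densities instead of multiplying densities, which avoids floating-point underflow, and it adds a small constant `VAR_FLOOR` to every variance so that Setosa's zero-variance petal width stays finite. The names `log_gaussian`, `log_scores`, and `VAR_FLOOR` are illustrative only.

```python
import math
from statistics import mean, variance

# (petal length, petal width) samples per species, from the training table
TRAIN = {
    "Setosa": [(1.4, 0.2), (1.3, 0.2)],
    "Versicolor": [(4.7, 1.4), (4.5, 1.5)],
    "Virginica": [(6.0, 2.5), (5.9, 1.8)],
}
VAR_FLOOR = 1e-9  # assumption: keeps zero-variance features finite


def log_gaussian(x, mu, var):
    """Log of the normal density with mean mu and variance var, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def log_scores(query, train=TRAIN):
    """Return log P(class) + sum of log P(x_i | class) for each class."""
    n_total = sum(len(rows) for rows in train.values())
    scores = {}
    for label, rows in train.items():
        total = math.log(len(rows) / n_total)  # log prior
        for i, x in enumerate(query):
            column = [row[i] for row in rows]
            # statistics.variance is the sample variance (n - 1 denominator)
            total += log_gaussian(x, mean(column), variance(column) + VAR_FLOOR)
        scores[label] = total
    return scores


scores = log_scores((5.1, 1.9))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The class with the largest log score is also the class with the largest posterior, because the logarithm is monotonic. With only two samples per class the fitted variances are tiny, so the scores differ by many orders of magnitude and are very sensitive to the variance estimate.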