# 1. Introduction

Naive Bayes is a family of probabilistic algorithms based on Bayes' Theorem, particularly suited for classification tasks. It is widely used in spam detection, sentiment analysis, document classification, and medical diagnosis due to its simplicity and effectiveness.

# 2. Why is it Called "Naive Bayes"?

- Bayes: It is based on Bayes' Theorem, which describes the probability of an event based on prior knowledge of conditions related to the event.

- Naive: It assumes the features are conditionally independent given the class, i.e., once the class is known, the value of one feature tells you nothing about the value of another. This assumption is rarely true in real-world data, hence the term "naive."

Despite its simplicity, Naive Bayes often performs surprisingly well in many complex real-world situations.

# 3. Key Features of Naive Bayes Classifiers

- Simple and fast
- Handles high-dimensional data well
- Performs well with categorical and text data
- Requires small amounts of training data

# 4. Assumptions in Naive Bayes

- Feature Independence: Features are conditionally independent of one another given the class, so each contributes to the class probability independently.
- Equal Importance: Each feature contributes equally.
- No Missing Values: Data should ideally have no missing values, or they should be handled before training.



# 5. Understanding Bayes' Theorem

Bayes' Theorem is the foundation of the Naive Bayes classifier. It is expressed as:

![image.png](attachment:4d82b24c-8da0-4a56-a03a-3994edff18d2.png)

Where:

- P(C|X): Posterior probability of class C given the feature vector X
- P(X|C): Likelihood of the feature vector X given class C
- P(C): Prior probability of class C
- P(X): Evidence or marginal likelihood of feature vector X

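Since P(X) is the same for every class, it does not change which class is most probable and can be dropped when comparing classes:

P(C|X) ∝ P(X|C) * P(C)

Under the naive independence assumption, the likelihood of a feature vector X = (x1, x2, ..., xn) factorizes into a product of per-feature likelihoods: P(X|C) = P(x1|C) * P(x2|C) * ... * P(xn|C).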
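Written out in full for X = (x_1, ..., x_n), Bayes' Theorem combined with the naive independence assumption gives the rule the classifier uses to pick a class. The second form takes logarithms, which does not change the winning class but turns a product of many small probabilities into a sum; the log form is a common implementation choice rather than part of the theorem itself.

```latex
P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}
\quad\Rightarrow\quad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
        = \arg\max_{C} \Big[ \log P(C) + \sum_{i=1}^{n} \log P(x_i \mid C) \Big]
```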


# 6. Types of Naive Bayes Classifiers

- Gaussian Naive Bayes: Assumes each continuous feature follows a normal (Gaussian) distribution within each class.
- Multinomial Naive Bayes: Works well with discrete features (e.g., word counts).
- Bernoulli Naive Bayes: Works with binary/boolean features (e.g., word presence).
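To make the three variants concrete, here is a minimal sketch of how each one scores a single class, assuming the per-class parameters (a mean and standard deviation, or per-word probabilities θ_i) have already been estimated from training data. The function names are hypothetical illustrations, not a library API; scikit-learn ships ready-made versions as `GaussianNB`, `MultinomialNB`, and `BernoulliNB` in `sklearn.naive_bayes`.

```python
import math

# Hypothetical helpers (not a library API). Each assumes the per-class
# parameters have already been estimated from training data.

def gaussian_likelihood(x: float, mean: float, std: float) -> float:
    """Gaussian NB: normal density of one continuous feature value given the class."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

def multinomial_log_likelihood(counts: list[int], word_probs: list[float]) -> float:
    """Multinomial NB: sum of count_i * log(theta_i) over words.

    The multinomial coefficient is omitted because it is the same for every
    class, so it does not change which class wins.
    """
    return sum(c * math.log(p) for c, p in zip(counts, word_probs) if c > 0)

def bernoulli_likelihood(present: list[int], word_probs: list[float]) -> float:
    """Bernoulli NB: multiply p_i for each present word and (1 - p_i) for each absent one."""
    result = 1.0
    for x, p in zip(present, word_probs):
        result *= p if x else 1 - p
    return result

# Usage with made-up parameters for a single class
print(round(gaussian_likelihood(40, 40, 5), 6))  # 0.079788
print(round(math.exp(multinomial_log_likelihood([2, 0, 1], [0.5, 0.3, 0.2])), 6))  # 0.05
print(round(bernoulli_likelihood([1, 0], [0.8, 0.3]), 6))  # 0.56
```

The Bernoulli variant explicitly penalizes absent words through (1 - p_i), while the multinomial variant ignores words with a count of zero; that is the practical difference between "word presence" and "word counts" in the list above.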

# 7. Numerical Example: Gaussian Naive Bayes

| Age | Income | Buy |
| --- | ------ | --- |
| 25  | 50     | No  |
| 30  | 60     | No  |
| 45  | 80     | Yes |
| 35  | 65     | Yes |


Step 1: Calculate Priors

P(Buy = Yes) = 2/4 = 0.5

P(Buy = No) = 2/4 = 0.5
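Step 1 can be reproduced in a few lines of Python. This is a minimal sketch; storing the table as a list of `(age, income, buy)` tuples is a layout chosen here for illustration.

```python
from collections import Counter

# Training data from the table above: (age, income, buy)
data = [(25, 50, "No"), (30, 60, "No"), (45, 80, "Yes"), (35, 65, "Yes")]

# Prior P(C) = (number of rows in class C) / (total number of rows)
label_counts = Counter(buy for _, _, buy in data)
priors = {label: count / len(data) for label, count in label_counts.items()}

print(priors)  # {'No': 0.5, 'Yes': 0.5}
```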

Step 2: Calculate Mean and Standard Deviation for Each Feature by Class (using the population standard deviation, i.e., dividing by n)

For Buy = No:

Age: mean = (25+30)/2 = 27.5, std = sqrt(((25-27.5)² + (30-27.5)²)/2) = 2.5

Income: mean = (50+60)/2 = 55, std = sqrt(((50-55)² + (60-55)²)/2) = 5

For Buy = Yes:

Age: mean = (45+35)/2 = 40, std = sqrt(((45-40)² + (35-40)²)/2) = 5

Income: mean = (80+65)/2 = 72.5, std = sqrt(((80-72.5)² + (65-72.5)²)/2) = 7.5
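A sketch of Step 2 using the standard library's `statistics` module, assuming the same tuple layout for the table. `statistics.pstdev` divides by n, matching the hand calculation; `statistics.stdev` would divide by n - 1 and give noticeably larger values for groups this small.

```python
import statistics

# Training data from the table above: (age, income, buy)
data = [(25, 50, "No"), (30, 60, "No"), (45, 80, "Yes"), (35, 65, "Yes")]

# For each class: (mean, population std) of each feature
stats = {}
for label in ("No", "Yes"):
    ages = [age for age, _, buy in data if buy == label]
    incomes = [income for _, income, buy in data if buy == label]
    stats[label] = {
        "age": (float(statistics.mean(ages)), statistics.pstdev(ages)),
        "income": (float(statistics.mean(incomes)), statistics.pstdev(incomes)),
    }

print(stats["No"])   # {'age': (27.5, 2.5), 'income': (55.0, 5.0)}
print(stats["Yes"])  # {'age': (40.0, 5.0), 'income': (72.5, 7.5)}
```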


![image.png](attachment:211c6623-26a8-4614-a8c7-a46b72b6014e.png)

## Test sample: predict Buy (Yes or No) for Age = 40 and Income = 70

![image.png](attachment:b39d4d34-1812-48a1-aac9-9466c40370bc.png)

Step 3: Calculate Gaussian Likelihoods (Probability Densities) for the Test Sample (Age=40, Income=70)

For Buy = No:

P(Age=40|No) = (1/(√(2π)*2.5)) * exp(-(40-27.5)²/(2*2.5²)) ≈ 5.947e-7

P(Income=70|No) = (1/(√(2π)*5)) * exp(-(70-55)²/(2*5²)) ≈ 0.0008864

Unnormalized posterior = P(No) * P(Age=40|No) * P(Income=70|No) = 0.5 * 5.947e-7 * 0.0008864 ≈ 2.636e-10

For Buy = Yes:

P(Age=40|Yes) = (1/(√(2π)*5)) * exp(-(40-40)²/(2*5²)) ≈ 0.079788

P(Income=70|Yes) = (1/(√(2π)*7.5)) * exp(-(70-72.5)²/(2*7.5²)) ≈ 0.050318

Unnormalized posterior = P(Yes) * P(Age=40|Yes) * P(Income=70|Yes) = 0.5 * 0.079788 * 0.050318 ≈ 0.0020074

Step 4: Calculate Posterior Probabilities

Total evidence P(X) = 2.636e-10 + 0.0020074 ≈ 0.0020074

P(No|Age=40,Income=70) ≈ 2.636e-10 / 0.0020074 ≈ 1.31e-7 (about 0.0000131%)

P(Yes|Age=40,Income=70) = 1 - P(No|Age=40,Income=70) ≈ 0.99999987 (about 99.999987%)

Prediction:

The model predicts "Yes" (Buy) for Age=40 and Income=70, with a posterior probability of about 0.99999987.
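Steps 3 and 4 can be checked with a short from-scratch sketch. The parameters are hard-coded from Steps 1 and 2, and `log_gaussian_pdf` is a hypothetical helper name. Unlike the hand calculation, the sketch works in log space and normalizes with the log-sum-exp trick; that is an implementation choice assumed here, useful because products of many small densities can underflow to zero in floating point.

```python
import math

def log_gaussian_pdf(x, mean, std):
    """Log of the normal density N(x; mean, std**2), used as log P(x_i | C)."""
    return -math.log(math.sqrt(2 * math.pi) * std) - (x - mean) ** 2 / (2 * std ** 2)

# Priors (Step 1) and per-class (mean, std) for (age, income) (Step 2)
priors = {"No": 0.5, "Yes": 0.5}
params = {"No": ((27.5, 2.5), (55.0, 5.0)), "Yes": ((40.0, 5.0), (72.5, 7.5))}
sample = (40, 70)  # (age, income) of the test sample

# Step 3 in log space: log P(C) + sum over features of log P(x_i | C)
log_scores = {}
for label in priors:
    log_scores[label] = math.log(priors[label]) + sum(
        log_gaussian_pdf(x, mean, std) for x, (mean, std) in zip(sample, params[label])
    )

# Step 4: normalize with log-sum-exp. Subtracting the largest log score first
# keeps the biggest term at exp(0) = 1, so the sum cannot underflow to zero.
max_log = max(log_scores.values())
log_evidence = max_log + math.log(sum(math.exp(v - max_log) for v in log_scores.values()))
posteriors = {label: math.exp(v - log_evidence) for label, v in log_scores.items()}

for label in priors:
    print(label, f"score={math.exp(log_scores[label]):.3e}", f"posterior={posteriors[label]:.10f}")
print("Prediction:", max(posteriors, key=posteriors.get))
# No score=2.636e-10 posterior=0.0000001313
# Yes score=2.007e-03 posterior=0.9999998687
# Prediction: Yes
```

scikit-learn's `GaussianNB` applies the same idea, though it adds a small `var_smoothing` term to each variance, so its probabilities may differ very slightly from the hand calculation.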