<h1 style="text-align:center"><b>NAIVE BAYES CLASSIFICATION</b></h1>

Naive Bayes is a classification algorithm based on Bayes' theorem with a strong assumption of independence between features. It is a simple yet effective algorithm widely used in text classification and spam filtering tasks. 

The Naive Bayes classifier calculates the probability of a given input belonging to a particular class by using Bayes' theorem. The theorem states that the probability of a hypothesis (class) being true, given the observed evidence (features), is proportional to the likelihood of the evidence given the hypothesis, multiplied by the prior probability of the hypothesis. In the case of Naive Bayes, the assumption of independence allows us to simplify the calculation of the likelihood.

### **1. PROBABILITY**
#### **INDEPENDENT EVENT:**

Independent events are events that have no influence on each other. In probability theory, two events are considered independent if the occurrence of one event does not affect the probability of the other event occurring.

Mathematically, for two events A and B to be independent, the probability of their joint occurrence (A and B happening together) should be equal to the product of their individual probabilities. In other words, if $P(A)$ represents the probability of event A happening and $P(B)$ represents the probability of event B happening, then:

$P(A \cap B) = P(A) \cdot P(B)$

This equation shows that the probability of both events happening together is simply the product of their individual probabilities.

To determine whether events are independent, you can also check if the conditional probability of one event given the occurrence of the other event is equal to the probability of the first event. In other words:

$P(A | B) = P(A)$

If the above equation holds, it implies that the occurrence of event B has no impact on the probability of event A.

It's important to note that independence is not the same as mutual exclusivity. Mutually exclusive events cannot occur together, so once one of them occurs the other has probability zero. Independent events, by contrast, can occur together without influencing each other.


*Example:*
1. Consider flipping a fair coin twice. Let's define two events:
- Event A: The first coin flip results in heads.
- Event B: The second coin flip results in heads.

In this scenario, Event A and Event B are independent events because the outcome of the first coin flip does not affect the outcome of the second coin flip.

The probability of getting heads on a fair coin is 0.5. So, the probability of Event A (getting heads on the first coin flip) is $P(A) = 0.5$.

Similarly, the probability of getting heads on the second coin flip is also 0.5. So, the probability of Event B (getting heads on the second coin flip) is $P(B) = 0.5$.

Now, let's calculate the joint probability of both events occurring together. Since the coin flips are independent, the joint probability can be calculated as the product of their individual probabilities:

$P(A \cap B) = P(A) \cdot P(B) = 0.5 \cdot 0.5 = 0.25$

So, the probability of both coin flips resulting in heads is 0.25.

In this example, the independence of the events is evident as the outcome of the first coin flip does not affect the outcome of the second coin flip.

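2. Rolling a fair die twice: the result of the first roll does not affect the result of the second roll, so the two rolls are independent events.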
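The coin-flip example above can be checked by listing every outcome of two flips and counting. The snippet below is a minimal sketch of that check; the helper names (`probability`, `first_is_heads`, `second_is_heads`) are illustrative and not part of the original notebook.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips: every ordered pair is equally likely.
sample_space = list(product("HT", repeat=2))


def probability(event):
    """Probability of an event (a predicate over outcomes) when all outcomes are equally likely."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


def first_is_heads(outcome):
    return outcome[0] == "H"


def second_is_heads(outcome):
    return outcome[1] == "H"


p_a = probability(first_is_heads)
p_b = probability(second_is_heads)
p_a_and_b = probability(lambda o: first_is_heads(o) and second_is_heads(o))

# The events are independent when the joint probability equals the product.
is_independent = p_a_and_b == p_a * p_b
print(p_a, p_b, p_a_and_b, is_independent)
```

Using `Fraction` keeps the arithmetic exact, so the equality test is not affected by floating-point rounding.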

#### **DEPENDENT EVENT:**

In probability theory, dependent events are events that are influenced by or affected by the occurrence of other events. In other words, the outcome or probability of one event is influenced by the outcome or occurrence of another event.

Any two events are either independent or dependent:

1. Independent Events: Independent events are those in which the occurrence or outcome of one event has no influence on the occurrence or outcome of another event. In this case, the probability of one event happening does not affect the probability of the other event happening. For example, when flipping a fair coin twice, the outcome of the first flip (heads or tails) does not affect the outcome of the second flip.

2. Dependent Events: Dependent events are events where the outcome or occurrence of one event affects the outcome or occurrence of another event. The probability of one event happening is influenced by the outcome of the other event. For example, drawing cards from a deck without replacement is an example of dependent events. The probability of drawing a certain card in the second draw depends on the outcome of the first draw because the deck has changed.

To determine the probability of dependent events, you need to consider the information from previous events. You can use conditional probability, which is the probability of an event given that another event has already occurred.

When calculating the probability of dependent events, you often use the multiplication rule. The multiplication rule states that the probability of two dependent events A and B occurring is equal to the probability of event A happening multiplied by the probability of event B happening given that event A has occurred. Mathematically, it can be written as 
$P(A \cap B) = P(A) \cdot P(B \mid A)$.

Understanding the dependence between events is important in various fields, including statistics, probability theory, and decision-making. It helps in analyzing real-life situations, making predictions, and understanding the relationships between different events.

*Example:*

Suppose you have a bag containing 5 red marbles and 3 blue marbles. You randomly select two marbles from the bag without replacement. Let's consider the following events:

- Event A: Selecting a red marble on the first draw.
- Event B: Selecting a blue marble on the second draw.

In this scenario, the events are dependent because the outcome of the first draw affects the outcome of the second draw. Let's calculate the probabilities:

The probability of event A (selecting a red marble on the first draw) is P(A) = 5/8 because there are 5 red marbles out of a total of 8 marbles in the bag.

After the first draw, we have 4 red marbles and 3 blue marbles remaining in the bag. So, for event B (selecting a blue marble on the second draw), the probability P(B|A) is 3/7 because there are 3 blue marbles left out of a total of 7 marbles remaining.

Using the multiplication rule, we can calculate the probability of both events occurring:

$P(A \cap B) = P(A) \cdot P(B \mid A) = \frac{5}{8} \cdot \frac{3}{7} = \frac{15}{56} \approx 0.2679$

So, the probability of selecting a red marble on the first draw and a blue marble on the second draw, without replacement, is approximately 0.2679 or 26.79%.
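As a sanity check on the multiplication rule, the marble example can also be solved by brute force: list every ordered pair of distinct marbles and count. This is only an illustrative sketch; the variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import permutations

# Bag contents from the example: 5 red and 3 blue marbles.
bag = ["red"] * 5 + ["blue"] * 3

# Every ordered pair of distinct marbles is an equally likely two-draw outcome.
draws = list(permutations(range(len(bag)), 2))

red_first = [d for d in draws if bag[d[0]] == "red"]
red_then_blue = [d for d in red_first if bag[d[1]] == "blue"]

p_a = Fraction(len(red_first), len(draws))                    # P(A)
p_b_given_a = Fraction(len(red_then_blue), len(red_first))    # P(B | A)
p_a_and_b = Fraction(len(red_then_blue), len(draws))          # P(A and B)

print(p_a, p_b_given_a, p_a_and_b, round(float(p_a_and_b), 4))
```

Counting gives the same three values as the hand calculation, and `p_a * p_b_given_a` equals `p_a_and_b`, which is exactly what the multiplication rule states.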

### **2. BAYES THEOREM:**

Bayes' theorem is a fundamental concept in probability theory and statistics. It provides a way to update the probability of a hypothesis (an event or proposition) based on new evidence or observed data. The theorem is named after Thomas Bayes, an 18th-century English mathematician.

Mathematically, Bayes' theorem can be stated as follows:

$P(A|B) = (P(B|A) * P(A)) / P(B)$

Where:
- P(A|B) is the posterior probability of hypothesis A given the evidence B.
- P(B|A) is the likelihood, or the probability of observing the evidence B given that hypothesis A is true.
- P(A) is the prior probability of hypothesis A, i.e., the probability of A being true before considering any evidence.
- P(B) is the probability of observing the evidence B, regardless of the hypothesis.

The theorem allows us to update our beliefs or probabilities about a hypothesis based on new evidence. Here's a step-by-step explanation of how Bayes' theorem is applied:

1. Establish the prior probability: Start with an initial estimate or belief about the probability of the hypothesis A before considering any evidence, denoted by $P(A)$.

2. Determine the likelihood: Assess the likelihood of observing the evidence B given that hypothesis A is true, denoted by $P(B|A)$. This represents how well the evidence supports the hypothesis.

3. Calculate the marginal likelihood: Determine the probability of observing the evidence B, regardless of the hypothesis, denoted by $P(B)$. This is calculated by considering the total probability of all possible ways in which B can occur.

4. Compute the posterior probability: Apply Bayes' theorem to calculate the updated probability of the hypothesis A given the evidence B, denoted by $P(A|B)$. This is done by multiplying the prior probability by the likelihood and dividing by the marginal likelihood.

Bayes' theorem is widely used in various fields, including statistics, machine learning, data analysis, and decision-making. It provides a formal framework for updating beliefs and making rational inferences based on observed data or evidence. The theorem has numerous applications, such as spam filtering, medical diagnosis, natural language processing, and Bayesian inference in statistics.
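The four steps can be sketched in a few lines of Python. The function name `posterior` and all of the numbers below are hypothetical, chosen only to illustrate the calculation; they do not come from a real spam dataset.

```python
from fractions import Fraction


def posterior(prior, likelihood, likelihood_if_false):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Step 3: marginal likelihood P(B) via the law of total probability.
    marginal = likelihood * prior + likelihood_if_false * (1 - prior)
    # Step 4: Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B).
    return likelihood * prior / marginal


# Hypothetical inputs (assumptions for illustration only).
p_spam = Fraction(1, 5)              # Step 1: prior P(spam)
p_word_given_spam = Fraction(1, 2)   # Step 2: likelihood P(word | spam)
p_word_given_ham = Fraction(1, 20)   # P(word | not spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(p_spam_given_word, float(p_spam_given_word))
```

With these made-up inputs the prior of 1/5 is updated to a posterior of 5/7 once the word is observed, which shows how strongly a single piece of evidence can shift the belief.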


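Recall the marble example from the dependent-events section: the occurrence of event A (selecting a red marble) changes the probability of event B (selecting a blue marble) because the number of marbles left in the bag changes after the first draw. Conditional probability, covered next, is the tool for quantifying this kind of dependence.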


#### **CONDITIONAL PROBABILITY:**

Conditional probability is a concept in probability theory that measures the probability of an event occurring given that another event has already occurred. It allows us to calculate the probability of an event A happening, given that event B has occurred.

The conditional probability of event A given event B is denoted as P(A|B), read as "the probability of A given B". It can be calculated using the formula:

$P(A|B) = P(A ∩ B) / P(B)$

Where:
- $P(A ∩ B)$ represents the probability of both events A and B occurring simultaneously, i.e., the intersection of A and B.
- $P(B)$ is the probability of event B occurring.

To calculate the conditional probability, we divide the probability of the joint occurrence of events A and B by the probability of event B alone. This normalization accounts for the fact that we are considering event B as the new sample space.

It's important to note that the conditional probability is only defined when P(B) is not equal to zero, i.e., when event B has a non-zero probability of occurring.

Here's an example to illustrate conditional probability:

Suppose you have a deck of 52 cards, and you draw one card at random. Let's define the following events:
- A: Drawing a heart card.
- B: Drawing a red card.

Now, we want to calculate the probability of drawing a heart card given that we have already drawn a red card (event B).

The probability of drawing a red card (B) is $P(B) = 26/52 = 1/2$, since half of the cards in the deck are red.

The probability of drawing a card that is both a heart and red, $P(A ∩ B)$, is $13/52 = 1/4$, since all 13 hearts in the deck are red.

Using the formula for conditional probability, we have:
$P(A|B) = P(A ∩ B) / P(B) = (1/4) / (1/2) = 1/2$

Therefore, the probability of drawing a heart card given that we have already drawn a red card is 1/2.
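The card example can be reproduced by building the deck and counting. The sketch below is illustrative only; the way the deck is represented (a list of `(rank, suit)` tuples) is an assumption made for this example.

```python
from fractions import Fraction
from itertools import product

ranks = "A 2 3 4 5 6 7 8 9 10 J Q K".split()
suit_colour = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

# A standard 52-card deck as (rank, suit) pairs.
deck = list(product(ranks, suit_colour))

red_cards = [card for card in deck if suit_colour[card[1]] == "red"]   # event B
red_hearts = [card for card in red_cards if card[1] == "hearts"]       # event A and B

p_b = Fraction(len(red_cards), len(deck))
p_a_and_b = Fraction(len(red_hearts), len(deck))

# Definition of conditional probability; only valid because p_b is non-zero.
p_a_given_b = p_a_and_b / p_b
print(p_b, p_a_and_b, p_a_given_b)
```

Equivalently, `len(red_hearts) / len(red_cards)` gives the same answer, which reflects the idea that event B becomes the new sample space.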

Conditional probability plays a crucial role in various fields, including statistics, machine learning, and decision-making, as it allows us to update probabilities based on new information or events that have occurred.


#### **CUMULATIVE PROBABILITY:**

Cumulative probability is the probability that a random variable takes on a value less than or equal to a given value. The function that returns this probability for every possible value is called the cumulative distribution function (CDF).

The cumulative probability is denoted as $P(X ≤ x)$, where X is the random variable and x is the value for which we want to calculate the cumulative probability.

Mathematically, the cumulative probability function can be defined as:

$F(x) = P(X ≤ x)$

The cumulative probability function satisfies the following properties:

1. The cumulative probability is always between 0 and 1: 0 ≤ F(x) ≤ 1.
2. The cumulative probability is a non-decreasing function: If x1 < x2, then F(x1) ≤ F(x2).
3. The cumulative probability approaches 1 as x approaches positive infinity: lim(x→∞) F(x) = 1.
4. The cumulative probability approaches 0 as x approaches negative infinity: lim(x→-∞) F(x) = 0.

The cumulative probability function provides a complete description of the probability distribution of a random variable. By evaluating the cumulative probability function at different values of x, we can determine the probabilities associated with various ranges of values or calculate percentiles.

It's worth noting that the cumulative probability function is closely related to the probability density function (PDF) for continuous random variables. The PDF gives the probability density at a specific value, while the cumulative probability function integrates the PDF from negative infinity up to that value, yielding the cumulative probability.

Cumulative probability is widely used in statistics, probability theory, and data analysis. It helps in understanding the overall distribution of a random variable and allows for the calculation of various statistics and quantiles associated with the distribution.
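For a discrete random variable, the CDF is simply a running sum of the probability mass. The sketch below uses a fair six-sided die as the random variable; the die and the names `pmf` and `cdf` are assumptions chosen for illustration.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def cdf(x):
    """F(x) = P(X <= x): total mass of all outcomes that do not exceed x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


# Evaluate F below, inside and above the range of possible values.
values = [cdf(x) for x in range(0, 8)]
print(values)

# The probability of a range follows from two CDF values: P(2 < X <= 5) = F(5) - F(2).
p_between = cdf(5) - cdf(2)
print(p_between)
```

The list of values starts at 0, never decreases, and ends at 1, which matches the four properties listed above.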

**Here's a step-by-step overview of how the Naive Bayes algorithm works:**

1. Data Preparation: First, you need to prepare your training data by representing each instance with a set of features or attributes. These features can be categorical or numerical.

2. Training: The Naive Bayes algorithm calculates the probabilities of each feature given each class in the training dataset. It estimates the prior probability of each class by counting the frequency of each class in the training data.

3. Feature Independence Assumption: Naive Bayes assumes that all features are independent of each other given the class label. Although this assumption is rarely true in real-world scenarios, Naive Bayes can still perform well in many cases.

4. Likelihood Calculation: Naive Bayes calculates the likelihood of each feature given the class using the training data. For categorical features, it estimates the probability by counting the frequency of each feature value for each class. For numerical features, it typically assumes a probability distribution (e.g., Gaussian distribution) and estimates the mean and standard deviation for each class.

5. Applying Bayes' Theorem: Once the likelihoods and priors are calculated, Naive Bayes applies Bayes' theorem to calculate the posterior probability of each class given the observed features. The class with the highest posterior probability is then assigned as the predicted class.

6. Classification: Finally, the trained Naive Bayes model can be used to classify new, unseen instances by applying the same calculations to calculate the posterior probabilities and selecting the most probable class.

One of the main advantages of Naive Bayes is its simplicity and efficiency. It can handle large feature spaces and is particularly effective when the independence assumption holds reasonably well. However, its performance may suffer if the features are highly dependent or if there is insufficient training data. Additionally, Naive Bayes is not suitable for tasks that require capturing complex relationships between features.
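The steps above can be sketched for categorical features in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`fit`, `predict_proba`, `predict`) and the tiny email dataset are hypothetical, and no smoothing is applied, so a feature value never seen with a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def fit(rows, labels):
    """Training: count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            value_counts[(index, label)][value] += 1
    return class_counts, value_counts


def predict_proba(model, row):
    """Posterior probability of each class for one instance (no smoothing)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, total)  # prior P(class)
        for index, value in enumerate(row):
            # likelihood P(feature = value | class), features assumed independent
            score *= Fraction(value_counts[(index, label)][value], count)
        scores[label] = score
    evidence = sum(scores.values())  # zero if no class has seen these values
    return {label: score / evidence for label, score in scores.items()}


def predict(model, row):
    """Classification: pick the class with the highest posterior."""
    probabilities = predict_proba(model, row)
    return max(probabilities, key=probabilities.get)


# Hypothetical toy data: (contains_link, contains_greeting) -> label
rows = [("yes", "no"), ("yes", "no"), ("yes", "yes"), ("no", "no"),
        ("no", "yes"), ("no", "yes"), ("yes", "yes"), ("no", "no")]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

model = fit(rows, labels)
proba = predict_proba(model, ("yes", "no"))
print(proba, predict(model, ("yes", "no")))
```

For the query `("yes", "no")` the unnormalized scores are 9/32 for spam and 1/32 for ham, so the normalized posteriors are 9/10 and 1/10.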

**Q. Will a Person Play Tennis Under the Given Weather Conditions?**

<center>

| DAY | <div style="text-align: center;">OUTLOOK</div> | <div style="text-align:center;">TEMPERATURE</div> | <div style="text-align:center;">HUMIDITY</div> | <div style="text-align:center;">WIND</div> | <div style="text-align:center;">PLAY TENNIS</div> |
|-----|---------|-------------|----------|------|-------------|
| D1  | Sunny   | Hot         | High     | Weak | No          |
| D2  | Sunny   | Hot         | High     | Strong| No         |
| D3  | Overcast| Hot         | High     | Weak | Yes         |
| D4  | Rainy   | Mild        | High     | Weak | Yes         |
| D5  | Rainy   | Cool         | Normal     | Weak | Yes          |
| D6  | Rainy   | Cool         | Normal     | Strong | No          |
| D7  | Overcast| Cool         | Normal     | Strong | Yes          |
| D8  | Sunny   | Mild         | High     | Weak | No          |
| D9  | Sunny   | Cool         | Normal     | Weak | Yes          |
| D10 | Rainy   | Mild         | Normal     | Weak | Yes          |
| D11 | Sunny   | Mild         | Normal     | Strong | Yes          |
| D12 | Overcast| Mild         | High     | Strong | Yes          |
| D13 | Overcast| Hot         | Normal     | Weak | Yes          |
| D14 | Rainy   | Mild         | High     | Strong | No          |

</center>

**OUTLOOK FEATURE**
- The `OUTLOOK` feature has 3 unique categories: Sunny, Overcast, and Rainy.

<center>

| CATEGORY | YES | NO | PR(E/YES) | PR(E/NO) |
|----------|-----|----|-----------|----------|
| Sunny    | 2   | 3  | 2/9       | 3/5      |
| Overcast | 4   | 0  | 4/9       | 0/5      |
| Rainy    | 3   | 2  | 3/9       | 2/5      |
|<b>Total</b>     | 9   | 5  |           |          |
</CENTER>

**TEMPERATURE FEATURE**
- The `TEMPERATURE` feature has 3 unique categories: Hot, Mild, and Cool.

<center>

| CATEGORY | YES | NO | PR(E/YES) | PR(E/NO) |
|----------|-----|----|-----------|----------|
| Hot    | 2   | 2  | 2/9       | 2/5      |
| Mild | 4   | 2  | 4/9       | 2/5      |
| Cool     | 3   | 1  | 3/9       | 1/5      |
|<b>Total</b>     | 9   | 5  |           |          |
</CENTER>

Total Yes = 9

Total No = 5

$Pr(Yes) = 9/14$

$Pr(No) = 5/14$
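The frequency tables and priors can be rebuilt directly from the 14 rows of the dataset. The sketch below keeps only the Outlook, Temperature and Play Tennis columns; the helper name `likelihood_table` is an assumption made for this example.

```python
from collections import Counter
from fractions import Fraction

# (outlook, temperature, play) for days D1..D14, copied from the dataset table.
days = [
    ("Sunny", "Hot", "No"), ("Sunny", "Hot", "No"), ("Overcast", "Hot", "Yes"),
    ("Rainy", "Mild", "Yes"), ("Rainy", "Cool", "Yes"), ("Rainy", "Cool", "No"),
    ("Overcast", "Cool", "Yes"), ("Sunny", "Mild", "No"), ("Sunny", "Cool", "Yes"),
    ("Rainy", "Mild", "Yes"), ("Sunny", "Mild", "Yes"), ("Overcast", "Mild", "Yes"),
    ("Overcast", "Hot", "Yes"), ("Rainy", "Mild", "No"),
]

# Class totals and priors.
class_totals = Counter(play for _, _, play in days)
priors = {label: Fraction(count, len(days)) for label, count in class_totals.items()}


def likelihood_table(feature_index):
    """Map each category to {class: P(category | class)} for one feature column."""
    counts = Counter((row[feature_index], row[2]) for row in days)
    categories = sorted({row[feature_index] for row in days})
    return {
        category: {
            label: Fraction(counts[(category, label)], class_totals[label])
            for label in class_totals
        }
        for category in categories
    }


outlook_table = likelihood_table(0)
temperature_table = likelihood_table(1)

for category, row in outlook_table.items():
    print(category, row["Yes"], row["No"])
```

Note that `Fraction` reduces its values automatically, so 3/9 is printed as 1/3; the probabilities are the same as in the tables.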


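For a day with Outlook = Sunny and Temperature = Hot, the score of each class is the prior multiplied by the per-feature likelihoods (the PR(E/YES) and PR(E/NO) columns of the tables). These scores are proportional to the posterior probabilities but do not yet sum to 1:

$Pr(Yes \mid Sunny, Hot) \propto Pr(Yes) \cdot Pr(Sunny \mid Yes) \cdot Pr(Hot \mid Yes) = \frac{9}{14} \cdot \frac{2}{9} \cdot \frac{2}{9} = \frac{2}{63} \approx 0.0317$

$Pr(No \mid Sunny, Hot) \propto Pr(No) \cdot Pr(Sunny \mid No) \cdot Pr(Hot \mid No) = \frac{5}{14} \cdot \frac{3}{5} \cdot \frac{2}{5} = \frac{3}{35} \approx 0.0857$

Dividing each score by the sum of both scores gives the posterior probabilities:

$Pr(Yes \mid Sunny, Hot) = \frac{2/63}{2/63 + 3/35} = \frac{10}{37} \approx 0.2703 = 27.03\%$

$Pr(No \mid Sunny, Hot) = \frac{3/35}{2/63 + 3/35} = \frac{27}{37} \approx 0.7297 = 72.97\%$

Since the posterior for No is higher, the prediction for a sunny, hot day is that the person does not play tennis.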




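**Q. WHAT IS THE PROBABILITY OF PLAYING TENNIS WHEN OUTLOOK = RAINY AND TEMPERATURE = MILD?**

As before, first compute the unnormalized score of each class:

$Pr(Yes \mid Rainy, Mild) \propto Pr(Yes) \cdot Pr(Rainy \mid Yes) \cdot Pr(Mild \mid Yes) = \frac{9}{14} \cdot \frac{3}{9} \cdot \frac{4}{9} = \frac{2}{21} \approx 0.0952$

$Pr(No \mid Rainy, Mild) \propto Pr(No) \cdot Pr(Rainy \mid No) \cdot Pr(Mild \mid No) = \frac{5}{14} \cdot \frac{2}{5} \cdot \frac{2}{5} = \frac{2}{35} \approx 0.0571$

Then normalize so that the two probabilities sum to 1:

$Pr(Yes \mid Rainy, Mild) = \frac{2/21}{2/21 + 2/35} = \frac{5}{8} = 0.625 = 62.5\%$

$Pr(No \mid Rainy, Mild) = \frac{2/35}{2/21 + 2/35} = \frac{3}{8} = 0.375 = 37.5\%$

Since the posterior for Yes is higher, the prediction for a rainy, mild day is that the person plays tennis.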
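Both worked examples can be reproduced with exact fractions, which avoids the rounding that creeps in when intermediate values are truncated to three decimals. The sketch below hard-codes the priors and likelihoods from the frequency tables; the function name `posterior` is an assumption made for this example.

```python
from fractions import Fraction

# Priors and likelihoods read from the frequency tables.
prior = {"Yes": Fraction(9, 14), "No": Fraction(5, 14)}
outlook = {
    "Sunny": {"Yes": Fraction(2, 9), "No": Fraction(3, 5)},
    "Rainy": {"Yes": Fraction(3, 9), "No": Fraction(2, 5)},
}
temperature = {
    "Hot": {"Yes": Fraction(2, 9), "No": Fraction(2, 5)},
    "Mild": {"Yes": Fraction(4, 9), "No": Fraction(2, 5)},
}


def posterior(outlook_value, temperature_value):
    """Normalized class probabilities for one (outlook, temperature) pair."""
    # Unnormalized score: prior times the product of the per-feature likelihoods.
    scores = {
        label: prior[label]
        * outlook[outlook_value][label]
        * temperature[temperature_value][label]
        for label in prior
    }
    # Dividing by the sum of the scores makes the posteriors add up to 1.
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


sunny_hot = posterior("Sunny", "Hot")
rainy_mild = posterior("Rainy", "Mild")

for name, result in [("Sunny, Hot", sunny_hot), ("Rainy, Mild", rainy_mild)]:
    print(name, {label: round(float(p), 4) for label, p in result.items()})
```

The exact posteriors are 10/37 (Yes) and 27/37 (No) for a sunny, hot day, and 5/8 (Yes) and 3/8 (No) for a rainy, mild day.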





### **TYPES OF NAIVE BAYES**

#### **1. Bernoulli Naive Bayes:**
- **BernoulliNB** implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. Therefore, this class requires samples to be represented as binary-valued feature vectors; if handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter).
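To make the Bernoulli model concrete, the sketch below binarizes a feature vector and evaluates the class-conditional likelihood $P(x \mid c) = \prod_i p_i^{x_i} (1 - p_i)^{1 - x_i}$. It is an illustration of the underlying formula, not scikit-learn's implementation; the helper names and the per-feature probabilities are hypothetical.

```python
from fractions import Fraction


def binarize(values, threshold=0):
    """Map each feature to 1 if it is above the threshold, otherwise 0."""
    return [1 if value > threshold else 0 for value in values]


def bernoulli_likelihood(x, p):
    """P(x | class) for binary features x, where p[i] = P(feature i is 1 | class)."""
    result = Fraction(1)
    for x_i, p_i in zip(x, p):
        # A present feature contributes p_i, an absent one contributes (1 - p_i).
        result *= p_i if x_i == 1 else (1 - p_i)
    return result


# Hypothetical word counts for one document and hypothetical class parameters.
word_counts = [3, 0, 1]
p_given_class = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)]

x = binarize(word_counts)
likelihood = bernoulli_likelihood(x, p_given_class)
print(x, likelihood)
```

Unlike a count-based model, the absence of a feature also contributes to the likelihood here, through the `(1 - p_i)` factor.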


#### **2. Multinomial Naive Bayes:**
- **MultinomialNB** implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). 
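For count data, the class-conditional likelihood is proportional to $\prod_i \theta_i^{x_i}$, where $x_i$ is the count of word $i$ and $\theta_i$ is the probability of that word in the class. The sketch below evaluates this in log space; the vocabulary, the parameters and the function name are hypothetical, and equal class priors are assumed so that comparing likelihoods is enough.

```python
import math


def multinomial_log_likelihood(counts, theta):
    """Sum of count_i * log(theta_i); the class-independent multinomial coefficient is dropped."""
    return sum(c * math.log(t) for c, t in zip(counts, theta) if c > 0)


# Hypothetical per-class word probabilities (each list sums to 1).
vocabulary = ["free", "meeting", "offer"]
theta = {
    "spam": [0.5, 0.1, 0.4],
    "ham": [0.1, 0.8, 0.1],
}

# Word counts of one hypothetical document: "free" twice, "offer" once.
counts = [2, 0, 1]

log_scores = {
    label: multinomial_log_likelihood(counts, probs) for label, probs in theta.items()
}
best = max(log_scores, key=log_scores.get)
print(log_scores, best)
```

Working with logarithms turns the product into a sum, which avoids numerical underflow when documents contain many words.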

#### **3. Gaussian Naive Bayes:**
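- **GaussianNB** is used when the independent (input) features are continuous and each feature is assumed to follow a Gaussian (normal) distribution within each class.
- The mean and standard deviation of every feature are estimated separately for each class from the training data.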
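The idea behind Gaussian Naive Bayes can be sketched without any library: estimate a mean and variance per feature and per class, then score a new sample with the Gaussian density. This is a simplified illustration, not scikit-learn's `GaussianNB`; the function names and the measurements are hypothetical, and every feature is assumed to have non-zero variance within each class.

```python
import math
from statistics import mean, pvariance


def gaussian_log_pdf(x, mu, var):
    """Logarithm of the normal density with mean mu and variance var, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def fit(samples, labels):
    """Per class: the prior and the (mean, variance) of every feature."""
    model = {}
    for label in sorted(set(labels)):
        rows = [s for s, l in zip(samples, labels) if l == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(samples),
            "stats": [(mean(col), pvariance(col)) for col in columns],
        }
    return model


def predict(model, sample):
    """Return the class with the highest log prior plus summed log likelihoods."""
    def log_score(params):
        total = math.log(params["prior"])
        for x, (mu, var) in zip(sample, params["stats"]):
            total += gaussian_log_pdf(x, mu, var)
        return total

    return max(model, key=lambda label: log_score(model[label]))


# Hypothetical measurements: (length in cm, width in cm) for two classes.
samples = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (1.6, 0.2),
           (4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (4.0, 1.3)]
labels = ["short", "short", "short", "short", "long", "long", "long", "long"]

model = fit(samples, labels)
print(predict(model, (1.5, 0.25)), predict(model, (4.6, 1.4)))
```

Scores are accumulated in log space because the product of several small densities can underflow to zero in floating point.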

**IMPORT LIBRARIES**

In [None]:
import pandas as pd
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

In [None]:
X, y = load_iris(return_X_y=True)

In [None]:
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [None]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [None]:
# Hold out 30% of the samples for testing; the fixed seed makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

In [None]:
gnb = GaussianNB()

In [None]:
gnb.fit(X_train, y_train)

In [None]:
y_pred_g = gnb.predict(X_test)

In [None]:
y_pred_g

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 2, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])

In [None]:
print(f"Number of Mislabeled Points Out of a Total {X_test.shape[0]} Points: {(y_test != y_pred_g).sum()}")

Number of Mislabeled Points Out of a Total 45 Points: 1


In [None]:
round(accuracy_score(y_test, y_pred_g)*100,2)

97.78