## Naive Bayes Algorithm
- Probability
- Bayes' Theorem

The Naive Bayes algorithm is a popular probabilistic classification algorithm based on Bayes' theorem with the assumption of independence among the features. The mathematical derivation of the Naive Bayes equation involves applying Bayes' theorem and making the independence assumption.

Let's consider a classification problem with two classes: class A and class B. We want to classify a new instance X based on its feature vector x = (x₁, x₂, ..., xₙ), where xᵢ represents the value of the i-th feature.

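Bayes' theorem states that, for any two events A and B with P(B) > 0 (here A and B stand for generic events, not the two classes):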

P(A|B) = (P(B|A) * P(A)) / P(B)

In our case, we want to find the probability that the instance belongs to class A given its feature vector, P(A|x).

Applying Bayes' theorem to our problem, we have:

P(A|x) = (P(x|A) * P(A)) / P(x)
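As a quick numerical check of this formula, the sketch below plugs numbers into Bayes' theorem for the two-class case. The priors and likelihoods are made-up values chosen only for illustration, and the helper name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(class|x) = P(x|class) * P(class) / P(x)."""
    return likelihood * prior / evidence

# Hypothetical numbers, not taken from any dataset.
p_a, p_b = 0.4, 0.6      # priors P(A), P(B)
p_x_given_a = 0.5        # likelihood P(x|A)
p_x_given_b = 0.1        # likelihood P(x|B)

# With only two classes, P(x) = P(x|A) * P(A) + P(x|B) * P(B).
p_x = p_x_given_a * p_a + p_x_given_b * p_b

p_a_given_x = posterior(p_a, p_x_given_a, p_x)
p_b_given_x = posterior(p_b, p_x_given_b, p_x)
print(round(p_a_given_x, 3), round(p_b_given_x, 3))
```

Because A and B are the only classes here, the two posteriors sum to 1.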

Now, let's make the naive assumption that the features are conditionally independent given the class. This assumption simplifies the equation by allowing us to express the joint probability P(x|A) as the product of the individual probabilities P(xᵢ|A) for each feature:

P(A|x) = (P(x₁|A) * P(x₂|A) * ... * P(xₙ|A) * P(A)) / P(x)

Similarly, we can calculate the probability for class B, P(B|x):

P(B|x) = (P(x₁|B) * P(x₂|B) * ... * P(xₙ|B) * P(B)) / P(x)

To classify the instance x, we compare the values of P(A|x) and P(B|x) and choose the class with the higher probability. Because the denominator P(x) is the same for both classes, it is enough to compare the numerators.
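The comparison can be sketched as follows. Since P(x) appears in both posteriors, the code scores each class by its prior times the product of per-feature likelihoods, and it works with logarithms so that a product of many small probabilities does not underflow. The likelihood values and the helper `log_score` are illustrative assumptions.

```python
import math

def log_score(prior, feature_likelihoods):
    """log P(class) + sum of log P(x_i | class); the shared P(x) is omitted."""
    return math.log(prior) + sum(math.log(p) for p in feature_likelihoods)

# Hypothetical values of P(x_i | class) for one instance with three features.
score_a = log_score(0.5, [0.8, 0.6, 0.9])
score_b = log_score(0.5, [0.3, 0.5, 0.2])

predicted = "A" if score_a > score_b else "B"

# Normalizing the two numerators recovers the actual posterior P(A|x).
total = math.exp(score_a) + math.exp(score_b)
p_a_given_x = math.exp(score_a) / total
print(predicted, round(p_a_given_x, 3))
```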

Note that to compute the probabilities P(xᵢ|A) and P(xᵢ|B), we typically use training data to estimate them. For example, if the features are binary (0 or 1), we can calculate the probability of each feature value given a class by counting the occurrences in the training data and dividing by the total number of instances in that class.
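A minimal sketch of this counting estimate for binary features. The toy training set and the function name `estimate_feature_probs` are assumptions made up for this example.

```python
def estimate_feature_probs(X, y, target_class):
    """Estimate P(x_i = 1 | class) for each binary feature by counting."""
    rows = [x for x, label in zip(X, y) if label == target_class]
    n_rows = len(rows)
    n_features = len(rows[0])
    # Fraction of instances of this class in which feature i equals 1.
    return [sum(row[i] for row in rows) / n_rows for i in range(n_features)]

# Hypothetical training data: two binary features, three instances per class.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 1]]
y = ["A", "A", "A", "B", "B", "B"]

probs_a = estimate_feature_probs(X, y, "A")
probs_b = estimate_feature_probs(X, y, "B")
print(probs_a, probs_b)
```

The probability that a feature equals 0 given the class is one minus the corresponding estimate.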

The Naive Bayes algorithm is a simple and effective machine learning algorithm for solving classification problems. It is based on Bayes' theorem, which expresses the probability of an event given some observed evidence in terms of the prior probability of the event and the probability of the evidence given the event.

The Naive Bayes algorithm makes the following assumptions:

- The presence of a particular feature in a class is unrelated to the presence of any other feature. This is called the independence assumption.
- More precisely, the features are conditionally independent given the class. This means that once the class is known, the value of one feature does not change the probability of any other feature, so P(x₁, x₂, ..., xₙ|A) = P(x₁|A) * P(x₂|A) * ... * P(xₙ|A).

The Naive Bayes algorithm can be used to solve a variety of classification problems, such as spam filtering, sentiment analysis, and medical diagnosis.

Here is an example of how the Naive Bayes algorithm can be used to solve a classification problem. Suppose we have a dataset of emails, and we want to build a model that can predict whether an email is spam or not. We can use the Naive Bayes algorithm to build this model by following these steps:

We first need to identify the features that we will use to train our model. In this case, we could use the following features:

- The length of the email
- The number of exclamation points in the email
- The presence of certain words or phrases in the email (such as "free" or "offer")

We then need to collect a training dataset of emails that have already been labeled as spam or not spam. We can use this dataset to calculate the probability of each feature occurring for each class (spam or not spam).

Once we have calculated the probabilities of each feature occurring for each class, we can use the Naive Bayes algorithm to calculate the probability that a new email is spam. This is done by multiplying the prior probability of spam by the probabilities of the observed feature values given that the email is spam, doing the same for the not-spam class, and comparing the two results.
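The steps above can be sketched end to end in plain Python. This is a toy illustration, not a production filter: the six emails are invented, only the word-presence features ("free", "offer") are used, and add-one smoothing is an extra assumption added here so that no estimated probability is exactly 0 or 1.

```python
import math

KEYWORDS = ["free", "offer"]  # hypothetical word-presence features

def featurize(text):
    """Return 1/0 flags for whether each keyword appears in the email."""
    words = text.lower().split()
    return [1 if keyword in words else 0 for keyword in KEYWORDS]

def train(emails, labels):
    """Estimate P(class) and P(keyword present | class) by counting."""
    model = {}
    for cls in set(labels):
        rows = [featurize(e) for e, label in zip(emails, labels) if label == cls]
        prior = len(rows) / len(emails)
        # Add-one smoothing (an assumption of this sketch) avoids log(0).
        probs = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                 for i in range(len(KEYWORDS))]
        model[cls] = (prior, probs)
    return model

def predict(model, text):
    """Pick the class with the largest log prior + log likelihood."""
    x = featurize(text)
    scores = {}
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for present, p in zip(x, probs):
            score += math.log(p if present else 1 - p)
        scores[cls] = score
    return max(scores, key=scores.get)

# Invented training emails, labeled by hand for this example.
emails = ["free offer inside", "claim your free prize", "special offer today",
          "meeting at noon", "lunch tomorrow", "project status report"]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

model = train(emails, labels)
print(predict(model, "free offer now"))
print(predict(model, "see you at lunch"))
```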

The Naive Bayes algorithm is a simple and effective machine learning algorithm that can be used to solve a variety of classification problems. It is especially useful for problems where the number of features is large. However, it is important to note that the Naive Bayes algorithm relies on the independence assumption, which often does not hold in real data where features are correlated. This can lead to inaccurate predictions.

## Variants of Naive Bayes
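- 1. Bernoulli Naive Bayes: used when the independent features are binary (yes or no) and follow a Bernoulli distribution.
- 2. Multinomial Naive Bayes: used when the features are counts, such as word counts. Example: the input is text, and the task is to decide whether it is spam or not.
- 3. Gaussian Naive Bayes: used when the independent features are continuous and follow a Gaussian (normal) distribution.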
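A minimal sketch of how the three variants map onto scikit-learn's `BernoulliNB`, `MultinomialNB`, and `GaussianNB` classes. The small arrays are hypothetical toy data, chosen only so that each one matches the feature type its variant expects.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # two classes, two samples each

# Binary (yes/no) features -> Bernoulli Naive Bayes
X_binary = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
# Count features, such as word counts -> Multinomial Naive Bayes
X_counts = np.array([[3, 0], [2, 1], [0, 4], [1, 3]])
# Continuous features -> Gaussian Naive Bayes
X_real = np.array([[1.0, 2.1], [1.2, 1.9], [5.0, 6.2], [5.1, 5.8]])

predictions = {}
for name, model, X in [
    ("bernoulli", BernoulliNB(), X_binary),
    ("multinomial", MultinomialNB(), X_counts),
    ("gaussian", GaussianNB(), X_real),
]:
    model.fit(X, y)
    # Predict on the training data just to show the common fit/predict API.
    predictions[name] = model.predict(X)
    print(name, predictions[name])
```

All three classes share the same `fit` and `predict` interface, so switching variants mostly comes down to matching the class to the type of the features.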



## Naive Bayes Implementation

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

In [6]:
# Load the iris features (X) and the integer class labels (y)
X, y = load_iris(return_X_y=True)

In [11]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [12]:
# Hold out 30% of the samples for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

In [13]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()

In [14]:
gnb.fit(X_train,y_train)

In [15]:
y_pred = gnb.predict(X_test)

In [16]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [17]:
# Overall accuracy, per-class precision/recall/F1, and the confusion matrix
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

0.9777777777777777
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45

[[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]


In [18]:
import seaborn as sns

In [19]:
df = sns.load_dataset("tips")

     total_bill   tip     sex smoker  day    time  size
0         16.99  1.01  Female     No  Sun  Dinner     2
1         10.34  1.66    Male     No  Sun  Dinner     3
2         21.01  3.50    Male     No  Sun  Dinner     3
3         23.68  3.31    Male     No  Sun  Dinner     2
4         24.59  3.61  Female     No  Sun  Dinner     4
..          ...   ...     ...    ...  ...     ...   ...
239       29.03  5.92    Male     No  Sat  Dinner     3
240       27.18  2.00  Female    Yes  Sat  Dinner     2
241       22.67  2.00    Male    Yes  Sat  Dinner     2
242       17.82  1.75    Male     No  Sat  Dinner     2
