## Bayes' Theorem

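Bayes' theorem is a formula in probability theory that gives the probability of an event conditional on another event having occurred, expressed through the reverse conditional probability and the individual probabilities of the two events. It is named after the Reverend Thomas Bayes, whose work on the problem was published posthumously in 1763.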

The theorem states that:

\begin{equation}
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\end{equation}

where:

- `P(A|B)` is the probability of event `A` occurring given that event `B` has occurred.
- `P(B|A)` is the probability of event `B` occurring given that event `A` has occurred.
- `P(A)` is the prior probability of event `A` occurring.
- `P(B)` is the marginal probability of event `B` occurring (the evidence); it must be greater than zero.
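
As a quick check of the formula, the sketch below plugs in counts taken from the 14-row golf dataset used in the Code section: `A` is "Play Golf = Yes" and `B` is "Outlook = Sunny". The variable names are chosen here for illustration only.

```python
from fractions import Fraction

# Counts read off the golf dataset: 14 days in total,
# 9 "Yes" days, 5 "Sunny" days, and 3 days that are both "Sunny" and "Yes".
p_yes = Fraction(9, 14)              # P(A): prior probability of playing
p_sunny = Fraction(5, 14)            # P(B): marginal probability of a sunny outlook
p_sunny_given_yes = Fraction(3, 9)   # P(B|A): share of "Yes" days that were sunny

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(p_yes_given_sunny)  # 3/5
```

The result matches a direct count: 3 of the 5 sunny days are "Yes" days.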


![image.png](attachment:ef1bcb32-d533-4a9b-9e0c-07572e308519.png)

## Assumptions of Naive Bayes
- Feature independence: The features are conditionally independent of each other, given the class label.
- Continuous features are normally distributed: In Gaussian Naive Bayes, each continuous feature is assumed to follow a normal distribution within each class.
- Discrete features have multinomial distributions: In the multinomial (or categorical) variants, each discrete feature is assumed to follow a multinomial distribution within each class.
- Features are equally important: The model has no per-feature weights, so every feature's likelihood enters the prediction in the same way.
- No missing data: The data should not contain missing values; impute or drop them before fitting.
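
The first two assumptions can be written out explicitly. As a sketch, let $y$ be the class label and $x_1, \dots, x_n$ the feature values (symbols introduced here for illustration). Conditional independence lets the joint likelihood factor into per-feature terms:

\begin{equation}
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\end{equation}

Under the Gaussian assumption, each continuous feature has a class-specific mean $\mu_{iy}$ and variance $\sigma_{iy}^2$:

\begin{equation}
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{iy}^2}} \exp\left(-\frac{(x_i - \mu_{iy})^2}{2\sigma_{iy}^2}\right)
\end{equation}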

### Naive Bayes Classifier

![image.png](attachment:df27bfc3-7c0e-47a9-b0f8-4a5d5bca64f5.png)
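
A naive Bayes classifier scores each class by its prior times the product of the per-feature likelihoods, then normalizes the scores and picks the largest. The pure-Python sketch below illustrates that rule for the Gaussian case. The function names and the per-class parameters are hypothetical values chosen for illustration, not estimates from the golf data.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def naive_bayes_posterior(x, priors, means, variances):
    """Return {class: P(class | x)} under the Gaussian naive Bayes assumptions."""
    scores = {}
    for label, prior in priors.items():
        likelihood = 1.0
        # Conditional independence: multiply the per-feature likelihoods
        for value, mean, var in zip(x, means[label], variances[label]):
            likelihood *= gaussian_pdf(value, mean, var)
        scores[label] = prior * likelihood
    total = sum(scores.values())  # plays the role of P(B) in Bayes' theorem
    return {label: score / total for label, score in scores.items()}

# Hypothetical parameters for two classes and two features
priors = {"No": 0.4, "Yes": 0.6}
means = {"No": [1.0, 0.0], "Yes": [0.0, 1.0]}
variances = {"No": [1.0, 1.0], "Yes": [1.0, 1.0]}

posterior = naive_bayes_posterior([0.0, 1.0], priors, means, variances)
prediction = max(posterior, key=posterior.get)
print(posterior, prediction)
```

For longer feature vectors, summing log-likelihoods instead of multiplying raw densities avoids numerical underflow.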

## Code

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
```

```python
# Golf dataset: 14 days, four categorical features and the target "Play Golf"
data = {
    "Outlook": ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Sunny"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal", "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Windy": [False, True, False, False, False, True, True, False, False, False, True, True, False, True],
    "Play Golf": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
}
df = pd.DataFrame(data)
```

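```python
# Encode every column as integers. One LabelEncoder is kept per column
# so each category-to-integer mapping can be inspected or inverted later.
encoders = {}
for column in df.columns:
    encoders[column] = LabelEncoder()
    df[column] = encoders[column].fit_transform(df[column])
```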
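
`LabelEncoder` assigns integer codes in sorted order of the distinct values, which is why `Overcast` becomes 0 and `Sunny` becomes 2 in the table that follows. The short sketch below, which uses a small hypothetical list rather than the full DataFrame, shows how to read the mapping back from a fitted encoder.

```python
from sklearn.preprocessing import LabelEncoder

# A few sample values from the "Outlook" column (illustrative subset)
outlook = ["Rainy", "Overcast", "Sunny", "Rainy"]

encoder = LabelEncoder()
codes = encoder.fit_transform(outlook)

# classes_ holds the sorted distinct labels; a label's position is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)         # {'Overcast': 0, 'Rainy': 1, 'Sunny': 2}
print(codes.tolist())  # [1, 0, 2, 1]

# inverse_transform converts integer codes back to the original labels
decoded = encoder.inverse_transform([2]).tolist()
print(decoded)         # ['Sunny']
```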

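```python
df
```

Only the first 10 of the 14 encoded rows are shown.

| | Outlook | Temperature | Humidity | Windy | Play Golf |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 0 | 1 |
| 3 | 2 | 2 | 0 | 0 | 1 |
| 4 | 2 | 0 | 1 | 0 | 1 |
| 5 | 2 | 0 | 1 | 1 | 0 |
| 6 | 0 | 0 | 1 | 1 | 1 |
| 7 | 1 | 2 | 0 | 0 | 0 |
| 8 | 1 | 0 | 1 | 0 | 1 |
| 9 | 2 | 2 | 1 | 0 | 1 |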


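```python
# Feature matrix: every column except the target. Target vector: the encoded "Play Golf" column.
X = df.drop(columns=["Play Golf"]).values
y = df["Play Golf"].values
```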

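```python
# GaussianNB treats the integer codes as continuous values and fits
# one normal distribution per feature and class.
classifier = GaussianNB()
classifier.fit(X, y)
```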
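
After fitting, a `GaussianNB` model exposes the quantities it estimated. The sketch below uses a small hypothetical dataset (not the golf data) to show the fitted attributes `classes_`, `class_prior_`, and `theta_`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two integer-encoded features and a binary label
X_toy = np.array([[0, 1], [1, 1], [2, 0], [2, 1], [0, 0], [1, 0]])
y_toy = np.array([0, 0, 1, 1, 0, 1])

model = GaussianNB().fit(X_toy, y_toy)

print(model.classes_)      # class labels, in the column order used by predict_proba
print(model.class_prior_)  # P(class), estimated from the class frequencies
print(model.theta_)        # per-class mean of each feature, shape (n_classes, n_features)
```

Each row of `theta_` is the mean of the training rows belonging to one class, which is the $\mu$ parameter of the per-feature normal distributions.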

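```python
# Query: Outlook=Sunny (2), Temperature=Hot (1), Humidity=Normal (1), Windy=False (0)
classifier.predict_proba([[2, 1, 1, 0]])
```

```
array([[0.12487422, 0.87512578]])
```

The columns follow `classifier.classes_`, that is class 0 (`No`) and then class 1 (`Yes`), so the model puts a probability of about 0.875 on playing golf.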

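```python
# predict returns the class with the highest posterior probability
classifier.predict([[2, 1, 1, 0]])
```

```
array([1])
```

Class 1 decodes to `Yes`.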
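
Because every feature in this dataset is categorical, a variant that models category frequencies directly may be a better fit than `GaussianNB`. The sketch below is an alternative to the approach used above, not a replacement for it. It swaps in scikit-learn's `CategoricalNB` with an `OrdinalEncoder`; the names `feature_encoder`, `model`, and `query` are chosen for illustration.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Same golf data as above, kept as raw categories
data = {
    "Outlook": ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Sunny"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal", "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Windy": [False, True, False, False, False, True, True, False, False, False, True, True, False, True],
    "Play Golf": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
}
df_raw = pd.DataFrame(data)
feature_names = ["Outlook", "Temperature", "Humidity", "Windy"]

# OrdinalEncoder maps each feature's categories to 0..k-1 and remembers the mapping
feature_encoder = OrdinalEncoder(dtype=int)
X_cat = feature_encoder.fit_transform(df_raw[feature_names])
y_cat = df_raw["Play Golf"]  # string labels can be passed to the classifier directly

# The default alpha=1.0 applies Laplace smoothing to the category counts
model = CategoricalNB()
model.fit(X_cat, y_cat)

# The same query as above, written with the original category names
query = pd.DataFrame([{"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "Normal", "Windy": False}])
proba = model.predict_proba(feature_encoder.transform(query))
print(dict(zip(model.classes_, proba[0])))
```

The probabilities will generally differ from the `GaussianNB` output, since the two models make different distributional assumptions about the same features.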