# **Bayes' Theorem**


Bayes' theorem calculates the probability of an event based on prior knowledge of conditions related to that event. It shows how the probability of $A$ should be updated once $B$ has been observed.


$$ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} $$



* $P(A \mid B)$: posterior probability (probability of A given B)
* $P(B \mid A)$: likelihood (probability of B given A)
* $P(A)$: prior probability (probability of A before B is taken into account)
* $P(B)$: evidence (overall probability of B)
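A quick numerical check of the formula helps make the terms concrete. The numbers below are hypothetical, chosen only for illustration: A is "has a condition", B is "a test comes back positive", and the evidence $P(B)$ is expanded with the law of total probability.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_a = 0.01              # prior P(A): 1% of people have the condition
p_b_given_a = 0.90      # likelihood P(B|A): the test detects 90% of true cases
p_b_given_not_a = 0.05  # false-positive rate P(B|not A)

# Evidence P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B) from Bayes' theorem.
posterior = p_b_given_a * p_a / p_b

print(f"P(B)   = {p_b:.4f}")        # P(B)   = 0.0585
print(f"P(A|B) = {posterior:.4f}")  # P(A|B) = 0.1538
```

Even with a fairly accurate test, the posterior is only about 15% because the prior is small: most positive results come from the much larger group without the condition.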

**Naive Bayes Classifier**



*   A simple classification algorithm based on Bayes' theorem.
*   Assumes that the features are independent of each other given the class (the "naive" part).
*   Often works well in practice even when this assumption does not hold.
*   Fast and efficient for many tasks, such as spam detection and text classification.
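For a class $y$ and features $x_1, \dots, x_n$, the independence assumption lets the likelihood factor into one term per feature. The evidence $P(x_1, \dots, x_n)$ is the same for every class, so it can be dropped when the classes are only being compared:

$$ P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) $$

$$ \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y) $$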





**Steps in Naive Bayes:**



*   Calculate the prior probability of each class.
*   Compute the conditional probability of each feature value given each class.
*   Use Bayes' theorem to calculate the posterior probability of each class.
*   Assign the class with the highest posterior probability.
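The four steps can be sketched directly in plain Python. This is an illustrative from-scratch version, not scikit-learn's implementation. It uses the weather data from this notebook with the outlook spellings normalised to Sunny/Overcast/Rainy (an assumption), and it adds Laplace smoothing (`alpha=1`, also an addition) so that a feature value never seen with a class does not force the whole product to zero. The helper names `fit` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

# Weather data from this notebook: (outlook, temperature, humidity, windy) -> play_tennis.
# Assumption: outlook spellings normalised to 'Sunny' / 'Overcast' / 'Rainy'.
X = [
    ('Sunny', 'Hot', 'High', 'False'), ('Sunny', 'Hot', 'High', 'True'),
    ('Overcast', 'Hot', 'High', 'False'), ('Rainy', 'Mild', 'High', 'False'),
    ('Rainy', 'Cool', 'Normal', 'False'), ('Rainy', 'Cool', 'Normal', 'True'),
    ('Overcast', 'Cool', 'Normal', 'True'), ('Sunny', 'Mild', 'High', 'False'),
    ('Sunny', 'Cool', 'Normal', 'False'), ('Rainy', 'Mild', 'Normal', 'False'),
    ('Sunny', 'Mild', 'Normal', 'True'), ('Overcast', 'Mild', 'High', 'True'),
    ('Overcast', 'Hot', 'Normal', 'False'), ('Rainy', 'Mild', 'High', 'True'),
]
y = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']


def fit(X, y):
    """Steps 1-2: count classes and feature values per class (hypothetical helper)."""
    class_counts = Counter(y)  # step 1: the priors come from these counts
    # step 2: value_counts[c][i][v] = number of class-c rows whose feature i equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(X, y):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    # Distinct values per feature, used by Laplace smoothing.
    n_values = [len({row[i] for row in X}) for i in range(len(X[0]))]
    return class_counts, value_counts, n_values


def predict(model, row, alpha=1.0):
    """Steps 3-4: posterior for each class, then pick the largest (hypothetical helper)."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, value in enumerate(row):
            # Laplace-smoothed P(x_i = value | c)
            score *= (value_counts[c][i][value] + alpha) / (n_c + alpha * n_values[i])
        scores[c] = score
    # Dividing by the sum plays the role of the evidence P(x), so the posteriors sum to 1.
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


model = fit(X, y)
label, posteriors = predict(model, ('Sunny', 'Cool', 'High', 'True'))
print(label)                                             # No
print({c: round(p, 2) for c, p in posteriors.items()})  # {'No': 0.72, 'Yes': 0.28}
```

Multiplying many small probabilities can underflow on data with many features; real implementations usually sum log-probabilities instead, which gives the same argmax.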





```python
# Data set: 14 weather observations and whether tennis was played (play_tennis).
# Note: outlook mixes spellings ('sunny', 'sunnny', 'Sunny'; 'overcast', 'Overcast'),
# so the encoders below treat them as separate categories.

outlook = ['sunny','sunnny','overcast','Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny',
           'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

temperature = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

humidity = ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'Normal',
            'High', 'Normal', 'High']

windy = ['False', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'True']

play_tennis = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
```

```python
# Combine the columns into a list of row tuples
df = list(zip(outlook, temperature, humidity, windy, play_tennis))
df
```

```
[('sunny', 'Hot', 'High', 'False', 'No'),
 ('sunnny', 'Hot', 'High', 'True', 'No'),
 ('overcast', 'Hot', 'High', 'False', 'Yes'),
 ('Rainy', 'Mild', 'High', 'False', 'Yes'),
 ('Rainy', 'Cool', 'Normal', 'False', 'Yes'),
 ('Rainy', 'Cool', 'Normal', 'True', 'No'),
 ('Overcast', 'Cool', 'Normal', 'True', 'Yes'),
 ('Sunny', 'Mild', 'High', 'False', 'No'),
 ('Sunny', 'Cool', 'Normal', 'False', 'Yes'),
 ('Rainy', 'Mild', 'Normal', 'False', 'Yes'),
 ('Sunny', 'Mild', 'Normal', 'True', 'Yes'),
 ('Overcast', 'Mild', 'High', 'True', 'Yes'),
 ('Overcast', 'Hot', 'Normal', 'False', 'Yes'),
 ('Rainy', 'Mild', 'High', 'True', 'No')]
```

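```python
import pandas as pd

# Convert the list of tuples into a DataFrame with named columns
columns = ['outlook', 'temperature', 'humidity', 'windy', 'play_tennis']
df = pd.DataFrame(df, columns=columns)
df
```

|    | outlook | temperature | humidity | windy | play_tennis |
|---:|---|---|---|---|---|
| 0 | sunny | Hot | High | False | No |
| 1 | sunnny | Hot | High | True | No |
| 2 | overcast | Hot | High | False | Yes |
| 3 | Rainy | Mild | High | False | Yes |
| 4 | Rainy | Cool | Normal | False | Yes |
| 5 | Rainy | Cool | Normal | True | No |
| 6 | Overcast | Cool | Normal | True | Yes |
| 7 | Sunny | Mild | High | False | No |
| 8 | Sunny | Cool | Normal | False | Yes |
| 9 | Rainy | Mild | Normal | False | Yes |
| 10 | Sunny | Mild | Normal | True | Yes |
| 11 | Overcast | Mild | High | True | Yes |
| 12 | Overcast | Hot | Normal | False | Yes |
| 13 | Rainy | Mild | High | True | No |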
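The `outlook` column mixes spellings ('sunny', 'sunnny' and 'Sunny'; 'overcast' and 'Overcast'), so label encoding turns it into six categories instead of three. A minimal cleaning sketch, assuming 'sunnny' is a typo for 'Sunny' and that capitalisation carries no meaning; it works on a standalone copy of the `outlook` list, so the cells that follow are unaffected.

```python
import pandas as pd

# The outlook values exactly as listed in the data set above.
outlook = ['sunny', 'sunnny', 'overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny',
           'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

# Normalise capitalisation first, then map the remaining misspelling.
# Assumption: 'sunnny' is a typo for 'Sunny'.
outlook_clean = (
    pd.Series(outlook)
    .str.capitalize()
    .replace({'Sunnny': 'Sunny'})
)

print(sorted(outlook_clean.unique()))  # ['Overcast', 'Rainy', 'Sunny']
```

Applying the same cleaning to `df['outlook']` before encoding would change the encoded values and, in turn, the model results shown later in this notebook.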


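```python
# Encode each categorical column as integers.
# LabelEncoder assigns codes in the sorted order of the labels (uppercase sorts before
# lowercase), e.g. temperature: Cool=0, Hot=1, Mild=2.
# Refitting the same encoder for every column keeps only the last column's mapping in `le`.
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['outlook'] = le.fit_transform(df['outlook'])
df['temperature'] = le.fit_transform(df['temperature'])
df['humidity'] = le.fit_transform(df['humidity'])
df['windy'] = le.fit_transform(df['windy'])
df['play_tennis'] = le.fit_transform(df['play_tennis'])
df
```

|    | outlook | temperature | humidity | windy | play_tennis |
|---:|---:|---:|---:|---:|---:|
| 0 | 5 | 1 | 0 | 0 | 0 |
| 1 | 4 | 1 | 0 | 1 | 0 |
| 2 | 3 | 1 | 0 | 0 | 1 |
| 3 | 1 | 2 | 0 | 0 | 1 |
| 4 | 1 | 0 | 1 | 0 | 1 |
| 5 | 1 | 0 | 1 | 1 | 0 |
| 6 | 0 | 0 | 1 | 1 | 1 |
| 7 | 2 | 2 | 0 | 0 | 0 |
| 8 | 2 | 0 | 1 | 0 | 1 |
| 9 | 1 | 2 | 1 | 0 | 1 |
| 10 | 2 | 2 | 1 | 1 | 1 |
| 11 | 0 | 2 | 0 | 1 | 1 |
| 12 | 0 | 1 | 1 | 0 | 1 |
| 13 | 1 | 2 | 0 | 1 | 0 |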
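Because one `le` is refitted for every column, only the last mapping (play_tennis) survives, so the other columns cannot easily be decoded later. A minimal sketch that keeps one fitted `LabelEncoder` per column, using a few values from the data set above; the `encoders` dictionary is an illustrative name, not part of scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# First five rows of two columns from the data set above.
df = pd.DataFrame({
    'temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool'],
    'play_tennis': ['No', 'No', 'Yes', 'Yes', 'Yes'],
})

# Fit and keep a separate encoder for each column.
encoders = {}
for column in df.columns:
    encoders[column] = LabelEncoder()
    df[column] = encoders[column].fit_transform(df[column])

print(df['temperature'].tolist())                # [1, 1, 1, 2, 0]
# classes_ lists the original labels in code order: code i -> classes_[i].
print(encoders['temperature'].classes_.tolist())  # ['Cool', 'Hot', 'Mild']
# Decode integer predictions back to the original labels.
print(encoders['play_tennis'].inverse_transform([1, 0]).tolist())  # ['Yes', 'No']
```

scikit-learn documents `LabelEncoder` as intended for target labels; for feature columns, `OrdinalEncoder` encodes all columns at once and stores each mapping in `categories_`.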


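```python
# Shorter, loop-based version of the cell above.
# df is already encoded at this point, and each column already holds the codes 0..k-1,
# so re-encoding leaves the values unchanged.
for column in df:
    df[column] = le.fit_transform(df[column])

df
```

|    | outlook | temperature | humidity | windy | play_tennis |
|---:|---:|---:|---:|---:|---:|
| 0 | 5 | 1 | 0 | 0 | 0 |
| 1 | 4 | 1 | 0 | 1 | 0 |
| 2 | 3 | 1 | 0 | 0 | 1 |
| 3 | 1 | 2 | 0 | 0 | 1 |
| 4 | 1 | 0 | 1 | 0 | 1 |
| 5 | 1 | 0 | 1 | 1 | 0 |
| 6 | 0 | 0 | 1 | 1 | 1 |
| 7 | 2 | 2 | 0 | 0 | 0 |
| 8 | 2 | 0 | 1 | 0 | 1 |
| 9 | 1 | 2 | 1 | 0 | 1 |
| 10 | 2 | 2 | 1 | 1 | 1 |
| 11 | 0 | 2 | 0 | 1 | 1 |
| 12 | 0 | 1 | 1 | 0 | 1 |
| 13 | 1 | 2 | 0 | 1 | 0 |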


```python
from sklearn.model_selection import train_test_split

# Separate the features (X) from the label (y)
X = df.drop('play_tennis', axis=1)
y = df['play_tennis']
```

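```python
# Hold out 20% of the rows for testing; with 14 rows this gives 11 training rows and 3 test rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```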
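With 14 rows and `test_size=0.2`, scikit-learn rounds the test size up to 3 rows, so each test row is worth a third of the final accuracy score. Passing `stratify=y` (not used in this notebook) keeps the Yes/No proportions similar in the training and test parts. A minimal sketch using the encoded play_tennis labels (1 = Yes, 0 = No) and a placeholder feature that is only the row index (an assumption for illustration).

```python
from sklearn.model_selection import train_test_split

# Encoded play_tennis labels for the 14 rows (1 = Yes, 0 = No).
y = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]
X = [[i] for i in range(len(y))]  # placeholder feature: just the row index

# stratify=y preserves the class ratio (9 Yes / 5 No) as closely as possible in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 11 3
```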

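```python
X
```

|    | outlook | temperature | humidity | windy |
|---:|---:|---:|---:|---:|
| 0 | 5 | 1 | 0 | 0 |
| 1 | 4 | 1 | 0 | 1 |
| 2 | 3 | 1 | 0 | 0 |
| 3 | 1 | 2 | 0 | 0 |
| 4 | 1 | 0 | 1 | 0 |
| 5 | 1 | 0 | 1 | 1 |
| 6 | 0 | 0 | 1 | 1 |
| 7 | 2 | 2 | 0 | 0 |
| 8 | 2 | 0 | 1 | 0 |
| 9 | 1 | 2 | 1 | 0 |
| 10 | 2 | 2 | 1 | 1 |
| 11 | 0 | 2 | 0 | 1 |
| 12 | 0 | 1 | 1 | 0 |
| 13 | 1 | 2 | 0 | 1 |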


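```python
y
```

|    | play_tennis |
|---:|---:|
| 0 | 0 |
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
| 7 | 0 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 13 | 0 |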


```python
# Initialize the Gaussian Naive Bayes classifier.
# GaussianNB models each feature as normally distributed within each class.
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
```
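`GaussianNB` treats the label-encoded codes as continuous numbers drawn from a normal distribution, which does not really match categorical features such as outlook. scikit-learn also provides `CategoricalNB`, which estimates a separate probability for each category of each feature. A minimal sketch, assuming the outlook spellings are normalised to Sunny/Overcast/Rainy; it uses `OrdinalEncoder` to produce the non-negative integer codes `CategoricalNB` expects and fits on all 14 rows so that every category is seen during training.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Weather data with outlook spellings normalised (assumption: 'sunny'/'sunnny' -> 'Sunny',
# 'overcast' -> 'Overcast'). Columns: outlook, temperature, humidity, windy.
X_raw = np.array([
    ['Sunny', 'Hot', 'High', 'False'], ['Sunny', 'Hot', 'High', 'True'],
    ['Overcast', 'Hot', 'High', 'False'], ['Rainy', 'Mild', 'High', 'False'],
    ['Rainy', 'Cool', 'Normal', 'False'], ['Rainy', 'Cool', 'Normal', 'True'],
    ['Overcast', 'Cool', 'Normal', 'True'], ['Sunny', 'Mild', 'High', 'False'],
    ['Sunny', 'Cool', 'Normal', 'False'], ['Rainy', 'Mild', 'Normal', 'False'],
    ['Sunny', 'Mild', 'Normal', 'True'], ['Overcast', 'Mild', 'High', 'True'],
    ['Overcast', 'Hot', 'Normal', 'False'], ['Rainy', 'Mild', 'High', 'True'],
])
y = np.array(['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No'])

# Map each feature's categories to integer codes 0..k-1.
enc = OrdinalEncoder()
X = enc.fit_transform(X_raw)

# alpha=1.0 applies Laplace smoothing (this is also the scikit-learn default).
cnb = CategoricalNB(alpha=1.0)
cnb.fit(X, y)

new_day = enc.transform([['Sunny', 'Cool', 'High', 'True']])
print(cnb.predict(new_day))                   # ['No']
# Posterior probability of each class, in the order of cnb.classes_ (['No', 'Yes']).
print(cnb.predict_proba(new_day).round(2))    # [[0.72 0.28]]
```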

```python
# Train the model on the training split
gnb.fit(X_train, y_train)
```

```python
# Predict labels for the 3 held-out rows
y_pred = gnb.predict(X_test)
y_pred
```

```
array([1, 0, 0])
```

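```python
from sklearn.metrics import accuracy_score

# Fraction of test rows whose predicted label matches the true label
# (0.67 here means 2 of the 3 test rows were predicted correctly)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

```
Accuracy: 0.67
```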
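With only 3 test rows, the accuracy can only be 0, 0.33, 0.67 or 1.0, and it depends heavily on which rows the split happened to pick. Leave-one-out cross-validation gives a steadier estimate on a data set this small by training on 13 rows and testing on the remaining one, once for every row. A sketch, assuming the label-encoded features and target from this notebook (outlook codes as produced from the original, inconsistent spellings); the resulting score is not reported here because it has not been run against the notebook's environment.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Label-encoded features (outlook, temperature, humidity, windy) and target from this notebook.
X = np.array([
    [5, 1, 0, 0], [4, 1, 0, 1], [3, 1, 0, 0], [1, 2, 0, 0], [1, 0, 1, 0],
    [1, 0, 1, 1], [0, 0, 1, 1], [2, 2, 0, 0], [2, 0, 1, 0], [1, 2, 1, 0],
    [2, 2, 1, 1], [0, 2, 0, 1], [0, 1, 1, 0], [1, 2, 0, 1],
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# One fold per row: each score is 1.0 (correct) or 0.0 (wrong) for the held-out row.
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut(), scoring='accuracy')
print(f"LOO accuracy: {scores.mean():.2f} over {len(scores)} folds")
```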
