# Naive Bayes Implementation

$$
\hat{y}_{\text{pred}} = \arg\max_{k \in \{1, 2, \dots, K\}} P(C_k) \prod_{i=1}^{N} P(X_i | C_k)
$$
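
The decision rule above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the notebook's own code: `map_predict`, `priors`, `likelihoods`, and the nested-dictionary layout are hypothetical names chosen for the example, and the toy numbers are not taken from the tennis data.

```python
from math import prod

def map_predict(priors, likelihoods, sample):
    """Return (best_class, scores) for the rule P(C_k) * prod_i P(X_i | C_k).

    priors:      {class: P(class)}
    likelihoods: {feature: {value: {class: P(value | class)}}}  (assumed layout)
    sample:      {feature: value}
    """
    scores = {}
    for cls, prior in priors.items():
        # Multiply the prior by one class-conditional probability per feature.
        scores[cls] = prior * prod(
            likelihoods[feature][value][cls] for feature, value in sample.items()
        )
    # arg max over classes: the class with the largest score wins.
    best_class = max(scores, key=scores.get)
    return best_class, scores

# Toy numbers chosen only for illustration.
priors = {"A": 0.5, "B": 0.5}
likelihoods = {"x": {"on": {"A": 0.8, "B": 0.25}}}
best, scores = map_predict(priors, likelihoods, {"x": "on"})
print(best, scores)
```

The scores are unnormalized, which is enough for the arg max: dividing every score by the same evidence term would not change which class is largest.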

In [1]:
import numpy as np
import pandas as pd
data = pd.read_csv('/content/play_tennis.csv')
print("The Shape of the data is: ", data.shape)
data.head()


The Shape of the data is:  (14, 6)


,day,outlook,temp,humidity,wind,play
0,D1,Sunny,Hot,High,Weak,No
1,D2,Sunny,Hot,High,Strong,No
2,D3,Overcast,Hot,High,Weak,Yes
3,D4,Rain,Mild,High,Weak,Yes
4,D5,Rain,Cool,Normal,Weak,Yes


## **Problem 1**
- **Outlook** = Sunny
- **Temp** = Hot
- **Humidity** = High
- **Wind** = Weak

Decide => Play or No Play

In [2]:
"""
Solution:
 - P(Yes | Sunny, Hot, High, Weak)  ∝ P(Sunny | Yes) * P(Hot | Yes) * P(High | Yes) * P(Weak | Yes) * P(Yes)
 - P(No | Sunny, Hot, High, Weak)  ∝ P(Sunny | No) * P(Hot | No) * P(High | No) * P(Weak | No) * P(No)
Compare the two scores and decide using the maximum a posteriori (MAP) rule
"""

## **Problem 2**
- **Outlook** = Overcast
- **Temp** = Cool
- **Humidity** = Normal
- **Wind** = Weak

Decide => Play or No Play

In [3]:
"""
Solution:
 - P(Yes | Overcast, Cool, Normal, Weak)  ∝ P(Overcast | Yes) * P(Cool | Yes) * P(Normal | Yes) * P(Weak | Yes) * P(Yes)
 - P(No | Overcast, Cool, Normal, Weak)  ∝ P(Overcast | No) * P(Cool | No) * P(Normal | No) * P(Weak | No) * P(No)
Compare the two scores and decide using the maximum a posteriori (MAP) rule
"""


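Naive Bayes builds a `Lookup Table` (a dictionary) that stores every probability it needs. Prediction then reduces to looking up the relevant entries and multiplying them.

`Lookup Table`: the set of prior probabilities P(C_k) and class-conditional probabilities P(X_i | C_k) estimated from the training data.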
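
One possible shape for such a table is a nested dictionary. The sketch below is only an assumed layout (`lookup[feature][value][class]`), not something the notebook defines; the numbers are the prior and `outlook` probabilities that this notebook derives from its counts.

```python
# Hypothetical layout: lookup[feature][value][class] -> P(value | class).
# The "prior" entry holds P(class) itself.
lookup = {
    "prior": {"Yes": 9 / 14, "No": 5 / 14},
    "outlook": {
        "Sunny": {"Yes": 2 / 9, "No": 3 / 5},
        "Overcast": {"Yes": 4 / 9, "No": 0 / 5},
        "Rain": {"Yes": 3 / 9, "No": 2 / 5},
    },
}

# A single probability is then one dictionary lookup away.
p_sunny_given_no = lookup["outlook"]["Sunny"]["No"]
print("P(Sunny|No):", p_sunny_given_no)
```

Keying by feature first keeps each column's probabilities together, which mirrors how they are computed column by column.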

# **Creating a Lookup Table**

## ***P(Yes) and P(No)***

In [6]:
print("Value Counts of playing or not playing the tennis: ")
data['play'].value_counts()

Value Counts of playing or not playing the tennis: 


play,count
Yes,9
No,5


In [8]:
probability_of_Yes = 9/14
probability_of_No = 5/14
print("Probability of Yes = P(Yes): ", probability_of_Yes)
print("Probability of No = P(No): ", probability_of_No)

Probability of Yes = P(Yes):  0.6428571428571429
Probability of No = P(No):  0.35714285714285715
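
The priors can also be computed instead of typed in by hand. A minimal sketch, assuming a stand-in Series with the same 9 "Yes" / 5 "No" counts as `data['play']` (the variable `play` is hypothetical and exists only to make the example self-contained):

```python
import pandas as pd

# Stand-in for data['play']: 9 "Yes" and 5 "No", matching the counts above.
play = pd.Series(["Yes"] * 9 + ["No"] * 5, name="play")

# normalize=True returns relative frequencies instead of raw counts.
priors = play.value_counts(normalize=True)
print("P(Yes):", priors["Yes"])
print("P(No):", priors["No"])
```

In the notebook itself, the same call on `data['play']` would avoid hard-coding `9/14` and `5/14`.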


Let's calculate the conditional probabilities `P(value | Yes)` and `P(value | No)` for every value in each column.

## ***Outlook Column***

In [9]:
pd.crosstab(data['outlook'], data['play'])

outlook,No,Yes
Overcast,0,4
Rain,2,3
Sunny,3,2


In [15]:
probability_of_overcast_No = 0
probability_of_overcast_Yes = 4/9
probability_of_rainy_Yes = 3/9
probability_of_rainy_No = 2/5
probability_of_sunny_No = 3/5
probability_of_sunny_Yes = 2/9

print("P(Overcast|No):", probability_of_overcast_No)
print("P(Overcast|Yes):", probability_of_overcast_Yes)
print("P(Rainy|No):", probability_of_rainy_No)
print("P(Rainy|Yes):", probability_of_rainy_Yes)
print("P(Sunny|No):", probability_of_sunny_No)
print("P(Sunny|Yes):", probability_of_sunny_Yes)

P(Overcast|No): 0
P(Overcast|Yes): 0.4444444444444444
P(Rainy|No): 0.4
P(Rainy|Yes): 0.3333333333333333
P(Sunny|No): 0.6
P(Sunny|Yes): 0.2222222222222222
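
`pd.crosstab` can produce these conditional probabilities directly through its `normalize` argument. The sketch below rebuilds the (outlook, play) pairs from the counts in the crosstab above so that it runs on its own; `pairs`, `df`, and `cond` are illustrative names, and in the notebook the call would be made on `data` instead.

```python
import pandas as pd

# Rebuild the (outlook, play) pairs from the crosstab counts shown above.
pairs = (
    [("Overcast", "Yes")] * 4
    + [("Rain", "No")] * 2 + [("Rain", "Yes")] * 3
    + [("Sunny", "No")] * 3 + [("Sunny", "Yes")] * 2
)
df = pd.DataFrame(pairs, columns=["outlook", "play"])

# normalize="columns" divides each column by its total,
# so each cell becomes P(outlook value | play value).
cond = pd.crosstab(df["outlook"], df["play"], normalize="columns")
print(cond)
```

Each column of `cond` sums to 1, which is a quick sanity check that the values are conditioned on the class and not on the outlook.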


## ***Temp Column***

In [12]:
pd.crosstab(data['temp'], data['play'])

temp,No,Yes
Cool,1,3
Hot,2,2
Mild,2,4


In [16]:
probability_of_cool_Yes = 3/9
probability_of_cool_No = 1/5
probability_of_hot_Yes = 2/9
probability_of_hot_No = 2/5
probability_of_mild_Yes = 4/9
probability_of_mild_No = 2/5
print("P(Cool|Yes):", probability_of_cool_Yes)
print("P(Cool|No):", probability_of_cool_No)
print("P(Hot|Yes):", probability_of_hot_Yes)
print("P(Hot|No):", probability_of_hot_No)
print("P(Mild|Yes):", probability_of_mild_Yes)
print("P(Mild|No):", probability_of_mild_No)

P(Cool|Yes): 0.3333333333333333
P(Cool|No): 0.2
P(Hot|Yes): 0.2222222222222222
P(Hot|No): 0.4
P(Mild|Yes): 0.4444444444444444
P(Mild|No): 0.4


## ***Humidity Column***

In [17]:
pd.crosstab(data['humidity'], data['play'])

humidity,No,Yes
High,4,3
Normal,1,6


In [18]:
probability_of_high_Yes = 3/9
probability_of_high_No =  4/5
probability_of_normal_Yes = 6/9
probability_of_normal_No = 1/5
print("P(High|Yes):", probability_of_high_Yes)
print("P(High|No):", probability_of_high_No)
print("P(Normal|Yes):", probability_of_normal_Yes)
print("P(Normal|No):", probability_of_normal_No)

P(High|Yes): 0.3333333333333333
P(High|No): 0.8
P(Normal|Yes): 0.6666666666666666
P(Normal|No): 0.2


## ***Wind Column***

In [19]:
pd.crosstab(data['wind'], data['play'])

wind,No,Yes
Strong,3,3
Weak,2,6


In [20]:
probability_of_strong_Yes = 3/9
probability_of_strong_No = 3/5
probability_of_weak_Yes = 6/9
probability_of_weak_No = 2/5
print("P(Strong|Yes):", probability_of_strong_Yes)
print("P(Strong|No):", probability_of_strong_No)
print("P(Weak|Yes):", probability_of_weak_Yes)
print("P(Weak|No):", probability_of_weak_No)

P(Strong|Yes): 0.3333333333333333
P(Strong|No): 0.6
P(Weak|Yes): 0.6666666666666666
P(Weak|No): 0.4


# **Putting All Probabilities Together**

In [22]:
print("P(Yes):", probability_of_Yes)
print("P(No):", probability_of_No)
print()
print(" ------------------ Outlook Column ------------------ ")
print("P(Overcast|No):", probability_of_overcast_No)
print("P(Overcast|Yes):", probability_of_overcast_Yes)
print("P(Rainy|No):", probability_of_rainy_No)
print("P(Rainy|Yes):", probability_of_rainy_Yes)
print("P(Sunny|No):", probability_of_sunny_No)
print("P(Sunny|Yes):", probability_of_sunny_Yes)
print()
print(" ------------------ Temp Column ------------------ ")
print("P(Cool|Yes):", probability_of_cool_Yes)
print("P(Cool|No):", probability_of_cool_No)
print("P(Hot|Yes):", probability_of_hot_Yes)
print("P(Hot|No):", probability_of_hot_No)
print("P(Mild|Yes):", probability_of_mild_Yes)
print("P(Mild|No):", probability_of_mild_No)
print()
print(" ------------------ Humidity Column ------------------ ")
print("P(High|Yes):", probability_of_high_Yes)
print("P(High|No):", probability_of_high_No)
print("P(Normal|Yes):", probability_of_normal_Yes)
print("P(Normal|No):", probability_of_normal_No)
print()
print(" ------------------ Wind Column ------------------ ")
print("P(Strong|Yes):", probability_of_strong_Yes)
print("P(Strong|No):", probability_of_strong_No)
print("P(Weak|Yes):", probability_of_weak_Yes)
print("P(Weak|No):", probability_of_weak_No)

P(Yes): 0.6428571428571429
P(No): 0.35714285714285715

 ------------------ Outlook Column ------------------ 
P(Overcast|No): 0
P(Overcast|Yes): 0.4444444444444444
P(Rainy|No): 0.4
P(Rainy|Yes): 0.3333333333333333
P(Sunny|No): 0.6
P(Sunny|Yes): 0.2222222222222222

 ------------------ Temp Column ------------------ 
P(Cool|Yes): 0.3333333333333333
P(Cool|No): 0.2
P(Hot|Yes): 0.2222222222222222
P(Hot|No): 0.4
P(Mild|Yes): 0.4444444444444444
P(Mild|No): 0.4

 ------------------ Humidity Column ------------------ 
P(High|Yes): 0.3333333333333333
P(High|No): 0.8
P(Normal|Yes): 0.6666666666666666
P(Normal|No): 0.2

 ------------------ Wind Column ------------------ 
P(Strong|Yes): 0.3333333333333333
P(Strong|No): 0.6
P(Weak|Yes): 0.6666666666666666
P(Weak|No): 0.4
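
The same probabilities can be derived from the crosstab counts in one pass instead of one variable per entry. A minimal sketch, assuming the counts are copied from the four crosstabs in this notebook; `counts`, `class_totals`, `priors`, and `likelihoods` are illustrative names, not variables the notebook defines.

```python
# Class-conditional counts copied from the four crosstabs in this notebook.
counts = {
    "outlook": {"Overcast": {"No": 0, "Yes": 4}, "Rain": {"No": 2, "Yes": 3},
                "Sunny": {"No": 3, "Yes": 2}},
    "temp": {"Cool": {"No": 1, "Yes": 3}, "Hot": {"No": 2, "Yes": 2},
             "Mild": {"No": 2, "Yes": 4}},
    "humidity": {"High": {"No": 4, "Yes": 3}, "Normal": {"No": 1, "Yes": 6}},
    "wind": {"Strong": {"No": 3, "Yes": 3}, "Weak": {"No": 2, "Yes": 6}},
}
class_totals = {"Yes": 9, "No": 5}
n_rows = sum(class_totals.values())  # 14 rows in total

# P(class) = rows in class / all rows
priors = {cls: n / n_rows for cls, n in class_totals.items()}

# P(value | class) = rows with that value in the class / rows in the class
likelihoods = {
    feature: {
        value: {cls: by_class[cls] / class_totals[cls] for cls in class_totals}
        for value, by_class in values.items()
    }
    for feature, values in counts.items()
}

print(priors)
print(likelihoods["humidity"]["High"])
```

Deriving every entry from one table of counts removes the risk of mistyping a single fraction or mixing up a `_Yes` and a `_No` variable.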


## **Problem 1**
- **Outlook** = Overcast
- **Temp** = Cool
- **Humidity** = Normal
- **Wind** = Weak

Decide => Play or No Play

In [23]:
"""
Solution:
 - P(Yes | Overcast, Cool, Normal, Weak)  ∝ P(Overcast | Yes) * P(Cool | Yes) * P(Normal | Yes) * P(Weak | Yes) * P(Yes)
 - P(No | Overcast, Cool, Normal, Weak)  ∝ P(Overcast | No) * P(Cool | No) * P(Normal | No) * P(Weak | No) * P(No)
Compare the two scores and decide using the maximum a posteriori (MAP) rule
"""
playing_tennis_probability = (probability_of_Yes * probability_of_overcast_Yes * probability_of_cool_Yes * probability_of_normal_Yes * probability_of_weak_Yes)
not_playing_tennis_probability = (probability_of_No * probability_of_overcast_No * probability_of_cool_No * probability_of_normal_No * probability_of_weak_No)
print("P(Yes):", playing_tennis_probability)
print("P(No):", not_playing_tennis_probability)
if(playing_tennis_probability > not_playing_tennis_probability):
  print("Tennis will be played")
else:
  print("Tennis will not be played")

P(Yes): 0.042328042328042326
P(No): 0.0
Tennis will be played
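
The arithmetic behind this result can be checked with exact fractions. The sketch below is only a verification aid using Python's standard `fractions` module; `score_yes` and `score_no` are illustrative names. The "Yes" score reduces to 8/189, which agrees with the printed float up to rounding, and the "No" score is exactly zero because `P(Overcast|No)` is 0 and a single zero factor wipes out the whole product.

```python
from fractions import Fraction as F

# Query: Outlook=Overcast, Temp=Cool, Humidity=Normal, Wind=Weak
# Factors in order: prior, outlook, temp, humidity, wind.
score_yes = F(9, 14) * F(4, 9) * F(3, 9) * F(6, 9) * F(6, 9)
score_no = F(5, 14) * F(0, 5) * F(1, 5) * F(1, 5) * F(2, 5)

print("Score(Yes):", score_yes, "=", float(score_yes))
print("Score(No):", score_no)
```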


## **Problem 2**
- **Outlook** = Sunny
- **Temp** = Hot
- **Humidity** = High
- **Wind** = Weak

Decide => Play or No Play

In [24]:
"""
Solution:
 - P(Yes | Sunny, Hot, High, Weak)  ∝ P(Sunny | Yes) * P(Hot | Yes) * P(High | Yes) * P(Weak | Yes) * P(Yes)
 - P(No | Sunny, Hot, High, Weak)  ∝ P(Sunny | No) * P(Hot | No) * P(High | No) * P(Weak | No) * P(No)
Compare the two scores and decide using the maximum a posteriori (MAP) rule
"""
# Unnormalized posterior scores; every factor must be conditioned on the same class.
playing_tennis_probability = (probability_of_Yes * probability_of_sunny_Yes * probability_of_hot_Yes * probability_of_high_Yes * probability_of_weak_Yes)
# The last factor is P(Weak | No), not P(Weak | Yes).
not_playing_tennis_probability = (probability_of_No * probability_of_sunny_No * probability_of_hot_No * probability_of_high_No * probability_of_weak_No)
print("P(Yes):", round(playing_tennis_probability, 4))
print("P(No):", round(not_playing_tennis_probability, 4))
if playing_tennis_probability > not_playing_tennis_probability:
    print("Tennis will be played")
else:
    print("Tennis will not be played")

P(Yes): 0.0071
P(No): 0.0274
Tennis will not be played
