<a href="https://colab.research.google.com/github/SaadARazzaq/Probabilistic-Tennis-Classifier/blob/main/notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Dataset Used: https://www.kaggle.com/datasets/fredericobreno/play-tennis

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv("play_tennis.csv")

In [3]:
df.head()

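| | day | outlook | temp | humidity | wind | play |
|---|---|---|---|---|---|---|
| 0 | D1 | Sunny | Hot | High | Weak | No |
| 1 | D2 | Sunny | Hot | High | Strong | No |
| 2 | D3 | Overcast | Hot | High | Weak | Yes |
| 3 | D4 | Rain | Mild | High | Weak | Yes |
| 4 | D5 | Rain | Cool | Normal | Weak | Yes |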


In [4]:
# We don't need the day column, so we drop it.
# drop() returns a new DataFrame, so reassign the result to df.
df = df.drop(columns="day")
df

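| | outlook | temp | humidity | wind | play |
|---|---|---|---|---|---|
| 0 | Sunny | Hot | High | Weak | No |
| 1 | Sunny | Hot | High | Strong | No |
| 2 | Overcast | Hot | High | Weak | Yes |
| 3 | Rain | Mild | High | Weak | Yes |
| 4 | Rain | Cool | Normal | Weak | Yes |
| 5 | Rain | Cool | Normal | Strong | No |
| 6 | Overcast | Cool | Normal | Strong | Yes |
| 7 | Sunny | Mild | High | Weak | No |
| 8 | Sunny | Cool | Normal | Weak | Yes |
| 9 | Rain | Mild | Normal | Weak | Yes |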


In [5]:
len(df)

14

## **Let's consider a problem:**

### **Problem 1:**

### We know a day when:

*   **Outlook= Sunny**
*   **Temperature= Hot**
*   **Humidity= High**
*   **Wind= Weak**

### Our task is to find out **whether we should play on that day⁉️**

### **Intuition:**

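### We have to compute the two scores below. Writing the likelihood as a product of per-feature terms is the "naive" assumption that the features are independent given the class. Each score is only proportional (∝) to the posterior: Bayes' theorem would also divide by P(Sunny, Hot, High, Weak), but that term is the same for both classes, so it can be dropped when comparing them.


*   P(Yes | Sunny,Hot,High,Weak) ∝ P(Sunny | Yes) * P(Hot | Yes) * P(High | Yes) * P(Weak | Yes) * P(Yes)
*   P(No | Sunny,Hot,High,Weak) ∝ P(Sunny | No) * P(Hot | No) * P(High | No) * P(Weak | No) * P(No)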

### Compare the two scores and choose the class with the **highest** value. This is called the ***maximum a posteriori (MAP) rule.***
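The same computation written for a general class c and a day x = (x_1, …, x_4), followed by the MAP decision (the notation here is ours, not the notebook's):

$$
P(c \mid x_1, \dots, x_4)
  = \frac{P(c)\,\prod_{i=1}^{4} P(x_i \mid c)}{P(x_1, \dots, x_4)}
  \;\propto\; P(c)\,\prod_{i=1}^{4} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c \in \{\text{Yes},\,\text{No}\}} \; P(c)\,\prod_{i=1}^{4} P(x_i \mid c)
$$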

---

## **Training:**

In [6]:
# For training, Naive Bayes builds a lookup table (a dictionary) that stores every probability it needs:
# the class priors and, for each feature value, its probability given each class.
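A sketch of such a lookup table built directly from the data, assuming the dataset is the classic 14-row Play Tennis table. Rows D1 to D10 match the output printed above; the last four rows are not shown in this notebook and are an assumption (they are consistent with every crosstab below). `build_lookup_table` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Assumed data: D1-D10 as printed above; D11-D14 are assumed from the
# classic Play Tennis table and reproduce the notebook's crosstab counts.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
df = pd.DataFrame(rows, columns=["outlook", "temp", "humidity", "wind", "play"])


def build_lookup_table(data, target="play"):
    """Hypothetical helper: {'prior': {class: P}, feature: {(value, class): P(value | class)}}."""
    table = {"prior": data[target].value_counts(normalize=True).to_dict()}
    for feature in data.columns.drop(target):
        # normalize="columns" divides each count by its class (column) total
        cond = pd.crosstab(data[feature], data[target], normalize="columns")
        table[feature] = {
            (value, cls): cond.loc[value, cls]
            for value in cond.index
            for cls in cond.columns
        }
    return table


lookup = build_lookup_table(df)
print(lookup["prior"]["Yes"])               # equals 9/14
print(lookup["outlook"][("Sunny", "Yes")])  # equals 2/9
```

Storing everything in one dictionary means prediction is just a handful of lookups and multiplications.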

In [7]:
# Step 1: compute the class priors P(Yes) and P(No)

In [8]:
df['play'].value_counts()  # Count how many rows fall into each class

Yes    9
No     5
Name: play, dtype: int64

In [9]:
result = df['play'].value_counts()
total_yes = result['Yes']
total_no = result['No']
total_values = len(df['play'])

In [10]:
probability_of_yes = total_yes / total_values  # 9/14
probability_of_no = total_no / total_values  # 5/14

In [11]:
print(probability_of_yes)
print(probability_of_no)

0.6428571428571429
0.35714285714285715
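pandas can also produce both priors in a single call: `value_counts(normalize=True)` returns relative frequencies instead of counts. A small sketch, using a hypothetical Series that stands in for `df['play']` (9 "Yes" and 5 "No" labels, matching the counts above):

```python
import pandas as pd

# Hypothetical stand-in for df['play']: 9 "Yes" and 5 "No" labels
play = pd.Series(["Yes"] * 9 + ["No"] * 5, name="play")

# normalize=True divides each count by the total number of rows (14)
priors = play.value_counts(normalize=True)
print(priors["Yes"], priors["No"])
```

In the notebook itself this would simply be `df['play'].value_counts(normalize=True)`.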


### **FOR OUTLOOK FEATURE**

In [12]:
pd.crosstab(df['outlook'], df['play'])

| outlook | play = No | play = Yes |
|---|---|---|
| Overcast | 0 | 4 |
| Rain | 2 | 3 |
| Sunny | 3 | 2 |


In [13]:
probability_of_Overcast_given_Yes = 4 / 9
probability_of_Rain_given_Yes = 3 / 9
probability_of_Sunny_given_Yes = 2 / 9

probability_of_Overcast_given_No = 0 / 5
probability_of_Rain_given_No = 2 / 5
probability_of_Sunny_given_No = 3 / 5
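Note that P(Overcast | No) = 0. Because the scores are products, any day with Outlook = Overcast gets a score of exactly 0 for No, no matter what the other features say. A common remedy, which this notebook does not use, is Laplace (add-α) smoothing: P(value | class) = (count + α) / (class total + α·K), where K is the number of distinct values of the feature. A minimal sketch on the Outlook counts from the crosstab above, with α = 1 as an assumed choice; `laplace_smoothed` is a hypothetical helper:

```python
# Outlook counts per class, copied from the crosstab above
outlook_counts = {
    "Yes": {"Overcast": 4, "Rain": 3, "Sunny": 2},
    "No": {"Overcast": 0, "Rain": 2, "Sunny": 3},
}


def laplace_smoothed(counts, alpha=1.0):
    """Return P(value | class) with add-alpha (Laplace) smoothing."""
    smoothed = {}
    for cls, class_counts in counts.items():
        total = sum(class_counts.values())  # class total, e.g. 9 for Yes
        k = len(class_counts)               # number of distinct feature values
        smoothed[cls] = {
            value: (n + alpha) / (total + alpha * k)
            for value, n in class_counts.items()
        }
    return smoothed


probs = laplace_smoothed(outlook_counts)
print(probs["No"]["Overcast"])  # (0 + 1) / (5 + 3) = 0.125
```

Each class's smoothed probabilities still sum to 1, but no value/class pair can zero out a whole score any more.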

### **FOR TEMP FEATURE**

In [14]:
pd.crosstab(df['temp'], df['play'])

| temp | play = No | play = Yes |
|---|---|---|
| Cool | 1 | 3 |
| Hot | 2 | 2 |
| Mild | 2 | 4 |


In [15]:
probability_of_Cool_given_Yes = 3 / 9
probability_of_Hot_given_Yes = 2 / 9
probability_of_Mild_given_Yes = 4 / 9

probability_of_Cool_given_No = 1 / 5
probability_of_Hot_given_No = 2 / 5
probability_of_Mild_given_No = 2 / 5
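The fractions above are typed in by hand from the crosstab. `pd.crosstab` can compute them directly with `normalize="columns"`, which divides every cell by its column (class) total. In the notebook this would be `pd.crosstab(df['temp'], df['play'], normalize="columns")`; the sketch below rebuilds hypothetical temp/play pairs from the counts shown above so that it runs on its own:

```python
import pandas as pd

# Hypothetical rebuild of the temp/play pairs from the crosstab counts above
pairs = (
    [("Cool", "No")] * 1 + [("Cool", "Yes")] * 3
    + [("Hot", "No")] * 2 + [("Hot", "Yes")] * 2
    + [("Mild", "No")] * 2 + [("Mild", "Yes")] * 4
)
data = pd.DataFrame(pairs, columns=["temp", "play"])

# Each column now sums to 1, so every entry is P(temp | play)
cond = pd.crosstab(data["temp"], data["play"], normalize="columns")
print(cond)
print(cond.loc["Mild", "Yes"])  # equals 4/9
```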

### **FOR HUMIDITY FEATURE**

In [16]:
pd.crosstab(df['humidity'], df['play'])

| humidity | play = No | play = Yes |
|---|---|---|
| High | 4 | 3 |
| Normal | 1 | 6 |


In [17]:
probability_of_Normal_given_Yes = 6 / 9
probability_of_High_given_Yes = 3 / 9

probability_of_Normal_given_No = 1 / 5
probability_of_High_given_No = 4 / 5

### **FOR WIND FEATURE**

In [18]:
pd.crosstab(df['wind'], df['play'])

| wind | play = No | play = Yes |
|---|---|---|
| Strong | 3 | 3 |
| Weak | 2 | 6 |


In [19]:
probability_of_Weak_given_Yes = 6 / 9
probability_of_Strong_given_Yes = 3 / 9

probability_of_Weak_given_No = 2 / 5
probability_of_Strong_given_No = 3 / 5

In [20]:
p_Yes_given_All = probability_of_Sunny_given_Yes * probability_of_Hot_given_Yes * probability_of_High_given_Yes * probability_of_Weak_given_Yes * probability_of_yes
p_No_given_All = probability_of_Sunny_given_No * probability_of_Hot_given_No * probability_of_High_given_No * probability_of_Weak_given_No * probability_of_no
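Multiplying many probabilities smaller than 1 shrinks the result quickly, and with many features the product can underflow to 0.0 in floating point. A common alternative, added here and not part of the original notebook, is to sum log-probabilities instead: since log is monotonic, the class with the larger log-score is the same class that wins the product. A sketch with the same factors as the cell above:

```python
import math

# Factors for the day (Sunny, Hot, High, Weak), taken from the tables above:
# four likelihoods followed by the class prior
yes_factors = [2 / 9, 2 / 9, 3 / 9, 6 / 9, 9 / 14]
no_factors = [3 / 5, 2 / 5, 4 / 5, 2 / 5, 5 / 14]


def log_score(factors):
    """Sum of log-probabilities, i.e. the log of the product of the factors.

    math.log(0) raises ValueError, so zero probabilities need smoothing first.
    """
    return sum(math.log(p) for p in factors)


log_yes = log_score(yes_factors)
log_no = log_score(no_factors)
# Same tie-breaking as the notebook: "No" only if its score is strictly larger
print("No" if log_yes < log_no else "Yes")
# exp() of a log-score recovers the original product
print(math.exp(log_yes), math.exp(log_no))
```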

In [21]:
print("Unnormalized score for Yes (play) => ", p_Yes_given_All)
print("Unnormalized score for No (don't play) => ", p_No_given_All)

Unnormalized score for Yes (play) =>  0.007054673721340387
Unnormalized score for No (don't play) =>  0.02742857142857143
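The two printed numbers are not probabilities yet: they omit the shared denominator P(Sunny, Hot, High, Weak), so they do not sum to 1. Dividing each score by their sum restores the normalization and gives the posterior probabilities; the ranking, and therefore the decision, does not change. A short sketch that recomputes the two scores from the fractions in the tables above:

```python
# Unnormalized scores: the four likelihoods times the class prior
score_yes = (2 / 9) * (2 / 9) * (3 / 9) * (6 / 9) * (9 / 14)
score_no = (3 / 5) * (2 / 5) * (4 / 5) * (2 / 5) * (5 / 14)

# Their sum is the model's estimate of P(Sunny, Hot, High, Weak)
evidence = score_yes + score_no
posterior_yes = score_yes / evidence
posterior_no = score_no / evidence
print(round(posterior_yes, 3), round(posterior_no, 3))
```

With these inputs the posteriors come out to roughly 0.205 for Yes and 0.795 for No.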


In [22]:
# Compare the two scores using the maximum a posteriori rule.
if p_Yes_given_All < p_No_given_All:
    print('You should not play tennis.')
else:
    print('You should play tennis.')

You should not play tennis.
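The same steps work for any combination of feature values. Below is a sketch of a small hypothetical helper, `predict_play`, that stores the counts from the four crosstabs above in one dictionary and applies the MAP rule to a new day. Like the notebook, it uses raw frequencies with no smoothing, so an unseen value/class pair (such as Overcast with No) gives that class a score of 0.

```python
# Counts per class copied from the crosstabs above (class totals: Yes=9, No=5)
CLASS_TOTALS = {"Yes": 9, "No": 5}
COUNTS = {
    "outlook": {"Overcast": {"No": 0, "Yes": 4}, "Rain": {"No": 2, "Yes": 3},
                "Sunny": {"No": 3, "Yes": 2}},
    "temp": {"Cool": {"No": 1, "Yes": 3}, "Hot": {"No": 2, "Yes": 2},
             "Mild": {"No": 2, "Yes": 4}},
    "humidity": {"High": {"No": 4, "Yes": 3}, "Normal": {"No": 1, "Yes": 6}},
    "wind": {"Strong": {"No": 3, "Yes": 3}, "Weak": {"No": 2, "Yes": 6}},
}


def predict_play(day):
    """Return (decision, scores) for a dict like {'outlook': 'Sunny', ...}."""
    n = sum(CLASS_TOTALS.values())
    scores = {}
    for cls, total in CLASS_TOTALS.items():
        score = total / n  # prior P(class)
        for feature, value in day.items():
            score *= COUNTS[feature][value][cls] / total  # P(value | class)
        scores[cls] = score
    # Same tie-breaking as the cell above: "No" only if strictly larger
    decision = "No" if scores["Yes"] < scores["No"] else "Yes"
    return decision, scores


print(predict_play({"outlook": "Sunny", "temp": "Hot", "humidity": "High", "wind": "Weak"}))
print(predict_play({"outlook": "Overcast", "temp": "Cool", "humidity": "Normal", "wind": "Strong"}))
```

Keeping the counts in one dictionary is closer to the lookup table described in the Training section than defining one variable per probability.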
