# Problem Statement
Suppose we have a dataset of weather conditions and a corresponding target variable, `play`. Using this dataset, we need to decide whether to play on a particular day given that day's weather conditions. To solve this problem, we follow these steps:

- Convert the given dataset into frequency tables.
- Generate a likelihood table by computing the probability of each feature value given each class.
- Use Bayes' theorem to calculate the posterior probability.
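
The three steps above can be sketched by hand for a single feature. This is a minimal illustration, not the notebook's method: it assumes only the `outlook` and `play` values of the ten rows displayed in this notebook, and the names `sample`, `freq`, `likelihood`, `prior`, and `posterior` are illustrative.

```python
import pandas as pd

# outlook and play values copied from the ten rows displayed in this notebook
sample = pd.DataFrame({
    "outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain",
                "Rain", "Overcast", "Sunny", "Sunny", "Rain"],
    "play": ["No", "No", "Yes", "Yes", "Yes",
             "No", "Yes", "No", "Yes", "Yes"],
})

# Step 1: frequency table, counting each outlook value per class
freq = pd.crosstab(sample["outlook"], sample["play"])

# Step 2: likelihood table, P(outlook | play), by normalising each class column
likelihood = freq / freq.sum(axis=0)

# Step 3: Bayes' theorem, P(play | outlook = Sunny)
prior = sample["play"].value_counts(normalize=True)   # P(play)
unnormalised = likelihood.loc["Sunny"] * prior        # P(Sunny | play) * P(play)
posterior = unnormalised / unnormalised.sum()         # divide by P(Sunny)

print(freq)
print(likelihood)
print(posterior)
```

With several features, naive Bayes multiplies the per-feature likelihoods for each class before normalising, which relies on the assumption that the features are conditionally independent given the class.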

In [1]:
# Importing libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_csv('play_tennis (1).csv')
data

Unnamed: 0,day,outlook,temp,humidity,wind,play
0,D1,Sunny,Hot,High,Weak,No
1,D2,Sunny,Hot,High,Strong,No
2,D3,Overcast,Hot,High,Weak,Yes
3,D4,Rain,Mild,High,Weak,Yes
4,D5,Rain,Cool,Normal,Weak,Yes
5,D6,Rain,Cool,Normal,Strong,No
6,D7,Overcast,Cool,Normal,Strong,Yes
7,D8,Sunny,Mild,High,Weak,No
8,D9,Sunny,Cool,Normal,Weak,Yes
9,D10,Rain,Mild,Normal,Weak,Yes


# Feature Engineering

In [3]:
# Encoding the categorical features

In [4]:
from category_encoders import OrdinalEncoder
oe = OrdinalEncoder()
oe

In [5]:
# Drop the day identifier; the positional axis argument (1) was removed in pandas 2.0
data.drop(columns='day', inplace=True)

In [6]:
data.head()

Unnamed: 0,outlook,temp,humidity,wind,play
0,Sunny,Hot,High,Weak,No
1,Sunny,Hot,High,Strong,No
2,Overcast,Hot,High,Weak,Yes
3,Rain,Mild,High,Weak,Yes
4,Rain,Cool,Normal,Weak,Yes


In [7]:
data = oe.fit_transform(data)

In [8]:
# Integer codes assigned by the encoder:
# outlook:  1 = Sunny, 2 = Overcast, 3 = Rain
# temp:     1 = Hot, 2 = Mild, 3 = Cool
# humidity: 1 = High, 2 = Normal
# wind:     1 = Weak, 2 = Strong
# play:     1 = No, 2 = Yes
data.head()

Unnamed: 0,outlook,temp,humidity,wind,play
0,1,1,1,1,1
1,1,1,1,2,1
2,2,1,1,1,2
3,3,2,1,1,2
4,3,3,2,1,2
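
The codes in the table above follow the order in which each category first appears in its column. As a rough sketch of that numbering, the snippet below uses `pd.factorize` as a stand-in (it is not the encoder used in this notebook) on the first five `outlook` values shown earlier; `encoded` and `mapping` are illustrative names.

```python
import pandas as pd

# First five outlook values, copied from the data.head() output above
outlook = pd.Series(["Sunny", "Sunny", "Overcast", "Rain", "Rain"], name="outlook")

# factorize numbers categories from 0 in order of first appearance
codes, categories = pd.factorize(outlook)

# Shift by one so the codes start at 1, as in the encoded table
encoded = codes + 1
mapping = {category: code for code, category in enumerate(categories, start=1)}

print(encoded.tolist())
print(mapping)
```

Keeping such a mapping around makes it possible to translate predictions back to the original labels later.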


# Separate X and y


In [9]:
x = data.drop(columns='play')
y = data['play']

# Apply Gaussian NB on x and y

In [10]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb

In [11]:
from sklearn.model_selection import cross_val_score
cross_val_score(gnb,x,y,cv=5)

array([0.33333333, 1.        , 0.66666667, 0.        , 0.5       ])
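
The fold scores above swing between 0 and 1, which is expected when 14 rows are split five ways and each fold is scored on only two or three samples. A small sketch that summarises the scores, assuming the values are copied by hand from the output above:

```python
import numpy as np

# Fold scores copied from the cross_val_score output above
scores = np.array([0.33333333, 1.0, 0.66666667, 0.0, 0.5])

mean_score = scores.mean()   # average accuracy across the five folds
std_score = scores.std()     # spread between folds

print(f"mean accuracy: {mean_score:.3f}, std: {std_score:.3f}")
```

A large spread relative to the mean suggests the estimate is too noisy to draw firm conclusions about the model from this dataset alone.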

In [12]:
gnb.fit(x,y)
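
`GaussianNB` models each feature as normally distributed within each class, so the integer codes are treated as continuous measurements. The frequency and likelihood tables described in the problem statement correspond more closely to scikit-learn's `CategoricalNB`. The following is a hedged sketch of that alternative, not what this notebook runs: it assumes the ten rows displayed earlier and scikit-learn's own `OrdinalEncoder` (which produces 0-based codes) rather than the `category_encoders` one, and all variable names are illustrative.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# The ten rows displayed earlier in this notebook (day column omitted)
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
]
sample = pd.DataFrame(rows, columns=["outlook", "temp", "humidity", "wind", "play"])

# CategoricalNB expects non-negative integer category codes
encoder = OrdinalEncoder()
X = encoder.fit_transform(sample.drop(columns="play")).astype(int)
y = sample["play"]

# alpha=1.0 applies Laplace smoothing so unseen combinations do not get zero probability
cnb = CategoricalNB(alpha=1.0)
cnb.fit(X, y)

train_accuracy = cnb.score(X, y)   # accuracy on the same rows used for fitting
print(cnb.classes_, train_accuracy)
```

Whether this performs better than `GaussianNB` on the full dataset would need to be checked with the same cross-validation used above.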

# Predicting values on the training data

In [13]:
y_pred = gnb.predict(x)
y_pred

array([1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1])

# Checking accuracy score on the training data

In [14]:
from sklearn.metrics import accuracy_score, confusion_matrix
# scikit-learn's convention is (y_true, y_pred); accuracy is the same in either order
accuracy_score(y, y_pred)

0.7857142857142857

# Checking confusion matrix

In [15]:
# Arguments are passed as (y_pred, y), so rows are predicted labels and columns are actual labels
confusion_matrix(y_pred,y)

array([[4, 2],
       [1, 7]], dtype=int64)
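
The accuracy reported earlier can be recovered from this matrix, along with per-class precision and recall. This is a minimal sketch that copies the matrix by hand from the output above. Because the call passes `y_pred` first, rows index predicted labels and columns index actual labels (1 = No, 2 = Yes); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# rows: predicted label (No, Yes); columns: actual label (No, Yes)
cm = np.array([[4, 2],
               [1, 7]])

# Correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()

# Column sums are the actual class counts, row sums the predicted class counts
recall = np.diag(cm) / cm.sum(axis=0)      # share of each actual class found
precision = np.diag(cm) / cm.sum(axis=1)   # share of each predicted class that is right

print(f"accuracy: {accuracy:.4f}")
print(f"recall (No, Yes): {recall}")
print(f"precision (No, Yes): {precision}")
```

Since these figures are computed on the same rows the model was fitted on, they describe fit to the training data rather than performance on unseen days.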