## Naive Bayes Classifiers Using the Weather Dataset in Python

Let's first load the required libraries:

In [1]:
import pandas as pd
import numpy as np

### Load Data From CSV File

In [2]:
df=pd.read_csv("weather.csv")
df.head()

|   | Serial | weather | temp | play |
|---|---|---|---|---|
| 0 | 1 | Sunny | Hot | No |
| 1 | 2 | Sunny | Hot | No |
| 2 | 3 | Overcast | Hot | Yes |
| 3 | 4 | Rainy | Mild | Yes |
| 4 | 5 | Rainy | Cool | Yes |


In [3]:
df.shape

(14, 4)

Let's see how many samples of each class the dataset contains:

In [4]:
df["play"].value_counts()

Yes    9
No     5
Name: play, dtype: int64
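Naive Bayes turns these class counts into prior probabilities. A minimal sketch, assuming only the counts printed above: by default scikit-learn's `GaussianNB` estimates each prior as the class's relative frequency in the data passed to `fit`, so the model trained later on the 8-row training split will have priors from that split rather than exactly 9/14 and 5/14.

```python
from fractions import Fraction

# Class counts reported by df["play"].value_counts() above
class_counts = {"Yes": 9, "No": 5}
total = sum(class_counts.values())

# Prior P(class) = number of rows with that class / total number of rows
priors = {label: Fraction(count, total) for label, count in class_counts.items()}
for label, p in priors.items():
    print(label, p, round(float(p), 3))
```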

In [5]:
df["weather"].value_counts()

Sunny       5
Rainy       5
Overcast    4
Name: weather, dtype: int64

In [6]:
df["temp"].value_counts()

Mild    6
Hot     4
Cool    4
Name: temp, dtype: int64

### Convert Categorical Features to Numerical Values

In [7]:
df["outcome"]=df.play.replace({"Yes":1,"No":0})
df.head()

|   | Serial | weather | temp | play | outcome |
|---|---|---|---|---|---|
| 0 | 1 | Sunny | Hot | No | 0 |
| 1 | 2 | Sunny | Hot | No | 0 |
| 2 | 3 | Overcast | Hot | Yes | 1 |
| 3 | 4 | Rainy | Mild | Yes | 1 |
| 4 | 5 | Rainy | Cool | Yes | 1 |


In [8]:
df["weathered"]=df["weather"].replace({"Sunny":0, "Rainy":1, "Overcast":2})
df.head()

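|   | Serial | weather | temp | play | outcome | weathered |
|---|---|---|---|---|---|---|
| 0 | 1 | Sunny | Hot | No | 0 | 0 |
| 1 | 2 | Sunny | Hot | No | 0 | 0 |
| 2 | 3 | Overcast | Hot | Yes | 1 | 2 |
| 3 | 4 | Rainy | Mild | Yes | 1 | 1 |
| 4 | 5 | Rainy | Cool | Yes | 1 | 1 |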


In [9]:
df["temped"]=df["temp"].replace({"Mild":0, "Hot":1, "Cool":2})
df.head()

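|   | Serial | weather | temp | play | outcome | weathered | temped |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Sunny | Hot | No | 0 | 0 | 1 |
| 1 | 2 | Sunny | Hot | No | 0 | 0 | 1 |
| 2 | 3 | Overcast | Hot | Yes | 1 | 2 | 1 |
| 3 | 4 | Rainy | Mild | Yes | 1 | 1 | 0 |
| 4 | 5 | Rainy | Cool | Yes | 1 | 1 | 2 |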
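The dictionaries above assign arbitrary integer codes, and `GaussianNB` treats them as points on a number line, so the model sees Overcast (2) as further from Sunny (0) than Rainy (1) is. One practical caveat: `Series.replace` leaves any value that is missing from the dictionary unchanged, so a misspelled or unseen category slips through as a string. The sketch below uses `Series.map` instead, which returns `NaN` for unmapped values so they can be caught; the value "Snowy" is hypothetical and does not appear in the dataset.

```python
import pandas as pd

# Same encoding as In [8]
WEATHER_CODES = {"Sunny": 0, "Rainy": 1, "Overcast": 2}

# "Snowy" is a hypothetical category that the dictionary does not cover
weather = pd.Series(["Sunny", "Overcast", "Rainy", "Snowy"])

# map() yields NaN for any value not found in the dictionary
codes = weather.map(WEATHER_CODES)
unmapped = weather[codes.isna()]
print(codes.tolist())
print("Unmapped categories:", unmapped.tolist())
```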


### Feature Selection


Let's define the feature set, X:

In [10]:
X=df[["weathered", "temped"]]
X[:5]

|   | weathered | temped |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 0 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 0 |
| 4 | 1 | 2 |


What will our labels be?

In [11]:
y=df[["outcome"]]
y[:5]

|   | outcome |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
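The double brackets in `df[["outcome"]]` return a one-column DataFrame, while scikit-learn estimators expect the target as a 1-D array of shape `(n_samples,)`. Passing the 2-D version is what makes scikit-learn emit a `DataConversionWarning` ("A column-vector y was passed when a 1d array was expected") at fit time. A small sketch of the difference, using the five labels shown above in a separate hypothetical `labels_df` so the notebook's `df` is left untouched:

```python
import pandas as pd

# The five labels shown by y[:5] above
labels_df = pd.DataFrame({"outcome": [0, 0, 1, 1, 1]})

as_frame = labels_df[["outcome"]]   # double brackets -> DataFrame, 2-D
as_series = labels_df["outcome"]    # single brackets -> Series, 1-D
print(as_frame.shape, as_series.shape)

# A one-column DataFrame can also be flattened explicitly before fitting
flat = as_frame.to_numpy().ravel()
print(flat.shape)
```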


## Normalize Data


Data standardization rescales each feature to zero mean and unit variance.

In [12]:
from sklearn.preprocessing import StandardScaler

In [13]:
X = StandardScaler().fit_transform(X)
X[0:5]

array([[-1.16275535,  0.17149859],
       [-1.16275535,  0.17149859],
       [ 1.34164079,  0.17149859],
       [ 0.08944272, -1.02899151],
       [ 0.08944272,  1.37198868]])
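Standardization replaces each value $x$ with $z = (x - \mu)/\sigma$, computed per column with the population standard deviation (ddof=0), which is what `StandardScaler` uses. As a quick check, the `weathered` column can be rebuilt from the value counts above (5 Sunny → 0, 5 Rainy → 1, 4 Overcast → 2); row order does not affect the mean or standard deviation, so the three z-scores below should match the values in the first column of the array above.

```python
import numpy as np

# weathered codes reconstructed from the value counts: five 0s, five 1s, four 2s
weathered = np.array([0] * 5 + [1] * 5 + [2] * 4, dtype=float)

mean = weathered.mean()
std = weathered.std()  # population standard deviation (ddof=0)

# z-scores for the three possible codes: Sunny (0), Rainy (1), Overcast (2)
z = (np.array([0.0, 1.0, 2.0]) - mean) / std
print(np.round(z, 8))
```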

## Train Test Split

In [14]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=8)

print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)


Train set: (8, 2) (8, 1)
Test set: (6, 2) (6, 1)
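With `test_size=0.4`, scikit-learn puts ceil(0.4 × 14) = 6 rows in the test set and leaves 14 - 6 = 8 for training, which matches the shapes above. Note also that In [13] fits the scaler on all 14 rows before the split, so the test rows contribute to the mean and standard deviation. A common alternative is to split first and fit the scaler on the training rows only; the sketch below shows that order on hypothetical random data standing in for the encoded features (the `toy_` names are illustrative, chosen so they do not overwrite the notebook's variables).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the 14 encoded rows: two integer-coded features, binary labels
rng = np.random.default_rng(0)
toy_X = rng.integers(0, 3, size=(14, 2)).astype(float)
toy_y = np.array([0, 1] * 7)

toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.4, random_state=8
)
print(toy_X_train.shape, toy_X_test.shape)  # 8 training rows, 6 test rows

# Fit the scaler on the training rows only, then reuse its statistics for the test rows
scaler = StandardScaler().fit(toy_X_train)
toy_X_train_std = scaler.transform(toy_X_train)
toy_X_test_std = scaler.transform(toy_X_test)
```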


## Naive Bayes Classifiers

In [15]:
from sklearn.naive_bayes import GaussianNB

In [16]:
gnb = GaussianNB()
# ravel() flattens the one-column y_train into the 1-D shape scikit-learn expects
gnb.fit(X_train, y_train.to_numpy().ravel())


GaussianNB()
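Under the hood, `GaussianNB` stores a prior, a mean, and a variance for every class and feature, and predicts the class that maximizes $\log P(c) + \sum_j \log \mathcal{N}(x_j \mid \mu_{cj}, \sigma^2_{cj})$. The sum over features is the "naive" part: features are assumed to be conditionally independent given the class. The sketch below is a from-scratch NumPy version of that rule on hypothetical toy data, not scikit-learn's own code; the `var_smoothing` term mirrors scikit-learn's documented behavior of adding a fraction of the largest feature variance to every variance. In a fitted scikit-learn model, the corresponding estimates are exposed as `class_prior_`, `theta_` (means) and, in recent versions, `var_` (variances).

```python
import numpy as np

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class feature means and variances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).ravel()
    classes = np.unique(y)
    # Smoothing added to every variance, scaled by the largest feature variance
    eps = var_smoothing * X.var(axis=0).max()
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Return the class maximizing log prior + summed per-feature Gaussian log-likelihood."""
    classes, priors, means, variances = model
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, n_features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    joint = np.log(priors)[None, :] + log_lik  # shape (n_samples, n_classes)
    return classes[np.argmax(joint, axis=1)]

# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(toy_model, [[0.5, 0.5], [5.5, 5.5]]))
```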

In [17]:
# After fitting, the model can be used to predict new values

yhat = gnb.predict(X_test)
yhat[0:5]

array([0, 0, 1, 1, 1])

## Evaluation


Next, let's import `metrics` from scikit-learn and check the model's accuracy on the test set.

In [18]:
from sklearn import metrics

print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_test, yhat))

Gaussian Naive Bayes model accuracy: 0.8333333333333334
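Accuracy is the fraction of test labels predicted correctly. With only 6 test samples it moves in steps of 1/6 ≈ 0.167, so 0.8333 means exactly one of the 6 test days was misclassified, and a different `random_state` could shift the score noticeably. The sketch below uses hypothetical labels (not the notebook's actual `y_test`) to show the arithmetic and the matching confusion counts, laid out the same way as `metrics.confusion_matrix` (rows are actual classes, columns are predicted classes).

```python
import numpy as np

# Hypothetical 6-sample test set with exactly one mistake (last sample)
y_true = np.array([0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0])

# Accuracy = fraction of positions where prediction and label agree
accuracy = np.mean(y_true == y_pred)
print(accuracy)

# Confusion counts: rows = actual class, columns = predicted class
confusion = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_true, y_pred):
    confusion[actual, predicted] += 1
print(confusion)
```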


## Making a Predictive Machine

In [19]:
input_data = (2, 1)  # weathered = 2 (Overcast), temped = 1 (Hot)

# change the input data to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the array, since we are predicting for a single instance
input_data_reshape = input_data_as_numpy_array.reshape(1, -1)

# standardize the input with the mean and standard deviation of the full feature
# set, i.e. the same statistics applied in In [13]; fitting a fresh StandardScaler
# on this single row would map every input to [[0. 0.]]
scaler = StandardScaler().fit(df[["weathered", "temped"]].to_numpy())
std_data = scaler.transform(input_data_reshape)
print(std_data)

prediction = gnb.predict(std_data)
print(prediction)

if prediction[0] == 0:
    print("We do not play Football")
else:
    print("We play Football")
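A more robust way to build the predictive machine is to keep the fitted scaler and classifier together, so every new input is transformed with the training statistics. The sketch below uses scikit-learn's `make_pipeline` with the same encodings as In [7] to In [9]; the helper `predict_play` and the `toy_`/`head_rows` names are hypothetical, and to stay self-contained the pipeline is fit only on the five rows shown by `df.head()`, not on the full `weather.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same encodings as In [8] and In [9]
WEATHER_CODES = {"Sunny": 0, "Rainy": 1, "Overcast": 2}
TEMP_CODES = {"Mild": 0, "Hot": 1, "Cool": 2}

# Illustration only: the five rows shown by df.head() above
head_rows = pd.DataFrame({
    "weather": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy"],
    "temp": ["Hot", "Hot", "Hot", "Mild", "Cool"],
    "play": ["No", "No", "Yes", "Yes", "Yes"],
})
toy_X = np.column_stack([head_rows["weather"].map(WEATHER_CODES),
                         head_rows["temp"].map(TEMP_CODES)])
toy_y = (head_rows["play"] == "Yes").astype(int).to_numpy()

# The pipeline keeps the scaler fitted on the training data and reapplies it in predict()
play_model = make_pipeline(StandardScaler(), GaussianNB()).fit(toy_X, toy_y)

def predict_play(weather, temp):
    """Hypothetical helper: encode one day, scale it, and return 1 (play) or 0 (no play)."""
    x = np.array([[WEATHER_CODES[weather], TEMP_CODES[temp]]])
    return int(play_model.predict(x)[0])

for day in [("Sunny", "Hot"), ("Overcast", "Hot")]:
    label = predict_play(*day)
    print(day, "->", "We play Football" if label == 1 else "We do not play Football")
```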
