# Naive Bayes

## Introduction

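Naive Bayes is a probabilistic classifier built on an independence assumption: given the class, the features (variables) are treated as independent of one another, so the probability of occurrence of one feature value does not depend on the values of the others.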

<img src="images/naive_bayes_1.png">

<img src="images/naive_bayes_2.png">
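Under the independence assumption, the score for a class is its prior multiplied by one likelihood per feature, P(class) × P(x1 | class) × P(x2 | class) × ..., and the scores are then normalized. The sketch below works this out by counting on five example rows of the golf data used later in this notebook. The helper `naive_bayes_scores` and the choice of two features are illustrative assumptions, and no smoothing is applied, so an unseen feature value drives a class score to zero.

```python
import pandas as pd

# Five example rows of the golf data (the full file has 14 rows).
rows = pd.DataFrame({
    "Outlook": ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny"],
    "Windy": [False, True, False, False, False],
    "Play Golf": ["No", "No", "Yes", "Yes", "Yes"],
})


def naive_bayes_scores(df, target, query):
    """Return normalized P(class | query) under the independence assumption.

    Hypothetical helper for illustration; it uses raw counts without smoothing.
    """
    scores = {}
    for label, group in df.groupby(target):
        score = len(group) / len(df)  # prior P(class)
        for feature, value in query.items():
            # likelihood P(feature = value | class), estimated by counting
            score *= (group[feature] == value).mean()
        scores[label] = score
    total = sum(scores.values())  # assumes at least one class has a non-zero score
    return {label: s / total for label, s in scores.items()}


posterior = naive_bayes_scores(rows, "Play Golf", {"Outlook": "Sunny", "Windy": False})
print(posterior)
```

In these five rows "Sunny" never occurs together with "No", so the "No" score is zero and all of the probability goes to "Yes". Library implementations usually add smoothing to avoid such hard zeros.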

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Golf play day example

### Load data

In [2]:
golf_data = pd.read_csv('files/golf_play_days.csv')

In [3]:
golf_data.head()

Unnamed: 0,Outlook,Temp,Humidity,Windy,Play Golf
0,Rainy,Hot,High,False,No
1,Rainy,Hot,High,True,No
2,Overcast,Hot,High,False,Yes
3,Sunny,Mild,High,False,Yes
4,Sunny,Cool,Normal,False,Yes


In [5]:
golf_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Outlook    14 non-null     object
 1   Temp       14 non-null     object
 2   Humidity   14 non-null     object
 3   Windy      14 non-null     bool  
 4   Play Golf  14 non-null     object
dtypes: bool(1), object(4)
memory usage: 590.0+ bytes


In [7]:
# Convert every column to the pandas categorical dtype
golf_data_cat = golf_data.astype('category')
golf_data_cat.head()

Unnamed: 0,Outlook,Temp,Humidity,Windy,Play Golf
0,Rainy,Hot,High,False,No
1,Rainy,Hot,High,True,No
2,Overcast,Hot,High,False,Yes
3,Sunny,Mild,High,False,Yes
4,Sunny,Cool,Normal,False,Yes


In [9]:
# Replace each category with its integer code (codes follow the sorted order of the categories)
golf_data_cat = golf_data_cat.apply(lambda x: x.cat.codes)
golf_data_cat.head()

Unnamed: 0,Outlook,Temp,Humidity,Windy,Play Golf
0,1,1,0,0,0
1,1,1,0,1,0
2,0,1,0,0,1
3,2,2,0,0,1
4,2,0,1,0,1
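The integer codes are only positions in each column's list of categories, so it helps to keep the mapping when reading predictions back. A minimal sketch, using the five `Outlook` values shown above (the variable names `outlook` and `code_to_label` are illustrative):

```python
import pandas as pd

# The five Outlook values from the rows displayed above
outlook = pd.Series(["Rainy", "Rainy", "Overcast", "Sunny", "Sunny"], dtype="category")

# cat.categories lists the labels in code order; cat.codes gives the integer per row
code_to_label = dict(enumerate(outlook.cat.categories))
codes = outlook.cat.codes.tolist()

print(code_to_label)  # {0: 'Overcast', 1: 'Rainy', 2: 'Sunny'}
print(codes)          # [1, 1, 0, 2, 2]
```

The codes match the `Outlook` column in the encoded table: string categories are sorted, so `Overcast` becomes 0, `Rainy` 1 and `Sunny` 2.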


### Extract Data

In [10]:
x_data = golf_data_cat.drop("Play Golf", axis=1)
y_data = golf_data_cat['Play Golf']

### Split Data

In [13]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.30, random_state=1)
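With only 14 rows, `test_size=0.30` leaves a very small test set. The sketch below reproduces the split sizes on placeholder arrays of the same length (`x_demo` and `y_demo` are made-up stand-ins, not the golf data) and shows the optional `stratify` argument, which keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 14 rows, standing in for the golf table
x_demo = np.arange(28).reshape(14, 2)
y_demo = np.array([0, 1] * 7)

# Same arguments as in the notebook cell
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.30, random_state=1
)
print(len(x_tr), len(x_te))  # 9 5

# Optional: a stratified split keeps both classes represented in the test set
x_tr_s, x_te_s, y_tr_s, y_te_s = train_test_split(
    x_demo, y_demo, test_size=0.30, random_state=1, stratify=y_demo
)
print(sorted(set(y_te_s.tolist())))
```

A test set of five rows means each prediction moves the accuracy by 0.2, so any single score should be read with caution.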

### Generate Model and train

In [14]:
from sklearn.naive_bayes import MultinomialNB

In [15]:
nb_model = MultinomialNB()
nb_model = nb_model.fit(x_train, y_train)
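`MultinomialNB` models feature values as counts, so it treats the integer codes as magnitudes rather than as labels. scikit-learn also provides `CategoricalNB`, which is designed for integer-encoded categorical features. The sketch below is an alternative for comparison, not the method used in this notebook; it fits on the five encoded rows shown earlier purely for illustration, and the `min_categories` values are an assumption about how many categories each column has.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# The five encoded rows shown earlier: Outlook, Temp, Humidity, Windy
x_small = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [2, 2, 0, 0],
    [2, 0, 1, 0],
])
y_small = np.array([0, 0, 1, 1, 1])  # Play Golf: 0 = No, 1 = Yes

# min_categories tells the model how many categories each feature can take,
# so that codes missing from the training rows do not cause errors later.
# Assumed here: 3 outlooks, 3 temperatures, 2 humidity levels, 2 windy values.
cat_model = CategoricalNB(alpha=1.0, min_categories=[3, 3, 2, 2])
cat_model.fit(x_small, y_small)

pred = cat_model.predict(x_small)
proba = cat_model.predict_proba(x_small)
print(pred)
print(proba.round(3))
```

Whether this gives better results than `MultinomialNB` on the full 14-row table would need to be checked on the actual data.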

### Predict

In [16]:
y_pred = nb_model.predict(x_test)

### Evaluate Results

In [17]:
from sklearn.metrics import accuracy_score

In [21]:
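# Accuracy computed from the predictions made above
as1 = accuracy_score(y_test, y_pred)
# Same metric, with the model predicting on x_test internally
as2 = nb_model.score(x_test, y_test)
print(as1)
print(as2)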

0.8
0.8
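Accuracy is the fraction of test rows whose predicted label matches the true label, so 0.8 on five test rows corresponds to four correct predictions. The sketch below shows the calculation on hypothetical labels (`y_true_demo` and `y_pred_demo` are made up for illustration and are not the notebook's actual predictions), together with a confusion matrix that shows where the error falls.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for five test rows: four match, one does not
y_true_demo = np.array([1, 0, 1, 1, 0])
y_pred_demo = np.array([1, 0, 1, 1, 1])

manual = (y_true_demo == y_pred_demo).mean()     # fraction of matching labels
acc = accuracy_score(y_true_demo, y_pred_demo)   # same value from scikit-learn
cm = confusion_matrix(y_true_demo, y_pred_demo)  # rows: true class, columns: predicted class

print(manual, acc)  # 0.8 0.8
print(cm)
```

In this made-up case the single mistake is a true 0 predicted as 1, which appears in the top-right cell of the matrix.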

