# Naive Bayes Classification

- Naive Bayes is a statistical classification technique based on Bayes' theorem. It is one of the simplest supervised learning algorithms. It is called "naive" because it assumes that the features are conditionally independent given the class. Naive Bayes classifiers are fast to train and to predict with, and they are often accurate even on large datasets.

- Consider an example with weather conditions and playing sports. Given the weather conditions, you need to calculate the probability that a game is played, and then classify whether the players will play or not.

- A Naive Bayes classifier calculates the probability of each class in the following steps:
   - Step 1: Calculate the prior probability of each class label.
   - Step 2: Find the likelihood of each attribute value given each class.
   - Step 3: Put these values into Bayes' formula and calculate the posterior probability.
   - Step 4: See which class has the higher posterior probability; the input is assigned to that class.

- To simplify the prior and posterior probability calculations, you can use two kinds of tables: frequency tables and likelihood tables. The frequency table contains the number of occurrences of each label for every feature value. There are two likelihood tables: Likelihood Table 1 shows the prior probabilities of the labels, and Likelihood Table 2 shows the posterior probabilities.
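
In symbols, the steps above combine as follows. Here $y$ is a class label (play or not) and $x_1, \dots, x_n$ are the attribute values; this notation is introduced for illustration and does not come from the tables themselves:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)},
\qquad
\hat{y} = \arg\max_{y}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The product form relies on the assumption that the attributes are conditionally independent given the class. The denominator is the same for every class, so it can be dropped when only the most probable class is needed.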

![image.png](attachment:image.png)
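
The frequency and likelihood tables, and the four steps listed earlier, can be sketched in plain Python. The five rows below are a small toy sample used only for illustration (an assumption, not the full dataset), and the names `frequency`, `prior`, `likelihood` and `posterior` are hypothetical helpers:

```python
from collections import Counter

# Toy sample (assumption): (outlook, play) pairs for illustration only.
rows = [
    ("sunny", "no"),
    ("sunny", "no"),
    ("overcast", "yes"),
    ("rainy", "yes"),
    ("rainy", "yes"),
]

# Frequency table: how often each (outlook, play) pair occurs.
frequency = Counter(rows)
class_counts = Counter(label for _, label in rows)

# Step 1: prior probability of each class label.
prior = {label: n / len(rows) for label, n in class_counts.items()}


def likelihood(outlook, label):
    """Step 2: P(outlook | label), estimated from the frequency table."""
    return frequency[(outlook, label)] / class_counts[label]


def posterior(outlook):
    """Step 3: Bayes' formula, normalised over all class labels."""
    joint = {label: likelihood(outlook, label) * prior[label] for label in prior}
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}


# Step 4: pick the class with the highest posterior probability.
scores = posterior("rainy")
prediction = max(scores, key=scores.get)
print(prior)
print(scores, prediction)
```

In this toy sample `rainy` never occurs with `no`, so that likelihood is zero and the posterior for `no` is forced to zero as well. Practical implementations usually add smoothing (for example, Laplace smoothing) to avoid this.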

In [1]:
# import libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn import preprocessing

In [2]:
#load the data

data = pd.read_csv('weather.csv')

In [3]:
#print the first 5 rows of the data
data.head()

Unnamed: 0,outlook,temperature,humidity,windy,play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes


In [4]:
#data preprocessing: create a LabelEncoder
#it encodes nominal (categorical) values as integers
#the same encoder object is refit for each column below

le = preprocessing.LabelEncoder()

data['outlook'] = le.fit_transform(data['outlook'])
data['temperature'] = le.fit_transform(data['temperature'])
data['humidity'] = le.fit_transform(data['humidity'])
data['windy'] = le.fit_transform(data['windy'])
data['play'] = le.fit_transform(data['play'])
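
Because one `LabelEncoder` is refit for every column, only the last mapping (for `play`) remains available afterwards. A minimal sketch that keeps one encoder per column, so the integer codes can be mapped back to the original labels; the `encoders` dictionary and the two-column toy frame are assumptions for illustration:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame (assumption): a few rows in the same style as the weather data.
toy = pd.DataFrame({
    "outlook": ["sunny", "overcast", "rainy", "sunny"],
    "play": ["no", "yes", "yes", "no"],
})

# Keep a separate fitted encoder for each column.
encoders = {}
for column in list(toy.columns):
    encoders[column] = LabelEncoder()
    toy[column] = encoders[column].fit_transform(toy[column])

# classes_ lists the original labels; each label's position is its integer code.
print(list(encoders["outlook"].classes_))

# inverse_transform maps integer codes back to the original labels.
print(list(encoders["play"].inverse_transform([0, 1])))
```

`LabelEncoder` assigns codes in sorted order of the labels, which is why `overcast` becomes 0, `rainy` 1 and `sunny` 2.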

In [5]:
#print the first 5 rows of the encoded data
data.head()

Unnamed: 0,outlook,temperature,humidity,windy,play
0,2,1,0,0,0
1,2,1,0,1,0
2,0,1,0,0,1
3,1,2,0,0,1
4,1,0,1,0,1


In [6]:
#divide the data into train and test sets
x = data.drop(columns = ['play'])
y = data['play']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 10)

In [7]:
#train the model using Gaussian Naive Bayes

model = GaussianNB()
model.fit(x_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)
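
`GaussianNB` models each feature as a continuous, normally distributed value, so the label-encoded integers are treated as numeric magnitudes. As an alternative sketch (not the method used in this notebook), scikit-learn's `CategoricalNB` treats each integer code as a separate category. The toy arrays below are assumptions for illustration:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Toy encoded features (assumption): columns are outlook and windy codes.
X_toy = np.array([
    [2, 0],
    [2, 1],
    [0, 0],
    [1, 0],
    [1, 0],
    [1, 1],
])
# Toy labels (assumption): 0 = no, 1 = yes.
y_toy = np.array([0, 0, 1, 1, 1, 0])

# alpha=1.0 applies Laplace smoothing, so combinations that never occur
# in the training data still get a non-zero probability.
cat_model = CategoricalNB(alpha=1.0)
cat_model.fit(X_toy, y_toy)

# Each row of predict_proba sums to 1; columns follow cat_model.classes_.
probabilities = cat_model.predict_proba(X_toy)
print(cat_model.classes_)
print(probabilities.shape)
```

Whether this gives better results than `GaussianNB` on the weather data would need to be checked on the actual file.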

In [8]:
#predict the y values for the test data

y_pred = model.predict(x_test)
y_pred

array([1, 0, 1])

In [9]:
#evaluate accuracy (as a percentage)

accuracy = accuracy_score(y_test, y_pred)*100
accuracy

100.0
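
With `test_size = 0.2` the test set here holds only three rows, so each prediction moves the accuracy by about 33 percentage points. Since the accuracy above is 100%, the true test labels must equal the predictions `[1, 0, 1]`. A short sketch of how a confusion matrix and a single hypothetical misprediction would look (the `y_one_wrong` array is an assumption for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# True labels implied by the 100% accuracy reported above.
y_true = np.array([1, 0, 1])
y_hat = np.array([1, 0, 1])

# Rows are true classes, columns are predicted classes (0 = no, 1 = yes).
matrix = confusion_matrix(y_true, y_hat)
print(matrix)

# Hypothetical case (assumption): one of the three predictions is wrong.
y_one_wrong = np.array([1, 0, 0])
print(accuracy_score(y_true, y_one_wrong) * 100)
```

A score measured on so few rows is a coarse estimate; a larger test set or cross-validation would give a more stable figure.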