# Naive Bayes

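- Bayesian networks model **conditional dependence** between random variables by representing each dependency as an edge in a directed graph. The edges are often read causally, but dependence alone does not establish causation.
- **Conditional independence** between two random variables, A and B, given a third random variable, C, holds when either of the following equivalent properties is satisfied:
1. P(A,B|C) = P(A|C) * P(B|C)
2. P(A|B,C) = P(A|C), whenever P(B,C) > 0.
- A Bayesian network is a **directed acyclic graph** in which each edge corresponds to a conditional dependency, and each node corresponds to a unique random variable.
- Naive Bayes is the special case in which the class is the only parent of every feature, so the features are assumed to be conditionally independent given the class.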
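
The two properties in the definition above can be checked numerically. The sketch below is illustrative only: the probability tables `p_c`, `p_a_given_c` and `p_b_given_c` are made-up numbers, chosen so that A and B are conditionally independent given C by construction.

```python
from itertools import product

# Hypothetical distribution over binary A, B, C, built as
# P(a, b, c) = P(c) * P(a|c) * P(b|c), so A and B are independent given C.
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # p_b_given_c[c][b]

joint = {
    (a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
    for a, b, c in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Sum the joint over every outcome consistent with the fixed values."""
    return sum(
        p
        for (a, b, c), p in joint.items()
        if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items())
    )


def is_conditionally_independent(tol=1e-12):
    """Check both properties for every combination of values."""
    for a, b, c in product((0, 1), repeat=3):
        p_ab_given_c = prob(a=a, b=b, c=c) / prob(c=c)
        p_a_c = prob(a=a, c=c) / prob(c=c)
        p_b_c = prob(b=b, c=c) / prob(c=c)
        p_a_given_bc = prob(a=a, b=b, c=c) / prob(b=b, c=c)
        if abs(p_ab_given_c - p_a_c * p_b_c) > tol:   # property 1
            return False
        if abs(p_a_given_bc - p_a_c) > tol:           # property 2
            return False
    return True


print(is_conditionally_independent())
```

Replacing `joint` with a table that does not factorize in this way should make the check fail, which is one way to see that the two properties describe the same constraint.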

![](pic1.png)

![](problem.png)

In [1]:
# Libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes Example

![](pic2.png)

In [2]:
# Read the weather dataset from a CSV file
df = pd.read_csv('weather.csv')

In [3]:
df.head()

Unnamed: 0,Outlook,Temp,Humidity,Windy,Play
0,Rainy,Hot,High,f,no
1,Rainy,Hot,High,t,no
2,Overcast,Hot,High,f,yes
3,Sunny,Mild,High,f,yes
4,Sunny,Cool,Normal,f,yes


In [4]:
# finding the count of different labels
df['Play'].value_counts()

yes    9
no     5
Name: Play, dtype: int64
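
These counts are the basis of the class priors that Naive Bayes combines with the per-feature likelihoods. As a minimal sketch, the priors can be computed with `value_counts(normalize=True)`; the 14 labels below are copied from the `Play` column printed later in this notebook, and the variable names are illustrative.

```python
import pandas as pd

# The 14 Play labels, copied from the notebook's printed Play column.
play = pd.Series(
    ["no", "no", "yes", "yes", "yes", "no", "yes",
     "no", "yes", "yes", "yes", "yes", "yes", "no"],
    name="Play",
)

# normalize=True turns counts into relative frequencies: the class priors.
priors = play.value_counts(normalize=True)
print(priors)
```

The prior for `yes` is 9/14, so a classifier that always predicts `yes` would already be right on about 64% of these rows. That is a useful baseline when reading the accuracy reported later.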

In [5]:
# label encoding
label_encode = LabelEncoder()

In [6]:
labels = label_encode.fit_transform(df.Play)

In [7]:
# appending the label
df['target'] = labels

In [8]:
df.head()

Unnamed: 0,Outlook,Temp,Humidity,Windy,Play,target
0,Rainy,Hot,High,f,no,0
1,Rainy,Hot,High,t,no,0
2,Overcast,Hot,High,f,yes,1
3,Sunny,Mild,High,f,yes,1
4,Sunny,Cool,Normal,f,yes,1


In [9]:
df['target'].value_counts()

1    9
0    5
Name: target, dtype: int64

`LabelEncoder` sorts the class names alphabetically, so `no` is encoded as 0 and `yes` as 1.
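
The mapping between strings and integer codes does not have to be read off the table by eye. A small sketch, assuming a fresh encoder fitted on a handful of labels (the variable names here are illustrative and separate from the notebook's `label_encode`):

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["no", "no", "yes", "yes", "yes"])

# classes_ holds the original labels; position i is the label encoded as i.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform converts integer codes back into the original labels.
decoded = [str(label) for label in encoder.inverse_transform([1, 0])]
print(decoded)
```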

In [10]:
# Separate the features from the label: drop 'Play' from the inputs and keep it as the target
inputs = df.drop('Play',axis='columns')
target = df['Play']
target

0      no
1      no
2     yes
3     yes
4     yes
5      no
6     yes
7      no
8     yes
9     yes
10    yes
11    yes
12    yes
13     no
Name: Play, dtype: object

In [11]:
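# Label-encode each categorical feature into a new numeric column.
# Note: reusing one encoder refits it on every call, so only the last column's classes are kept.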
inputs['outlook_n']= label_encode.fit_transform(inputs['Outlook'])
inputs['Temp_n']= label_encode.fit_transform(inputs['Temp'])
inputs['Humidity_n']= label_encode.fit_transform(inputs['Humidity'])
inputs['windy_n']= label_encode.fit_transform(inputs['Windy'])
inputs

Unnamed: 0,Outlook,Temp,Humidity,Windy,target,outlook_n,Temp_n,Humidity_n,windy_n
0,Rainy,Hot,High,f,0,1,1,0,0
1,Rainy,Hot,High,t,0,1,1,0,1
2,Overcast,Hot,High,f,1,0,1,0,0
3,Sunny,Mild,High,f,1,2,2,0,0
4,Sunny,Cool,Normal,f,1,2,0,1,0
5,Sunny,Cool,Normal,t,0,2,0,1,1
6,Overcast,Cool,Normal,t,1,0,0,1,1
7,Rainy,Mild,High,f,0,1,2,0,0
8,Rainy,Cool,Normal,f,1,1,0,1,0
9,Sunny,Mild,Normal,f,1,2,2,1,0
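
Because the notebook reuses a single `LabelEncoder`, the string-to-code mapping of each feature is lost once the next column is fitted. One possible alternative, sketched here on the ten rows shown above, keeps a separate encoder per column. The `encoders` dictionary and the `_n` column names are illustrative choices, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The ten feature rows displayed in the notebook output above.
rows = [
    ("Rainy", "Hot", "High", "f"),
    ("Rainy", "Hot", "High", "t"),
    ("Overcast", "Hot", "High", "f"),
    ("Sunny", "Mild", "High", "f"),
    ("Sunny", "Cool", "Normal", "f"),
    ("Sunny", "Cool", "Normal", "t"),
    ("Overcast", "Cool", "Normal", "t"),
    ("Rainy", "Mild", "High", "f"),
    ("Rainy", "Cool", "Normal", "f"),
    ("Sunny", "Mild", "Normal", "f"),
]
features = pd.DataFrame(rows, columns=["Outlook", "Temp", "Humidity", "Windy"])

# One encoder per column, so every mapping stays available afterwards.
encoders = {}
encoded = pd.DataFrame(index=features.index)
for column in features.columns:
    encoders[column] = LabelEncoder()
    encoded[column + "_n"] = encoders[column].fit_transform(features[column])

# Print the learned mapping for each feature.
for column, enc in encoders.items():
    print(column, {str(label): code for code, label in enumerate(enc.classes_)})
```

The scikit-learn documentation describes `LabelEncoder` as a tool for target labels; `OrdinalEncoder` is the counterpart intended for feature columns and could replace the loop above.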


In [12]:
# Keep only the numeric feature columns: drop the original strings and the encoded label
inputs_n = inputs.drop(['target','Outlook','Temp','Humidity','Windy'],axis='columns')
inputs_n

Unnamed: 0,outlook_n,Temp_n,Humidity_n,windy_n
0,1,1,0,0
1,1,1,0,1
2,0,1,0,0
3,2,2,0,0
4,2,0,1,0
5,2,0,1,1
6,0,0,1,1
7,1,2,0,0
8,1,0,1,0
9,2,2,1,0


In [13]:
# Fit a Gaussian Naive Bayes classifier on the encoded features
Classifier = GaussianNB()
Classifier.fit(inputs_n,target)
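
Fitting `GaussianNB` amounts to estimating a prior for each class and a mean and variance for each feature within each class. The sketch below inspects some of those fitted attributes. It uses only the ten encoded rows displayed above, not the full 14-row file, so the numbers are illustrative rather than those of the notebook's `Classifier`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Ten encoded rows shown above: outlook_n, Temp_n, Humidity_n, windy_n.
X = np.array([
    [1, 1, 0, 0], [1, 1, 0, 1], [0, 1, 0, 0], [2, 2, 0, 0], [2, 0, 1, 0],
    [2, 0, 1, 1], [0, 0, 1, 1], [1, 2, 0, 0], [1, 0, 1, 0], [2, 2, 1, 0],
], dtype=float)
y = np.array(["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"])

model = GaussianNB().fit(X, y)

print(model.classes_)       # class labels, in sorted order
print(model.class_prior_)   # relative frequency of each class
print(model.theta_)         # per-class mean of every feature
```

Note that the model treats the integer codes as continuous measurements. For purely categorical features such as these, scikit-learn's `CategoricalNB` may be a better match, although the notebook does not use it.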

In [14]:
# Accuracy on the training data itself (no held-out test set is used here)
Classifier.score(inputs_n,target)

0.8571428571428571
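
The score above corresponds to 12 of the 14 training rows being classified correctly. As a sketch of what `score` computes for a classifier, the example below refits a model on the ten encoded rows displayed earlier (an assumption made to keep the block self-contained) and reproduces the accuracy by hand.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Ten encoded rows shown earlier: outlook_n, Temp_n, Humidity_n, windy_n.
X = np.array([
    [1, 1, 0, 0], [1, 1, 0, 1], [0, 1, 0, 0], [2, 2, 0, 0], [2, 0, 1, 0],
    [2, 0, 1, 1], [0, 0, 1, 1], [1, 2, 0, 0], [1, 0, 1, 0], [2, 2, 1, 0],
], dtype=float)
y = np.array(["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"])

model = GaussianNB().fit(X, y)
predictions = model.predict(X)

# Accuracy is the fraction of rows whose prediction matches the true label.
manual_accuracy = float(np.mean(predictions == y))
print(manual_accuracy, model.score(X, y))
```

Since the model is evaluated on the rows it was trained on, this figure is likely optimistic. With a larger dataset, a held-out split or cross-validation would give a fairer estimate.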

In [15]:
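# Predict one sample; the feature order is outlook_n, Temp_n, Humidity_n, windy_n.
# Temp_n=3 lies outside the 0-2 codes seen in the rows above; GaussianNB still accepts it
# because it treats every feature as a continuous value.
Classifier.predict([[2,3,0,0]])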



array(['no'], dtype='<U3')
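
To make the prediction step less of a black box, the sketch below recomputes the Gaussian Naive Bayes posterior by hand and compares it with `predict_proba`. It assumes the ten encoded rows displayed earlier rather than the full 14-row file, and an illustrative query of `[2, 2, 0, 0]` (Sunny, Mild, High, not windy), so its result need not match the notebook's output above.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Ten encoded rows shown earlier: outlook_n, Temp_n, Humidity_n, windy_n.
X = np.array([
    [1, 1, 0, 0], [1, 1, 0, 1], [0, 1, 0, 0], [2, 2, 0, 0], [2, 0, 1, 0],
    [2, 0, 1, 1], [0, 0, 1, 1], [1, 2, 0, 0], [1, 0, 1, 0], [2, 2, 1, 0],
], dtype=float)
y = np.array(["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"])
query = np.array([2.0, 2.0, 0.0, 0.0])  # illustrative sample

classes = np.unique(y)
# GaussianNB adds a small smoothing term to every variance; with the default
# var_smoothing=1e-9 it is 1e-9 times the largest feature variance.
epsilon = 1e-9 * X.var(axis=0).max()

log_joint = []
for label in classes:
    rows = X[y == label]
    prior = len(rows) / len(X)
    mean = rows.mean(axis=0)
    var = rows.var(axis=0) + epsilon
    # Sum of per-feature Gaussian log densities (the "naive" independence step).
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * var) + (query - mean) ** 2 / var
    )
    log_joint.append(np.log(prior) + log_likelihood)

# Normalize in log space to obtain the posterior over classes.
log_joint = np.array(log_joint)
posterior = np.exp(log_joint - log_joint.max())
posterior /= posterior.sum()

model = GaussianNB().fit(X, y)
print(dict(zip(classes.tolist(), posterior.tolist())))
print(np.allclose(posterior, model.predict_proba([query])[0]))
```

The predicted class is simply the one with the largest posterior, which is what `predict` returns.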