## Naive Bayes Classifiers with Python

Let's first load required libraries:

In [1]:
import pandas as pd
import numpy as np

### Load Data From CSV File

In [2]:
df=pd.read_csv("Updated_Color_Data.csv")
df.head()

| | Color | Legs | Height | Smelly | Species |
|---|---|---|---|---|---|
| 0 | white | 3 | short | yes | m |
| 1 | green | 2 | tall | no | m |
| 2 | green | 3 | short | yes | m |
| 3 | white | 3 | short | yes | m |
| 4 | green | 2 | short | no | h |


Check the number of rows and columns in the data set:

In [3]:
df.shape

(49, 5)

Count the rows in each __Color__ category:

In [4]:
df["Color"].value_counts()

white    30
green    19
Name: Color, dtype: int64

Count the rows in each __Height__ category:

In [5]:
df["Height"].value_counts()

short    32
tall     17
Name: Height, dtype: int64

Count the rows in each __Smelly__ category:

In [6]:
df["Smelly"].value_counts()

yes    25
no     24
Name: Smelly, dtype: int64

Count the rows in each __Species__ class (the target we want to predict):

In [7]:
df["Species"].value_counts()

m    25
h    24
Name: Species, dtype: int64
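The class counts matter for Naive Bayes because they become the prior probabilities of each class. As a minimal sketch, the snippet below rebuilds a `Species` column from the counts reported above (25 `m`, 24 `h`) and converts them to relative frequencies. The rebuilt series is only a stand-in for `df["Species"]`. By default `GaussianNB` estimates its priors from the training split, so the values it uses may differ slightly from these.

```python
import pandas as pd

# Stand-in for df["Species"], rebuilt from the counts shown above
species = pd.Series(["m"] * 25 + ["h"] * 24, name="Species")

# normalize=True returns relative frequencies instead of raw counts
priors = species.value_counts(normalize=True)
print(priors)
```

The two classes are almost balanced, so neither prior dominates the prediction.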

### Convert Categorical features to numerical values

Encode each categorical column as a binary (0/1) column, starting with __Color__ (white = 1, green = 0):

In [8]:
df["Colored"]=df["Color"].apply(lambda x : 1 if(x=='white') else 0)
df.head()

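| | Color | Legs | Height | Smelly | Species | Colored |
|---|---|---|---|---|---|---|
| 0 | white | 3 | short | yes | m | 1 |
| 1 | green | 2 | tall | no | m | 0 |
| 2 | green | 3 | short | yes | m | 0 |
| 3 | white | 3 | short | yes | m | 1 |
| 4 | green | 2 | short | no | h | 0 |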


In [9]:
df["Heights"]=df["Height"].apply(lambda x : 1 if(x=='short') else 0)
df.head()

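| | Color | Legs | Height | Smelly | Species | Colored | Heights |
|---|---|---|---|---|---|---|---|
| 0 | white | 3 | short | yes | m | 1 | 1 |
| 1 | green | 2 | tall | no | m | 0 | 0 |
| 2 | green | 3 | short | yes | m | 0 | 1 |
| 3 | white | 3 | short | yes | m | 1 | 1 |
| 4 | green | 2 | short | no | h | 0 | 1 |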


In [10]:
df["Smellies"]=df["Smelly"].apply(lambda x : 1 if(x=='yes') else 0)
df.head()

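| | Color | Legs | Height | Smelly | Species | Colored | Heights | Smellies |
|---|---|---|---|---|---|---|---|---|
| 0 | white | 3 | short | yes | m | 1 | 1 | 1 |
| 1 | green | 2 | tall | no | m | 0 | 0 | 0 |
| 2 | green | 3 | short | yes | m | 0 | 1 | 1 |
| 3 | white | 3 | short | yes | m | 1 | 1 | 1 |
| 4 | green | 2 | short | no | h | 0 | 1 | 0 |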


In [11]:
df["Outcomes"]=df["Species"].apply(lambda x : 1 if(x=='m') else 0)
df.head()

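| | Color | Legs | Height | Smelly | Species | Colored | Heights | Smellies | Outcomes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | white | 3 | short | yes | m | 1 | 1 | 1 | 1 |
| 1 | green | 2 | tall | no | m | 0 | 0 | 0 | 1 |
| 2 | green | 3 | short | yes | m | 0 | 1 | 1 | 1 |
| 3 | white | 3 | short | yes | m | 1 | 1 | 1 | 1 |
| 4 | green | 2 | short | no | h | 0 | 1 | 0 | 0 |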
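The four encoding cells above repeat the same pattern, so they could also be written as one loop. The sketch below is an alternative, not the notebook's own code. It uses only the five rows displayed by `df.head()`, and the `encodings` dictionary is a hypothetical helper introduced for illustration.

```python
import pandas as pd

# The five rows displayed by df.head(); the full CSV is not reproduced here
sample = pd.DataFrame({
    "Color": ["white", "green", "green", "white", "green"],
    "Legs": [3, 2, 3, 3, 2],
    "Height": ["short", "tall", "short", "short", "short"],
    "Smelly": ["yes", "no", "yes", "yes", "no"],
    "Species": ["m", "m", "m", "m", "h"],
})

# new column -> (source column, category encoded as 1); hypothetical helper
encodings = {
    "Colored": ("Color", "white"),
    "Heights": ("Height", "short"),
    "Smellies": ("Smelly", "yes"),
    "Outcomes": ("Species", "m"),
}

for new_col, (source_col, positive) in encodings.items():
    # A boolean comparison cast to int gives the same 1/0 values as the lambda
    sample[new_col] = (sample[source_col] == positive).astype(int)

print(sample[["Colored", "Heights", "Smellies", "Outcomes"]])
```

Keeping the mapping in one place makes it easier to apply the same encoding to new samples later.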


### Feature Selection


Let's define feature sets, X:

In [12]:
X=df[["Colored","Legs","Heights","Smellies"]]
X[:5]

| | Colored | Legs | Heights | Smellies |
|---|---|---|---|---|
| 0 | 1 | 3 | 1 | 1 |
| 1 | 0 | 2 | 0 | 0 |
| 2 | 0 | 3 | 1 | 1 |
| 3 | 1 | 3 | 1 | 1 |
| 4 | 0 | 2 | 1 | 0 |


The labels, y, come from the encoded __Outcomes__ column:

In [13]:
y=df["Outcomes"].values
y[:5]

array([1, 1, 1, 1, 0], dtype=int64)

## Normalize Data


Standardization rescales each feature so that it has zero mean and unit variance:

In [14]:
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

In [15]:
X=preprocessing.StandardScaler().fit(X).transform(X)
X[0:5]

array([[ 0.79582243,  1.25656172,  0.72886899,  0.9797959 ],
       [-1.25656172, -0.79582243, -1.37198868, -1.02062073],
       [-1.25656172,  1.25656172,  0.72886899,  0.9797959 ],
       [ 0.79582243,  1.25656172,  0.72886899,  0.9797959 ],
       [-1.25656172, -0.79582243,  0.72886899, -1.02062073]])
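To see where these numbers come from, the standardized value of a feature is z = (x - mean) / std, using the population standard deviation. The sketch below checks this by hand for the `Colored` column, rebuilt from the counts reported earlier (30 white encoded as 1, 19 green encoded as 0). The rebuilt array is a stand-in for the real column, and row order does not affect the mean or standard deviation.

```python
import numpy as np

# Stand-in for df["Colored"]: 30 white (1) and 19 green (0)
colored = np.array([1] * 30 + [0] * 19, dtype=float)

mean = colored.mean()   # 30 / 49
std = colored.std()     # population std (ddof=0), as StandardScaler uses

z = (colored - mean) / std

# Only two distinct values exist because the feature is binary
print(np.unique(z))
print("mean:", z.mean(), "std:", z.std())
```

The two distinct values agree with the first column of the array above: about 0.7958 for white and about -1.2566 for green.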

## Train Test Split

In [16]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)


Train set: (39, 4) (39,)
Test set: (10, 4) (10,)
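The split sizes follow from simple arithmetic. With a fractional `test_size`, scikit-learn rounds the number of test rows up and gives the remainder to the training set. A short sketch with the values from this notebook:

```python
import math

n_samples = 49    # rows in the data set, from df.shape
test_size = 0.2   # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # 9.8 is rounded up to 10
n_train = n_samples - n_test               # the remaining 39 rows

print(n_train, n_test)
```

This matches the shapes printed above: 39 training rows and 10 test rows.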


## Naive Bayes Classifiers

In [17]:
from sklearn.naive_bayes import GaussianNB

In [18]:
gnb=GaussianNB()
gnb.fit(X_train, y_train)

GaussianNB()
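For intuition about what `fit` and `predict` do, Gaussian Naive Bayes models every feature within each class as an independent normal distribution. It predicts the class with the largest log prior plus summed log likelihood. The code below is a simplified sketch of that idea in NumPy, not scikit-learn's actual implementation. The function names and the toy data are hypothetical, and the smoothing term is only similar in spirit to the library's `var_smoothing` parameter.

```python
import numpy as np

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    # A small constant keeps every variance strictly positive
    variances = variances + var_smoothing * X.var(axis=0).max()
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Pick the class with the highest log prior + log likelihood."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]          # shape (n, classes, features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None])
    joint = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(joint, axis=1)]

# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[-1.2, -0.9], [-0.8, -1.1], [-1.0, -1.0],
                  [1.1, 0.9], [0.9, 1.2], [1.0, 1.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X_toy, y_toy)
pred = predict_gaussian_nb(model, np.array([[-1.0, -1.0], [1.0, 1.0]]))
print(pred)
```

The "naive" part is the sum over features: each feature contributes its own likelihood term, as if the features were independent given the class.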

In [19]:
# Once fitted, the model can predict labels for unseen samples

yhat = gnb.predict(X_test)
yhat[0:5]

array([0, 0, 1, 0, 0], dtype=int64)

## Evaluation


Next, let's import metrics from sklearn and check the accuracy of our model.

In [20]:
from sklearn import metrics

In [21]:
print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_test, yhat))

Gaussian Naive Bayes model accuracy: 0.9
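Accuracy is the fraction of test samples whose predicted label equals the true label. With only 10 test rows, a score of 0.9 means 9 correct predictions and 1 error. The sketch below computes it by hand. The ten labels are made up for illustration and are not the notebook's actual `y_test` and `yhat`.

```python
from collections import Counter

# Hypothetical labels, chosen so that exactly one prediction is wrong
y_true = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print("accuracy:", accuracy)

# Count each (true, predicted) pair to see which kind of error occurred
confusion = Counter(zip(y_true, y_pred))
print(confusion)
```

On a test set this small, each sample shifts the accuracy by 0.1, so the score is a rough estimate rather than a precise one.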


In [22]:
from sklearn.metrics import accuracy_score

## Making a Predictive Machine

In [23]:
input_data = (1, 3, 1, 1)

# Convert the input data to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)

# Reshape to (1, n_features) because we are predicting for a single instance
input_data_reshape = input_data_as_numpy_array.reshape(1, -1)

# Standardize with the statistics of the full feature matrix, as in cell 15.
# A scaler fitted on this single sample alone would map every feature to 0.
scaler = StandardScaler().fit(df[["Colored", "Legs", "Heights", "Smellies"]].values)
std_data = scaler.transform(input_data_reshape)
print(std_data)

prediction = gnb.predict(std_data)
print(prediction)

if prediction[0] == 0:
    print("The label is H")
else:
    print("The label is M")

[[0.79582243 1.25656172 0.72886899 0.9797959 ]]
[1]
The label is M
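One way to keep scaling and prediction consistent is to bundle them in a scikit-learn pipeline. The scaler then always reuses the statistics learned during `fit`. The sketch below is self-contained and trains only on the five rows displayed by `df.head()`, so it illustrates the workflow rather than reproducing the notebook's model. The `encode` helper is hypothetical.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The five rows displayed by df.head(); the full CSV is not reproduced here
rows = pd.DataFrame({
    "Color": ["white", "green", "green", "white", "green"],
    "Legs": [3, 2, 3, 3, 2],
    "Height": ["short", "tall", "short", "short", "short"],
    "Smelly": ["yes", "no", "yes", "yes", "no"],
    "Species": ["m", "m", "m", "m", "h"],
})

def encode(frame):
    """Apply the same 0/1 encoding used earlier (hypothetical helper)."""
    return pd.DataFrame({
        "Colored": (frame["Color"] == "white").astype(int),
        "Legs": frame["Legs"],
        "Heights": (frame["Height"] == "short").astype(int),
        "Smellies": (frame["Smelly"] == "yes").astype(int),
    })

# The pipeline fits the scaler and the classifier together
model = make_pipeline(StandardScaler(), GaussianNB())
model.fit(encode(rows), rows["Species"])

# A new sample is described with the original categories
new_sample = pd.DataFrame(
    [{"Color": "white", "Legs": 3, "Height": "short", "Smelly": "yes"}]
)
predicted = model.predict(encode(new_sample))[0]
print(predicted)
```

Because the labels are passed as strings, the prediction comes back as `m` or `h` directly and no final `if`/`else` translation is needed.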
