# Application of Naive Bayes Algorithm

In this notebook I will apply scikit-learn's Naive Bayes classifier to predict the acceptability class of a car from its other attributes.

I will be working with the [Car Evaluation](https://archive-beta.ics.uci.edu/ml/datasets/car+evaluation) dataset from the UCI Machine Learning Repository.

This dataset consists of <b>1728</b> car evaluations.

Each car is assigned one of four classes:

- <b>Unacceptable</b>  (unacc)

- <b>Acceptable</b>    (acc)

- <b>Very Good</b>     (vgood)

- <b>Good</b>          (good)


The class is assigned after evaluating the following 6 attributes:

- <b>buying</b>: vhigh, high, med, low.

- <b>maint</b>: vhigh, high, med, low. 

- <b>doors</b>: 2, 3, 4, 5more. 

- <b>persons</b>: 2, 4, more. 

- <b>lug_boot</b>: small, med, big. 

- <b>safety</b>: low, med, high.

Class Distribution:

| Class | N | N(%) |
|---|---|---|
| unacc | 1210 | 70.023 |
| acc | 384 | 22.222 |
| good | 69 | 3.993 |
| vgood | 65 | 3.762 |
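The table above can be reproduced with `value_counts`. The sketch below is only an illustration: it rebuilds a stand-in label column from the counts in the table (the real `class` column is loaded later in the notebook), and the names `counts`, `labels` and `distribution` are my own.

```python
import pandas as pd

# Class counts copied from the table above; this Series stands in for the
# real `class` column so the sketch runs without downloading anything.
counts = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}
labels = pd.Series([name for name, n in counts.items() for _ in range(n)])

# Absolute counts and percentages per class, sorted by frequency.
distribution = pd.DataFrame({
    "N": labels.value_counts(),
    "N(%)": (labels.value_counts(normalize=True) * 100).round(3),
})
print(distribution)
```

With the real data, the same two `value_counts` calls could be applied to `data['class']` before it is encoded.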






### 1. Download the dataset 

The data is taken directly from the UCI repository. Use the pandas module to read it and check the first few records.

In [94]:
import numpy as np
import pandas as pd
from urllib.request import urlretrieve

dataset_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"

# Download the raw file into the working directory
urlretrieve(dataset_url, 'car.data')

# The file has no header row, so the column names are supplied explicitly
data = pd.read_csv('car.data', names=['buying','maint','doors','persons','lug_boot','safety','class'])

# Show the first five records
print(data.head())

  buying  maint doors persons lug_boot safety  class
0  vhigh  vhigh     2       2    small    low  unacc
1  vhigh  vhigh     2       2    small    med  unacc
2  vhigh  vhigh     2       2    small   high  unacc
3  vhigh  vhigh     2       2      med    low  unacc
4  vhigh  vhigh     2       2      med    med  unacc
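To see how `names=` behaves on a header-less file without touching the network, the first five records shown above can be read from an in-memory string. This is a minimal sketch: `raw`, `columns` and `sample` are illustrative names, and `dtype=str` is an assumption I add so that `2` stays a string, as it does in the full file where `doors` and `persons` also contain `5more` and `more`.

```python
import io
import pandas as pd

# First five records of car.data, inlined so the sketch runs offline.
raw = """vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
vhigh,vhigh,2,2,med,low,unacc
vhigh,vhigh,2,2,med,med,unacc
"""

columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

# No header row in the data, so `names` supplies the column labels.
# dtype=str (an assumption for this sample) keeps every column as text.
sample = pd.read_csv(io.StringIO(raw), names=columns, dtype=str)

print(sample.shape)               # rows, columns
print(sample.isna().sum().sum())  # total number of missing values
```

The same two checks (`shape` and `isna().sum()`) are a quick sanity test to run on the full `data` frame after loading.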


### 2. Identify the target variable.

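The target variable is stored in the `class` column of the data frame, and its values are strings. scikit-learn classifiers can work with string labels for the target, but here it is encoded as integer codes so that it is handled the same way as the predictors, which must be numeric. We can convert the string categories into integer codes using the `factorize` function of the pandas library, which also returns the original labels so the codes can be mapped back later.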

In [95]:
data['class'], class_names = pd.factorize(data['class'])

Check the encoded values:

In [96]:
print(class_names)
print(data['class'].unique())

Index(['unacc', 'acc', 'vgood', 'good'], dtype='object')
[0 1 2 3]
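`pd.factorize` numbers the categories in order of first appearance, which is why `unacc` gets code 0 above. The following sketch illustrates this on a small hypothetical label series (not the real data) and shows how the returned uniques can decode the integers back into labels.

```python
import pandas as pd

# Hypothetical labels, chosen only to illustrate the encoding order.
labels = pd.Series(["unacc", "acc", "unacc", "vgood", "good", "acc"])

# codes: one integer per row; uniques: the label behind each integer.
codes, uniques = pd.factorize(labels)
print(codes)
print(uniques)

# Indexing the uniques with the codes recovers the original labels.
decoded = uniques[codes]
print(list(decoded))
```

In this notebook the same indexing, `class_names[...]`, could be used to turn integer predictions back into class names.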


### 3. Identify the predictor variables and encode any string variables to equivalent integer codes

In [97]:
# Encode each predictor column as integer codes (the returned uniques are not needed)
for column in ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']:
    data[column], _ = pd.factorize(data[column])
data.head()

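|   | buying | maint | doors | persons | lug_boot | safety | class |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |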
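Because `factorize` assigns codes by order of appearance, the integers do not necessarily follow the natural order of the categories. If an explicit order is wanted, one option (not used in this notebook) is `pd.Categorical` with a fixed category list. The sketch below uses a few hypothetical `safety` values; the `order` list is my assumption of the intended ranking.

```python
import pandas as pd

# Hypothetical sample of the `safety` attribute.
safety = pd.Series(["low", "med", "high", "med"])

# Assumed ranking of the categories, from worst to best.
order = ["low", "med", "high"]

# Each value is replaced by its position in `order`;
# a value missing from `order` would be coded as -1.
codes = pd.Categorical(safety, categories=order, ordered=True).codes
print(list(codes))
```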


### 4. Select the predictor features and target variable

In [98]:
X = data.iloc[:,:-1]
y = data.iloc[:,-1]

### 5. Train test split
Split the data randomly into 70% training and 30% test sets. The fixed `random_state` makes the split reproducible.

In [99]:
from sklearn import metrics, model_selection

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=123)
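Since the classes are heavily imbalanced, a purely random split can leave the rare classes unevenly represented. A stratified split is one way to keep the class proportions similar in both parts; this is a variation on the cell above, not what the notebook does. The sketch uses placeholder features and a label vector rebuilt from the class counts, so only the split sizes and class shares are meaningful.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels rebuilt from the class counts (codes: unacc=0, acc=1, vgood=2, good=3).
y = np.repeat([0, 1, 2, 3], [1210, 384, 65, 69])
# Placeholder features: six columns of zeros, standing in for the real predictors.
X = np.zeros((len(y), 6), dtype=int)

# stratify=y keeps the class proportions close to equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y
)

train_share = np.bincount(y_train) / len(y_train)
test_share = np.bincount(y_test) / len(y_test)
print(len(y_train), len(y_test))
print(train_share.round(3))
print(test_share.round(3))
```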

### 6. Training/model fitting

In [100]:
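from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB

# BernoulliNB binarizes each feature at its default threshold (binarize=0.0),
# so every integer code above 0 is treated as 1
model = BernoulliNB()

# Estimate the class priors and per-feature probabilities from the training split
model.fit(X_train, y_train)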

BernoulliNB()
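One thing to keep in mind with `BernoulliNB` is that its default `binarize=0.0` collapses every integer code above 0 into a single value, so categories coded 1, 2 or 3 become indistinguishable. scikit-learn also provides `CategoricalNB`, which models each code as its own category. The sketch below contrasts the two on a tiny toy dataset of my own construction; it is not the model used in this notebook, and swapping it in would change the reported results.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, CategoricalNB

# Toy data (hypothetical): one feature with codes 0, 1, 2;
# only code 2 belongs to class 1.
X = np.tile([[0], [1], [2]], (20, 1))
y = (X[:, 0] == 2).astype(int)

# Default binarize=0.0 maps codes 1 and 2 to the same value.
bernoulli = BernoulliNB().fit(X, y)
# CategoricalNB estimates a separate probability for each code.
categorical = CategoricalNB().fit(X, y)

# Accuracy on the toy data itself, just to compare the two models.
bernoulli_acc = bernoulli.score(X, y)
categorical_acc = categorical.score(X, y)
print(f"BernoulliNB accuracy:   {bernoulli_acc:.2f}")
print(f"CategoricalNB accuracy: {categorical_acc:.2f}")
```

On this toy data the Bernoulli model cannot separate code 1 from code 2, while the categorical model can. Whether the same gap appears on the car data would need to be checked by running both.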

### 7. Model parameters study

Use the fitted model to make predictions on the test data, then evaluate performance by counting the misclassified samples and computing the accuracy.

In [101]:
y_pred = model.predict(X_test)

count_misclassified = (y_test != y_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))

accuracy = metrics.accuracy_score(y_test, y_pred)
print('Accuracy: {:.2f}'.format(accuracy))

Misclassified samples: 90
Accuracy: 0.83
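Accuracy alone can be misleading here: since about 70% of the cars are `unacc`, a model that always predicts `unacc` would already score roughly 0.70. A confusion matrix and per-class metrics give a fuller picture. The sketch below uses small hypothetical label vectors (`y_true`, `y_hat`) rather than the notebook's actual predictions, so the numbers it prints say nothing about the model above.

```python
import numpy as np
from sklearn import metrics

class_names = ["unacc", "acc", "vgood", "good"]

# Hypothetical true and predicted codes, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 3])
y_hat = np.array([0, 0, 0, 1, 1, 0, 2, 0])

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_true, y_hat, labels=[0, 1, 2, 3])
print(cm)

# Precision, recall and F1 per class; zero_division=0 avoids warnings
# for classes that are never predicted.
print(metrics.classification_report(
    y_true, y_hat, labels=[0, 1, 2, 3], target_names=class_names, zero_division=0
))

# Mean of the per-class recalls, which is less dominated by the majority class.
balanced = metrics.balanced_accuracy_score(y_true, y_hat)
print(f"Balanced accuracy: {balanced:.4f}")
```

In the notebook, the same calls could be made with `y_test` and `y_pred` in place of the hypothetical vectors.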
