# Assignment A5-1 Car Rental Data Classification

_Subject_, **Artificial Intelligence**  
_Topic_, **Machine Learning**  
_Subtopic_, **Supervised Machine Learning: Classification**  

### Resources
* ...

### Content
_Only works in jupyter notebooks_
* ...

### Assignment
Build a Decision Tree model to predict which cars clients are likely to buy.  
See Activity 10 and Activity 11 from the textbook for details.

| class | N    | N[%]       | fullname     |
|-------|:----:|------------|--------------|
|unacc  | 1210 | (70.023 %) | unacceptable |
|acc    | 384  | (22.222 %) | acceptable   |
|good   | 69   | ( 3.993 %) | good         |
|vgood  | 65   | ( 3.762 %) | very good    |
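
The percentages in the table follow directly from the class counts. A minimal sketch of the arithmetic, using only the counts listed above (the variable names are illustrative):

```python
# Class counts copied from the table above
counts = {'unacc': 1210, 'acc': 384, 'good': 69, 'vgood': 65}

total = sum(counts.values())  # number of rows in the dataset

# Share of each class in percent, rounded to three decimals
shares = {name: round(100 * n / total, 3) for name, n in counts.items()}

for name, share in shares.items():
    print(f'{name:<6} {counts[name]:>5} ({share:6.3f} %)')
```

With roughly 70 % of the rows in one class, a model that always answers `unacc` would already reach about 70 % accuracy, which is worth keeping in mind when reading the score at the end.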

**Features**

| feature  | values                 | numeric values         |
|----------|:----------------------:|------------------------|
| buying   | v-high, high, med, low | 0.00, 0.33, 0.67, 1.00 | 
| maint    | v-high, high, med, low | 1.00, 0.67, 0.33, 0.00 |
| doors    | 2, 3, 4, 5-more        | 0.00, 0.33, 0.67, 1.00 | 
| persons  | 2, 4, more             | 0.00, 0.50, 1.00       |
| lug_boot | small, med, big        | 0.00, 0.50, 1.00       |
| safety   | low, med, high         | 0.00, 0.50, 1.00       |
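
The numeric values in the table are evenly spaced between 0 and 1. One way to produce such values, sketched here under the assumption that the categories of a feature are given in order, is to divide each position by the number of steps. The helper name `evenly_spaced` is hypothetical.

```python
def evenly_spaced(categories):
    """Map an ordered list of categories to evenly spaced values in [0, 1].

    Assumes at least two categories, otherwise there are no steps to divide by.
    """
    steps = len(categories) - 1
    return {cat: round(i / steps, 2) for i, cat in enumerate(categories)}


doors = evenly_spaced(['2', '3', '4', '5-more'])
lug_boot = evenly_spaced(['small', 'med', 'big'])

print(doors)
print(lug_boot)
```

The code cells in this notebook use the plain integer positions (0, 1, 2, 3) instead. For tree-based models both encodings give the same splits, since only the order of the values matters.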

## Random Forest Classifier
... page 455

## Prepare Environment

#### Imports

In [135]:
import pandas
import numpy as np

# sklearn for machine learning methods
from sklearn import tree
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

#### Global Constants

In [165]:
HEADERS = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'classifier']
CLASSIFIERS = ['unacc', 'acc', 'good', 'vgood']
DATA_URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'

## Prepare Data

Creates a dataframe from `DATA_URL`, using `HEADERS` as the column names  

In [137]:
df = pandas.read_csv(DATA_URL, names=HEADERS)
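
The data file has no header row, which is why `names=HEADERS` is passed. A minimal offline sketch of the same call, assuming a three-row in-memory sample in place of the download (the name `sample_df` is illustrative):

```python
import io
import pandas

HEADERS = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'classifier']

# Assumption: a three-row in-memory sample stands in for the file at DATA_URL
sample = io.StringIO(
    'vhigh,vhigh,2,2,small,low,unacc\n'
    'vhigh,vhigh,2,2,small,med,unacc\n'
    'vhigh,vhigh,2,2,small,high,unacc\n'
)

# names= supplies the column labels; without it the first data row
# would be consumed as the header
sample_df = pandas.read_csv(sample, names=HEADERS)
print(sample_df.shape)  # (3, 7)
```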

Displaying the shape to see the size of the dataset

In [138]:
df.shape

(1728, 7)

Displaying the top of the dataset to see the structure

In [139]:
df.head()

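|   | buying | maint | doors | persons | lug_boot | safety | classifier |
|---|--------|-------|-------|---------|----------|--------|------------|
| 0 | vhigh  | vhigh | 2     | 2       | small    | low    | unacc      |
| 1 | vhigh  | vhigh | 2     | 2       | small    | med    | unacc      |
| 2 | vhigh  | vhigh | 2     | 2       | small    | high   | unacc      |
| 3 | vhigh  | vhigh | 2     | 2       | med      | low    | unacc      |
| 4 | vhigh  | vhigh | 2     | 2       | med      | med    | unacc      |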


Displaying the information of the dataset, to see if any values should be modified

In [140]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   buying      1728 non-null   object
 1   maint       1728 non-null   object
 2   doors       1728 non-null   object
 3   persons     1728 non-null   object
 4   lug_boot    1728 non-null   object
 5   safety      1728 non-null   object
 6   classifier  1728 non-null   object
dtypes: object(7)
memory usage: 94.6+ KB


Encoding every categorical value as an ordinal integer, column by column

In [141]:
df.loc[(df.buying == 'low'),'buying'] = 0
df.loc[(df.buying == 'med'),'buying'] = 1
df.loc[(df.buying == 'high'),'buying'] = 2
df.loc[(df.buying == 'vhigh'),'buying'] = 3

##
df.loc[(df.maint == 'low'),'maint'] = 0
df.loc[(df.maint == 'med'),'maint'] = 1
df.loc[(df.maint == 'high'),'maint'] = 2
df.loc[(df.maint == 'vhigh'),'maint'] = 3

##
df.loc[(df.doors == '2'),'doors'] = 0
df.loc[(df.doors == '3'),'doors'] = 1
df.loc[(df.doors == '4'),'doors'] = 2
df.loc[(df.doors == '5more'),'doors'] = 3

##
df.loc[(df.persons == '2'),'persons'] = 0
df.loc[(df.persons == '4'),'persons'] = 1
df.loc[(df.persons == 'more'),'persons'] = 2

##
df.loc[(df.lug_boot == 'small'),'lug_boot'] = 0
df.loc[(df.lug_boot == 'med'),'lug_boot'] = 1
df.loc[(df.lug_boot == 'big'),'lug_boot'] = 2

##
df.loc[(df.safety == 'low'),'safety'] = 0
df.loc[(df.safety == 'med'),'safety'] = 1
df.loc[(df.safety == 'high'),'safety'] = 2

##
df.loc[(df.classifier == 'unacc'),'classifier'] = 0
df.loc[(df.classifier == 'acc'),'classifier'] = 1
df.loc[(df.classifier == 'good'),'classifier'] = 2
df.loc[(df.classifier == 'vgood'),'classifier'] = 3
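
The cell above repeats one assignment per category. A more compact variant, sketched here on a two-row sample frame, keeps one dictionary per column and applies it with `Series.map`. The names `ORDINAL_MAPS`, `sample_df` and `encoded` are illustrative, and only three of the seven columns are included to keep the sketch short.

```python
import pandas

# One lookup table per column, using the same integer codes as the cell above
ORDINAL_MAPS = {
    'buying': {'low': 0, 'med': 1, 'high': 2, 'vhigh': 3},
    'safety': {'low': 0, 'med': 1, 'high': 2},
    'classifier': {'unacc': 0, 'acc': 1, 'good': 2, 'vgood': 3},
}

# Assumption: a tiny hand-made frame instead of the full dataset
sample_df = pandas.DataFrame({
    'buying': ['vhigh', 'low'],
    'safety': ['low', 'high'],
    'classifier': ['unacc', 'vgood'],
})

encoded = sample_df.copy()
for column, mapping in ORDINAL_MAPS.items():
    # Values missing from the mapping become NaN, which makes typos visible
    encoded[column] = encoded[column].map(mapping)

# A single astype call converts every column at once
encoded = encoded.astype('float64')
print(encoded)
```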

The model only accepts numeric values, so every column is converted to `float64`

In [162]:
df['buying'] = df['buying'].astype('float64')
df['maint'] = df['maint'].astype('float64')
df['doors'] = df['doors'].astype('float64')
df['persons'] = df['persons'].astype('float64')
df['lug_boot'] = df['lug_boot'].astype('float64')
df['safety'] = df['safety'].astype('float64')
df['classifier'] = df['classifier'].astype('float64')

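|      | buying | maint | doors | persons | lug_boot | safety | classifier |
|------|--------|-------|-------|---------|----------|--------|------------|
| 1698 | 0.0    | 0.0   | 2.0   | 2.0     | 2.0      | 0.0    | 0.0        |
| 1699 | 0.0    | 0.0   | 2.0   | 2.0     | 2.0      | 1.0    | 2.0        |
| 1700 | 0.0    | 0.0   | 2.0   | 2.0     | 2.0      | 2.0    | 3.0        |
| 1701 | 0.0    | 0.0   | 3.0   | 0.0     | 0.0      | 0.0    | 0.0        |
| 1702 | 0.0    | 0.0   | 3.0   | 0.0     | 0.0      | 1.0    | 0.0        |
| 1703 | 0.0    | 0.0   | 3.0   | 0.0     | 0.0      | 2.0    | 0.0        |
| 1704 | 0.0    | 0.0   | 3.0   | 0.0     | 1.0      | 0.0    | 0.0        |
| 1705 | 0.0    | 0.0   | 3.0   | 0.0     | 1.0      | 1.0    | 0.0        |
| 1706 | 0.0    | 0.0   | 3.0   | 0.0     | 1.0      | 2.0    | 0.0        |
| 1707 | 0.0    | 0.0   | 3.0   | 0.0     | 2.0      | 0.0    | 0.0        |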


Displaying information about the columns

In [144]:
df.describe()

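|       | buying   | maint    | doors    | persons  | lug_boot | safety   | classifier |
|-------|----------|----------|----------|----------|----------|----------|------------|
| count | 1728.0   | 1728.0   | 1728.0   | 1728.0   | 1728.0   | 1728.0   | 1728.0     |
| mean  | 1.5      | 1.5      | 1.5      | 1.0      | 1.0      | 1.0      | 0.414931   |
| std   | 1.118358 | 1.118358 | 1.118358 | 0.816733 | 0.816733 | 0.816733 | 0.7407     |
| min   | 0.0      | 0.0      | 0.0      | 0.0      | 0.0      | 0.0      | 0.0        |
| 25%   | 0.75     | 0.75     | 0.75     | 0.0      | 0.0      | 0.0      | 0.0        |
| 50%   | 1.5      | 1.5      | 1.5      | 1.0      | 1.0      | 1.0      | 0.0        |
| 75%   | 2.25     | 2.25     | 2.25     | 2.0      | 2.0      | 2.0      | 1.0        |
| max   | 3.0      | 3.0      | 3.0      | 2.0      | 2.0      | 2.0      | 3.0        |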


Show the number of `null` values in the columns

In [145]:
df.isnull().sum()

buying        0
maint         0
doors         0
persons       0
lug_boot      0
safety        0
classifier    0
dtype: int64

Shows how the rows are distributed across the classes

In [146]:
df.groupby('classifier').size()

classifier
0.0    1210
1.0     384
2.0      69
3.0      65
dtype: int64

## Preprocess Data

Converts the `dataframe` to a NumPy array

In [147]:
dataset = df.values

Splits the array into `features` (every column except the last) and `labels` (the last column)

In [148]:
features, labels = dataset[:, :-1], dataset[:, -1]
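
The slice `[:, :-1]` keeps every column except the last, and `[:, -1]` keeps only the last. A small sketch on a hand-made two-row array:

```python
import numpy as np

# Two rows with six feature columns followed by one label column
dataset = np.array([
    [3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 2.0, 2.0, 2.0, 3.0],
])

features, labels = dataset[:, :-1], dataset[:, -1]

print(features.shape)  # (2, 6)
print(labels)          # [0. 3.]
```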

Separate input data into classes based on labels

In [149]:
class0 = np.array(features[labels==0])
class1 = np.array(features[labels==1])
class2 = np.array(features[labels==2])
class3 = np.array(features[labels==3])

Show the sizes of the per-class arrays

In [150]:
print('class 0', class0.shape)
print('class 1', class1.shape)
print('class 2', class2.shape)
print('class 3', class3.shape)

class 0 (1210, 6)
class 1 (384, 6)
class 2 (69, 6)
class 3 (65, 6)
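
The same class sizes can be read off without building one array per class. A sketch using `numpy.unique` with `return_counts=True`, on a short made-up label vector rather than the real `labels`:

```python
import numpy as np

# Assumption: a short made-up label vector in place of the real labels
labels = np.array([0.0, 0.0, 1.0, 0.0, 3.0, 2.0, 1.0])

# Distinct label values and how often each occurs
values, counts = np.unique(labels, return_counts=True)
for value, count in zip(values, counts):
    print('class', int(value), count)

# A boolean mask selects the rows that belong to one class
features = np.arange(len(labels) * 2).reshape(len(labels), 2)
print(features[labels == 1].shape)  # (2, 2)
```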


## Train Model

Split the dataset into training and testing sets in the proportion 8:2 
- 80% of it as training data
- 20% as test data

Initialize seed parameter for the random number generator used for the split

In [151]:
set_prop = 0.2
seed = 7

Separating the dataset into subsets for training and testing

In [152]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(features, labels, test_size=set_prop, random_state=seed)
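
Because the classes are strongly imbalanced, a purely random split can leave very few `good` or `vgood` rows in the test set. `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in both parts. The sketch below uses synthetic labels and shows a possible variation, not what this notebook does.

```python
import numpy as np
from sklearn import model_selection

# Assumption: synthetic data with a 70/30 class imbalance
features = np.arange(200).reshape(100, 2)
labels = np.array([0] * 70 + [1] * 30)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    features, labels, test_size=0.2, random_state=7, stratify=labels)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
print(np.bincount(y_test))          # [14  6]
```

The test set keeps the 70/30 ratio of the full label vector, so every class is represented in proportion.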

Random Forest

In [154]:
# classifier = DecisionTreeClassifier(**params)
classifier = RandomForestClassifier(n_estimators = 100, max_depth = 6)
classifier.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=6, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
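
The forest above is created without `random_state`, so repeated runs can give slightly different trees and scores. The sketch below, on synthetic data, fixes the seed and reads `feature_importances_` to see which inputs the forest relies on. The data and the rule behind the labels are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Assumption: 300 synthetic rows with three ordinal features in {0, 1, 2}
X = rng.integers(0, 3, size=(300, 3)).astype('float64')
# The label depends on the first feature only
y = (X[:, 0] == 2).astype('float64')

# random_state makes the fitted forest reproducible between runs
forest = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=7)
forest.fit(X, y)

# Importances are normalised to sum to 1; the first feature should dominate
importances = forest.feature_importances_
print(importances)
```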

## Test Model

In [164]:
# data 1:: 3.0, 3.0, 0.0, 0.0, 0.0, 0.0 => 0.0 | row 0
data_1 = [[3.0, 3.0, 0.0, 0.0, 0.0, 0.0]]
print('data 1', classifier.predict(data_1))

# data 2:: 0.0, 0.0, 3.0, 1.0, 0.0, 1.0 => 1.0 | row 1711
data_2 = [[0.0, 0.0, 3.0, 1.0, 0.0, 1.0]]
print('data 2', classifier.predict(data_2))

# data 3:: 0.0, 0.0, 3.0, 2.0, 1.0, 1.0 => 2.0 | row 1723
data_3 = [[0.0, 0.0, 3.0, 2.0, 1.0, 1.0]]
print('data 3', classifier.predict(data_3))

# data 4:: 0.0, 0.0, 3.0, 2.0, 2.0, 2.0 => 3.0 | row 1727
data_4 = [[0.0, 0.0, 3.0, 2.0, 2.0, 2.0]]
print('data 4', classifier.predict(data_4))

data 1 [0.]
data 2 [1.]
data 3 [2.]
data 4 [3.]
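
The predictions are the numeric codes assigned during encoding. Since the codes 0 to 3 follow the order of the `CLASSIFIERS` list, they can be turned back into class names by indexing. A small sketch, with the four predictions above hard-coded so that it runs on its own:

```python
import numpy as np

CLASSIFIERS = ['unacc', 'acc', 'good', 'vgood']

# Assumption: the four predictions printed above, hard-coded for the sketch
predictions = np.array([0.0, 1.0, 2.0, 3.0])

# Each code is the position of its class name in CLASSIFIERS
names = [CLASSIFIERS[int(code)] for code in predictions]
print(names)  # ['unacc', 'acc', 'good', 'vgood']
```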


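Predicting the labels of the test set and comparing them with the true labels

In [156]:
y_testp = classifier.predict(X_test)
print ("Accuracy is ", accuracy_score(y_test,y_testp))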

Accuracy is  0.953757225433526
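
Accuracy alone can hide weak results on the rare classes, since about 70 % of the rows are `unacc`. The `confusion_matrix` and `classification_report` functions imported at the top give a per-class view. The sketch below uses short made-up label vectors; in the notebook the inputs would be `y_test` and the predictions for `X_test`.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Assumption: made-up true and predicted labels for illustration
y_true = [0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 0, 2, 3]

# Rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_true, y_pred)
print(matrix)

accuracy = accuracy_score(y_true, y_pred)
print('Accuracy is ', accuracy)

# Precision, recall and F1 score for every class
report = classification_report(
    y_true, y_pred, target_names=['unacc', 'acc', 'good', 'vgood'])
print(report)
```

In this toy example one `acc` row is predicted as `unacc`, which shows up as the off-diagonal 1 in the second row of the matrix and as a recall of 0.5 for `acc`.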
