Dataset complete info: 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'

#### Attribute information

1. buying  $\quad$     v-high, high, med, low
2. maint    $\quad$    v-high, high, med, low
3. doors      $\quad$  2, 3, 4, 5-more
4. persons    $\quad$  2, 4, more
5. lug_boot    $\quad$ small, med, big
6. safety      $\quad$ low, med, high

In [1]:
# importing library
import pandas as pd 
import numpy as np
from sklearn.model_selection import train_test_split, cross_validate
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import plot_confusion_matrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA


### Putting the data into dataframe

In [2]:
# Loading the data
data='https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'

columns=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'acceptability']

data=pd.read_csv(data, names=columns)

data

Unnamed: 0,buying,maint,doors,persons,lug_boot,safety,acceptability
0,vhigh,vhigh,2,2,small,low,unacc
1,vhigh,vhigh,2,2,small,med,unacc
2,vhigh,vhigh,2,2,small,high,unacc
3,vhigh,vhigh,2,2,med,low,unacc
4,vhigh,vhigh,2,2,med,med,unacc
...,...,...,...,...,...,...,...
1723,low,low,5more,more,med,med,good
1724,low,low,5more,more,med,high,vgood
1725,low,low,5more,more,big,low,unacc
1726,low,low,5more,more,big,med,good


#### Data cleaning

1. buying $\quad$ low:0, med :1, high:2, vhigh:3

2. maint $\quad$ low':0, 'med':1,'high':2, 'vhigh':3

3. 'doors' $\quad$ '2':2,'3':3,'4':4,'5more':5

4. 'lug_boot' $\quad$ 'small':0,'med':1,'big':2

5. 'persons' $\quad$ '2':2, '4':4,'more':5

6. 'safety' $\quad$ 'low':0, 'med':1,'high':2

7. 'target' $\quad$ 'unacc': 0,'acc':1, 'good': 2, 'vgood' :3




In [3]:
data['buying'] = data['buying'].map({'low':0, 'med':1,'high':2, 'vhigh':3})
data['maint'] = data['maint'].map({'low':0, 'med':1,'high':2, 'vhigh':3})
data['doors']=data['doors'].map({'2':2,'3':3,'4':4,'5more':5})
data['lug_boot']=data['lug_boot'].map({'small':0,'med':1,'big':2})
data['persons'] = data['persons'].map({'2':2, '4':4,'more':5})
data['safety'] = data['safety'].map({'low':0, 'med':1,'high':2})
data['target']=data['acceptability'].map({'unacc': 0,'acc':1, 'good': 2, 'vgood' :3})


In [4]:
# selecting the required data 
X, y = data.iloc[:, :-2], data.iloc[:, -1]

#### Slipting the data

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

### Normalising the data

In [6]:
scaler=StandardScaler()
X_train=scaler.fit_transform(X_train)
X_test=scaler.transform(X_test)

#### Classifying the data

In [6]:
# Instances of Random Forest Classifier and fit with the normalised train
clf_forest=RandomForestClassifier(n_estimators=5, max_depth=None, 
                                    min_samples_split=2, random_state=0)
                                    
clf_forest.fit(X_train, y_train)
clf_forest.predict(X_test)
score_forest=clf_forest.score(X_train, y_train)


print('Accuracy of Random Forest :{:.4}'.format(score_forest))

Accuracy of Random Forest :0.9971


In [7]:
# Instances of Decission Tree Classifier and fit with the normalised train
clf_tree = DecisionTreeClassifier(random_state=0)
clf_tree = clf_tree.fit(X_train, y_train)
clf_tree.predict(X_test)
score_tree=clf_tree.score(X_train, y_train)

print('Accuracy of Decision Tree :{:.4}'.format(score_tree))

Accuracy of Decision Tree :0.7836


### Applying PCA on the data

In [8]:
# instances of PCA with n=2 components
pca = PCA(n_components=2)
X_pca_Train= pca.fit_transform(X_train)
X_pca_Test=pca.transform(X_test)


In [9]:
# fitting the random forest classification on  PCA features
pca_forest=clf_forest.fit(X_pca_Train,y_train)
clf_forest.predict(X_pca_Test)
clf_forest.predict_proba(X_pca_Test)
pca_score_forest=clf_forest.score(X_pca_Train, y_train)

# fitting Decission Tree classifier on the PCA features
pca_tree=clf_tree.fit(X_pca_Train, y_train)
clf_tree.predict(X_pca_Test)
clf_tree.predict_proba(X_pca_Test)
pca_score_tree=clf_tree.score(X_pca_Train, y_train)


In [10]:
print('Accuracy of PCA & Random Forest  :{:.4}'.format(score_tree))
print('Accuracy of PCA & Decision Tree :{:.4}'.format(score_tree))

Accuracy of PCA & Random Forest  :0.7836
Accuracy of PCA & Decision Tree :0.7836
