# Car Evaluation using Random Forest and Decision Trees
Welcome to this tutorial on implementing Random Forest and Decision Trees! Decision trees and their ensemble method, Random Forest, are popular machine learning techniques used for both classification and regression tasks. Decision Trees are simple to understand and interpret, making them a great choice for building intuitive models. On the other hand, Random Forest combines multiple decision trees to reduce overfitting and improve model accuracy.

In this tutorial, we will start by introducing the concepts behind Decision Trees and Random Forest. We will then implement both models with Python's scikit-learn library on a car evaluation dataset. Finally, we will compare their accuracy on held-out test data at two test-set sizes (30% and 25%).

Whether you're new to machine learning or an experienced practitioner looking to expand your skill set, this tutorial will provide you with a comprehensive guide to implementing Random Forest and Decision Trees. So, let's get started!

Import the necessary libraries

In [1]:
import numpy as np
import pandas as pd

Read the dataset into a DataFrame

In [2]:
dataset = pd.read_csv('car_evaluation.csv')
dataset.head()

Unnamed: 0,vhigh,vhigh.1,2,2.1,small,low,unacc
0,vhigh,vhigh,2,2,small,med,unacc
1,vhigh,vhigh,2,2,small,high,unacc
2,vhigh,vhigh,2,2,med,low,unacc
3,vhigh,vhigh,2,2,med,med,unacc
4,vhigh,vhigh,2,2,med,high,unacc
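The column labels in the output above (`vhigh`, `vhigh.1`, `2`, `2.1`, ...) look like a data row, which suggests the CSV has no header row and pandas has used the first record as the header. A minimal sketch of one way to avoid that, assuming the file has no header and its columns follow the UCI Car Evaluation layout. The column names below are an assumption, and the in-memory sample stands in for `car_evaluation.csv`:

```python
import io
import pandas as pd

# Three rows in the same layout as the output above, with no header row.
# This in-memory sample is a stand-in for car_evaluation.csv.
sample_csv = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "vhigh,vhigh,2,2,small,med,unacc\n"
    "vhigh,vhigh,2,2,small,high,unacc\n"
)

# Assumed column names, taken from the UCI Car Evaluation description;
# the file itself does not carry them.
column_names = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]

# header=None stops pandas from consuming the first data row as the header.
cars = pd.read_csv(sample_csv, header=None, names=column_names)
print(cars.shape)
print(cars.head())
```

With this approach the first record stays in the data instead of becoming the column labels.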


Split the features from the target labels

In [3]:
# Split Features from True values
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [6]:
# X is a NumPy array (because of .values), so it has no .head() method.
# Slice the first five rows instead.
X[:5]

Use the train_test_split function to split the data into training and test sets

In [4]:
from sklearn.model_selection import train_test_split

# Split twice, with test ratios of 0.30 and 0.25, to compare the final accuracy scores
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X1_train, X1_test, y1_train, y1_test = train_test_split(X, y, test_size=0.25, random_state=42)
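When the class labels are imbalanced, a plain random split can leave the test set with a different class mix than the training set. A small sketch of the `stratify` option of `train_test_split`, which keeps the class proportions equal in both parts. The toy arrays `X_demo` and `y_demo` are made up for illustration and are not the car data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 samples, 80 labelled "unacc" and 20 labelled "acc".
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array(["unacc"] * 80 + ["acc"] * 20)

# stratify=y_demo keeps the 80/20 class ratio in both the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print("acc samples in test set:", (y_te == "acc").sum())
```

Whether stratification changes the scores reported in this tutorial has not been checked here.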

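Use an ordinal encoder to convert the string categories to numeric values

In [5]:
from sklearn.preprocessing import OrdinalEncoder

# Encoder for the 30% test split: fit on the training data only, then apply to the test data
encoder = OrdinalEncoder()
X_train = encoder.fit_transform(X_train)
X_test = encoder.transform(X_test)

# Separate encoder for the 25% test split, so the two experiments do not share state
encoder1 = OrdinalEncoder()
X1_train = encoder1.fit_transform(X1_train)
X1_test = encoder1.transform(X1_test)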
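By default, OrdinalEncoder numbers the categories of each column in sorted order, so a column with the values low, med and high is coded alphabetically rather than by meaning. The sketch below contrasts the default with an explicit category order passed through the `categories` parameter. The one-column `safety` array is a made-up example, not the tutorial's data:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical one-column example with an obvious natural order.
safety = np.array([["low"], ["med"], ["high"], ["med"]])

# Default behaviour: categories are sorted alphabetically (high=0, low=1, med=2).
default_codes = OrdinalEncoder().fit_transform(safety)

# Explicit order: low=0, med=1, high=2.
ordered_encoder = OrdinalEncoder(categories=[["low", "med", "high"]])
ordered_codes = ordered_encoder.fit_transform(safety)

print(default_codes.ravel())  # [1. 2. 0. 2.]
print(ordered_codes.ravel())  # [0. 1. 2. 1.]
```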

Use a Random Forest classifier to fit the training features to the training labels

In [15]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# n_estimators is left at its default: 100 trees in scikit-learn 0.22 and later,
# 10 in earlier releases. The "10 decision-trees" label below assumes the older default.
rf = RandomForestClassifier(random_state=42)

# Fit and predict on the 30% test split
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Refit the same estimator on the 25% test split
rf.fit(X1_train, y1_train)
y1_pred = rf.predict(X1_test)

print('Random Forest accuracy score with 10 decision-trees at 30% testing: {0:0.4f}'.format(accuracy_score(y_test, y_pred)))
print('Random Forest accuracy score with 10 decision-trees at 25% testing: {0:0.4f}'.format(accuracy_score(y1_test, y1_pred)))


Random Forest accuracy score with 10 decision-trees at 30% testing: 0.9653
Random Forest accuracy score with 10 decision-trees at 25% testing: 0.9583
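The number of trees in a forest is controlled by the `n_estimators` parameter, and the fitted trees are stored in the `estimators_` attribute. A short sketch, on synthetic data from `make_classification` rather than the car dataset, of setting the tree count explicitly and confirming it after fitting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (not the car dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

# Set the number of trees explicitly instead of relying on the library default.
forest = RandomForestClassifier(n_estimators=10, random_state=42)
forest.fit(X_demo, y_demo)

print("requested trees:", forest.n_estimators)
print("fitted trees:   ", len(forest.estimators_))
```

Passing `n_estimators` explicitly makes any printed label about the tree count hold regardless of the installed scikit-learn version.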


Now do the same with a Decision Tree classifier and evaluate it with the accuracy metric

In [17]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

dt = DecisionTreeClassifier(random_state=42)

# Fit and predict on the 30% test split
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)

# Refit the same estimator on the 25% test split
dt.fit(X1_train, y1_train)
y1_pred = dt.predict(X1_test)

print('Decision Tree accuracy score at 30% testing: {0:0.4f}'.format(accuracy_score(y_test, y_pred)))
print('Decision Tree accuracy score at 25% testing: {0:0.4f}'.format(accuracy_score(y1_test, y1_pred)))

Decision Tree accuracy score at 30% testing: 0.9769
Decision Tree accuracy score at 25% testing: 0.9769
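Accuracy alone can hide weak performance on rare classes. A sketch of how per-class precision, recall and F1-score could be reported with `classification_report`, using synthetic imbalanced data from `make_classification` as a stand-in for the car dataset. The numbers it prints say nothing about the car results above:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced 3-class data (a stand-in, not the car dataset).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
y_hat = tree.predict(X_te)

# Per-class precision, recall and F1, plus macro and weighted averages.
report = classification_report(y_te, y_hat)
print(report)

# Macro F1 weights every class equally, so rare classes count as much as common ones.
macro_f1 = f1_score(y_te, y_hat, average="macro")
print("macro F1: {0:0.4f}".format(macro_f1))
```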


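For Random Forest:
In this run, accuracy was 0.9653 with a 30% test set and 0.9583 with a 25% test set. The gap is only 0.0070 and comes from a single random split, so it does not show that a larger test set makes the model more accurate. A larger test set gives a more reliable accuracy estimate but leaves less data for training.

For Decision Tree:
Accuracy was 0.9769 at both test sizes, so the test size made no difference in this run. The single tree also scored slightly higher than the Random Forest on both splits.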
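A single train/test split gives one accuracy number per model, which makes small differences hard to interpret. One common way to get a steadier comparison is k-fold cross-validation. The sketch below uses `cross_val_score` on synthetic data, not the car dataset, so the printed scores are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the car dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# 5-fold cross-validation: each model is trained and scored on 5 different splits.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print("{0}: mean={1:0.4f}, std={2:0.4f}".format(name, scores.mean(), scores.std()))
```

The standard deviation across folds gives a rough sense of how much of a gap between two models could be due to the split alone.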