# Regression Tree

***

# 1. Auto MPG Dataset

### Problem Description :

The goal of analysing the Auto MPG dataset is to understand which factors affect the fuel consumption of each car.

The dataset covers different variants of different models from various car manufacturers, originating from the USA, Japan and Europe.

A car's fuel consumption is affected by factors such as model year, horsepower, number of cylinders, displacement, weight and acceleration.

We need to find which factors affect fuel consumption the most in order to improve the mpg value.

Hence, we build a model to predict the mpg value of each car.


### Features in the dataset :

**cylinders**: the number of cylinders in the car

**displacement**: the engine displacement of the car

**horsepower**: the horsepower of the car

**weight**: the weight of the car

**acceleration**: the acceleration of the car

**model_year**: the model year of the car

**origin**: the country or region the car originates from

**car_name**: the name of the car (Brand-Model-Variant)

### Target Variable :

**mpg**: the fuel economy of the car (Brand-Model-Variant) in miles per gallon; a higher value means lower fuel consumption

In [5]:
# Import core libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# scikit-learn imports
from sklearn import datasets, tree
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Silence warnings, show pandas floats with two decimals,
# and suppress scientific notation in NumPy output
import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.float_format', lambda x: '%.2f' % x)
np.set_printoptions(suppress=True)

In [13]:
# Import data
from mlxtend.data import autompg_data
auto_X, auto_y = autompg_data()

In [14]:
# Convert to a pandas DataFrame and drop car_name (the NaN column)
columns = ['cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model_year', 'origin', 'car_name']
auto_X = pd.DataFrame(auto_X, columns=columns)
auto_X = auto_X.drop(['car_name'], axis=1)

In [15]:
# Create train test split
X_train, X_test, y_train, y_test = train_test_split(auto_X, auto_y, test_size=0.2, random_state=3)
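To make the effect of `test_size=0.2` and `random_state=3` concrete, here is a minimal sketch on placeholder arrays. It assumes the dataset has 392 rows (the figure documented by mlxtend, not shown in this notebook); the `*_demo` and `Xa_`/`Xb_` names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for auto_X / auto_y (assumed 392 rows)
n_rows = 392
X_demo = np.arange(n_rows).reshape(-1, 1)
y_demo = np.arange(n_rows)

# Same split settings as in the notebook
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3)

# Repeating the call with the same random_state gives the same split
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3)

print(len(Xa_train), len(Xa_test))       # 313 79
print(np.array_equal(Xa_test, Xb_test))  # True
```

The test size is rounded up (0.2 × 392 = 78.4 becomes 79), and fixing `random_state` is what makes the reported errors reproducible.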


In [16]:
# Instantiate the Decision Tree Regressor
# (a float min_samples_leaf is a fraction of the training samples)
dt = DecisionTreeRegressor(max_depth=8,
                           min_samples_leaf=0.13,
                           random_state=43)


# Fit dt to the training set
dt.fit(X_train,y_train)

DecisionTreeRegressor(max_depth=8, min_samples_leaf=0.13, random_state=43)
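Since the stated goal is to find which factors affect mpg the most, one option is to read the `feature_importances_` attribute of a fitted tree. The sketch below uses synthetic data (not the Auto MPG values) in which only `weight` drives the target, so the expected ranking is known in advance; `X_demo`, `y_demo` and `demo_tree` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: the target depends on weight only, plus noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "weight": rng.uniform(1500, 5000, 200),
    "acceleration": rng.uniform(8, 25, 200),
    "model_year": rng.integers(70, 83, 200),
})
y_demo = 50 - 0.008 * X_demo["weight"] + rng.normal(0, 1, 200)

demo_tree = DecisionTreeRegressor(max_depth=3, random_state=0)
demo_tree.fit(X_demo, y_demo)

# Importances are normalised to sum to 1; larger means more influential
importances = pd.Series(demo_tree.feature_importances_,
                        index=X_demo.columns).sort_values(ascending=False)
print(importances)
```

On the notebook's own model, the equivalent would be `pd.Series(dt.feature_importances_, index=X_train.columns)`.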

In [30]:
# Compute prediction
y_pred_dt = dt.predict(X_test)

y_pred_dt

array([29.62857143, 19.42931034, 14.50138889, 19.42931034, 24.19210526,
       24.19210526, 14.50138889, 33.60517241, 19.42931034, 14.50138889,
       14.50138889, 29.62857143, 24.19210526, 19.42931034, 24.19210526,
       33.60517241, 14.50138889, 33.60517241, 19.42931034, 33.60517241,
       24.19210526, 14.50138889, 19.42931034, 14.50138889, 29.62857143,
       14.50138889, 33.60517241, 14.50138889, 24.19210526, 19.42931034,
       24.19210526, 14.50138889, 29.62857143, 19.42931034, 24.19210526,
       29.62857143, 33.60517241, 24.19210526, 14.50138889, 29.62857143,
       24.19210526, 24.19210526, 19.42931034, 19.42931034, 14.50138889,
       14.50138889, 24.19210526, 14.50138889, 33.60517241, 19.42931034,
       29.62857143, 24.19210526, 24.19210526, 14.50138889, 19.42931034,
       14.50138889, 14.50138889, 14.50138889, 29.62857143, 33.60517241,
       19.42931034, 14.50138889, 14.50138889, 33.60517241, 14.50138889,
       24.19210526, 19.42931034, 33.60517241, 33.60517241, 24.19
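The predictions take only a handful of distinct values because a regression tree predicts the mean target of the leaf a sample falls into, and `min_samples_leaf=0.13` forces every leaf to hold at least 13% of the training samples. The sketch below illustrates this on synthetic data; the training-set size of 313 is an assumption (80% of an assumed 392 rows), and the `*_demo` names are illustrative.

```python
import math
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with an assumed training-set size of 313
rng = np.random.default_rng(0)
n_samples = 313
X_demo = rng.uniform(0, 10, size=(n_samples, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(0, 0.5, n_samples)

# Same hyperparameters as the notebook's tree
demo_tree = DecisionTreeRegressor(max_depth=8, min_samples_leaf=0.13,
                                  random_state=43)
demo_tree.fit(X_demo, y_demo)

# A float min_samples_leaf means ceil(fraction * n_samples) samples per leaf
min_leaf = math.ceil(0.13 * n_samples)

# Count how many training samples land in each leaf
leaf_ids = demo_tree.apply(X_demo)
_, leaf_sizes = np.unique(leaf_ids, return_counts=True)

n_distinct = len(np.unique(demo_tree.predict(X_demo)))
print(min_leaf)                    # 41
print(leaf_sizes.min() >= min_leaf)
print(demo_tree.get_n_leaves(), n_distinct)
```

With at least 41 samples per leaf, 313 training samples allow at most 7 leaves, so `max_depth=8` is never the binding constraint here.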

In [21]:
# Evaluate the regression tree on the test set
mse_dt = mean_squared_error(y_test, y_pred_dt)
print("Test set MSE of dt: {:.2f}".format(mse_dt))

Test set MSE of dt: 10.79
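As a reminder of what this metric measures, the MSE is the mean of the squared differences between true and predicted values. A minimal sketch on three made-up values (not taken from the dataset) checks a manual computation against `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration only
y_true_demo = np.array([30.0, 20.0, 15.0])
y_pred_demo = np.array([28.0, 20.0, 19.0])

# Errors are 2, 0 and -4, so the squared errors are 4, 0 and 16
manual_mse = np.mean((y_true_demo - y_pred_demo) ** 2)
sklearn_mse = mean_squared_error(y_true_demo, y_pred_demo)

print(round(manual_mse, 2), round(sklearn_mse, 2))  # 6.67 6.67
```

Because the errors are squared, large misses dominate the score, and the result is in squared mpg units.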


- Comparison of the regression tree with a linear regression model

In [23]:
# Instantiate Linear Regression Model

lr = LinearRegression()

# Fit lr to the training set
lr.fit(X_train, y_train)

LinearRegression()

In [31]:
# Compute prediction
y_pred_lr = lr.predict(X_test)
y_pred_lr

array([31.50868688, 18.94367973, 19.41345388, 24.28338033, 30.6217465 ,
       25.18311937, 19.9290987 , 33.9178569 , 21.93520944, 12.77446455,
       11.27400488, 26.61414123, 22.98984846, 16.61043794, 17.69916619,
       30.81095602, 11.29894509, 24.45558915, 24.4685327 , 32.27846427,
       26.51282898, 11.77528025, 18.60866913, 11.63974292, 28.921882  ,
       19.90390673, 37.09221513,  9.90038702, 25.57597615, 19.88422995,
       30.63431058, 15.0085588 , 29.02702802, 22.93580201, 25.1624859 ,
       33.69945353, 35.28630001, 30.29636007, 13.65835098, 31.87232495,
       25.92587932, 27.45557969, 23.7626905 , 22.83768405, 17.59943656,
       10.28690147, 30.52840529, 11.68711525, 32.71700301, 21.08465085,
       33.35236791, 24.0191807 , 27.3157973 ,  7.42848767, 22.86188327,
       10.69940231, 15.91357732, 14.29756733, 32.39085296, 34.19432232,
       25.27345999, 13.3419616 , 13.77667505, 36.64015761, 10.72746844,
       26.47684883, 22.22465871, 32.0542889 , 30.252116  , 24.11

In [27]:
# Evaluate the linear regression model on the test set
mse_lr = mean_squared_error(y_test, y_pred_lr)
print("Test set MSE of lr: {:.2f}".format(mse_lr))

Test set MSE of lr: 13.04


In [37]:
# Comparison

# Print mse_lr
print('Linear Regression test set MSE: {:.2f}'.format(mse_lr))

# Print mse_dt
print('Regression Tree test set MSE: {:.2f}'.format(mse_dt))

Linear Regression test set MSE: 13.04
Regression Tree test set MSE: 10.79
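On this split the regression tree has the lower test error. Since MSE is expressed in squared mpg, taking the square root (RMSE) gives an error in mpg, which is easier to interpret. A small sketch using the two MSE values printed above (the `*_reported` names are illustrative):

```python
import math

# MSE values copied from the output above
mse_lr_reported = 13.04
mse_dt_reported = 10.79

# RMSE is the square root of MSE, expressed in mpg
rmse_lr = math.sqrt(mse_lr_reported)
rmse_dt = math.sqrt(mse_dt_reported)

print('Linear Regression test set RMSE: {:.2f}'.format(rmse_lr))  # 3.61
print('Regression Tree test set RMSE: {:.2f}'.format(rmse_dt))    # 3.28
```

Both models are therefore off by roughly 3 to 4 mpg on a typical test car, with the tree about 0.3 mpg better on this particular split.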
