# Tree and Ensemble Regressors #

This is the last type of regressor I will try on the accident data. If none of these produce good results, I'm going to give up on regression for this data and concentrate only on classification, i.e. predicting locations that will likely have at least one accident.

In [1]:
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn import tree
from sklearn import ensemble

In [2]:
# read in data set with categorical variables turned into dummy variables
df = pd.read_csv('data/cleaned_data/md_dum.csv')

# create X and y values for modeling
car_y = df.car_acc_score
car_X = df.drop(columns=['Unnamed: 0', 'car_acc_score', 'car_dens_score', 'bike_dens_score'])
bike_y = df.bike_acc_score
bike_X = df.drop(columns=['Unnamed: 0', 'bike_acc_score', 'car_dens_score', 'bike_dens_score'])

In [3]:
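# do train test split; both splits stratify on car_y with the same random_state,
# so the car and bike train/test sets contain the same rows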
X_car_train, X_car_test, y_car_train, y_car_test = train_test_split(car_X, car_y, test_size=0.3, random_state=18,
                                                                   shuffle=True, stratify=car_y)
X_bike_train, X_bike_test, y_bike_train, y_bike_test = train_test_split(bike_X, bike_y, test_size=0.3, random_state=18,
                                                                   shuffle=True, stratify=car_y)

### Decision Tree Regressor ###

In [4]:
dtr = tree.DecisionTreeRegressor()

In [17]:
dtr.fit(X_car_train, y_car_train)
pred = dtr.predict(X_car_test)
dtr_r2 = r2_score(y_car_test, pred)
print('Decision Tree Regression Score on car accidents: {}'.format(dtr_r2))

Decision Tree Regression Score on car accidents: -0.4292589072573687
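
A negative score like the one above can look like an error, so here is a minimal sketch of how R² is computed and why it can drop below zero. The helper `r_squared` and the toy values are hypothetical and only for illustration; the notebook itself relies on `r2_score` from scikit-learn.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # total variation of the targets around their own mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    # variation left unexplained by the predictions
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


# toy targets (hypothetical values, not from the accident data)
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)                      # exact predictions
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])       # always predict the mean
reversed_ = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])      # worse than the mean

print(perfect, baseline, reversed_)
```

A score of 0 corresponds to always predicting the mean of the test targets, so a negative score means the model did worse on the test set than that constant baseline would have.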


In [5]:
dtr.fit(X_bike_train, y_bike_train)
pred = dtr.predict(X_bike_test)
dtr_r2 = r2_score(y_bike_test, pred)
print('Decision Tree Regression Score on bike accidents: {}'.format(dtr_r2))

Decision Tree Regression Score on bike accidents: 0.03306168329917303


### Random Forest Regressor ###

In [6]:
rfr = ensemble.RandomForestRegressor()

In [6]:
rfr.fit(X_car_train, y_car_train)
car_score = rfr.score(X_car_test, y_car_test)
print('Random Forest Regression Score on car accidents: {}'.format(car_score))

Random Forest Regression Score on car accidents: 0.10908453089581127


In [7]:
rfr.fit(X_bike_train, y_bike_train)
bike_score = rfr.score(X_bike_test, y_bike_test)
print('Random Forest Regression Score on bike accidents: {}'.format(bike_score))

Random Forest Regression Score on bike accidents: 0.4514587420452767


### Bagging Regressor ###

In [8]:
br = ensemble.BaggingRegressor()

In [10]:
br.fit(X_car_train, y_car_train)
car_score = br.score(X_car_test, y_car_test)
print('Bagging Regression Score on car accidents: {}'.format(car_score))

Bagging Regression Score on car accidents: 0.024822322498819327


In [9]:
br.fit(X_bike_train, y_bike_train)
bike_score = br.score(X_bike_test, y_bike_test)
print('Bagging Regression Score on bike accidents: {}'.format(bike_score))

Bagging Regression Score on bike accidents: 0.4220904757449081


### Gradient Boosting Regressor ###

In [10]:
gbr = ensemble.GradientBoostingRegressor()

In [11]:
gbr.fit(X_car_train, y_car_train)
car_score = gbr.score(X_car_test, y_car_test)
print('Gradient Boosting Regression Score on car accidents: {}'.format(car_score))

Gradient Boosting Regression Score on car accidents: 0.09855219511407176


In [11]:
gbr.fit(X_bike_train, y_bike_train)
bike_score = gbr.score(X_bike_test, y_bike_test)
print('Gradient Boosting Regression Score on bike accidents: {}'.format(bike_score))

Gradient Boosting Regression Score on bike accidents: 0.2760963993934691


### AdaBoost Regressor ###

In [12]:
abr = ensemble.AdaBoostRegressor()

In [12]:
abr.fit(X_car_train, y_car_train)
car_score = abr.score(X_car_test, y_car_test)
print('AdaBoost Regression Score on car accidents: {}'.format(car_score))

AdaBoost Regression Score on car accidents: -0.5292187350971909


In [13]:
abr.fit(X_bike_train, y_bike_train)
bike_score = abr.score(X_bike_test, y_bike_test)
print('AdaBoost Regression Score on bike accidents: {}'.format(bike_score))

AdaBoost Regression Score on bike accidents: -0.15733427820141488
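
The fit-and-score cells above all follow the same pattern, so they could be condensed into one loop. This is only a sketch under assumptions: it uses synthetic data from `make_regression` as a stand-in for `md_dum.csv`, and the fixed `random_state` values and the `models` and `scores` names are illustrative additions that are not in the original cells. For these regressors, `.score` returns the same R² that `r2_score` computes.

```python
from sklearn import ensemble, tree
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# synthetic stand-in data (assumption: the real notebook loads md_dum.csv)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=18)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=18)

# same five model types as above, with default settings apart from the seed
models = {
    'Decision Tree': tree.DecisionTreeRegressor(random_state=18),
    'Random Forest': ensemble.RandomForestRegressor(random_state=18),
    'Bagging': ensemble.BaggingRegressor(random_state=18),
    'Gradient Boosting': ensemble.GradientBoostingRegressor(random_state=18),
    'AdaBoost': ensemble.AdaBoostRegressor(random_state=18),
}

# fit each model and record its R^2 on the held-out test set
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print('{} Regression Score: {}'.format(name, scores[name]))
```

Collecting the scores in a dictionary makes it easy to compare models side by side, and fixing the seeds keeps the numbers stable between runs, which the default constructors used above do not guarantee.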


None of these performed well: the best test R² was about 0.45 (random forest on bike accidents), and no model scored above about 0.11 on car accidents. Going forward, I am going to focus on classifiers for this project, not regressors. The next step is trying tree and ensemble classifiers. [Go >>](Testing%20Models%20-%20Tree%20and%20Ensemble%20Classifiers.ipynb)