# Assignment 5

**Instructions**

This assignment continues Assignment 3, where linear regression was used to model the data. Here we use another supervised model, the decision tree, to model `total_conversion`. We will then compare the results and see which model, OLS or the decision tree, performs better.

1. Download the data set again and read it into a pandas DataFrame.
2. Using sklearn, instantiate a simple regression model (`LinearRegression` class). This will be the baseline model.
3. Use `cross_val_score` to find the R-squared, MSE, and MAE for the regression model. (See the code below for reference.)
4. Using sklearn, train a decision tree regressor (`DecisionTreeRegressor` class).
5. Using `GridSearchCV`, find the best hyperparameters for the decision tree regressor.
6. Using `cross_val_score`, observe how the best model performs on the dataset.
7. With respect to R-squared, MSE, and MAE, compare the decision tree regressor to the simple linear regression model.
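
The steps above can be sketched roughly as follows. This is a minimal sketch, not the expected solution: it uses synthetic data from `make_regression` as a stand-in for the assignment data, and the helper name `cv_metrics` and the small hyperparameter grid are assumptions made for illustration. Replace `X` and `y` with the feature columns and the `total_conversion` target from your own DataFrame.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): swap in your own features and target.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)


def cv_metrics(model, X, y, cv=5):
    """Return the mean cross-validated R-squared, MSE, and MAE for a model."""
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    # sklearn reports errors as negated values, so flip the sign.
    mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error").mean()
    mae = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error").mean()
    return {"r2": r2, "mse": mse, "mae": mae}


# Steps 2-3: baseline linear regression.
baseline = cv_metrics(LinearRegression(), X, y)

# Steps 4-5: tune a decision tree regressor (illustrative grid only).
grid_search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    {"max_depth": [3, 5, None], "min_samples_leaf": [1, 2, 4]},
    cv=5,
)
grid_search.fit(X, y)

# Step 6: cross-validate a fresh tree built with the best hyperparameters.
tree = cv_metrics(
    DecisionTreeRegressor(random_state=0, **grid_search.best_params_), X, y
)

# Step 7: print the two models side by side.
for name, result in [("Linear regression", baseline), ("Decision tree", tree)]:
    print(f"{name}: R2={result['r2']:.3f}  MSE={result['mse']:.1f}  MAE={result['mae']:.1f}")
```

Because the synthetic target is generated from a linear relationship, the numbers printed here say nothing about which model will win on the real data.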

Which model has the higher R-squared and the lower MAE: linear regression or the decision tree regressor?
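
When answering, keep in mind that a higher R-squared is better, while a lower MSE or MAE is better. As a reminder of what the three metrics measure, here is a minimal pure-Python sketch of their standard definitions. The function names and the four made-up data points are illustrative assumptions; in the assignment itself, sklearn computes these values for you.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only: three perfect predictions, one off by 2.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 2.0]

print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("R-squared:", round(r_squared(y_true, y_pred), 3))
```

MSE squares each residual, so a single large error moves it more than it moves MAE. This is why the two error metrics can rank models differently.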


In [None]:
# Install libraries (only if not installed before)
# !python3 -m pip install numpy pandas matplotlib scipy scikit-learn

In [None]:
# Download the CSV data
!curl -o conversion-data.csv -H "Accept: application/csv" -X GET https://raw.githubusercontent.com/danyentezari/bignumber-material/master/SPML%20Dubai/mod3/conversion-data.csv

In [None]:

"""
Example of finding best hyperparameters and 
cross-validating a decision tree regressor.

This code uses toy data. Use this code as reference.
Adapt it for your dataset and for comparing the decision tree 
to the linear regression model.
"""

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the diabetes toy dataset bundled with scikit-learn
# (load_boston is no longer available in scikit-learn 1.2 and later)
X, y = load_diabetes(return_X_y=True)

# Create the Decision Tree Regressor
model_dtr = DecisionTreeRegressor()

# Define the hyperparameter grid for GridSearchCV
param_grid = {
    'max_depth': [3, 5, 7, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# Perform a grid search with cross-validation
grid_search = GridSearchCV(model_dtr, param_grid, cv=5)
grid_search.fit(X, y)

# Print the best hyperparameters and the best mean cross-validated score
# (R-squared, the default score for regressors)
print("Best Parameters: ", grid_search.best_params_)
print("Best Score: ", grid_search.best_score_)

# Use cross_val_score on model with best hyperparameters
model_dtr_best_hyperparameter = DecisionTreeRegressor(**grid_search.best_params_)
scores = cross_val_score(model_dtr_best_hyperparameter, X, y, cv=5, scoring='r2')
print("CV R-squared: ", scores, 'Mean Score:', scores.mean())

# sklearn returns negated errors (so that higher is better); flip the sign to report MSE and MAE
scores = -cross_val_score(model_dtr_best_hyperparameter, X, y, cv=5, scoring='neg_mean_squared_error')
print("CV MSE: ", scores, 'Mean Score:', scores.mean())

scores = -cross_val_score(model_dtr_best_hyperparameter, X, y, cv=5, scoring='neg_mean_absolute_error')
print("CV MAE: ", scores, 'Mean Score:', scores.mean())