Now that we have our trained models, we can make predictions and evaluate their performance. There is no need to retrain the models on the training data, since RandomizedSearchCV refits the best estimator on the full training set by default (refit=True).
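As a quick illustration of that refit behaviour, here is a minimal sketch on synthetic data. The data, the parameter values, and the choice of SVR here are assumptions made for the example; they are not the search space used for the saved models.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Synthetic regression data (illustrative only, not the project's dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

search = RandomizedSearchCV(
    SVR(),
    param_distributions={"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1]},
    n_iter=4,
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)

# refit defaults to True, so best_estimator_ has already been refitted
# on all of X and y and can predict without another call to fit()
refit_default = search.refit
predictions = search.best_estimator_.predict(X[:5])

# The scorer is negated RMSE, so flip the sign to read it as an error
cv_rmse = -search.best_score_
```

Because the scorer returns a negated error, a stored best score needs its sign flipped before it is compared with a test-set RMSE.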

In [8]:
import joblib 
import pandas as pd
import numpy as np
import json
from sklearn.metrics import root_mean_squared_error
from sklearn.metrics import mean_absolute_error
import sys
sys.path.append('..')  # add the parent (project root) directory to the import path

In [6]:
svr_model = joblib.load('../models/svr_model.joblib')
rfr_model = joblib.load('../models/rfr_model.joblib')
dummy_model = joblib.load('../models/dummy_regressor.joblib')

with open('../models/svr_grid_model_score.json', 'r') as file:
    svr_log_score = json.load(file)
    
with open('../models/rfr_grid_model_score.json', 'r') as file:
    rfr_log_score = json.load(file)


In [3]:
X_test = pd.read_csv("../data/interim/X_test.csv", index_col=0)  # the models expect untransformed features, so load the interim X_test
y_test = pd.read_csv("../data/interim/y_test.csv", index_col=0)
y_test_transformed = pd.read_csv("../data/processed/y_test_transformed.csv", index_col=0)

In [11]:
dummy_predictions_log = dummy_model.predict(X_test) # gives a prediction in log space
dummy_predictions = np.exp(dummy_predictions_log) 

dummy_predictions_rsme = root_mean_squared_error(y_test, dummy_predictions)
dummy_log_predictions_rsme = root_mean_squared_error(y_test_transformed, dummy_predictions_log)
dummy_predictions_mae = mean_absolute_error(y_test, dummy_predictions)
dummy_log_predictions_mae = mean_absolute_error(y_test_transformed, dummy_predictions_log)


print(f"Dummy predictions RMSE in dollars: ${dummy_predictions_rsme:,.2f}")
print(f"Dummy log predictions RMSE: {dummy_log_predictions_rsme:.4f}")
print(f"Dummy predictions MAE in dollars: ${dummy_predictions_mae:,.2f}")

Dummy predictions RMSE in dollars: $386,309.06
Dummy log predictions RMSE: 0.5289
Dummy predictions MAE in dollars: $225,367.89
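The cell above converts log-space predictions back to dollars with np.exp. A minimal sketch of that round trip follows, assuming the target was transformed with a plain natural log; if np.log1p had been used instead, the matching inverse would be np.expm1. The dollar values are made up for illustration.

```python
import numpy as np

# Illustrative dollar values (not taken from the dataset)
dollars = np.array([250_000.0, 560_000.0, 1_200_000.0])

# Forward transform assumed for the target: natural log
log_dollars = np.log(dollars)

# Inverse transform applied to predictions made in log space
recovered = np.exp(log_dollars)

# The round trip returns the original values up to floating-point error
round_trip_ok = np.allclose(recovered, dollars)
```

Errors measured in log space and in dollars are not directly comparable, which is why both versions of each metric are computed.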


Compare the cross-validation scores saved for the final models with the log-space RMSE on the test set to check whether a model is overfitting or underfitting.
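One way to make that comparison explicit is a small helper like the sketch below. The function name, the 10% tolerance, and the example numbers are hypothetical choices for illustration, not part of the original workflow.

```python
def compare_cv_to_test(cv_rmse: float, test_rmse: float, tolerance: float = 0.10) -> str:
    """Describe the gap between cross-validated and held-out RMSE.

    tolerance is a hypothetical threshold on the relative gap;
    pick a value that suits the problem.
    """
    relative_gap = (test_rmse - cv_rmse) / cv_rmse
    if relative_gap > tolerance:
        return "possible overfitting"
    return "no sign of overfitting"


# Illustrative values only
similar = compare_cv_to_test(cv_rmse=0.30, test_rmse=0.27)
worse_on_test = compare_cv_to_test(cv_rmse=0.30, test_rmse=0.40)
```

Underfitting is better judged against the dummy baseline: a model whose error is close to the baseline's is not capturing much signal, regardless of how small the gap is.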

In [12]:
svr_predictions_log = svr_model.predict(X_test) # gives a prediction in log space
svr_predictions = np.exp(svr_predictions_log) 

svr_predictions_rsme = root_mean_squared_error(y_test, svr_predictions)
svr_log_predictions_rsme = root_mean_squared_error(y_test_transformed, svr_predictions_log)
svr_predictions_mae = mean_absolute_error(y_test, svr_predictions)
svr_log_predictions_mae = mean_absolute_error(y_test_transformed, svr_predictions_log)

print(f"SVR predictions RMSE in dollars: ${svr_predictions_rsme:,.2f}")
print(f"SVR log predictions RMSE: {svr_log_predictions_rsme:.4f}")
print(f"SVR Grid Search RMSE: {-svr_log_score:.4f}")
print(f"SVR predictions MAE in dollars: ${svr_predictions_mae:,.2f}")

SVR predictions RMSE in dollars: $201,325.85
SVR log predictions RMSE: 0.2669
SVR Grid Search RMSE: 0.2902
SVR predictions MAE in dollars: $107,141.97


In [10]:
rfr_predictions_log = rfr_model.predict(X_test)
rfr_predictions = np.exp(rfr_predictions_log)

rfr_predictions_rsme = root_mean_squared_error(y_test, rfr_predictions)
rfr_predictions_log_rsme = root_mean_squared_error(y_test_transformed, rfr_predictions_log)
rfr_predictions_mae = mean_absolute_error(y_test, rfr_predictions)
rfr_log_predictions_mae = mean_absolute_error(y_test_transformed, rfr_predictions_log)

print(f"RFR predictions RMSE in dollars: ${rfr_predictions_rsme:,.2f}")
print(f"RFR log predictions RMSE: {rfr_predictions_log_rsme:.4f}")
print(f"RFR Grid Search RMSE: {-rfr_log_score:.4f}")
print(f"RFR predictions MAE in dollars: ${rfr_predictions_mae:,.2f}")

RFR predictions RMSE in dollars: $240,284.93
RFR log predictions RMSE: 0.2774
RFR Grid Search RMSE: 0.3001
RFR predictions MAE in dollars: $118,114.06


Both models have a lower RMSE than the dummy regressor, so they are learning something beyond the baseline. The support vector regressor beats the random forest regressor on both RMSE and MAE. For both models the log-space test RMSE (0.2669 for SVR, 0.2774 for RFR) is slightly lower than the cross-validated score from the search (0.2902 and 0.3001), so there is no sign of overfitting. The SVR's RMSE of roughly $201k describes the typical size of a prediction error, with larger mistakes weighted more heavily; it is not an upper bound on the error. Its MAE tells us that our predictions are off by roughly $107k on average, which is about 19% of the mean of approximately $560k.
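The 19% figure and the difference between the two metrics can be checked with a short sketch. The MAE and the mean are the values quoted above; the two toy error arrays are invented purely to show how a single large miss raises RMSE while leaving MAE unchanged.

```python
import numpy as np

# Values quoted in the text above
svr_mae_dollars = 107_141.97
mean_target = 560_000  # approximate mean, as stated in the text
relative_mae = svr_mae_dollars / mean_target  # roughly 0.19


def mae(errors: np.ndarray) -> float:
    """Mean absolute error of a vector of residuals."""
    return float(np.mean(np.abs(errors)))


def rmse(errors: np.ndarray) -> float:
    """Root mean squared error of a vector of residuals."""
    return float(np.sqrt(np.mean(errors ** 2)))


# Toy residuals (illustrative): same MAE, very different RMSE
errors_even = np.array([100.0, 100.0, 100.0, 100.0])
errors_outlier = np.array([10.0, 10.0, 10.0, 370.0])

mae_even, rmse_even = mae(errors_even), rmse(errors_even)
mae_outlier, rmse_outlier = mae(errors_outlier), rmse(errors_outlier)
```

Both toy arrays have an MAE of 100, but the array with one large miss has a much higher RMSE. The same effect explains why the dollar RMSE here is nearly double the MAE: a few large misses dominate the squared error.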