# Model Evaluation

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_final = pd.read_csv('df_final_after_in-depth_1.csv').set_index("Unnamed: 0")

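After comparing the scores of various regression models, we chose Random Forest. The scores showed no difference in prediction accuracy between the scaled and unscaled data, so we use the unscaled dataframe. This is expected for tree-based models: their splits depend only on the ordering of each feature's values, and scaling does not change that ordering.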

### Random Forest

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Split the data into train and test sets
X = df_final.drop("Weekly_Sales", axis=1)
y = df_final["Weekly_Sales"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the Random Forest model
forest = RandomForestRegressor(n_estimators=100, random_state=0, max_depth=10, min_samples_split=25)
forest.fit(X_train, y_train)

# Make predictions
y_pred_train_forest = forest.predict(X_train)
y_pred_test_forest = forest.predict(X_test)

# Calculate R2 score
r_squared_score_train_forest = r2_score(y_train, y_pred_train_forest)
r_squared_score_test_forest = r2_score(y_test, y_pred_test_forest)

print(f"R2 train score: {r_squared_score_train_forest:.4f} R2 test score: {r_squared_score_test_forest:.4f}")


R2 train score: 0.9236 R2 test score: 0.9211


### Evaluate the model using different types of errors

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

mae = mean_absolute_error(y_test, y_pred_test_forest)
mse = mean_squared_error(y_test, y_pred_test_forest)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_test_forest)

print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("R-squared (R2) Score:", r2)

Mean Absolute Error: 3087.9719406216536
Mean Squared Error: 31786931.181501385
Root Mean Squared Error: 5637.989994803235
R-squared (R2) Score: 0.9210709905073468


The MAE of the random forest model is approximately 3,087.97, so on average a prediction is off by about 3,088 in weekly sales. Relative to the range of weekly sales this is a low error. It indicates high model accuracy: the predictions are close to the actual values in the data set.

The MSE is approximately 31,786,931. This figure looks large, but it is measured in squared units of weekly sales, so it cannot be compared directly with the range of weekly sales. Because each error is squared, the MSE is driven mostly by the largest errors.

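The RMSE is the square root of the MSE. Here it is approximately 5,637.99, expressed in the same units as weekly sales. Like the MAE, it is low relative to the range of weekly sales, which indicates high model accuracy. The RMSE is noticeably larger than the MAE (about 5,638 vs. 3,088). This suggests that some predictions have considerably larger errors than the typical one.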
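Squaring magnifies large errors. As a result, the RMSE is always at least as large as the MAE, and the gap widens when a few predictions are far off. A small worked example with made-up values (not the notebook's data) shows the effect. Four predictions are off by 1 and one is off by 10:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical toy values: four errors of 1 and a single error of 10
y_true = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
y_pred = np.array([101.0, 199.0, 301.0, 399.0, 510.0])

mae = mean_absolute_error(y_true, y_pred)  # mean of |errors| = 14 / 5
mse = mean_squared_error(y_true, y_pred)   # mean of squared errors = 104 / 5, in squared units
rmse = np.sqrt(mse)                        # back in the original units

print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  RMSE: {rmse:.2f}")
# MAE: 2.80  MSE: 20.80  RMSE: 4.56
```

The single error of 10 contributes 100 of the 104 in the squared-error sum. That one error lifts the RMSE well above the MAE.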

Lastly, the R2 score on the test set is 0.921 (92.1%). This means the model explains about 92.1% of the variance in weekly sales, the target variable. A perfect fit has an R2 of 1 (100%), so this indicates a strong fit.
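R2 compares the model's squared errors with those of a baseline that always predicts the mean: `R2 = 1 - SS_res / SS_tot`. The sketch below computes it by hand on made-up values (not the notebook's data) and checks the result against scikit-learn's `r2_score`. Here `SS_res = 104` and `SS_tot = 100,000`, so `R2 = 0.99896`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical toy values, used only to illustrate the formula
y_true = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
y_pred = np.array([101.0, 199.0, 301.0, 399.0, 510.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares (model errors)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares (errors of the mean baseline)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```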

In summary, the scores suggest that the random forest fits the data well and performs well overall. The train and test R2 scores are also nearly identical (0.9236 vs. 0.9211), which suggests the model is not overfitting.

### Improve your model by using grid search

In [None]:
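from sklearn.model_selection import GridSearchCV

# Hyperparameter grid; max_depth takes the values 4 to 9
param_grid = {'max_depth': list(range(4, 10)),
              'n_estimators': [50, 100, 200],
              'min_samples_split': [10, 20, 30]
             }

# 5-fold cross-validation on the training set; for a regressor the default score is R2.
# Settings of `forest` that are not in the grid (e.g. random_state=0) are kept.
grid_search = GridSearchCV(forest, param_grid, cv=5, return_train_score=True)
grid_search.fit(X_train, y_train)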

best_max_depth = grid_search.best_params_['max_depth']
best_n_estimators = grid_search.best_params_['n_estimators']
best_min_samples_split = grid_search.best_params_['min_samples_split']

print("Best Parameters: {}".format(grid_search.best_params_))
print("Best Max Depth: {}".format(best_max_depth))
print("Best Number of Estimators: {}".format(best_n_estimators))
print("Best Min Samples Split: {}".format(best_min_samples_split))

Best Parameters: {'max_depth': 9, 'min_samples_split': 10, 'n_estimators': 200}
Best Max Depth: 9
Best Number of Estimators: 200
Best Min Samples Split: 10
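The grid search reports the best hyperparameters, but it does not yet show how the tuned model performs on the held-out test set. All three best values also lie on the edge of their search ranges: `max_depth` 9 is the largest of 4 to 9, `n_estimators` 200 the largest value, and `min_samples_split` 10 the smallest. That pattern suggests the grid could be widened. The sketch below illustrates both checks. It uses synthetic data from `make_regression` and a deliberately small grid, both stand-ins rather than the notebook's data or grid. With the default `refit=True`, `GridSearchCV` retrains the best configuration on the full training set and exposes it as `best_estimator_`:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for df_final (assumption: not the notebook's data)
X, y = make_regression(n_samples=400, n_features=3, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Deliberately small grid so the sketch runs quickly
param_grid = {"max_depth": [4, 6, 8], "min_samples_split": [5, 10, 20], "n_estimators": [50]}
grid_search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# refit=True (the default) retrains the best configuration on all of X_train
best_forest = grid_search.best_estimator_
test_r2 = r2_score(y_test, best_forest.predict(X_test))
print(f"Best CV R2: {grid_search.best_score_:.4f}  Test R2: {test_r2:.4f}")

# Flag tuned values that sit on the boundary of their search range
for name, values in param_grid.items():
    best = grid_search.best_params_[name]
    if len(values) > 1 and best in (min(values), max(values)):
        print(f"{name}={best} is at the edge of the grid; consider widening it")
```

Comparing the test R2 of the tuned model with the earlier 0.9211 would show whether the search actually improved on the hand-picked settings.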

