# Ames Housing Step-by-step - Exercise 5 and 6

Pieter Overdevest  
2024-02-09

For suggestions/questions regarding this notebook, please contact
[Pieter Overdevest](https://www.linkedin.com/in/pieteroverdevest/)
(pieter@innovatewithdata.nl).

### How to work with this Jupyter Notebook yourself?

- Get a copy of the repository ('repo') [machine-learning-with-python-explainers](https://github.com/EAISI/machine-learning-with-python-explainers) from EAISI's GitHub site. This can be done by either cloning the repo or simply downloading the zip-file. Both options are explained in this YouTube video by [Coderama](https://www.youtube.com/watch?v=EhxPBMQFCaI).

- Copy the folders 'ames_housing_pieter\' and 'utils_pieter\' to your own project folder.

### Import packages

In [None]:
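# Load packages and assign to a shorter alias.
import pandas as pd
import numpy as np

# Plotting packages used in the cells below.
import altair as alt
import matplotlib.pyplot as plt
import seaborn as sns

# Pieter's utils package.
import utils_pieter as up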

## Exercise 5 - Estimate a Linear Regression, a LASSO and a kNN model

The models that we will develop in this section originate from the [SciKit-Learn](https://scikit-learn.org/stable/) library.

We start by developing a linear regression model (see [sklearn.linear_model.LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) and ['Linear Regression in Python' by Dannar Mawardi](https://towardsdatascience.com/linear-regression-in-python-9a1f5f000606)).

#### a. Estimate a Linear Regression model

We will go through a number of steps, starting with defining a list holding all numerical variables excluding `SalePrice` and `SalePrice_log`. 

##### a1. Recap our objects and define a few more

Previously, we defined data frames, `df_scoped_num` and `df_scoped_cat`, and the lists holding their respective variable names, `l_df_num_names` and `l_df_cat_names`. We also defined the list `l_df_X_names` as a subset of `l_df_num_names`, excluding `SalePrice` and `SalePrice_log`. Below, we define data frame `df_X` containing the data of the variables in `l_df_X_names`.

To confirm we did our bookkeeping right, we check that: (1) `l_df_X_names` is two items shorter than `l_df_num_names`, the two items being `SalePrice` and `SalePrice_log`, and (2) `l_df_num_names` and `l_df_cat_names` together make up all variables in the updated and reduced data.

In [None]:
print(f"Length of l_df_X_names:          {len(l_df_X_names)}")
print(f"Length of l_df_num_names:        {len(l_df_num_names)}")
print(f"Length of l_df_cat_names:        {len(l_df_cat_names)}")
print(f"Number of columns in df_reduced: {len(df_reduced.columns)}")
print(f"Number of columns in df_scoped:  {len(df_scoped.columns)}")

Depending on the scenario of interest, add/remove the hash sign in front of the concerned line in the cell below. Note, `df_reduced` contains all observations; we reduced its memory size by converting the string type data to the categorical type and downcasting the numerical data to their smallest container. In `df_scoped` we additionally dropped houses that had `Gr Liv Area` > 4000 sq ft.

In [None]:
df_data, ps_y_data_log = df_scoped.copy(), ps_y_scoped_log.copy()
#df_data, ps_y_data_log = df_reduced.copy(), ps_y_reduced_log.copy()

We define `df_X` as the data frame holding all numerical variables - except `SalePrice` and `SalePrice_log` - that we will use to predict the Y variable (`SalePrice_log`). You may wonder, will we use categorical variables as predictors? Good question. The short answer is yes; more on that later.



In [None]:
df_X = df_data[l_df_X_names]

To start simple, we build a linear model using the four numerical variables that correlate most strongly with `SalePrice_log`, the variable that we want to predict.

In [None]:
#l_df_X_names_subset = ['Overall Qual']
#l_df_X_names_subset = ['Overall Qual', 'Gr Liv Area']
#l_df_X_names_subset = ['Overall Qual', 'Gr Liv Area', 'Garage Cars']
l_df_X_names_subset = ['Overall Qual', 'Gr Liv Area', 'Garage Cars', 'Garage Area']
#l_df_X_names_subset = ['Overall Qual', 'Gr Liv Area', 'Total Bsmt SF']

# Define object 'df_X_subset' as subset of df_X.
df_X_subset = df_X[l_df_X_names_subset]

print(df_X.shape, df_X_subset.shape)
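
The list of top correlated variables above was typed in by hand. As a minimal sketch, assuming synthetic data and hypothetical column names (`x_strong`, `x_weak`, `x_noise`, `y`), such a list could also be derived from the correlations themselves:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (assumption, hypothetical column names).
rng = np.random.default_rng(42)
n_obs = 500
df_demo = pd.DataFrame({
    "x_strong": rng.normal(size=n_obs),
    "x_weak":   rng.normal(size=n_obs),
    "x_noise":  rng.normal(size=n_obs),
})
df_demo["y"] = (
    2.0 * df_demo["x_strong"]
    + 1.0 * df_demo["x_weak"]
    + rng.normal(scale=0.5, size=n_obs)
)

# Absolute correlation of each predictor with the outcome, sorted from high to low.
ps_corr = (
    df_demo.corr()["y"]
    .drop("y")
    .abs()
    .sort_values(ascending=False)
)

# Names of the two predictors with the strongest correlation.
l_top_names = ps_corr.head(2).index.tolist()
print(l_top_names)
```

Replacing `2` by `4` and `df_demo` by the actual data would give the Top-4 list, provided the outcome column is dropped in the same way.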

##### a2. Draw scatter plot and correlation plot of the predictors (`l_df_X_names_subset`)

To know what we are working with, we plot the concerned variables against `SalePrice_log`.

In [None]:
alt.Chart(df_data).mark_point(opacity=0.1).encode(
    x = alt.X(alt.repeat("column"), type='quantitative'),
    y = alt.Y(alt.repeat("row"), type='quantitative')
).properties(
    width  = 200,
    height = 200
).repeat(
    column = l_df_X_names_subset,
    row    = ['SalePrice_log']            
)

In [None]:
plt.figure(figsize=(5, 5)) 

sns.heatmap(
    data      = df_data[l_df_X_names_subset + ['SalePrice_log']].corr(),
    annot     = True,
    square    = True,
    annot_kws = {"size": 12},
    cmap      = 'coolwarm'
);

##### a3. Split the data

We split the predictor subset (`df_X_subset`) and the outcome (`ps_y_data_log`) into a train and a test set using Scikit-Learn's `train_test_split()` ([*ref*](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)).

In [None]:
# Import module.
from sklearn.model_selection import train_test_split

# Split data in train and test sets.
df_X_train, df_X_test, ps_y_log_train, ps_y_log_test = train_test_split(
    
    df_X_subset,
    ps_y_data_log,
    test_size=0.33,
    random_state=42
)

The function `f_train_test_split()` in the utils_pieter package does the same as Scikit-Learn's `train_test_split()` function; in addition, it prints some stats.

In [None]:
# Split data in train and test sets.
# df_X_train, df_X_test, ps_y_log_train, ps_y_log_test = up.f_train_test_split(
    
#     df_X_subset,
#     ps_y_data_log,
#     n_test_size=0.33
# )
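
The implementation of `f_train_test_split()` is not shown in this notebook. As a rough sketch of what such a wrapper might look like, the hypothetical helper `f_split_with_stats()` below (name, arguments and printed stats are assumptions, not the actual utils_pieter code) calls `train_test_split()` and prints the resulting sizes:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def f_split_with_stats(df_X, ps_y, n_test_size=0.33, n_random_state=42):
    """Hypothetical helper: split the data and print the resulting sizes."""
    df_X_tr, df_X_te, ps_y_tr, ps_y_te = train_test_split(
        df_X, ps_y, test_size=n_test_size, random_state=n_random_state
    )
    print(f"Train: {df_X_tr.shape[0]} rows, test: {df_X_te.shape[0]} rows")
    print(f"Share of test data: {df_X_te.shape[0] / df_X.shape[0]:.2f}")
    return df_X_tr, df_X_te, ps_y_tr, ps_y_te

# Small made-up data set to try the helper.
df_demo_X = pd.DataFrame({"a": np.arange(100), "b": np.arange(100) * 2})
ps_demo_y = pd.Series(np.arange(100), name="y")

df_tr, df_te, ps_tr, ps_te = f_split_with_stats(df_demo_X, ps_demo_y)
```

Note that predictors and outcome keep matching index labels after the split, which is convenient when joining results back to the original data.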

##### a4. Scaling

The data in `df_X_subset` is standardized so that each variable is represented at the same scale. Generally, there are two ways to scale variable data ([ref](https://www.analyticsvidhya.com/blog/2020/04/variable-scaling-machine-learning-normalization-standardization/)):

* Normalization is a scaling technique in which values are translated and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.

* Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation ([ref](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)). This means that the mean of the variable becomes zero and the resultant distribution has a unit standard deviation.

There is no rule to tell you what to choose when. You can always start by fitting your model to raw, normalized and standardized data, resp., and evaluate the three outcomes. In the example below, we will standardize the variable data so that each variable will have μ = 0 and σ = 1.
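
To make the difference between the two scaling techniques concrete, the sketch below applies scikit-learn's `MinMaxScaler` and `StandardScaler` to a small made-up column of numbers (the values are an assumption for illustration only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One made-up variable with five observations, including one large value.
m_demo = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])

# Normalization: values end up between 0 and 1.
m_normalized = MinMaxScaler().fit_transform(m_demo)

# Standardization: values end up with mean 0 and standard deviation 1.
m_standardized = StandardScaler().fit_transform(m_demo)

print(m_normalized.ravel())
print(m_standardized.ravel())
```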

Scaling the outcome variable is generally not required. What can help when the outcome variable is not normally distributed, as is the case for `SalePrice`, is a transformation such as taking the logarithm, which is why we predict `SalePrice_log`.

It is good practice to fit the scaler object on the training data only and then use that same fitted scaler to transform both the training and the test data. This avoids data leakage during model testing. Therefore, we use scikit-learn's `fit()` and `transform()` functions in subsequent steps, even though scikit-learn also has a `fit_transform()` function that does both in one go. The difference between `fit()` and `transform()`:
1. 'fit' learns the parameters of a transformer, such as a scaler or an encoder, from the data. The result is an updated 'machine' that knows *how* to convert an input to an output, e.g. from original data to scaled data. In case of standardization, it knows the mean and the standard deviation of the data it was fit on.
2. 'transform' actually transforms input data to output data. The updated machine is used to convert the input to the output.

The downside of `fit_transform()` is that if we apply it before the train/test split, we leak information from the test data into the training data ([ref](https://towardsdatascience.com/what-and-why-behind-fit-transform-vs-transform-in-scikit-learn-78f915cf96fe)). If we apply it separately to the train and the test data after the split, the two sets are standardized with different means and standard deviations. Applying `fit()` to the train data allows us to use the resulting mean and standard deviation to standardize the test data as well, by applying `transform()`.
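
The fit-on-train, transform-both pattern can be illustrated with a few made-up numbers (an assumption for illustration). Note that the scaled test data does not need to have mean 0, since it is standardized with the mean and standard deviation of the train data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up train and test data for a single variable.
m_train = np.array([[1.0], [2.0], [3.0]])
m_test  = np.array([[2.0], [4.0]])

# Fit on the train data only; the scaler now 'knows' the train mean and std.
scaler_demo = StandardScaler().fit(m_train)
print(scaler_demo.mean_, scaler_demo.scale_)

# Transform both sets with the same fitted scaler.
m_train_scaled = scaler_demo.transform(m_train)
m_test_scaled  = scaler_demo.transform(m_test)

print(m_train_scaled.ravel())
print(m_test_scaled.ravel())
```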

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Define scaling object.
scaler = StandardScaler()
 
# Standardization of variables in training data.
scaler_fitted_on_X_train = scaler.fit(df_X_train)

# Key properties of scaler object.
pd.DataFrame({'name': l_df_X_names_subset, 'mean': scaler_fitted_on_X_train.mean_, 'std': scaler_fitted_on_X_train.scale_})


With the `scaler` object - that was 'fitted' on the variables in the training data - in our hands, we can apply it to transform the variables in the training data as well as those in the test data. This approach avoids any information leakage from the test data into the training data.

In [None]:
df_X_train_scaled = pd.DataFrame(
    scaler_fitted_on_X_train.transform(df_X_train),
    columns = l_df_X_names_subset
)

df_X_test_scaled  = pd.DataFrame(
    scaler_fitted_on_X_train.transform(df_X_test),
    columns = l_df_X_names_subset
)

In [None]:
alt.Chart(df_X_train_scaled).mark_bar().encode(
    x = alt.X(alt.repeat("column"), type='quantitative'),
    y = "count()"
).properties(
    width  = 200,
    height = 200,
    title = "Standardized training data"
).repeat(
    column = l_df_X_names_subset,          
)

In [None]:
alt.Chart(df_X_test_scaled).mark_bar().encode(
    x = alt.X(alt.repeat("column"), type='quantitative'),
    y = "count()"
).properties(
    width  = 200,
    height = 200,
    title = "Standardized test data"
).repeat(
    column = l_df_X_names_subset,          
)

##### a5. Train the model

Having completed all preparations, we can train the linear regression model on the train data. It starts by defining `mo_lin_reg` as an 'empty' linear regression model object.

In [None]:
# Import module.
from sklearn.linear_model import LinearRegression

mo_lin_reg = LinearRegression()

What attributes are available in the 'empty' model?

In [None]:
mo_lin_reg.__dict__

We 'fill' it by fitting the model on the train data (input/output); this updates the object in place.

In [None]:
mo_lin_reg.fit(df_X_train_scaled, ps_y_log_train);

Again, we use `__dict__` to see what attributes are available in the `mo_lin_reg` object. Now, we observe the presence of the regression coefficients (`coef_`) and the intercept (`intercept_`).

In [None]:
mo_lin_reg.__dict__

##### a6. Interpret the coefficients

We observe a general trend that the larger the correlation with `SalePrice_log` the larger the fitted coefficient.

In [None]:
print(f"Intercept: {mo_lin_reg.intercept_:,.2f}\n")

print("Coefficients:")

pd.DataFrame({
    'variable': l_df_X_names_subset,    
    'coeff':   [str(round(x,3)) for x in mo_lin_reg.coef_],
    'corr':    df_corr_table.query('name in @l_df_X_names_subset')['corr']
})

##### a7. Make predictions based on estimated model and test data

The fitted model - present in `mo_lin_reg` - is used to make `SalePrice_log` predictions for the test data.

In [None]:
ps_y_log_pred = mo_lin_reg.predict(df_X_test_scaled)

##### a8. Evaluate estimated model based on test data

There are three primary metrics that can be used to evaluate regression models:

* **Mean Absolute Error (MAE):** The easiest to understand. It represents the average absolute error, in the same unit as the outcome variable (Y).

* **Mean Squared Error (MSE):** Similar to MAE, but because the errors are squared, larger errors are 'punished' more heavily and noise is exaggerated. MSE is harder to interpret than MAE as it is not in base units; nevertheless, it is widely used.

* **Root Mean Squared Error (RMSE):** A very popular metric. Being the square root of MSE, the RMSE is easier to interpret as it has the same unit as the outcome variable (Y). This makes RMSE the recommended metric to interpret your linear regression model.
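
The three metrics can be computed with a few lines of code. The sketch below uses made-up true and predicted values (an assumption for illustration), where a single prediction is off by 4 units; note how this one large error weighs more heavily in the RMSE than in the MAE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up values; only the last prediction is wrong (by 4 units).
a_y_true = np.array([1.0, 2.0, 3.0, 4.0])
a_y_pred = np.array([1.0, 2.0, 3.0, 8.0])

n_mae  = mean_absolute_error(a_y_true, a_y_pred)  # (0 + 0 + 0 + 4) / 4
n_mse  = mean_squared_error(a_y_true, a_y_pred)   # (0 + 0 + 0 + 16) / 4
n_rmse = np.sqrt(n_mse)                           # square root of MSE

print(f"MAE: {n_mae:.2f}, MSE: {n_mse:.2f}, RMSE: {n_rmse:.2f}")
```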

Since we will evaluate these metrics more often, `f_evaluate_results()` has been defined, as part of 'utils_pieter'. This function also returns a scatter plot of the predicted vs actual values.

In [None]:
up.f_evaluate_results(
    ps_y_true = ps_y_log_test,
    ps_y_pred = ps_y_log_pred
)

#### b. Estimate a LASSO model

LASSO (Least Absolute Shrinkage and Selection Operator) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting machine learning model ([ref](https://machinelearningmastery.com/lasso-regression-with-python/)). Recall that regularized linear regression adds a penalty term to the cost function. In case of LASSO the cost function becomes:

$$ J(\Theta) = MSE (\Theta) + \alpha\sum\limits_{i=1}^n \mid{\Theta_{i}}\mid$$

The larger the hyperparameter $\alpha$ - sometimes referred to as $\lambda$ - the larger the penalty and the more coefficients will be set to zero, resulting in a model based on fewer variables.
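
The effect of $\alpha$ on the number of non-zero coefficients can be illustrated with a small experiment. The sketch below uses synthetic data (an assumption) in which only the first three of ten variables carry signal, and fits scikit-learn's `Lasso` for three arbitrarily chosen values of $\alpha$:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption): 10 variables, only the first three matter.
rng = np.random.default_rng(0)
m_X_demo = rng.normal(size=(200, 10))
a_y_demo = (
    3.0 * m_X_demo[:, 0]
    + 2.0 * m_X_demo[:, 1]
    + 1.0 * m_X_demo[:, 2]
    + rng.normal(scale=0.5, size=200)
)

# Note: scikit-learn scales the squared-error term by 1/(2n), so its alpha
# is not numerically identical to the alpha in the formula above.
d_nonzero = {}
for n_alpha in [0.001, 0.1, 1.5]:
    mo_demo = Lasso(alpha=n_alpha).fit(m_X_demo, a_y_demo)
    d_nonzero[n_alpha] = int(np.sum(mo_demo.coef_ != 0))

print(d_nonzero)
```

For the largest $\alpha$ only the variables with the strongest signal are expected to keep a non-zero coefficient.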


##### b1. Recap our objects and define a few more

In addition to the numerical variables in the data (`df_X`) you may also want to include categorical data to predict the `SalePrice`. To give an example, earlier, we observed that `Neighborhood` correlated well with `SalePrice`. Categorical variables can be included by means of so-called 'One-Hot Encoding' ([ref1](https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/), [ref2](https://towardsdatascience.com/categorical-encoding-using-label-encoding-and-one-hot-encoder-911ef77fb5bd)). In the example below we add the `Neighborhood` variable to the predictor data. One-hot encoding adds new variables to the data, one for each unique value, holding a '1' for observations that have the concerned value and a '0' in all other cases.

You can see how the `fit_transform()` function is also used here. As in case of scaling, we could perform 'fit' and 'transform' separately. To simplify we do them both in one go.

In [None]:
# Import module.
from sklearn.preprocessing import OneHotEncoder

# Define a one-hot encoder object.
ohe = OneHotEncoder()

# The fit_transform() function produces a 'sparse matrix' (sm_) object. To convert the data to
# a data frame we first need to convert it to a matrix (m_).
sm_neighborhoods = ohe.fit_transform(df_data[['Neighborhood']])
m_neighborhoods  = sm_neighborhoods.toarray()

# The one-hot encoder object 'ohe' has been updated by applying the fit_transform() function to it.
# Now it has the 'categories_' attribute. Since it is an array in a list we take the first element.
# Note, we convert the default floats to integers to save memory (more for demo purposes, since our
# dataset is relatively small already).
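df_X_ohe = pd.DataFrame(
    data    = m_neighborhoods,
    columns = ohe.categories_[0],
    # Reuse the index of df_data, so the rows align when combining with df_X later on.
    index   = df_data.index
).astype('int')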

# Comms to the user. The created data frame has as many variables as there are unique values in the
# 'Neighborhood' variable in df_data and the same number of observations as df_data has.
# No surprises.
print(f"Unique neighborhoods: {ohe.categories_[0]}\n")

print(f"Length of the 'categories_' attribute:         {len(ohe.categories_[0])}")
print(f"Number of unique neighborhoods in df_data:     {len(df_data['Neighborhood'].unique())}\n")

print(f"Dimensions of created data frame ('df_X_ohe'): {df_X_ohe.shape}")
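
As mentioned, 'fit' and 'transform' can also be performed separately for the one-hot encoder. The sketch below, using a handful of made-up rows (an assumption for illustration), fits the encoder on train data and applies it to test data. Setting `handle_unknown="ignore"` makes the encoder return all zeros for a category that was not present in the train data:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up train and test data; 'Somerst' does not occur in the train data.
df_train_demo = pd.DataFrame({"Neighborhood": ["NAmes", "Gilbert", "NAmes", "OldTown"]})
df_test_demo  = pd.DataFrame({"Neighborhood": ["Gilbert", "Somerst"]})

# Fit on the train data only.
ohe_demo = OneHotEncoder(handle_unknown="ignore").fit(df_train_demo[["Neighborhood"]])

# Transform the test data with the fitted encoder.
df_test_ohe = pd.DataFrame(
    ohe_demo.transform(df_test_demo[["Neighborhood"]]).toarray(),
    columns=ohe_demo.categories_[0],
    index=df_test_demo.index,
).astype("int")

print(df_test_ohe)
```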

As before, we combine the two data frames horizontally by setting 'axis' to 1.

In [None]:
df_X_combined = pd.concat([df_X, df_X_ohe], axis = 1)

# Comms to the user. The created data frame 'df_X_combined' has as many observations as both 'X' and 'X_ohe' have,
# and as many columns as the two have together. 
print("\nThe sum of the number of columns in df_X and df_X_ohe must equal that in df_X_combined:")
print(f"Number of variables in df_X:          {df_X.shape[1]}")
print(f"Number of variables in df_X_ohe:      {df_X_ohe.shape[1]}")
print(f"Number of variables in df_X_combined: {df_X_combined.shape[1]}")

print("\nThe number of rows must all be the same:")
print(f"Number of rows in df_X:               {df_X.shape[0]}")
print(f"Number of rows in df_X_ohe:           {df_X_ohe.shape[0]}")
print(f"Number of rows in df_X_combined:      {df_X_combined.shape[0]}")

##### b2. Visualize predictors

We show a sample of the combined data frame to clarify how one-hot coding works and how the result was added to the numerical data. In which neighborhoods are the first five houses located?

In [None]:
df_X_combined.head(5)

##### b3. Split the data

We can follow two scenarios:

Scenario A - Only use the numerical variables (df_X), so excluding the one-hot encoded `Neighborhood` variable.

Scenario B - Use the combined data (df_X_combined), so including the one-hot encoded `Neighborhood` variable.

Depending on what scenario you want to follow, add/remove the '#'.

In [None]:
# Scenario A.
#df_X_train, df_X_test, ps_y_log_train, ps_y_log_test = up.f_train_test_split(df_X, ps_y_data_log)

# Scenario B.
df_X_train, df_X_test, ps_y_log_train, ps_y_log_test = up.f_train_test_split(df_X_combined, ps_y_data_log)

##### b4. Scaling

As before, we fit the scaler on `df_X_train` and use the fitted scaler to standardize both `df_X_train` and `df_X_test`.

In [None]:
# Define scaling object.
scaler = StandardScaler()
 
# Standardization of variables in train data.
scaler_fitted_on_X_train = scaler.fit(df_X_train)

# Key properties of scaler object for first 10 variables in df_X_train.
pd.DataFrame({'name': df_X_train.columns, 'mean': scaler_fitted_on_X_train.mean_, 'sd': scaler_fitted_on_X_train.scale_}).head(10)

In [None]:
df_X_train_scaled = pd.DataFrame(
    scaler_fitted_on_X_train.transform(df_X_train),
    columns = df_X_train.columns
)

df_X_test_scaled  = pd.DataFrame(
    scaler_fitted_on_X_train.transform(df_X_test),
    columns = df_X_test.columns
)

Let's have a look at the scaled data. In the one-hot encoded neighborhood variables we observe that the 0's and 1's have been replaced by a negative and a positive number, resp.

In [None]:
df_X_train_scaled.head(5)

The example below explains why the 0's and 1's have been transformed to negative and positive numbers.

In [None]:
# We create a dummy list of 0's and 1's.
l_temp = np.zeros(2).tolist()+np.ones(3).tolist()
print(l_temp)

# We standardize these numbers.
(l_temp - np.mean(l_temp)) / np.std(l_temp)

##### b5. Train the model

Instead of using scikit-learn's `Lasso()` class to build a single model ([ref](https://scikit-learn.org/stable/modules/linear_model.html#lasso)), we use `LassoCV()` to build a series of LASSO models ([ref](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html)). By default, `LassoCV()` tries 100 different values for $\alpha$ (`n_alphas=100`). Another way to control $\alpha$ is to provide a list of alphas to evaluate; this is what we do here. We set the min and max value of the alpha range and divide the range, on a log scale, into 100 equal steps to obtain 100 alpha values.



In [None]:
# Define log of lower and upper border of alphas range, and the step size.
n_alphas_min_log = np.log(0.0001)
n_alphas_max_log = np.log(0.01)
n_step = (n_alphas_max_log - n_alphas_min_log) / 100

# List of alphas.
l_alpha = [round(np.e**i,5) for i in np.arange(n_alphas_min_log, n_alphas_max_log, n_step)]

# First 10 elements.
print(l_alpha[:10])

# Last 10 elements.
print(l_alpha[-10:])
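
A non-integer step in `np.arange()` can, due to floating point rounding, result in one element more or less than intended. As an alternative sketch (not the approach used in this notebook), NumPy's `np.geomspace()` returns exactly the requested number of log-spaced values. Note that, unlike the list above, it includes the upper border by default:

```python
import numpy as np

# 100 values, evenly spaced on a log scale, from 0.0001 up to and including 0.01.
a_alpha = np.geomspace(0.0001, 0.01, num=100)

print(len(a_alpha))
print(a_alpha[:3])
print(a_alpha[-3:])
```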

In [None]:
# Import module.
from sklearn.linear_model import LassoCV

mo_lasso = LassoCV(
    
    # Number of folds.
    cv           = 5,

    # Fixing random_state ensures the results are reproducible.
    random_state = 42,

    # Max number of iterations.
    max_iter     = 1000,

    # We can enforce for which alphas a Lasso model is fitted.
    # In case we do not provide a list, LassoCV() will select 100 values.
    alphas       = l_alpha

).fit(
    
    df_X_train_scaled,
    ps_y_log_train
)

##### b6. Interpret the coefficients

For each of the folds we plot the RMSE against the respective $\alpha$'s we passed to `LassoCV()`, see the red lines in the figure below. We also add the mean over the folds, see the black line. This plot allows us to choose the optimal $\alpha$, i.e., the one that results in the lowest mean RMSE; it is marked by the vertical dotted line.

In [None]:
#pd.set_option("display.precision", 5)

df_lasso = (
    
    pd.DataFrame(np.sqrt(mo_lasso.mse_path_))
    .rename(columns   = {i : f"rmse_{i+1}" for i in range(5)})
    .assign(rmse_mean = np.sqrt(mo_lasso.mse_path_.mean(axis=-1)))
    .assign(alpha     = mo_lasso.alphas_)
    .sort_values(by = 'alpha')
    .reset_index(drop = True)
)

# Transform the data frame to long format.
df_lasso_long = pd.melt(df_lasso, id_vars = 'alpha', var_name = 'fold', value_name = 'rmse')

The best model is selected based on the lowest RMSE. The alpha for that model can be obtained through the attribute `alpha_`.

In [None]:
print(f"We observe the lowest RMSE {min(df_lasso.rmse_mean):.3f} at alpha: {mo_lasso.alpha_:,.5f}")
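
To see how `alpha_` relates to `mse_path_`, the sketch below fits `LassoCV()` on synthetic data (an assumption) with four arbitrarily chosen alphas. The attribute `mse_path_` holds one row per alpha and one column per fold; averaging over the folds and taking the alpha with the lowest mean MSE reproduces `alpha_`:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data (assumption): 8 variables, only the first two matter.
rng = np.random.default_rng(1)
m_X_demo = rng.normal(size=(150, 8))
a_y_demo = (
    2.0 * m_X_demo[:, 0]
    - 1.0 * m_X_demo[:, 1]
    + rng.normal(scale=0.5, size=150)
)

mo_demo = LassoCV(cv=5, alphas=[0.001, 0.01, 0.1, 1.0]).fit(m_X_demo, a_y_demo)

# One row per alpha, one column per fold.
print(mo_demo.mse_path_.shape)

# Mean MSE over the folds, per alpha; rows follow the order of alphas_.
a_mse_mean   = mo_demo.mse_path_.mean(axis=-1)
n_alpha_best = mo_demo.alphas_[np.argmin(a_mse_mean)]

print(n_alpha_best, mo_demo.alpha_)
```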

In [None]:
base_chart = alt.Chart(df_lasso_long).mark_line().encode(
    x     = alt.X('alpha', scale = alt.Scale(type='log'), title="Alpha"),
    y     = alt.Y('rmse',  scale = alt.Scale(domain=[0.1, 0.16])),
    color = alt.Color(
        'fold',
        scale = alt.Scale(
            domain = [f"rmse_{i+1}" for i in range(5)] + ['rmse_mean'],
            range  = ['red']*5 + ['black']
        )
    )
)

# Add a vertical dotted line at alpha = mo_lasso.alpha_
vertical_dotted_line = alt.Chart(df_lasso_long).mark_rule(
        color      = 'red',
        strokeDash = [5,5]
    ).encode(
        x = alt.X('a:Q', title="")
    ).transform_calculate(
        a = str(mo_lasso.alpha_)
    )

# Combine the line chart and the vertical dotted line
base_chart + vertical_dotted_line


From the `mo_lasso` object we can extract the intercept and coefficients of the best model. What explains the number of coefficients equal to zero? How can we increase the number of coefficients equal to zero?

In [None]:
print(f"intercept: {mo_lasso.intercept_:,.2f}")

df_lasso_coefficients = pd.DataFrame(

    {
        'name':            df_X_train_scaled.columns,
        'lasso coeff':     mo_lasso.coef_,
        'lasso_coeff_abs': abs(mo_lasso.coef_)        
    }

).sort_values(

    'lasso_coeff_abs',
    ascending=False
)

pd.set_option("display.max_rows", 50)
pd.set_option("display.min_rows", 40)

df_lasso_coefficients

##### b7. Make predictions based on estimated model and test data

The `mo_lasso` object holds the properties of the best model, i.e., the one resulting in the lowest RMSE.

In [None]:
ps_y_log_pred = mo_lasso.predict(df_X_test_scaled)

##### b8. Evaluate estimated model based on test data

The same three primary metrics are used to evaluate the best LASSO model. The RMSE is considerably lower than with the linear regression model; however, we require a lot more variables. By increasing $\alpha$ we can reduce the number of included variables, but this comes at the 'expense' of a higher RMSE.

In [None]:
up.f_evaluate_results(
    ps_y_true = ps_y_log_test,
    ps_y_pred = ps_y_log_pred
)

#### c. Estimate a kNN model

In addition, we develop a kNN model to predict the `SalePrice_log` ([ref1](https://realpython.com/knn-python/), [ref2](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor)). We define a `KNeighborsRegressor` object, called `mo_knn`, and fit it on the training data. In contrast to the sections on (LASSO) regression, we keep this section on kNN short and to the point; you get the idea.

In [None]:
# Import module.
from sklearn.neighbors import KNeighborsRegressor

# Define KNN object.
mo_knn = KNeighborsRegressor(n_neighbors = 2)

mo_knn.fit(df_X_train_scaled, ps_y_log_train);

The trained kNN model is used to predict `SalePrice_log` for the train and test data. We do this to investigate overfitting.

In [None]:
up.f_evaluate_results(
    ps_y_true = ps_y_log_train,
    ps_y_pred = mo_knn.predict(df_X_train_scaled)
)

In [None]:
up.f_evaluate_results(
    ps_y_true = ps_y_log_test,
    ps_y_pred = mo_knn.predict(df_X_test_scaled)
)

For low values of `n_neighbors` the model is highly flexible and we run the risk of overfitting. Comparing the RMSE values for the train and test set, we can conclude that we are indeed overfitting: the RMSE on the train set is much lower than on the test set. We can use scikit-learn's `GridSearchCV()` to work through a range of values for a hyperparameter and determine which value results in the best model.

In [None]:
# Import module.
from sklearn.model_selection import GridSearchCV

parameters = {"n_neighbors": range(1, 50)}
gridsearch = GridSearchCV(KNeighborsRegressor(), parameters)

gridsearch.fit(df_X_train_scaled, ps_y_log_train)

# Comms to the user
print(f"We found the best model for {gridsearch.best_params_}.")
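
By default, `GridSearchCV()` ranks the candidates using the estimator's own `score()` method, which is R² for regressors. As a sketch on synthetic data (an assumption), the `scoring` argument can be used to select on RMSE instead. The example also shows the extreme case of overfitting: with a single neighbor, the training RMSE is zero.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (assumption): a noisy sine wave with a single predictor.
rng = np.random.default_rng(2)
m_X_demo = rng.uniform(-3, 3, size=(300, 1))
a_y_demo = np.sin(m_X_demo[:, 0]) + rng.normal(scale=0.3, size=300)

# With one neighbor, each training point is its own nearest neighbor,
# so the model reproduces the training data exactly.
mo_knn_1 = KNeighborsRegressor(n_neighbors=1).fit(m_X_demo, a_y_demo)
n_rmse_train_k1 = np.sqrt(np.mean((a_y_demo - mo_knn_1.predict(m_X_demo)) ** 2))
print(f"Training RMSE for n_neighbors = 1: {n_rmse_train_k1:.3f}")

# Grid search selecting on (negative) RMSE, using 5-fold cross-validation.
gs_demo = GridSearchCV(
    KNeighborsRegressor(),
    {"n_neighbors": range(1, 30)},
    scoring="neg_root_mean_squared_error",
    cv=5,
).fit(m_X_demo, a_y_demo)

# The score is negated, so we flip the sign to obtain the RMSE.
print(gs_demo.best_params_, -gs_demo.best_score_)
```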

In [None]:
up.f_evaluate_results(
    ps_y_true = ps_y_log_train,
    ps_y_pred = gridsearch.predict(df_X_train_scaled)
)

In [None]:
up.f_evaluate_results(
    ps_y_true = ps_y_log_test,
    ps_y_pred = gridsearch.predict(df_X_test_scaled)
)

Indeed, with the tuned `n_neighbors` we no longer observe overfitting.

## Exercise 6 - Assess which model performs best

### Evaluation

This is discussed with each of the models. I leave making the overall assessment to you.
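
One possible way to organize such an assessment is to collect the train and test RMSE of each model in a single data frame. The sketch below does this on synthetic data (an assumption), so its numbers say nothing about the Ames Housing data; the model settings are arbitrary as well:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 5 variables, only the first two matter.
rng = np.random.default_rng(3)
m_X_demo = rng.normal(size=(300, 5))
a_y_demo = 1.5 * m_X_demo[:, 0] - 0.5 * m_X_demo[:, 1] + rng.normal(scale=0.3, size=300)

m_X_tr, m_X_te, a_y_tr, a_y_te = train_test_split(
    m_X_demo, a_y_demo, test_size=0.33, random_state=42
)

# Fit the scaler on the train data, transform both sets.
scaler_demo = StandardScaler().fit(m_X_tr)
m_X_tr_s, m_X_te_s = scaler_demo.transform(m_X_tr), scaler_demo.transform(m_X_te)

def f_rmse(a_true, a_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((a_true - a_pred) ** 2)))

d_models = {
    "linear regression": LinearRegression(),
    "lasso":             LassoCV(cv=5),
    "knn":               KNeighborsRegressor(n_neighbors=10),
}

l_rows = []
for c_name, mo in d_models.items():
    mo.fit(m_X_tr_s, a_y_tr)
    l_rows.append({
        "model":      c_name,
        "rmse_train": f_rmse(a_y_tr, mo.predict(m_X_tr_s)),
        "rmse_test":  f_rmse(a_y_te, mo.predict(m_X_te_s)),
    })

# Sort by test RMSE, best model first.
df_compare = pd.DataFrame(l_rows).sort_values("rmse_test").reset_index(drop=True)
print(df_compare)
```

A large gap between `rmse_train` and `rmse_test` for a model hints at overfitting, as discussed for kNN above.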