In [1]:
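# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this notebook needs scikit-learn < 1.2 to run as written.
from sklearn.datasets import load_boston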
import numpy as np

from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, GridSearchCV

**Load Dataset**

In [2]:
boston = load_boston()

Description of Dataset

In [3]:
print(boston.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000

**Create feature matrix & target vector**

In [4]:
X = boston.data
y = boston.target
X.shape, y.shape

((506, 13), (506,))

Range of target variable

In [5]:
ran = np.max(y) - np.min(y)
ran

45.0

**Create model instances**

In [6]:
linreg = LinearRegression()
ridge = Ridge()

**Cross validate linear regression model (OLS)**

In [7]:
scores = cross_val_score(linreg, X, y, cv=5, scoring='neg_mean_absolute_error')
error_lr = -scores.mean()  # scores are negated MAEs, so flip the sign
error_lr

4.249968544192538
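
As a rough illustration of what the cross-validation call above computes, the sketch below reimplements 5-fold mean absolute error for ordinary least squares using only NumPy. It assumes contiguous, unshuffled folds (the default behaviour when an integer `cv` is passed for a regressor) and runs on synthetic data, not the Boston data; `kfold_mae`, `X_demo` and `y_demo` are hypothetical names introduced for this example.

```python
import numpy as np

def kfold_mae(X, y, k=5):
    """Mean absolute error of OLS, averaged over k contiguous, unshuffled folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)  # test indices for each fold
    fold_errors = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Prepend a column of ones so the fit includes an intercept.
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        fold_errors.append(np.mean(np.abs(A_test @ coef - y[test_idx])))
    return float(np.mean(fold_errors))

# Synthetic stand-in data (assumption: a linear signal plus small Gaussian noise).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 3.0 + rng.normal(scale=0.1, size=100)

print(kfold_mae(X_demo, y_demo))
```

scikit-learn reports this quantity with the sign flipped (`neg_mean_absolute_error`) so that larger scores are always better, which is why the notebook negates the mean score.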

**Search exhaustively through the parameter combinations of the Ridge regressor**

Parameter dictionary: keys are parameter names, values are the lists of candidate settings to try

In [8]:
params = {
    'alpha': [1.0, 0.95, 0.9, 0.85, 0.80, 0.75],
    'solver': ['svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
}
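
To make "exhaustive" concrete, the sketch below expands the same dictionary into every alpha/solver pair with `itertools.product`. It only illustrates the size of the search space and is not how `GridSearchCV` is implemented internally; `demo_params`, `candidates` and `n_folds` are names introduced for this example.

```python
from itertools import product

demo_params = {
    'alpha': [1.0, 0.95, 0.9, 0.85, 0.80, 0.75],
    'solver': ['svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'],
}

# Cartesian product of all value lists: one dict per candidate setting.
names = list(demo_params)
candidates = [dict(zip(names, values)) for values in product(*demo_params.values())]

n_folds = 5
print(len(candidates))            # 6 alphas x 6 solvers = 36 candidates
print(len(candidates) * n_folds)  # 36 candidates x 5 folds = 180 model fits
print(candidates[0])
```

With the default `refit=True`, the grid search also fits the best candidate once more on the full data after these 180 cross-validation fits.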

Grid Search instance

In [9]:
grid = GridSearchCV(ridge, params, cv=5, scoring='neg_mean_absolute_error')

Fit the grid search object

In [10]:
grid.fit(X, y)

GridSearchCV(cv=5, estimator=Ridge(),
             param_grid={'alpha': [1.0, 0.95, 0.9, 0.85, 0.8, 0.75],
                         'solver': ['svd', 'cholesky', 'lsqr', 'sparse_cg',
                                    'sag', 'saga']},
             scoring='neg_mean_absolute_error')

Get the best parameter combination and its cross-validated error

In [11]:
grid.best_params_

{'alpha': 0.9, 'solver': 'sag'}

The above is the best-performing parameter combination.

In [12]:
error_rid = -grid.best_score_  # best_score_ is a negated MAE, so flip the sign
error_rid

3.869848435536432
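
The best score is only one row of the search results. As a minimal sketch of how the full ranking could be inspected through `cv_results_`, the example below runs a smaller grid on synthetic data (an assumption made so the snippet is self-contained; it does not reproduce the Boston numbers). `X_demo`, `y_demo` and `demo_grid` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data, not the Boston data used in the notebook.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

demo_grid = GridSearchCV(
    Ridge(),
    {'alpha': [1.0, 0.75], 'solver': ['svd', 'cholesky']},
    cv=5,
    scoring='neg_mean_absolute_error',
)
demo_grid.fit(X_demo, y_demo)

results = demo_grid.cv_results_
# List candidates from best (rank 1) to worst, with the spread across folds.
for i in np.argsort(results['rank_test_score']):
    mae = -results['mean_test_score'][i]
    spread = results['std_test_score'][i]
    print(f"{results['params'][i]}: MAE = {mae:.3f} (+/- {spread:.3f})")
```

Comparing the gap between candidates with the fold-to-fold spread gives a sense of whether the winning combination is meaningfully better or within noise.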

Error as a fraction of the target range for the two models

In [14]:
print(f'Ordinary Least Squares model: {error_lr/ran:.3f}.\nOLS with l2 regularization model: {error_rid/ran:.3f}.')

Ordinary Least Squares model: 0.094.
OLS with l2 regularization model: 0.086.

The least-squares model with L2 regularization (ridge) outperforms the plain OLS model. Since the target is the median home value in thousands of USD, the ridge model's predictions differ from the actual values by ~3,870 USD on average, whereas the OLS model is off by ~4,250 USD.
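
The dollar figures and range fractions can be re-derived from the numbers printed earlier. The short sketch below does that arithmetic. It assumes the target is expressed in units of 1,000 USD (as the dataset description states, although the excerpt shown above is cut off before that line); the variable names are introduced here for illustration.

```python
# Values copied from the notebook outputs above.
mae_ols = 4.249968544192538
mae_ridge = 3.869848435536432
target_range = 45.0

usd_per_unit = 1000  # assumption: target measured in thousands of USD

for name, mae in [('OLS', mae_ols), ('Ridge', mae_ridge)]:
    print(f'{name}: ~{mae * usd_per_unit:,.0f} USD, {mae / target_range:.1%} of the target range')

# Relative reduction in error from adding L2 regularization.
improvement = (mae_ols - mae_ridge) / mae_ols
print(f'Relative improvement: {improvement:.1%}')
```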