## Ridge Regression

Ridge Regression is a popular type of regularized linear regression that adds an L2 penalty to the loss function. The penalty shrinks the coefficients toward zero, particularly those of input variables that contribute little to the prediction task.

### $\mathrm{l2\_penalty} = \sum_{j=0}^{p} \beta_j^2$
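
To make the shrinkage effect concrete, here is a minimal sketch that computes the penalty for an ordinary least squares fit and for a Ridge fit. It runs on synthetic data, not the housing dataset; the names `X_demo`, `y_demo`, `true_beta`, and `l2_penalty` and the value `alpha=50.0` are assumptions chosen for illustration. Note that scikit-learn's `Ridge` does not penalize the intercept, so the sketch sums over `coef_` only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for illustration, not the housing data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 0.5])
y_demo = X_demo @ true_beta + rng.normal(scale=0.5, size=100)

# Fit an unpenalized model and a strongly penalized Ridge model
ols = LinearRegression().fit(X_demo, y_demo)
ridge = Ridge(alpha=50.0).fit(X_demo, y_demo)

def l2_penalty(beta):
    """Sum of squared coefficients, as in the formula above."""
    return float(np.sum(np.square(beta)))

print('OLS   l2_penalty: %.3f' % l2_penalty(ols.coef_))
print('Ridge l2_penalty: %.3f' % l2_penalty(ridge.coef_))
```

The Ridge coefficients should have the smaller sum of squares, and the gap widens as `alpha` grows.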

In [9]:
# import dependencies
import sklearn.datasets
import pandas as pd
from numpy import mean
from numpy import std
from numpy import absolute
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
from pandas import read_csv
from matplotlib import pyplot
# load dataset
# note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
house_pricing_dataset = sklearn.datasets.load_boston()
# convert the array into a DataFrame and add the column names
house_pricing = pd.DataFrame(house_pricing_dataset.data, columns=house_pricing_dataset.feature_names)
# add the price column, which is the target
house_pricing['price'] = house_pricing_dataset.target
# shape of the dataset
print(house_pricing.shape)
# print the first 5 rows
print(house_pricing.head())

(506, 14)
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0   
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0   
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0   
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0   
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0   

   PTRATIO       B  LSTAT  price  
0     15.3  396.90   4.98   24.0  
1     17.8  396.90   9.14   21.6  
2     17.8  392.83   4.03   34.7  
3     18.7  394.63   2.94   33.4  
4     18.7  396.90   5.33   36.2  


### The scikit-learn Python machine learning library provides an implementation of the Ridge Regression algorithm via the Ridge class.

In [16]:
# evaluate a Ridge Regression model on the housing dataset using repeated 10-fold cross-validation
data = house_pricing.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Ridge(alpha=1.0)
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force scores to be positive
scores = absolute(scores)
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

Mean MAE: 3.382 (0.519)
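
The mean and standard deviation above are taken over every fold of every repeat. As a rough sketch of what `RepeatedKFold` produces with these settings, the snippet below counts the splits on a small toy array; `X_demo`, `cv_demo`, and `splits` are hypothetical names used only here.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Toy array with 20 rows (an assumption for illustration)
X_demo = np.arange(40).reshape(20, 2)

# Same settings as the evaluation above: 10 folds, repeated 3 times
cv_demo = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
splits = list(cv_demo.split(X_demo))

# 10 folds x 3 repeats gives 30 train/test splits, hence 30 scores
print('Number of train/test splits: %d' % len(splits))
train_idx, test_idx = splits[0]
print('First split sizes: train=%d, test=%d' % (len(train_idx), len(test_idx)))
```

Each repeat reshuffles the rows before splitting, which is why repeating reduces the noise in the estimated MAE.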


### Let's use Ridge Regression as our final model and make predictions on new data.

In [25]:
# fit model
model.fit(X, y)
# define new data
row = [0.00642,19.00,2.320,0,0.5381,6.5750,64.20,4.0910,1,286.0,15.30,399.90,4.99]
# make a prediction
predictions = model.predict([row])
# summarize prediction
print('Predicted: %.3f' % predictions[0])

Predicted: 30.456
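
Under the hood, a fitted Ridge model predicts with a plain linear formula: the dot product of the row with `coef_`, plus `intercept_`. The sketch below checks this on synthetic data; `demo_model`, `new_row`, and the generated data are assumptions for illustration and are unrelated to the housing dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -0.5, 2.0]) + 4.0 + rng.normal(scale=0.1, size=50)

demo_model = Ridge(alpha=1.0).fit(X_demo, y_demo)

# Predict one new row with predict() and with the linear formula
new_row = np.array([0.2, -1.0, 0.7])
by_predict = demo_model.predict([new_row])[0]
by_hand = float(new_row @ demo_model.coef_ + demo_model.intercept_)

print('predict(): %.3f' % by_predict)
print('by hand:   %.3f' % by_hand)
```

The two printed values should agree, which shows that the penalty only changes how the coefficients are estimated, not how predictions are made.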


### Tuning Ridge Hyperparameters
It is hard to know whether the default hyperparameter value of alpha=1.0 is appropriate for our dataset. Instead, it is good practice to test a suite of different configurations and discover what works best.

One approach would be to grid search alpha values from perhaps 1e-5 to 100 on a log scale and discover what works best for a dataset. Another approach would be to test values between 0.0 and 1.0 with a grid separation of 0.01. We will try the latter in this case.

The work below demonstrates this using the GridSearchCV class with a grid of values I have defined.

In [31]:
from sklearn.model_selection import GridSearchCV
from numpy import arange
model = Ridge()
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# defining grid
grid = dict()
grid['alpha'] = arange(0, 1, 0.01)
# defining search
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
print(""" The MAE is negative because scikit-learn negates error metrics so that larger scores are always better.
The grid search selected alpha=0.51 as the best penalty weight.
""")

MAE: -3.379
Config: {'alpha': 0.51}
 The MAE is negative because scikit-learn negates error metrics so that larger scores are always better.
The grid search selected alpha=0.51 as the best penalty weight.
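
The other approach mentioned earlier, a log-scale grid from 1e-5 to 100, could look roughly like the sketch below. It uses synthetic data from `make_regression` and a smaller cross-validation setup to keep it quick; `X_demo`, `y_demo`, `log_grid`, `cv_demo`, and `log_search` are hypothetical names, and the result says nothing about the housing dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Synthetic regression problem (an assumption for illustration)
X_demo, y_demo = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=1)

# Eight alpha values spaced evenly on a log scale: 1e-5, 1e-4, ..., 1e2
log_grid = {'alpha': np.logspace(-5, 2, num=8)}

cv_demo = RepeatedKFold(n_splits=5, n_repeats=2, random_state=1)
log_search = GridSearchCV(Ridge(), log_grid, scoring='neg_mean_absolute_error', cv=cv_demo)
log_search.fit(X_demo, y_demo)

# Report the error as a positive number
print('Best alpha: %s' % log_search.best_params_['alpha'])
print('MAE: %.3f' % abs(log_search.best_score_))
```

A log-scale grid covers several orders of magnitude with few candidates, so it is a reasonable first pass before refining with a finer linear grid around the best value.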

