# Machine Learning—Linear, Ridge, Lasso, and Elastic Net Regression Using Standard Scaler to Pre-Process Data

Now I will pre-process the data with StandardScaler and then fit all of the regression models again. Skip to the bottom for a table of the final results.

In [8]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

In [39]:
# Bring in the data saved from the previous notebook (the train/test split is redone below).
%store -r X
%store -r y
%store -r table1
%store -r table1

In [10]:
# Re-split the raw (unscaled) data; random_state=1 gives the same split as table1.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

### Scale the numerical data

In [11]:
# Scale only the numerical columns: fit the scaler on the training set, then apply the same transformation to the test set.
scaler = StandardScaler()
X_train[['age','bmi']] = scaler.fit_transform(X_train[['age','bmi']])
X_test[['age','bmi']] = scaler.transform(X_test[['age','bmi']])
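For reference, StandardScaler computes z = (x - mean) / std for each column, using the training mean and the population standard deviation (ddof=0); `transform` reuses those training statistics on the test set. The sketch below checks this on made-up age and bmi values (not the real dataset); the `*_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up values standing in for the age and bmi columns (not the real dataset).
train_demo = pd.DataFrame({"age": [19.0, 33.0, 45.0, 60.0], "bmi": [22.0, 30.5, 27.0, 35.5]})
test_demo = pd.DataFrame({"age": [28.0, 52.0], "bmi": [24.0, 31.0]})

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train_demo)  # learns mean_ and scale_ from the training data
test_scaled = scaler_demo.transform(test_demo)         # reuses the training statistics

# Manual version: training mean and population standard deviation (ddof=0).
train_mean = train_demo.mean()
train_std = train_demo.std(ddof=0)
manual_train = ((train_demo - train_mean) / train_std).to_numpy()
manual_test = ((test_demo - train_mean) / train_std).to_numpy()

print(np.allclose(train_scaled, manual_train))  # True
print(np.allclose(test_scaled, manual_test))    # True
```

Because the test rows are scaled with training statistics, the scaled test columns will not have exactly mean 0 and standard deviation 1, which is the intended behavior.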

### Linear Regression

First, I will try a vanilla linear regression.

In [12]:
# Fit the training data to the model.
model1 = LinearRegression()
model1.fit(X_train, y_train)

LinearRegression()

In [13]:
# Get the coefficient of determination for training and test data.
model1_r2_train = model1.score(X_train, y_train)
model1_r2_test = model1.score(X_test, y_test)

In [14]:
# Make predictions and get the RMSE for training and test data.
model1_y_pred_train = model1.predict(X_train)
model1_y_pred_test = model1.predict(X_test)
model1_RMSE_train = np.sqrt(mean_squared_error(y_train, model1_y_pred_train))
model1_RMSE_test = np.sqrt(mean_squared_error(y_test, model1_y_pred_test))

In [15]:
# View the coefficient of determination and RMSE values.
print('R squared for training and test data:')
print(model1_r2_train, model1_r2_test)
print('RMSE for training and test data:')
print(model1_RMSE_train, model1_RMSE_test)

R squared for training and test data:
0.7562789441467117 0.734275077692753
RMSE for training and test data:
6055.905613168893 5978.637219305068
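The same four metrics are recomputed for every model below. A small helper could remove that repetition; `evaluate` is a hypothetical name and the data is synthetic. The sketch also checks R² against its definition, 1 - SS_res / SS_tot, which is what a scikit-learn regressor's `.score` returns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def evaluate(model, X_tr, X_te, y_tr, y_te):
    """Hypothetical helper: R squared and RMSE for the training and test sets."""
    pred_tr = model.predict(X_tr)
    pred_te = model.predict(X_te)
    return {
        "r2_train": r2_score(y_tr, pred_tr),
        "r2_test": r2_score(y_te, pred_te),
        "rmse_train": np.sqrt(mean_squared_error(y_tr, pred_tr)),
        "rmse_test": np.sqrt(mean_squared_error(y_te, pred_te)),
    }


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)
X_tr, X_te, y_tr, y_te = X_demo[:150], X_demo[150:], y_demo[:150], y_demo[150:]

model_demo = LinearRegression().fit(X_tr, y_tr)
scores = evaluate(model_demo, X_tr, X_te, y_tr, y_te)

# R squared by definition: 1 - SS_res / SS_tot.
resid = y_te - model_demo.predict(X_te)
r2_manual = 1 - np.sum(resid ** 2) / np.sum((y_te - y_te.mean()) ** 2)
print(np.isclose(scores["r2_test"], r2_manual))                     # True
print(np.isclose(scores["r2_test"], model_demo.score(X_te, y_te)))  # True
```

In this notebook it would be called as `evaluate(model1, X_train, X_test, y_train, y_test)`.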


In [16]:
# View the coefficients of each term.
print(X_train.columns)
print(model1.coef_)

Index(['age', 'bmi', 'sex_male', 'smoker_yes', 'children_1', 'children_2',
       'children_3', 'children_4', 'children_5', 'region_northwest',
       'region_southeast', 'region_southwest'],
      dtype='object')
[ 3524.88242084  1978.58156046  -268.42233783 24067.87704993
  -103.80379432  1311.19532811   682.03649306  2127.71041988
  1057.26082258  -251.34733476  -855.07435369  -654.86702882]


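Smoking status has by far the largest coefficient (about 24,068), followed by age (about 3,525) and bmi (about 1,979). Because age and bmi were standardized, their coefficients are per standard deviation, while the other columns are 0/1 dummies, so the magnitudes are not strictly comparable across the two groups.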
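Dividing a coefficient on a standardized feature by the fitted scaler's `scale_` converts it back to original units. For age in this notebook that would be `model1.coef_[0] / scaler.scale_[0]` (charges per year of age), since the scaler was fit on `[['age','bmi']]` in that order. The sketch checks the conversion on synthetic data with a made-up slope of 250 per year; the `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: charges rise by a made-up 250 per year of age, plus noise.
rng = np.random.default_rng(1)
age_demo = rng.uniform(18, 65, size=(300, 1))
charges_demo = 250 * age_demo[:, 0] + rng.normal(scale=500, size=300)

raw_model = LinearRegression().fit(age_demo, charges_demo)
scaler_demo = StandardScaler().fit(age_demo)
std_model = LinearRegression().fit(scaler_demo.transform(age_demo), charges_demo)

# Coefficient per standard deviation -> coefficient per year.
per_sd = std_model.coef_[0]
per_year = per_sd / scaler_demo.scale_[0]
print(np.isclose(per_year, raw_model.coef_[0]))  # True
```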

### Ridge Regression

Next, I will check whether regularization improves the model or reduces overfitting, starting with Ridge at its default alpha=1.0.
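Ridge adds a penalty alpha·||w||² to the squared error. scikit-learn's `Ridge` fits the intercept separately (by centering the data) and does not penalize it, so on centered data the coefficients are w = (XᵀX + alpha·I)⁻¹Xᵀy. A minimal NumPy sketch that reproduces `Ridge` on synthetic data (the `*_demo` names and values are hypothetical). For a standardized column, the corresponding diagonal entry of XᵀX equals the number of training rows, so alpha=1 is small by comparison.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([3.0, 0.0, -2.0, 1.0]) + 5.0 + rng.normal(size=100)
alpha_demo = 10.0

# Center the data: the intercept is fit separately and is not penalized.
x_mean, y_mean = X_demo.mean(axis=0), y_demo.mean()
Xc, yc = X_demo - x_mean, y_demo - y_mean

# Closed-form ridge solution: w = (Xc^T Xc + alpha * I)^(-1) Xc^T yc
w = np.linalg.solve(Xc.T @ Xc + alpha_demo * np.eye(X_demo.shape[1]), Xc.T @ yc)
b = y_mean - x_mean @ w

ridge_demo = Ridge(alpha=alpha_demo).fit(X_demo, y_demo)
print(np.allclose(w, ridge_demo.coef_), np.isclose(b, ridge_demo.intercept_))  # True True
```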

In [17]:
# Fit the training data to the model.
model2 = Ridge()
model2.fit(X_train, y_train)

Ridge()

In [18]:
# Get the coefficient of determination for training and test data.
model2_r2_train = model2.score(X_train, y_train)
model2_r2_test = model2.score(X_test, y_test)

In [19]:
# Make predictions and get the RMSE for training and test data.
model2_y_pred_train = model2.predict(X_train)
model2_y_pred_test = model2.predict(X_test)
model2_RMSE_train = np.sqrt(mean_squared_error(y_train, model2_y_pred_train))
model2_RMSE_test = np.sqrt(mean_squared_error(y_test, model2_y_pred_test))

In [20]:
# View the coefficient of determination and RMSE values.
print('R squared for training and test data:')
print(model2_r2_train, model2_r2_test)
print('RMSE for training and test data:')
print(model2_RMSE_train, model2_RMSE_test)

R squared for training and test data:
0.7562524739550629 0.7343514052667499
RMSE for training and test data:
6056.234465818388 5977.778497350744


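Ridge regularization with the default alpha=1 barely changes the results: test R² moves from 0.734275 to 0.734351, and test RMSE drops by less than 1. The gap between training and test R² was small to begin with (about 0.022), so there was little overfitting to reduce.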

### Ridge Regression with different alpha

In [21]:
# Create a grid search
gs1 = GridSearchCV(model2, param_grid={'alpha':[0.01, 0.1, 1, 10, 100, 1000]})
gs1.fit(X_train, y_train)
print(gs1.best_params_)

{'alpha': 1}


The grid search (5-fold cross-validation scored by R², its defaults for a regressor) selects alpha=1, the value already used above, so there is no need to refit.
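GridSearchCV keeps more than `best_params_`: `cv_results_` holds the mean cross-validated score for every alpha, and with the default `refit=True`, `best_estimator_` has already been refit on the full training set, so it can be used directly. A sketch on synthetic data (the `*_demo` names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X_demo = rng.normal(size=(150, 5))
y_demo = X_demo @ np.array([1.5, -2.0, 0.0, 0.7, 0.0]) + rng.normal(size=150)

alphas = [0.01, 0.1, 1, 10, 100, 1000]
gs_demo = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5)
gs_demo.fit(X_demo, y_demo)

# Mean cross-validated R squared for each alpha (the default scoring for a regressor).
for a, score in zip(gs_demo.cv_results_["param_alpha"], gs_demo.cv_results_["mean_test_score"]):
    print(a, round(score, 4))

# With refit=True (the default), best_estimator_ is already fit on all of X_demo.
best_demo = gs_demo.best_estimator_
print(best_demo.alpha == gs_demo.best_params_["alpha"])  # True
```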

### Lasso Regression

Now, I will try Lasso regression.

In [22]:
# Fit the training data to the model
model3 = Lasso(alpha=0.1)
model3.fit(X_train, y_train)

Lasso(alpha=0.1)

In [23]:
# Get the coefficient of determination for the training and test data.
model3_r2_train = model3.score(X_train, y_train)
model3_r2_test = model3.score(X_test, y_test)

In [24]:
# Make predictions and get the RMSE for training and test data.
model3_y_pred_train = model3.predict(X_train)
model3_y_pred_test = model3.predict(X_test)
model3_RMSE_train = np.sqrt(mean_squared_error(y_train, model3_y_pred_train))
model3_RMSE_test = np.sqrt(mean_squared_error(y_test, model3_y_pred_test))

In [25]:
# View the coefficient of determination and RMSE values.
print('R squared for training and test data:')
print(model3_r2_train, model3_r2_test)
print('RMSE for training and test data:')
print(model3_RMSE_train, model3_RMSE_test)

R squared for training and test data:
0.7562789276497388 0.7342648965102131
RMSE for training and test data:
6055.9058181247365 5978.751753187141


### Lasso Regression with different alpha

In [26]:
# Create a grid search
gs2 = GridSearchCV(model3, param_grid={'alpha':[0.01, 0.1, 1, 10, 100, 1000]})
gs2.fit(X_train, y_train)
print(gs2.best_params_)

{'alpha': 100}


The grid search selects alpha=100, not the alpha=0.1 used above; model3 was not refit with the selected value.
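Unlike Ridge, Lasso's L1 penalty can set coefficients exactly to zero, and larger alpha values zero out more of them. scikit-learn's Lasso minimizes (1/(2n))·||y - Xw||² + alpha·||w||₁, so, roughly speaking, alpha is subtracted from the magnitude of a standardized feature's coefficient (such as age or bmi here), and 0/1 dummy columns, which have smaller variance, are shrunk more. The sketch illustrates the zeroing on synthetic data where only two of six features matter; it does not show which of this notebook's coefficients alpha=100 would remove.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
X_demo = rng.normal(size=(200, 6))
# Synthetic target: only the first two features matter.
y_demo = 4.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(size=200)

coefs_by_alpha = {}
for alpha in [0.01, 0.1, 1.0]:
    lasso_demo = Lasso(alpha=alpha).fit(X_demo, y_demo)
    coefs_by_alpha[alpha] = lasso_demo.coef_
    n_zero = int(np.sum(lasso_demo.coef_ == 0.0))
    print(f"alpha={alpha}: coef={np.round(lasso_demo.coef_, 3)}, exact zeros={n_zero}")
```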

### Elastic Net Regression

In [27]:
# Fit the training data to the model
model4 = ElasticNet()
model4.fit(X_train, y_train)

ElasticNet()

In [28]:
# Get the coefficient of determination for the training and test data.
model4_r2_train = model4.score(X_train, y_train)
model4_r2_test = model4.score(X_test, y_test)

In [29]:
# Make predictions and get the RMSE for training and test data.
model4_y_pred_train = model4.predict(X_train)
model4_y_pred_test = model4.predict(X_test)
model4_RMSE_train = np.sqrt(mean_squared_error(y_train, model4_y_pred_train))
model4_RMSE_test = np.sqrt(mean_squared_error(y_test, model4_y_pred_test))

In [30]:
# View the coefficient of determination and RMSE values.
print('R squared for training and test data:')
print(model4_r2_train, model4_r2_test)
print('RMSE for training and test data:')
print(model4_RMSE_train, model4_RMSE_test)

R squared for training and test data:
0.3844246694287381 0.38624759753250537
RMSE for training and test data:
9624.389169197126 9086.207691440462


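With its default settings (alpha=1.0, l1_ratio=0.5), Elastic Net does much worse than the other models: test R² drops to about 0.386 and test RMSE rises to about 9,086.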
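One plausible reason is the scale of the penalty. scikit-learn's ElasticNet minimizes (1/(2n))·||y - Xw||² + alpha·l1_ratio·||w||₁ + (alpha·(1 - l1_ratio)/2)·||w||², whereas Ridge minimizes ||y - Xw||² + alpha·||w||² with no 1/n factor. For an idealized design whose columns are centered, orthogonal, and have variance 1, the ElasticNet solution has a closed form: soft-threshold the least-squares coefficient by alpha·l1_ratio, then divide by 1 + alpha·(1 - l1_ratio). With the defaults that cuts every coefficient by about a third, while Ridge(alpha=1) multiplies it by n/(n + 1). For a column with variance v the factor becomes roughly v / (v + alpha·(1 - l1_ratio)), so low-variance 0/1 dummies such as smoker_yes would be shrunk even harder. The sketch checks the closed form on synthetic data; the design and all `*_demo` values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Ridge

rng = np.random.default_rng(5)
n, p = 500, 3

# Synthetic design with centered, exactly orthogonal columns scaled so X^T X = n * I
# (each column has mean 0 and variance 1, like a standardized feature).
A = rng.normal(size=(n, p))
A -= A.mean(axis=0)
Q, _ = np.linalg.qr(A)
X_demo = np.sqrt(n) * Q
y_demo = X_demo @ np.array([3000.0, 2000.0, 500.0]) + 10000.0 + rng.normal(scale=1000.0, size=n)

z = X_demo.T @ (y_demo - y_demo.mean()) / n  # ordinary least-squares coefficients
alpha, l1_ratio = 1.0, 0.5                   # ElasticNet() defaults

# Closed form for this design: soft-threshold, then divide by 1 + alpha * (1 - l1_ratio).
expected = np.sign(z) * np.maximum(np.abs(z) - alpha * l1_ratio, 0) / (1 + alpha * (1 - l1_ratio))

enet_demo = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)

print(np.allclose(enet_demo.coef_, expected, rtol=1e-4))  # True
print(np.round(enet_demo.coef_ / z, 3))   # each ratio is close to 1 / 1.5
print(np.round(ridge_demo.coef_ / z, 3))  # each ratio is n / (n + 1), close to 1
```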

### Elastic Net Regression with different alpha and L1 ratios

In [31]:
# Create a grid search
gs3 = GridSearchCV(model4, param_grid={'alpha':[0.01, 0.1, 1, 10, 100, 1000], 'l1_ratio':[0.25, 0.5, 0.75]})
gs3.fit(X_train, y_train)
print(gs3.best_params_)

{'alpha': 0.01, 'l1_ratio': 0.75}


In [32]:
# Fit the training data to the model
model5 = ElasticNet(alpha=0.01, l1_ratio=0.75)
model5.fit(X_train, y_train)

ElasticNet(alpha=0.01, l1_ratio=0.75)

In [33]:
# Get the coefficient of determination for the training and test data.
model5_r2_train = model5.score(X_train, y_train)
model5_r2_test = model5.score(X_test, y_test)

In [34]:
# Make predictions and get the RMSE for training and test data.
model5_y_pred_train = model5.predict(X_train)
model5_y_pred_test = model5.predict(X_test)
model5_RMSE_train = np.sqrt(mean_squared_error(y_train, model5_y_pred_train))
model5_RMSE_test = np.sqrt(mean_squared_error(y_test, model5_y_pred_test))

In [35]:
# View the coefficient of determination and RMSE values.
print('R squared for training and test data:')
print(model5_r2_train, model5_r2_test)
print('RMSE for training and test data:')
print(model5_RMSE_train, model5_RMSE_test)

R squared for training and test data:
0.756118539350845 0.73439280523719
RMSE for training and test data:
6057.898129822696 5977.3126761477315
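One caveat applies to all three grid searches: the scaler was fit on the whole training set before GridSearchCV split it into folds, so each validation fold was scaled with statistics it helped compute. The effect is probably small here, but wrapping the scaler and the model in a Pipeline refits the scaler inside every fold. A sketch on a made-up DataFrame that mimics a few of this notebook's columns; the data and the `*_demo` names are hypothetical, and only `age` and `bmi` are scaled, as above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up, unscaled stand-in for the training data.
rng = np.random.default_rng(6)
X_demo = pd.DataFrame({
    "age": rng.uniform(18, 65, 120),
    "bmi": rng.uniform(16, 45, 120),
    "smoker_yes": rng.integers(0, 2, 120),
})
y_demo = 250 * X_demo["age"] + 300 * X_demo["bmi"] + 20000 * X_demo["smoker_yes"] + rng.normal(0, 3000, 120)

# Scale only age and bmi; pass the dummy column through unchanged.
preprocess = ColumnTransformer([("scale", StandardScaler(), ["age", "bmi"])], remainder="passthrough")
pipe = Pipeline([("prep", preprocess), ("model", ElasticNet(max_iter=10_000))])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {"model__alpha": [0.01, 0.1, 1, 10, 100, 1000], "model__l1_ratio": [0.25, 0.5, 0.75]}
gs_demo = GridSearchCV(pipe, param_grid=param_grid, cv=5)
gs_demo.fit(X_demo, y_demo)  # the scaler is refit on each training fold
print(gs_demo.best_params_)
```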


### Hyperparameter table

In [38]:
data2 = [['Standard', 'Linear Regression', 'NA', 'NA', model1_r2_train, model1_r2_test, model1_RMSE_train, model1_RMSE_test],
        ['Standard', 'Ridge Regression', 1, 'NA', model2_r2_train, model2_r2_test, model2_RMSE_train, model2_RMSE_test],
        ['Standard', 'Lasso Regression', 0.1, 'NA', model3_r2_train, model3_r2_test, model3_RMSE_train, model3_RMSE_test],  # model3 uses alpha=0.1
        ['Standard', 'Elastic Net Regression', 1, 0.5, model4_r2_train, model4_r2_test, model4_RMSE_train, model4_RMSE_test],  # ElasticNet() defaults
        ['Standard', 'Elastic Net Regression', 0.01, 0.75, model5_r2_train, model5_r2_test, model5_RMSE_train, model5_RMSE_test]]

table2 = pd.DataFrame(data2, columns = ['Scaler','Model', 'Alpha', 'L1 Ratio', 'Training R2', 'Test R2', 'Training RMSE', 'Test RMSE'])
table2

| | Scaler | Model | Alpha | L1 Ratio | Training R2 | Test R2 | Training RMSE | Test RMSE |
|---|---|---|---|---|---|---|---|---|
| 0 | Standard | Linear Regression | | | 0.756279 | 0.734275 | 6055.905613 | 5978.637219 |
| 1 | Standard | Ridge Regression | 1.0 | | 0.756252 | 0.734351 | 6056.234466 | 5977.778497 |
| 2 | Standard | Lasso Regression | 0.1 | | 0.756279 | 0.734265 | 6055.905818 | 5978.751753 |
| 3 | Standard | Elastic Net Regression | 1.0 | 0.5 | 0.384425 | 0.386248 | 9624.389169 | 9086.207691 |
| 4 | Standard | Elastic Net Regression | 0.01 | 0.75 | 0.756119 | 0.734393 | 6057.89813 | 5977.312676 |


Among the standardized models, Elastic Net with alpha=0.01 and l1_ratio=0.75 has the highest test R² (0.734393) and the lowest test RMSE (5977.31), although it beats plain linear regression by only about 1.3 in test RMSE.

### Combine both hyperparameter tables

In [40]:
pd.concat([table1, table2], axis=0)

| | Scaler | Model | Alpha | L1 Ratio | Training R2 | Test R2 | Training RMSE | Test RMSE |
|---|---|---|---|---|---|---|---|---|
| 0 | MinMax | Linear Regression | | | 0.756279 | 0.734275 | 6055.905613 | 5978.637219 |
| 1 | MinMax | Ridge Regression | 1.0 | | 0.75621 | 0.733789 | 6056.768114 | 5984.108551 |
| 2 | MinMax | Lasso Regression | 1.0 | | 0.756279 | 0.734261 | 6055.905868 | 5978.793975 |
| 3 | MinMax | Elastic Net Regression | 1.0 | 0.5 | 0.311151 | 0.300776 | 10181.094782 | 9698.27154 |
| 4 | MinMax | Elastic Net Regression | 0.01 | 0.75 | 0.755874 | 0.732893 | 6060.939552 | 5994.166537 |
| 0 | Standard | Linear Regression | | | 0.756279 | 0.734275 | 6055.905613 | 5978.637219 |
| 1 | Standard | Ridge Regression | 1.0 | | 0.756252 | 0.734351 | 6056.234466 | 5977.778497 |
| 2 | Standard | Lasso Regression | 0.1 | | 0.756279 | 0.734265 | 6055.905818 | 5978.751753 |
| 3 | Standard | Elastic Net Regression | 1.0 | 0.5 | 0.384425 | 0.386248 | 9624.389169 | 9086.207691 |
| 4 | Standard | Elastic Net Regression | 0.01 | 0.75 | 0.756119 | 0.734393 | 6057.89813 | 5977.312676 |


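Across both scalers, the standardized Elastic Net (alpha=0.01, l1_ratio=0.75) still comes out on top. Plain linear regression scores identically under MinMax and Standard scaling, as expected, because rescaling the features does not change ordinary least-squares predictions; the regularized models do shift, because their penalties depend on the scale of the features.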

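Whether a patient smokes is by far the strongest predictor of charges: in the linear regression, the smoker_yes coefficient of about 24,068 means the model predicts charges roughly 24,000 higher for smokers than for otherwise identical non-smokers.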