<a href="https://colab.research.google.com/github/Raulespz/cross_validation/blob/main/Regularization_LASSO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
The bias-variance tradeoff decomposes a model's expected Mean Squared Error into squared Bias, Variance, and an irreducible noise term. Only Bias and Variance depend on the model you choose.

Bias is a measure of how far the model's average prediction is from the actual function we are trying to model. If your model has high bias, it is underfitting: it will do poorly on both the training and test data, but the two scores will be similar.

Ways to reduce bias include making the model more complex, adding higher-order polynomial terms, or obtaining more features. Collecting more data on its own rarely fixes underfitting.

## Useful information for choosing between LR, Ridge, and LASSO, and their hyperparameters, depending on our data:
Variance is the average squared difference between each trained model's prediction and the average prediction over all such models, where each model is trained on a different sample of the data. If your model has high variance, it will usually overfit: it will do well on the training data but not on the test data.


You can reduce variance by making the model less complex (e.g., lowering the order of the polynomial), obtaining more data, or using regularization. Three regularization techniques are discussed in this module: Ridge, LASSO, and Elastic Net.
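
As a rough illustration of the two terms, the sketch below refits a low-degree and a high-degree polynomial to many noisy samples of a sine curve and estimates squared bias and variance on a fixed grid. The noise level, sample size, degrees, and the helper name `bias_variance` are assumptions chosen for illustration, not values taken from this notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # The underlying function we pretend not to know
    return np.sin(2 * np.pi * x)

x_train_demo = np.linspace(0, 1, 20)      # fixed design points (assumed)
x_grid = np.linspace(0.05, 0.95, 50)      # points where predictions are compared

def bias_variance(degree, n_models=200, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit of given degree."""
    preds = np.empty((n_models, x_grid.size))
    for i in range(n_models):
        # Each "model" sees the same x values with a fresh draw of noise
        y_noisy = true_fn(x_train_demo) + rng.normal(0, noise, x_train_demo.size)
        coefs = np.polyfit(x_train_demo, y_noisy, degree)
        preds[i] = np.polyval(coefs, x_grid)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

bias_lo, var_lo = bias_variance(degree=1)
bias_hi, var_hi = bias_variance(degree=9)
print('degree 1: bias^2 =', bias_lo, 'variance =', var_lo)
print('degree 9: bias^2 =', bias_hi, 'variance =', var_hi)
```

Under these assumptions the degree-1 fit should show the larger squared bias (underfitting) and the degree-9 fit the larger variance (overfitting).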

Ridge (L2 Regularization)

penalizes the magnitude of the regression coefficients by adding a squared term


shrinks the coefficients toward 0, but not exactly to 0


reduces the influence of irrelevant features but does not remove them


faster to train



LASSO (L1 Regularization)

penalizes the absolute value of the coefficients


sets the coefficients of irrelevant features to 0


identifies features you don't need, so it doubles as feature selection



Elastic Net (L1+L2 Regularization)

penalizes both the squared magnitude and the absolute value of the regression coefficients


sets the coefficients of irrelevant features to 0 and shrinks the remaining coefficients
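
The contrast between the Ridge and LASSO penalties can be seen in a small NumPy sketch. It assumes synthetic data in which only the first three of ten features matter, no intercept, and hand-written solvers: `ridge_fit`, `lasso_fit`, and `soft_threshold` are hypothetical helper names, not scikit-learn functions, and the penalty strengths are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X_demo = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]          # only three relevant features (assumed)
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=n_samples)

def ridge_fit(X, y, alpha):
    # Closed form for ||y - Xw||^2 + alpha * ||w||^2
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def soft_threshold(z, t):
    # Shrinks z toward 0 and returns exactly 0 when |z| <= t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_fit(X, y, alpha, n_iter=200):
    # Coordinate descent for (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

w_ridge = ridge_fit(X_demo, y_demo, alpha=10.0)
w_lasso = lasso_fit(X_demo, y_demo, alpha=0.1)
print('ridge zeros:', int(np.sum(w_ridge == 0)))
print('lasso zeros:', int(np.sum(w_lasso == 0)))
```

The soft-threshold step is what lets the L1 penalty produce exact zeros; the squared penalty only rescales the solution, so its coefficients stay small but nonzero.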



In [None]:
import os, pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns


In [None]:
data = pd.read_csv('/content/X_Y_Sinusoid_Data.csv')

x_real = np.linspace(0, 1.0, 100)
y_real = np.sin(2 * np.pi * x_real)

data.head()

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Set up the polynomial features
degree = 20
pf = PolynomialFeatures(degree)
lr = LinearRegression()

x_data = data[['x']]
y_data = data['y']

#x_data
#y_data

# create the features and fit the model
x_poly = pf.fit_transform(x_data)
lr = lr.fit(x_poly, y_data)
y_pred = lr.predict(x_poly)

# Plot the result
plt.figure(figsize = (12, 8))
plt.plot(x_data, y_data, marker='o', ls='', label='data', alpha=1)
plt.plot(x_real, y_real, ls='--', label='real function')
plt.plot(x_data, y_pred, marker='^', alpha=.5, label='predictions with polynomial features')
plt.legend()
ax = plt.gca()
ax.set(xlabel='x data', ylabel='y data');
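
To make the feature expansion concrete, here is a minimal sketch of what `PolynomialFeatures` produces for a single input column. The two input values and the degree of 3 are illustrative assumptions; with `degree = 20` as above, the same expansion yields 21 columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_small = np.array([[2.0], [3.0]])

# Columns are x^0, x^1, x^2, x^3 (the leading column of ones is the bias term)
x_small_poly = PolynomialFeatures(degree=3).fit_transform(x_small)
print(x_small_poly)
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]

# For one input column this matches a Vandermonde matrix with increasing powers
manual = np.vander(x_small.ravel(), N=4, increasing=True)
print(np.allclose(x_small_poly, manual))  # True
```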

In [None]:
# Mute the sklearn warning about regularization
import warnings
warnings.filterwarnings('ignore', module='sklearn')

from sklearn.linear_model import Ridge, Lasso

#The Ridge regression model
rr = Ridge(alpha=0.001)
rr = rr.fit(x_poly, y_data)
y_pred_rr = rr.predict(x_poly)

#The Lasso regression model
lassor = Lasso(alpha=0.0001)
lassor = lassor.fit(x_poly, y_data)
y_pred_lr = lassor.predict(x_poly)

#The plot of the predicted values
plt.figure(figsize = (12, 8))
plt.plot(x_data, y_data, marker='o', ls='', label='data')
plt.plot(x_real, y_real, ls='--', label='real function')
plt.plot(x_data, y_pred, marker='^', alpha=.5, label='predictions with polynomial features')
plt.plot(x_data, y_pred_rr, marker='^', alpha=.5, label='ridge regression')
plt.plot(x_data, y_pred_lr, marker='^', alpha=.5, label='lasso regression')

plt.legend()

ax = plt.gca()
ax.set(xlabel='x data', ylabel='y data');
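
To see how `alpha` controls the amount of shrinkage, the sketch below refits Ridge on polynomial features for several penalty strengths and reports the size of the coefficient vector. The data are synthetic noisy samples of a sine curve (an assumption, since the CSV is not reproduced here), and the degree and alpha values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_demo = np.linspace(0, 1, 20).reshape(-1, 1)
y_demo = np.sin(2 * np.pi * x_demo.ravel()) + rng.normal(0, 0.2, 20)

# Drop the constant column; Ridge fits its own intercept
x_demo_poly = PolynomialFeatures(degree=10, include_bias=False).fit_transform(x_demo)

coef_norms = []
for alpha in [1e-4, 1e-2, 1.0]:
    model = Ridge(alpha=alpha).fit(x_demo_poly, y_demo)
    coef_norms.append(np.linalg.norm(model.coef_))
    print(alpha, coef_norms[-1])
```

A larger `alpha` should give a smaller coefficient norm, which corresponds to a smoother fitted curve.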


In [None]:
# Let's look at the absolute value of coefficients for each model

coefficients = pd.DataFrame()
coefficients['linear regression'] = lr.coef_
coefficients['ridge regression'] = rr.coef_
coefficients['lasso regression'] = lassor.coef_
coefficients = coefficients.abs()

coefficients.describe()     # Huge difference in scale between the non-regularized and regularized coefficients

In [None]:
(coefficients>0).sum()

In [None]:
data = pd.read_csv('/content/Ames_Housing_Sales.csv')
len(data.columns)

In [None]:
#get_dummies will convert any columns that are of type object
data = pd.get_dummies(data, drop_first=True)
data.columns
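
A minimal sketch of what `drop_first=True` does, using a made-up three-row frame (the column names `size` and `color` are hypothetical): each object column is expanded into indicator columns, and the first category is dropped because it is implied when all the others are 0.

```python
import pandas as pd

demo = pd.DataFrame({'size': [1, 2, 3],
                     'color': ['red', 'green', 'blue']})

# 'blue' sorts first among the categories, so its indicator column is dropped
encoded = pd.get_dummies(demo, drop_first=True)
print(encoded.columns.tolist())  # ['size', 'color_green', 'color_red']
```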

In [None]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.3, random_state=42)

In [None]:
# Select the columns that are not one-hot encoded (more than two unique values)
mask = data.apply(lambda x: x.nunique())>2
num_cols = data.columns[mask]
num_cols

In [None]:
skew_limit = 0.75
skew_vals = train[num_cols].skew()

skew_cols = (skew_vals[skew_vals > skew_limit]
             .sort_values(ascending=False)
             .to_frame()
             .rename(columns={0:'Skew'}))
skew_cols

In [None]:
from sklearn.metrics import mean_squared_error

def rmse(ytrue, ypredicted):
  return np.sqrt(mean_squared_error(ytrue, ypredicted))
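
As a quick check of the arithmetic behind `rmse`, the sketch below works through three made-up prices by hand with NumPy only. The numbers are illustrative assumptions, not values from the housing data.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
y_hat = np.array([110.0, 190.0, 300.0])

squared_errors = (y_true - y_hat) ** 2       # [100., 100., 0.]
demo_rmse = np.sqrt(squared_errors.mean())   # sqrt(200 / 3)
print(round(demo_rmse, 3))  # 8.165
```

Because of the square root, the result is in the same units as the target, which makes it easier to read than the raw mean squared error.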


In [None]:
# Mute the pandas SettingWithCopy warnings
#pd.options.mode.chained_assignment = None

#for col in skew_cols.index.tolist():
#  if col == 'SalePrice':
#    continue
#  train[col] = np.log1p(train[col])
#  test[col] = test[col].apply(np.log1p)
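
The effect of the `log1p` transform on a skewed column can be sketched on synthetic data. The log-normal sample below is an assumption standing in for a right-skewed housing feature; it is not drawn from the Ames data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A right-skewed, non-negative column (synthetic)
raw = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000))

# log1p computes log(1 + x), so zeros are handled without error
transformed = np.log1p(raw)

print('skew before:', raw.skew())
print('skew after: ', transformed.skew())
```

`np.expm1` inverts the transform if values need to be mapped back to the original scale.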

In [None]:
feature_cols = [x for x in train.columns if x != 'SalePrice']
x_train = train[feature_cols]
y_train = train['SalePrice']

x_test = test[feature_cols]
y_test = test['SalePrice']

In [None]:
from sklearn.linear_model import LinearRegression

linearRegression = LinearRegression().fit(x_train, y_train)

linearRegression_rmse = rmse(y_test, linearRegression.predict(x_test))

print(linearRegression_rmse)

In [None]:
f = plt.figure(figsize=(6, 6))
ax = plt.axes()

ax.plot(y_test, linearRegression.predict(x_test),
        marker='o', ls='', ms=3.0)

lim = (0, y_test.max())

ax.set(xlabel='Actual Price',
       ylabel='Predicted Price',
       xlim=lim,
       ylim=lim,
       title='Linear Regression Result');

In [None]:
# Ridge regression uses L2 regularization to reduce the magnitude of the coefficients. This can be helpful
# in situations where there is high variance. The regularization estimators in sklearn each have a version with cross-validation built in.

from sklearn.linear_model import RidgeCV

alphas = [0.005, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 80]

ridgeCV = RidgeCV(alphas = alphas,
                  cv=4).fit(x_train, y_train)

ridgeCV_rmse = rmse(y_test, ridgeCV.predict(x_test))

print(ridgeCV.alpha_, ridgeCV_rmse)

10.0 33522.13670891972
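
The idea behind the cross-validated estimators can be approximated by hand: score each candidate alpha with k-fold cross-validation and keep the one with the lowest average error. The sketch below does this for Ridge on synthetic data (an assumption; the Ames data are not reproduced here). It is only an illustration of the idea, not `RidgeCV`'s internal implementation, and the fold assignment is chosen arbitrarily.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 8))
y_demo = X_demo @ rng.normal(size=8) + rng.normal(scale=1.0, size=120)

alphas_demo = [0.005, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 80]
folds = KFold(n_splits=4, shuffle=True, random_state=42)

mean_rmse = {}
for alpha in alphas_demo:
    # sklearn reports negated MSE so that higher is always better
    neg_mse = cross_val_score(Ridge(alpha=alpha), X_demo, y_demo,
                              cv=folds, scoring='neg_mean_squared_error')
    mean_rmse[alpha] = np.sqrt(-neg_mse).mean()

best_alpha = min(mean_rmse, key=mean_rmse.get)
print(best_alpha, mean_rmse[best_alpha])
```

The chosen alpha is then used to refit on the full training set, and the held-out test set is scored only once at the end.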


In [None]:
from sklearn.linear_model import LassoCV
import numpy as np

alphas2 = np.array([0.005, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 80, 100, 120, 140])

# Name the fitted model lassoCV so it does not shadow the LassoCV class
lassoCV = LassoCV(alphas=alphas2,
                  max_iter=500, cv=4).fit(x_train, y_train)

lassoCV_rmse = rmse(y_test, lassoCV.predict(x_test))

print(lassoCV.alpha_, lassoCV_rmse)

80.0 51020.47779928883


In [None]:
from sklearn.linear_model import ElasticNetCV

l1_ratios = np.linspace(0.1, 0.9, 9)

# Name the fitted model elasticNetCV so it does not shadow the ElasticNetCV class
elasticNetCV = ElasticNetCV(alphas=alphas2,
                            l1_ratio=l1_ratios, max_iter=1000).fit(x_train, y_train)

elasticNetCV_rmse = rmse(y_test, elasticNetCV.predict(x_test))

print(elasticNetCV.alpha_, elasticNetCV.l1_ratio_, elasticNetCV_rmse)

0.1 0.9 33537.54979095956
