# Regularization (Ridge - LASSO - ElasticNet)
In mathematics, statistics, finance, and computer science, particularly in machine learning and inverse problems, regularization is the process of adding information in order to solve an ill-posed problem or to prevent overfitting. Regularization can be applied to objective functions in ill-posed optimization problems. The regularization term, or penalty, imposes a cost on the optimization function to make the optimal solution unique.

Independent of the problem or model, there is always a data term, which corresponds to a likelihood of the measurement, and a regularization term, which corresponds to a prior. By combining both using Bayesian statistics, one can compute a posterior that includes both information sources and therefore stabilizes the estimation process. By trading off the two objectives, one chooses either to fit the data more closely or to enforce generalization (to prevent overfitting).

There is a whole research branch dealing with possible regularizations. The usual workflow is to try a specific regularization and then work out the probability density that corresponds to it in order to justify the choice. A regularization can also be physically motivated by common sense or intuition, which is more difficult.

In machine learning, the data term corresponds to the training data, and the regularization is either the choice of the model or a modification to the algorithm. It is always intended to reduce the generalization error, i.e. the error of the trained model on the evaluation set rather than on the training data.

One of the earliest uses of regularization is related to the method of least squares. The probability density that corresponds to it is the Gaussian distribution, and the resulting method is now known as "Tikhonov regularization".
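
The link between the data term, the prior, and the penalty can be written out for the least squares case. This is a sketch under stated assumptions: the symbols $X$, $y$, $w$, $\sigma$, $\tau$ and $\alpha$ are introduced here for illustration and do not appear in the original text. The posterior is maximized by minimizing the negative log-likelihood plus the negative log-prior:

$$
\hat{w} = \arg\max_w \; p(y \mid X, w)\, p(w) = \arg\min_w \; \bigl[ -\log p(y \mid X, w) - \log p(w) \bigr]
$$

Assuming Gaussian noise, $y \mid X, w \sim \mathcal{N}(Xw, \sigma^2 I)$, and a zero-mean Gaussian prior, $w \sim \mathcal{N}(0, \tau^2 I)$, the two terms become $\frac{1}{2\sigma^2}\lVert y - Xw \rVert_2^2$ and $\frac{1}{2\tau^2}\lVert w \rVert_2^2$ up to constants. Multiplying by $2\sigma^2$ gives the Tikhonov (ridge) objective:

$$
\hat{w} = \arg\min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2, \qquad \alpha = \frac{\sigma^2}{\tau^2}
$$

A narrower prior (smaller $\tau$) therefore means a larger $\alpha$ and stronger shrinkage.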



###  Import all Necessary Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

###  Import the Data

In [None]:
df= pd.read_csv('/kaggle/input/real-estate-price-prediction/Real estate.csv')

###  Data Overview

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe()

###  EDA

In [None]:
sns.pairplot(df)

In [None]:
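# displot is a figure-level function and creates its own figure,
# so the size is set with height/aspect instead of plt.figure()
sns.displot(df['Y house price of unit area'], bins=30, kde=True, height=5, aspect=1.6)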

In [None]:
sns.heatmap(df.corr(), annot=True,cmap='RdYlGn')

In [None]:
plt.figure(figsize=(16,10))
# One box plot per column, laid out on a 3x5 grid
for i, col in enumerate(df.columns):
    plt.subplot(3,5,i+1)
    sns.boxplot(x=df[col])

###  Determine the Features & Target Variable (Label)

In [None]:
X= df.drop('Y house price of unit area', axis=1)
y=df['Y house price of unit area']

###  Preprocessing (Polynomial Conversion)

In [None]:
from sklearn.preprocessing import PolynomialFeatures

In [None]:
polynomial_converter= PolynomialFeatures(degree=3, include_bias=False)

In [None]:
poly_features= polynomial_converter.fit_transform(X)

In [None]:
poly_features.shape
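
The number of columns reported by `poly_features.shape` grows quickly with the degree, which is one reason regularization matters here. As a rough check, the sketch below counts the monomials of total degree 1 to `degree` in `n_features` variables; the helper name `n_poly_features` is hypothetical and not part of scikit-learn.

```python
from math import comb

def n_poly_features(n_features, degree):
    """Count polynomial terms of total degree 1..degree (no bias column).

    There are comb(n_features + degree, degree) monomials of degree
    0..degree; subtracting 1 removes the constant term, which matches
    include_bias=False.
    """
    return comb(n_features + degree, degree) - 1

# Two inputs, degree 2: x1, x2, x1^2, x1*x2, x2^2
print(n_poly_features(2, 2))
# Seven inputs, degree 3 (assuming X has seven columns)
print(n_poly_features(7, 3))
```

If `X` has a different number of columns, the second count changes accordingly; comparing it with `poly_features.shape[1]` is a quick sanity check.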

### Split the Data to Train & Test

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)

In [None]:
print(X_train)
print(y_train)
print(X_test)
print(y_test)

###  Scaling the Data

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler= StandardScaler()

In [None]:
scaler.fit(X_train)

In [None]:
X_train= scaler.transform(X_train)
X_test= scaler.transform(X_test)

In [None]:
print(X_train)
print(y_train)
print(X_test)
print(y_test)

###  Regularization

### 1: Ridge Regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where independent variables are highly correlated. It has uses in fields including econometrics, chemistry, and engineering.

Ridge regression was developed as a possible solution to the imprecision of least squares estimators when linear regression models have some multicollinear (highly correlated) independent variables. The ridge regression (RR) estimator is biased, but its variance and mean squared error are often smaller than those of the least squares estimator, so it tends to give more precise parameter estimates.
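
The effect on correlated inputs can be seen in a small synthetic example. This is a minimal sketch, not the scikit-learn implementation: it solves the ridge normal equations directly, omits the intercept, and uses made-up data (`X_demo`, `y_demo`) in which the second feature is nearly a copy of the first. The function name `ridge_closed_form` is an assumption for illustration.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - Xw||^2 + alpha * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    # Normal equations with alpha added to the diagonal of X^T X
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)      # almost identical to x1
X_demo = np.column_stack([x1, x2])
y_demo = x1 + x2 + 0.1 * rng.normal(size=200)

w_ols = ridge_closed_form(X_demo, y_demo, alpha=0.0)     # plain least squares
w_ridge = ridge_closed_form(X_demo, y_demo, alpha=10.0)  # penalized

print("least squares:", w_ols)
print("ridge:        ", w_ridge)
```

With nearly collinear columns, the least squares coefficients can be unstable, while the penalty pulls the two ridge coefficients toward similar values and never increases the norm of the coefficient vector.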

In [None]:
#Train the Model
from sklearn.linear_model import Ridge

In [None]:
ridge_model= Ridge(alpha=10)

In [None]:
ridge_model.fit(X_train, y_train)

In [None]:
#predict Test Data
y_pred= ridge_model.predict(X_test)

In [None]:
#Evaluating the Model
from sklearn.metrics import mean_absolute_error, mean_squared_error

MAE= mean_absolute_error(y_test, y_pred)
MSE= mean_squared_error(y_test, y_pred)
RMSE= np.sqrt(MSE)

In [None]:
pd.DataFrame([MAE, MSE, RMSE], index=['MAE', 'MSE', 'RMSE'], columns=['metrics'])

### Ridge Regression (Choosing an alpha value with Cross-Validation)

In [None]:
#Train the Model
from sklearn.linear_model import RidgeCV

In [None]:
ridge_cv_model=RidgeCV(alphas=(0.5, 1.0, 10.0), scoring='neg_mean_absolute_error')

In [None]:
ridge_cv_model.fit(X_train, y_train)

In [None]:
ridge_cv_model.alpha_

In [None]:
#Predicting Test Data
y_pred_ridge= ridge_cv_model.predict(X_test)

In [None]:
MAE_ridge= mean_absolute_error(y_test, y_pred_ridge)
MSE_ridge= mean_squared_error(y_test, y_pred_ridge)
RMSE_ridge= np.sqrt(MSE_ridge)

In [None]:
pd.DataFrame([MAE_ridge, MSE_ridge, RMSE_ridge], index=['MAE', 'MSE', 'RMSE'], columns=['Ridge Metrics'])

In [None]:
ridge_cv_model.coef_

### 2: Lasso Regression
Regression is a modeling task that involves predicting a numeric value given an input.
Linear regression is the standard algorithm for regression that assumes a linear relationship between inputs and the target variable. An extension to linear regression involves adding penalties to the loss function during training that encourage simpler models with smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression.

Lasso Regression is a popular type of regularized linear regression that includes an L1 penalty. This has the effect of shrinking the coefficients for those input variables that do not contribute much to the prediction task. This penalty allows some coefficient values to go to the value of zero, allowing input variables to be effectively removed from the model, providing a type of automatic feature selection. 
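
Why the L1 penalty produces exact zeros can be illustrated in the simplest setting, a single coefficient with an orthonormal design. Under that assumption, the lasso solution is the least squares coefficient passed through a soft-thresholding function, whereas the ridge solution is only rescaled. The function names below are hypothetical helpers for illustration, and `lam` stands for the penalty strength in this simplified objective, not scikit-learn's `alpha`.

```python
def soft_threshold(z, lam):
    """Lasso: minimizes 0.5 * (z - w)**2 + lam * abs(w) over w."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_shrink(z, lam):
    """Ridge: minimizes 0.5 * (z - w)**2 + 0.5 * lam * w**2 over w."""
    return z / (1.0 + lam)  # scaled down, but zero only when z is zero

for z in [2.0, 0.3, -2.0]:
    print(z, soft_threshold(z, 0.5), ridge_shrink(z, 0.5))
```

Any coefficient whose least squares value lies within `lam` of zero is removed by the lasso, which is the feature selection effect described above.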

In [None]:
from sklearn.linear_model import LassoCV

In [None]:
lasso_cv_model= LassoCV(eps=0.01, n_alphas=100, cv=5)

In [None]:
lasso_cv_model.fit(X_train, y_train)

In [None]:
lasso_cv_model.alpha_

In [None]:
y_pred_lasso= lasso_cv_model.predict(X_test)

In [None]:
MAE_Lasso= mean_absolute_error(y_test, y_pred_lasso)
MSE_Lasso= mean_squared_error(y_test, y_pred_lasso)
RMSE_Lasso= np.sqrt(MSE_Lasso)

In [None]:
pd.DataFrame([MAE_Lasso, MSE_Lasso, RMSE_Lasso], index=['MAE', 'MSE', 'RMSE'], columns=['Lasso Metrics'])

In [None]:
lasso_cv_model.coef_

### 3: Elastic Net
In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.
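
The linear combination can be made concrete with the penalty term in scikit-learn's parametrization, where `alpha` sets the overall strength and `l1_ratio` sets the mix. The helper `elastic_net_penalty` below is a hypothetical function written for illustration; it computes only the penalty, not the full objective.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2"""
    l1 = sum(abs(w) for w in coefs)     # lasso part
    l2_sq = sum(w * w for w in coefs)   # ridge part (squared L2 norm)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_sq

w = [1.0, -2.0]
print(elastic_net_penalty(w, alpha=2.0, l1_ratio=1.0))  # pure L1 penalty
print(elastic_net_penalty(w, alpha=2.0, l1_ratio=0.0))  # pure L2 penalty
print(elastic_net_penalty(w, alpha=2.0, l1_ratio=0.5))  # equal mix
```

Setting `l1_ratio=1` recovers the lasso penalty, which is why that value appears at the end of the candidate list passed to `ElasticNetCV`.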

In [None]:
from sklearn.linear_model import ElasticNetCV

In [None]:
elastic_model= ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],cv=5, max_iter=150000)

In [None]:
elastic_model.fit(X_train, y_train)

In [None]:
elastic_model.l1_ratio_

In [None]:
y_pred_elastic=elastic_model.predict(X_test)

In [None]:
MAE_Elastic= mean_absolute_error(y_test, y_pred_elastic)
MSE_Elastic= mean_squared_error(y_test, y_pred_elastic)
RMSE_Elastic= np.sqrt(MSE_Elastic)

In [None]:
pd.DataFrame([MAE_Elastic, MSE_Elastic, RMSE_Elastic], index=['MAE', 'MSE', 'RMSE'], columns=['Elastic Metrics'])

In [None]:
elastic_model.coef_