# 1. Ridge Regression (L2 penality)
$$
L_{r_ridge}(w, b) = \sum_{i = 1}^{N}{(y_{i} - (w.x_{i}+b))^{2}} + \alpha\sum_{i = 1}^{N}{w_{j}^{2}}
$$

    * Ridge Regression learns (w,b) using the least square cretirion and adds a penality for large variabtion of w parameters
    * Once the parameters are learned, the ridge regression prediction formula as the ordinary least square critirion
    * The addition of parameters penalty is called regularization, to prevent overfitting by reducing complexity of the model
    * alpha is the parameter cotrole of regularization
    * Higher aplpha  ===> more regularization and simpler models 

# 2. Coding side of Ridge Regression (sklearn)

In [46]:
from sklearn.model_selection import train_test_split
from sklearn import datasets

"""
Dataset load and split
"""
X_diabetes, y_diabetes = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_diabetes, y_diabetes, random_state=0)


"""
##########################
"""

"""
Feature Normalization
"""
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)


"""
#########################
"""


"""
Ridge Regression runing
"""

from sklearn.linear_model import Ridge

params = {
    
    "alpha":20.0,  # defaut = 1.0
    "fit_intercept":True, # defaut = True
    "normalize":True, # normalize data 
    "max_iter": 15000, # defaut value is 15000
    "tol":1e-3, #default=1e-3 precision of solutiuon of w vector found by solver
    "solver":"auto", #{‘auto’ : according to data, ‘svd’ : Decomposition, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’, ‘lbfgs’}
}

l_ridge = Ridge(alpha=20.0,
                fit_intercept=True).fit(X_train_scaled, y_train)

"""
############################
"""


print("Model parameters : ")
print("#"*20)
print("Ridg Regression intercept param : {} ".format(l_ridge.fit_intercept))
print("-"*100)
print("Ridg Rehression coef : \n{}".format(l_ridge.coef_))
print("-"*100)
print("number of non zero features is {} ".format(sum(l_ridge.coef_ != 0)))
print()
print()




"""
Model evaluation
"""

from sklearn.metrics import r2_score, mean_squared_error

print("Model evaluation :")
print("#"*20)
y_predict_train = l_ridge.predict(X_train)
y_predict_test = l_ridge.predict(X_test)
print("R-squared error for test model is {:.3f} ".format(r2_score(y_predict_test, y_test)))



Model parameters : 
####################
Ridg Regression intercept param : True 
----------------------------------------------------------------------------------------------------
Ridg Rehression coef : 
[  7.11985565  -9.7217072   67.82623715  43.44005892   3.02678554
  -3.1763564  -35.13925901  29.74248891  57.22782298  25.13511817]
----------------------------------------------------------------------------------------------------
number of non zero features is 10 


Model evaluation :
####################
R-squared error for test model is -172.238 
