### Ridge Regression. 
Ridge regression is also a linear regression model, so it uses the same prediction formula as the ordinary least squares (OLS) regression model. In ridge regression, though, the coefficients (w) are chosen not only to predict well on the training data but also to satisfy an additional constraint: the magnitude of the coefficients should be as small as possible; in other words, all entries of w should be close to zero. Intuitively, this means each feature should have as little effect on the outcome as possible (which translates to having a small slope), while still predicting well. This constraint is an example of regularization. Regularization means explicitly restricting a model to avoid overfitting. Ridge regression uses L2 regularization.
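
The idea above can be written as a single objective. The notation here is assumed for illustration and is not taken from the notebook: $x_i$ and $y_i$ are the features and target of training sample $i$, $b$ is the intercept, $n$ is the number of training samples, and $\alpha \ge 0$ is the regularization strength (the `alpha` parameter of scikit-learn's `Ridge`).

```latex
\min_{w,\,b} \; \sum_{i=1}^{n} \left( y_i - w^\top x_i - b \right)^2 \;+\; \alpha \, \lVert w \rVert_2^2,
\qquad
\lVert w \rVert_2^2 = \sum_{j} w_j^2
```

The first term is the usual OLS squared error; the second is the L2 penalty. With $\alpha = 0$ the objective reduces to OLS, and larger values of $\alpha$ push the entries of $w$ closer to zero.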

### Let's try it out.

In [1]:
from sklearn.linear_model import Ridge
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np

### Data Processing

In [2]:
# Get the Melbourne housing data.
melb_data = pd.read_csv('../datasets/melb_data.csv')
# Drop missing values. 
melb_data.dropna(axis = 0, inplace = True)
# Get the prices. 
y = melb_data['Price']
# Get the numerical features (every non-object column except the target).
X = melb_data.select_dtypes(exclude = 'object').drop(['Price'], axis = 1)
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)


### Training and Evaluation.

In [3]:
ridge = Ridge().fit(X_train, y_train) # Instantiate the model and perform the fit step.
# Evaluate the model.
print(f'Training R^2 Score : {ridge.score(X_train, y_train):.2f}' ) 
print(f'Test R^2 Score : {ridge.score(X_test, y_test):.2f}')

Training R^2 Score : 0.59
Test R^2 Score : 0.64


The training and test R<sup>2</sup> scores are the same for the LinearRegression model and the Ridge regression model. 
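
One plausible reason for identical scores is that the default `alpha=1.0` is a weak penalty when there are many samples and only a few features. The sketch below illustrates this on synthetic data from `make_regression`, not on the Melbourne data; the variable names (`X_syn`, `ols`, `ridge_default`, and so on) are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): many samples, few features.
X_syn, y_syn = make_regression(
    n_samples=1000, n_features=5, noise=10.0, random_state=0
)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, random_state=0
)

# Fit plain least squares and ridge with the default penalty.
ols = LinearRegression().fit(X_syn_train, y_syn_train)
ridge_default = Ridge(alpha=1.0).fit(X_syn_train, y_syn_train)

ols_test_r2 = ols.score(X_syn_test, y_syn_test)
ridge_test_r2 = ridge_default.score(X_syn_test, y_syn_test)

# The ridge coefficients are slightly smaller, but the scores barely move.
print(f'OLS   test R^2: {ols_test_r2:.4f}, coefficient norm: {np.linalg.norm(ols.coef_):.3f}')
print(f'Ridge test R^2: {ridge_test_r2:.4f}, coefficient norm: {np.linalg.norm(ridge_default.coef_):.3f}')
```

In this setting the penalty shrinks the coefficients only marginally, so the two models make almost the same predictions. Whether the same explanation holds for the Melbourne data would need to be checked on that data.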

### GPT's EXPLANATION.

The `Ridge` model in scikit-learn is a type of linear regression model that includes regularization. Regularization is a technique used to prevent overfitting by adding a penalty on the size of the coefficients. Ridge regression uses L2 regularization, which penalizes the sum of the squared coefficients.

Here are the key aspects of the `Ridge` model:


1. **Regularization Strength (`alpha`)**: The parameter `alpha` controls the amount of regularization applied to the model. A higher value of `alpha` increases the regularization strength, which can help reduce overfitting but may also lead to underfitting. Conversely, a lower value of `alpha` reduces the regularization effect, making the model closer to ordinary least squares regression.

2. **Implementation**: The `Ridge` model can be implemented in scikit-learn as follows:

   ```python
   from sklearn.linear_model import Ridge
   
   # Create a Ridge regression model
   ridge = Ridge(alpha=1.0)
   
   # Fit the model to the data
   ridge.fit(X_train, y_train)
   
   # Make predictions
   y_pred = ridge.predict(X_test)
   ```

3. **Attributes**:
   - `coef_`: The coefficients (weights) of the linear model.
   - `intercept_`: The intercept of the linear model.

4. **Methods**:
   - `fit(X, y)`: Fit the model to the training data.
   - `predict(X)`: Predict target values for new data.
   - `score(X, y)`: Return the coefficient of determination \( R^2 \) of the prediction.

5. **Advantages**:
   - Can handle multicollinearity (correlated features) better than ordinary least squares regression.
   - Helps in preventing overfitting by introducing a penalty for large coefficients.

6. **Limitations**:
   - Requires tuning of the regularization parameter `alpha`.
   - May not perform well if the true relationship between the features and target is highly non-linear.
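
To make the role of `alpha` and the need to tune it more concrete, here is a minimal sketch on synthetic data. The data set, the candidate `alphas` list, and the variable names are assumptions chosen for illustration. `RidgeCV` is scikit-learn's built-in cross-validated variant of `Ridge`, used here as one possible way to pick `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=20.0, random_state=0
)
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, random_state=0
)

# Hypothetical grid of regularization strengths, from very weak to very strong.
alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_demo_train, y_demo_train)
    coef_norms.append(np.linalg.norm(model.coef_))
    print(
        f'alpha={alpha:>8}: '
        f'train R^2={model.score(X_demo_train, y_demo_train):.2f}, '
        f'test R^2={model.score(X_demo_test, y_demo_test):.2f}, '
        f'coefficient norm={coef_norms[-1]:.2f}'
    )

# Let cross-validation choose alpha from the same grid.
ridge_cv = RidgeCV(alphas=alphas).fit(X_demo_train, y_demo_train)
print(f'alpha chosen by RidgeCV: {ridge_cv.alpha_}')
```

The coefficient norm shrinks as `alpha` grows, and a very large `alpha` tends to underfit. Which value is best depends on the data, so the grid above should be treated as a starting point rather than a recommendation.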

In summary, `Ridge` is a powerful tool for linear regression that incorporates regularization to improve generalization and robustness, particularly in the presence of multicollinearity among features.
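
The multicollinearity point can be illustrated with a small synthetic example in which the second feature is almost an exact copy of the first. Everything below (the data, the noise levels, and the names `X_col`, `ols_col`, `ridge_col`) is made up for this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n_samples = 200

# Two almost identical features; only the shared signal drives the target.
x1 = rng.normal(size=n_samples)
x2 = x1 + 1e-3 * rng.normal(size=n_samples)
X_col = np.column_stack([x1, x2])
y_col = 3.0 * x1 + rng.normal(scale=1.0, size=n_samples)

ols_col = LinearRegression().fit(X_col, y_col)
ridge_col = Ridge(alpha=1.0).fit(X_col, y_col)

# OLS cannot tell the two features apart, so its coefficients are unstable
# and often large with opposite signs. Ridge splits the effect between them.
print('OLS coefficients:  ', ols_col.coef_)
print('Ridge coefficients:', ridge_col.coef_)
```

Ridge spreads the total effect (about 3 in this setup) almost evenly across the two correlated features, which is why its coefficients are more stable than the OLS ones.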

### Lasso Regression.
An alternative to ridge for regularizing linear regression is `Lasso`. As with ridge regression, the lasso also restricts coefficients to be close to zero, but in a slightly different way, called L1 regularization. The consequence of L1 regularization is that when using the lasso, some coefficients are <i>exactly</i> zero. This means some features are entirely ignored by the model. This can be seen as a form of automatic feature selection. Having some coefficients be exactly zero often makes a model easier to interpret, and can reveal the most important features of your model.
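
The difference between the two penalties can be seen on a synthetic data set where only a few features carry signal. This is a sketch under assumed settings: the data from `make_regression`, the value `alpha=5.0`, and the names `lasso_demo` and `ridge_demo` are not from the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption): 20 features, only 5 of them informative.
X_sparse, y_sparse = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

lasso_demo = Lasso(alpha=5.0, max_iter=10000).fit(X_sparse, y_sparse)
ridge_demo = Ridge(alpha=5.0).fit(X_sparse, y_sparse)

# Count the coefficients that are not exactly zero.
n_lasso_used = int(np.sum(lasso_demo.coef_ != 0))
n_ridge_used = int(np.sum(ridge_demo.coef_ != 0))
print(f'Features used by Lasso: {n_lasso_used} of {X_sparse.shape[1]}')
print(f'Features used by Ridge: {n_ridge_used} of {X_sparse.shape[1]}')
```

Ridge shrinks every coefficient but leaves all of them non-zero, while the lasso sets the coefficients of uninformative features to exactly zero.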

In [4]:
from sklearn.linear_model import Lasso 

lasso = Lasso(alpha = 0.1, max_iter = 10000).fit(X_train, y_train) # Instantiate and fit the model. 
print(f'Training R^2 Score : {lasso.score(X_train, y_train):.2f}')
print(f'Test R^2 Score : {lasso.score(X_test, y_test):.2f}')
print(f'Number of features used: {np.sum(lasso.coef_ != 0)}') # Count the non-zero coefficients.

Training R^2 Score : 0.59
Test R^2 Score : 0.64
Number of features used: 12


It achieved the same metrics as the Ridge model and kept all 12 features. I think it didn't drop anything because the number of features is low.
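
Besides the number of features, how many coefficients the lasso zeroes out also depends on how strong `alpha` is relative to the scale of the data. The sketch below sweeps `alpha` on synthetic data with 12 features. The data, the `lasso_alphas` grid, and the variable names are assumptions for illustration, so it does not show what would happen on the Melbourne data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): 12 features, 4 of them informative.
X_path, y_path = make_regression(
    n_samples=200, n_features=12, n_informative=4, noise=5.0, random_state=0
)

# Hypothetical grid of penalties, from very weak to very strong.
lasso_alphas = [0.01, 1.0, 10.0, 1000.0]
features_used = []
for alpha in lasso_alphas:
    lasso_model = Lasso(alpha=alpha, max_iter=10000).fit(X_path, y_path)
    features_used.append(int(np.sum(lasso_model.coef_ != 0)))
    print(
        f'alpha={alpha:>7}: features used={features_used[-1]:>2}, '
        f'R^2={lasso_model.score(X_path, y_path):.2f}'
    )
```

With a weak penalty the lasso keeps most features; with a strong enough penalty it drops all of them and the model underfits. Repeating a sweep like this on the housing data with larger `alpha` values would be one way to test the guess above.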


`Also note that Ridge regression is usually the first choice between the two.`