# Ridge Regression — Theory & Interview Q&A

Ridge Regression is a regularized linear regression technique that adds an L2 penalty to the loss function, helping to reduce overfitting and handle multicollinearity.

| Aspect                | Details                                                                 |
|-----------------------|------------------------------------------------------------------------|
| **Definition**        | Linear regression with L2 regularization (penalizes large coefficients).|
| **Equation**          | Minimize: RSS + α * Σβⱼ²                                                |
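| **Use Cases**         | High-dimensional data, correlated predictors (multicollinearity), models that overfit under plain least squares |
| **Assumptions**       | Same as linear regression; features should also be standardized because the penalty depends on coefficient scale |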
| **Pros**              | Reduces overfitting, handles multicollinearity, stable estimates        |
| **Cons**              | All coefficients shrunk, less interpretable, α needs tuning             |
| **Key Parameters**    | Regularization strength (α)                                             |
| **Evaluation Metrics**| MSE, RMSE, R² Score                                                     |
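
Written out in full, the **Equation** row corresponds to the objective below. The symbols $n$, $p$, $x_{ij}$, $y_i$ and $\beta_0$ are notation assumed here for illustration. The intercept $\beta_0$ is left unpenalized, which matches how scikit-learn's `Ridge` treats it.

$$
\min_{\beta_0,\,\beta}\; \underbrace{\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2}_{\text{RSS}} \;+\; \alpha \sum_{j=1}^{p} \beta_j^2, \qquad \alpha \ge 0
$$

Setting $\alpha = 0$ recovers ordinary least squares, and as $\alpha \to \infty$ the coefficients $\beta_j$ shrink toward zero.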

## Interview Q&A

**Q1: What is Ridge Regression?**  
A: It is linear regression with L2 regularization to penalize large coefficients.
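
As a minimal sketch of what "L2 regularization" means in practice: without an intercept, the ridge coefficients have the closed form $(X^\top X + \alpha I)^{-1} X^\top y$. The snippet below compares that formula with scikit-learn's `Ridge` on synthetic data. The data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha = 1.0

# Closed-form ridge solution without an intercept: (X^T X + alpha I)^(-1) X^T y
beta_closed_form = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn minimizes the same objective when fit_intercept=False
beta_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

print("Closed form: ", beta_closed_form)
print("scikit-learn:", beta_sklearn)
print("Match:", np.allclose(beta_closed_form, beta_sklearn))
```

The `alpha * I` term is what keeps the system well conditioned even when columns of `X` are nearly collinear.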

**Q2: When should you use Ridge Regression?**  
A: When predictors are highly correlated or the model is overfitting.

**Q3: How does the regularization parameter α affect the model?**  
A: Higher α increases penalty, shrinking coefficients more.
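
A quick way to see this effect is to fit the same data with increasing `alpha` and track the size of the coefficient vector. This is a sketch on synthetic data from `make_regression`. The dataset and the particular `alpha` values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data (hypothetical); any regression dataset would do
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # L2 norm of the fitted coefficients (the intercept is not included)
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:>8}: ||coef||_2 = {coef_norms[-1]:.3f}")
```

The printed norms should fall as `alpha` grows. Very large values push the model toward predicting the mean of `y`.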

**Q4: What is the difference between Ridge and Lasso Regression?**  
A: Ridge uses L2 penalty (shrinks coefficients), Lasso uses L1 (can set coefficients to zero).
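
The difference can be illustrated by counting coefficients that are exactly zero after fitting. This sketch uses synthetic data in which only 5 of 20 features carry signal. The dataset and `alpha` values are assumptions, and the two `alpha` settings are not directly comparable because scikit-learn scales the Ridge and Lasso objectives differently.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (hypothetical): only 5 of the 20 features carry signal
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

ridge = Ridge(alpha=5.0).fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)

# Count coefficients that are exactly zero in each fitted model
ridge_zeros = int(np.sum(ridge.coef_ == 0))
lasso_zeros = int(np.sum(lasso.coef_ == 0))
print("Ridge coefficients exactly zero:", ridge_zeros)
print("Lasso coefficients exactly zero:", lasso_zeros)
```

Ridge typically leaves every coefficient small but non-zero, while Lasso drops some features entirely.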

**Q5: How do you select α?**  
A: Use cross-validation to find the optimal value.
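
One way to do this, sketched below, is scikit-learn's `RidgeCV`, which cross-validates over a list of candidate values. It is shown here as an alternative to the `GridSearchCV` approach used later in this notebook. The synthetic data and the candidate grid are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic data (hypothetical)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Candidate values spaced on a log scale, since useful alphas span orders of magnitude
candidate_alphas = np.logspace(-3, 3, 13)

model = RidgeCV(alphas=candidate_alphas, cv=5).fit(X, y)
print("Selected alpha:", model.alpha_)
```

If the selected value sits at either end of the grid, widening the grid in that direction is usually worthwhile.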

**Q6: Can Ridge Regression perform feature selection?**  
A: No, it only shrinks coefficients but does not eliminate them.

# Ridge Regression: Step-by-Step Explanation
This notebook demonstrates how to perform Ridge Regression using scikit-learn. Below are the detailed steps involved:
1. **Import Libraries**: Import all necessary libraries for data manipulation, visualization, and machine learning.
2. **Load Dataset**: Load the California housing dataset and inspect its structure.
3. **Prepare Data**: Convert the dataset into a DataFrame, separate features (`X`) and target (`y`).
4. **Split Data**: Divide the data into training and testing sets to evaluate model performance.
5. **Preprocessing**: Standardize features so the L2 penalty treats every coefficient on the same scale.
6. **Build Pipeline**: Create a pipeline that chains preprocessing and the Ridge regression model.
7. **Hyperparameter Tuning**: Use `GridSearchCV` to search over model parameters such as `alpha` (regularization strength) and `solver` (the numerical routine used to fit, which affects speed and precision rather than the model being fitted).
8. **Train Model**: Fit the grid search, which wraps the pipeline, to the training data.
9. **Predict**: Use the trained model to predict target values for the test set.
10. **Evaluate**: Calculate metrics such as Mean Squared Error (MSE) and R² score to assess model performance.
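
Standardization in step 5 matters more for ridge than for ordinary least squares, because the L2 penalty is not invariant to feature units. The sketch below rescales one feature by 1000 and checks whether in-sample predictions change. The data and the helper `fitted_predictions` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Same data, but the first feature is expressed in different units (x1000)
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000.0


def fitted_predictions(model, features):
    """Fit the model on (features, y) and return its in-sample predictions."""
    return model.fit(features, y).predict(features)


ols_same = np.allclose(
    fitted_predictions(LinearRegression(), X),
    fitted_predictions(LinearRegression(), X_rescaled),
)
ridge_same = np.allclose(
    fitted_predictions(Ridge(alpha=10.0), X),
    fitted_predictions(Ridge(alpha=10.0), X_rescaled),
)
print("OLS predictions unchanged by rescaling:  ", ols_same)
print("Ridge predictions unchanged by rescaling:", ridge_same)
```

Placing `StandardScaler` inside the pipeline removes this dependence on units and keeps the scaling statistics from leaking across cross-validation folds.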


In [10]:
## Ridge Regression 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [11]:
data = fetch_california_housing()
data

{'data': array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
           37.88      , -122.23      ],
        [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
           37.86      , -122.22      ],
        [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
           37.85      , -122.24      ],
        ...,
        [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
           39.43      , -121.22      ],
        [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
           39.43      , -121.32      ],
        [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
           39.37      , -121.24      ]]),
 'target': array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]),
 'frame': None,
 'target_names': ['MedHouseVal'],
 'feature_names': ['MedInc',
  'HouseAge',
  'AveRooms',
  'AveBedrms',
  'Population',
  'AveOccup',
  'Latitude',
  'Longitude'],
 'DESCR': '.. _california_housing_dataset:\n

In [12]:
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']


In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [14]:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge())
])


## Hyperparameter tuning
from sklearn.model_selection import GridSearchCV

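# Pipeline step names prefix the parameter names ('ridge__...')
param_grid = {
    'ridge__alpha': [0.1, 1.0, 10.0],   # regularization strength
    'ridge__solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sag', 'saga']   # numerical routine used to fit
}

# 5-fold cross-validation over all 18 combinations; the default scoring is the estimator's R²
grid_search = GridSearchCV(pipeline, param_grid, cv=5)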



In [15]:
grid_search.fit(X_train, y_train)


GridSearchCV parameters: estimator=Pipeline(step...e', Ridge())]), param_grid={'ridge__alpha': [0.1, 1.0, ...], 'ridge__solver': ['auto', 'svd', ...]}, refit=True, cv=5, verbose=0, pre_dispatch='2*n_jobs', return_train_score=False
StandardScaler parameters: copy=True, with_mean=True, with_std=True
Ridge parameters: alpha=0.1, fit_intercept=True, copy_X=True, tol=0.0001, solver='auto', positive=False


In [16]:
y_pred = grid_search.predict(X_test)


In [17]:
## Calculate MSE and R² score on the test set
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse:.4f}")
print(f"Test R² Score: {r2:.4f}")
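
The summary table also lists RMSE, which the cell above does not compute. A minimal sketch of how the three metrics relate, using toy values that are hypothetical and chosen only to keep the arithmetic easy to follow:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values (hypothetical), not taken from the housing data
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_hat)   # mean of squared errors
rmse = np.sqrt(mse)                       # same units as the target
r2 = r2_score(y_true, y_hat)              # 1 - SS_res / SS_tot

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²:   {r2:.4f}")
```

Here the squared errors are 0.25, 0, 2.25 and 1, so MSE is 0.875 and RMSE is about 0.9354. With a total sum of squares of 12.6875, R² is about 0.7241.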

In [18]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

# 1. Load data
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# 2. Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Pipeline with scaler and Ridge model
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge())
])

# 4. Parameter grid to tune alpha and solver
param_grid = {
    'ridge__alpha': [0.1, 1.0, 10.0],
    'ridge__solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sag', 'saga']
}

# 5. GridSearchCV instantiation
grid_search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    scoring='r2',
    cv=5,
    n_jobs=-1,         # Parallelize if multiple cores available
    verbose=1          # Verbose output for monitoring progress
)

# 6. Fit GridSearchCV
grid_search.fit(X_train, y_train)

# 7. Results
print("Best parameters:", grid_search.best_params_)
print("Best CV R² score:", grid_search.best_score_)

# 8. Test set evaluation using best estimator
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse:.4f}")
print(f"Test R² Score: {r2:.4f}")


Fitting 5 folds for each of 18 candidates, totalling 90 fits
Best parameters: {'ridge__alpha': 0.1, 'ridge__solver': 'auto'}
Best CV R² score: 0.6114839657407327
Test MSE: 0.5559
Test R² Score: 0.5758
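
The search reports `'auto'` as the best solver, but for Ridge the solver only selects the numerical routine. All solvers minimize the same objective, so differences in score come from numerical tolerance rather than from a different model. A sketch on synthetic data, where the dataset and the tight `tol` setting are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized data (hypothetical)
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

coefs = {}
for solver in ["svd", "cholesky", "lsqr"]:
    # A tight tolerance is set so the iterative 'lsqr' solver converges fully
    model = Ridge(alpha=1.0, solver=solver, tol=1e-10).fit(X, y)
    coefs[solver] = model.coef_

# Largest absolute coefficient difference relative to the 'svd' solution
max_gap = max(float(np.max(np.abs(coefs[s] - coefs["svd"]))) for s in coefs)
print(f"Largest coefficient difference across solvers: {max_gap:.2e}")
```

In practice it is usually enough to tune `alpha` over a wider, log-spaced grid and leave `solver` at its default.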
