### Q1. What is Ridge Regression, and how does it differ from ordinary least squares regression?

**Ridge Regression** is a form of linear regression that reduces overfitting by penalizing large coefficients. It adds a regularization term to the loss function: the **sum of the squared coefficients** (the squared L2 norm of the coefficient vector), scaled by a tuning parameter lambda.

- **Ordinary Least Squares (OLS) Regression** minimizes only the sum of squared differences between predicted and actual values (i.e., it minimizes the error).
- **Ridge Regression** minimizes the error **plus** the L2 penalty on the coefficients. This reduces model complexity and helps prevent overfitting, especially when many predictors are highly correlated (multicollinearity).
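
In symbols, the two objectives can be sketched as follows. The notation is an assumption of this note, not part of the original answer: $y_i$ is the response, $x_i$ the predictor vector, $\beta$ the coefficient vector, $\lambda \ge 0$ the tuning parameter, and the intercept is omitted for brevity.

$$
\hat{\beta}_{\text{OLS}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2
$$

$$
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

Setting the gradient to zero gives the closed-form solutions (the OLS one requires $X^\top X$ to be invertible):

$$
\hat{\beta}_{\text{OLS}} = (X^\top X)^{-1} X^\top y, \qquad
\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y
$$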

The key difference is that **Ridge Regression shrinks the coefficients**, reducing their magnitude, but does not set any of them to zero (unlike Lasso Regression). This means Ridge is good for preventing overfitting, but not for feature selection.
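
The shrinkage described above can be seen in a small experiment. This is a minimal sketch on synthetic data, assuming scikit-learn and NumPy are available; the data-generating process, the seed, and the choice `alpha=1.0` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for illustration): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 2.0 * x3 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # scikit-learn calls lambda "alpha"

# Compare the overall size of the two coefficient vectors.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("Coefficient norms (OLS, Ridge):", ols_norm, ridge_norm)
```

The Ridge coefficient vector has a smaller norm than the OLS one, yet every Ridge coefficient remains nonzero.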

---

### Q2. What are the assumptions of Ridge Regression?

Ridge Regression shares most of its assumptions with OLS regression, but it tolerates some violations better, multicollinearity in particular:

1. **Linearity**: The relationship between the predictors and the response should be linear.
2. **Independence**: Observations in the dataset should be independent of each other.
3. **Homoscedasticity**: The variance of residuals (errors) should be constant across all levels of the independent variables.
4. **Multicollinearity**: OLS requires that no predictor be an exact linear combination of the others, and its estimates become unstable when predictors are highly correlated. Ridge Regression tolerates high multicollinearity and is often chosen precisely because it is present.
5. **Normality of residuals**: Normally distributed residuals matter mainly for inference (confidence intervals and hypothesis tests). Ridge is typically used for prediction, so this assumption is usually less critical than it is for OLS.
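
One way to see why Ridge copes with multicollinearity (item 4 above) is to look at the matrix that has to be inverted. The sketch below, assuming NumPy and using made-up data with an illustrative `lam = 1.0`, compares the condition number of `X.T @ X` (used by OLS) with that of `X.T @ X + lam * I` (used by Ridge).

```python
import numpy as np

# Synthetic, nearly collinear predictors (an assumption for illustration).
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)  # x2 is almost an exact copy of x1
X = np.column_stack([x1, x2])

lam = 1.0  # hypothetical penalty strength
gram = X.T @ X

# A large condition number means the matrix is close to singular,
# so small changes in the data can cause large changes in the solution.
cond_ols = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + lam * np.eye(X.shape[1]))

print("Condition number without penalty:", cond_ols)
print("Condition number with penalty:   ", cond_ridge)
```

Adding `lam` to every eigenvalue of `X.T @ X` lifts the smallest ones away from zero, which is what stabilizes the Ridge solution.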

---

### Q3. How do you select the value of the tuning parameter (lambda) in Ridge Regression?

The tuning parameter **lambda** (also called the regularization parameter) controls how much regularization is applied. A large value of lambda increases the penalty on the size of the coefficients, resulting in smaller coefficients, while a small lambda makes Ridge behave more like OLS.
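
The effect of lambda on coefficient size can be sketched as below. This assumes scikit-learn, where the parameter is named `alpha`; the synthetic dataset and the particular alpha values are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression problem (an assumption for illustration).
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# Fit Ridge with increasingly strong penalties and record the coefficient norm.
alphas = [0.01, 1.0, 100.0, 10000.0]
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in alphas]

for a, norm in zip(alphas, norms):
    print(f"alpha={a:>8}: coefficient norm = {norm:.3f}")
```

The norm shrinks as alpha grows; with a very small alpha the fit is close to the OLS solution.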

**Selecting the right lambda** is crucial, and this is often done through:

1. **Cross-validation**: The most common method. You split the data into k folds, train the model on k-1 folds for each candidate lambda, evaluate it on the held-out fold, and choose the lambda with the best average validation performance.
   
2. **Grid Search**: You define a grid of candidate lambda values (often spaced on a logarithmic scale) and evaluate each one with cross-validation. The lambda with the lowest average validation error is selected.
   
3. **Automated algorithms**: Some libraries, like scikit-learn, provide estimators (e.g., `RidgeCV`) that automatically select the best lambda using cross-validation. Note that scikit-learn names this parameter `alpha`.
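
The selection procedure above might look like this with `RidgeCV`. It is a minimal sketch assuming scikit-learn; the synthetic data, the logarithmic grid of candidates, and the 5-fold setting are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic regression problem (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=20, noise=15.0, random_state=0)

# Candidate penalties on a logarithmic grid from 1e-3 to 1e3.
alphas = np.logspace(-3, 3, 13)

# RidgeCV evaluates every candidate with 5-fold cross-validation
# and refits on the full data using the best one.
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
best_alpha = model.alpha_

print("Selected alpha:", best_alpha)
```

If the selected value sits at either end of the grid, it is usually worth widening the grid and running the search again.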

---

### Q4. Can Ridge Regression be used for feature selection? If yes, how?

Ridge Regression **cannot directly be used for feature selection** because, unlike Lasso, Ridge does not set any coefficients exactly to zero. It shrinks the coefficients, but it does not eliminate them completely.

However, it can still **indirectly help in feature selection** by:

1. **Shrinking the coefficients**: If a feature’s coefficient becomes very small after applying Ridge, it suggests that the feature contributes little, provided the features are on comparable scales (for example, standardized).
2. **Combining Ridge with other methods**: In practice, Ridge can be combined with other techniques (like recursive feature elimination or feature importance ranking) to identify less relevant features.

But for direct and aggressive feature selection, Lasso Regression is preferred because it forces some coefficients to zero, effectively removing those features from the model.
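
The contrast between Ridge and Lasso can be checked by counting coefficients that are exactly zero. This is a sketch on synthetic data, assuming scikit-learn; the dataset (5 informative features out of 20) and `alpha=1.0` for both models are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data where only 5 of the 20 features carry signal
# (an assumption for illustration).
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Count coefficients that were set exactly to zero by each model.
ridge_zeros = int(np.sum(ridge.coef_ == 0))
lasso_zeros = int(np.sum(lasso.coef_ == 0))

print("Ridge coefficients equal to zero:", ridge_zeros)
print("Lasso coefficients equal to zero:", lasso_zeros)
```

Ridge keeps every feature in the model with a shrunken weight, while Lasso drops some of them outright.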