# Regression and Model Evaluation Questions

### Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?
R-squared, also known as the coefficient of determination, is the proportion of the variance in the dependent variable that is explained by the independent variable(s) in a regression model. It is calculated using the formula:

$R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}$

Where:
- $SS_{\text{residual}} = \sum_{i} (y_i - \hat{y}_i)^2$ is the sum of squared residuals (the squared differences between observed and predicted values).
- $SS_{\text{total}} = \sum_{i} (y_i - \bar{y})^2$ is the total sum of squares (the squared differences between observed values and their mean).

**Interpretation**: 
- An R-squared of 1 indicates that the model explains all the variability in the target variable.
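- An R-squared of 0 indicates that the model explains none of the variability, i.e. it does no better than always predicting the mean of the target.
- On held-out data, or for a model fitted without an intercept, R-squared can be negative, meaning the model does worse than predicting the mean.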
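
The formula above can be sketched in a few lines of plain Python. The function name `r_squared` and the sample values are illustrative assumptions, not part of any particular library.

```python
def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SS_residual / SS_total for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals: observed minus predicted.
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: observed minus the mean of the observed values.
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_residual / ss_total


# Hypothetical observations and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y_true, y_pred)                  # approximately 0.95
r2_mean_only = r_squared(y_true, [6.0] * 4)     # predicting the mean gives 0.0
```

Here $SS_{\text{residual}} = 1$ and $SS_{\text{total}} = 20$, so the model explains about 95% of the variance, while always predicting the mean explains none of it.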

---

### Q2. Define adjusted R-squared and explain how it differs from the regular R-squared.
Adjusted R-squared adjusts the R-squared value based on the number of independent variables in the model. It accounts for the fact that adding more predictors can artificially inflate R-squared. The formula for adjusted R-squared is:

$R_{\text{adj}}^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - p - 1} \right)$

Where:
- $n$ is the number of observations.
- $p$ is the number of predictors.

**Difference from R-squared**: 
- R-squared can increase with more variables, even if those variables do not contribute meaningfully to the model.
- Adjusted R-squared penalizes every added predictor, so it increases only when the new variable improves the fit by more than would be expected by chance, and it can decrease when a weak predictor is added.
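
As a minimal sketch of the formula, the helper below (a hypothetical name, `adjusted_r_squared`) applies the adjustment to two made-up models. The R-squared values, sample size, and predictor counts are assumptions chosen for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a model with n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R-squared needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical comparison: seven extra predictors raise R^2 only slightly.
adj_small = adjusted_r_squared(0.80, n=50, p=3)    # roughly 0.787
adj_large = adjusted_r_squared(0.81, n=50, p=10)   # roughly 0.761
```

The larger model has the higher R-squared (0.81 vs 0.80) but the lower adjusted R-squared, because the small gain in fit does not offset the penalty for seven additional predictors.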

---

### Q3. When is it more appropriate to use adjusted R-squared?
Adjusted R-squared is more appropriate when you are comparing models with different numbers of predictors, as it accounts for the complexity of the model. It ensures that only predictors that improve the model’s explanatory power will lead to an increase in adjusted R-squared.

---

### Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?
- **MSE (Mean Squared Error)**: The average of the squared differences between the actual and predicted values. It is calculated as:

$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

- **RMSE (Root Mean Squared Error)**: The square root of MSE, which brings the error back to the original scale of the target variable:

$RMSE = \sqrt{MSE}$

- **MAE (Mean Absolute Error)**: The average of the absolute differences between actual and predicted values:

$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

**Representations**:
- RMSE and MSE give higher weight to larger errors due to the squaring of residuals.
- MAE weights each error in proportion to its magnitude, so a single large error influences it less than it influences MSE or RMSE.
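
The three definitions can be sketched directly in Python. The function names and the sample data are illustrative assumptions.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean of absolute differences between actual and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values; the errors are -1, 0, 2 and -4.
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 12.0, 13.0, 24.0]

mse_value = mse(y_true, y_pred)    # (1 + 0 + 4 + 16) / 4 = 5.25
rmse_value = rmse(y_true, y_pred)  # sqrt(5.25), about 2.29
mae_value = mae(y_true, y_pred)    # (1 + 0 + 2 + 4) / 4 = 1.75
```

RMSE comes out larger than MAE here because the single error of 4 contributes 16 of the 21 squared units.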

---

### Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.
- **RMSE**:
  - **Advantages**: More sensitive to large errors, which can be useful when large errors are especially problematic.
  - **Disadvantages**: Sensitive to outliers, which can distort the evaluation of the model.
  
- **MSE**:
  - **Advantages**: Highlights larger errors more due to the squaring process.
  - **Disadvantages**: Also sensitive to outliers and may not be as interpretable as RMSE due to being in squared units.

- **MAE**:
  - **Advantages**: Provides a more interpretable error in the original unit of the target variable; less sensitive to outliers.
  - **Disadvantages**: Does not emphasize larger errors, which might not always be desirable.
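
The difference in outlier sensitivity can be seen in a small sketch. The residuals below are made up for illustration, and the helper names are assumptions.

```python
import math


def mae_from_residuals(residuals):
    """Mean absolute error computed from a list of residuals."""
    return sum(abs(r) for r in residuals) / len(residuals)


def rmse_from_residuals(residuals):
    """Root mean squared error computed from a list of residuals."""
    return math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))


# Hypothetical residuals: identical except that the last one becomes an outlier.
clean = [1.0, -1.0, 2.0, -2.0, 1.0]
with_outlier = [1.0, -1.0, 2.0, -2.0, 20.0]

# Factor by which each metric grows once the outlier is introduced.
mae_growth = mae_from_residuals(with_outlier) / mae_from_residuals(clean)
rmse_growth = rmse_from_residuals(with_outlier) / rmse_from_residuals(clean)
```

With these numbers MAE grows by a factor of about 3.7 (1.4 to 5.2), while RMSE grows by a factor of about 6.1 (about 1.48 to about 9.06), which is the sensitivity described in the lists above.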

---

### Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?
**Lasso (Least Absolute Shrinkage and Selection Operator)** regularization adds a penalty proportional to the sum of the absolute values of the coefficients to the loss function:

$\text{Lasso penalty} = \lambda \sum_{j=1}^{p} |\beta_j|$

where $p$ is the number of predictors and $\lambda \ge 0$ controls the strength of the penalty. Lasso can shrink some coefficients to exactly zero, effectively performing feature selection.

**Ridge regularization** adds a penalty proportional to the sum of the squared coefficients:

$\text{Ridge penalty} = \lambda \sum_{j=1}^{p} \beta_j^2$

**Difference**:
- Lasso performs feature selection by forcing some coefficients to zero.
- Ridge shrinks coefficients but does not set them to zero.

**Appropriate use**:
- Use Lasso when you expect only a subset of features to be important.
- Use Ridge when you believe all predictors are relevant but require shrinkage to prevent overfitting.
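
The contrast between the two penalties is easiest to see in a special case. The sketch below assumes an orthonormal design ($X^\top X = I$) and a squared-error loss written as $\tfrac{1}{2}\lVert y - X\beta \rVert^2$; under those assumptions each penalized coefficient has a closed form in terms of the unpenalized least-squares coefficient. The function names and the coefficient values are hypothetical.

```python
def lasso_coefficient(z, lam):
    """Soft-thresholding: lasso solution for one coefficient, given the
    least-squares coefficient z, under an orthonormal design (assumption)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # coefficients with |z| <= lam are set exactly to zero


def ridge_coefficient(z, lam):
    """Ridge solution for one coefficient under the same assumptions,
    with penalty lam * beta**2: proportional shrinkage, never exactly zero."""
    return z / (1 + 2 * lam)


# Hypothetical least-squares coefficients for four features.
ols = [3.0, -0.4, 0.2, -2.0]
lam = 0.5

lasso = [lasso_coefficient(z, lam) for z in ols]  # [2.5, 0.0, 0.0, -1.5]
ridge = [ridge_coefficient(z, lam) for z in ols]  # [1.5, -0.2, 0.1, -1.0]
```

Lasso removes the two small coefficients entirely and subtracts a constant from the large ones, whereas Ridge scales every coefficient by the same factor and keeps all four features in the model. With correlated features the closed forms no longer hold, but the qualitative behavior is the same.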

---

### Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.
Regularized linear models prevent overfitting by adding a penalty term to the loss function, which discourages overly large coefficients. This constrains the model, making it simpler and less prone to capturing noise in the training data.

**Example**: In a high-dimensional dataset with many features, a regularized model (e.g., Ridge or Lasso) shrinks the coefficients of less important features, reducing their impact on predictions. This prevents the model from fitting the training data too closely and generalizes better to unseen data.
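
A toy sketch of the same idea, using a single feature and no intercept so that the ridge estimate has the closed form $\hat{\beta} = \sum x_i y_i / (\sum x_i^2 + \lambda)$. The data are hand-picked for illustration (the noisy training targets overstate a true slope of 1), so this shows the mechanism rather than a general result.

```python
def ridge_slope(x, y, lam):
    """No-intercept, single-feature ridge estimate; lam = 0 gives least squares."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def mse(x, y, slope):
    """Mean squared error of the predictions slope * x."""
    return sum((b - slope * a) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical data: the underlying relationship is y = x.
x_train, y_train = [1.0, 2.0, 3.0], [0.5, 2.0, 4.5]   # noisy training targets
x_test, y_test = [4.0, 5.0], [4.0, 5.0]               # noise-free test targets

ols_slope = ridge_slope(x_train, y_train, lam=0.0)     # 18 / 14, about 1.29
reg_slope = ridge_slope(x_train, y_train, lam=2.0)     # 18 / 16 = 1.125

train_gap = mse(x_train, y_train, reg_slope) - mse(x_train, y_train, ols_slope)
test_gap = mse(x_test, y_test, ols_slope) - mse(x_test, y_test, reg_slope)
```

Both gaps are positive here: the regularized slope fits the training data slightly worse but predicts the test points better, because the penalty pulls the inflated slope back toward a smaller value.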

---

### Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.
- Regularized linear models still assume that the target is a linear function of the predictors, so they may perform poorly when the true relationships are non-linear or involve interactions that are not encoded in the features.
- The penalty acts on coefficient magnitudes, which depend on the units of each feature, so features should be standardized before fitting. This adds a preprocessing step to model development.
- Performance depends on the regularization parameter ($\lambda$): a value that is too large underfits the data, while a value that is too small provides little protection against overfitting. It usually has to be tuned, for example by cross-validation.
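
The feature-scaling point can be illustrated with a small sketch: the same feature expressed in meters and in kilometers gives different ridge predictions for the same $\lambda$, and standardizing removes the discrepancy. The single-feature, no-intercept ridge formula and all values below are simplifying assumptions.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def ridge_slope(x, y, lam):
    """No-intercept, single-feature ridge estimate."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


# Hypothetical data: one feature in two different units, centered target.
x_m = [1000.0, 2000.0, 3000.0, 4000.0]
x_km = [v / 1000 for v in x_m]
y = [-1.5, -0.5, 0.5, 1.5]
lam = 1.0

# Predictions for the last observation differ depending on the unit used.
pred_m = ridge_slope(x_m, y, lam) * x_m[-1]
pred_km = ridge_slope(x_km, y, lam) * x_km[-1]

# After standardization the unit no longer matters.
slope_std_m = ridge_slope(standardize(x_m), y, lam)
slope_std_km = ridge_slope(standardize(x_km), y, lam)
```

In kilometers the coefficient is numerically large and is shrunk noticeably; in meters it is tiny and the penalty barely touches it. Standardizing first puts every feature on the same footing with respect to the penalty.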

---

### Q9. You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?
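The two numbers cannot be compared directly, because they are different metrics. For any set of predictions, RMSE is at least as large as MAE, so Model A's RMSE of 10 is compatible with an MAE below 8, and Model B's MAE of 8 is compatible with an RMSE above 10. The information given is not enough to say which model is better.

**Choosing a model**:
- Compute the same metric (ideally both RMSE and MAE) for both models on the same held-out data, then compare like with like.
- Prefer the model with the lower RMSE when large errors are disproportionately costly; prefer the model with the lower MAE when errors matter in proportion to their size or when outliers should not dominate the evaluation.

**Limitations**:
- RMSE emphasizes larger errors, so a few outliers can dominate it and mask otherwise good performance.
- MAE is more robust to outliers, but it can hide occasional large errors that RMSE would expose.
- Both metrics are in the units of the target, so they are only meaningful for comparing models evaluated on the same target and dataset.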
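
A small sketch shows why an RMSE of 10 and an MAE of 8 should not be compared directly. The residuals below are invented so that Model A has an RMSE of 10 and Model B has an MAE of 8; they are not taken from any real models.

```python
import math


def mae(residuals):
    """Mean absolute error from a list of residuals."""
    return sum(abs(r) for r in residuals) / len(residuals)


def rmse(residuals):
    """Root mean squared error from a list of residuals."""
    return math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))


# Hypothetical residuals consistent with the reported figures.
residuals_a = [0.0, 0.0, 0.0, 20.0]    # mostly exact, one large miss
residuals_b = [8.0, -8.0, 8.0, -8.0]   # consistently off by 8

rmse_a, mae_a = rmse(residuals_a), mae(residuals_a)   # 10.0 and 5.0
rmse_b, mae_b = rmse(residuals_b), mae(residuals_b)   # 8.0 and 8.0
```

In this constructed case Model A wins on MAE (5 vs 8) and Model B wins on RMSE (8 vs 10), so the reported pair of numbers alone cannot rank the models. Computing both metrics for both models on the same data is what makes the comparison meaningful.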

---

### Q10. You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?
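The regularization parameters alone do not show which model performs better: 0.1 and 0.5 apply to different penalties and are not on a common scale, so the two models should be compared on validation or cross-validated error. Beyond that, choosing between Ridge and Lasso depends on the nature of the features: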
- **Ridge** (Model A) is better if all features contribute to the outcome, as it shrinks coefficients without eliminating them.
- **Lasso** (Model B) is better if only a subset of features is important, as it can shrink some coefficients to zero, performing feature selection.

**Trade-offs**:
- Ridge maintains all predictors but may not eliminate irrelevant ones.
- Lasso may remove relevant features if the regularization parameter ($\lambda = 0.5$) is too large, leading to underfitting.
- The choice depends on how sparse the model is expected to be and the need for feature selection.
