### Q1. **Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?**

**R-squared (R²)**, or the coefficient of determination, is a statistical metric used in linear regression to measure how well the regression line fits the observed data. It represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

- **Calculation**: 
   \[
   R^2 = 1 - \frac{SS_{residual}}{SS_{total}}
   \]
   Where:
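   - \(SS_{residual} = \sum_{i} (y_i - \hat{y}_i)^2\) is the sum of squared residuals (differences between observed and predicted values).
   - \(SS_{total} = \sum_{i} (y_i - \bar{y})^2\) is the total sum of squares (squared deviations of the observed values from their mean \(\bar{y}\)).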

**Interpretation**:
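- For an ordinary least squares model with an intercept, evaluated on the data it was fitted to, R² lies between 0 and 1. A higher value means the model explains a larger proportion of the variance in the dependent variable.
- An R² of 1 means the model explains all the variance, while an R² of 0 means it does no better than always predicting the mean.
- On held-out data, or for a model fitted without an intercept, R² can be negative, which means the model predicts worse than the mean.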
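
As a minimal sketch of the calculation above, the following plain-Python function computes R² from observed and predicted values. The function name `r_squared` and the sample numbers are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_residual / ss_total

# Hypothetical observations and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y_true, y_pred)                      # SS_residual = 1, SS_total = 20
r2_mean_only = r_squared(y_true, [6.0] * 4)         # always predicting the mean
print(round(r2, 4), round(r2_mean_only, 4))
```

Here the residual sum of squares is 1 and the total sum of squares is 20, so R² is about 0.95, while always predicting the mean (6.0) gives an R² of 0.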

---

### Q2. **Define adjusted R-squared and explain how it differs from the regular R-squared.**

**Adjusted R-squared** adjusts the R² value by accounting for the number of predictors (independent variables) in the model. It penalizes the R² value as more predictors are added, which makes it a fairer measure of fit when models with different numbers of predictors are compared.

- **Formula**:
   \[
   \text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - p - 1} \right)
   \]
   Where:
   - \(n\) is the number of data points.
   - \(p\) is the number of predictors.
   
**Difference**: 
- **R²** never decreases on the training data when a predictor is added to a least squares model, even if the predictor does not genuinely improve the model.
- **Adjusted R²**, however, only increases if the new predictor improves the model beyond what would be expected by chance, and it decreases otherwise.
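
The formula above can be sketched directly in Python. The function name `adjusted_r_squared` and the R² values, sample size, and predictor counts below are hypothetical, chosen only to show the penalty at work.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n data points and p predictors (needs n > p + 1)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 requires n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison: three extra predictors lift R^2 only slightly.
small_model = adjusted_r_squared(0.800, n=50, p=2)
large_model = adjusted_r_squared(0.805, n=50, p=5)
print(round(small_model, 4), round(large_model, 4))
```

With these numbers the larger model has the higher R² (0.805 versus 0.800) but the lower adjusted R² (about 0.783 versus about 0.791), because the small gain does not offset the three extra predictors.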

---

### Q3. **When is it more appropriate to use adjusted R-squared?**

**Adjusted R-squared** is more appropriate when comparing models with a different number of predictors. It accounts for overfitting that can occur by adding unnecessary variables and provides a more reliable metric for model performance when there are multiple predictors.

---

### Q4. **What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?**

These are error metrics used to evaluate the performance of regression models:

- **Mean Squared Error (MSE)**: 
   \[
   MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
   \]
   Where \(y_i\) is the actual value, and \(\hat{y}_i\) is the predicted value. It gives the average squared difference between actual and predicted values, emphasizing larger errors due to the squaring.

- **Root Mean Squared Error (RMSE)**: 
   \[
   RMSE = \sqrt{MSE}
   \]
   RMSE is the square root of MSE, bringing the error back to the same unit as the target variable.

- **Mean Absolute Error (MAE)**: 
   \[
   MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
   \]
   It measures the average of the absolute differences between actual and predicted values.

**Representation**:
- MSE and RMSE penalize larger errors more heavily (due to squaring), while MAE weights each error in proportion to its size.
- RMSE and MAE are both in the same unit as the target variable, which makes them easier to interpret than MSE.
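
The three formulas can be sketched in a few lines of standard-library Python. The function names and the sample values are illustrative only.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values; the errors are -1, 0, 2, -4.
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 12.0, 13.0, 24.0]

print(mse(y_true, y_pred), round(rmse(y_true, y_pred), 4), mae(y_true, y_pred))
```

For these values MSE is 5.25, RMSE is about 2.29, and MAE is 1.75. The single error of 4 accounts for 16 of the 21 units of squared error, which is why RMSE sits above MAE.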

---

### Q5. **Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.**

- **MSE**: 
  - **Advantages**: Heavily penalizes larger errors, useful when large errors are undesirable.
  - **Disadvantages**: Not easily interpretable since it’s in squared units of the target variable.

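- **RMSE**: 
  - **Advantages**: Same units as the target variable, so it is easy to interpret.
  - **Disadvantages**: Like MSE, it is sensitive to outliers, because a few large errors can dominate the value.

- **MAE**: 
  - **Advantages**: Same units as the target variable and less sensitive to outliers than MSE/RMSE, since each error contributes in proportion to its size.
  - **Disadvantages**: Does not penalize large errors as heavily as MSE/RMSE, which is a drawback when large errors are especially costly.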
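
To make the difference in outlier sensitivity concrete, the sketch below compares MAE and RMSE on two hypothetical sets of prediction errors that differ in a single value. The helper names and the error values are invented for illustration.

```python
import math

def mae_of(errors):
    """Mean absolute error from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse_of(errors):
    """Root mean squared error from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical residuals: identical except for one large miss.
clean = [1.0, -1.0, 1.0, -1.0, 1.0]
with_outlier = [1.0, -1.0, 1.0, -1.0, 21.0]

for name, errs in [("clean", clean), ("with outlier", with_outlier)]:
    print(name, mae_of(errs), round(rmse_of(errs), 3))
```

Both metrics equal 1 on the clean residuals. With the outlier, MAE rises to 5 while RMSE rises to about 9.43, so the single large error moves RMSE almost twice as far.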

---

### Q6. **Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?**

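**Lasso (Least Absolute Shrinkage and Selection Operator)** regularization adds a penalty proportional to the sum of the absolute values of the coefficients to the loss function:

\[
\text{Lasso Loss} = \text{RSS} + \lambda \sum_{j=1}^{p} |w_j|
\]

Here \(p\) is the number of predictors, \(w_j\) are the coefficients, and \(\lambda \ge 0\) controls the strength of the penalty.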

- **Difference from Ridge**: Ridge regularization (L2) adds a penalty proportional to the sum of the squared coefficients, which shrinks coefficients toward zero but rarely makes them exactly zero. Lasso (L1) can drive some coefficients exactly to zero, effectively performing feature selection.

- **When to use**: Lasso is more appropriate when you expect that some of the predictors should be completely excluded from the model, as it can shrink some coefficients to zero, making the model simpler and more interpretable.
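
The difference is easiest to see in a special case. The sketch below assumes the predictors are orthonormal (\(X^T X = I\)), which is an assumption made here for illustration and not something real data usually satisfies. In that case each coefficient can be solved for separately: the Lasso solution soft-thresholds the least squares coefficient, while the Ridge solution rescales it. The function names and coefficient values are hypothetical.

```python
def lasso_shrink(w_ols, lam):
    """Minimizer of (w - w_ols)^2 + lam * |w|: soft-thresholding by lam / 2."""
    magnitude = abs(w_ols) - lam / 2
    if magnitude <= 0:
        return 0.0  # small coefficients are set exactly to zero
    return magnitude if w_ols > 0 else -magnitude

def ridge_shrink(w_ols, lam):
    """Minimizer of (w - w_ols)^2 + lam * w^2: proportional shrinkage."""
    return w_ols / (1 + lam)

# Hypothetical least squares coefficients under the orthonormal assumption.
ols_coefs = [3.0, -0.4, 0.2]
lam = 1.0

lasso_coefs = [lasso_shrink(w, lam) for w in ols_coefs]
ridge_coefs = [ridge_shrink(w, lam) for w in ols_coefs]
print(lasso_coefs)
print(ridge_coefs)
```

With \(\lambda = 1\), Lasso keeps the large coefficient (reduced to 2.5) and sets the two small ones to zero, whereas Ridge halves all three and leaves none at zero.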

---

### Q7. **How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.**

Regularization techniques, such as Ridge (L2) and Lasso (L1), penalize large coefficients in the model, encouraging the model to prefer simpler patterns and thus preventing overfitting. Overfitting occurs when the model fits the training data too well, capturing noise instead of the underlying pattern.

**Example**: 
In a dataset with many features, ordinary least squares regression may assign large weights to irrelevant features, resulting in overfitting. Ridge regularization shrinks these weights toward zero, while Lasso may set some of them exactly to zero, which keeps the model from becoming too complex.
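
A toy version of this example can be worked through in plain Python. Everything below is constructed for illustration: the data are invented so that the target depends only on the first feature, while the second feature happens to line up with the training noise. The sketch fits a two-feature model without an intercept by solving \((X^T X + \lambda I) w = X^T y\), which gives least squares at \(\lambda = 0\) and Ridge otherwise.

```python
def fit_ridge_2d(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features, no intercept."""
    a11 = sum(r[0] * r[0] for r in X) + lam
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X) + lam
    b1 = sum(r[0] * t for r, t in zip(X, y))
    b2 = sum(r[1] * t for r, t in zip(X, y))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def mse(X, y, w):
    """Mean squared error of the linear prediction w[0]*x1 + w[1]*x2."""
    return sum((t - (w[0] * r[0] + w[1] * r[1])) ** 2 for r, t in zip(X, y)) / len(y)

# Invented data: y = 2 * x1 plus noise; x2 is irrelevant to the target.
X_train = [(1.0, 0.1), (2.0, -0.1), (3.0, 0.1)]
y_train = [2.3, 3.7, 6.3]
X_test = [(4.0, -0.1), (5.0, 0.1)]
y_test = [8.0, 10.0]  # noise-free targets, 2 * x1

w_ols = fit_ridge_2d(X_train, y_train, lam=0.0)    # unregularized fit
w_ridge = fit_ridge_2d(X_train, y_train, lam=0.1)  # Ridge fit

ols_test_mse = mse(X_test, y_test, w_ols)
ridge_test_mse = mse(X_test, y_test, w_ridge)
print(w_ols, ols_test_mse)
print(w_ridge, ridge_test_mse)
```

On this data the unregularized fit reproduces the training targets almost exactly by giving the irrelevant feature a weight of about 3, and its test MSE is about 0.09. The Ridge fit shrinks that weight to roughly 0.66 and reaches a test MSE of roughly 0.013. The numbers depend entirely on the invented data, so they illustrate the mechanism and are not a general result.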

---

### Q8. **Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.**

- **Limitations**:
  - **Lasso** can lead to the exclusion of important variables if the penalty is too strong.
  - **Ridge** does not perform feature selection, so it may retain irrelevant features.
  - Regularized models assume a linear relationship between variables, which may not hold in all cases.
  - They may underfit if the penalty term is too large.

---

### Q9. **You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?**

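The two numbers cannot be compared directly, because they come from different metrics. For any set of predictions, RMSE is at least as large as MAE, so Model A's RMSE of 10 is compatible with an MAE below 8. To choose between the models, compute the same metric for both on the same held-out data.
- **Model A (RMSE = 10)**: RMSE is sensitive to large errors, so this value may reflect a few large misses and not uniformly poor predictions.
- **Model B (MAE = 8)**: MAE reports the typical error size but says little about how large the worst errors are.

**Limitations**:
- RMSE penalizes large errors more heavily, so it is the better metric when large errors are especially costly.
- MAE is less affected by outliers, so it is more appropriate when occasional large errors are tolerable or the outliers are measurement noise.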
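
One reason an RMSE and an MAE cannot be set side by side is that, on the same predictions, MAE never exceeds RMSE. Writing \(e_i = y_i - \hat{y}_i\) for the errors, this follows from the Cauchy-Schwarz inequality:

\[
MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i| \le \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2} = RMSE
\]

Equality holds only when every absolute error is the same. Model A's MAE is therefore at most 10 and could well be below 8, and Model B's RMSE is at least 8 and could well be above 10.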

---

### Q10. **You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?**

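The regularization parameters alone do not show which model performs better. Because the two penalties differ, λ = 0.1 for Ridge and λ = 0.5 for Lasso are not on a common scale, and neither value is a measure of accuracy. Compare the models with the same error metric on held-out data (for example, by cross-validation). Which method tends to suit the data depends on the dataset characteristics: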
- **Model A (Ridge, λ = 0.1)**: Ridge is effective when all features are expected to contribute to the prediction.
- **Model B (Lasso, λ = 0.5)**: Lasso is preferable when you expect some features to be irrelevant, as it can shrink some coefficients to zero.

**Trade-offs**:
- Lasso can perform feature selection, but it may exclude important features if the regularization is too strong.
- Ridge retains all features but may not simplify the model as much.

