### Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?

- **R-squared (R²)** is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variables. It indicates how well the model explains the variability of the target variable.
- **Formula**:  
  \[
  R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
  \]
  Where:
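  - \( SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y_i})^2 \) is the sum of squared residuals.
  - \( SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \) is the total sum of squares: the variation of the target around its mean \( \bar{y} \) (equal to \( n \) times the variance of \( y \)).
- **Interpretation**: An \( R^2 \) value of 1 means the model explains all of the variability, while 0 means it does no better than always predicting the mean \( \bar{y} \). For least-squares fits with an intercept, training \( R^2 \) lies between 0 and 1; on held-out data, or for models without an intercept, it can be negative, meaning the model does worse than the mean.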
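A minimal Python sketch of the formula above, using hypothetical toy values (the function name `r_squared` is illustrative, not from any library). The second call shows that a model can score below zero when its predictions are worse than simply predicting the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals: variation the model leaves unexplained
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation of y around its mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y = [3, 5, 7, 9]
good = [2.8, 5.3, 6.9, 9.1]  # hypothetical predictions close to the observed values
bad = [9, 7, 5, 3]           # hypothetical predictions worse than the mean (6)

print(r_squared(y, good))  # about 0.9925
print(r_squared(y, bad))   # -3.0
```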

### Q2. Define adjusted R-squared and explain how it differs from the regular R-squared.

- **Adjusted R-squared** is a modified version of \( R^2 \) that adjusts for the number of predictors in the model. It penalizes adding more variables that do not improve the model.
- **Formula**:  
  \[
  R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}
  \]
  Where:
  - \( n \) is the number of data points.
  - \( p \) is the number of predictors.
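- **Difference**: Regular \( R^2 \) never decreases when a predictor is added (for least-squares fits evaluated on the training data), even if the predictor is irrelevant. Adjusted \( R^2 \) accounts for the number of predictors: it increases only if the new variable improves the fit by more than would be expected by chance, and it can decrease otherwise.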
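The adjusted formula can be sketched directly in Python; `adjusted_r_squared` is a hypothetical helper name, and \( p \) counts predictors excluding the intercept. Holding \( R^2 \) fixed while increasing \( p \) shows the penalty at work.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical fit: R^2 = 0.90 on n = 50 observations, with a growing number of predictors
for p in (1, 5, 20):
    print(p, round(adjusted_r_squared(0.90, 50, p), 4))
```

The same raw \( R^2 \) yields a lower adjusted value as \( p \) grows, which is exactly the penalty for extra predictors.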

### Q3. When is it more appropriate to use adjusted R-squared?

- **Adjusted R-squared** is more appropriate when comparing models fit to the same data with different numbers of independent variables. Because it penalizes the inclusion of unnecessary variables, it helps guard against choosing an overfit model.
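A worked illustration with hypothetical numbers: with \( n = 30 \) observations, suppose a two-predictor model reaches \( R^2 = 0.800 \), and adding a third, barely useful predictor raises it to \( R^2 = 0.802 \). Using \( R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} \):

\[
R^2_{adj}(p = 2) = 1 - \frac{(1 - 0.800)(30 - 1)}{30 - 2 - 1} = 1 - \frac{5.8}{27} \approx 0.785
\]
\[
R^2_{adj}(p = 3) = 1 - \frac{(1 - 0.802)(30 - 1)}{30 - 3 - 1} = 1 - \frac{5.742}{26} \approx 0.779
\]

Regular \( R^2 \) rises slightly, but adjusted \( R^2 \) falls, so it favors the smaller model.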

### Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?

- **RMSE (Root Mean Squared Error)**:  
  \[
  RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2}
  \]
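  - Is the square root of MSE, so it is expressed in the same units as the target variable and can be read as a typical magnitude of the prediction error. It equals the standard deviation of the residuals only when the residuals average to zero.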

- **MSE (Mean Squared Error)**:  
  \[
  MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2
  \]
  - Measures the average squared difference between predicted and actual values; its units are the square of the target's units.

- **MAE (Mean Absolute Error)**:  
  \[
  MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y_i}|
  \]
  - Measures the average absolute difference between predicted and actual values.
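The three formulas can be computed together from the residuals. This is a plain-Python sketch with hypothetical values; `regression_errors` is an illustrative name, not a library function.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MSE, RMSE, MAE) for paired lists of actual and predicted values."""
    n = len(y_true)
    residuals = [y - p for y, p in zip(y_true, y_pred)]
    mse = sum(r ** 2 for r in residuals) / n   # average squared error (squared units)
    rmse = math.sqrt(mse)                      # back in the target's units
    mae = sum(abs(r) for r in residuals) / n   # average absolute error (target's units)
    return mse, rmse, mae


y_true = [10, 12, 15, 20]
y_pred = [11, 12, 13, 24]
print(regression_errors(y_true, y_pred))  # MSE 5.25, RMSE about 2.29, MAE 1.75
```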

### Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.

- **RMSE**:
  - **Advantages**: Expressed in the same units as the target variable, and because errors are squared before averaging, it penalizes large errors more than small ones.
  - **Disadvantages**: The same squaring makes it sensitive to outliers, and it is less intuitive than MAE as an "average error".

- **MSE**:
  - **Advantages**: Smooth and differentiable, which makes it convenient as a training loss; like RMSE, it penalizes large errors heavily.
  - **Disadvantages**: Expressed in squared units of the target, which makes it hard to interpret directly, and, like RMSE, sensitive to outliers.

- **MAE**:
  - **Advantages**: Easy to interpret in the original units and less sensitive to outliers.
  - **Disadvantages**: Does not penalize large errors as heavily as RMSE or MSE, and the absolute value is not differentiable at zero, which complicates some optimization methods.
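To make the outlier trade-off concrete, here is a small sketch on two hypothetical sets of residuals that differ in a single value; the helper names `rmse` and `mae` are illustrative.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical residuals: identical except for one large outlier
typical = [2, -2, 2, -2, 2]
with_outlier = [2, -2, 2, -2, 22]

for name, errs in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{name:>12}: RMSE={rmse(errs):.2f}  MAE={mae(errs):.2f}")
```

A single outlier raises RMSE from 2 to 10 but MAE only from 2 to 6, because squaring amplifies the one large error.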

### Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?

- **Lasso Regularization** (Least Absolute Shrinkage and Selection Operator) adds a penalty proportional to the sum of the absolute values of the coefficients, which can shrink some of them to exactly zero. This lets Lasso perform feature selection by effectively excluding irrelevant features.
  - **Formula**:  
    \[
    L(\beta) = \sum_{i=1}^{n} (y_i - \hat{y_i})^2 + \lambda \sum_{j=1}^{p} |\beta_j|
    \]
- **Ridge Regularization** penalizes the sum of the squared coefficients. It shrinks all coefficients towards zero but rarely makes any of them exactly zero, so every variable stays in the model.
  - **Formula**:  
    \[
    L(\beta) = \sum_{i=1}^{n} (y_i - \hat{y_i})^2 + \lambda \sum_{j=1}^{p} \beta_j^2
    \]
- **When to use**: Lasso is better when you expect only a few features to matter and want automatic feature selection, while Ridge is preferable when most variables are relevant and you mainly want to reduce overfitting.
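For intuition, a common textbook simplification assumes the columns of the design matrix are orthonormal (\( X^T X = I \)). Under that assumption, and with the losses written exactly as above (no factor of 1/2), each penalized coefficient has a closed form in terms of the ordinary least-squares estimate \( b_j \): Ridge rescales it to \( b_j / (1 + \lambda) \), while Lasso soft-thresholds it to \( \operatorname{sign}(b_j) \max(|b_j| - \lambda/2, 0) \). The sketch below applies both to hypothetical OLS estimates; the function names are illustrative.

```python
def ridge_orthonormal(b, lam):
    """Ridge solution when X^T X = I: every OLS coefficient is scaled by 1 / (1 + lam)."""
    return [bj / (1 + lam) for bj in b]


def lasso_orthonormal(b, lam):
    """Lasso solution when X^T X = I: soft-threshold each OLS coefficient at lam / 2."""
    out = []
    for bj in b:
        shrunk = max(abs(bj) - lam / 2, 0.0)  # zero whenever |bj| <= lam / 2
        if shrunk == 0.0:
            out.append(0.0)
        else:
            out.append(shrunk if bj > 0 else -shrunk)
    return out


ols = [3.0, -1.5, 0.4, -0.2]  # hypothetical least-squares estimates
print("ridge:", ridge_orthonormal(ols, 1.0))  # all shrunk, none exactly zero
print("lasso:", lasso_orthonormal(ols, 1.0))  # the two small ones become exactly zero
```

Ridge shrinks every coefficient proportionally, while Lasso subtracts a fixed amount and sets small coefficients to exactly zero, which is where its feature selection comes from.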

### Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.

- **Regularized models** add a penalty term (L1 or L2) to the cost function that discourages large coefficients and therefore overly complex models. This limits how closely the model can fit noise in the training data, which is the source of overfitting.
  - **Example**: In a model predicting housing prices, adding too many irrelevant features (like house color) may lead to overfitting. Regularization ensures that such irrelevant features have little or no effect by shrinking their coefficients.
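A minimal sketch of the housing example, assuming two hypothetical features (`house_size`, which drives price, and an irrelevant `house_color_code`), ridge (L2) regularization, and centred data with no intercept. The helper `ridge_two_features` is illustrative; it solves the 2x2 system \( (X^T X + \lambda I)\beta = X^T y \) directly.

```python
import math


def ridge_two_features(x1, x2, y, lam):
    """Ridge fit with two centred features and no intercept.

    Solves (X^T X + lam * I) beta = X^T y for the 2x2 case by Cramer's rule.
    """
    def centre(v):
        m = sum(v) / len(v)
        return [vi - m for vi in v]

    x1, x2, y = centre(x1), centre(x2), centre(y)
    a = sum(u * u for u in x1) + lam          # (X^T X + lam I)[0][0]
    b = sum(u * v for u, v in zip(x1, x2))    # off-diagonal entry
    d = sum(v * v for v in x2) + lam          # (X^T X + lam I)[1][1]
    r1 = sum(u * t for u, t in zip(x1, y))    # (X^T y)[0]
    r2 = sum(v * t for v, t in zip(x2, y))    # (X^T y)[1]
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


# Hypothetical data: price rises with size; the colour code is unrelated noise
house_size = [1, 2, 3, 4, 5, 6]
house_color_code = [3, 1, 2, 2, 1, 3]
price = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

for lam in (0.0, 1.0, 10.0):
    beta = ridge_two_features(house_size, house_color_code, price, lam)
    print(lam, [round(c, 4) for c in beta], round(math.hypot(*beta), 4))
```

As \( \lambda \) grows, the coefficient vector shrinks and the irrelevant feature's already small coefficient moves closer to zero; a very large \( \lambda \) also shrinks the useful `house_size` coefficient, which is why the parameter has to be tuned.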

### Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.

- **Limitations**:
  - Regularized models can shrink coefficients too much, leading to **underfitting** if the regularization parameter is too high.
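  - A single regularization parameter penalizes all coefficients in the same way, so the result depends on how the features are scaled: features usually need to be standardized first, and a uniform penalty may not suit datasets where features differ greatly in importance.
  - With strongly correlated predictors (multicollinearity), Lasso tends to keep one variable from a correlated group and drop the others somewhat arbitrarily, which can exclude important variables.
  - Regularized linear models still assume a linear relationship, so they may perform poorly when relationships are non-linear unless combined with feature engineering (for example, polynomial or interaction terms).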
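One practical consequence of applying one penalty to all coefficients is that ridge and lasso are not invariant to the units of the features. The sketch below assumes a single hypothetical feature, a ridge fit through the origin, and an illustrative helper `ridge_one_feature`; it fits the same distances once in kilometres and once in metres.

```python
def ridge_one_feature(x, y, lam):
    """Ridge slope for one feature with no intercept: sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


# Hypothetical data: the same distances recorded in kilometres and in metres
dist_km = [1.0, 2.0, 3.0, 4.0]
dist_m = [1000 * d for d in dist_km]
cost = [2.0, 4.0, 6.0, 8.0]  # exactly 2 per km

lam = 10.0
pred_km = ridge_one_feature(dist_km, cost, lam) * 5.0    # prediction for 5 km
pred_m = ridge_one_feature(dist_m, cost, lam) * 5000.0   # the same 5 km, in metres
print(pred_km, pred_m)  # 7.5 vs. about 10.0
```

With the same \( \lambda \), the kilometre version is heavily shrunk while the metre version is almost unpenalized, so the predictions differ for identical inputs. Standardizing features before fitting is the usual remedy.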

### Q9. You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?

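- **Choice**: Taken at face value, Model B (MAE of 8) appears better because 8 is smaller than 10. However, the two numbers are different metrics and cannot be compared directly: for the same set of errors, RMSE is always greater than or equal to MAE, so Model A's RMSE of 10 is compatible with an MAE below 8. A fair comparison requires computing the same metric (ideally both RMSE and MAE) for both models on the same test data.
- **Limitations**: MAE does not penalize large errors as heavily as RMSE. If large errors are especially costly in your application, RMSE is the more relevant metric; otherwise MAE gives a more robust picture of the typical error. The choice of metric depends on the nature of the problem and on whether large errors are more costly than small ones.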
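A small sketch with hypothetical residuals chosen to reproduce the numbers in the question shows why mixing metrics is misleading; `rmse` and `mae` are illustrative helpers.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical test-set residuals
model_a = [0, 0, 0, 20]  # one large miss
model_b = [8, 8, 8, 8]   # uniformly moderate misses

print("A:", rmse(model_a), mae(model_a))  # A: 10.0 5.0
print("B:", rmse(model_b), mae(model_b))  # B: 8.0 8.0
```

Here Model A has the reported RMSE of 10 but a lower MAE than Model B, while Model B has the lower RMSE, so each metric favors a different model.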

### Q10. You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?

- **Choice**: It depends on the context, and the decision should be settled by comparing both models on held-out data using the same evaluation metric:
  - If feature selection is important, choose **Model B** (Lasso), as it tends to exclude irrelevant features by setting their coefficients to zero.
  - If the goal is to reduce overfitting while retaining all variables, choose **Model A** (Ridge).

- **Trade-offs**: Lasso may drop important features when predictors are highly correlated, while Ridge shrinks coefficients without eliminating variables, which makes it more stable for highly correlated data. The two regularization parameters (0.1 and 0.5) are also not directly comparable, because they scale different penalties (and some libraries scale the loss differently for each method); each should be tuned, for example with cross-validation, rather than compared by size.
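A minimal sketch of settling the choice on held-out data, assuming a single hypothetical feature, fits through the origin, and the loss functions written as in Q6 (sum of squared residuals plus the penalty). The helpers `fit_ridge`, `fit_lasso`, and `rmse` are illustrative; in the one-feature case both models have closed-form slopes.

```python
import math


def fit_ridge(x, y, lam):
    """Slope minimising sum((y - b*x)^2) + lam * b^2 (no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def fit_lasso(x, y, lam):
    """Slope minimising sum((y - b*x)^2) + lam * |b| (no intercept).

    The minimiser soft-thresholds sum(x*y) at lam / 2 before dividing by sum(x^2).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return math.copysign(shrunk, sxy) / sxx


def rmse(x, y, slope):
    return math.sqrt(sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y)) / len(x))


# Hypothetical train / validation split
x_train, y_train = [1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0]
x_val, y_val = [5, 6], [10.1, 11.9]

candidates = {
    "Model A (Ridge, lam=0.1)": fit_ridge(x_train, y_train, 0.1),
    "Model B (Lasso, lam=0.5)": fit_lasso(x_train, y_train, 0.5),
}
# Score both models with the same metric on the same validation data
scores = {name: rmse(x_val, y_val, slope) for name, slope in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print("selected:", best)
```

The selection is driven only by the shared validation metric; on different data the other model could win, and in practice each \( \lambda \) would first be tuned by cross-validation.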