"""
Q1. R-squared in Linear Regression:
- **R-squared (R²)** represents the proportion of variance in the dependent variable that is explained by the independent variable(s) in a regression model. It is calculated as:
  R² = 1 - (SS_res / SS_tot)
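  Where SS_res is the sum of squared residuals (errors), and SS_tot is the total sum of squares (the sum of squared deviations of the dependent variable from its mean, which is proportional to its variance).
- For ordinary least squares with an intercept, evaluated on the training data, R² ranges from 0 to 1, with 1 indicating a perfect fit and 0 meaning no explanatory power beyond predicting the mean. On held-out data, or for a model without an intercept, R² can be negative. A higher R² generally indicates a better fit, although it never decreases when predictors are added.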
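
A minimal sketch of the formula above, in plain Python. The function name `r_squared` and the data values are hypothetical, chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Made-up observations and two sets of predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
good_pred = [2.5, 5.5, 6.5, 9.5]   # close to the observations
mean_pred = [6.0, 6.0, 6.0, 6.0]   # always predicts the mean of y_true

r2_good = r_squared(y_true, good_pred)
r2_mean = r_squared(y_true, mean_pred)
```

Here SS_tot is 20 and the close predictions give SS_res = 1, so `r2_good` is about 0.95. Predicting the mean everywhere gives SS_res = SS_tot, so `r2_mean` is 0.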

Q2. Adjusted R-squared:
- **Adjusted R-squared** is a modified version of R-squared that adjusts for the number of predictors in the model. It is calculated as:
  Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - p - 1)]
  Where:
  - n is the number of data points,
  - p is the number of predictors.
- Adjusted R-squared is useful for comparing models with different numbers of predictors because it penalizes adding irrelevant predictors that do not improve the model's performance.
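
The penalty can be seen with a small sketch. The function name `adjusted_r_squared` and the R², n, and p values below are assumptions made up for illustration, not results from any real model.

```python
def adjusted_r_squared(r2, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    if n - p - 1 <= 0:
        raise ValueError('need n > p + 1')
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison: seven extra predictors raise R^2 only slightly.
small_model = adjusted_r_squared(0.80, n=50, p=3)
large_model = adjusted_r_squared(0.81, n=50, p=10)
```

With these numbers the larger model has the higher R² (0.81 vs. 0.80) but the lower adjusted R² (about 0.761 vs. about 0.787), which is the penalty for predictors that add little.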

Q3. When to Use Adjusted R-squared:
- **Adjusted R-squared** is more appropriate when comparing models with different numbers of independent variables (predictors). It helps to prevent the overestimation of model quality when more predictors are added, even if they do not provide additional explanatory power.

Q4. RMSE, MSE, and MAE:
- **Root Mean Squared Error (RMSE)**: The square root of the average of squared residuals. It penalizes large errors more heavily than smaller ones. RMSE = √((1/n) Σ (y_i - ŷ_i)²).
- **Mean Squared Error (MSE)**: Measures the average of squared residuals. MSE = (1/n) Σ (y_i - ŷ_i)².
- **Mean Absolute Error (MAE)**: Measures the average of absolute residuals. MAE = (1/n) Σ |y_i - ŷ_i|.
- These metrics quantify the difference between the actual values (y_i) and the predicted values (ŷ_i), with RMSE and MSE being sensitive to larger errors and MAE being more robust to outliers.
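
The three formulas can be sketched in plain Python as follows. The function names and the data are hypothetical; the last prediction is deliberately far off to show how one large error affects each metric.

```python
import math

def mse(y_true, y_pred):
    # Mean of squared residuals.
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Square root of MSE, back in the original units.
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    # Mean of absolute residuals.
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Made-up data: residuals are -1, 1, -1, -4.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 11.0, 15.0, 20.0]

mse_value = mse(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
```

For this data MSE is 4.75, RMSE is about 2.18, and MAE is 1.75. The single residual of -4 contributes 16 of the 19 total squared error, which is why RMSE sits above MAE.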

Q5. Advantages and Disadvantages of RMSE, MSE, and MAE:
- **RMSE**:
  - **Advantages**: Sensitive to large errors, which makes it useful when large errors are more undesirable.
  - **Disadvantages**: Not robust to outliers; large errors disproportionately affect RMSE.
- **MSE**:
  - **Advantages**: Like RMSE, it penalizes larger errors more. It is also smooth and differentiable everywhere, which makes it convenient as a loss function for optimization.
  - **Disadvantages**: Expressed in squared units, so it is not directly interpretable in the original units of the data.
- **MAE**:
  - **Advantages**: Less sensitive to outliers, and directly interpretable in the original units of the data.
  - **Disadvantages**: Weights all errors in proportion to their size, so it does not single out large errors, which may be undesirable in some cases. It is also not differentiable at zero.

Q6. Lasso vs. Ridge Regularization:
- **Lasso (Least Absolute Shrinkage and Selection Operator)**: Lasso regularization adds a penalty term based on the absolute values of the coefficients (L1 norm). This results in some coefficients becoming exactly zero, effectively performing feature selection.
- **Ridge Regularization**: Adds a penalty term based on the squared values of the coefficients (L2 norm), which shrinks the coefficients but does not set them to zero.
- **When to Use**:
  - **Lasso** is more suitable when you suspect that only a few predictors are important (sparse models).
  - **Ridge** is more appropriate when you expect that all predictors have some influence but want to reduce their impact to prevent overfitting.
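
A sketch of why the L1 penalty produces exact zeros while the L2 penalty does not. It relies on a simplifying assumption not made in the text above: an orthonormal design matrix, for which both problems have closed-form solutions. Under that assumption, minimizing (1/2)||y - Xw||² + λ||w||₁ soft-thresholds each least-squares coefficient by λ, and minimizing (1/2)||y - Xw||² + (λ/2)||w||² divides each coefficient by (1 + λ). The function names and coefficient values are hypothetical.

```python
def lasso_orthonormal(w_ols, lam):
    # Soft-thresholding: shrink each coefficient toward zero by lam,
    # and set it to exactly zero if its magnitude is at most lam.
    result = []
    for w in w_ols:
        magnitude = max(abs(w) - lam, 0.0)
        if magnitude == 0.0:
            result.append(0.0)
        else:
            result.append(magnitude if w > 0 else -magnitude)
    return result

def ridge_orthonormal(w_ols, lam):
    # Proportional shrinkage: every coefficient is scaled by 1 / (1 + lam).
    return [w / (1.0 + lam) for w in w_ols]

# Made-up least-squares coefficients: two large, two small.
w_ols = [3.0, -0.4, 0.2, -2.0]
lam = 0.5

lasso_w = lasso_orthonormal(w_ols, lam)
ridge_w = ridge_orthonormal(w_ols, lam)
```

With these values Lasso zeroes the two small coefficients and keeps the two large ones (shrunk by 0.5), while Ridge shrinks all four but leaves every one non-zero.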

Q7. Regularized Linear Models and Overfitting:
- **Regularized linear models** (like Lasso and Ridge) help prevent overfitting by adding a penalty to the size of the coefficients. This discourages large coefficients that could result in overly complex models that fit noise rather than the underlying data pattern.
- Example: In a linear regression model with many features, applying Ridge regularization will shrink the weights, preventing the model from fitting random fluctuations in the data.
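
The shrinkage can be sketched for the simplest case: one centered feature and no intercept (both are assumptions made here to keep the closed form short). Minimizing Σ (y_i - w x_i)² + λ w² gives w = Σ x_i y_i / (Σ x_i² + λ). The function name `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(x, y, lam):
    # Closed-form ridge solution for a single feature without intercept.
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

# Made-up, centered data with a slope of roughly 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.1, 2.0, 3.9]

lambdas = (0.0, 1.0, 10.0, 100.0)
slopes = [ridge_slope(x, y, lam) for lam in lambdas]
```

With λ = 0 the result is the ordinary least-squares slope (about 1.99); as λ grows the slope moves steadily toward zero without ever reaching it.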

Q8. Limitations of Regularized Linear Models:
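- **Limitations**: Regularized linear models (like Lasso and Ridge) still assume a linear relationship, so they may perform poorly when the underlying relationship is non-linear. They offer little benefit when the number of features is small relative to the number of observations. When feature selection is unnecessary, Lasso can discard useful variables, particularly among correlated predictors. The penalty also biases coefficient estimates toward zero, and the model may generalize poorly if the regularization parameter is not tuned properly.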

Q9. Choosing Between Models Using RMSE and MAE:
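- Model A (RMSE = 10) vs. Model B (MAE = 8):
  - The two figures come from different metrics, so they cannot be compared directly. For any set of errors, RMSE ≥ MAE, so Model A's MAE is at most 10 and could be either above or below 8.
  - To choose, evaluate both models on the same metric, picked according to the problem context:
    - **If large errors are especially costly**, compare the models on RMSE, which penalizes large errors more heavily.
    - **If all errors matter in proportion to their size, or the data contain outliers**, compare the models on MAE, which is more robust to large errors.
  - There are trade-offs: RMSE can be dominated by a few large errors, while MAE treats an error of 20 as only twice as bad as an error of 10. Reporting both gives a fuller picture of the error distribution.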
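
A sketch of why an RMSE figure and an MAE figure from different models are hard to compare: two hypothetical error patterns, invented here for illustration, share an RMSE of 10 but have very different MAE values.

```python
import math

def rmse_from_errors(errors):
    # Root of the mean squared error.
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae_from_errors(errors):
    # Mean of the absolute errors.
    return sum(abs(e) for e in errors) / len(errors)

# Made-up residuals: every prediction is off by 10 ...
errors_uniform = [10.0, -10.0, 10.0, -10.0]
# ... versus three exact predictions and one miss of 20.
errors_spiky = [0.0, 0.0, 0.0, 20.0]

rmse_uniform = rmse_from_errors(errors_uniform)
mae_uniform = mae_from_errors(errors_uniform)
rmse_spiky = rmse_from_errors(errors_spiky)
mae_spiky = mae_from_errors(errors_spiky)
```

Both patterns have RMSE = 10, yet the MAE is 10 in the first case and 5 in the second. A model reported only as "RMSE = 10" could therefore sit on either side of an MAE of 8, so both models need to be scored on the same metric.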

Q10. Choosing Between Ridge and Lasso Regularization:
- Model A (Ridge, λ = 0.1) vs. Model B (Lasso, λ = 0.5):
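  - The λ values are not directly comparable, because the two penalties are on different scales (squared versus absolute coefficients); a smaller λ for Ridge does not by itself mean weaker regularization.
  - **Ridge** tends to work better when all predictors have some influence, and it is more stable when predictors are highly correlated.
  - **Lasso** is suitable when you want feature selection, since it can set some coefficients exactly to zero.
  - **Trade-offs**: Ridge keeps every feature, which can hurt interpretability, while Lasso reduces the number of predictors but may arbitrarily keep one of several correlated predictors and drop the rest. Compare the two on the same held-out metric, with λ tuned separately for each (for example by cross-validation).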
"""