# Regression 2: Model Evaluation and Regularization
This notebook covers R-squared, adjusted R-squared, regression error metrics, and regularization techniques (Lasso and Ridge), with practical examples and discussion of their use in regression analysis.

## Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?

**R-squared** (coefficient of determination) measures the proportion of variance in the dependent variable that is predictable from the independent variables. It is calculated as:

R² = 1 - (SS_res / SS_tot)

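where SS_res is the sum of squared residuals and SS_tot is the total sum of squares (the squared deviations of the observed values from their mean). For an ordinary least squares model with an intercept, R-squared on the training data lies between 0 and 1, with higher values indicating a better fit. On held-out data, or for a model fit without an intercept, it can be negative, which means the model predicts worse than always guessing the mean.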

In [None]:
# Example: Calculating R-squared
from sklearn.metrics import r2_score

# Assumes X (features), Y (targets), and a fitted `model` from the previous example
r2 = r2_score(Y, model.predict(X))
print(f"R-squared: {r2:.2f}")
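
To connect the formula to the library call, here is a minimal sketch that computes R-squared by hand with NumPy. The arrays `y_true`, `y_pred`, and `y_bad` are made-up illustrative values, not data from this notebook.

```python
import numpy as np

# Illustrative values only (assumed for this sketch)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

ss_res = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot
print(f"R-squared (manual): {r2_manual:.4f}")

# Predictions that are worse than always guessing the mean give a negative value
y_bad = np.array([9.0, 7.0, 5.0, 3.0])
r2_bad = 1 - np.sum((y_true - y_bad) ** 2) / ss_tot
print(f"R-squared (poor predictions): {r2_bad:.4f}")
```

The manual result should agree with `r2_score(y_true, y_pred)` from scikit-learn, which implements the same formula.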

## Q2. Define adjusted R-squared and explain how it differs from the regular R-squared.

**Adjusted R-squared** adjusts the R-squared value for the number of predictors in the model. It penalizes the addition of irrelevant variables and is calculated as:

Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - p - 1)]

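where n is the number of observations and p is the number of predictors (not counting the intercept). R-squared never decreases on the training data when a predictor is added, whereas adjusted R-squared decreases if the new predictor does not improve the fit enough to offset the lost degree of freedom.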

In [None]:
# Example: Calculating adjusted R-squared
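# Uses r2 from the cell above; X is the feature array the model was fit on
n = X.shape[0]  # number of observations
p = X.shape[1] if X.ndim > 1 else 1  # number of predictors (1 for a 1-D X)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)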
print(f"Adjusted R-squared: {adj_r2:.2f}")

## Q3. When is it more appropriate to use adjusted R-squared?

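Adjusted R-squared is more appropriate when comparing models fit to the same data with different numbers of predictors. Because plain R-squared always favors the larger model, adjusted R-squared gives a fairer comparison by penalizing predictors that add little explanatory power. It is a guard against overfitting during model selection, not a substitute for evaluating on held-out data.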
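
As a rough illustration of that comparison, the sketch below applies the adjusted R-squared formula to two models. The helper `adjusted_r2` and the scores passed to it are hypothetical, chosen only to show how a slightly higher R-squared can still yield a lower adjusted value.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical scores for two models fit to the same 50 observations
small = adjusted_r2(r2=0.800, n=50, p=3)
large = adjusted_r2(r2=0.805, n=50, p=10)

print(f"3 predictors:  R2=0.800, adjusted R2={small:.3f}")
print(f"10 predictors: R2=0.805, adjusted R2={large:.3f}")
```

Here the larger model has the higher R-squared but the lower adjusted R-squared, so the seven extra predictors are not paying for themselves.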

## Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?

- **MSE (Mean Squared Error):** Average of squared differences between actual and predicted values.
- **RMSE (Root Mean Squared Error):** Square root of MSE; expresses the typical size of the prediction error in the same units as the target.
- **MAE (Mean Absolute Error):** Average of absolute differences between actual and predicted values.

All three metrics measure model prediction error, with lower values indicating better performance.

In [None]:
# Example: Calculating RMSE, MSE, and MAE
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

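# X, Y, and model are assumed to come from the earlier example
y_pred = model.predict(X)  # predict once and reuse the result
mse = mean_squared_error(Y, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
mae = mean_absolute_error(Y, y_pred)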
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, MAE: {mae:.2f}")
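
The three definitions can also be written out directly. This is a small sketch using NumPy on made-up values (`y_true` and `y_pred` are assumptions for illustration), so each metric can be checked by hand.

```python
import numpy as np

# Illustrative values only (assumed for this sketch)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred
mse_manual = np.mean(errors ** 2)      # average squared error
rmse_manual = np.sqrt(mse_manual)      # back in the units of the target
mae_manual = np.mean(np.abs(errors))   # average absolute error

print(f"MSE: {mse_manual:.3f}, RMSE: {rmse_manual:.3f}, MAE: {mae_manual:.3f}")
```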

## Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.

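**Advantages:**
- **RMSE:** Penalizes large errors more heavily and is in the same units as the target. Good for applications where large errors are especially undesirable.
- **MSE:** Penalizes large errors in the same way as RMSE and is smooth and differentiable, which makes it convenient as a training loss.
- **MAE:** Less sensitive to outliers; easy to interpret as the average absolute error.

**Disadvantages:**
- RMSE and MSE can be dominated by a few outliers.
- MSE is in squared units of the target, so it is harder to interpret.
- MAE does not penalize large errors as strongly.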

Choose the metric based on the problem context and error tolerance.
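
To make the outlier sensitivity concrete, the sketch below compares both metrics on two made-up sets of absolute errors that differ in a single value. The helper functions and the error values are assumptions for illustration.

```python
import numpy as np

def rmse(errors) -> float:
    """Root mean squared error of an array of prediction errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

def mae(errors) -> float:
    """Mean absolute error of an array of prediction errors."""
    return float(np.mean(np.abs(errors)))

# Illustrative errors: identical except for one large miss
clean = np.array([1.0, 1.0, 1.0, 1.0])
with_outlier = np.array([1.0, 1.0, 1.0, 10.0])

for name, errs in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{name:>12}: MAE={mae(errs):.2f}, RMSE={rmse(errs):.2f}")
```

A single large error moves RMSE further than MAE, which is the behavior to weigh when deciding how much large misses should count.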

## Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?

**Lasso regularization** (L1) adds a penalty proportional to the sum of the absolute values of the coefficients to the loss function, encouraging sparsity (some coefficients become exactly zero). **Ridge regularization** (L2) adds a penalty proportional to the sum of the squared coefficients, shrinking them toward zero but rarely making them exactly zero.

Use Lasso when feature selection is important; use Ridge when all features are expected to contribute.

In [None]:
# Example: Lasso vs Ridge
from sklearn.linear_model import Lasso, Ridge

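# X and Y are assumed to come from the earlier example.
# alpha sets the penalty strength; 0.5 is an illustrative value, not a tuned one.
lasso = Lasso(alpha=0.5)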
lasso.fit(X, Y)
print('Lasso coefficients:', lasso.coef_)

ridge = Ridge(alpha=0.5)
ridge.fit(X, Y)
print('Ridge coefficients:', ridge.coef_)
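
The difference in sparsity is easier to see on data where only a few features matter. The following is a sketch on synthetic data; the names `X_demo`, `y_demo`, and `true_coef`, the seed, and the noise level are all assumptions made for the illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumed): 10 features, only the first two affect the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[:2] = [3.0, -2.0]
y_demo = X_demo @ true_coef + rng.normal(scale=1.0, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=0.5).fit(X_demo, y_demo)

# Count coefficients that were set exactly to zero
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))
print("Lasso coefficients:", np.round(lasso_demo.coef_, 2))
print("Ridge coefficients:", np.round(ridge_demo.coef_, 2))
print(f"Exact zeros: Lasso={n_zero_lasso}, Ridge={n_zero_ridge}")
```

Lasso is expected to zero out the irrelevant features here, while Ridge keeps small non-zero values for all of them.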

## Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.

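Regularized models (Lasso, Ridge) add a penalty on coefficient magnitude to the loss function, discouraging overly complex models that fit noise in the training data. The model accepts a slightly worse fit to the training data in exchange for lower variance, which helps it generalize to new data.

**Example:** In a model with many features and relatively few observations, ordinary least squares can assign large coefficients to features that correlate with the target only by chance. Ridge shrinks those coefficients toward zero and Lasso can remove them entirely, which reduces the impact of the less important features and limits overfitting.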
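
A sketch of that scenario on synthetic data follows. Everything here is an assumption for illustration: the data sizes, the seed, the true coefficients, and the choice of `alpha=10.0`. Exact scores depend on those choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumed): 30 features, only 3 informative, few training rows
rng = np.random.default_rng(42)
n_train, n_test, n_features = 40, 200, 30
X_all = rng.normal(size=(n_train + n_test, n_features))
coef = np.zeros(n_features)
coef[:3] = [2.0, -1.0, 0.5]
y_all = X_all @ coef + rng.normal(scale=1.0, size=n_train + n_test)

X_train, X_test = X_all[:n_train], X_all[n_train:]
y_train, y_test = y_all[:n_train], y_all[n_train:]

ols_fit = LinearRegression().fit(X_train, y_train)
ridge_fit = Ridge(alpha=10.0).fit(X_train, y_train)

# .score returns R-squared; compare training and held-out performance
for name, m in [("OLS", ols_fit), ("Ridge", ridge_fit)]:
    print(f"{name:>5}: train R2={m.score(X_train, y_train):.3f}, "
          f"test R2={m.score(X_test, y_test):.3f}, "
          f"coef norm={np.linalg.norm(m.coef_):.2f}")
```

The unregularized fit always scores at least as well on the training split, and Ridge always has the smaller coefficient norm. On data like this Ridge typically scores better on the test split, though that is not guaranteed for every seed or `alpha`.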

## Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.

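- May underfit if the regularization strength is too high, because useful coefficients are shrunk too far.
- Remain linear models, so they cannot capture highly non-linear relationships unless the features are transformed first.
- Lasso tends to keep one feature from a group of highly correlated features and drop the rest, and which one it keeps can be arbitrary.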

Alternative models (e.g., tree-based) may perform better for complex or non-linear data.
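
The underfitting case can be sketched as follows. The data, the seed, and both `alpha` values are assumptions chosen to make the effect obvious, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumed): three informative features
rng = np.random.default_rng(1)
X_u = rng.normal(size=(100, 3))
y_u = X_u @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

moderate = Lasso(alpha=0.1).fit(X_u, y_u)
too_strong = Lasso(alpha=100.0).fit(X_u, y_u)

print("alpha=0.1   coef:", np.round(moderate.coef_, 2),
      "train R2:", round(moderate.score(X_u, y_u), 3))
print("alpha=100.0 coef:", too_strong.coef_,
      "train R2:", round(too_strong.score(X_u, y_u), 3))
```

With the very large penalty every coefficient is driven to zero, so the model predicts the mean of the target for every row and its training R-squared is essentially zero.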

## Q9. You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?

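The two numbers cannot be compared directly because they are different metrics. For any set of predictions, RMSE is always greater than or equal to MAE, so Model A's RMSE of 10 does not imply larger errors than Model B's MAE of 8; Model A's own MAE could be below 8. To decide, compute the same metric for both models on the same data. Use RMSE if large errors are especially costly, since it penalizes them more heavily, and MAE if errors should count in proportion to their size or if outliers should not dominate the comparison.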
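
To see why an RMSE of 10 says little about MAE, the sketch below builds two made-up sets of absolute errors that both have an RMSE of exactly 10. The error values are hypothetical and exist only to show the range of MAE values consistent with the same RMSE.

```python
import numpy as np

# Two hypothetical error profiles for "Model A"
uniform_errors = np.array([10.0, 10.0, 10.0, 10.0])  # every prediction off by 10
one_big_miss = np.array([0.0, 0.0, 0.0, 20.0])       # perfect except one large miss

rmse_uniform = float(np.sqrt(np.mean(uniform_errors ** 2)))
rmse_big = float(np.sqrt(np.mean(one_big_miss ** 2)))
mae_uniform = float(np.mean(np.abs(uniform_errors)))
mae_big = float(np.mean(np.abs(one_big_miss)))

print(f"uniform errors: RMSE={rmse_uniform:.1f}, MAE={mae_uniform:.1f}")
print(f"one big miss:   RMSE={rmse_big:.1f}, MAE={mae_big:.1f}")
```

Both profiles have the same RMSE, yet one has an MAE above 8 and the other an MAE below 8, so Model A could be better or worse than Model B on MAE.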

## Q10. You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?

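The regularization parameters alone do not identify the better model, and their values are not comparable across Ridge and Lasso because the two penalties differ. Choose the model with better validation performance (e.g., lower RMSE or MAE on validation data or under cross-validation). Ridge tends to work better when most features carry some signal; Lasso tends to work better when only a few do, and it yields a sparser, more interpretable model. The trade-off is that Lasso's selection can be unstable among correlated features, while Ridge is more stable but keeps every feature. The regularization parameter also affects performance and should be tuned for each model.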
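
One way to run such a comparison is cross-validation with a shared metric. The sketch below uses synthetic data and scikit-learn's `cross_val_score`; the data, the seed, and the added `StandardScaler` step are assumptions for illustration, and the two `alpha` values are simply those given in the question.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed): 8 features, 3 of them informative
rng = np.random.default_rng(7)
X_cv = rng.normal(size=(150, 8))
w = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.5, 0.0, 0.0])
y_cv = X_cv @ w + rng.normal(scale=1.0, size=150)

candidates = {
    "Ridge(alpha=0.1)": make_pipeline(StandardScaler(), Ridge(alpha=0.1)),
    "Lasso(alpha=0.5)": make_pipeline(StandardScaler(), Lasso(alpha=0.5)),
}

cv_rmse = {}
for name, est in candidates.items():
    # scikit-learn reports negated RMSE so that higher is better; flip the sign
    scores = cross_val_score(est, X_cv, y_cv, cv=5,
                             scoring="neg_root_mean_squared_error")
    cv_rmse[name] = float(-scores.mean())
    print(f"{name}: mean CV RMSE = {cv_rmse[name]:.3f}")

best = min(cv_rmse, key=cv_rmse.get)
print("Lower cross-validated RMSE:", best)
```

Which model wins depends on the data. In practice each model's `alpha` would itself be tuned, for example with `GridSearchCV`, before the two are compared.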