# Model Evaluation and Comparison

## 📌 Why Evaluate Models?

Model evaluation checks whether a predictive model **generalizes well** to new data, ideally by measuring performance on data the model was not trained on.  
Key purposes:

- Assess model **accuracy**
- Compare **different models**
- Detect **overfitting** or **underfitting**


## 📌 Regression Metrics

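For continuous outcomes, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of observations: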

1. **Mean Squared Error (MSE):**
$$
MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

2. **Root Mean Squared Error (RMSE):**
$$
RMSE = \sqrt{MSE}
$$

3. **Mean Absolute Error (MAE):**
$$
MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|
$$

4. **R-squared ($R^2$):** proportion of variance explained
$$
R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}
$$


## 📌 Classification Metrics

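For binary outcomes, let TP, TN, FP, and FN denote the counts of true positives, true negatives, false positives, and false negatives:

- **Accuracy:** proportion of correctly classified instances
$$
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
$$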
- **Precision:** proportion of positive predictions that are correct
$$
Precision = \frac{TP}{TP + FP}
$$
- **Recall (Sensitivity):** proportion of actual positives detected
$$
Recall = \frac{TP}{TP + FN}
$$
- **F1 Score:** harmonic mean of precision and recall
$$
F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}
$$
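- **ROC-AUC:** area under the Receiver Operating Characteristic curve, which plots the true positive rate against the false positive rate across all classification thresholds (0.5 corresponds to random ranking, 1 to perfect ranking)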


```r
# Example: Evaluate a linear regression model
x <- 1:10
y <- c(2.3, 3.1, 4.2, 5.0, 6.1, 7.2, 7.9, 8.8, 9.9, 10.2)

# Fit linear model
lm_model <- lm(y ~ x)

# In-sample predictions (fitted values on the training data)
y_pred <- predict(lm_model)

# Compute metrics
MSE <- mean((y - y_pred)^2)
RMSE <- sqrt(MSE)
MAE <- mean(abs(y - y_pred))
R2 <- 1 - sum((y - y_pred)^2) / sum((y - mean(y))^2)

print(paste("RMSE:", round(RMSE, 3)))
print(paste("MAE:", round(MAE, 3)))
print(paste("R-squared:", round(R2, 3)))
```


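```r
# Example: Evaluate a binary classifier from predicted probabilities
# (such as those produced by a logistic regression)
actual <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
pred_prob <- c(0.9, 0.1, 0.8, 0.7, 0.2, 0.3, 0.6, 0.4, 0.9, 0.2)
pred_class <- ifelse(pred_prob > 0.5, 1, 0)  # classify at a 0.5 threshold

# Confusion matrix (fixed factor levels keep the table 2x2
# even if one class is never predicted)
conf_matrix <- table(
  Predicted = factor(pred_class, levels = c(0, 1)),
  Actual = factor(actual, levels = c(0, 1))
)
print(conf_matrix)

# Accuracy: correct predictions lie on the diagonal
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(paste("Accuracy:", round(accuracy, 3)))
```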
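
The remaining classification metrics can be computed from the same kind of confusion counts. The sketch below is illustrative only. It uses hypothetical predicted probabilities rather than those in the previous example, which separate the two classes perfectly and would make every metric equal 1. It also computes ROC-AUC through the rank-sum (Mann-Whitney) identity in base R instead of a dedicated package.

```r
# Hypothetical toy data (assumed for illustration, not from the example above)
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
pred_prob <- c(0.9, 0.6, 0.8, 0.4, 0.2, 0.7, 0.7, 0.1, 0.55, 0.45)
pred_class <- ifelse(pred_prob > 0.5, 1, 0)

# Confusion counts
TP <- sum(pred_class == 1 & actual == 1)
FP <- sum(pred_class == 1 & actual == 0)
FN <- sum(pred_class == 0 & actual == 1)
TN <- sum(pred_class == 0 & actual == 0)

# Metrics as defined in the formulas above
precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)
f1        <- 2 * precision * recall / (precision + recall)

# ROC-AUC via the rank-sum identity: the probability that a randomly chosen
# positive is scored above a randomly chosen negative (ties count as 1/2)
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
ranks <- rank(pred_prob)  # tied scores receive their average rank
auc   <- (sum(ranks[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(paste("Precision:", round(precision, 3)))
print(paste("Recall:", round(recall, 3)))
print(paste("F1:", round(f1, 3)))
print(paste("ROC-AUC:", round(auc, 3)))
```

Because the threshold of 0.5 is itself a choice, precision and recall change when it moves, whereas ROC-AUC summarizes ranking quality across all thresholds.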


## 📌 Comparing Models

1. **Visual comparison:**
   - Plot predicted vs actual values
   - ROC curve for classification

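2. **Cross-validation:** Split the data into **k folds**, train on k-1 folds and test on the held-out fold, rotate until every fold has served as the test set, then average the errors to estimate generalization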

3. **Information criteria for model selection:**
   - **AIC (Akaike Information Criterion)**
   - **BIC (Bayesian Information Criterion)**
   - Lower values indicate a better trade-off between fit and complexity; values are only comparable between models fitted to the same data

4. **Nested models:** Compare a simpler model against a more complex one that contains it, using **likelihood ratio tests**
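
As a rough illustration of the cross-validation item above, k-fold cross-validation can be sketched in base R. The helper `cv_rmse`, the data frame `dat`, and the choice of k = 5 are assumptions made for this example; the data are the same toy values used in the regression example.

```r
set.seed(123)  # make the random fold assignment reproducible

# Same toy data as in the regression example
dat <- data.frame(
  x = 1:10,
  y = c(2.3, 3.1, 4.2, 5.0, 6.1, 7.2, 7.9, 8.8, 9.9, 10.2)
)

# Randomly assign each row to one of k equally sized folds
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Hypothetical helper: cross-validated RMSE of lm() for a given formula
cv_rmse <- function(formula, data, folds) {
  resp <- all.vars(formula)[1]          # name of the response variable
  sq_err <- numeric(nrow(data))
  for (i in sort(unique(folds))) {
    held_out <- folds == i
    fit <- lm(formula, data = data[!held_out, ])       # train on the other folds
    pred <- predict(fit, newdata = data[held_out, ])   # predict the held-out fold
    sq_err[held_out] <- (data[[resp]][held_out] - pred)^2
  }
  sqrt(mean(sq_err))
}

# Use the same folds for both models so the comparison is like-for-like
rmse_linear <- cv_rmse(y ~ x, dat, folds)
rmse_quad   <- cv_rmse(y ~ x + I(x^2), dat, folds)

print(paste("CV RMSE (linear):", round(rmse_linear, 3)))
print(paste("CV RMSE (quadratic):", round(rmse_quad, 3)))
```

With only ten observations the estimates are noisy, so the numbers are best read as a demonstration of the procedure rather than as evidence for either model.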


```r
# Fit two models (reuses x and y from the regression example above)
lm_model1 <- lm(y ~ x)          # simple linear
lm_model2 <- lm(y ~ x + I(x^2)) # quadratic model

# Compare AIC and BIC
AIC_model1 <- AIC(lm_model1)
AIC_model2 <- AIC(lm_model2)
BIC_model1 <- BIC(lm_model1)
BIC_model2 <- BIC(lm_model2)

print(paste("AIC Model 1:", round(AIC_model1, 2)))
print(paste("AIC Model 2:", round(AIC_model2, 2)))
print(paste("BIC Model 1:", round(BIC_model1, 2)))
print(paste("BIC Model 2:", round(BIC_model2, 2)))
```
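
The nested-model comparison mentioned in the list above can be sketched for the same two models. This is a minimal example under stated assumptions: the likelihood ratio statistic is built by hand from `logLik()`, and its chi-squared reference distribution is an asymptotic approximation that is rough with only ten observations. For linear models, `anova()` gives the exact F-test that is usually preferred, so it is shown alongside.

```r
# Same toy data and models as in the AIC/BIC example
x <- 1:10
y <- c(2.3, 3.1, 4.2, 5.0, 6.1, 7.2, 7.9, 8.8, 9.9, 10.2)
lm_model1 <- lm(y ~ x)            # simpler model
lm_model2 <- lm(y ~ x + I(x^2))   # more complex model that contains the first

# Log-likelihoods and their parameter counts (coefficients plus error variance)
ll1 <- logLik(lm_model1)
ll2 <- logLik(lm_model2)

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (as.numeric(ll2) - as.numeric(ll1))
df_diff <- attr(ll2, "df") - attr(ll1, "df")   # 4 - 3 = 1 extra parameter

# Approximate p-value from the chi-squared distribution
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(paste("LR statistic:", round(lr_stat, 3)))
print(paste("Approximate p-value:", round(p_value, 3)))

# Exact F-test for nested linear models
print(anova(lm_model1, lm_model2))
```

A small p-value suggests the extra quadratic term improves the fit by more than chance alone would explain; a large one favors keeping the simpler model.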


## 📌 Real-World Analogies

- **Regression metrics:** Imagine predicting house prices. RMSE gives the typical size of a prediction error in dollars (weighting large misses more heavily than MAE does), and R² gives the fraction of price variation the model explains.  
- **Classification metrics:** Imagine predicting whether an email is spam. Accuracy shows overall correctness, while F1 balances catching spam (recall) against avoiding false alarms (precision).  
- **Model comparison:** Choosing between a simple linear model and a more complex polynomial model is like choosing between a simple recipe and a fancy one: you want the best outcome without unnecessary complexity.
