# Model Validation

## How Do We Know Our Model is Good?

In machine learning, as in any predictive work, we need to check whether a model's predictions are actually good. This notebook introduces ways to measure model performance, and shows how the right measure depends on the kind of problem you're solving.

### 📊 Model Validation: Different Problems, Different Metrics

Imagine you have a dashboard that shows different scores depending on what you're measuring. Different problems need different metrics to see how good your model is! For example, predicting house prices (regression) uses different tools than classifying emails (classification).

### 📈 Regression Validation Metrics

When we predict continuous numbers, like house prices, we use specific metrics to see how close our predictions are to the true values.

#### How Good Are Your Number Predictions?

- 📏 **MSE (Mean Squared Error):** Average of squared differences between predicted and actual values.
- 📐 **RMSE (Root Mean Squared Error):** Square root of MSE, in the same units as your target.
- 📊 **MAE (Mean Absolute Error):** Average of absolute differences.
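- 🎯 **R² Score:** Proportion of the variance in the target that the model explains. The best possible value is 1; 0 means the model does no better than always predicting the mean, and the score can go negative for a model that does worse than that. Higher is better.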
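
For reference, the four metrics above can be written out as formulas. The notation here is an assumption, not something the notebook defines elsewhere: $y_i$ is the true value, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the true values, and $n$ the number of samples.

$$
\begin{aligned}
\text{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \\
\text{RMSE} &= \sqrt{\text{MSE}} \\
\text{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i \right\rvert \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\end{aligned}
$$

Because MSE squares each error, large misses count for more in MSE and RMSE than they do in MAE.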

In [None]:
# Example: calculating regression metrics
from sklearn.metrics import mean_squared_error, r2_score

# Suppose y_test are the true values and predictions are your model's predictions
y_test = [200000, 150000, 250000, 300000]
predictions = [210000, 140000, 260000, 310000]

# RMSE is the square root of MSE (newer scikit-learn versions deprecate `squared=False`)
rmse = mean_squared_error(y_test, predictions) ** 0.5
r2 = r2_score(y_test, predictions)

print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.3f}")
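
To see what these metrics actually compute, here is a minimal sketch that rebuilds them by hand with NumPy, using the same toy numbers as the cell above. The variable names (`y_true`, `y_pred`, `bad_pred`) are chosen for illustration, and `bad_pred` is a made-up set of poor predictions used only to show that R² can drop below zero.

```python
import numpy as np

# Same toy data as the scikit-learn example above
y_true = np.array([200000, 150000, 250000, 300000], dtype=float)
y_pred = np.array([210000, 140000, 260000, 310000], dtype=float)

errors = y_true - y_pred

mse = np.mean(errors ** 2)      # average of squared errors
rmse = np.sqrt(mse)             # back in the target's units (dollars)
mae = np.mean(np.abs(errors))   # average of absolute errors

ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.0f}  RMSE: {rmse:.0f}  MAE: {mae:.0f}  R²: {r2:.3f}")

# Hypothetical bad predictions: the true values in reverse order.
# A model that does worse than always predicting the mean gets a negative R².
bad_pred = y_true[::-1]
r2_bad = 1 - np.sum((y_true - bad_pred) ** 2) / ss_tot
print(f"R² for the bad predictions: {r2_bad:.2f}")
```

Every prediction here is off by exactly $10,000, so RMSE and MAE come out equal. They drift apart once some errors are much larger than others, because RMSE weights big misses more heavily.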


### 🏷️ Classification Validation Metrics

When predicting categories (like spam or not spam), different metrics are used to evaluate performance.

#### How Good Are Your Category Predictions?

- ✅ **Accuracy:** Percentage of correct predictions.
- 🎯 **Precision:** Out of all items predicted as positive, how many were really positive?
- 🔍 **Recall:** Out of all real positives, how many did the model find?
- ⚖️ **F1 Score:** The harmonic mean of precision and recall; it is high only when both are high.
- 📋 **Confusion Matrix:** A detailed table showing true positives, false positives, true negatives, and false negatives.

In [None]:
# Example: calculating classification metrics
from sklearn.metrics import accuracy_score, classification_report

# Simulated true labels and predictions
y_test = [0, 1, 0, 1, 1]
predictions = [0, 0, 0, 1, 1]

accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, predictions))
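
The definitions above map directly onto the four cells of the confusion matrix. The sketch below, using the same toy labels as the cell above, pulls out those counts and derives precision, recall, and F1 by hand. It treats class `1` as the positive class, which is an assumption that matches scikit-learn's default for binary labels.

```python
from sklearn.metrics import confusion_matrix

# Same toy labels as the example above
y_true = [0, 1, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1]

# For binary labels {0, 1}, the matrix is [[tn, fp], [fn, tp]],
# so ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of everything predicted positive, how much was right
recall = tp / (tp + fn)      # of all real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"Accuracy: {accuracy:.3f}  Precision: {precision:.3f}  "
      f"Recall: {recall:.3f}  F1: {f1:.3f}")
```

In this example the model never raises a false alarm (precision 1.0) but misses one of the three real positives (recall about 0.667). Accuracy alone would hide that difference.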


### 💡 Pro Tips for Model Improvement

Boost your model's performance with these techniques:

#### Boosting Performance

- **Regression:** Use feature scaling, Ridge/Lasso regularization, and ensemble methods.
- **Classification:** Balance classes, use cross-validation, and tune decision thresholds.
- **Both:** Improve features, gather more data, and tune hyperparameters for better results.
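
As one way to combine several of these tips, the sketch below puts feature scaling and Ridge regularization into a single pipeline and scores it with 5-fold cross-validation. The dataset is synthetic (generated with `make_regression`), and `alpha=1.0` and `cv=5` are illustrative choices, not recommendations from this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, used here only so the example is self-contained
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Putting the scaler inside the pipeline means it is re-fit on each training
# fold, so no information from the held-out fold leaks into preprocessing.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# scikit-learn scorers follow "higher is better", so RMSE is returned negated
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
rmse_per_fold = -scores

print("RMSE per fold:", rmse_per_fold.round(2))
print(f"Mean RMSE: {rmse_per_fold.mean():.2f} (std {rmse_per_fold.std():.2f})")
```

Averaging over several folds gives a steadier estimate than a single train/test split. The same pattern works for classification by swapping in a classifier and a scorer such as `"f1"`.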

### Validation in Action

Here are both checks together in a single scikit-learn cell, reusing the toy data from the earlier examples:

In [None]:
# Regression Validation
from sklearn.metrics import mean_squared_error, r2_score

# Example data
y_test = [200000, 150000, 250000, 300000]
predictions = [210000, 140000, 260000, 310000]

# RMSE is the square root of MSE (newer scikit-learn versions deprecate `squared=False`)
rmse = mean_squared_error(y_test, predictions) ** 0.5
r2 = r2_score(y_test, predictions)
print(f"RMSE: {rmse:.2f}, R²: {r2:.3f}")

# Classification Validation
from sklearn.metrics import accuracy_score, classification_report

y_test = [0, 1, 0, 1, 1]
predictions = [0, 0, 0, 1, 1]

accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, predictions))


### 🎯 Validation Wisdom

Remember: 
"A model without proper validation is like driving with your eyes closed!"

**Quick Check:**
If your house price model has an RMSE of $50,000, is that good or bad? (Hint: it depends on the price range!)
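
One rough way to work through the hint is to express the RMSE as a fraction of a typical price. The two markets and their prices below are invented for illustration only.

```python
rmse = 50_000  # the RMSE from the question above, in dollars

# Hypothetical typical house prices for two markets (illustrative numbers)
typical_prices = {
    "lower-priced market": 150_000,
    "higher-priced market": 2_000_000,
}

# Relative error: how large the RMSE is compared with a typical price
relative_error = {name: rmse / price for name, price in typical_prices.items()}

for name, ratio in relative_error.items():
    print(f"{name}: RMSE is {ratio:.1%} of a typical price")
```

The same $50,000 error is about a third of the price in the first market and only 2.5% in the second, so the identical RMSE could be poor in one setting and quite good in the other.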