# Task Exception Prediction - Machine Learning Model

## Project Overview

This notebook presents a complete machine learning pipeline for predicting task exceptions in transportation operations. The goal is to build a binary classification model that can identify when an exception (RED) will occur versus normal operations (GOOD).

### Objective
Train an XGBoost model to predict exceptions with **precision > 50%**, prioritizing precision over recall as per case requirements.

### Key Requirements
- **Target Variable**: `Exception_output` (RED = exception occurred, GOOD = no exception)
- **Train/Test Split**: 95% training, 5% test (stratified)
- **Primary Metric**: Precision > 50%
- **Model**: XGBoost with precision-focused hyperparameters

## 1. Imports and Configuration

Import necessary libraries and configure the environment.

## 2. Data Loading and Preparation

Load the raw dataset and apply data cleaning transformations:
- **NumericCleaner**: Handles European decimal format (comma as decimal separator)
- **TextNormalizer**: Normalizes text columns (lowercase, strip whitespace)

Optionally save processed data to `data/processed/` for faster subsequent runs.
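
The two cleaning steps can be sketched with plain pandas. This is a minimal illustration, not the project's `NumericCleaner` or `TextNormalizer` classes: the function names, column names, and sample values are hypothetical, and the numeric helper assumes that every dot in a value is a thousands separator.

```python
import pandas as pd


def clean_european_decimal(series: pd.Series) -> pd.Series:
    """Convert strings such as '1.234,56' to floats (1234.56)."""
    as_text = series.astype(str).str.strip()
    # Assumption: dots are thousands separators, so drop them first,
    # then turn the decimal comma into a decimal point.
    as_text = as_text.str.replace(".", "", regex=False)
    as_text = as_text.str.replace(",", ".", regex=False)
    # Values that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(as_text, errors="coerce")


def normalize_text(series: pd.Series) -> pd.Series:
    """Strip surrounding whitespace and lowercase a text column."""
    return series.str.strip().str.lower()


# Hypothetical sample data, for illustration only.
raw = pd.DataFrame({
    "distance_km": ["12,5", " 1.234,56 ", "n/a"],
    "carrier": ["  ACME ", "Acme", "acme  "],
})
clean = raw.assign(
    distance_km=clean_european_decimal(raw["distance_km"]),
    carrier=normalize_text(raw["carrier"]),
)
print(clean)
```

Applying the numeric helper to a column that already uses a decimal point would corrupt it (`12.5` would become `125`), so it should only be run on columns known to use the comma format.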

## 3. Categorical Feature Encoding

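Apply Label Encoding to categorical variables. This maps each text category to an integer code that the machine learning model can consume. The codes carry no meaningful order, which tree-based models such as XGBoost generally tolerate better than linear models do.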

The target variable (`Exception_output`) is NOT encoded at this stage.
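
A minimal sketch of this step, assuming scikit-learn's `LabelEncoder` and a small hypothetical DataFrame (the `carrier` and `region` columns are invented for illustration). The target column is skipped explicitly.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "carrier": ["acme", "globex", "acme", "initech"],
    "region": ["north", "south", "south", "north"],
    "Exception_output": ["GOOD", "RED", "GOOD", "GOOD"],
})

target_col = "Exception_output"
# Treat every non-numeric column except the target as categorical.
categorical_cols = [
    col for col in df.columns
    if col != target_col and not pd.api.types.is_numeric_dtype(df[col])
]

encoders = {}
for col in categorical_cols:
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(df[col])
    # Keep the fitted encoder so codes can be decoded or reused later.
    encoders[col] = encoder

print(df)
```

Keeping the fitted encoders matters when new data arrives, because the same category must map to the same code. scikit-learn also provides `OrdinalEncoder`, which is designed for feature columns and can handle several columns at once.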

## 4. Feature and Target Separation

Separate the dataset into:
- **Features (X)**: All columns except `Exception_output`
- **Target (y)**: `Exception_output` encoded as binary (RED=1, GOOD=0)
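
The separation can be sketched as follows, again on a hypothetical DataFrame. The label check is an added safeguard, not something the notebook is known to do.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "carrier": [0, 1, 0, 2],
    "distance_km": [12.5, 80.0, 33.1, 5.4],
    "Exception_output": ["GOOD", "RED", "GOOD", "RED"],
})

# Fail early if the target contains anything other than the two known labels.
unexpected = set(df["Exception_output"]) - {"RED", "GOOD"}
if unexpected:
    raise ValueError(f"Unexpected target labels: {unexpected}")

X = df.drop(columns=["Exception_output"])          # features
y = (df["Exception_output"] == "RED").astype(int)  # RED=1, GOOD=0

print(X.columns.tolist())
print(y.tolist())
```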

## 5. Train/Test Split

Split the data into training (95%) and test (5%) sets using stratified sampling to maintain class distribution.

**Case Requirement**: 5% for testing, maintaining class proportions.
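
A minimal sketch of the stratified 95/5 split with scikit-learn's `train_test_split`. The synthetic data and the `random_state` value are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 20% RED (1) and 80% GOOD (0).
rng = np.random.default_rng(42)
X = pd.DataFrame({"f1": rng.normal(size=1000), "f2": rng.normal(size=1000)})
y = pd.Series(np.r_[np.ones(200, dtype=int), np.zeros(800, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.05,   # 5% held out for testing
    stratify=y,       # keep the RED/GOOD ratio in both subsets
    random_state=42,  # reproducible split
)

print(len(X_train), len(X_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```

With only 5% of the rows in the test set, the number of RED cases it contains can be small, so test precision may vary noticeably from one split to another.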

## 6. Model Training

Train an XGBoost classifier with precision-focused hyperparameters. The model uses:
- Early stopping to prevent overfitting
- Optimal threshold tuning to maximize precision
- Class weight balancing for imbalanced datasets

**Case Requirement**: Focus on precision > 50% (prioritize precision over recall).
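
One way to turn the precision target into a decision threshold is sketched below with scikit-learn's `precision_recall_curve`. The function name `pick_threshold` and the sample scores are hypothetical, and the notebook's own tuning logic may differ. The sketch returns the lowest threshold whose precision exceeds the target, which keeps as much recall as possible.

```python
from sklearn.metrics import precision_recall_curve


def pick_threshold(y_true, y_proba, min_precision=0.5):
    """Return the lowest threshold with precision > min_precision, or None."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_proba)
    # precision has one more entry than thresholds; the last entry is the
    # conventional end point of the curve and has no threshold of its own.
    qualifies = precision[:-1] > min_precision
    if not qualifies.any():
        return None
    # thresholds are sorted in increasing order, so the first hit is the lowest.
    return float(thresholds[qualifies][0])


# Hypothetical validation labels and predicted RED probabilities.
y_val = [1, 0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

threshold = pick_threshold(y_val, p_val, min_precision=0.5)
y_pred = [int(p >= threshold) for p in p_val]
print(threshold, y_pred)
```

The threshold should be chosen on validation data rather than on the test set. Otherwise the reported test precision is optimistic.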

## 7. Model Evaluation

Evaluate the model performance on both training and test sets. Calculate key metrics:
- **Accuracy**: Overall correctness
- **Precision**: Of predicted RED cases, how many are actually RED? (Primary metric)
- **Recall**: Of actual RED cases, how many were predicted?

Verify if the case requirement (Precision > 50%) is met.
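
The three metrics and the requirement check can be sketched with scikit-learn. The function name `compute_metrics` and the sample predictions are hypothetical and do not reproduce the notebook's `evaluate_model()` helper.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score


def compute_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for the RED (1) class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 avoids a warning when no RED case is predicted.
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
    }


# Hypothetical test labels and predictions (1 = RED, 0 = GOOD).
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]

metrics = compute_metrics(y_test, y_pred)
meets_requirement = metrics["precision"] > 0.5

print(metrics)
print("Precision > 50%:", meets_requirement)
```

In this sample there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and accuracy is 0.8.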

### Understanding Overfitting

**What is Overfitting?**
Overfitting occurs when a model learns the training data too well, including its noise and specific patterns, but fails to generalize to new, unseen data.

**How to Identify Overfitting:**
- **Training metrics** are much better than **test metrics**
- A large gap (more than 10 percentage points) between train and test performance
- Model "memorizes" training data instead of learning general patterns

**Example Scenario:**
```
Training Set Performance:
- Accuracy: 0.95 (95%)
- Precision: 0.92 (92%)
- Recall: 0.94 (94%)

Test Set Performance:
- Accuracy: 0.72 (72%)  ← Gap of 23%!
- Precision: 0.68 (68%)  ← Gap of 24%!
- Recall: 0.70 (70%)     ← Gap of 24%!

Conclusion: SEVERE OVERFITTING
The model performs well on training data but poorly on test data.
```

**Why This Matters:**
- An overfitted model will not work well in production
- It has learned dataset-specific patterns, not generalizable rules
- The test set simulates real-world performance

**How Our Model Prevents Overfitting:**
- Early stopping (stops training when performance on a held-out validation set stops improving)
- Regularization (L1/L2 penalties)
- Precision-focused hyperparameters (more conservative model)
- Threshold tuning (flags RED only above a tuned probability cutoff, which targets precision rather than reducing overfitting itself)

### Example: Interpreting Overfitting Results

After running the overfitting analysis, you'll get results like this:

**Good Model (No Overfitting):**
```python
overfitting_info = {
    'has_overfitting': False,
    'severity': 'none',
    'differences': {
        'accuracy': 0.02,   # Only 2% gap - excellent!
        'precision': 0.03,  # Only 3% gap - excellent!
        'recall': 0.01      # Only 1% gap - excellent!
    }
}
# Interpretation: Model generalizes well to new data
```

**Model with Overfitting:**
```python
overfitting_info = {
    'has_overfitting': True,
    'severity': 'severe',
    'differences': {
        'accuracy': 0.25,   # 25% gap - severe overfitting!
        'precision': 0.30,  # 30% gap - severe overfitting!
        'recall': 0.28      # 28% gap - severe overfitting!
    }
}
# Interpretation: Model memorized training data, won't work in production
```

**What to do if overfitting is detected:**
1. Increase regularization (already done with `focus_precision=True`)
2. Reduce model complexity
3. Get more training data
4. Use cross-validation for better evaluation
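
Step 4 can be sketched with stratified k-fold cross-validation scored on precision. To keep the example dependency-light it uses scikit-learn's `GradientBoostingClassifier` as a stand-in. An `XGBClassifier` follows the same scikit-learn estimator interface and could be passed instead. The data and settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data standing in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Stratified folds keep the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

# One precision score per fold, each computed on data the model did not see.
scores = cross_val_score(model, X, y, cv=cv, scoring="precision")

print("Precision per fold:", scores.round(3))
print("Mean and std:", scores.mean().round(3), scores.std().round(3))
```

A large spread across folds suggests that a single 5% test split may give an unreliable picture of precision.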

### Optional: Overfitting Demonstration

**Compare two models to see overfitting in action:**

This section demonstrates overfitting by training two models:
1. **Model with Overfitting**: Uses default/aggressive hyperparameters (no regularization)
2. **Model without Overfitting**: Uses precision-focused hyperparameters (with regularization)

**Expected Results:**
- Model 1: High training performance, low test performance (overfitting)
- Model 2: Balanced training and test performance (good generalization)

**Note:** This is optional and for educational purposes. For the actual case, use only the precision-focused model.

```python
# OPTIONAL: Overfitting Demonstration
# Use train_xgboost() with focus_precision=False for Model 1 (will overfit)
# Use train_xgboost() with focus_precision=True for Model 2 (prevents overfitting)
# Use evaluate_model() to get metrics for both
# Use check_overfitting() to analyze both models
# Compare the results to see the difference
```

### Visual Comparison of Overfitting

Plot side-by-side comparison of both models to visualize the overfitting difference.

```python
# OPTIONAL: Create side-by-side comparison plot
# Use plot_overfitting_analysis() for both models or create custom visualization
# Compare metrics_overfit vs metrics_good to visualize the difference
```

## 8. Overfitting Analysis

Analyze the model for overfitting by comparing training and test performance. Overfitting is detected when:
- Training metrics are significantly better than test metrics
- A gap of more than 10 percentage points typically indicates overfitting

Severity levels: none, mild, moderate, severe
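
The gap check can be sketched as below. The function name `gap_report` is hypothetical and this is not the notebook's `check_overfitting()` implementation. Only the 10-point threshold comes from the text. The cutoffs between mild, moderate, and severe (0.15 and 0.20) are assumptions made for illustration.

```python
def gap_report(train_metrics, test_metrics, threshold=0.10):
    """Compare train and test metrics and grade the largest gap."""
    differences = {
        name: round(train_metrics[name] - test_metrics[name], 4)
        for name in train_metrics
    }
    max_gap = max(differences.values())

    # Assumed severity bands; only the 0.10 threshold comes from the text.
    if max_gap <= threshold:
        severity = "none"
    elif max_gap <= 0.15:
        severity = "mild"
    elif max_gap <= 0.20:
        severity = "moderate"
    else:
        severity = "severe"

    return {
        "has_overfitting": severity != "none",
        "severity": severity,
        "differences": differences,
    }


# Values taken from the example scenario earlier in this notebook.
train = {"accuracy": 0.95, "precision": 0.92, "recall": 0.94}
test = {"accuracy": 0.72, "precision": 0.68, "recall": 0.70}

report = gap_report(train, test)
print(report)
```

Grading on the largest gap is a deliberately strict choice: one badly degraded metric is enough to flag the model.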

## 9. Feature Importance Analysis

Identify the most important features that drive the model's predictions. This helps understand:
- Which factors are most predictive of exceptions
- Business insights about what causes exceptions
- Potential for feature engineering or data collection improvements
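
Ranking features by importance can be sketched as follows. The example uses scikit-learn's `GradientBoostingClassifier` as a stand-in on synthetic data with invented feature names. A fitted `XGBClassifier` exposes the same `feature_importances_` attribute, so the ranking code would be unchanged.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data with hypothetical feature names.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Pair each importance score with its feature name and sort, largest first.
importance = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

top_n = importance.head(3)
print(top_n)
```

Importance scores describe what the model relies on, not what causes exceptions, so business conclusions drawn from them should be checked against domain knowledge.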

## 10. Individual Visualizations

Generate individual plots to analyze different aspects of the model performance.

### 10.1. Class Distribution

Visualize the distribution of classes (GOOD vs RED) in both training and test sets. This helps verify:
- Class balance
- Stratified split effectiveness
- Potential class imbalance issues
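
Before plotting, the class proportions can be compared numerically. The label counts below are hypothetical and chosen to mimic a 95/5 stratified split.

```python
import pandas as pd

# Hypothetical label Series after a 95/5 stratified split.
y_train = pd.Series(["GOOD"] * 760 + ["RED"] * 190)
y_test = pd.Series(["GOOD"] * 40 + ["RED"] * 10)

# Share of each class in each subset, aligned on the class label.
distribution = pd.DataFrame({
    "train": y_train.value_counts(normalize=True),
    "test": y_test.value_counts(normalize=True),
})
distribution["abs_diff"] = (distribution["train"] - distribution["test"]).abs()

print(distribution)
```

A near-zero `abs_diff` confirms that stratification preserved the class ratio. Calling `distribution[["train", "test"]].plot.bar()` would give the corresponding chart if matplotlib is installed.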

### 10.2. Metrics Comparison

Compare model performance metrics (Accuracy, Precision, Recall) between training and test sets. This visualization helps identify:
- Performance differences between train and test
- Overfitting indicators
- Model generalization capability

### 10.3. Feature Importance Visualization

Visual representation of the top N most important features. Shows which variables have the greatest impact on predictions.

### 10.4. Confusion Matrix

Detailed breakdown of predictions vs actual values:
- **True Positives (TP)**: Correctly predicted RED
- **False Positives (FP)**: Incorrectly predicted RED (Type I error)
- **False Negatives (FN)**: Missed RED cases (Type II error)
- **True Negatives (TN)**: Correctly predicted GOOD

Shows both counts and percentages for each cell.
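
The four cells and their percentages can be computed with scikit-learn's `confusion_matrix`. The labels and predictions are hypothetical sample values.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical test labels and predictions (1 = RED, 0 = GOOD).
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]

# With labels=[0, 1], rows are actual GOOD/RED and columns are predicted GOOD/RED.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Each cell as a percentage of all test rows.
percentages = cm / cm.sum() * 100

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(percentages)
```

Precision follows directly from the cells as TP / (TP + FP), which is why false positives are the errors that matter most for the case requirement.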

### 10.5. ROC Curve

Receiver Operating Characteristic curve showing the trade-off between True Positive Rate and False Positive Rate at different classification thresholds.

- **AUC (Area Under Curve)**: Measures overall discriminative ability
- **AUC = 0.5**: Random classifier
- **AUC > 0.7**: Commonly read as acceptable-to-good discriminative power (a rule of thumb, not a hard cutoff)
- **AUC = 1.0**: Perfect classifier
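
The curve points and the AUC can be computed with scikit-learn before plotting. The labels and scores are hypothetical sample values.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical test labels and predicted RED probabilities.
y_test = [0, 0, 1, 0, 1, 1, 0, 1]
y_proba = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# One (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)

print("AUC:", auc)
for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")
```

AUC summarizes ranking quality across all thresholds, so a model can have a solid AUC and still miss the precision target at the threshold actually used.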

### 10.6. Overfitting Analysis

Visual analysis of overfitting showing:
- Side-by-side comparison of train vs test metrics
- Gap analysis (difference between train and test performance)
- Severity assessment (none, mild, moderate, severe)

Red bars indicate significant gaps (more than 10 percentage points), suggesting overfitting.

## 11. Complete Model Report

Generate a comprehensive visual report consolidating all analyses into a single figure. This report includes:
- Confusion Matrix
- Top 10 Feature Importance
- Metrics Comparison
- Class Distributions (Train & Test)
- Overfitting Analysis
- ROC Curve

This consolidated view provides a complete overview of model performance for presentations and documentation.

## 12. Summary and Conclusions

Final summary of the model including:
- Dataset characteristics
- Model performance metrics
- Overfitting assessment
- Case requirement verification (Precision > 50%)
- Top features driving predictions
- Key insights and recommendations