> 💡 **Recommended Environment**:  
> Run this notebook in the `model_eval_suite` Conda environment for best results.  
> See setup instructions in the [Usage Guide](../resource_hub/usage_guide.md).
>
> ⚠️ If you're running this outside Conda, you can install dependencies manually:
> Uncomment the line below to install from the root requirements file.
> ```python
> # !pip install -r ../../requirements.txt
> ```

# 🧪 Model Evaluation Suite Demo Notebook

This notebook demonstrates how to use the **Model Evaluation Suite** to:

- Prepare and validate input data.
- Run a full modeling pipeline using YAML configuration files.
- Log models and artifacts with MLflow.
- Evaluate a production candidate against a holdout dataset.
- Optionally compare against a **baseline model** for performance drift or uplift.

<details><summary>📦 Project Structure</summary>

This notebook expects the following directories and files to exist:
- `config/`: contains user-defined YAML configuration files.
- `data/holdout_data/`: contains the holdout CSV used in validation.
- `mlruns/`: MLflow tracking output.
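
Before running anything, a quick preflight check can confirm these paths exist. This is a minimal sketch using only `pathlib`. `EXPECTED_DIRS` and `missing_dirs` are illustrative names, not part of the suite, and the sketch assumes the paths are relative to the notebook's working directory. Note that `mlruns/` may not exist until MLflow logs its first run to a local store.

```python
from pathlib import Path

# Directories listed in the project structure (assumed relative to the notebook).
EXPECTED_DIRS = ("config", "data/holdout_data", "mlruns")


def missing_dirs(root: str = ".") -> list[str]:
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


missing = missing_dirs()
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("All expected directories are present.")
```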

</details>


<details><summary>⚙️ Workflow Overview</summary>

1. **Prep Data (Optional)** – If needed, run data prep to split the raw data into train/test/holdout sets and cache them.
2. **Run Experiment** – Train the model(s) defined in the YAML file using `run_experiment`.
3. **Validate Champion** – Evaluate the registered MLflow model using `validate_champion`.

> YAML-driven configuration enables full modularity, reproducibility, and MLflow registry integration.

</details>

### 📜 Configuration Setup

This notebook is driven by modular **YAML configuration files**, which serve as the central control system for the evaluation suite.

<details><summary>click here to expand section</summary>

These YAMLs are edited **upfront** to define the behavior of each stage in the pipeline. See `config_resources/` in the repository for further guidance.

The YAML configuration governs:

- Filepaths for all inputs and outputs (train/test/holdout, plots, reports, logs)
- Model architecture, hyperparameters, and estimator type
- Preprocessing and feature engineering behavior
- Optional diagnostics modules (e.g., VIF, SHAP, permutation importance)
- MLflow tracking settings (URI, run tags, experiment names)
- Plotting controls and dashboard rendering options
- Evaluation behavior: segmentation columns, scoring metrics, baseline model comparison
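
Once a YAML file is parsed (for example with PyYAML's `yaml.safe_load`), each block becomes a nested dictionary, and `run` flags decide which optional stages execute. The sketch below is hypothetical: `enabled_stages` is not a suite function, and while `feature_engineering.run` and `hyperparameter_tuning.run` appear in this notebook, a `run` key under `pre_model_diagnostics` is an assumption.

```python
# Optional stages named in this notebook; a `run` key under each block is assumed.
OPTIONAL_STAGES = ("pre_model_diagnostics", "feature_engineering", "hyperparameter_tuning")


def enabled_stages(config: dict) -> list[str]:
    """List the optional stages whose block sets `run: true` (missing blocks count as disabled)."""
    return [
        stage
        for stage in OPTIONAL_STAGES
        if isinstance(config.get(stage), dict) and config[stage].get("run") is True
    ]


# Dict equivalent of a YAML file with `feature_engineering` enabled and tuning disabled.
example = {
    "feature_engineering": {
        "run": True,
        "module": "my_project.custom_features",
        "class_name": "MyFeatureTransformer",
    },
    "hyperparameter_tuning": {"run": False},
}
print(enabled_stages(example))  # ['feature_engineering']
```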

> Prebuilt templates are provided in `config_resources/config.zip`. Use them as-is or customize them to match your workflow.

</details>

### 🔧 Custom Feature Engineering

This suite supports plug-and-play custom transformers via the `feature_engineering` block in your YAML. 

<details><summary>click here to expand</summary>

Your transformer should follow scikit-learn's `fit/transform` API and be referenced like this:

```yaml
feature_engineering:
  run: true
  module: "my_project.custom_features"
  class_name: "MyFeatureTransformer"
```
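
A `module`/`class_name` pair like this is typically resolved at runtime with `importlib`. The sketch below shows one way that could work; it is not the suite's actual loader, and `load_transformer` is a hypothetical name.

```python
import importlib


def load_transformer(fe_block: dict):
    """Import and instantiate the class named in a `feature_engineering` block; None if disabled."""
    if not fe_block.get("run", False):
        return None
    module = importlib.import_module(fe_block["module"])  # e.g. "my_project.custom_features"
    cls = getattr(module, fe_block["class_name"])          # e.g. "MyFeatureTransformer"
    transformer = cls()
    # Fail early if the class does not follow the fit/transform convention.
    for method in ("fit", "transform"):
        if not callable(getattr(transformer, method, None)):
            raise TypeError(f"{fe_block['class_name']} has no callable .{method}()")
    return transformer
```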

> See `docs/feature_engineering.md` for a full example.
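
A minimal sketch of a transformer that satisfies this contract, assuming pandas DataFrame input. Inheriting from scikit-learn's `BaseEstimator` and `TransformerMixin` supplies `get_params`/`set_params` and `fit_transform`. The log1p feature logic is purely illustrative, and the class name only mirrors the hypothetical `MyFeatureTransformer` in the YAML above.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class MyFeatureTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical example: adds log1p-transformed copies of selected numeric columns."""

    def __init__(self, columns=None):
        # Store constructor args unchanged so get_params/clone work as sklearn expects.
        self.columns = columns

    def fit(self, X: pd.DataFrame, y=None):
        # Learn which columns to transform: the configured ones, or every numeric column.
        if self.columns is not None:
            self.columns_ = list(self.columns)
        else:
            self.columns_ = list(X.select_dtypes(include="number").columns)
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        X = X.copy()
        for col in self.columns_:
            # Clip at zero so log1p is defined for every value.
            X[f"log1p_{col}"] = np.log1p(X[col].clip(lower=0))
        return X
```

Because the class follows the standard interface, it can also be dropped into a scikit-learn `Pipeline` ahead of the estimator.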

</details>

## 📚 Dashboard Guidance

#### 📉 Pre-Model Diagnostics Dashboard

This optional module runs before any model training or validation occurs. It provides key insights into the integrity and statistical structure of your input data. It is driven by the `pre_model_diagnostics` block in your YAML and is best used in notebook workflows.

<details><summary>click here to expand section</summary>

- **Overview**  
  Summary of the input dataset, schema, and basic shape metadata.

- **Missingness**  
  Tabulates and visualizes missing values by column, with percent missing and optional flag encoding hints.

- **Collinearity**  
  Includes:
  - Pearson correlation heatmap
  - Variance Inflation Factor (VIF) plot to detect multicollinearity risks

- **Distribution Quality**  
  Visualizes skewness and potential distribution anomalies:
  - Target column distribution
  - Numerical feature histograms
  - Outlier detection using IQR boxplots

- **Evaluation Plots (via PlotViewer)**  
  This tab includes all advanced diagnostics plots:
  - VIF plot
  - Pearson heatmap
  - Outlier boxplots
  - Feature-wise skew distributions

>These diagnostics are critical for spotting leakage, encoding flaws, and redundancy before any modeling occurs.
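
The core quantities behind the Missingness, Distribution Quality, and Collinearity tabs can be approximated in a few lines of pandas and NumPy. This is a sketch of the standard definitions, not the suite's implementation, and the function names are hypothetical. VIF uses `VIF_j = 1 / (1 - R_j^2)`, where `R_j^2` comes from regressing feature `j` on the remaining numeric features.

```python
import numpy as np
import pandas as pd


def missingness_table(df: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage per column, most-missing first."""
    counts = df.isna().sum()
    table = pd.DataFrame({"n_missing": counts, "pct_missing": 100 * counts / len(df)})
    return table.sort_values("pct_missing", ascending=False)


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] per numeric column (the boxplot whisker rule)."""
    num = df.select_dtypes(include="number")
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    return ((num < q1 - k * iqr) | (num > q3 + k * iqr)).sum()


def vif(df: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS fit of column j on the other numeric columns."""
    num = df.select_dtypes(include="number").dropna()
    X = num.to_numpy(dtype=float)
    scores = {}
    for j, name in enumerate(num.columns):
        y = X[:, j]
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])  # intercept + other features
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - np.var(y - A @ beta) / np.var(y)
        scores[name] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(scores)
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity.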

</details>


#### 📊 Model Evaluation Dashboard 

This dashboard provides an interactive summary of the model trained in the experimental run. It visualizes performance on the test set and includes explainability tools to support model diagnostics and stakeholder communication.

<details><summary>Summary</summary>

- High-level performance metrics (e.g., R², MAE for regression or Accuracy, F1 for classification)
- If **cross-validation** is enabled, a **boxplot of fold-level scores** is included
- If a **baseline model** is configured, delta scores are annotated beside the champion’s metrics
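
Baseline deltas are simple differences, but their direction depends on the metric: a negative delta is an improvement for error metrics such as MAE. A minimal sketch, assuming metrics arrive as plain dictionaries; the metric names and the `LOWER_IS_BETTER` set are hypothetical.

```python
# Metrics where a smaller value is better (assumed names; adjust to your configured scoring metrics).
LOWER_IS_BETTER = {"mae", "mse", "rmse", "log_loss"}


def metric_deltas(champion: dict, baseline: dict) -> dict:
    """Return {metric: (champion - baseline, improved?)} for metrics both models report."""
    deltas = {}
    for name, value in champion.items():
        if name not in baseline:
            continue
        delta = value - baseline[name]
        improved = delta < 0 if name.lower() in LOWER_IS_BETTER else delta > 0
        deltas[name] = (delta, improved)
    return deltas


print(metric_deltas({"r2": 0.80, "mae": 2.0}, {"r2": 0.75, "mae": 2.5}))
```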

</details>

<details><summary>Baseline (if applied)</summary>

- Displays the same metrics for the baseline model
- Highlights any drop or improvement when compared to the current champion

</details>

<details><summary>Importance</summary>

- Feature importance scores from:
  - SHAP bar charts (if SHAP enabled)
  - Coefficients (for linear models)
  - Permutation importance (if enabled)
- Useful for debugging and stakeholder reporting
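
Permutation importance is a model-agnostic alternative when SHAP is unavailable. It shuffles one feature at a time on held-out data and measures how much the score drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data to show the technique; how the suite configures it, and the F1 scoring choice, are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary task: 5 features, only 2 of them informative.
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Score drop on the held-out split when each feature is shuffled (10 shuffles per feature).
result = permutation_importance(model, X_test, y_test, scoring="f1",
                                n_repeats=10, random_state=0)
coef = model.coef_[0]  # linear coefficients, for comparison
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: perm={result.importances_mean[i]:.3f} "
          f"± {result.importances_std[i]:.3f}, coef={coef[i]:+.3f}")
```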

</details>

<details><summary>Explainability</summary>

- SHAP Impact Summary Plot for understanding global feature effects
- Omitted if SHAP is disabled in your config

</details>

<details><summary>Plot Viewers</summary>

**Model Performance Plots**
- Interactive evaluation visuals via the plot viewer widget:
  - ROC & PR curves (classification)
  - Residuals, prediction vs. truth (regression)
  - Confusion matrix, threshold plots, calibration, etc.

**Distribution Plots**
- Always included
- Shows feature distributions in the holdout set
- Supports quick detection of skew, class imbalance, or feature leakage

</details>

<details><summary>Metadata</summary>

- Full configuration summary:
  - YAML config snapshot
  - Model and version from MLflow
  - Holdout dataset used
  - Run ID and export paths

</details>

<details><summary>Alerts</summary>

- Automated audit system that surfaces:
  - Warning thresholds (e.g., F1 below expected)
  - Cross-validation variance anomalies
  - Drift against baseline scores
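
A simplified version of these audits fits in one function. The metric names, the higher-is-better assumption for baseline comparison, and every threshold below are hypothetical defaults chosen for illustration, not the suite's configured values.

```python
import statistics


def audit(metrics, cv_scores=None, baseline=None,
          min_f1=0.70, max_cv_std=0.05, max_drop=0.02):
    """Return alert strings; assumes higher-is-better metrics such as F1 or accuracy."""
    alerts = []
    f1 = metrics.get("f1")
    if f1 is not None and f1 < min_f1:
        alerts.append(f"F1 {f1:.3f} is below the expected minimum of {min_f1:.2f}")
    if cv_scores is not None and len(cv_scores) > 1:
        spread = statistics.stdev(cv_scores)  # sample standard deviation across folds
        if spread > max_cv_std:
            alerts.append(f"High CV variance: fold std {spread:.3f} > {max_cv_std}")
    for name, value in metrics.items():
        if baseline and name in baseline and baseline[name] - value > max_drop:
            alerts.append(f"{name} dropped {baseline[name] - value:.3f} vs. baseline")
    return alerts


print(audit({"f1": 0.66}, cv_scores=[0.70, 0.90, 0.80], baseline={"f1": 0.72}))
```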

</details>

#### 📌 Core Imports

You can import the main runners directly from the package because they are exposed in its `__init__.py`. These entry points let you run each stage of the pipeline from a single import.

```python
from model_eval_suite import run_experiment, validate_champion, prep_data
```

#### 📤 Data Preparation

If you're starting from raw CSVs, you can use the suite's built-in preprocessing tool, `data_prep.py`, to split the data into training, testing, and holdout sets.

Skip this if you've already created your `train.csv`, `test.csv`, and `holdout.csv`.

```python
prep_data(config_path="config/data_prep.yaml")
```
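
For reference, a stratified three-way split can be sketched with two calls to scikit-learn's `train_test_split`: carve off the holdout first, then split the remainder. This is an assumption about the general approach, not `data_prep.py`'s actual logic. `split_three_way`, the `target` column, and the 70/15/15 proportions are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_three_way(df: pd.DataFrame, target: str, test_size: float = 0.15,
                    holdout_size: float = 0.15, seed: int = 42):
    """Split df into train/test/holdout, stratified on `target`; sizes are fractions of the full data."""
    # Carve off the holdout set first so it never influences training or tuning.
    rest, holdout = train_test_split(
        df, test_size=holdout_size, stratify=df[target], random_state=seed
    )
    # Rescale test_size so it remains a fraction of the original row count.
    rel_test = test_size / (1.0 - holdout_size)
    train, test = train_test_split(
        rest, test_size=rel_test, stratify=rest[target], random_state=seed
    )
    return train, test, holdout


# Toy frame; in practice each split would be written out, e.g. train.to_csv("train.csv", index=False).
toy = pd.DataFrame({"feature": range(200), "target": [0, 1] * 100})
train, test, holdout = split_three_way(toy, target="target")
print(len(train), len(test), len(holdout))
```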

## ⚙️ Model Experiment Runs (Demo)

This demo walks through multiple model runs using the `salifort_50k` dataset. Although the dataset is primarily suited to **classification**, it is used for both classification and regression pipelines to demonstrate flexibility and YAML-driven control.

We run the following models using the evaluation suite:

### 🔍 Classifier Models

- **Gaussian Naive Bayes**  
  Config: [config/classifier/guas_nb.yaml](config/classifier/guas_nb.yaml)
  
- **Logistic Regression**  
  Config: [config/classifier/logreg.yaml](config/classifier/logreg.yaml)

- **XGBoost Classifier**  
  Config: [config/classifier/xgboost.yaml](config/classifier/xgboost.yaml)

### 📈 Regressor Models

- **Linear Regression**  
  Config: [config/regressor/linreg.yaml](config/regressor/linreg.yaml)

- **XGBoost Regressor**  
  Config: [config/regressor/xgboost_reg.yaml](config/regressor/xgboost_reg.yaml)

Each model triggers:

- An optional **Pre-Model Diagnostics Dashboard** (if enabled in YAML)
- A complete **Evaluation Dashboard** with explainability, distributions, and exportable artifacts

At the end of the demo, we use the **champion validation** system to validate and crown the two XGBoost models — one for classification, one for regression.

> All behavior is controlled by the YAML configs. See the `config/` directory or `config_resources/config.zip` for template downloads.

```python
# ========== Gaussian Naive Bayes ==========
run_experiment(user_config_path="config/classifier/guas_nb.yaml")
```

#### ⚠️ SHAP Error Handling

<details><summary>click to expand section</summary>

Some models, such as `SVC`, `SVR`, or `GaussianNB`, do not expose native feature importance attributes like `feature_importances_` or `coef_` (SVMs provide `coef_` only with a linear kernel) and may not be supported by the SHAP explainer the suite selects.

This suite handles such situations **gracefully**:

- The SHAP tab is skipped if no compatible explainer or importance values are available.
- A warning is logged, but the run is not treated as a failure.
- All other evaluation plots and metrics will still render normally.

This ensures that the workflow remains robust even for models with limited explainability tooling.

</details>
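
The fallback behavior described in this section boils down to an attribute check plus a guarded call. A minimal sketch, assuming `explain_fn` is whatever callable computes SHAP values (for instance a small wrapper around `shap.Explainer`). `native_importance` and `safe_explain` are hypothetical helpers, not the suite's API.

```python
import logging

logger = logging.getLogger("model_eval_sketch")


def native_importance(model):
    """Return built-in importances if the model exposes them, else None."""
    if hasattr(model, "feature_importances_"):  # e.g. tree ensembles such as XGBoost
        return model.feature_importances_
    # getattr with a default also absorbs the AttributeError SVC/SVR raise for non-linear kernels.
    return getattr(model, "coef_", None)  # e.g. linear models


def safe_explain(explain_fn, model, X):
    """Run an explainability callable; on failure, log a warning and return None instead of raising."""
    try:
        return explain_fn(model, X)
    except Exception as exc:  # broad on purpose: any explainer failure should only skip the tab
        logger.warning("Skipping SHAP for %s: %s", type(model).__name__, exc)
        return None
```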

```python
# ========== Logistic Regression ==========
run_experiment(user_config_path="config/classifier/logreg.yaml")
```

```python
# ========== XGBoost Classifier ==========
run_experiment(user_config_path="config/classifier/xgboost.yaml")
```

#### 🔁 Cross-Validation Insight

<details><summary>click here to expand</summary>

If hyperparameter tuning via cross-validation is enabled in the config (`hyperparameter_tuning.run: true`), the dashboard will include an additional boxplot in the **Summary** tab. 

This plot visualizes the distribution of CV scores across folds for the best-performing parameter set, offering a quick diagnostic of stability and performance variance.

</details>
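
The per-fold scores behind such a boxplot are available from scikit-learn's search objects. The sketch below assumes tuning is done with `GridSearchCV` (the suite may use a different search strategy) and pulls the fold scores for `best_index_` out of `cv_results_`. The parameter grid and F1 scoring are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid={"C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1"
).fit(X, y)

# cv_results_ holds one array per fold ("split0_test_score", ...), indexed by candidate.
best = search.best_index_
fold_scores = np.array(
    [search.cv_results_[f"split{k}_test_score"][best] for k in range(search.n_splits_)]
)
print(f"best params: {search.best_params_}")
print(f"fold F1: {np.round(fold_scores, 3)} (mean {fold_scores.mean():.3f}, std {fold_scores.std():.3f})")
```

Passing `fold_scores` to `matplotlib.pyplot.boxplot` produces a comparable plot. A large spread relative to the mean points to unstable performance across folds.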

```python
# ========== Linear Regression ==========
run_experiment(user_config_path="config/regressor/linreg.yaml")
```

#### 🚨 Automated Alert Auditing
<details>
<summary>click here to expand section</summary>

Both the evaluation and validation dashboards include an **Alerts** tab that surfaces automated audit checks on your model's performance.

These alerts are designed to flag potential concerns such as:

- Very low precision or recall
- High class imbalance
- Overfitting indicators (e.g., large delta between train/test scores)
- Underwhelming performance against a baseline (if provided)

This system provides a lightweight, interpretable review of model quality without requiring custom code or manual thresholding.

</details>
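
The class-imbalance, overfitting, and precision/recall checks listed in this section can be sketched as follows. `quality_flags` is a hypothetical helper, and all thresholds are illustrative defaults rather than the suite's values.

```python
from collections import Counter


def quality_flags(train_score, test_score, y_true, precision, recall,
                  max_gap=0.10, max_imbalance=5.0, min_pr=0.5):
    """Return human-readable flags; all thresholds are illustrative defaults, not the suite's."""
    flags = []
    gap = train_score - test_score
    if gap > max_gap:
        flags.append(f"Possible overfitting: train/test gap {gap:.3f} > {max_gap}")
    counts = Counter(y_true)
    if len(counts) > 1:
        ratio = max(counts.values()) / min(counts.values())  # majority-to-minority class size
        if ratio > max_imbalance:
            flags.append(f"High class imbalance: {ratio:.1f}:1")
    if precision < min_pr:
        flags.append(f"Low precision: {precision:.3f}")
    if recall < min_pr:
        flags.append(f"Low recall: {recall:.3f}")
    return flags


print(quality_flags(0.95, 0.80, [0] * 90 + [1] * 10, precision=0.6, recall=0.4))
```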

```python
# ========== XGBoost Regressor ==========
run_experiment(user_config_path="config/regressor/xgboost_reg.yaml")
```

## 🏆 Champion Model Validation

This section evaluates a registered MLflow model (your *champion*) against a holdout dataset using a dedicated validation YAML configuration.

### Key Features

- Uses its own standalone YAML file (separate from training experiments)
- Accepts an optional **baseline model** for drift detection or performance benchmarking
- Automatically generates:
  - Confidence interval plots (if applicable)
  - Baseline comparison deltas (if a baseline model is provided)
  - Alert audits for performance degradation or instability
- Produces a complete interactive dashboard with:
  - Summary metrics and cross-validation visualizations
  - Explainability and feature importance plots
  - Distribution visualizations for target and predictions
  - Full configuration and environment metadata
- **Tags** the evaluated model in the MLflow Registry using your specified `production_tag`

📍 This workflow is ideal for pre-deployment validation, regression testing, and model promotion decisions.
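
Confidence intervals for a holdout metric are commonly obtained by bootstrapping: resample holdout rows with replacement, recompute the metric, and take percentiles of the results. The sketch below uses NumPy and a percentile bootstrap. Whether the suite uses this method, and how many resamples it draws, are assumptions, and `bootstrap_ci` and `accuracy` are hypothetical helpers.

```python
import numpy as np


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap over holdout rows."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample row indices with replacement
        stats[b] = metric(y_true[idx], y_pred[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), lower, upper


y_t = np.array([0, 1] * 50)
y_p = y_t.copy()
y_p[:10] = 1 - y_p[:10]  # flip 10 predictions, so accuracy is 0.9
print(bootstrap_ci(y_t, y_p, accuracy))
```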

#### Validation Configurations Used in This Demo

- [config/xgb_validation.yaml](config/xgb_validation.yaml) – XGBoost classifier
- [config/xgb_reg_validation.yaml](config/xgb_reg_validation.yaml) – XGBoost regressor

```python
# ========== Validate Classifier Champion Model ==========
validate_champion(config_path="config/xgb_validation.yaml")
```

```python
# ========== Validate Regressor Champion Model ==========
validate_champion(config_path="config/xgb_reg_validation.yaml")
```

---

## ✅ Wrap-Up and Next Steps

You’ve now run multiple models through the full suite — from preprocessing and diagnostics to evaluation and champion validation.

This notebook demonstrates the flexibility of the system, including:

- YAML-driven configuration at every stage
- Reusable pipelines for both classification and regression tasks
- Support for custom feature engineering and hyperparameter tuning
- Interactive dashboards for diagnostics and final reporting
- MLflow integration for model tracking and registry updates

### Next Steps

- **Test additional models** by duplicating a config YAML.
- **Customize features** using your own transformers or fe_config modules.
- **Enable advanced diagnostics**, SHAP, and permutation importance as needed.
- **Package and deploy** validated models via the MLflow registry.

For more examples and config templates, explore the [`config_resources/`](config_resources/) folder or the full [README.md](../README.md).

> Questions or suggestions? Feel free to submit an issue or feature request in the repository.