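# Full Project Pipeline: Steel Plate Fault Classification Analysis
This project covers the complete workflow: data exploration, preprocessing, feature distribution analysis, modeling, SMOTEENN data balancing, ensemble model training and prediction, model interpretation, and deep learning.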

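## 📘 Table of Contents

1. [Data Exploration and Preprocessing](#-1.-資料探索與前處理)
2. [Traditional Machine Learning Model Training](#-2.-傳統機器學習模型訓練)
3. [Model Interpretation and SHAP Analysis](#-3.-模型解釋與-SHAP-分析)
4. [Multiclass Model Enhancement](#-4.-多分類模型強化)
5. [Strategies for Hard Classes](#-5.-困難類別處理策略)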



> Dataset used in this project: **Steel Plates Faults Dataset**  
> Source: [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Steel+Plates+Faults)


```mermaid
---
config:
  layout: fixed
  theme: neutral
  look: neo
---
flowchart TD
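    A["Load data and merge labels"] --> B["Data cleaning and standardization"]
    B --> C["Feature selection and removal of highly correlated columns"]
    C --> D["Data split + SMOTEENN"]
    D --> E["Train and compare RF/XGB/LGBM and other models"]
    E --> F["SHAP feature importance analysis"]
    F --> G["Model enhancement: interaction features + Optuna tuning"]
    G --> H["Merge hard classes into a single other class, then retrain"]
    H@{ shape: rect}
```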


## 🧩 Pipeline Flow (Modular Notebooks)

This project is organized into modular notebooks. Each one serves a specific stage of the pipeline:


### 📘 `01_EDA_and_Preprocessing.ipynb`
- Conducts **Exploratory Data Analysis** (EDA) to understand feature distributions, correlations, and class imbalance.
- Applies **data cleaning**: missing value handling, outlier detection, normalization.
- Performs **feature selection** and dimensionality checks.
- Handles **class imbalance** using SMOTE, SMOTEENN, or other resampling techniques.
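
The feature selection step can be illustrated with a correlation filter. This is a minimal, dependency-free sketch and not the notebook's actual code: the `0.95` threshold, the helper names, and the toy column names are assumptions made for illustration. In a notebook the same idea is usually expressed with `DataFrame.corr()` from pandas.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as uncorrelated
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def drop_correlated(columns, threshold=0.95):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data with hypothetical column names (assumption, not the real dataset)
toy = {
    "x_min": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_max": [2.1, 4.0, 6.2, 8.1, 9.9],      # almost exactly 2 * x_min
    "luminosity": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(toy)
print(selected)
```

Columns are visited in insertion order, so the first column of a highly correlated pair is the one that survives.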


### 📘 `02_Traditional_ML_Models.ipynb`
- Trains classical machine learning models: `RandomForest`, `XGBoost`, and `LightGBM`.
- Applies soft-voting ensemble (`VotingClassifier`) for improved generalization.
- Evaluates models using classification report, macro-F1, ROC AUC, and confusion matrices.
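
To make the soft-voting and macro-F1 ideas concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions, not the notebook's code: the three probability tables are made-up stand-ins for the outputs of `RandomForest`, `XGBoost`, and `LightGBM`, and the helper names are hypothetical. Soft voting averages the per-class probabilities across models and picks the class with the highest average; macro-F1 is the unweighted mean of the per-class F1 scores.

```python
def soft_vote(prob_lists):
    """Average class probabilities across models; return the argmax class per sample."""
    n_models = len(prob_lists)
    predictions = []
    for sample_probs in zip(*prob_lists):  # one probability row per model
        avg = [sum(p) / n_models for p in zip(*sample_probs)]
        predictions.append(avg.index(max(avg)))
    return predictions


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up class probabilities for four samples and two classes
model_a = [[0.6, 0.4], [0.3, 0.7], [0.8, 0.2], [0.4, 0.6]]
model_b = [[0.4, 0.6], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
model_c = [[0.7, 0.3], [0.4, 0.6], [0.4, 0.6], [0.3, 0.7]]

y_true = [0, 1, 1, 1]
y_pred = soft_vote([model_a, model_b, model_c])
score = macro_f1(y_true, y_pred)
print(y_pred, round(score, 4))
```

Because every class counts equally in the average, macro-F1 penalizes poor performance on rare fault types more than accuracy does, which is why it suits an imbalanced dataset.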


### 📘 `03_Model_Interpretability_SHAP.ipynb`
- Uses `SHAP` (SHapley Additive exPlanations) to visualize feature importance.
- Generates **summary plots**, **bar plots**, and **class-wise explanation plots**.
- Helps understand which features influence each class prediction the most.
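
The quantity that SHAP estimates can be shown on a toy function. The sketch below computes exact Shapley values by enumerating every feature ordering, which is only feasible for a handful of features; it does not use the `shap` library, and `toy_model`, the sample, and the all-zero baseline are assumptions chosen for illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)      # start from the reference input
        prev = predict(current)
        for i in order:
            current[i] = x[i]         # reveal feature i
            new = predict(current)
            phi[i] += new - prev      # marginal contribution of feature i
            prev = new
    return [v / len(orders) for v in phi]


def toy_model(features):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * features[0] + features[1] * features[2]


sample = [1.0, 2.0, 3.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
print(phi)
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, and the interaction term is split evenly between the two features involved.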

### 📘 `04_Enhance_Modeling_Strategy.ipynb`
- Implements advanced techniques to improve minority class recall and robustness.
- Includes **two-stage classification** (binary → multiclass).
- Compares different class balancing strategies and custom loss functions.
- Evaluates hybrid scoring methods and performs in-depth error analysis.
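
The two-stage idea can be sketched as a binary gate followed by a multiclass model. This is a hedged illustration: it assumes the first stage separates a merged `other` class from the remaining classes, which the notebook may define differently, and the rule-based functions and feature names below are hypothetical stand-ins for trained classifiers.

```python
def two_stage_predict(samples, is_hard_case, classify_regular, merged_label="other"):
    """Route each sample through a binary gate, then through a multiclass model."""
    predictions = []
    for sample in samples:
        if is_hard_case(sample):
            # Stage 1: the binary model flags the sample as the merged class
            predictions.append(merged_label)
        else:
            # Stage 2: the multiclass model decides among the remaining classes
            predictions.append(classify_regular(sample))
    return predictions


# Hypothetical rule-based stand-ins for the two trained models
def toy_gate(sample):
    return sample["area"] < 10


def toy_multiclass(sample):
    return "scratch" if sample["elongation"] > 0.5 else "stain"


samples = [
    {"area": 5, "elongation": 0.9},
    {"area": 50, "elongation": 0.8},
    {"area": 40, "elongation": 0.1},
]
stage_predictions = two_stage_predict(samples, toy_gate, toy_multiclass)
print(stage_predictions)  # ['other', 'scratch', 'stain']
```

Splitting the decision this way lets each stage be tuned and balanced separately, at the cost of stage-1 errors propagating to the final prediction.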

# Master Pipeline Controller

This notebook serves as the entry point for the complete analysis workflow.


## 🚀 How to Execute

You may manually run each notebook in order, or automate with tools like `papermill`.

```python
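# Optional automation via papermill
from pathlib import Path

import papermill as pm

notebooks = [
    "01_EDA_and_Preprocessing.ipynb",
    "02_Traditional_ML_Models.ipynb",
    "03_Model_Interpretability_SHAP.ipynb",
    "04_Enhance_Modeling_Strategy.ipynb",
]

# papermill does not create the output folder, so make sure it exists
Path("outputs").mkdir(exist_ok=True)

for nb in notebooks:
    print(f"📘 Running: {nb}")
    # Execute the notebook and save the executed copy under outputs/
    pm.execute_notebook(nb, f"outputs/{nb}")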
```

