# Winding Fault Model Training

## What this notebook does 
Derives a **YAML configuration** of winding-fault thresholds from one machine's harmonics dataset and **logs the experiment to MLflow**.  
The **real-time inference** service consumes this YAML artifact to decide whether to flag a winding fault.

---

## How to use (5 steps)
1. **Set IDs & data path** in the “Core params” cell  
   - `dataset_path`, `tenant_id`, `machine_id`, `EXPERIMENT_NAME`, `RUN_NAME`.
2. **Run all cells** top-to-bottom.  
   The notebook will: load data → compute phase imbalance & stats → derive thresholds (mean + 3σ) → compute a quality index → **save YAML** → **log metrics & artifact** to MLflow.
3. **Check outputs in cell logs**  
   - Printed stats, computed thresholds, quality index, MLflow run id.
4. **Verify MLflow**  
   - Parameters, metrics (`*_imbalance_*`, `*_threshold`, `quality_index`), artifact, and the **Run Documentation** tag.
5. *(Optional)* **Enable promotion block**  
   - Uncomment the “Automated Promotion” cell to register the run with the highest `quality_index` and give it the `@production` alias.

---

## Inputs & assumptions
- CSV with columns for **current harmonics** prefixed with `c` and **voltage harmonics** prefixed with `v`.  
- Optional `timestamp` column (parsed if present).  
- Any `_id`, `metaData.tenant_id`, `metaData.machine_id` columns are dropped if present.
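
The input handling described above can be sketched on a tiny in-memory CSV. The sample rows and the early schema check are illustrative assumptions; the column names follow the pattern seen in this notebook's dataset.

```python
import io
import pandas as pd

# Hypothetical two-row sample; real exports carry many more harmonic columns.
csv_text = """_id,timestamp,metaData.tenant_id,metaData.machine_id,ch1_0,ch2_0,vh1_0,vh2_0
a1,2025-08-02 08:00:08+00:00,28,257,0.0,0.4,100.0,100.0
a2,2025-08-02 08:02:17+00:00,28,257,0.1,0.5,100.0,99.8
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the optional timestamp column; unparseable values become NaT.
if "timestamp" in df.columns:
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")

# Drop metadata columns only if they are present.
meta_cols = ["_id", "metaData.tenant_id", "metaData.machine_id"]
df = df.drop(columns=[c for c in meta_cols if c in df.columns])

# Select harmonic columns by prefix, as the notebook does.
current_cols = [c for c in df.columns if c.startswith("c")]
voltage_cols = [c for c in df.columns if c.startswith("v")]

# Assumed safeguard (not in the notebook): fail early on an unexpected schema.
if not current_cols or not voltage_cols:
    raise ValueError("expected at least one 'c*' and one 'v*' column")

print(current_cols)
print(voltage_cols)
```

Note that the prefix match is broad: any column whose name starts with `c` or `v` is treated as a harmonic column, so unrelated columns with those initials should be added to the drop list first.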

---

## What it computes
- **Phase imbalance:** per-row `max(row) − min(row)`, computed once across all `c*` (current) columns and once across all `v*` (voltage) columns.  
- **Imbalance stats:** mean / std for current and voltage imbalance.  
- **Thresholds (mean + 3σ):**
  - `current_threshold`, `voltage_threshold`  
  - `harmonic_threshold`: from the row-wise sum of all `c*` and `v*` columns  
  - `statistical_threshold`: from all individual `c*` and `v*` values pooled into one distribution (stacked)  
- **Quality index:** `(1 / (cur_std + vol_std + 1e-6)) * log1p(n_rows)`  
  → larger dataset + stable imbalance ⇒ higher score.
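
As a quick arithmetic check, the formulas above can be applied to the summary statistics this notebook prints further down (means, standard deviations, and row count, rounded to three decimals). This is only a sketch that reuses those printed values; it does not recompute anything from the dataset.

```python
import math

# Summary statistics as printed by this notebook's run (rounded to 3 decimals).
cur_mean, cur_std = 35.394, 47.821
vol_mean, vol_std = 99.758, 4.910
n_rows = 14491

# mean + 3 sigma thresholds
current_threshold = cur_mean + 3 * cur_std
voltage_threshold = vol_mean + 3 * vol_std

# Quality index: inverse of the summed standard deviations, scaled by log1p(n_rows)
quality_index = (1.0 / (cur_std + vol_std + 1e-6)) * math.log1p(n_rows)

print(round(current_threshold, 3))
print(round(voltage_threshold, 3))
print(round(quality_index, 4))
```

The results should agree with the logged `current_threshold`, `voltage_threshold`, and `quality_index` up to the rounding of the inputs. In this run the current-imbalance standard deviation is roughly ten times the voltage one, so it dominates the quality index.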

---

## Outputs
- **Artifact:** `winding_fault_configs_m{machine_id}.yaml`  
- **MLflow logs:** parameters (IDs, dataset), metrics (shape, imbalance stats, thresholds, `quality_index`), artifact, and run documentation tag.

---

## Parameters to change

| What | Variable(s) | Current | When to change |
|---|---|---:|---|
| **Experiment naming** | `EXPERIMENT_NAME`, `RUN_NAME` | `"winding_fault_monitoring_testing/28/257"`, `"winding_fault_config_v1"` | Always set per tenant/machine or run. |
| **Dataset path** | `dataset_path` | `"iotts.harmonics_257.csv"` | Point to new machine’s CSV. |
| **IDs** | `tenant_id`, `machine_id` | `"28"`, `"257"` | Always set for new machine. |
| **Column prefixes** | `c*`, `v*` | current / voltage | Change if schema differs. |
| **Meta columns** | `drop_cols` | `_id`, `metaData.*` | Extend if needed. |
| **Timestamp** | `"timestamp"` | parsed if present | Update if named differently. |
| **Sigma multiplier** | `3` | `mean + 3σ` | Tune sensitivity. |
| **Quality metric** | `quality_index` formula | as given | Adjust if you prefer. |
| **Artifact name** | `winding_fault_configs_m{machine_id}.yaml` | per machine | Keep stable pattern. |

> **Tip:** If you change sigma rules or imbalance definition, also update the **Run Documentation** tag.

---

## How thresholds are used online
- Real-time service loads the YAML config and applies thresholds.  
- If observed metrics exceed thresholds persistently → **winding fault flagged**.  
- `quality_index` ranks configs; best run can be promoted to `@production`.
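
The notebook does not define what "persistently" means for the online service. One possible reading, sketched below, is "N consecutive samples above the threshold". The `PersistenceDetector` class, the choice of N = 3, and the sample stream are all hypothetical; the threshold dict stands in for the parsed YAML config.

```python
from collections import deque

# Stand-in for the parsed YAML config; values taken from this notebook's run.
thresholds = {
    "current_threshold": 178.857,
    "voltage_threshold": 114.487,
}


class PersistenceDetector:
    """Flag a fault once a metric exceeds its threshold N times in a row.

    The consecutive-sample rule and the default N are illustrative assumptions.
    """

    def __init__(self, threshold: float, n_consecutive: int = 3):
        self.threshold = threshold
        # Sliding window holding the last N exceedance results.
        self.recent = deque(maxlen=n_consecutive)

    def update(self, value: float) -> bool:
        self.recent.append(value > self.threshold)
        # Fault only when the window is full and every entry is an exceedance.
        return len(self.recent) == self.recent.maxlen and all(self.recent)


detector = PersistenceDetector(thresholds["current_threshold"], n_consecutive=3)

# Hypothetical stream of current-imbalance readings.
stream = [120.0, 190.0, 150.0, 185.0, 200.0, 210.0, 100.0]
flags = [detector.update(v) for v in stream]
print(flags)
```

Only the sixth reading raises a flag, because it is the first point at which three consecutive readings exceed the threshold; the isolated exceedance at the second reading is ignored.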

---

## Common gotchas
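- **Contaminated baseline:** thresholds are derived from whatever rows the CSV contains, so fault-period samples raise both the mean and σ and make detection less sensitive. Include only healthy data if possible.  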
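
To illustrate why the baseline matters, the sketch below applies the mean + 3σ rule to a made-up healthy series and to the same series with a few fault-period readings mixed in. All numbers and the `sigma_threshold` helper are hypothetical.

```python
import statistics


def sigma_threshold(values, k=3.0):
    """Return mean + k * sample standard deviation (ddof=1, as pandas' default)."""
    return statistics.mean(values) + k * statistics.stdev(values)


# Hypothetical imbalance readings: a healthy baseline, then the same baseline
# with three fault-period samples appended.
healthy = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 10.4, 9.9, 10.6]
contaminated = healthy + [40.0, 45.0, 50.0]

clean_thr = sigma_threshold(healthy)
dirty_thr = sigma_threshold(contaminated)

print(f"healthy-only threshold: {clean_thr:.2f}")
print(f"contaminated threshold: {dirty_thr:.2f}")

# A fault-level reading is caught by the clean threshold but not the inflated one.
fault_reading = 40.0
print(fault_reading > clean_thr, fault_reading > dirty_thr)
```

With the fault samples included, the threshold rises above the fault readings themselves, so the same kind of event would no longer be flagged online.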

In [1]:
import os, json
import pandas as pd
import numpy as np
import mlflow
from mlflow_base import MLflowBase

# Experiment / run naming
EXPERIMENT_NAME = "winding_fault_monitoring_testing/28/257"
RUN_NAME        = "winding_fault_config_v1"

# End any active run
if mlflow.active_run() is not None:
    mlflow.end_run()

# Start new run
mlbase = MLflowBase(EXPERIMENT_NAME)
run = mlbase.start_run(run_name=RUN_NAME)
print("MLflow run:", mlflow.active_run().info.run_id)

# Core params (minimal)
dataset_path = "iotts.harmonics_257.csv"
tenant_id, machine_id = "28", "257"
mlbase.log_params({
    "dataset_path": os.path.basename(dataset_path),
    "tenant_id": tenant_id,
    "machine_id": machine_id,
})


MLflow run: a56fec3ab6a343749d0461773ee58b2f


In [2]:
# Load harmonics dataset
df = pd.read_csv(dataset_path)
if "timestamp" in df.columns:
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")

# Drop meta if present
drop_cols = [c for c in ["_id","metaData.tenant_id","metaData.machine_id"] if c in df.columns]
df = df.drop(columns=drop_cols, errors="ignore")

print("Loaded shape:", df.shape)
display(df.head(3))

# Log dataset shape as metrics
mlbase.log_metrics({
    "n_rows": float(df.shape[0]),
    "n_cols": float(df.shape[1])
})


Loaded shape: (14491, 91)


Unnamed: 0,timestamp,vh1_0,vh2_9,vh2_8,vh2_0,vh1_2,vh3_7,ch1_13,vh3_11,ch1_7,...,ch3_12,ch2_5,vh1_7,vh2_13,ch2_10,ch2_11,vh1_6,ch1_0,ch1_4,ch2_8
0,2025-08-02 08:00:08+00:00,100.0,0.0,0.686252,100.0,0.646852,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.689831,0.0,0.0,0.0
1,2025-08-02 08:02:17+00:00,100.0,0.0,0.605559,100.0,0.777842,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.800501,0.0,0.0,0.0
2,2025-08-02 08:03:26+00:00,100.0,0.0,0.57984,100.0,0.755995,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.81562,0.0,0.0,0.0


In [3]:
# Compute imbalance: per-row spread (max - min) across a group of columns
def calculate_imbalance(frame):
    # Vectorized equivalent of applying np.max(row) - np.min(row) to each row
    return frame.max(axis=1) - frame.min(axis=1)

current_cols = [c for c in df.columns if c.startswith("c")]
voltage_cols = [c for c in df.columns if c.startswith("v")]

current_imbalance = calculate_imbalance(df[current_cols])
voltage_imbalance = calculate_imbalance(df[voltage_cols])

cur_mean, cur_std = current_imbalance.mean(), current_imbalance.std()
vol_mean, vol_std = voltage_imbalance.mean(), voltage_imbalance.std()

print(f"Current imbalance → mean {cur_mean:.3f}, std {cur_std:.3f}")
print(f"Voltage imbalance → mean {vol_mean:.3f}, std {vol_std:.3f}")

# Log imbalance stats
mlbase.log_metrics({
    "cur_imbalance_mean": cur_mean,
    "cur_imbalance_std": cur_std,
    "vol_imbalance_mean": vol_mean,
    "vol_imbalance_std": vol_std
})


Current imbalance → mean 35.394, std 47.821
Voltage imbalance → mean 99.758, std 4.910


In [4]:
# Compute Thresholds

# Simple 3σ thresholds
cur_thr = cur_mean + 3*cur_std
vol_thr = vol_mean + 3*vol_std

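harmonic_cols = current_cols + voltage_cols

# Threshold on the row-wise sum of all harmonic columns
harmonic_sums = df[harmonic_cols].sum(axis=1)
harm_thr = harmonic_sums.mean() + 3*harmonic_sums.std()

# Threshold on all individual harmonic values pooled into one series
stacked = df[harmonic_cols].stack()
stat_thr = stacked.mean() + 3*stacked.std()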

thresholds = {
    "current_threshold": round(cur_thr, 3),
    "voltage_threshold": round(vol_thr, 3),
    "harmonic_threshold": round(harm_thr, 3),
    "statistical_threshold": round(stat_thr, 3)
}
print("Computed thresholds:", thresholds)

# Log thresholds as metrics
mlbase.log_metrics(thresholds)


Computed thresholds: {'current_threshold': 178.857, 'voltage_threshold': 114.487, 'harmonic_threshold': 904.708, 'statistical_threshold': 66.878}


In [5]:

# --- Quality Index (higher = better)
# Simple formula: more rows, less variability → better config
quality_index = (1.0 / (cur_std + vol_std + 1e-6)) * np.log1p(df.shape[0])
mlbase.log_metrics({"quality_index": quality_index})
print("Quality index:", quality_index)

Quality index: 0.18170421131930925


In [6]:
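import yaml

# Rebuild the dict with built-in floats: the rounded values are NumPy scalars,
# which yaml.dump does not write as plain numbers.
thresholds = {
    "current_threshold": float(round(cur_thr, 3)),
    "voltage_threshold": float(round(vol_thr, 3)),
    "harmonic_threshold": float(round(harm_thr, 3)),
    "statistical_threshold": float(round(stat_thr, 3))
}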

# save and log config
output_yaml = f"winding_fault_configs_m{machine_id}.yaml"
with open(output_yaml, "w") as f:
    yaml.dump(thresholds, f, sort_keys=False, default_flow_style=False)

print("Config YAML saved →", output_yaml)
mlbase.log_artifact(run.info.run_id, local_path=output_yaml)



Config YAML saved → winding_fault_configs_m257.yaml


INFO:botocore.credentials:Found credentials in environment variables.


In [7]:
# # Automated Promotion: pick best run by quality_index
# from mlflow.tracking import MlflowClient

# client = MlflowClient()
# experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

# # Fetch past runs
# runs_df = mlflow.search_runs(
#     experiment_ids=[experiment.experiment_id],
#     order_by=["metrics.quality_index DESC"],   # best at top
#     max_results=5
# )

# if not runs_df.empty:
#     best_run = runs_df.iloc[0]
#     best_run_id = best_run.run_id
#     best_qi = best_run["metrics.quality_index"]
#     print(f"Best run: {best_run_id} (quality_index={best_qi:.4f})")

#     # Register this config artifact as model version
#     model_name = f"winding_fault_detector_{tenant_id}_{machine_id}"
#     model_uri  = f"runs:/{best_run_id}/artifacts/{output_yaml}"

#     result = mlflow.register_model(model_uri=model_uri, name=model_name)

#     # Set alias to @production
#     client.set_registered_model_alias(
#         name=model_name,
#         alias="production",
#         version=result.version
#     )
#     print(f"✅ Promoted {model_name} v{result.version} to @production")
# else:
#     print("No past runs found for promotion.")


In [8]:
mlflow.set_tag(
    "mlflow.note.content",
f"""
### Run Documentation

**Deliverable:**
- `winding_fault_configs_m{machine_id}.yaml` — contains thresholds for detecting winding faults in real-time.

---

**Parameters:**
- `tenant_id`: {tenant_id}
- `machine_id`: {machine_id}

---

**Metrics (for benchmarking):**
- `cur_imbalance_mean/std`: capture imbalance between current phases; large imbalance → winding issue.
- `vol_imbalance_mean/std`: capture imbalance between voltage phases; abnormal supply side.
- `*_threshold`: derived via mean+3σ rule; used as detection cutoffs for real-time.
- `quality_index`: composite stability score = `(1 / (cur_std + vol_std + 1e-6)) * log1p(n_rows)`.  
   - Higher = better → larger dataset + more stable imbalance statistics.

These metrics are **not for supervised classification** but allow benchmarking and ranking of runs.

---

**Decision Context:**
- The YAML config (artifact) is the actual model consumed by the online detector.
- Metrics provide **internal validation and comparability** across runs.
- **Promotion/selection** is based on `quality_index`:  
  - Higher values mean thresholds are derived from **larger, more stable datasets**,  
  - Used to automatically choose and promote the best config to `@production`.
"""
)


In [9]:
mlflow.end_run()
print("MLflow run ended.")


🏃 View run winding_fault_config_v1 at: https://mlops.zolnoi.app/#/experiments/17/runs/a56fec3ab6a343749d0461773ee58b2f
🧪 View experiment at: https://mlops.zolnoi.app/#/experiments/17
MLflow run ended.
