# Gearbox Fault Model Training

## What this notebook does
We generate a **gearbox-fault configuration YAML** from harmonics data and **log everything to MLflow**.  
This builds harmonic **profiles** (low/intermediate/high orders, optionally odd/even parity), computes per-profile stats for **normalization**, derives an **unlabeled fault score** with a baseline threshold, evaluates **Core-6** metrics, computes a **quality index**, and saves the **YAML artifact** our **real-time inferencing** service consumes.

---

## How to use (5 steps)
1. **Fill the “Enter these configuration when onboarding new machines” block**
   - `EXPERIMENT_NAME`, `RUN_NAME`, `dataset_path`, `tenant_id`, `machine_id`
   - Tune **Algorithm/config knobs** and **Unlabeled evaluation settings** only if needed.
2. **Run all cells top-to-bottom.**
3. **Confirm outputs in the cell logs**
   - Loaded shape, warnings (e.g., low valid rows), printed YAML, metrics, and MLflow run ID.
4. **Verify in MLflow**
   - Parameters, metrics (Core-6 + `quality_index`), artifact `gearbox_fault_configs_<machine_id>.yaml`, and run **Notes** tag.
5. *(Optional)* Use the metrics to **select/promote** the best run in our registry/UI.

---

## Inputs & assumptions
- CSV with harmonics columns named like:  
  - `ch<phase>_<order>` / `vh<phase>_<order>`, e.g. `ch1_7` / `vh3_12` → type `c` (current) or `v` (voltage), **phase** ∈ {1,2,3}, **harmonic order** as integer.  
- Optional `timestamp` column (parsed if present).  
- Any `_id`, `metaData.tenant_id`, `metaData.machine_id` columns are dropped if present.
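
As a quick illustration of the naming convention above, here is a minimal standalone sketch. The helper name `parse_harmonic_column` is hypothetical, and the regex is stricter than the notebook's own `extract_info` (it anchors the whole name and only accepts phases 1 to 3), so treat it as a way to sanity-check a new CSV header rather than as the notebook's parser.

```python
import re

# Hypothetical helper: matches names such as "ch1_7" or "vh3_12".
_COL_RE = re.compile(r"^(ch|vh)([1-3])_(\d+)$", re.IGNORECASE)

def parse_harmonic_column(name):
    """Return (kind, phase, order), or None when the name does not match."""
    m = _COL_RE.match(name)
    if m is None:
        return None
    prefix, phase, order = m.groups()
    kind = "c" if prefix.lower() == "ch" else "v"
    return kind, int(phase), int(order)

# Columns that do not follow the convention (timestamp, ids) come back as None.
parsed = {
    name: parse_harmonic_column(name)
    for name in ["ch1_7", "vh3_12", "timestamp", "_id"]
}
```

Running this over `df.columns` before the main cells is one way to spot a schema mismatch early.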

---

## What it computes (high level)
- **Profiles** over harmonic buckets (RMS across chosen harmonics, then averaged across phases):
  - Current: **low (1–10)**, **intermediate (11–20)**, **high (21–30)**  
  - *(Optional)* **odd**/**even** parity profiles if `use_parity_profiles=True`
- **Config YAML** with:
  - `stoppage_current_threshold`
  - `norm_args`: mean & std per profile (used for online normalization)
- **Unlabeled evaluation**:
  - Split by `BASELINE_FRACTION` (first part = presumed healthy **baseline**, remainder = **recent**)
  - Z-normalize profiles on the baseline, combine into a **fault_score** (`AGGREGATION`: `max_z` or `mean_z`)
  - Decision **threshold** from baseline (`THRESHOLD_METHOD`: `mean+3sigma` or `p99`)
- **Core-6 metrics** (for comparability without labels):
  - `drift_cohens_d`, `psi_fault_score`, `far_healthy`, `arl0_samples`, `recent_pct_time_anomalous`, `mean_profile_cv_baseline`
- **quality_index** (single ranking score):  
  `cohen_d − 5*far_healthy + 0.5*psi − 0.5*mean_profile_cv_baseline`
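
The unlabeled evaluation can be sketched on synthetic data as follows. The two profile columns, the random seed, and the injected shift are all made up for illustration; only the steps (baseline split, positive-part z-scores, `max_z` / `mean_z` aggregation, and the two threshold rules) follow the description above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for two profiles (the notebook derives these from harmonics).
profiles = pd.DataFrame({
    "low": rng.normal(1.0, 0.1, 200),
    "high": rng.normal(0.5, 0.05, 200),
})
profiles.iloc[180:, 0] += 1.0  # inject a late shift into one profile

baseline_fraction = 0.7
split = int(len(profiles) * baseline_fraction)
baseline = profiles.iloc[:split]

# Z-normalize every row with baseline statistics; keep only upward deviations.
z = (profiles - baseline.mean()) / baseline.std().replace(0, np.nan)
z_pos = z.clip(lower=0)

fault_score_max = z_pos.max(axis=1)    # AGGREGATION = "max_z"
fault_score_mean = z_pos.mean(axis=1)  # AGGREGATION = "mean_z"

# Threshold from baseline scores only.
base_scores = fault_score_max.iloc[:split]
thr_sigma = float(base_scores.mean() + 3 * base_scores.std())  # "mean+3sigma"
thr_p99 = float(np.percentile(base_scores, 99))                # "p99"

flagged = fault_score_max >= thr_sigma
```

Because the z-scores are clipped at zero, `mean_z` can never exceed `max_z` for the same row, which is why `max_z` reacts more strongly to a spike in a single profile.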

---

## Outputs
- **Artifact:** `gearbox_fault_configs_<machine_id>.yaml` (used by real-time detector)  
- **MLflow logs:** parameters (IDs, knobs, eval settings), metrics (shape, Core-6, `quality_index`), artifact, and a **Run Documentation** note explaining each metric/knob.

---

## Parameters to change (when onboarding a new machine)

| What | Variable(s) | Current | When to change |
|---|---|---:|---|
| **Experiment naming** | `EXPERIMENT_NAME`, `RUN_NAME` | `"gearbox_fault_monitoring_testing/28/257"`, `"harmonic_profiling_kstest_v1"` | Always set per tenant/machine or per approach/version. |
| **Dataset path** | `dataset_path` | `"iotts.harmonics_257.csv"` | Point to the new machine’s harmonics CSV. |
| **IDs** | `tenant_id`, `machine_id` | `"28"`, `"257"` | Always set for the new machine. |
| **Stoppage filter** | `stoppage_current_threshold` | `40` | Adjust to our site’s “machine running” current; higher filters more idle periods. |
| **Harmonic buckets** | `low_order_range`, `intermediate_order_range`, `high_order_range` | `(1,10)`, `(11,20)`, `(21,30)` | Change if our order definitions differ (e.g., extend to 40). |
| **Parity profiles** | `use_parity_profiles` | `True` | Disable if odd/even split is not meaningful for that machine. |
| **Baseline split** | `BASELINE_FRACTION` | `0.7` | Increase if we have a longer healthy history; decrease if the machine started drifting early in the record. |
| **Score aggregation** | `AGGREGATION` | `"max_z"` | `"mean_z"` is smoother; `"max_z"` is more sensitive to any profile spike. |
| **Threshold rule** | `THRESHOLD_METHOD` | `"mean+3sigma"` | Use `"p99"` when baseline is skewed or heavy-tailed. |
| **Column naming** | `extract_info` / `filter_harmonic_cols` | expects `ch<phase>_<order>`, `vh<phase>_<order>` | If our schema differs, update `extract_info` regex & mapping (`c`/`v`). |

> **Keep the artifact naming pattern** (`gearbox_fault_configs_<machine_id>.yaml`) stable so serving can auto-discover it.

---

## How the YAML is used online
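- Serving loads the YAML and **normalizes** live profile values using `norm_args` (mean, std per profile). It then applies the **fault_score** rule and the **decision threshold** policy used here, and raises gearbox-fault flags when exceedance is persistent.
- The YAML written by this notebook carries only `stoppage_current_threshold` and `norm_args`; the threshold value computed here is logged to MLflow as the `eval_decision_threshold` parameter.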
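
The serving code is not part of this notebook, so the following is only a guess at the shape of the normalization step. The `norm_args` values are copied from the YAML printed by this run; the function name `fault_score`, the `live` readings, and the choice to skip profiles with zero std are assumptions. Persistence logic is not modeled.

```python
import math

# norm_args as printed by this notebook's run; each entry is [mean, std].
norm_args = {
    "low_order_current_harmonics": [0.521031, 0.783555],
    "intermediate_order_current_harmonics": [0.09268, 0.153546],
    "high_order_current_harmonics": [0.0, 0.0],
}

def fault_score(live_profiles, norm_args, aggregation="max_z"):
    """Positive-part z-score per profile, combined by max or mean.

    Profiles with a non-positive std or a missing live value are skipped
    (an assumption; the real service may handle them differently).
    """
    zs = []
    for name, (mean, std) in norm_args.items():
        value = live_profiles.get(name)
        if value is None or std <= 0 or math.isnan(value):
            continue
        zs.append(max((value - mean) / std, 0.0))
    if not zs:
        return float("nan")
    return max(zs) if aggregation == "max_z" else sum(zs) / len(zs)

# Hypothetical live readings for one sample.
live = {
    "low_order_current_harmonics": 2.5,
    "intermediate_order_current_harmonics": 0.1,
    "high_order_current_harmonics": 0.0,
}
score = fault_score(live, norm_args)
```

Note that `norm_args` is computed over the whole dataset, while the evaluation cell normalizes with baseline-only statistics, so a score computed this way is not on exactly the same scale as `eval_decision_threshold`.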

---

## Promotion guideline (rule of thumb)
We promote when:
- `far_healthy < 0.01` and `arl0_samples ≥ 100`
- `drift_cohens_d ≥ 0.8`
- `coverage_valid_row_fraction_min ≥ 0.8`
- `quality_index` ranks near the top across recent runs

(Exact policy can be adapted per asset criticality and cost of false alarms.)
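
To make the rule of thumb concrete, the sketch below applies the hard gates and the `quality_index` formula to the metric values printed by the run recorded in this notebook. The function names are hypothetical, and `coverage_valid_row_fraction_min` is not printed in the outputs; it is inferred as 0.0 because the high-order profile shows zero valid rows.

```python
import math

# Values from the "Logged metrics" output of the run recorded in this notebook.
metrics = {
    "drift_cohens_d": 0.3574004000931431,
    "psi_fault_score": 0.14174696734949796,
    "far_healthy": 0.006802721088435374,
    "arl0_samples": 147.0,
    "mean_profile_cv_baseline": 1.74054039145304,
    # Inferred, not printed: the high-order profile has no valid rows.
    "coverage_valid_row_fraction_min": 0.0,
}

def promotion_gates(m):
    """Evaluate each hard gate; a NaN metric fails because NaN comparisons are False."""
    return {
        "far_healthy < 0.01": m["far_healthy"] < 0.01,
        "arl0_samples >= 100": m["arl0_samples"] >= 100,
        "drift_cohens_d >= 0.8": m["drift_cohens_d"] >= 0.8,
        "coverage_valid_row_fraction_min >= 0.8": m["coverage_valid_row_fraction_min"] >= 0.8,
    }

def quality_index(m):
    """Ranking score; NaN psi or CV contribute 0, as in the notebook cell."""
    psi = m["psi_fault_score"]
    cv = m["mean_profile_cv_baseline"]
    return (
        m["drift_cohens_d"]
        - 5.0 * m["far_healthy"]
        + 0.5 * (0.0 if math.isnan(psi) else psi)
        - 0.5 * (0.0 if math.isnan(cv) else cv)
    )

gates = promotion_gates(metrics)
promote = all(gates.values())
qi = quality_index(metrics)
```

For this run the false-alarm gates pass, but the drift gate fails (0.357 < 0.8), so the run would not be promoted under these thresholds regardless of its `quality_index`.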

---

## Common gotchas
- **Column parsing mismatch:** If our columns don’t match `ch<phase>_<order>` / `vh<phase>_<order>`, profiles become empty → YAML stats become zeros. Fix `extract_info`.
- **Sparse coverage:** Warnings like “only X valid rows” mean some profiles lack columns; check order ranges & naming.
- **Mixed units/scales:** Ensure consistent preprocessing across data pulls or buckets; otherwise normalization drifts.
- **Contaminated baseline:** If the first `BASELINE_FRACTION` includes faults, thresholds inflate; choose a clean baseline window or switch to `p99`.
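- **Idle periods:** Set `stoppage_current_threshold` so serving ignores non-operating data. The cells in this notebook only write the value into the YAML; they do not drop idle rows before computing `norm_args` or thresholds, so pre-filter the CSV if idle periods dominate.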
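
If idle rows need to be removed before training, a pre-filter along these lines is one option. The `current_rms` column is hypothetical: the harmonics CSV shown in this notebook may not include a running-current column, in which case it would have to be joined in from another source.

```python
import pandas as pd

stoppage_current_threshold = 40

# Toy frame; `current_rms` is a hypothetical running-current column.
df = pd.DataFrame({
    "current_rms": [0.0, 12.5, 55.0, 61.2, 3.1],
    "ch1_3": [0.0, 0.1, 1.2, 1.4, 0.0],
})

# Keep only rows where the machine is considered running.
running = df[df["current_rms"] > stoppage_current_threshold]
```

The profile and threshold cells would then run on `running` instead of the full frame.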

---

## TL;DR (quick start)
- Update: `EXPERIMENT_NAME`, `RUN_NAME`, `dataset_path`, `tenant_id`, `machine_id`.  
- (Optional) Tune: buckets, parity, baseline split, aggregation, threshold rule.  
- Run all cells → check printed YAML + MLflow run.  
- Use `quality_index` and Core-6 to pick the best run.  
- Deploy `gearbox_fault_configs_<machine_id>.yaml` to serving.


In [16]:
# --- Enter these configuration when onboarding new machines ---

# Experiment / run naming
EXPERIMENT_NAME = "gearbox_fault_monitoring_testing/28/257"
RUN_NAME        = "harmonic_profiling_kstest_v1"   # algorithm/approach name

# Data & IDs
dataset_path = "iotts.harmonics_257.csv" # or dataset id
tenant_id    = "28"
machine_id   = "257"

In [17]:
# --- Cell 1: Setting up MLflow ---
import os, re, json, math, yaml
import numpy as np
import pandas as pd
import mlflow
from mlflow_base import MLflowBase

# End any existing active run (notebook safety)
if mlflow.active_run() is not None:
    mlflow.end_run()

# Start run
mlbase = MLflowBase(EXPERIMENT_NAME)
run = mlbase.start_run(run_name=RUN_NAME)
print("MLflow run:", mlflow.active_run().info.run_id)


# Algorithm/config knobs
stoppage_current_threshold = 40
low_order_range, intermediate_order_range, high_order_range = (1,10), (11,20), (21,30)
use_parity_profiles = True


# Unlabeled evaluation settings
BASELINE_FRACTION = 0.7                  # healthy portion
AGGREGATION       = "max_z"              # "max_z" or "mean_z"
THRESHOLD_METHOD  = "mean+3sigma"        # or "p99"


# Logging important params
mlbase.log_params({
    "algorithm_name": RUN_NAME,
    "dataset_path": os.path.basename(dataset_path),
    "tenant_id": tenant_id,
    "machine_id": machine_id,
    "stoppage_current_threshold": stoppage_current_threshold,
    "low_order_range": str(low_order_range),
    "intermediate_order_range": str(intermediate_order_range),
    "high_order_range": str(high_order_range),
    "use_parity_profiles": use_parity_profiles,
    "eval_baseline_fraction": BASELINE_FRACTION,
    "eval_aggregation": AGGREGATION,
    "eval_threshold_method": THRESHOLD_METHOD,
})


🏃 View run harmonic_profiling_kstest_v1 at: https://mlops.zolnoi.app/#/experiments/16/runs/5980ef708f524f0cb5998994020dbebc
🧪 View experiment at: https://mlops.zolnoi.app/#/experiments/16
MLflow run: 041bd7651d4c4afb9e4dd5713377eb4f


In [18]:
# --- Cell 2: Load & basic health ---
df = pd.read_csv(dataset_path)
if "timestamp" in df.columns:
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")

drop_cols = [c for c in ["_id","metaData.tenant_id","metaData.machine_id"] if c in df.columns]
if drop_cols:
    df = df.drop(columns=drop_cols)

print("Loaded shape:", df.shape)
display(df.head(3))

missing_fraction_overall = float(df.isna().mean().mean()) if df.size else math.nan
mlbase.log_metrics({
    "n_rows": float(df.shape[0]),
    "n_cols": float(df.shape[1]),
    "missing_fraction_overall": missing_fraction_overall
})


Loaded shape: (14491, 91)


Unnamed: 0,timestamp,vh1_0,vh2_9,vh2_8,vh2_0,vh1_2,vh3_7,ch1_13,vh3_11,ch1_7,...,ch3_12,ch2_5,vh1_7,vh2_13,ch2_10,ch2_11,vh1_6,ch1_0,ch1_4,ch2_8
0,2025-08-02 08:00:08+00:00,100.0,0.0,0.686252,100.0,0.646852,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.689831,0.0,0.0,0.0
1,2025-08-02 08:02:17+00:00,100.0,0.0,0.605559,100.0,0.777842,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.800501,0.0,0.0,0.0
2,2025-08-02 08:03:26+00:00,100.0,0.0,0.57984,100.0,0.755995,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.81562,0.0,0.0,0.0


In [19]:
# --- Cell 3: Column parsing helpers ---
def extract_info(col_name: str):
    """
    Parse names like 'ch1_7' or 'vh3_12' => (type in {'c','v'}, phase in {1,2,3}, harmonic order int)
    """
    try:
        parts = col_name.split("_")
        type_phase_str = parts[0]
        freq = int(parts[1])
        m = re.match(r"([a-z]+)([0-9]+)", type_phase_str, re.I)
        if not m: return (None, None, None)
        type_str, phase_str = m.groups()
        t = "c" if type_str.lower()=="ch" else "v" if type_str.lower()=="vh" else None
        ph = int(phase_str)
        return (t, ph, freq) if t in {"c","v"} else (None, None, None)
    except Exception:
        return (None, None, None)

def filter_harmonic_cols(df, t="c", phase=None, order_range=None, parity=None):
    cols = []
    for col in df.columns:
        tt, ph, fr = extract_info(col)
        if tt != t or fr is None: continue
        if phase is not None and ph != phase: continue
        if order_range is not None:
            lo, hi = order_range
            if not (lo <= fr <= hi): continue
        if parity == "odd"  and fr % 2 == 0: continue
        if parity == "even" and fr % 2 != 0: continue
        cols.append(col)
    return cols


In [20]:
# --- Cell 4 (revised): Profiles & compact coverage logging ---
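import warnings

def rms_across_harmonics(subdf: pd.DataFrame) -> pd.Series:
    if subdf.empty: return pd.Series([], dtype=float)
    vals = subdf.to_numpy(dtype=float)
    # All-NaN rows make np.nanmean emit a RuntimeWarning through the warnings
    # module, which np.errstate does not silence; filter it explicitly.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return pd.Series(np.sqrt(np.nanmean(np.square(vals), axis=1)), index=subdf.index)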

def build_profile(df, t="c", order_range=None, parity=None):
    phase_series, debug = [], {}
    for ph in [1,2,3]:
        cols = filter_harmonic_cols(df, t=t, phase=ph, order_range=order_range, parity=parity)
        debug[f"phase{ph}_ncols"] = len(cols)
        phase_series.append(rms_across_harmonics(df[cols]) if cols else pd.Series(index=df.index, dtype=float))
    stacked = pd.concat(phase_series, axis=1) if phase_series else pd.DataFrame(index=df.index)
    profile = stacked.mean(axis=1, skipna=True) if not stacked.empty else pd.Series(index=df.index, dtype=float)
    debug["rows"] = int(df.shape[0])
    debug["valid_rows"] = int(profile.dropna().shape[0])
    return profile, debug

# Use the configured bucket ranges so that changing the knobs actually changes the profiles
profiles_to_compute = {
    "low_order_current_harmonics":          dict(t="c", order_range=low_order_range,          parity=None),
    "intermediate_order_current_harmonics": dict(t="c", order_range=intermediate_order_range, parity=None),
    "high_order_current_harmonics":         dict(t="c", order_range=high_order_range,         parity=None),
}
if use_parity_profiles:
    profiles_to_compute.update({
        "odd_parity_current_harmonics":  dict(t="c", order_range=None, parity="odd"),
        "even_parity_current_harmonics": dict(t="c", order_range=None, parity="even"),
    })

profile_series_map, profile_debug_map = {}, {}
profile_valid_fracs = []
profiles_used = 0

for pname, args in profiles_to_compute.items():
    s, dbg = build_profile(df, **args)
    profile_series_map[pname] = s
    profile_debug_map[pname] = dbg

    if dbg["rows"] > 0:
        valid_frac = dbg["valid_rows"] / dbg["rows"]
        profile_valid_fracs.append(valid_frac)
    if dbg["valid_rows"] > 0:
        profiles_used += 1

# Aggregate coverage signals (compact)
coverage_min = float(np.min(profile_valid_fracs)) if profile_valid_fracs else float("nan")
mlbase.log_metrics({"coverage_valid_row_fraction_min": coverage_min})
mlbase.log_params({"profiles_used": int(profiles_used)})

# Optional: warn if any profile has very low valid rows
for pname, dbg in profile_debug_map.items():
    if dbg["valid_rows"] < 50:
        print(f"WARNING: {pname} has only {dbg['valid_rows']} valid rows.")






In [21]:
# --- Cell 5: Build config & save YAML (artifact) ---
def safe_pair(mean_val, std_val):
    m = float(mean_val) if not np.isnan(mean_val) else 0.0
    s = float(std_val)  if not np.isnan(std_val)  else 0.0
    return [round(m,6), round(s,6)]

# Compute mean/std per profile
profile_stats_map = {
    p: {
        "mean": float(v.dropna().mean()) if v.dropna().size else float("nan"),
        "std":  float(v.dropna().std(ddof=0)) if v.dropna().size>1 else 0.0
    }
    for p, v in profile_series_map.items()
}

config = {
    "stoppage_current_threshold": stoppage_current_threshold,
    "norm_args": {
        "low_order_current_harmonics":          safe_pair(profile_stats_map["low_order_current_harmonics"]["mean"],          profile_stats_map["low_order_current_harmonics"]["std"]),
        "intermediate_order_current_harmonics": safe_pair(profile_stats_map["intermediate_order_current_harmonics"]["mean"], profile_stats_map["intermediate_order_current_harmonics"]["std"]),
        "high_order_current_harmonics":         safe_pair(profile_stats_map["high_order_current_harmonics"]["mean"],         profile_stats_map["high_order_current_harmonics"]["std"]),
    }
}
if use_parity_profiles:
    config["norm_args"]["odd_parity_current_harmonics"]  = safe_pair(profile_stats_map["odd_parity_current_harmonics"]["mean"],  profile_stats_map["odd_parity_current_harmonics"]["std"])
    config["norm_args"]["even_parity_current_harmonics"] = safe_pair(profile_stats_map["even_parity_current_harmonics"]["mean"], profile_stats_map["even_parity_current_harmonics"]["std"])

output_yaml = f"gearbox_fault_configs_{machine_id}.yaml"
with open(output_yaml, "w") as f:
    yaml.dump(config, f, sort_keys=False, default_flow_style=False)

print("✅ Generated Gearbox Fault Detection Config (YAML):")
print(yaml.dump(config, sort_keys=False, default_flow_style=False))

# Log the YAML artifact (the deliverable)
mlbase.log_artifact(run.info.run_id, local_path=output_yaml)


✅ Generated Gearbox Fault Detection Config (YAML):
stoppage_current_threshold: 40
norm_args:
  low_order_current_harmonics:
  - 0.521031
  - 0.783555
  intermediate_order_current_harmonics:
  - 0.09268
  - 0.153546
  high_order_current_harmonics:
  - 0.0
  - 0.0
  odd_parity_current_harmonics:
  - 0.027823
  - 0.043371
  even_parity_current_harmonics:
  - 12.519126
  - 16.92035



In [22]:
# --- Cell 6 - A (revised): Unlabeled evaluation — Core-6 metrics only ---
from scipy.stats import ks_2samp

# 1) Composite fault score from profiles (z-normalized on baseline)
profiles_df = pd.DataFrame(profile_series_map).sort_index()
n = len(profiles_df)
split_idx = int(n * BASELINE_FRACTION)

baseline_df = profiles_df.iloc[:split_idx].copy()
recent_df   = profiles_df.iloc[split_idx:].copy()

baseline_means = baseline_df.mean(skipna=True)
baseline_stds  = baseline_df.std(skipna=True).replace(0, np.nan)

z_df     = (profiles_df - baseline_means) / baseline_stds
z_pos_df = z_df.clip(lower=0)

if AGGREGATION == "mean_z":
    fault_score = z_pos_df.mean(axis=1, skipna=True)
else:  # default "max_z"
    fault_score = z_pos_df.max(axis=1, skipna=True)

baseline_scores = fault_score.iloc[:split_idx].dropna()
recent_scores   = fault_score.iloc[split_idx:].dropna()

# 2) Threshold from baseline
if THRESHOLD_METHOD == "p99":
    thr = float(np.nanpercentile(baseline_scores, 99))
else:  # "mean+3sigma"
    thr = float(baseline_scores.mean() + 3 * baseline_scores.std())

mlbase.log_params({"eval_decision_threshold": thr})

# 3) Helper funcs
def cohens_d(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]
    if a.size < 2 or b.size < 2: return float("nan")
    m1, m2 = a.mean(), b.mean()
    s = np.sqrt(((a.var(ddof=1) + b.var(ddof=1)) / 2.0))
    return float((m2 - m1) / s) if s > 0 else float("nan")

def psi(a, b, bins=10):
    a = np.asarray(a, float); b = np.asarray(b, float)
    a = a[~np.isnan(a)]; b = b[~np.isnan(b)]
    if a.size < 10 or b.size < 10: return float("nan")
    qs = np.quantile(a, np.linspace(0,1,bins+1)); qs[0], qs[-1] = -np.inf, np.inf
    val = 0.0
    for i in range(bins):
        p = ((a>=qs[i]) & (a<qs[i+1])).mean()
        q = ((b>=qs[i]) & (b<qs[i+1])).mean()
        p = max(p,1e-6); q = max(q,1e-6)
        val += (q-p) * np.log(q/p)
    return float(val)

# 4) Core-6 metrics
cohen_d = cohens_d(baseline_scores, recent_scores)          # separability
psi_val = psi(baseline_scores, recent_scores, bins=10)      # distribution shift
far_healthy = float(np.mean(baseline_scores >= thr))        # false alarms on baseline
arl0 = float(1.0 / max(far_healthy, 1e-9))                  # expected samples between false alarms
recent_flags = (recent_scores >= thr).astype(int)
pct_time_anom_recent = float(recent_flags.mean())           # % anomalous in recent
# mean CV across profiles on baseline
cv_vals = []
for p in baseline_df.columns:
    mu = baseline_df[p].mean(skipna=True)
    sd = baseline_df[p].std(skipna=True)
    cv = (sd / mu) if (not np.isnan(mu) and mu != 0) else np.nan  # CV is undefined for zero or NaN mean
    if not np.isnan(cv): cv_vals.append(cv)
mean_profile_cv_baseline = float(np.mean(cv_vals)) if cv_vals else float("nan")

# Log ONLY the Core-6
core6_metrics = {
    "drift_cohens_d": cohen_d,
    "psi_fault_score": psi_val,
    "far_healthy": far_healthy,
    "arl0_samples": arl0,
    "recent_pct_time_anomalous": pct_time_anom_recent,
    "mean_profile_cv_baseline": mean_profile_cv_baseline
}
mlbase.log_metrics(core6_metrics)

print("Logged metrics:", core6_metrics)


Logged metrics: {'drift_cohens_d': 0.3574004000931431, 'psi_fault_score': 0.14174696734949796, 'far_healthy': 0.006802721088435374, 'arl0_samples': 147.0, 'recent_pct_time_anomalous': 0.005059797608095676, 'mean_profile_cv_baseline': 1.74054039145304}


In [23]:
# --- Cell 6-B: A single quality index to sort runs ---
quality_index = (
    (cohen_d)
    - 5.0 * (far_healthy)
    + 0.5 * (psi_val if not np.isnan(psi_val) else 0.0)
    - 0.5 * (mean_profile_cv_baseline if not np.isnan(mean_profile_cv_baseline) else 0.0)
)
mlbase.log_metrics({"quality_index": float(quality_index)})
print("quality_index:", quality_index)

quality_index: -0.4760099174008047


In [24]:
# --- Cell X: Run documentation (Notes) ---
import mlflow
import numpy as np

def _fmt(x, nd=4):
    if x is None or (isinstance(x, float) and (np.isnan(x) or np.isinf(x))):
        return "NaN"
    try:
        return f"{float(x):.{nd}f}"
    except Exception:
        return str(x)

mlflow.set_tag(
    "mlflow.note.content",
f"""
### Run Documentation

This note summarizes **what we logged** in MLflow for quick comparison across runs and why each item matters.

---

#### 📦 Artifact (the deliverable)
- **Config YAML**: `{output_yaml}`    
  This is the file used by the online realtime detector.

---

#### ⚙️ Parameters (setup & reproducibility)
- **algorithm_name**: `{RUN_NAME}` — approach used to generate the config.
- **tenant_id / machine_id**: `{tenant_id}` / `{machine_id}` — routing & traceability.
- **dataset_path**: `{os.path.basename(dataset_path)}` — source data identifier.
- **stoppage_current_threshold**: `{stoppage_current_threshold}` — filters non-operating periods.
- **harmonic buckets**: low `{low_order_range}`, intermediate `{intermediate_order_range}`, high `{high_order_range}`.
- **use_parity_profiles**: `{use_parity_profiles}` — include odd/even profiles.
- **eval_baseline_fraction**: `{BASELINE_FRACTION}` — portion assumed healthy to derive baseline.
- **eval_aggregation**: `{AGGREGATION}` — how profiles combine into one fault score.
- **eval_threshold_method**: `{THRESHOLD_METHOD}` — rule for decision threshold on baseline.
- **profiles_used**: `{profiles_used}` — number of profiles with usable data.
- **eval_decision_threshold**: `{_fmt(thr)}` — derived operating threshold for the fault score.

---

#### 📊 Core Metrics (for benchmarking without labels)
- **missing_fraction_overall**: `{_fmt(missing_fraction_overall)}` — overall data completeness.
- **coverage_valid_row_fraction_min**: `{_fmt(coverage_min)}` — worst coverage across profiles.

**Separability & shift**
- **drift_cohens_d**: `{_fmt(cohen_d)}` — effect size between recent vs baseline fault scores.
- **psi_fault_score**: `{_fmt(psi_val)}` — population stability index for fault score.

**False-alarm behavior**
- **far_healthy**: `{_fmt(far_healthy)}` — fraction of baseline flagged.
- **arl0_samples**: `{_fmt(arl0)}` — expected samples between false alarms.

**Operational recent behavior**
- **recent_pct_time_anomalous**: `{_fmt(pct_time_anom_recent)}` — fraction of recent period flagged.

**Baseline stability**
- **mean_profile_cv_baseline**: `{_fmt(mean_profile_cv_baseline)}` — average coefficient of variation across profiles in healthy window.

**Single ranking score**
- **quality_index**: `{_fmt(quality_index)}`  
  Formula: `cohen_d - 5*far_healthy + 0.5*psi - 0.5*mean_profile_cv_baseline`.

---

#### ✅ Promotion guideline
Promote when:
- `far_healthy < 0.01` and `arl0_samples ≥ 100`, **and**
- `drift_cohens_d ≥ 0.8`, **and**
- `coverage_valid_row_fraction_min ≥ 0.8`, **and**
- `quality_index` ranks near the top among runs for this machine/config family.

*Aggregation:* `{AGGREGATION}` · *Thresholding:* `{THRESHOLD_METHOD}` · *Baseline fraction:* `{BASELINE_FRACTION}`
"""
)


In [25]:
# --- Cell 7: Quick sanity view (optional) ---
check_df = pd.DataFrame(profile_series_map)
display(check_df.describe())
print("Threshold (baseline-derived):", thr)


Unnamed: 0,low_order_current_harmonics,intermediate_order_current_harmonics,high_order_current_harmonics,odd_parity_current_harmonics,even_parity_current_harmonics
count,14491.0,14491.0,0.0,14491.0,14491.0
mean,0.521031,0.09268,,0.027823,12.519126
std,0.783582,0.153551,,0.043372,16.920934
min,0.0,0.0,,0.0,0.0
25%,0.0,0.0,,0.0,0.0
50%,0.0,0.0,,0.0,0.0
75%,1.048286,0.158427,,0.061778,35.374847
max,6.307929,1.712915,,0.876519,36.245903


Threshold (baseline-derived): 3.74718309784166


In [26]:
# --- Cell 8: End MLflow run ---
mlflow.end_run()
print("MLflow run ended.")


🏃 View run harmonic_profiling_kstest_v1 at: https://mlops.zolnoi.app/#/experiments/16/runs/041bd7651d4c4afb9e4dd5713377eb4f
🧪 View experiment at: https://mlops.zolnoi.app/#/experiments/16
MLflow run ended.
