# AICD Manufacturing Dataset — Anomaly Detection

This notebook covers:
1. Data exploration
2. Anomaly detection models (Isolation Forest; optional Autoencoder)
3. Model evaluation
4. Business interpretation

### Objective: Build a memory-safe, deployable anomaly detection pipeline for AICD data that flags imminent item-drop risk (and related anomalies) in near real time, enabling proactive maintenance and reduced downtime.

### 1: Imports & Config

In [10]:
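# Ensure TensorFlow is installed in this kernel and importable.
# After a fresh install, restart the kernel before importing (pip prints a reminder below).
# On Windows, an OSError [Errno 2] on a very long site-packages path (as in the output below)
# most likely means Windows long-path support is disabled; enable it or use a Python install
# with a shorter path, then reinstall.
%pip install -q --upgrade pip
%pip install -q tensorflow==2.20.0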

try:
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
    print("TF import OK")
except Exception as e:
    print("TensorFlow import failed:", repr(e))
    print("On Windows, install 'Microsoft Visual C++ Redistributable (x64)' and restart the kernel.")

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
TensorFlow import failed: ModuleNotFoundError("No module named 'tensorflow.python'")
On Windows, install 'Microsoft Visual C++ Redistributable (x64)' and restart the kernel.


ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\shari\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python311\\site-packages\\tensorflow\\include\\external\\com_github_grpc_grpc\\src\\core\\ext\\filters\\fault_injection\\fault_injection_service_config_parser.h'



In [11]:
import os
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report, confusion_matrix, average_precision_score

# Optional deep learning autoencoder
try:
    from tensorflow import keras
    from tensorflow.keras import layers
    TENSORFLOW_AVAILABLE = True
except Exception:
    TENSORFLOW_AVAILABLE = False

# Config (robust paths and sampling to avoid OOM)
ROOT_DIR = os.getcwd()
DATA_DIR = os.path.join(ROOT_DIR, 'authenticIndustrialCloudDataDataset', 'data')
DATA_FILE = os.path.join(DATA_DIR, 'Data1.csv')
CSV_SEP = ';'
TARGET_COL = 'Alarm.ItemDroppedError'
RANDOM_STATE = 42
SAMPLE_ROWS = 300_000  # row budget; anomalies found after it is filled are still added (up to min_pos), so the sample can exceed it slightly

print('Using data file:', DATA_FILE)
print('TensorFlow available:', TENSORFLOW_AVAILABLE)


Using data file: c:\Users\shari\Desktop\Anomaly_Detection\authenticIndustrialCloudDataDataset\data\Data1.csv
TensorFlow available: False


### 2: Load & Explore the Data


In [12]:
def load_data_balanced(file_path=DATA_FILE, sep=CSV_SEP, total_rows=SAMPLE_ROWS, min_pos=2000, chunksize=50000):
    """Read the CSV in chunks and build a class-enriched sample.

    - Keeps the first `min_pos` anomalies (TARGET_COL == 1) found in file order.
    - Adds normal rows (TARGET_COL == 0) until `total_rows` rows are collected, so
      normals come from the earliest chunks rather than uniformly from the file.
      With total_rows=None, every normal row is kept.
    - Anomalies found after the row budget is full are still added, so the result
      can slightly exceed `total_rows`.
    """
    collected_pos = []
    collected_neg = []
    pos_count = 0
    total_count = 0
    for chunk in pd.read_csv(file_path, sep=sep, chunksize=chunksize):
        # drop empty 'Unnamed' columns (e.g., from trailing separators)
        chunk = chunk.loc[:, ~chunk.columns.str.contains('^Unnamed')]
        if TARGET_COL in chunk.columns:
            pos = chunk[chunk[TARGET_COL] == 1]
            neg = chunk[chunk[TARGET_COL] == 0]
        else:
            pos = chunk.iloc[0:0]
            neg = chunk
        # collect anomalies up to min_pos
        if not pos.empty and pos_count < min_pos:
            take = min(min_pos - pos_count, len(pos))
            if take > 0:
                collected_pos.append(pos.head(take))
                pos_count += take
                total_count += take
        # collect normals until the row budget is reached (no budget if total_rows is None)
        if not neg.empty and (total_rows is None or total_count < total_rows):
            remaining = len(neg) if total_rows is None else total_rows - total_count
            take = min(remaining, len(neg))
            if take > 0:
                collected_neg.append(neg.sample(n=take, random_state=RANDOM_STATE))
                total_count += take
        # stop once both the row budget and the anomaly quota are met
        if total_rows is not None and total_count >= total_rows and pos_count >= min_pos:
            break
    # fallback: nothing was collected, so read the first rows directly
    if not collected_pos and not collected_neg:
        df = pd.read_csv(file_path, sep=sep, nrows=total_rows)
        df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
        return df
    parts = [pd.concat(p, ignore_index=True) for p in (collected_pos, collected_neg) if p]
    df = pd.concat(parts, ignore_index=True)
    if TARGET_COL in df.columns:
        # shuffle to mix classes
        df = df.sample(frac=1.0, random_state=RANDOM_STATE).reset_index(drop=True)
    return df

def eda(df):
    print("Shape:", df.shape)
    print("Columns:", len(df.columns))
    print("Missing values:", int(df.isna().sum().sum()))
    if TARGET_COL in df.columns:
        print("Target distribution:")
        print(df[TARGET_COL].value_counts())
    print(df.head(3))

# Example usage
df = load_data_balanced()
eda(df)


Shape: (300710, 96)
Columns: 96
Missing values: 0
Target distribution:
Alarm.ItemDroppedError
0    298710
1      2000
Name: count, dtype: int64
   Relative time        Date            Time  BellowSoftTouch.SequenceNo  \
0     2508602848  25-09-2019  07:37:22.848,0                         100   
1     2669612848  25-09-2019  07:40:03.858,0                         101   
2     1550112848  25-09-2019  07:21:24.358,0                         105   

   IO.ToolTemperature  IO.CurrentVacuumMotor  Sequence.SoftTouchSequenceNr  \
0                7912                   1382                           100   
1                7976                   1177                           101   
2                7656                   1081                           105   

   Sequence.SoftTouchActuator  Sequence.WantedSoleHeight  \
0                           2                        210   
1                           4                       1056   
2                           2                        936  
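`load_data_balanced` fills its normal-row budget from the first chunks of the file, so normals and anomalies can come from different periods of operation. A minimal sketch of an alternative, assuming the target column is present and all anomalies fit in memory: keep every anomaly and a random fraction `neg_frac` of normal rows from every chunk. `load_data_uniform` and `neg_frac` are hypothetical names, and the fraction would have to be chosen from the file size and memory budget.

```python
import numpy as np
import pandas as pd


def load_data_uniform(file_path, target_col, neg_frac=0.05, sep=";",
                      chunksize=50_000, random_state=42):
    """Keep all positive rows and ~neg_frac of negatives from *every* chunk.

    Hypothetical alternative to load_data_balanced: normals are spread over
    the whole file instead of being taken from its first chunks.
    Assumes `target_col` exists in the CSV.
    """
    rng = np.random.default_rng(random_state)
    parts = []
    for chunk in pd.read_csv(file_path, sep=sep, chunksize=chunksize):
        chunk = chunk.loc[:, ~chunk.columns.str.contains("^Unnamed")]
        is_pos = chunk[target_col] == 1
        # Bernoulli draw per negative row: expected count = neg_frac * n_negatives
        keep_neg = (~is_pos) & (rng.random(len(chunk)) < neg_frac)
        parts.append(chunk[is_pos | keep_neg])
    df = pd.concat(parts, ignore_index=True)
    # shuffle so classes are interleaved
    return df.sample(frac=1.0, random_state=random_state).reset_index(drop=True)
```

For example, `load_data_uniform(DATA_FILE, TARGET_COL, neg_frac=0.05)`. Per-row Bernoulli sampling keeps memory roughly proportional to `neg_frac` without knowing the file's row count in advance; the resulting positive rate is still an artifact of the sampling, not the plant's true rate.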

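### 3: Preprocessing

We extract the numeric features (excluding the target), fill missing values with column means, and standardize them. Every numeric column except the target is kept, including the `Relative time` counter.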


In [13]:
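def preprocess_for_model(df, scaler=None):
    """Select numeric feature columns, mean-impute, and standardize.

    Returns the scaled matrix, the feature names, and the fitted scaler so the
    same transformation can be reused on new data.
    """
    # Create a scaler per call: a StandardScaler() default argument would be one
    # instance shared by every call, and the fitted scaler was not returned.
    if scaler is None:
        scaler = StandardScaler()
    feature_cols = df.select_dtypes(include=[np.number]).columns.tolist()
    if TARGET_COL in feature_cols:
        feature_cols.remove(TARGET_COL)
    # Guard: if no numeric features, raise a clear message
    if not feature_cols:
        raise ValueError("No numeric features found to model.")
    X = df[feature_cols].fillna(df[feature_cols].mean())
    Xs = scaler.fit_transform(X)
    return Xs, feature_cols, scaler

Xs, features, scaler = preprocess_for_model(df)
print("Features used (count):", len(features))
print("First 10:", features[:10])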


Features used (count): 93
First 10: ['Relative time', 'BellowSoftTouch.SequenceNo', 'IO.ToolTemperature', 'IO.CurrentVacuumMotor', 'Sequence.SoftTouchSequenceNr', 'Sequence.SoftTouchActuator', 'Sequence.WantedSoleHeight', 'CompletedLog.IdleTime', 'CompletedLog.TimeToPickPos', 'CompletedLog.ToolPickTime']
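The preprocessing cell fits the imputation means and the scaler on the same frame it transforms. To score new batches in deployment, the training-time columns, fill values, and scaler have to be reused unchanged. A minimal sketch, assuming each incoming batch arrives as a pandas DataFrame; `fit_preprocessor` and `transform_batch` are hypothetical helpers, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def fit_preprocessor(train_df, feature_cols):
    """Learn per-column fill values and scaling from training data only."""
    fill_values = train_df[feature_cols].mean()
    scaler = StandardScaler().fit(train_df[feature_cols].fillna(fill_values))
    return {"features": list(feature_cols), "fill": fill_values, "scaler": scaler}


def transform_batch(batch_df, prep):
    """Apply the *training* columns, fill values and scaler to a new batch."""
    # reindex: missing columns appear as NaN (then mean-filled), extra columns are dropped
    X = batch_df.reindex(columns=prep["features"]).fillna(prep["fill"])
    return prep["scaler"].transform(X)
```

The returned dict can be persisted with `joblib.dump(prep, path)` and restored with `joblib.load(path)` in the scoring service (the path is your choice). `reindex` makes the transform tolerant of missing or extra columns, at the cost of silently mean-filling absent sensors, so a production version would likely log such cases.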


### 4: Train Isolation Forest (Unsupervised)

In [14]:
def train_isolation_forest(X_train, contamination=0.01):
    iso = IsolationForest(n_estimators=200, contamination=contamination, random_state=RANDOM_STATE)
    iso.fit(X_train)
    scores = -iso.score_samples(X_train)  # higher = more anomalous
    return iso, scores

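# Set contamination to 3x the observed label rate, clipped to [0.001, 0.02] (default 0.01).
# contamination only sets the threshold used by predict()/decision_function();
# score_samples(), and therefore the AP computed in the evaluation step, does not depend on it.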
contam = 0.01
if TARGET_COL in df.columns:
    rate = max(1e-5, float(df[TARGET_COL].mean()))
    contam = min(0.02, max(0.001, rate * 3.0))

iso, iso_scores = train_isolation_forest(Xs, contamination=contam)
print("Contamination used:", contam)
print("Sample anomaly scores:", iso_scores[:10])


Contamination used: 0.019952778424395596
Sample anomaly scores: [0.48033607 0.50096356 0.52704616 0.3959773  0.42038036 0.42773469
 0.48552228 0.39655881 0.40940552 0.40733634]
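The `contamination` argument does not change how `IsolationForest` ranks rows: `score_samples()` depends only on the fitted trees, while contamination sets `offset_`, the cut-off used by `decision_function()` and `predict()`. A small check on synthetic Gaussian data (an assumption for illustration, not AICD data):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 5))  # synthetic stand-in for scaled features

# Same random_state -> identical trees; only the contamination differs
iso_a = IsolationForest(n_estimators=100, contamination=0.01, random_state=42).fit(X)
iso_b = IsolationForest(n_estimators=100, contamination=0.05, random_state=42).fit(X)

# Ranking scores are identical: contamination does not enter score_samples()
same_scores = np.allclose(iso_a.score_samples(X), iso_b.score_samples(X))

# ...but the threshold (offset_) moves, so predict() flags a different share of rows
frac_a = np.mean(iso_a.predict(X) == -1)
frac_b = np.mean(iso_b.predict(X) == -1)
print(same_scores, round(frac_a, 3), round(frac_b, 3))
```

Applied to the notebook, the AP depends only on the trees, while `iso.predict(Xs) == -1` flags roughly `contam` (about 2%) of the training rows.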


### 5: Autoencoder (Optional, Deep Learning)

In [15]:
def build_dense_autoencoder(input_dim, encoding_dim=16):
    if not TENSORFLOW_AVAILABLE:
        raise RuntimeError("TensorFlow not available.")
    inp = keras.Input(shape=(input_dim,))
    x = layers.Dense(64, activation="relu")(inp)
    encoded = layers.Dense(encoding_dim, activation="relu")(x)
    decoded = layers.Dense(input_dim, activation="linear")(encoded)
    model = keras.Model(inp, decoded)
    model.compile(optimizer="adam", loss="mse")
    return model

if TENSORFLOW_AVAILABLE:
    # Only builds the model; it is not trained or used for scoring in this notebook.
    autoencoder = build_dense_autoencoder(Xs.shape[1], encoding_dim=16)
    autoencoder.summary()
else:
    print("TensorFlow not available; skipping autoencoder.")


TensorFlow not available; skipping autoencoder.
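The Keras autoencoder above is skipped when TensorFlow is missing. As a swapped-in substitute (scikit-learn's `MLPRegressor`, not the notebook's Keras model), a dense autoencoder can be trained by regressing the scaled features onto themselves and scoring each row by its reconstruction error. With the default `hidden=(64, 16)` the layout matches `build_dense_autoencoder`: Dense(64, ReLU) → Dense(16, ReLU) → linear output. The helper names and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def fit_mlp_autoencoder(X_normal, hidden=(64, 16), max_iter=200, random_state=42):
    """Dense autoencoder in scikit-learn: learn to reproduce X from X.

    MLPRegressor uses an identity output layer, so (64, 16) mirrors the
    Keras model's 64 -> 16 (ReLU) -> linear reconstruction.
    """
    ae = MLPRegressor(hidden_layer_sizes=hidden, activation="relu",
                      max_iter=max_iter, random_state=random_state)
    ae.fit(X_normal, X_normal)  # target == input -> reconstruction model
    return ae


def reconstruction_error(ae, X):
    """Per-row mean squared reconstruction error (higher = more anomalous)."""
    return np.mean((X - ae.predict(X)) ** 2, axis=1)
```

A typical use is to fit on rows labeled normal, e.g. `ae = fit_mlp_autoencoder(Xs[df[TARGET_COL].to_numpy() == 0])`, possibly on a subsample for speed, and pass `reconstruction_error(ae, Xs)` to `average_precision_score` in the same way as the Isolation Forest scores. Training on normals only makes this semi-supervised, since it uses the labels to choose the training rows.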


### 6: Evaluation (if labels are available)

In [16]:
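if TARGET_COL in df.columns:
    # In-sample evaluation: iso was fit on these same rows. For reference, a random
    # ranking has an expected AP roughly equal to the positive rate (~0.0067 here).
    y_true = df[TARGET_COL].astype(int)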
    try:
        ap = average_precision_score(y_true, iso_scores)
        print("Average Precision Score (IsolationForest):", ap)
    except Exception as e:
        print("Could not compute AP:", e)


Average Precision Score (IsolationForest): 0.06877506515524155
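The AP of about 0.069 is roughly ten times the positive rate of about 0.0067 (2,000 of 300,710 rows), which is approximately the AP of a random ranking, but it is measured on the rows the model was fit on. A minimal sketch of a held-out check that also uses the `train_test_split` and `confusion_matrix` imports from the first cell; `holdout_report` and `precision_at_k` are hypothetical helpers, and the 30% test size and `k` are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, confusion_matrix
from sklearn.model_selection import train_test_split


def precision_at_k(y_true, scores, k):
    """Fraction of true anomalies among the k highest-scoring rows."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(y_true)[top_k]))


def holdout_report(X, y, contamination=0.02, k=100, random_state=42):
    """Fit IsolationForest on a training split and score an untouched test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state)
    iso = IsolationForest(n_estimators=200, contamination=contamination,
                          random_state=random_state).fit(X_tr)
    scores = -iso.score_samples(X_te)              # higher = more anomalous
    flagged = (iso.predict(X_te) == -1).astype(int)  # 1 = alert at the contamination cut-off
    return {
        "ap": average_precision_score(y_te, scores),
        "ap_random_baseline": float(np.mean(y_te)),  # approx. AP of a random ranking
        f"precision@{k}": precision_at_k(y_te, scores, k),
        "confusion_matrix": confusion_matrix(y_te, flagged),
    }
```

For example, `holdout_report(Xs, df[TARGET_COL].to_numpy(), contamination=contam)`. A random stratified split lets rows that are close in time land on both sides; splitting by timestamp (for example on the Date and Time columns) would be a stricter test.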


### 7: Business Interpretation

- **Use-case and outcome:** Detect anomalous sensor behavior to reduce unplanned downtime and scrap, improving OEE and throughput.
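- **Rarity of events:** `Alarm.ItemDroppedError` is rare (about 0.67% of the sampled rows, a share fixed by the loader's `min_pos` and `SAMPLE_ROWS` settings rather than the plant's true rate), so unsupervised methods (Isolation Forest, optional Autoencoder) that do not need many labeled failures help surface early-warning signals.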
- **Alerting strategy:**
  - Define severity tiers from model scores (e.g., Info/Warning/Critical with score thresholds).
  - Route Critical alerts to on-call maintenance; batch lower tiers into daily summaries to reduce noise.
- **Dashboards and monitoring:**
  - Show anomaly rate over time, top contributing sensors/features, and recent incidents.
  - Include filtering by machine, shift, product, operator, and recipe.
- **Maintenance workflow integration:**
  - Create tickets automatically when Critical anomalies persist for N minutes or recur within T hours.
  - Attach context windows (pre/post anomaly) and sensor snapshots to work orders.
- **Root-cause triage:**
  - Display top deviating features per anomaly instance to guide initial inspection.
  - Correlate anomalies with changeovers, maintenance logs, and environmental/utility data (e.g., air pressure, temperature).
- **KPIs and targets:**
  - Track precision@K for alerts, MTTR, avoided downtime, scrap reduction, and alert acknowledgment latency.
  - Set acceptance criteria (e.g., precision ≥ 0.6 on Critical alerts, acknowledgment < 10 min).
- **False positives/negatives handling:**
  - Provide one-click feedback (Valid issue / False alarm) to retrain and recalibrate thresholds.
  - Suppress alerts during planned maintenance or known transient states.
- **Feedback loop (HITL):**
  - Capture maintainer feedback and resolution codes; use them to label data and improve models.
  - Periodically review the most frequent false-alarm patterns and adjust features or rules.
- **Model monitoring and governance:**
  - Monitor data drift (feature distributions), performance drift (AP, precision), and data quality (nulls, ranges).
  - Establish retraining cadence (e.g., monthly or after N validated incidents) and change management approvals.
- **Explainability and auditability:**
  - Log model version, features, thresholds, and explanations per alert for traceability.
  - Provide simple reason codes (e.g., “Vacuum pressure variance above baseline”).
- **Rollout plan:**
  - Start in shadow mode (scores logged, no alerts sent) → pilot line with a small number of Critical alerts → plant-wide.
  - A/B compare with current rules or thresholds to quantify incremental value.
- **SLA and on-call:**
  - Define alerting windows, response SLAs, and escalation paths.
  - Document an incident playbook for repeated anomaly types.
- **Security, privacy, compliance:**
  - Control access to anomaly data and explanations; log access for audits.
  - Ensure compliance with data retention and regulatory standards.
- **Performance and scaling:**
  - Specify latency and throughput targets (e.g., scoring within 1s per batch of N rows).
  - Use batch scoring for high-volume sensors; stream only KPIs needed for alerting.
- **Data enrichment:**
  - Join with contextual data (maintenance logs, BOM/recipe, operator rosters, environmental sensors) to improve precision.
- **Next steps checklist:**
  - Finalize alert thresholds with operations; define SLAs and escalation.
  - Build dashboard panels (anomaly trend, top features, recent incidents).
  - Instrument feedback capture in CMMS/ticketing and wire back to training data.
  - Stand up drift/data-quality monitors and schedule retraining.
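Several items in the list above (severity tiers from score thresholds, tickets when Critical anomalies persist for N minutes, top deviating features per anomaly) can be prototyped in a few lines. A minimal sketch under stated assumptions: tier cut-offs come from quantiles of training scores (the 0.99/0.999 levels are placeholders to agree with operations), timestamps are sorted ascending, and reason codes are the features with the largest standardized deviation, a simple heuristic rather than an Isolation Forest attribution. All function names are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_tier_thresholds(train_scores, warning_q=0.99, critical_q=0.999):
    """Score cut-offs from training-score quantiles (placeholder levels)."""
    return {"Warning": float(np.quantile(train_scores, warning_q)),
            "Critical": float(np.quantile(train_scores, critical_q))}


def assign_tier(scores, thresholds):
    """Map anomaly scores (higher = more anomalous) to Info / Warning / Critical."""
    scores = np.asarray(scores)
    return np.select([scores >= thresholds["Critical"], scores >= thresholds["Warning"]],
                     ["Critical", "Warning"], default="Info")


def persistent_critical(tiers, timestamps, min_duration="5min"):
    """True where an unbroken run of Critical rows has lasted >= min_duration.

    Assumes timestamps are sorted ascending and aligned with `tiers`.
    """
    crit = np.asarray(tiers) == "Critical"
    if crit.size == 0:
        return crit
    # run id increments whenever the Critical / non-Critical state changes
    run_id = np.concatenate([[0], np.cumsum(crit[1:] != crit[:-1])])
    ts = pd.Series(pd.to_datetime(timestamps))
    run_start = ts.groupby(run_id).transform("min")
    held = ((ts - run_start) >= pd.Timedelta(min_duration)).to_numpy()
    return crit & held


def top_deviating_features(x_scaled_row, feature_names, k=3):
    """Reason codes: the k features with the largest |z| in a standardized row."""
    order = np.argsort(np.abs(x_scaled_row))[::-1][:k]
    return [(feature_names[i], float(x_scaled_row[i])) for i in order]
```

With the notebook's objects this could look like `th = fit_tier_thresholds(iso_scores)`, `tiers = assign_tier(iso_scores, th)`, and `top_deviating_features(Xs[i], features)` for a flagged row `i`. The persistence rule needs real timestamps, for example parsed from the Date and Time columns.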
