# 🏗️ Crane Predictive Maintenance & Root Cause Analysis

This notebook demonstrates **Predictive Maintenance (PdM)** and **Root Cause Analysis (RCA)** on a synthetic crane drive dataset.

**Data:** A synthetic dataset that simulates sensor readings from the hoist unit of a bridge or tower crane, including:
- `Load_kg`: current hook load (kg)
- `Motor_Temp`: hoist motor temperature
- `Vibration`: vibration at the hoist unit (mm/s)
- `Brake_Wear`: remaining brake pad thickness (mm); the value decreases as the pads wear
- `Error_Code`: fault label

**Analysis Goals:**
1. **Root Cause Analysis:** Train an XGBoost classifier to predict fault codes from sensor readings.
2. **Predictive Maintenance:** Use linear regression to forecast when brake pads will reach the critical threshold of 1.0 mm.

### Dataset Generation and Loading

This section generates the synthetic crane dataset using the `generate_crane_dataset.py` script and loads it into a pandas DataFrame.

1. **Generate Dataset**: Run the generator script to produce `data/kran_wartung_daten.csv`.
2. **Load Dataset**: Read the CSV into a pandas DataFrame.
3. **Initial Inspection**: Display shape and first rows.

In [None]:
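import subprocess
import sys
from pathlib import Path

import pandas as pd

# Paths (relative to the notebook's working directory)
repo_root = Path("..")
csv_path = repo_root / "data" / "kran_wartung_daten.csv"
script_path = repo_root / "scripts" / "generate_crane_dataset.py"

# Generate dataset if it does not exist yet
if not csv_path.exists():
    # sys.executable runs the script with the same interpreter as the notebook kernel
    result = subprocess.run(
        [sys.executable, str(script_path), "--output", str(csv_path)],
        capture_output=True,
        text=True,
    )
    print(result.stdout or result.stderr)
    # Stop here on failure instead of hitting a confusing read_csv error below
    if result.returncode != 0:
        raise RuntimeError(f"Dataset generation failed (exit code {result.returncode})")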

# Load dataset
df = pd.read_csv(csv_path, parse_dates=["Timestamp"])

print(f"Dataset shape: {df.shape}")
df.head()
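
Before moving on, it can help to confirm that the loaded frame has the columns the rest of the notebook relies on. The following is a minimal sketch, not part of the generator script: the helper name `check_schema` and the sample values are hypothetical, and the column list is taken from the dataset description above.

```python
import pandas as pd

# Column names taken from the dataset description; the checks themselves are illustrative
EXPECTED_COLUMNS = ["Timestamp", "Load_kg", "Motor_Temp", "Vibration", "Brake_Wear", "Error_Code"]


def check_schema(frame: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means the frame looks usable."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    present = [c for c in EXPECTED_COLUMNS if c in frame.columns]
    null_counts = frame[present].isna().sum()
    for col, n in null_counts[null_counts > 0].items():
        problems.append(f"{col}: {n} missing values")
    if "Timestamp" in frame.columns and not frame["Timestamp"].is_monotonic_increasing:
        problems.append("Timestamp is not sorted in ascending order")
    return problems


# Tiny hand-made sample (hypothetical values, not produced by the real generator)
sample = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"]),
        "Load_kg": [1200.0, None],
        "Motor_Temp": [55.0, 58.5],
        "Vibration": [1.1, 1.3],
        "Brake_Wear": [9.8, 9.7],
    }
)
print(check_schema(sample))
```

Calling `check_schema(df)` on the real frame should return an empty list if the generator produced all six columns without gaps.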

### Exploratory Data Analysis

Summarize the dataset statistically and inspect the fault class distribution.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical summary
print("=== Statistical Summary ===")
display(df.describe())

# Fault class distribution
print("\n=== Error Code Distribution ===")
print(df["Error_Code"].value_counts())
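
Raw counts can hide how skewed the fault classes are, and that skew matters when reading per-class precision and recall later. A small sketch of the same check with relative shares, using made-up label names (the real names come from the generator script):

```python
import pandas as pd

# Hypothetical fault labels; the real names come from the generator script
labels = pd.Series(["OK"] * 90 + ["OVERHEAT"] * 7 + ["BRAKE_WORN"] * 3, name="Error_Code")

shares = labels.value_counts(normalize=True)   # fraction of rows per class
imbalance_ratio = shares.max() / shares.min()  # 1.0 would mean perfectly balanced

print(shares.round(3))
print(f"Largest / smallest class: {imbalance_ratio:.1f}x")
```

On the real data, replacing `labels` with `df["Error_Code"]` gives the same two numbers for the actual classes.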

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle("Crane Sensor Distributions by Fault Type", fontsize=14)

features = ["Load_kg", "Motor_Temp", "Vibration", "Brake_Wear"]
for ax, feature in zip(axes.flat, features):
    for label, group in df.groupby("Error_Code"):
        ax.hist(group[feature], bins=30, alpha=0.5, label=label)
    ax.set_title(feature)
    ax.set_xlabel(feature)
    ax.set_ylabel("Count")
    ax.legend(fontsize=7)

plt.tight_layout()

# Save visualization (create the output folder if needed)
viz_path = repo_root / "docs" / "crane_maintenance_insights.png"
viz_path.parent.mkdir(parents=True, exist_ok=True)
plt.savefig(viz_path, dpi=100, bbox_inches="tight")
print(f"✅ Visualization saved to {viz_path}")
plt.show()

### Root Cause Analysis: Fault Classification

Train an **XGBoost** classifier to identify the fault code from sensor readings.

Pipeline:
1. Encode the categorical target `Error_Code` with `LabelEncoder`.
2. 80 / 20 train / test split.
3. Scale numeric features with `StandardScaler`, fitted on the training split only so that no test-set statistics leak into training.
4. Fit XGBoost classifier.
5. Evaluate with a classification report.

In [None]:
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from xgboost import XGBClassifier

# Feature columns and target
feature_cols = ["Load_kg", "Motor_Temp", "Vibration", "Brake_Wear"]
X = df[feature_cols].copy()
y_raw = df["Error_Code"].copy()

# Encode target labels
le = LabelEncoder()
y = le.fit_transform(y_raw)

# Train / test split (80 / 20)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale numeric features: fit on the training split only, then apply to both
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Train XGBoost classifier
clf = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.05,
    eval_metric="mlogloss",
    random_state=42,
)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
# Use only labels present in the data; zero_division=0 avoids warnings for unseen classes
present_labels = sorted(set(y))
present_names = [le.classes_[i] for i in present_labels]
print("=== Classification Report ===")
print(classification_report(y_test, y_pred, labels=present_labels, target_names=present_names, zero_division=0))
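
The classifier works on integer labels, so predictions need to be mapped back to fault names before they are useful for root cause analysis. A short sketch of that round trip with `LabelEncoder`; the label names and the `demo_` variables are hypothetical and stand in for `le` and the output of `clf.predict(...)`:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical fault labels for illustration only
demo_le = LabelEncoder()
demo_le.fit(["OK", "OVERHEAT", "BRAKE_WORN", "OK"])

# LabelEncoder sorts the class names, so integer i maps to classes_[i]
print(demo_le.classes_.tolist())

# Pretend these integers came from clf.predict(...)
demo_pred = np.array([1, 2, 0])
print(demo_le.inverse_transform(demo_pred).tolist())
```

In the notebook itself, `le.inverse_transform(y_pred)` would do the same for the real predictions.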

In [None]:
# Feature importance plot
importances = clf.feature_importances_
feat_df = pd.DataFrame({"Feature": feature_cols, "Importance": importances})
feat_df = feat_df.sort_values("Importance", ascending=False)

plt.figure(figsize=(7, 4))
sns.barplot(data=feat_df, x="Importance", y="Feature", hue="Feature", palette="viridis", legend=False)
plt.title("XGBoost Feature Importance: Root Cause Analysis")
plt.tight_layout()
plt.show()

### Predictive Maintenance: Brake Wear Forecast

Use **linear regression** over time to predict when the brake pad thickness (`Brake_Wear`) will fall to the critical threshold of **1.0 mm**.

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Use row index as time proxy; this assumes one reading per hour, in chronological order
hours = np.arange(len(df)).reshape(-1, 1)
brake_wear = df["Brake_Wear"].values

# Fit linear regression
reg = LinearRegression()
reg.fit(hours, brake_wear)

print(f"Slope (mm per hour): {reg.coef_[0]:.6f}")
print(f"Intercept (mm):      {reg.intercept_:.4f}")

# The forecast is only meaningful if the pad thickness is actually decreasing
if reg.coef_[0] >= 0:
    raise ValueError("Brake_Wear shows no downward trend; cannot forecast a threshold crossing.")

# Predict: when does the wear line cross 1.0 mm?
# 1.0 = slope * t + intercept  =>  t = (1.0 - intercept) / slope
critical_threshold_mm = 1.0
t_critical = (critical_threshold_mm - reg.intercept_) / reg.coef_[0]
t_critical_date = df["Timestamp"].iloc[0] + pd.Timedelta(hours=t_critical)

print(f"\n⚠️  Predicted maintenance date (brake at {critical_threshold_mm} mm):")
print(f"    {t_critical_date.strftime('%Y-%m-%d %H:%M')}")
# The last observation sits at hour len(df) - 1
print(f"    (~{int(t_critical - (len(df) - 1))} hours from last observation)")
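
The cell above treats the row index as elapsed hours. If the readings were not exactly hourly, the elapsed time could instead be derived from `Timestamp`. The sketch below shows that variant on synthetic half-hourly data; all values are made up, and `np.polyfit` with degree 1 is swapped in for `LinearRegression` only to keep the example small (both compute an ordinary least-squares line).

```python
import numpy as np
import pandas as pd

# Synthetic readings every 30 minutes (hypothetical values):
# pads start at 5.0 mm and lose 0.01 mm per hour
ts = pd.Series(pd.date_range("2024-01-01", periods=200, freq="30min"))
elapsed_h = (ts - ts.iloc[0]).dt.total_seconds().to_numpy() / 3600.0
wear_mm = 5.0 - 0.01 * elapsed_h

# Degree-1 least-squares fit; np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(elapsed_h, wear_mm, deg=1)

threshold_mm = 1.0
if slope >= 0:
    raise ValueError("Pad thickness is not decreasing; no crossing to forecast.")

t_cross_h = (threshold_mm - intercept) / slope   # hours after the first reading
cross_date = ts.iloc[0] + pd.Timedelta(hours=float(t_cross_h))
hours_left = t_cross_h - elapsed_h[-1]           # measured from the last reading

print(f"slope = {slope:.4f} mm/h, crossing after {t_cross_h:.1f} h on {cross_date}")
print(f"{hours_left:.1f} h after the last observation")
```

With hourly data both approaches give the same crossing time; they diverge only when the sampling interval differs from one hour or is irregular.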

In [None]:
# Visualise brake wear trend and maintenance prediction
future_hours = np.arange(int(t_critical) + 50).reshape(-1, 1)
predicted_wear = reg.predict(future_hours)

plt.figure(figsize=(10, 5))
plt.plot(df["Timestamp"], brake_wear, label="Measured Brake Wear", alpha=0.7)
future_timestamps = [
    df["Timestamp"].iloc[0] + pd.Timedelta(hours=int(h)) for h in future_hours.flatten()
]
plt.plot(future_timestamps, predicted_wear, "--", color="orange", label="Linear Trend")
plt.axhline(y=critical_threshold_mm, color="red", linestyle=":", label=f"Critical: {critical_threshold_mm} mm")
plt.axvline(x=t_critical_date, color="red", linestyle="--", alpha=0.6, label=f"Maintenance due: {t_critical_date.strftime('%Y-%m-%d')}")
plt.xlabel("Timestamp")
plt.ylabel("Brake Wear (mm)")
plt.title("Predictive Maintenance: Brake Pad Wear Forecast")
plt.legend()
plt.tight_layout()
plt.show()