# Tata Steel Motor Anomaly Detection Capstone Project

This notebook performs exploratory analysis and machine learning modeling for the predictive maintenance of a roller table motor.

It uses the `tata_steel_rot_motor_proxy.csv` dataset, which contains simulated sensor data.


## Part 1: Exploratory Data Analysis (EDA)
Here we visualize the simulated data to understand normal operating conditions and identify anomalies. We look at how motor current, winding temperature, and vibration behave over time and relate to one another.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv("tata_steel_rot_motor_proxy.csv")

# Display basic info
print("Dataset Head:")
print(df.head())
print("\nDataset Info:")
df.info()
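
To quantify how current, temperature, and vibration move together, a correlation matrix is one option. The sketch below assumes the column names used elsewhere in this notebook; `sensor_correlations` and the synthetic `demo_df` are hypothetical helpers added only so the example runs on its own. In the notebook itself you would pass `df` instead.

```python
import numpy as np
import pandas as pd

# Column names taken from the notebook; the values generated below are made up.
SENSOR_COLUMNS = ["Current_Amp", "Winding_Temp_C", "Vibration_mm_s"]

def sensor_correlations(frame, columns=SENSOR_COLUMNS):
    """Return the Pearson correlation matrix for the selected sensor columns."""
    return frame[list(columns)].corr(method="pearson")

# Hypothetical stand-in data: one shared load factor drives all three signals.
rng = np.random.default_rng(0)
load = rng.uniform(0.2, 1.0, size=500)
demo_df = pd.DataFrame({
    "Current_Amp": 40 * load + rng.normal(0, 1.0, 500),
    "Winding_Temp_C": 50 + 30 * load + rng.normal(0, 2.0, 500),
    "Vibration_mm_s": 2 + 3 * load + rng.normal(0, 0.5, 500),
})

corr_matrix = sensor_correlations(demo_df)
print(corr_matrix.round(2))
```

Values near +1 or -1 indicate signals that rise and fall together (or in opposition); values near 0 indicate little linear relationship.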


### 1.1 Data Visualization
Let's look at the Motor Current and Winding Temperature over time to see the general trends.

In [None]:
plt.figure(figsize=(12, 4))
plt.plot(df["Current_Amp"], label="Motor Current (A)", alpha=0.7)
plt.plot(df["Winding_Temp_C"], label="Winding Temperature (°C)", alpha=0.7)
plt.title("Motor Current vs Winding Temperature")
plt.xlabel("Time Index")
plt.legend()
plt.tight_layout()
plt.show()

### 1.2 Zoomed View
The full dataset can be crowded. Let's zoom in on the first 300 seconds to see the cyclic loading pattern more clearly.

In [None]:
# ---- Zoomed-in view (first 300 seconds) ----
zoom_df = df.iloc[0:300]  # first 5 minutes (300 seconds)

plt.figure(figsize=(12, 4))
plt.plot(zoom_df["Current_Amp"], label="Motor Current (A)")
plt.plot(zoom_df["Winding_Temp_C"], label="Winding Temperature (°C)")
plt.title("Zoomed View: Motor Current vs Winding Temperature")
plt.xlabel("Time Index (seconds)")
plt.legend()
plt.tight_layout()
plt.show()

### 1.3 Rule-Based Anomaly Detection
We can detect simple anomalies by checking if vibration levels exceed a fixed threshold (e.g., 7.0 mm/s).

In [None]:
# ---- Simple Anomaly Detection (Rule-Based) ----
vibration_threshold = 7.0  # mm/s (abnormally high)

anomalies = df[df["Vibration_mm_s"] > vibration_threshold]

print("Number of vibration anomalies detected:", len(anomalies))
print(anomalies[["Timestamp", "Vibration_mm_s"]].head())

In [None]:
# ---- Plot Vibration with Anomalies Highlighted ----
plt.figure(figsize=(12, 4))
plt.plot(df["Vibration_mm_s"], label="Vibration (mm/s)", alpha=0.7)
plt.scatter(anomalies.index, anomalies["Vibration_mm_s"], color="red", label="Anomaly", zorder=5)
plt.title("Vibration Monitoring with Detected Anomalies")
plt.xlabel("Time Index")
plt.ylabel("Vibration (mm/s)")
plt.legend()
plt.tight_layout()
plt.show()

### 1.4 Zoomed Vibration View
Zooming in helps us see the exact moments when vibration spikes occur relative to the normal cycles.

In [None]:
# ---- Zoomed-in Anomaly Visualization (first 300 seconds) ----
zoom_df = df.iloc[0:300]
zoom_anomalies = anomalies[anomalies.index < 300]

plt.figure(figsize=(12, 4))
plt.plot(zoom_df["Vibration_mm_s"], label="Vibration (mm/s)", alpha=0.7)
plt.scatter(zoom_anomalies.index, zoom_anomalies["Vibration_mm_s"], color="red", label="Anomaly", zorder=5)

plt.title("Zoomed Vibration Monitoring with Anomalies (First 300 Seconds)")
plt.xlabel("Time Index (seconds)")
plt.ylabel("Vibration (mm/s)")
plt.legend()
plt.tight_layout()
plt.show()
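
When several consecutive samples exceed the threshold, it can help to report them as one event rather than many separate points. The following is a minimal sketch of that idea on a small made-up vibration series; the names `vib`, `run_id`, and `events` are illustrative and not part of the dataset or the analysis above.

```python
import pandas as pd

# Made-up vibration readings (mm/s); 7.0 mm/s is the threshold used in this notebook.
vib = pd.Series([3.0, 7.5, 8.1, 4.0, 3.5, 7.2, 3.1, 9.0, 9.4, 9.1])
above = vib > 7.0

# A new run starts whenever the above/below flag changes from the previous sample.
run_id = (above != above.shift()).cumsum()

# Keep only the samples above the threshold, then summarise each run.
flagged_samples = pd.DataFrame({
    "sample": vib.index,
    "vibration": vib.values,
    "run": run_id.values,
})[above.values]

events = (
    flagged_samples.groupby("run")
    .agg(start=("sample", "min"), end=("sample", "max"), peak=("vibration", "max"))
    .reset_index(drop=True)
)
print(events)
```

Each row of `events` gives the first and last sample index of a threshold exceedance and the peak vibration within it.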

## Part 2: Machine Learning Model
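We train regression models to predict the motor's expected ("normal") current from the other sensor readings: motor speed, vibration, winding and bearing temperature, and voltage. A large gap between actual and predicted current may indicate an anomaly, and we later convert that gap into a Health Index.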

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split

# Feature Selection
features = ["Motor_RPM", "Vibration_mm_s", "Winding_Temp_C", "Bearing_Temp_C", "Voltage_V"]
X = df[features]
y = df["Current_Amp"]

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Training set size:", len(X_train))
print("Test set size:", len(X_test))
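
The split above shuffles rows at random. Because the sensor readings are ordered in time, a chronological split is a common alternative that keeps the test period strictly after the training period. The sketch below shows this with `shuffle=False`; `X_demo` and `y_demo` are hypothetical stand-ins for the notebook's `X` and `y`, and this is an optional variation rather than the method used here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered stand-in for the notebook's features and target.
n = 1000
X_demo = pd.DataFrame({"Motor_RPM": np.linspace(900, 1100, n)})
y_demo = pd.Series(np.linspace(30, 40, n), name="Current_Amp")

# shuffle=False preserves row order: the first 80% of rows train, the last 20% test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

print("Last training index:", X_tr.index.max())
print("First test index:", X_te.index.min())
```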

### 2.1 Model Training
We compare a shallow **Decision Tree** baseline against a **Random Forest**. Both are evaluated with mean absolute error (MAE) on the test set, reported in amps.

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# 1. Decision Tree (Baseline)
dt_model = DecisionTreeRegressor(max_depth=5, random_state=42)
dt_model.fit(X_train, y_train)
dt_mae = mean_absolute_error(y_test, dt_model.predict(X_test))
print("Decision Tree MAE (Amps):", round(dt_mae, 2))

# 2. Random Forest (Main Model)
rf_model = RandomForestRegressor(n_estimators=100, max_depth=8, random_state=42, n_jobs=-1)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
rf_mae = mean_absolute_error(y_test, y_pred_rf)
print("Random Forest MAE (Amps):", round(rf_mae, 2))
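
An MAE in amps is easier to interpret next to a naive reference. One option, sketched below on synthetic data, is to compare against scikit-learn's `DummyRegressor`, which always predicts the training mean. The data, the variable names, and the smaller forest (`n_estimators=50`) are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: a target that depends on two of three features plus noise.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(600, 3))
y_demo = 30 + 10 * X_demo[:, 0] + 5 * X_demo[:, 1] + rng.normal(0, 0.5, 600)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Naive reference: always predict the mean of the training target.
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
forest = RandomForestRegressor(
    n_estimators=50, max_depth=8, random_state=42
).fit(X_tr, y_tr)

baseline_mae = mean_absolute_error(y_te, baseline.predict(X_te))
forest_mae = mean_absolute_error(y_te, forest.predict(X_te))
print("Mean-baseline MAE:", round(baseline_mae, 2))
print("Random Forest MAE:", round(forest_mae, 2))
```

If a model's MAE is not clearly below the mean baseline, its predictions carry little information about the target.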

### 2.2 Health Index & Anomaly Detection
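We compute the **Prediction Error** as the absolute residual, |actual current − predicted current|, on the test set. A large error means the motor is drawing current the model did not expect, which we flag as an anomaly. We then map the error to a **Health Index** between 0 and 1, where 1 is healthy and 0 corresponds to the largest error observed.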

In [None]:
# Calculate Prediction Error
prediction_error = np.abs(y_test - y_pred_rf)

# Dynamic Threshold: Mean + 3 Std Dev
error_threshold = prediction_error.mean() + 3 * prediction_error.std()
ml_anomalies = prediction_error > error_threshold

print(f"Anomaly Threshold (Amps error): {error_threshold:.2f}")
print("Number of ML-detected anomalies:", ml_anomalies.sum())

# Create Health Index (1 = Healthy, 0 = Critical)
normalized_error = prediction_error / prediction_error.max()
health_index = 1 - normalized_error

# Combine into a results DataFrame
health_df = pd.DataFrame({
    "Actual_Current": y_test.values,
    "Predicted_Current": y_pred_rf,
    "Prediction_Error": prediction_error,
    "Health_Index": health_index
})

print("\nHealth Index Summary:")
print(health_df["Health_Index"].describe())
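
To make the threshold and Health Index arithmetic concrete, here is a small worked sketch on made-up errors: 200 small values plus three injected large ones. All numbers and names (`demo_error`, `demo_health`) are illustrative assumptions, not results from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up absolute errors (amps): mostly small, with three injected outliers at the end.
rng = np.random.default_rng(1)
small_errors = np.abs(rng.normal(0, 0.5, size=200))
demo_error = pd.Series(
    np.concatenate([small_errors, [5.0, 6.0, 8.0]]), name="Prediction_Error"
)

# Same rule as above: mean + 3 standard deviations (pandas uses the sample std, ddof=1).
threshold = demo_error.mean() + 3 * demo_error.std()
flagged = demo_error > threshold

# Same mapping as above: scale by the largest error, then invert.
demo_health = 1 - demo_error / demo_error.max()

print(f"Threshold: {threshold:.2f} A")
print("Flagged samples:", int(flagged.sum()))
print("Lowest Health Index:", demo_health.min())
```

Because the index is scaled by the single largest error, the worst sample always scores exactly 0, and one extreme outlier pushes every other sample toward 1.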

### 2.3 Prescriptive Maintenance
Finally, we categorize the Health Index into status levels and recommend specific maintenance actions.

In [None]:
def health_status(h):
    # Map a Health Index value (0-1) to a status label
    if h > 0.8:
        return "Healthy"
    elif h > 0.6:
        return "Degrading"
    else:
        return "Critical"

def prescribe_action(row):
    # Recommend a maintenance action for the row's Health_Status
    if row["Health_Status"] == "Healthy":
        return "No action required"
    elif row["Health_Status"] == "Degrading":
        return "Schedule inspection and lubrication check"
    else:
        return "Immediate maintenance: check bearings, alignment, load"

health_df["Health_Status"] = health_df["Health_Index"].apply(health_status)
health_df["Prescriptive_Action"] = health_df.apply(prescribe_action, axis=1)

print("\nHealth Status Counts:")
print(health_df["Health_Status"].value_counts())

print("\nSample Recommendations:")
print(health_df[["Health_Status", "Prescriptive_Action"]].head(10))
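
The same banding can be written without row-wise `apply`. The sketch below uses `pd.cut` with the 0.6 and 0.8 cut-offs and a dictionary lookup for the actions; `demo_index`, `ACTIONS`, and `demo_result` are hypothetical names for this illustration. With the default `right=True`, a value of exactly 0.8 falls in "Degrading" and exactly 0.6 in "Critical", which matches the `>` comparisons above.

```python
import numpy as np
import pandas as pd

# Made-up Health Index values, including both boundary cases.
demo_index = pd.Series([0.95, 0.80, 0.75, 0.60, 0.30])

# Bins are right-inclusive: (-inf, 0.6], (0.6, 0.8], (0.8, inf).
status = pd.cut(
    demo_index,
    bins=[-np.inf, 0.6, 0.8, np.inf],
    labels=["Critical", "Degrading", "Healthy"],
)

# Action text copied from the notebook's prescribe_action function.
ACTIONS = {
    "Healthy": "No action required",
    "Degrading": "Schedule inspection and lubrication check",
    "Critical": "Immediate maintenance: check bearings, alignment, load",
}

demo_result = pd.DataFrame({
    "Health_Index": demo_index,
    "Health_Status": status.astype(str),
})
demo_result["Prescriptive_Action"] = demo_result["Health_Status"].map(ACTIONS)
print(demo_result)
```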