# Predictive Maintenance Module

This notebook demonstrates a basic predictive maintenance module. It connects to MongoDB to retrieve simulated machine data from our smart factory pipeline, performs anomaly detection, and identifies potential failure patterns to generate alerts.

---

## 1. Setup and Data Retrieval

First, we'll import the necessary libraries and establish a connection to the MongoDB database to fetch the most recent PLC data.

In [None]:
import pandas as pd
import numpy as np
from pymongo import MongoClient
from sklearn.ensemble import IsolationForest
import matplotlib.pyplot as plt
import seaborn as sns

# --- MongoDB Connection ---
# Connect to your local MongoDB instance. 
# Adjust the URI if your database is hosted elsewhere or requires different credentials.
client = MongoClient("mongodb://localhost:27017/")
db = client.smart_factory
plc_collection = db.plc_data

def fetch_data_from_mongo(collection, limit=2000):
    """Fetches and prepares data from a MongoDB collection."""
    # Fetch the newest documents first; ObjectIds sort roughly by creation time
    cursor = collection.find().sort([('_id', -1)]).limit(limit)
    df = pd.DataFrame(list(cursor))
    
    if df.empty:
        print("No data found in the collection.")
        return pd.DataFrame()

    # Extract the creation time embedded in each ObjectId (a timezone-aware UTC datetime)
    df['timestamp'] = df['_id'].apply(lambda x: x.generation_time)
    
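    # Normalize the nested 'analog_inputs' dictionary into separate columns
    if 'analog_inputs' in df.columns:
        # Treat documents without a valid 'analog_inputs' dict as empty so normalization does not fail
        analog = [v if isinstance(v, dict) else {} for v in df['analog_inputs']]
        normalized_analog = pd.json_normalize(analog)
        df = df.join(normalized_analog)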

    return df

# Fetch and display the head of the PLC data
plc_df = fetch_data_from_mongo(plc_collection)

if not plc_df.empty:
    print("Fetched and Processed PLC Data:")
    display(plc_df.head())

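The `generation_time` conversion in the cell above works because the first 4 bytes of every ObjectId store its creation time as a big-endian Unix timestamp in seconds. The following is a minimal standard-library sketch of that decoding, for illustration only; the helper name `objectid_hex_to_datetime` and the example ID are hypothetical and are not part of the module.

```python
from datetime import datetime, timezone


def objectid_hex_to_datetime(oid_hex: str) -> datetime:
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are a big-endian Unix timestamp in seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical ObjectId used purely for illustration.
example_id = "507f1f77bcf86cd799439011"
print(objectid_hex_to_datetime(example_id).isoformat())
```

Because the embedded timestamp has one-second resolution, documents inserted within the same second share the same derived `timestamp` value.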
## 2. Anomaly Detection with Isolation Forest

We'll use the **Isolation Forest** algorithm to detect anomalies. It is an unsupervised method that isolates observations with random splits on the feature values; points that can be separated in only a few splits are treated as outliers. The model is trained on the `pressure` and `flow_rate` readings.

In [None]:
if not plc_df.empty and 'pressure' in plc_df.columns and 'flow_rate' in plc_df.columns:
    features = ['pressure', 'flow_rate']
    X = plc_df[features].copy()

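    # Initialize and train the model. `contamination` is the expected proportion of outliers.
    model = IsolationForest(contamination=0.05, random_state=42)
    # fit_predict labels each row: 1 = normal, -1 = anomaly
    plc_df['anomaly'] = model.fit_predict(X)
    # Lower scores are more anomalous; negative scores correspond to the -1 label
    plc_df['anomaly_score'] = model.decision_function(X)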

    # Separate the anomalies for review (-1 indicates an anomaly)
    anomalies = plc_df[plc_df['anomaly'] == -1]
    print(f"\nDetected {len(anomalies)} anomalies out of {len(plc_df)} data points.")
    if not anomalies.empty:
        print("Detected Anomalies:")
        display(anomalies[['timestamp', 'pressure', 'flow_rate', 'anomaly_score']])

    # --- Visualization of Anomalies ---
    sns.set_style("whitegrid")
    
    plt.figure(figsize=(15, 6))
    sns.scatterplot(x='timestamp', y='pressure', hue='anomaly', data=plc_df, 
                    palette={1: 'blue', -1: 'red'}, legend=False)
    plt.title('Pressure Anomaly Detection', fontsize=16)
    plt.ylabel('Pressure (PSI)')
    plt.xlabel('Timestamp')
    plt.show()

    plt.figure(figsize=(15, 6))
    sns.scatterplot(x='timestamp', y='flow_rate', hue='anomaly', data=plc_df, 
                    palette={1: 'blue', -1: 'red'}, legend=False)
    plt.title('Flow Rate Anomaly Detection', fontsize=16)
    plt.ylabel('Flow Rate (L/min)')
    plt.xlabel('Timestamp')
    plt.show()
else:
    print("\nSkipping anomaly detection: PLC data is empty or missing required feature columns.")

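To see how the `anomaly` labels and `anomaly_score` values relate, here is a small sketch on synthetic data. All readings are made-up values rather than factory data: 195 points clustered around an assumed normal operating range, plus 5 hand-placed extreme readings. It uses the same `IsolationForest` settings as the cell above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" readings (assumed values, not real PLC data).
normal = np.column_stack([
    rng.normal(60.0, 2.0, size=195),  # pressure
    rng.normal(20.0, 1.0, size=195),  # flow_rate
])

# Five hand-placed readings far outside the normal cluster.
injected = np.array([
    [120.0, 2.0],
    [5.0, 45.0],
    [140.0, 60.0],
    [0.0, 0.0],
    [130.0, 25.0],
])
X = np.vstack([normal, injected])

model = IsolationForest(contamination=0.05, random_state=42)
labels = model.fit_predict(X)         # 1 = normal, -1 = anomaly
scores = model.decision_function(X)   # negative = anomaly

print("flagged points:", int((labels == -1).sum()))
print("labels of injected readings:", labels[-5:])
print("scores of injected readings:", np.round(scores[-5:], 3))
```

Note that `contamination=0.05` fixes the share of points flagged at roughly 5% of the training data, so some borderline points from the normal cluster are flagged alongside the injected readings. The score is therefore more informative than the label alone when deciding which anomalies to review first.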
## 3. Predictive Maintenance Alerts

This final section simulates a basic predictive maintenance system. It scans the data for anomalies detected by our model, rule-based threshold breaches, and any reported error codes, then aggregates them into a list of actionable alerts.

In [None]:
def generate_maintenance_alerts(df):
    """Generates alerts based on anomalies, thresholds, and error codes."""
    alerts = []
    # Ensure required columns exist
    required_cols = ['pressure', 'flow_rate', 'error_codes', 'anomaly', 'timestamp', 'anomaly_score']
    if not all(col in df.columns for col in required_cols):
        print("Cannot generate alerts. DataFrame is missing one or more required columns.")
        return []
    
    for _, row in df.iterrows():
        # Format the timestamp once per row for reuse in every alert message
        ts = row['timestamp'].strftime('%Y-%m-%d %H:%M:%S')

        # Rule-based threshold alerts
        if row['pressure'] > 90:
            alerts.append(f"CRITICAL: High Pressure Alert: {row['pressure']:.2f} PSI at {ts}")
        if row['flow_rate'] < 5:
            alerts.append(f"WARNING: Low Flow Rate Alert: {row['flow_rate']:.2f} L/min at {ts}")

        # Error code alerts (assuming 'System Operational' or 0 means nominal)
        if row['error_codes'] != 'System Operational' and row['error_codes'] != 0:
            alerts.append(f"FAULT: PLC Error Code '{row['error_codes']}' detected at {ts}")

        # Anomaly-based alerts (-1 is the Isolation Forest outlier label)
        if row['anomaly'] == -1:
            alerts.append(f"ANOMALY: Unusual sensor behavior detected at {ts} (Score: {row['anomaly_score']:.2f})")
    
    return alerts

if not plc_df.empty:
    maintenance_alerts = generate_maintenance_alerts(plc_df)
    print("\n--- Generated Maintenance Alerts ---")
    if maintenance_alerts:
        for alert in maintenance_alerts:
            print(alert)
    else:
        print("No maintenance alerts generated.")
else:
    print("\nNo PLC data available to generate alerts from.")

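For larger frames, the threshold part of the scan above can also be summarized with boolean masks instead of a row-by-row loop. The sketch below is an illustration on hypothetical sample rows, not a replacement for `generate_maintenance_alerts`; it reuses the 90 PSI and 5 L/min limits from the cell above.

```python
import pandas as pd

# Hypothetical sample rows used only for illustration.
sample = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 08:00:00", "2024-01-01 08:01:00",
         "2024-01-01 08:02:00", "2024-01-01 08:03:00"],
        utc=True,
    ),
    "pressure": [72.0, 95.5, 88.0, 91.2],
    "flow_rate": [12.0, 11.0, 3.5, 4.0],
})

# Boolean masks evaluate each rule for every row at once.
high_pressure = sample["pressure"] > 90
low_flow = sample["flow_rate"] < 5

summary = {
    "high_pressure": int(high_pressure.sum()),
    "low_flow": int(low_flow.sum()),
    "both": int((high_pressure & low_flow).sum()),
}
print(summary)

# Rows that breach at least one threshold.
print(sample.loc[high_pressure | low_flow, ["timestamp", "pressure", "flow_rate"]])
```

Rows where both masks are true are a reasonable candidate for prioritization, since they breach two independent rules at the same time.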