# IoT Anomaly Detection - API Demonstration

This notebook demonstrates the application programming interface (API) of the IoT Anomaly Detection system.

## Overview

The API is built around utility functions in `iot_anomaly_utils.py` that provide:
- Data loading and validation
- Feature engineering functions
- Model training and evaluation
- Visualization helpers
- Model persistence

## 1. Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
from iot_anomaly_utils import (
    load_iot_data,
    get_feature_columns,
    compute_basic_features,
    compute_rolling_features,
    train_anomaly_detector,
    evaluate_model,
    save_model,
    load_model,
    plot_confusion_matrix,
    plot_feature_importance,
    create_forward_looking_labels,
    validate_data_quality
)

## 2. Data Loading API

### load_iot_data()
Load IoT sensor data from CSV with automatic timestamp conversion.

In [None]:
# Load data
df = load_iot_data('data/raw/smart_manufacturing_data.csv')

print(f"Shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nFirst few rows:")
df.head()
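
The implementation of `load_iot_data()` is not shown in this notebook. As a rough illustration of "CSV loading with automatic timestamp conversion", the sketch below assumes the timestamp lives in a column named `timestamp`. The function name `load_iot_data_sketch` and its `timestamp_col` parameter are hypothetical and not part of `iot_anomaly_utils`.

```python
import io

import pandas as pd


def load_iot_data_sketch(path_or_buffer, timestamp_col="timestamp"):
    """Hypothetical stand-in for load_iot_data(): read a CSV and parse timestamps."""
    df = pd.read_csv(path_or_buffer)
    # errors="coerce" turns unparseable timestamps into NaT instead of raising
    df[timestamp_col] = pd.to_datetime(df[timestamp_col], errors="coerce")
    return df


# Tiny in-memory CSV so the sketch runs without the real data file
csv_text = (
    "timestamp,machine_id,temperature\n"
    "2024-01-01 00:00:00,1,70.5\n"
    "2024-01-01 01:00:00,1,71.0\n"
)
demo_loaded = load_iot_data_sketch(io.StringIO(csv_text))
print(demo_loaded.dtypes)
```

Coercing bad timestamps to `NaT` keeps the load from failing on a single malformed row. Whether the real loader does this is an assumption.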

### validate_data_quality()
Get data quality statistics and validation results.

In [None]:
# Validate data quality
validation_results = validate_data_quality(df)

print(f"Total rows: {validation_results['total_rows']}")
print(f"Number of machines: {validation_results['num_machines']}")
print(f"Date range: {validation_results['date_range']['min']} to {validation_results['date_range']['max']}")
print(f"Duplicate rows: {validation_results['duplicate_rows']}")
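
The dictionary keys used above (`total_rows`, `num_machines`, `date_range`, `duplicate_rows`) suggest what the validator computes. The following is a minimal sketch that produces the same keys, assuming `machine_id` and `timestamp` columns. `validate_data_quality_sketch` is a hypothetical name, and the real function may report more than this.

```python
import pandas as pd


def validate_data_quality_sketch(df):
    """Hypothetical stand-in that mirrors the keys printed in the cell above."""
    return {
        "total_rows": len(df),
        "num_machines": int(df["machine_id"].nunique()),
        "date_range": {
            "min": df["timestamp"].min(),
            "max": df["timestamp"].max(),
        },
        # duplicated() marks every repeat of an earlier, fully identical row
        "duplicate_rows": int(df.duplicated().sum()),
    }


demo_quality_df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 00:00", "2024-01-01 01:00"]
        ),
        "machine_id": [1, 1, 2],
        "temperature": [70.0, 70.0, 65.0],
    }
)
demo_quality = validate_data_quality_sketch(demo_quality_df)
print(demo_quality)
```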

## 3. Feature Engineering API

### compute_basic_features()
Generate basic per-sensor transformations: squared, square root, and log.

In [None]:
# Define sensor columns
sensors = ['temperature', 'vibration', 'humidity', 'pressure', 'energy_consumption']

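# Sample rows for a fast demonstration. Random sampling leaves gaps in each
# machine's time series, so rolling windows computed later span irregular intervals.
df_sample = df.sample(n=1000, random_state=42).copy()
df_sample = df_sample.sort_values(['machine_id', 'timestamp']).reset_index(drop=True)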

# Compute basic features
df_features = compute_basic_features(df_sample, sensors)

print(f"Original columns: {len(df_sample.columns)}")
print(f"After basic features: {len(df_features.columns)}")
print(f"New features added: {len(df_features.columns) - len(df_sample.columns)}")

# Show some new features
new_cols = [c for c in df_features.columns if c not in df_sample.columns]
print(f"\nExample new features: {new_cols[:10]}")
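
To make the three transformations concrete, here is a minimal sketch of how they could be computed. The column suffixes (`_squared`, `_sqrt`, `_log`), the clipping of negative readings, and the use of `log1p` are all assumptions. The actual naming and handling in `compute_basic_features()` may differ.

```python
import numpy as np
import pandas as pd


def basic_features_sketch(df, sensors):
    """Hypothetical stand-in: add squared, sqrt and log columns per sensor."""
    out = df.copy()
    for sensor in sensors:
        # Assumption: clip negatives so sqrt and log stay defined
        non_negative = out[sensor].clip(lower=0)
        out[f"{sensor}_squared"] = out[sensor] ** 2
        out[f"{sensor}_sqrt"] = np.sqrt(non_negative)
        # log1p(x) = log(1 + x), which is finite at x = 0
        out[f"{sensor}_log"] = np.log1p(non_negative)
    return out


demo_basic = basic_features_sketch(
    pd.DataFrame({"temperature": [4.0, 9.0]}), ["temperature"]
)
print(demo_basic)
```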

### compute_rolling_features()
Generate rolling-window statistics for each machine, one set per window size passed in `windows`.

In [None]:
# Compute rolling features
df_rolling = compute_rolling_features(df_features, sensors, windows=[6, 12, 24])

print(f"After rolling features: {len(df_rolling.columns)}")
print(f"Total features engineered: {len(df_rolling.columns) - len(df_sample.columns)}")

# Show rolling feature examples
rolling_cols = [c for c in df_rolling.columns if 'rolling' in c]
print(f"\nExample rolling features: {rolling_cols[:10]}")
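
"Per machine" means a window never mixes readings from two machines. The sketch below shows one way to do that with a pandas group-wise transform. The column names, the choice of mean and standard deviation, and `min_periods=1` are assumptions and not the confirmed behaviour of `compute_rolling_features()`.

```python
import pandas as pd


def rolling_features_sketch(df, sensors, windows):
    """Hypothetical stand-in: rolling mean/std per machine over row-count windows."""
    out = df.sort_values(["machine_id", "timestamp"]).reset_index(drop=True)
    for sensor in sensors:
        for window in windows:
            by_machine = out.groupby("machine_id")[sensor]
            # transform() keeps the original row order and index
            out[f"{sensor}_rolling_mean_{window}"] = by_machine.transform(
                lambda s, w=window: s.rolling(w, min_periods=1).mean()
            )
            out[f"{sensor}_rolling_std_{window}"] = by_machine.transform(
                lambda s, w=window: s.rolling(w, min_periods=1).std()
            )
    return out


demo_rolling_input = pd.DataFrame(
    {
        "machine_id": [1, 1, 1, 2, 2],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 00:00",
                "2024-01-01 01:00",
                "2024-01-01 02:00",
                "2024-01-01 00:00",
                "2024-01-01 01:00",
            ]
        ),
        "temperature": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)
demo_rolling = rolling_features_sketch(demo_rolling_input, ["temperature"], [2])
print(demo_rolling)
```

In this sketch the first row of each machine has a rolling standard deviation of `NaN`, because a single observation has no sample variance. Such rows need imputing or dropping before training.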

### get_feature_columns()
Extract feature column names excluding metadata and target columns.

In [None]:
# Get feature columns
feature_cols = get_feature_columns(df_rolling)

print(f"Number of feature columns: {len(feature_cols)}")
print(f"\nFeature columns (first 20):")
print(feature_cols[:20])
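
A selector of this kind can be sketched as "all numeric columns minus a fixed exclusion list". The exclusion list below (`timestamp`, `machine_id`, `anomaly_flag`) is an assumption based on the columns used elsewhere in this notebook. The real `get_feature_columns()` may exclude other columns as well.

```python
import pandas as pd


def feature_columns_sketch(df, exclude=("timestamp", "machine_id", "anomaly_flag")):
    """Hypothetical stand-in: numeric columns that are neither metadata nor target."""
    numeric_columns = df.select_dtypes(include="number").columns
    return [column for column in numeric_columns if column not in exclude]


demo_columns_df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"]),
        "machine_id": [1, 1],
        "temperature": [70.0, 71.0],
        "temperature_squared": [4900.0, 5041.0],
        "anomaly_flag": [0, 1],
    }
)
demo_feature_cols = feature_columns_sketch(demo_columns_df)
print(demo_feature_cols)
```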

## 4. Model Training API

### train_anomaly_detector()
Train a Random Forest anomaly detector with SMOTE class balancing and a StandardScaler.

In [None]:
# Prepare data for training
from sklearn.model_selection import train_test_split

X = df_rolling[feature_cols].values
y = df_rolling['anomaly_flag'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"\nClass distribution in training:")
print(f"Normal: {(y_train == 0).sum()} ({(y_train == 0).sum() / len(y_train) * 100:.1f}%)")
print(f"Anomaly: {(y_train == 1).sum()} ({(y_train == 1).sum() / len(y_train) * 100:.1f}%)")

In [None]:
# Train model
model, scaler = train_anomaly_detector(X_train, y_train)

print("Model trained successfully!")
print(f"Model type: {type(model).__name__}")
print(f"Number of estimators: {model.n_estimators}")
print(f"Scaler type: {type(scaler).__name__}")
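
The sketch below shows the general shape of such a training function: fit a scaler on the training data only, then fit a Random Forest on the scaled features. It deliberately swaps SMOTE for `class_weight="balanced"` so that it runs with scikit-learn alone. This is a different balancing technique, not the one the library is described as using. `train_sketch` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def train_sketch(X_train, y_train, n_estimators=100, random_state=42):
    """Hypothetical stand-in: scale features, then fit a class-weighted forest."""
    scaler = StandardScaler()
    # Fit the scaler on training data only, so test statistics never leak in
    X_scaled = scaler.fit_transform(X_train)
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        class_weight="balanced",  # stands in for SMOTE in this sketch
        random_state=random_state,
    )
    model.fit(X_scaled, y_train)
    return model, scaler


# Synthetic, imbalanced data purely for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (rng.random(200) < 0.1).astype(int)
y_demo[:2] = [0, 1]  # guarantee both classes are present

demo_model, demo_scaler = train_sketch(X_demo, y_demo, n_estimators=10)
demo_pred = demo_model.predict(demo_scaler.transform(X_demo))
print(type(demo_model).__name__, type(demo_scaler).__name__, demo_pred.shape)
```

Returning the scaler alongside the model matches the `model, scaler` pair used above: the same fitted scaler has to be applied to any data passed to `predict()`.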

## 5. Model Evaluation API

### evaluate_model()
Compute accuracy, precision, recall, F1 score, and confusion matrix.

In [None]:
# Make predictions
X_test_scaled = scaler.transform(X_test)
y_pred = model.predict(X_test_scaled)

# Evaluate
metrics = evaluate_model(y_test, y_pred)

print("Model Performance:")
print(f"Accuracy:  {metrics['accuracy']:.4f}")
print(f"Precision: {metrics['precision']:.4f}")
print(f"Recall:    {metrics['recall']:.4f}")
print(f"F1 Score:  {metrics['f1_score']:.4f}")
print(f"\nConfusion Matrix:")
print(metrics['confusion_matrix'])
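
For reference, the four scores can be derived directly from the confusion-matrix counts. The sketch below uses the same dictionary keys as the cell above and the common layout of true class in rows and predicted class in columns. Whether `evaluate_model()` uses that layout is an assumption, and `evaluate_sketch` is a hypothetical name.

```python
import numpy as np


def evaluate_sketch(y_true, y_pred):
    """Hypothetical stand-in: binary metrics from raw confusion counts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())

    # Guard each ratio against an empty denominator
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
        # Rows: true class (0, 1); columns: predicted class (0, 1)
        "confusion_matrix": np.array([[tn, fp], [fn, tp]]),
    }


demo_metrics = evaluate_sketch([0, 0, 0, 1, 1, 0], [0, 1, 0, 1, 0, 0])
print(demo_metrics)
```

With rare anomalies, accuracy alone can look high even when most anomalies are missed, which is why precision, recall, and F1 are reported next to it.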

## 6. Visualization API

### plot_confusion_matrix()
Visualize confusion matrix as heatmap.

In [None]:
plot_confusion_matrix(
    metrics['confusion_matrix'],
    labels=['Normal', 'Anomaly'],
    title='Anomaly Detection - Confusion Matrix'
)

### plot_feature_importance()
Display top N most important features.

In [None]:
plot_feature_importance(model, feature_cols, top_n=15)

## 7. Model Persistence API

### save_model() / load_model()
Save and load trained models using joblib.

In [None]:
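# Make sure the output directory exists before saving
import os
os.makedirs('models', exist_ok=True)

# Save the model and the scaler it was trained with
save_model(model, 'models/demo_model.pkl')
save_model(scaler, 'models/demo_scaler.pkl')

print("Models saved successfully!")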

In [None]:
# Load model
loaded_model = load_model('models/demo_model.pkl')
loaded_scaler = load_model('models/demo_scaler.pkl')

# Test loaded model
y_pred_loaded = loaded_model.predict(loaded_scaler.transform(X_test))
metrics_loaded = evaluate_model(y_test, y_pred_loaded)

print("Loaded model performance:")
print(f"Accuracy: {metrics_loaded['accuracy']:.4f}")
print(f"F1 Score: {metrics_loaded['f1_score']:.4f}")
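
Since persistence is described as joblib-based, a round trip can be sketched with `joblib.dump` and `joblib.load`. The wrapper names and the directory-creation step are assumptions. The real `save_model()` may or may not create missing directories.

```python
import os
import tempfile

import joblib


def save_model_sketch(obj, path):
    """Hypothetical stand-in: create the parent directory, then dump with joblib."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    joblib.dump(obj, path)


def load_model_sketch(path):
    """Hypothetical stand-in: load whatever object was dumped at `path`."""
    return joblib.load(path)


# Round-trip a plain dictionary in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "models", "demo_object.pkl")
    save_model_sketch({"threshold": 0.5}, demo_path)
    demo_restored = load_model_sketch(demo_path)

print(demo_restored)
```

As with any pickle-based format, files should only be loaded from trusted sources, because loading can execute arbitrary code.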

## 8. Predictive API

### create_forward_looking_labels()
Generate labels that indicate whether an anomaly will occur within the next N hours, where N is set by `horizon_hours`.

In [None]:
# Create 24-hour forward-looking labels
df_sample_sorted = df_sample.sort_values(['machine_id', 'timestamp']).reset_index(drop=True)
labels_24h = create_forward_looking_labels(df_sample_sorted, horizon_hours=24)

base_rate = df_sample_sorted['anomaly_flag'].mean()
forward_rate = labels_24h.mean()

print(f"Original anomaly rate: {base_rate:.2%}")
print(f"24h forward-looking rate: {forward_rate:.2%}")
if base_rate > 0:  # avoid dividing by zero when the sample contains no anomalies
    print(f"\nForward-looking positives are {forward_rate / base_rate:.1f}x as frequent as raw anomaly flags")
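
One possible definition of a forward-looking label is: a row at time `t` is positive if the same machine has an anomaly in the interval `(t, t + horizon]`. The sketch below implements that definition with cumulative sums. Excluding the current row from its own window is an assumption, and `forward_labels_sketch` is a hypothetical name. The actual `create_forward_looking_labels()` may define the window differently.

```python
import numpy as np
import pandas as pd


def forward_labels_sketch(df, horizon_hours=24):
    """Hypothetical stand-in: 1 if the machine has an anomaly in (t, t + horizon]."""
    labels = pd.Series(0, index=df.index, dtype=int)
    horizon = pd.Timedelta(hours=horizon_hours).to_timedelta64()
    for _, group in df.groupby("machine_id"):
        group = group.sort_values("timestamp")
        times = group["timestamp"].to_numpy()
        flags = group["anomaly_flag"].to_numpy()
        # cumulative[i] = number of anomalies among the first i rows
        cumulative = np.concatenate([[0], np.cumsum(flags)])
        # First row strictly after t, and first row after t + horizon
        start = np.searchsorted(times, times, side="right")
        end = np.searchsorted(times, times + horizon, side="right")
        future_anomalies = cumulative[end] - cumulative[start]
        labels.loc[group.index] = (future_anomalies > 0).astype(int)
    return labels


demo_forward_df = pd.DataFrame(
    {
        "machine_id": [1, 1, 1, 1, 1],
        "timestamp": pd.to_datetime([f"2024-01-01 0{h}:00" for h in range(5)]),
        "anomaly_flag": [0, 0, 0, 1, 0],
    }
)
demo_forward = forward_labels_sketch(demo_forward_df, horizon_hours=2)
print(demo_forward.tolist())
```

In the demo, the anomaly at 03:00 marks the rows at 01:00 and 02:00 as positive, since both have it inside their two-hour look-ahead window.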

## 9. Complete API Workflow Example

This section demonstrates a complete end-to-end workflow using the API.

In [None]:
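# Imported again so this cell can run on its own
from sklearn.model_selection import train_test_split

# Step 1: Load data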
data = load_iot_data('data/raw/smart_manufacturing_data.csv')
data_sample = data.sample(n=2000, random_state=42)
data_sample = data_sample.sort_values(['machine_id', 'timestamp']).reset_index(drop=True)

# Step 2: Engineer features
sensors = ['temperature', 'vibration', 'humidity', 'pressure', 'energy_consumption']
data_engineered = compute_basic_features(data_sample, sensors)
data_engineered = compute_rolling_features(data_engineered, sensors, windows=[6, 12])

# Step 3: Prepare train/test split
features = get_feature_columns(data_engineered)
X = data_engineered[features].values
y = data_engineered['anomaly_flag'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 4: Train model
model, scaler = train_anomaly_detector(X_train, y_train)

# Step 5: Evaluate
y_pred = model.predict(scaler.transform(X_test))
results = evaluate_model(y_test, y_pred)

print("Complete Workflow Results:")
print(f"Features engineered: {len(features)}")
print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"\nPerformance:")
print(f"  Accuracy: {results['accuracy']:.4f}")
print(f"  F1 Score: {results['f1_score']:.4f}")

## Summary

This notebook demonstrated the complete API for IoT Anomaly Detection:

1. **Data Loading**: `load_iot_data()`, `validate_data_quality()`
2. **Feature Engineering**: `compute_basic_features()`, `compute_rolling_features()`, `get_feature_columns()`
3. **Model Training**: `train_anomaly_detector()`
4. **Evaluation**: `evaluate_model()`
5. **Visualization**: `plot_confusion_matrix()`, `plot_feature_importance()`
6. **Persistence**: `save_model()`, `load_model()`
7. **Predictive**: `create_forward_looking_labels()`

For a complete application example, see [iot_anomaly.example.ipynb](iot_anomaly.example.ipynb).