# LADPackage Outlier Detection

**Auto-Generated Script Converted from mFUSE**

This notebook demonstrates a complete outlier detection workflow using structural health monitoring data from a 3-story structure. The workflow follows the LADPackage methodology and includes:

1. **Data Import**: Loading 3-story structure dataset
2. **Feature Extraction**: AR model parameter estimation
3. **Feature Visualization**: Plotting feature vectors
4. **Model Training**: Mahalanobis distance learning and scoring
5. **Results Analysis**: Score visualization and distribution analysis
6. **Performance Evaluation**: ROC curve analysis

This example was originally created in mFUSE on 3/19/2016 and demonstrates the integration of multiple SHM analysis techniques in a single workflow.

## Background

Outlier detection in structural health monitoring aims to identify abnormal structural behavior that may indicate damage. This workflow uses:

- **Autoregressive (AR) modeling** to extract time series features
- **Mahalanobis distance** for statistical outlier detection
- **ROC analysis** for performance quantification

The methodology is particularly effective for detecting changes in structural dynamics caused by damage or environmental variations.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sys
from pathlib import Path

# Add project root to path for imports
notebook_dir = Path().resolve()
project_root = notebook_dir.parent.parent
sys.path.insert(0, str(project_root))

# Import SHMTools modules
from shmtools.features import ar_model_shm
from shmtools.classification import roc_shm
from shmtools.plotting import (
    plot_features_shm,
    plot_scores_shm,
    plot_score_distributions_shm,
    plot_roc_shm
)

# Import LADPackage-specific functions
from LADPackage.utils import import_3story_structure_sub_floors, learn_score_mahalanobis

print(f"Project root: {project_root}")
print("LADPackage outlier detection workflow initialized")

## Step 1: Import 3 Story Structure Dataset

Import 3 story structure data using the LADPackage-compatible import function. This dataset contains:

- **170 test instances**: 90 undamaged + 80 damaged conditions
- **5 channels**: Force input + 4 acceleration measurements
- **8192 time points**: per measurement at 2000 Hz sampling rate

The data represents base excitation tests on a 3-story building structure with various damage scenarios including gap formation and mass changes.

In [None]:
# Import 3 story structure data (empty list uses default: all channels)
dataset, damage_states, state_list = import_3story_structure_sub_floors([])

print(f"Dataset shape: {dataset.shape}")
print(f"Damage states shape: {damage_states.shape}")
print(f"State list shape: {state_list.shape}")
print(f"Unique damage states: {np.unique(damage_states)}")
print(f"Number of undamaged instances: {np.sum(damage_states == 0)}")
print(f"Number of damaged instances: {np.sum(damage_states == 1)}")

## Step 2: AR Model

Estimate **autoregressive (AR) model** parameters to extract features from the time series data.

The AR model represents each time series as a linear combination of its past values:
$$x(t) = \sum_{i=1}^{p} a_i x(t-i) + e(t)$$

Where:
- $p = 10$ is the AR model order
- $a_i$ are the AR parameters (our features)
- $e(t)$ is the prediction error

The AR parameters capture the structural dynamics and are sensitive to changes caused by damage.
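
As a minimal sketch of the estimation idea (not necessarily how `ar_model_shm` is implemented), the coefficients $a_i$ in the equation above can be fit by ordinary least squares on a matrix of lagged samples. The helper `fit_ar_least_squares` and the synthetic AR(2) signal are assumptions made for illustration only.

```python
import numpy as np


def fit_ar_least_squares(x, order):
    """Estimate AR coefficients a_1..a_p by ordinary least squares.

    Hypothetical helper for illustration; not part of SHMTools.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Row for time t holds [x(t-1), x(t-2), ..., x(t-p)]
    lags = np.column_stack([x[order - i : n - i] for i in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lags, target, rcond=None)
    # e(t) is what the linear prediction fails to explain
    residuals = target - lags @ coeffs
    rms_residual = np.sqrt(np.mean(residuals ** 2))
    return coeffs, rms_residual


# Synthetic, stable AR(2) process with known coefficients (assumed test signal)
rng = np.random.default_rng(0)
true_coeffs = np.array([0.6, -0.3])
x = np.zeros(5000)
for t in range(2, x.size):
    x[t] = true_coeffs[0] * x[t - 1] + true_coeffs[1] * x[t - 2] + rng.normal(scale=0.1)

est_coeffs, rms = fit_ar_least_squares(x, order=2)
print("Estimated coefficients:", est_coeffs)
print("RMS residual:", rms)
```

The estimated coefficients should land close to the values used to generate the signal, and the RMS residual close to the noise level. In the workflow, the coefficients from every channel are concatenated into one feature vector per instance.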

In [None]:
# AR model order
ar_order = 10

# Extract AR model parameters as features
ar_parameters_fv, rms_residuals_fv, ar_parameters, ar_residuals, ar_prediction = ar_model_shm(
    dataset, ar_order
)

print(f"AR parameters feature vectors shape: {ar_parameters_fv.shape}")
print(f"RMS residuals feature vectors shape: {rms_residuals_fv.shape}")
print(f"AR parameters shape: {ar_parameters.shape}")
print(f"AR residuals shape: {ar_residuals.shape}")
print(f"AR prediction shape: {ar_prediction.shape}")

# Display feature statistics
print(f"\nFeature vector statistics:")
print(f"Mean AR parameter magnitude: {np.mean(np.abs(ar_parameters_fv)):.4f}")
print(f"Mean RMS residual: {np.mean(rms_residuals_fv):.4f}")

## Step 3: Plot Features

**Plot feature vectors as a subplot for each feature**

Visualize the first 4 AR parameters across all test instances to observe how they vary between undamaged and damaged conditions. This helps identify which features are most sensitive to damage.

In [None]:
# Feature indices to plot (first 4 AR parameters)
feature_indices = np.array([0, 1, 2, 3])  # Python 0-based indexing

# Plot features
axes_handle = plot_features_shm(
    ar_parameters_fv,
    instance_indices=None,  # Plot all instances
    feature_indices=feature_indices,
    subplot_titles=None,  # Use default titles
    subplot_ylabels=None,  # Use default labels
    axes_handle=None
)

plt.suptitle('AR Model Parameters (Features 1-4)', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print(f"Plotted {len(feature_indices)} AR parameter features")

## Step 4: Learn Score Mahalanobis

**Split data, train, and score using Mahalanobis distance**

The Mahalanobis distance measures how far each instance is from the training data distribution, accounting for correlations between features:

$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

Where:
- $\mu$ is the mean of training features  
- $\Sigma$ is the covariance matrix of training features
- $x$ is a test feature vector

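Training uses every other instance among the first 91: MATLAB indices `1:2:91`, which in 0-based Python are `range(0, 91, 2)`, i.e. 0, 2, ..., 90 (46 instances). Note that index 90 is the 91st instance, so if the 90 undamaged instances are stored first, the last training instance falls just outside the undamaged set.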
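
The distance formula above can be sketched directly in NumPy. This is an illustration under assumptions, not the implementation of `learn_score_mahalanobis`: the helper `mahalanobis_scores` is hypothetical, the data are synthetic, and the library function may use a different sign convention for its scores.

```python
import numpy as np


def mahalanobis_scores(features, training_indices):
    """Mahalanobis distance of every row of `features` to the training rows.

    Hypothetical helper for illustration; returns non-negative distances.
    """
    features = np.asarray(features, dtype=float)
    train = features[training_indices]
    mu = train.mean(axis=0)                # mean of training features
    cov = np.cov(train, rowvar=False)      # covariance of training features
    centered = features - mu
    # Solve cov @ y = (x - mu) instead of forming the inverse explicitly
    solved = np.linalg.solve(cov, centered.T).T
    d_squared = np.einsum("ij,ij->i", centered, solved)
    return np.sqrt(np.maximum(d_squared, 0.0))


# Synthetic features: 60 baseline rows followed by 20 shifted rows
rng = np.random.default_rng(1)
baseline = rng.normal(size=(60, 3))
shifted = rng.normal(loc=4.0, size=(20, 3))
features = np.vstack([baseline, shifted])

training_indices = list(range(0, 60, 2))   # every other baseline row
distances = mahalanobis_scores(features, training_indices)
print("Mean baseline distance:", distances[:60].mean())
print("Mean shifted distance: ", distances[60:].mean())
```

Solving the linear system avoids an explicit matrix inverse, which is usually the more numerically stable choice. The covariance estimate still needs more training instances than features to be invertible.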

In [None]:
# Training indices: every other instance from the first 91 (MATLAB 1:2:91)
# In 0-based Python indexing this is range(0, 91, 2), i.e. 0, 2, ..., 90
training_indices = list(range(0, 91, 2))

print(f"Training indices (first 10): {training_indices[:10]}")
print(f"Total training instances: {len(training_indices)}")

# Learn and score using Mahalanobis distance
scores = learn_score_mahalanobis(ar_parameters_fv, training_indices)

print(f"\nScores shape: {scores.shape}")
print(f"Score statistics:")
print(f"  Mean: {np.mean(scores):.4f}")
print(f"  Std:  {np.std(scores):.4f}")
print(f"  Min:  {np.min(scores):.4f}")
print(f"  Max:  {np.max(scores):.4f}")

## Step 5: Plot Scores

**Plot bar graph showing detection results**

Visualize the Mahalanobis distance scores as a bar graph. Higher scores typically indicate greater deviation from normal (training) conditions, suggesting potential damage.

Parameters:
- `flip_signs=True`: Negate the scores so that higher values indicate more anomalous instances
- `use_log_scores=False`: Keep the scores on a linear scale (the bar plot below applies no log transform)

In [None]:
# Plot scores as bar graph
flip_signs = True
use_log_scores = False

# Simple matplotlib bar plot
plt.figure(figsize=(12, 6))
x_indices = np.arange(len(scores))
scores_to_plot = -scores.flatten() if flip_signs else scores.flatten()

# Color based on damage states
colors = ['blue' if state == 0 else 'red' for state in damage_states.flatten()]
plt.bar(x_indices, scores_to_plot, color=colors, alpha=0.7)

plt.title('Mahalanobis Distance Scores', fontsize=14, fontweight='bold')
plt.xlabel('Test Instance')
plt.ylabel('Mahalanobis Distance')
plt.grid(True, alpha=0.3)

# Add legend
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor='blue', label='Undamaged'),
                  Patch(facecolor='red', label='Damaged')]
plt.legend(handles=legend_elements)

plt.tight_layout()
plt.show()

print("Bar graph showing detection results completed")

## Step 6: Plot Score Distributions

**Plot distribution of scores using KDE**

Use kernel density estimation to visualize the probability distributions of scores for undamaged vs. damaged conditions. This helps assess the separability of the two classes.

Parameters:
- `flip_signs=True`: Ensure higher scores indicate more anomalous behavior
- `use_log_scores=True`: Use logarithmic scale for better visualization
- `smoothing=0.1`: Bandwidth parameter for kernel density estimation
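
To make the role of the smoothing parameter concrete, here is a minimal Gaussian KDE written in plain NumPy. It is a sketch under assumptions: `gaussian_kde_1d` is a hypothetical helper, the scores are synthetic, and `bandwidth` is used directly as the kernel standard deviation, which may not be how `plot_score_distributions_shm` interprets `smoothing`.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on `grid`.

    Hypothetical helper for illustration; `bandwidth` is the kernel
    standard deviation in the units of `samples`.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One Gaussian bump per sample, averaged over all samples
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


# Synthetic positive scores for two classes, log-scaled before smoothing
rng = np.random.default_rng(2)
undamaged = np.log10(rng.uniform(1.0, 5.0, size=90))
damaged = np.log10(rng.uniform(8.0, 40.0, size=80))

grid = np.linspace(-1.0, 3.0, 801)
density_undamaged = gaussian_kde_1d(undamaged, grid, bandwidth=0.1)
density_damaged = gaussian_kde_1d(damaged, grid, bandwidth=0.1)

# Each density should integrate to roughly 1 over the grid
step = grid[1] - grid[0]
print("Area (undamaged):", density_undamaged.sum() * step)
print("Area (damaged):  ", density_damaged.sum() * step)
```

A smaller bandwidth follows individual samples more closely, while a larger one gives smoother curves that can hide overlap between the two classes.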

In [None]:
# Plot score distributions using KDE
flip_signs = True
use_log_scores = True
smoothing = 0.1

# Process scores
scores_processed = -scores.flatten() if flip_signs else scores.flatten()
log_applied = False
if use_log_scores and np.min(scores_processed) > 0:
    scores_processed = np.log10(scores_processed)
    log_applied = True
elif use_log_scores and np.max(scores_processed) < 0:
    # All-negative scores: take the log of the magnitude and keep the sign
    scores_processed = -np.log10(-scores_processed)
    log_applied = True
elif use_log_scores:
    print("Scores are not strictly one-signed; skipping log transform")

# Plot distributions with plot_score_distributions_shm
axes_handle = plot_score_distributions_shm(
    scores_processed,
    damage_states.flatten(),  # Convert from (N, 1) to (N,) shape
    state_names=['Undamaged', 'Damaged'],
    thresholds=None,
    flip_signs=False,  # Already processed
    use_log_scores=False,  # Already processed
    smoothing=smoothing,
    axes=None
)

plt.title('Score Distributions (Undamaged vs Damaged)', fontsize=14, fontweight='bold')
# Label the axis according to whether the log transform was actually applied
xlabel_text = 'Log Mahalanobis Distance' if log_applied else 'Mahalanobis Distance'
plt.xlabel(xlabel_text)
plt.ylabel('Probability Density')
plt.tight_layout()
plt.show()

print("Score distribution visualization completed")

## Step 7: Receiver Operating Characteristic

**Receiver operating characteristic (ROC) curve**

Calculate the ROC curve to quantify the detection performance. The ROC curve shows the trade-off between:

- **True Positive Rate (TPR)**: Sensitivity = TP/(TP+FN)
- **False Positive Rate (FPR)**: 1-Specificity = FP/(FP+TN)

A perfect detector would have TPR=1 and FPR=0 (upper-left corner). The Area Under the Curve (AUC) provides a single performance metric.
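
The TPR and FPR definitions above can be turned into a small threshold sweep. The following is a sketch with hypothetical names (`roc_curve_points`) and made-up scores. It assumes higher scores mean more anomalous, which may be the opposite of the convention `roc_shm` uses with `threshold_type="below"`.

```python
import numpy as np


def roc_curve_points(scores, labels):
    """Return (TPR, FPR) arrays from a sweep over all distinct thresholds.

    Hypothetical helper for illustration; an instance is flagged as
    damaged when its score is greater than or equal to the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    # Start above every score (nothing flagged), then lower the threshold
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr = np.array([np.mean(scores[labels] >= t) for t in thresholds])   # TP / (TP + FN)
    fpr = np.array([np.mean(scores[~labels] >= t) for t in thresholds])  # FP / (FP + TN)
    return tpr, fpr


# Made-up scores: four undamaged (label 0) and four damaged (label 1) instances
scores = np.array([0.2, 0.4, 0.5, 0.9, 0.6, 1.1, 1.4, 2.0])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

tpr, fpr = roc_curve_points(scores, labels)

# Area under the curve by the trapezoidal rule, written out explicitly
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
print("AUC:", auc)
```

Writing the trapezoidal sum by hand keeps the sketch independent of NumPy version differences in the integration helper. The sweep runs from the strictest threshold to the loosest, so FPR is non-decreasing and the area is non-negative.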

In [None]:
# Calculate ROC curve
TPR, FPR = roc_shm(
    scores.flatten(),           # Convert from (N, 1) to (N,) shape
    damage_states.flatten(),    # Convert from (N, 1) to (N,) shape
    num_pts=None,              # Use default number of points
    threshold_type="below"     # Use below threshold (default)
)

print(f"ROC curve calculated:")
print(f"TPR shape: {TPR.shape}")
print(f"FPR shape: {FPR.shape}")
print(f"Number of threshold points: {len(TPR)}")

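# Calculate AUC using the trapezoidal rule
# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever exists
trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz
# abs() keeps the area positive if FPR is returned in descending order
auc = abs(trapezoid(TPR, FPR))
print(f"Area Under Curve (AUC): {auc:.4f}")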

## Step 8: Plot Receiver Operating Characteristic Curve

**Plot receiver operating characteristic curve**

Visualize the ROC curve to assess detection performance. The curve shows how well the Mahalanobis distance can distinguish between undamaged and damaged conditions.

- **Diagonal line**: Random classifier performance (AUC = 0.5)
- **Upper-left corner**: Perfect classifier performance (AUC = 1.0)
- **AUC > 0.7**: Generally considered acceptable performance
- **AUC > 0.8**: Good performance
- **AUC > 0.9**: Excellent performance

In [None]:
# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(FPR, TPR, 'b-', linewidth=2, label=f'ROC Curve (AUC = {auc:.3f})')
plt.plot([0, 1], [0, 1], 'r--', linewidth=1, label='Random Classifier')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title(f'ROC Curve (AUC = {auc:.3f})', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

print(f"ROC curve visualization completed")
print(f"Final AUC: {auc:.4f}")

# Performance interpretation
if auc > 0.9:
    performance = "Excellent"
elif auc > 0.8:
    performance = "Good"
elif auc > 0.7:
    performance = "Acceptable"
else:
    performance = "Poor"
    
print(f"Detection performance: {performance} (AUC = {auc:.3f})")

# Print a summary with the computed AUC
print("\n=== LADPackage Outlier Detection Summary ===")
print(f"AUC Score: {auc:.3f} ({performance})")
print(f"Undamaged instances: {np.sum(damage_states == 0)}")
print(f"Damaged instances: {np.sum(damage_states == 1)}")
print(f"Training instances: {len(training_indices)}")

## Summary

This LADPackage outlier detection workflow successfully demonstrated:

1. **Data Import**: Loaded 3-story structure data with 170 instances (90 undamaged, 80 damaged)
2. **Feature Extraction**: Extracted AR model parameters (order 10) as damage-sensitive features
3. **Feature Analysis**: Visualized feature variations across different structural conditions
4. **Outlier Detection**: Applied Mahalanobis distance using training data from undamaged conditions
5. **Performance Evaluation**: Computed the ROC curve and its AUC; the AUC value and its qualitative rating are printed by the Step 8 code cell

### Key Insights

- **AR parameters** effectively capture structural dynamics changes due to damage
- **Mahalanobis distance** accounts for feature correlations, improving detection sensitivity
- **Training on undamaged data** provides a robust baseline for anomaly detection
- **ROC analysis** quantifies the trade-off between detection sensitivity and false alarms

### Applications

This methodology is applicable to:
- **Civil structures**: Buildings, bridges, offshore platforms
- **Mechanical systems**: Rotating machinery, aerospace components
- **Infrastructure monitoring**: Real-time health assessment systems

The workflow demonstrates the power of combining time series modeling (AR) with statistical outlier detection (Mahalanobis) for effective structural health monitoring.