# Direct Use of Nonparametric Outlier Detection

## Introduction

This example demonstrates nonparametric outlier detection using kernel density estimation (KDE). The algorithm learns a nonparametric probability density function from undamaged baseline data and identifies outliers as points with low probability density.

Data from the **3-story structure** dataset are used to extract AR model features, which are then analyzed using kernel density estimation for damage detection.

**Key Concepts:**
- **Kernel Density Estimation**: Nonparametric density estimation using various kernel functions
- **Bandwidth Selection**: Automatic methods for optimal smoothing parameter selection
- **Threshold Determination**: A normal distribution is fitted to the training likelihoods, and one of its lower quantiles serves as the detection threshold
- **Multiple Kernel Functions**: Comparison of different kernel shapes (Gaussian, Epanechnikov, etc.)
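
To make the idea of kernel density estimation concrete, the following is a minimal NumPy sketch of a density estimate built from an Epanechnikov kernel. It is an illustration only and not the SHMTools implementation: the helper names `epanechnikov` and `kde_score`, the use of a product kernel across dimensions, the single scalar bandwidth, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def epanechnikov(u):
    """Univariate Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde_score(x_query, x_train, bandwidth):
    """Product-kernel density estimate at each query row (hypothetical helper).

    x_query: (m, d) array, x_train: (n, d) array, bandwidth: scalar h > 0.
    """
    n, d = x_train.shape
    # Scaled differences between every query and training point: shape (m, n, d)
    u = (x_query[:, None, :] - x_train[None, :, :]) / bandwidth
    # Multiply the 1-D kernels across dimensions, then average over training points
    k = np.prod(epanechnikov(u), axis=2)
    return k.sum(axis=1) / (n * bandwidth**d)

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(200, 2))   # synthetic "undamaged" features
queries = np.array([[0.0, 0.0], [6.0, 6.0]])     # one typical point, one far away
density = kde_score(queries, baseline, bandwidth=1.0)
print(density)
```

The point near the center of the baseline cloud receives a positive density, while the distant point lies outside the support of every kernel and scores zero, which is what makes low density usable as an outlier indicator.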

**References:**

Figueiredo, E., Park, G., Figueiras, J., Farrar, C., & Worden, K. (2009). Structural Health Monitoring Algorithm Comparisons using Standard Data Sets. Los Alamos National Laboratory Report: LA-14393.

**SHMTools functions used:**
- `ar_model_shm`
- `learn_kernel_density_shm`
- `score_kernel_density_shm`
- `roc_shm`
- `epanechnikov_kernel_shm`

In [None]:
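import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Import shmtools (installed package)
from shmtools.utils.data_loading import load_3story_data
from shmtools.features.time_series import ar_model_shm
# Assumed import list, taken from the "SHMTools functions used" list above
from shmtools.classification.nonparametric import (
    learn_kernel_density_shm,
    score_kernel_density_shm,
    roc_shm,
    epanechnikov_kernel_shm,
)

# Set up plotting
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10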

## Load data

In [None]:
data = load_3story_data()
# Time series indexed as (time, channel, instance)
dataset = data['dataset']
# Damage-state label for each instance
states = data['damage_states']

In [None]:
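# Split each record into four 2048-point segments, giving 680 instances
time_data = np.zeros((2048, 5, 680))
time_data_states = np.zeros(680)
for i in range(4):
    start_idx = 2048 * i
    end_idx = 2048 * (i + 1)
    # Segment i of every record goes to instances i, i + 4, i + 8, ...
    time_data[:, :, i::4] = dataset[start_idx:end_idx, :, :]
    # Each segment inherits the damage state of its parent record
    time_data_states[i::4] = states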

In [None]:
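# Randomly select N of the 680 instances (fixed seed for reproducibility)
N = 400
np.random.seed(42)
idx = np.random.permutation(time_data.shape[2])[:N]
# Feature matrix: second return value of ar_model_shm, one row per instance
X_data = ar_model_shm(time_data[:, :, idx])[1]
X_states = time_data_states[idx]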

In [None]:
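# States 1-9 are the undamaged conditions
is_undamaged = np.isin(X_states, np.arange(1, 10))
X_undamaged = X_data[is_undamaged, :]
n_undamaged = X_undamaged.shape[0]
# Train on 80% of the undamaged instances
n_train = round(0.8 * n_undamaged)
X_train = X_undamaged[:n_train, :]
# Test on the held-out undamaged instances followed by all damaged instances
X_test = np.vstack([X_undamaged[n_train:, :], X_data[~is_undamaged, :]])
n_test = X_test.shape[0]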

In [None]:
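# Number of undamaged instances in the test set
n_test_0 = n_undamaged - n_train

In [None]:
# Ground-truth labels in test-set order: 0 = undamaged, 1 = damaged
test_labels = np.concatenate([np.zeros(n_test_0), np.ones(n_test - n_test_0)])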

## Train a model over the undamaged data

In [None]:
kernel_fun = epanechnikov_kernel_shm
H = None
bs_method = 2
d_model = learn_kernel_density_shm(X_train, H, kernel_fun, bs_method)
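
As an illustration of what automatic bandwidth selection can look like, the sketch below applies Scott's rule of thumb, which sets one bandwidth per feature in proportion to that feature's standard deviation. This is an assumption chosen for illustration: the source does not state which rule `bs_method = 2` selects, and `scott_bandwidth` is a hypothetical helper, not an SHMTools function.

```python
import numpy as np

def scott_bandwidth(x_train):
    """Scott's rule of thumb: h_j = sigma_j * n ** (-1 / (d + 4)) per feature j."""
    n, d = x_train.shape
    sigma = x_train.std(axis=0, ddof=1)   # sample standard deviation of each feature
    return sigma * n ** (-1.0 / (d + 4))

rng = np.random.default_rng(1)
# Synthetic features whose second column is five times more spread out
features = rng.normal(0.0, [1.0, 5.0], size=(500, 2))
h = scott_bandwidth(features)
print(h)
```

The wider feature receives a proportionally wider bandwidth, and both bandwidths shrink as the number of training points grows.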

## Pick a threshold from the training data

In [None]:
likelihoods = score_kernel_density_shm(X_train, d_model)

In [None]:
# Fit a normal distribution to the training likelihoods: model_p = (mean, std)
model_p = stats.norm.fit(likelihoods)

In [None]:
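# Threshold at the (1 - confidence) quantile of the fitted normal,
# so roughly 10% of undamaged scores are expected to fall below it
confidence = 0.9
threshold = stats.norm.ppf(1 - confidence, model_p[0], model_p[1])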
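
The threshold step can be tried in isolation on synthetic scores. The sketch below is a stand-in under stated assumptions: `train_scores` is randomly generated rather than produced by `score_kernel_density_shm`, and the standard-library `statistics.NormalDist` replaces `scipy.stats.norm` (the mean and the population standard deviation are the maximum-likelihood estimates that `stats.norm.fit` returns).

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
# Synthetic stand-in for the likelihoods of the training data
train_scores = rng.normal(5.0, 1.0, size=1000)

# Maximum-likelihood normal fit: sample mean and population standard deviation
mu = float(train_scores.mean())
sigma = float(train_scores.std())

confidence = 0.9
# Lower (1 - confidence) quantile of the fitted normal distribution
threshold = NormalDist(mu, sigma).inv_cdf(1 - confidence)

# Fraction of training scores that would be flagged at this threshold
flagged_fraction = float(np.mean(train_scores <= threshold))
print(threshold, flagged_fraction)
```

When the scores are close to normally distributed, the flagged fraction lands near `1 - confidence`. Real kernel density scores may deviate from normality, in which case the realized false-positive rate on undamaged data can differ from the nominal 10%.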

## Test the detector

In [None]:
scores = score_kernel_density_shm(X_test, d_model)

In [None]:
# Flag a test instance as damaged when its density is at or below the threshold
results = scores <= threshold

## Report the detector's performance

In [None]:
total_err = np.sum(results != test_labels) / n_test
false_positive_err = np.sum(results[:n_test_0] != 0) / n_test_0
false_negative_err = np.sum(results[n_test_0:] != 1) / (n_test - n_test_0)
print(f'\n Total error: {total_err:.2f}\n False Positive rate: {false_positive_err:.2f}\n False Negative rate: {false_negative_err:.2f}')

In [None]:
true_positives, false_positives = roc_shm(scores, test_labels)

In [None]:
plt.figure()
plt.plot(false_positives, true_positives)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.show()
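
For readers who want to see how an ROC curve arises from density scores, here is a minimal threshold-sweep sketch. It is not the `roc_shm` implementation: `roc_points` is a hypothetical helper, the scores are synthetic, and the convention that lower scores indicate damage is assumed to match the `scores <= threshold` rule used earlier.

```python
import numpy as np

def roc_points(scores, labels):
    """True/false positive rates for every threshold (hypothetical helper).

    A point is flagged as damaged when score <= threshold; labels are 0/1.
    """
    scores = np.asarray(scores, dtype=float)
    damaged = np.asarray(labels).astype(bool)
    thresholds = np.unique(scores)   # sorted ascending
    tpr = np.array([(scores[damaged] <= t).mean() for t in thresholds])
    fpr = np.array([(scores[~damaged] <= t).mean() for t in thresholds])
    # Prepend the (0, 0) corner, where nothing is flagged
    return np.concatenate([[0.0], tpr]), np.concatenate([[0.0], fpr])

rng = np.random.default_rng(3)
# Synthetic scores: undamaged points score high, damaged points score low
scores = np.concatenate([rng.normal(5.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
labels = np.concatenate([np.zeros(100), np.ones(100)])

tpr, fpr = roc_points(scores, labels)
# Area under the curve by the trapezoidal rule
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
print(auc)
```

An area close to 1 indicates that the scores separate the two classes well, while an area near 0.5 corresponds to a detector no better than chance.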