# Tier 6: Statistical Anomaly Detection

---

**Author:** Brandon Deloatch
**Affiliation:** Quipu Research Labs, LLC
**Date:** 2025-10-02
**Version:** v1.3
**License:** MIT
**Notebook ID:** 13497a73-018e-433b-8716-385139202e87

---

## Citation
Brandon Deloatch, "Tier 6: Statistical Anomaly Detection," Quipu Research Labs, LLC, v1.3, 2025-10-02.

Please cite this notebook if used or adapted in publications, presentations, or derivative work.

---

## Contributors / Acknowledgments
- **Primary Author:** Brandon Deloatch (Quipu Research Labs, LLC)
- **Institutional Support:** Quipu Research Labs, LLC - Advanced Analytics Division
- **Technical Framework:** Built on scikit-learn, pandas, numpy, and plotly ecosystems
- **Methodological Foundation:** Statistical learning principles and modern data science best practices

---

## Version History
| Version | Date | Notes |
|---------|------|-------|
| v1.3 | 2025-10-02 | Enhanced professional formatting, comprehensive documentation, interactive visualizations |
| v1.2 | 2024-09-15 | Updated analysis methods, improved data generation algorithms |
| v1.0 | 2024-06-10 | Initial release with core analytical framework |

---

## Environment Dependencies
- **Python:** 3.8+
- **Core Libraries:** pandas 2.0+, numpy 1.24+, scikit-learn 1.3+
- **Visualization:** plotly 5.0+, matplotlib 3.7+
- **Statistical:** scipy 1.10+, statsmodels 0.14+
- **Development:** jupyter-lab 4.0+, ipywidgets 8.0+

> **Reproducibility Note:** Use requirements.txt or environment.yml for exact dependency matching.

---

## Data Provenance
| Dataset | Source | License | Notes |
|---------|--------|---------|-------|
| Synthetic Data | Generated in-notebook | MIT | Custom algorithms for realistic simulation |
| Statistical Distributions | NumPy/SciPy | BSD-3-Clause | Standard library implementations |
| ML Algorithms | Scikit-learn | BSD-3-Clause | Industry-standard implementations |
| Visualization Schemas | Plotly | MIT | Interactive dashboard frameworks |

---

## Execution Provenance Logs
- **Created:** 2025-10-02
- **Notebook ID:** 13497a73-018e-433b-8716-385139202e87
- **Execution Environment:** Jupyter Lab / VS Code
- **Computational Requirements:** Standard laptop/workstation (2GB+ RAM recommended)
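
The fields listed above can also be recorded at run time. The sketch below is one possible way to do that using only the standard library; the helper name `capture_execution_metadata` and the dictionary keys are illustrative assumptions, not part of this notebook.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def capture_execution_metadata(packages):
    """Return a dict describing the current runtime (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


info = capture_execution_metadata(["pandas", "numpy", "scikit-learn", "plotly", "scipy"])
print(json.dumps(info, indent=2))
```

Storing this dictionary next to the notebook outputs makes it easier to tell which library versions produced a given set of results.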

> **Auto-tracking:** Execution metadata can be programmatically captured for reproducibility.

---

## Disclaimer & Responsible Use
This notebook is provided "as-is" for educational, research, and professional development purposes. Users assume full responsibility for any results, applications, or decisions derived from this analysis.

**Professional Standards:**
- Validate all results against domain expertise and additional data sources
- Respect licensing and attribution requirements for all dependencies
- Follow ethical guidelines for data analysis and algorithmic decision-making
- Credit all methodological sources and derivative frameworks appropriately

**Academic & Commercial Use:**
- Permitted under MIT license with proper attribution
- Suitable for educational curriculum and professional training
- Appropriate for commercial adaptation with citation requirements
- Recommended for reproducible research and transparent analytics

---



In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats
from scipy.spatial.distance import mahalanobis
from sklearn.covariance import EllipticEnvelope, MinCovDet
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import precision_score, recall_score, f1_score
import warnings
warnings.filterwarnings('ignore')

print("Tier 6: Statistical Anomaly Detection - Libraries Loaded!")
print("=" * 58)
print("Statistical Anomaly Detection Methods:")
print("• Z-score and Modified Z-score outlier detection")
print("• Interquartile Range (IQR) method")
print("• Multivariate Mahalanobis distance")
print("• Robust covariance estimation (Minimum Covariance Determinant)")
print("• Time series seasonal decomposition anomalies")
print("• Statistical process control (SPC) methods")

In [None]:
# Generate comprehensive anomaly detection datasets
np.random.seed(42)

# 1. Financial transactions dataset with fraud
n_transactions = 5000
n_fraudulent = 200

# Normal transactions
normal_transactions = []
for i in range(n_transactions - n_fraudulent):
    # Transaction patterns vary by time of day and day of week
    hour = np.random.randint(0, 24)
    day_of_week = np.random.randint(0, 7)

    # Weekday business hours have higher amounts
    if 9 <= hour <= 17 and day_of_week < 5:  # Weekday business hours
        amount_base = np.random.lognormal(6, 1)  # Median ~$400
        frequency_factor = 2.0
    elif 18 <= hour <= 22:  # Evening shopping
        amount_base = np.random.lognormal(5, 0.8)  # Median ~$150
        frequency_factor = 1.5
    else:  # All remaining hours, including late night and early morning
        amount_base = np.random.lognormal(4, 0.5)  # Median ~$55
        frequency_factor = 0.3

    normal_transactions.append({
        'transaction_id': f'TXN_{i:06d}',
        'amount': amount_base,
        'hour': hour,
        'day_of_week': day_of_week,
        'merchant_risk_score': np.random.beta(2, 8),  # Most merchants low risk
        'user_velocity': np.random.poisson(frequency_factor),
        'geographic_risk': np.random.exponential(0.1),
        'is_fraud': 0
    })

# Fraudulent transactions (anomalies)
fraudulent_transactions = []
for i in range(n_fraudulent):
    # Fraud patterns: unusual amounts, times, and risk scores
    hour = np.random.choice([2, 3, 4, 23], p=[0.3, 0.3, 0.2, 0.2])  # Suspicious hours

    # Fraudulent amounts are either very small (testing) or very large (theft)
    if np.random.random() < 0.3:  # Small test transactions
        amount = np.random.uniform(1, 10)
    else:  # Large fraudulent transactions
        amount = np.random.lognormal(8, 1.5)  # Median ~$3,000

    fraudulent_transactions.append({
        'transaction_id': f'FRAUD_{i:06d}',
        'amount': amount,
        'hour': hour,
        'day_of_week': np.random.randint(0, 7),
        'merchant_risk_score': np.random.beta(8, 2),  # High risk merchants
        'user_velocity': np.random.poisson(0.1),  # Very low velocity (mean 0.1)
        # NOTE: scale 0.05 gives a lower mean than the 0.1 used for normal
        # transactions; increase the scale if fraud should carry higher geographic risk
        'geographic_risk': np.random.exponential(0.05),
        'is_fraud': 1
    })

# Combine datasets
all_transactions = normal_transactions + fraudulent_transactions
fraud_df = pd.DataFrame(all_transactions).sample(frac=1).reset_index(drop=True)

print("Financial Fraud Dataset Created:")
print(f"Total transactions: {len(fraud_df)}")
print(f"Fraudulent transactions: {len(fraudulent_transactions)} ({len(fraudulent_transactions)/len(fraud_df)*100:.1f}%)")
print(f"Amount range: ${fraud_df['amount'].min():.2f} - ${fraud_df['amount'].max():.2f}")

# 2. Manufacturing quality control dataset
n_products = 3000
n_defective = 150

# Normal product measurements
quality_data = []

for i in range(n_products - n_defective):
    # Normal manufacturing process with controlled variation
    dimension_1 = np.random.normal(100, 2)  # Target: 100mm ±2mm
    dimension_2 = np.random.normal(50, 1)  # Target: 50mm ±1mm
    weight = np.random.normal(500, 10)  # Target: 500g ±10g
    surface_roughness = np.random.gamma(2, 0.5)  # Low roughness

    quality_data.append({
        'product_id': f'PROD_{i:06d}',
        'dimension_1': dimension_1,
        'dimension_2': dimension_2,
        'weight': weight,
        'surface_roughness': surface_roughness,
        'temperature': np.random.normal(25, 2),  # Room temperature
        'is_defective': 0
    })

# Defective products (anomalies)
for i in range(n_defective):
    # Defective products have out-of-spec measurements
    if np.random.random() < 0.4:  # Dimension defects
        dimension_1 = np.random.choice([np.random.normal(85, 3), np.random.normal(115, 3)])
        dimension_2 = np.random.normal(50, 1)
    elif np.random.random() < 0.3:  # Weight defects
        dimension_1 = np.random.normal(100, 2)
        dimension_2 = np.random.normal(50, 1)
    else:  # Multiple defects
        dimension_1 = np.random.normal(100, 5)
        dimension_2 = np.random.choice([np.random.normal(40, 2), np.random.normal(60, 2)])

    # Every defective product gets an off-target weight and a rough surface
    weight = np.random.choice([np.random.normal(450, 20), np.random.normal(550, 20)])
    surface_roughness = np.random.gamma(8, 1)  # High roughness

    quality_data.append({
        'product_id': f'DEFECT_{i:06d}',
        'dimension_1': dimension_1,
        'dimension_2': dimension_2,
        'weight': weight,
        'surface_roughness': surface_roughness,
        'temperature': np.random.normal(25, 2),
        'is_defective': 1
    })

quality_df = pd.DataFrame(quality_data).sample(frac=1).reset_index(drop=True)

print(f"\nManufacturing Quality Dataset Created:")
print(f"Total products: {len(quality_df)}")
print(f"Defective products: {n_defective} ({n_defective/len(quality_df)*100:.1f}%)")

# 3. Time series with seasonal anomalies
dates = pd.date_range('2023-01-01', periods=365, freq='D')
n_days = len(dates)

# Base seasonal pattern
day_of_year = np.arange(1, n_days + 1)
seasonal_component = 50 + 30 * np.sin(2 * np.pi * day_of_year / 365.25) # Annual cycle
weekly_component = 10 * np.sin(2 * np.pi * day_of_year / 7) # Weekly cycle
trend_component = 0.1 * day_of_year # Slight upward trend
noise = np.random.normal(0, 5, n_days)

base_values = seasonal_component + weekly_component + trend_component + noise

# Add anomalies
anomaly_indices = np.random.choice(n_days, size=20, replace=False)
anomaly_values = base_values.copy()
for idx in anomaly_indices:
    if np.random.random() < 0.5:
        anomaly_values[idx] += np.random.normal(50, 15)  # Positive anomaly
    else:
        anomaly_values[idx] -= np.random.normal(30, 10)  # Negative anomaly

timeseries_df = pd.DataFrame({
 'date': dates,
 'value': anomaly_values,
 'is_anomaly': [1 if i in anomaly_indices else 0 for i in range(n_days)]
})

print(f"\nTime Series Anomaly Dataset Created:")
print(f"Total days: {len(timeseries_df)}")
print(f"Anomalous days: {len(anomaly_indices)} ({len(anomaly_indices)/len(timeseries_df)*100:.1f}%)")
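
The transaction amounts above are drawn from log-normal distributions. For `lognormal(mu, sigma)` the median is `exp(mu)` and the mean is `exp(mu + sigma**2 / 2)`, so the two can differ considerably when `sigma` is large. The following sketch evaluates both for the four parameter pairs used in the generator; the segment labels are illustrative names only.

```python
import math

# (mu, sigma) pairs taken from the np.random.lognormal calls in the generator
segments = {
    "business_hours": (6, 1.0),
    "evening": (5, 0.8),
    "other_hours": (4, 0.5),
    "large_fraud": (8, 1.5),
}

summary = {}
for name, (mu, sigma) in segments.items():
    median = math.exp(mu)                 # 50th percentile of the distribution
    mean = math.exp(mu + sigma ** 2 / 2)  # expected value, pulled up by the right tail
    summary[name] = (median, mean)
    print(f"{name:<15} median=${median:>8,.0f}  mean=${mean:>8,.0f}")
```

The gap between median and mean is widest for the large fraudulent amounts, which is worth keeping in mind when reading the amount-based detection results.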

In [None]:
# 1. STATISTICAL ANOMALY DETECTION METHODS
print("1. STATISTICAL ANOMALY DETECTION METHODS")
print("=" * 44)

# Z-Score Method
def detect_zscore_anomalies(data, threshold=3):
    """Flag points whose absolute Z-score exceeds `threshold`.

    Returns the boolean mask and the absolute Z-scores.
    """
    z_scores = np.abs(stats.zscore(data))  # Uses the population std (ddof=0)
    return z_scores > threshold, z_scores

# Modified Z-Score Method (more robust)
def detect_modified_zscore_anomalies(data, threshold=3.5):
    """Flag points whose absolute modified Z-score exceeds `threshold`.

    Returns the boolean mask and the signed modified Z-scores.
    """
    median = np.median(data)
    mad = np.median(np.abs(data - median))  # Median Absolute Deviation
    # 0.6745 rescales the MAD so the score is comparable to a Z-score
    # for normally distributed data; this assumes mad > 0
    modified_z_scores = 0.6745 * (data - median) / mad
    return np.abs(modified_z_scores) > threshold, modified_z_scores

# IQR Method
def detect_iqr_anomalies(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]; also return the bounds."""
    Q1 = np.percentile(data, 25)
    Q3 = np.percentile(data, 75)
    IQR = Q3 - Q1
    lower_bound = Q1 - k * IQR
    upper_bound = Q3 + k * IQR
    return (data < lower_bound) | (data > upper_bound), (lower_bound, upper_bound)

# Apply methods to fraud detection (transaction amounts)
amounts = fraud_df['amount'].values

# Z-score anomalies
zscore_anomalies, z_scores = detect_zscore_anomalies(amounts)
fraud_df['zscore_anomaly'] = zscore_anomalies
fraud_df['z_score'] = z_scores

# Modified Z-score anomalies
mod_zscore_anomalies, mod_z_scores = detect_modified_zscore_anomalies(amounts)
fraud_df['mod_zscore_anomaly'] = mod_zscore_anomalies
fraud_df['mod_z_score'] = mod_z_scores

# IQR anomalies
iqr_anomalies, (iqr_lower, iqr_upper) = detect_iqr_anomalies(amounts)
fraud_df['iqr_anomaly'] = iqr_anomalies

print("Fraud Detection Results (Transaction Amounts):")
print(f"Z-score anomalies: {np.sum(zscore_anomalies)} ({np.sum(zscore_anomalies)/len(amounts)*100:.1f}%)")
print(f"Modified Z-score anomalies: {np.sum(mod_zscore_anomalies)} ({np.sum(mod_zscore_anomalies)/len(amounts)*100:.1f}%)")
print(f"IQR anomalies: {np.sum(iqr_anomalies)} ({np.sum(iqr_anomalies)/len(amounts)*100:.1f}%)")
print(f"IQR bounds: ${iqr_lower:.2f} - ${iqr_upper:.2f}")
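
The modified Z-score is often preferred because of masking: a large outlier inflates the mean and standard deviation that the plain Z-score relies on, while the median and MAD barely move. The toy example below (the eight values are made up for illustration) sketches the effect with the same formulas and thresholds as the functions above.

```python
import numpy as np
from scipy import stats

# Made-up sample: seven ordinary values and one obvious outlier
data = np.array([10, 11, 9, 10, 12, 10, 11, 100], dtype=float)

# Plain Z-score (population standard deviation, as in scipy.stats.zscore)
z = np.abs(stats.zscore(data))
z_flags = z > 3

# Modified Z-score based on the median and the median absolute deviation
median = np.median(data)
mad = np.median(np.abs(data - median))
modified_z = 0.6745 * (data - median) / mad
mod_flags = np.abs(modified_z) > 3.5

print("Z-score flags:         ", int(z_flags.sum()))
print("Modified Z-score flags:", int(mod_flags.sum()))
```

With only eight points, no single value can exceed |z| = sqrt(7) ≈ 2.65 under the population standard deviation, so the plain Z-score cannot reach the threshold of 3 here, whereas the modified score flags the value 100.

In [None]: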

# Multivariate anomaly detection using Mahalanobis distance
print(f"\nMultivariate Anomaly Detection:")

# Select features for multivariate analysis
fraud_features = ['amount', 'merchant_risk_score', 'user_velocity', 'geographic_risk']
X_fraud = fraud_df[fraud_features].values

# Standardize features
scaler = StandardScaler()
X_fraud_scaled = scaler.fit_transform(X_fraud)

# Mahalanobis distance with robust covariance estimation
robust_cov = MinCovDet().fit(X_fraud_scaled)
# MinCovDet.mahalanobis returns squared distances, so take the square root
mahal_distances = np.sqrt(robust_cov.mahalanobis(X_fraud_scaled))

# Empirical threshold: the 97.5th percentile flags the top 2.5% of rows by construction
mahal_threshold = np.percentile(mahal_distances, 97.5)
mahal_anomalies = mahal_distances > mahal_threshold

fraud_df['mahalanobis_distance'] = mahal_distances
fraud_df['mahal_anomaly'] = mahal_anomalies

print(f"Mahalanobis distance anomalies: {np.sum(mahal_anomalies)} ({np.sum(mahal_anomalies)/len(X_fraud)*100:.1f}%)")
print(f"Mahalanobis threshold: {mahal_threshold:.2f}")
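
A percentile threshold always flags a fixed share of the rows, whatever the data look like. A common alternative, sketched here under the assumption that inliers are roughly multivariate Gaussian, uses the fact that squared Mahalanobis distances of p-dimensional Gaussian data follow a chi-square distribution with p degrees of freedom. scikit-learn's `MinCovDet.mahalanobis` returns squared distances, so they can be compared with the chi-square quantile directly. The synthetic data and variable names below are illustrative.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)

# Illustrative data: 1,000 Gaussian inliers in 4 dimensions plus 20 shifted outliers
inliers = rng.normal(size=(1000, 4))
outliers = rng.normal(loc=8.0, size=(20, 4))
X = np.vstack([inliers, outliers])

mcd = MinCovDet(random_state=0).fit(X)
squared_distances = mcd.mahalanobis(X)  # squared Mahalanobis distances

# 97.5% quantile of the chi-square distribution with p = 4 degrees of freedom
chi2_cutoff = stats.chi2.ppf(0.975, df=X.shape[1])
flags = squared_distances > chi2_cutoff

print(f"Chi-square cutoff (squared distance): {chi2_cutoff:.3f}")
print(f"Flagged rows: {int(flags.sum())} of {len(X)}")
print(f"Outliers flagged: {int(flags[-20:].sum())} of 20")
```

Unlike the percentile rule, the number of flagged rows here depends on how far the data actually deviate from the fitted Gaussian shape.

In [None]: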

# Elliptic Envelope method
envelope = EllipticEnvelope(contamination=0.1, random_state=42)
envelope_predictions = envelope.fit_predict(X_fraud_scaled)
envelope_anomalies = envelope_predictions == -1

fraud_df['envelope_anomaly'] = envelope_anomalies

print(f"Elliptic Envelope anomalies: {np.sum(envelope_anomalies)} ({np.sum(envelope_anomalies)/len(X_fraud)*100:.1f}%)")

# Evaluate performance against ground truth
methods = {
 'Z-score': fraud_df['zscore_anomaly'],
 'Modified Z-score': fraud_df['mod_zscore_anomaly'],
 'IQR': fraud_df['iqr_anomaly'],
 'Mahalanobis': fraud_df['mahal_anomaly'],
 'Elliptic Envelope': fraud_df['envelope_anomaly']
}

print(f"\nPerformance Evaluation:")
print(f"{'Method':<20} {'Precision':>9} {'Recall':>9} {'F1-Score':>9}")
print("-" * 50)

performance_results = {}
for method_name, predictions in methods.items():
    precision = precision_score(fraud_df['is_fraud'], predictions, zero_division=0)
    recall = recall_score(fraud_df['is_fraud'], predictions, zero_division=0)
    f1 = f1_score(fraud_df['is_fraud'], predictions, zero_division=0)

    performance_results[method_name] = {
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

    print(f"{method_name:<20} {precision:>9.3f} {recall:>9.3f} {f1:>9.3f}")

# Find best performing method
best_method = max(performance_results.keys(),
 key=lambda x: performance_results[x]['f1'])
print(f"\nBest performing method: {best_method} (F1: {performance_results[best_method]['f1']:.3f})")
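
To make the evaluation table easier to interpret, the sketch below recomputes precision, recall, and F1 from raw counts on a tiny label vector and compares them with scikit-learn. The labels are invented for illustration only.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up ground truth (1 = fraud) and detector output
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # fraud correctly flagged
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # normal rows flagged by mistake
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # fraud that was missed

precision = tp / (tp + fp)  # share of flags that are real fraud
recall = tp / (tp + fn)     # share of fraud that was flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# The hand-computed values agree with scikit-learn's implementations
sk_values = (
    precision_score(y_true, y_pred),
    recall_score(y_true, y_pred),
    f1_score(y_true, y_pred),
)
print("Matches scikit-learn:", np.allclose((precision, recall, f1), sk_values))
```

A detector that flags very few rows can reach high precision while missing most fraud, which is why the table reports all three metrics rather than one.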

In [None]:
# 2. TIME SERIES ANOMALY DETECTION
print("2. TIME SERIES ANOMALY DETECTION")
print("=" * 35)

# Seasonal decomposition for time series anomalies
def seasonal_decompose_anomalies(data, window=30, threshold=2.5):
    """Detect anomalies from the residual of a simple additive decomposition.

    Assumes `data` is a Series with a default integer index (0, 1, 2, ...).
    """
    # Trend: centered moving average (NaN near both ends of the series)
    trend = data.rolling(window=window, center=True).mean()

    # Detrended data
    detrended = data - trend

    # Seasonal component (weekly pattern): mean of each position in the 7-day cycle
    seasonal_period = 7
    seasonal = detrended.groupby(detrended.index % seasonal_period).transform('mean')

    # Residual
    residual = detrended - seasonal

    # Flag residuals larger than `threshold` standard deviations;
    # NaN residuals at the edges compare as False and are never flagged
    residual_std = residual.std()
    anomalies = np.abs(residual) > threshold * residual_std

    return anomalies, trend, seasonal, residual

# Apply to time series data
ts_anomalies, ts_trend, ts_seasonal, ts_residual = seasonal_decompose_anomalies(
 timeseries_df['value'], window=30, threshold=2.5
)

timeseries_df['detected_anomaly'] = ts_anomalies
timeseries_df['trend'] = ts_trend
timeseries_df['seasonal'] = ts_seasonal
timeseries_df['residual'] = ts_residual

# Statistical process control (SPC) method
def spc_control_limits(data, window=30):
    """Calculate rolling 3-sigma SPC control limits and flag points outside them."""
    # Trailing window that includes the current point; the first
    # window - 1 positions are NaN and are therefore never flagged
    rolling_mean = data.rolling(window=window).mean()
    rolling_std = data.rolling(window=window).std()

    upper_control_limit = rolling_mean + 3 * rolling_std
    lower_control_limit = rolling_mean - 3 * rolling_std

    spc_anomalies = (data > upper_control_limit) | (data < lower_control_limit)

    return spc_anomalies, upper_control_limit, lower_control_limit

spc_anomalies, ucl, lcl = spc_control_limits(timeseries_df['value'])
timeseries_df['spc_anomaly'] = spc_anomalies
timeseries_df['ucl'] = ucl
timeseries_df['lcl'] = lcl

print("Time Series Anomaly Detection Results:")
print(f"Seasonal decomposition anomalies: {np.sum(ts_anomalies)} ({np.sum(ts_anomalies)/len(timeseries_df)*100:.1f}%)")
print(f"SPC anomalies: {np.sum(spc_anomalies)} ({np.sum(spc_anomalies)/len(timeseries_df)*100:.1f}%)")

# Evaluate time series methods
ts_precision_seasonal = precision_score(timeseries_df['is_anomaly'], timeseries_df['detected_anomaly'], zero_division=0)
ts_recall_seasonal = recall_score(timeseries_df['is_anomaly'], timeseries_df['detected_anomaly'], zero_division=0)
ts_f1_seasonal = f1_score(timeseries_df['is_anomaly'], timeseries_df['detected_anomaly'], zero_division=0)

ts_precision_spc = precision_score(timeseries_df['is_anomaly'], timeseries_df['spc_anomaly'], zero_division=0)
ts_recall_spc = recall_score(timeseries_df['is_anomaly'], timeseries_df['spc_anomaly'], zero_division=0)
ts_f1_spc = f1_score(timeseries_df['is_anomaly'], timeseries_df['spc_anomaly'], zero_division=0)

print(f"\nTime Series Performance Evaluation:")
print("Method Precision Recall F1-Score")
print("-" * 55)
print(f"Seasonal Decomposition {ts_precision_seasonal:.3f} {ts_recall_seasonal:.3f} {ts_f1_seasonal:.3f}")
print(f"SPC Control Limits {ts_precision_spc:.3f} {ts_recall_spc:.3f} {ts_f1_spc:.3f}")
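
One caveat with rolling control limits: when the window includes the point being tested, a spike inflates its own limits. A variant, sketched here with a hypothetical helper `spc_limits_lagged`, computes the limits from the preceding `window` observations only by shifting the rolling statistics one step. It is offered as an alternative to compare against, not as a description of `spc_control_limits` above.

```python
import numpy as np
import pandas as pd


def spc_limits_lagged(data, window=30, n_sigma=3):
    """Flag points outside mean ± n_sigma * std of the previous `window` values."""
    # shift(1) makes the statistics at position t depend only on t-window .. t-1
    prior_mean = data.rolling(window=window).mean().shift(1)
    prior_std = data.rolling(window=window).std().shift(1)
    upper = prior_mean + n_sigma * prior_std
    lower = prior_mean - n_sigma * prior_std
    return (data > upper) | (data < lower), upper, lower


# Illustrative series: stable noise around 100 with a single spike at position 60
rng = np.random.default_rng(1)
series = pd.Series(100 + rng.normal(0, 1, 100))
series.iloc[60] += 15

flags, upper, lower = spc_limits_lagged(series, window=30)
print("Spike flagged:", bool(flags.iloc[60]))
print("Total flagged:", int(flags.sum()))
```

The first `window` positions have no complete history, so their limits are NaN and they are never flagged.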

In [None]:
# 3. INTERACTIVE STATISTICAL ANOMALY VISUALIZATIONS
print("3. INTERACTIVE STATISTICAL ANOMALY VISUALIZATIONS")
print("=" * 51)

# Create comprehensive statistical anomaly detection dashboard
fig = make_subplots(
    rows=3, cols=2,
    subplot_titles=[
        'Fraud Detection: Transaction Amounts with Statistical Methods',
        'Multivariate Anomaly Detection (Mahalanobis Distance)',
        'Method Performance Comparison',
        'Time Series with Seasonal Decomposition',
        'Statistical Process Control (SPC) Chart',
        'Manufacturing Quality Control (Multivariate)'
    ],
    specs=[
        [{"secondary_y": False}, {"secondary_y": False}],
        [{"secondary_y": False}, {"secondary_y": False}],
        [{"secondary_y": False}, {"secondary_y": False}]
    ]
)

# 1. Transaction amounts with statistical thresholds
# Normal transactions
normal_transactions = fraud_df[fraud_df['is_fraud'] == 0]
fraud_transactions = fraud_df[fraud_df['is_fraud'] == 1]

fig.add_trace(
    go.Scatter(
        x=normal_transactions.index,
        y=normal_transactions['amount'],
        mode='markers',
        name='Normal Transactions',
        marker=dict(color='blue', size=4, opacity=0.6),
        hovertemplate='Amount: $%{y:.2f}<br>Z-score: %{customdata:.2f}<extra></extra>',
        customdata=normal_transactions['z_score']
    ),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(
        x=fraud_transactions.index,
        y=fraud_transactions['amount'],
        mode='markers',
        name='Fraudulent Transactions',
        marker=dict(color='red', size=6, opacity=0.8),
        hovertemplate='FRAUD: $%{y:.2f}<br>Z-score: %{customdata:.2f}<extra></extra>',
        customdata=fraud_transactions['z_score']
    ),
    row=1, col=1
)

# Add IQR bounds
fig.add_hline(y=iqr_upper, line=dict(color='orange', dash='dash'),
              annotation_text=f"IQR Upper: ${iqr_upper:.0f}", row=1, col=1)
fig.add_hline(y=iqr_lower, line=dict(color='orange', dash='dash'),
              annotation_text=f"IQR Lower: ${iqr_lower:.0f}", row=1, col=1)

# 2. Mahalanobis distance visualization
colors_mahal = ['red' if anomaly else 'blue' for anomaly in fraud_df['mahal_anomaly']]
sizes_mahal = [8 if fraud else 4 for fraud in fraud_df['is_fraud']]

fig.add_trace(
    go.Scatter(
        x=fraud_df['amount'],
        y=fraud_df['mahalanobis_distance'],
        mode='markers',
        name='Mahalanobis Distance',
        marker=dict(color=colors_mahal, size=sizes_mahal, opacity=0.7),
        text=[f"{'FRAUD' if fraud else 'Normal'}" for fraud in fraud_df['is_fraud']],
        hovertemplate='%{text}<br>Amount: $%{x:.2f}<br>Mahal Distance: %{y:.2f}<extra></extra>'
    ),
    row=1, col=2
)

fig.add_hline(y=mahal_threshold, line=dict(color='green', dash='dash'),
              annotation_text=f"Threshold: {mahal_threshold:.2f}", row=1, col=2)

# 3. Performance comparison bar chart
methods_list = list(performance_results.keys())
f1_scores = [performance_results[method]['f1'] for method in methods_list]
precisions = [performance_results[method]['precision'] for method in methods_list]
recalls = [performance_results[method]['recall'] for method in methods_list]

fig.add_trace(
    go.Bar(x=methods_list, y=f1_scores, name='F1-Score',
           marker_color='green', opacity=0.7),
    row=2, col=1
)

fig.add_trace(
    go.Bar(x=methods_list, y=precisions, name='Precision',
           marker_color='blue', opacity=0.7),
    row=2, col=1
)

fig.add_trace(
    go.Bar(x=methods_list, y=recalls, name='Recall',
           marker_color='orange', opacity=0.7),
    row=2, col=1
)

# 4. Time series with seasonal decomposition
fig.add_trace(
    go.Scatter(
        x=timeseries_df['date'],
        y=timeseries_df['value'],
        mode='lines+markers',
        name='Time Series',
        line=dict(color='blue', width=2),
        marker=dict(size=4)
    ),
    row=2, col=2
)

# Highlight anomalies
anomaly_dates = timeseries_df[timeseries_df['is_anomaly'] == 1]
fig.add_trace(
    go.Scatter(
        x=anomaly_dates['date'],
        y=anomaly_dates['value'],
        mode='markers',
        name='True Anomalies',
        marker=dict(color='red', size=10, symbol='x'),
        hovertemplate='True Anomaly<br>Date: %{x}<br>Value: %{y:.1f}<extra></extra>'
    ),
    row=2, col=2
)

detected_anomalies = timeseries_df[timeseries_df['detected_anomaly'] == True]
fig.add_trace(
    go.Scatter(
        x=detected_anomalies['date'],
        y=detected_anomalies['value'],
        mode='markers',
        name='Detected Anomalies',
        marker=dict(color='orange', size=8, symbol='circle-open'),
        hovertemplate='Detected Anomaly<br>Date: %{x}<br>Value: %{y:.1f}<extra></extra>'
    ),
    row=2, col=2
)

# 5. SPC Control Chart
fig.add_trace(
    go.Scatter(
        x=timeseries_df['date'],
        y=timeseries_df['value'],
        mode='lines+markers',
        name='Process Values',
        line=dict(color='blue', width=2),
        marker=dict(size=3)
    ),
    row=3, col=1
)

fig.add_trace(
    go.Scatter(
        x=timeseries_df['date'],
        y=timeseries_df['ucl'],
        mode='lines',
        name='Upper Control Limit',
        line=dict(color='red', dash='dash', width=2)
    ),
    row=3, col=1
)

fig.add_trace(
    go.Scatter(
        x=timeseries_df['date'],
        y=timeseries_df['lcl'],
        mode='lines',
        name='Lower Control Limit',
        line=dict(color='red', dash='dash', width=2)
    ),
    row=3, col=1
)

# SPC anomalies
spc_anomaly_data = timeseries_df[timeseries_df['spc_anomaly'] == True]
fig.add_trace(
    go.Scatter(
        x=spc_anomaly_data['date'],
        y=spc_anomaly_data['value'],
        mode='markers',
        name='SPC Anomalies',
        marker=dict(color='red', size=10, symbol='diamond'),
        hovertemplate='SPC Anomaly<br>Date: %{x}<br>Value: %{y:.1f}<extra></extra>'
    ),
    row=3, col=1
)

# 6. Manufacturing quality control (2D projection)
normal_products = quality_df[quality_df['is_defective'] == 0]
defective_products = quality_df[quality_df['is_defective'] == 1]

fig.add_trace(
    go.Scatter(
        x=normal_products['dimension_1'],
        y=normal_products['weight'],
        mode='markers',
        name='Normal Products',
        marker=dict(color='green', size=4, opacity=0.6),
        hovertemplate='Normal<br>Dimension 1: %{x:.1f}mm<br>Weight: %{y:.1f}g<extra></extra>'
    ),
    row=3, col=2
)

fig.add_trace(
    go.Scatter(
        x=defective_products['dimension_1'],
        y=defective_products['weight'],
        mode='markers',
        name='Defective Products',
        marker=dict(color='red', size=6, opacity=0.8),
        hovertemplate='DEFECTIVE<br>Dimension 1: %{x:.1f}mm<br>Weight: %{y:.1f}g<extra></extra>'
    ),
    row=3, col=2
)

# Update layout
fig.update_layout(
    height=1200,
    title="Statistical Anomaly Detection Methods Dashboard",
    showlegend=True
)

# Update axis labels
fig.update_xaxes(title_text="Transaction Index", row=1, col=1)
fig.update_xaxes(title_text="Transaction Amount ($)", row=1, col=2)
fig.update_xaxes(title_text="Detection Method", row=2, col=1)
fig.update_xaxes(title_text="Date", row=2, col=2)
fig.update_xaxes(title_text="Date", row=3, col=1)
fig.update_xaxes(title_text="Dimension 1 (mm)", row=3, col=2)

fig.update_yaxes(title_text="Transaction Amount ($)", row=1, col=1)
fig.update_yaxes(title_text="Mahalanobis Distance", row=1, col=2)
fig.update_yaxes(title_text="Score", row=2, col=1)
fig.update_yaxes(title_text="Value", row=2, col=2)
fig.update_yaxes(title_text="Process Value", row=3, col=1)
fig.update_yaxes(title_text="Weight (g)", row=3, col=2)

fig.show()

# Business insights and applications
print(f"\nSTATISTICAL ANOMALY DETECTION INSIGHTS:")

# Financial fraud insights
total_transaction_value = fraud_df['amount'].sum()
fraud_transaction_value = fraud_df[fraud_df['is_fraud'] == 1]['amount'].sum()
fraud_percentage = fraud_transaction_value / total_transaction_value

print(f"\nFinancial Fraud Analysis:")
print(f"• Total transaction value: ${total_transaction_value:,.0f}")
print(f"• Fraudulent transaction value: ${fraud_transaction_value:,.0f}")
print(f"• Fraud value percentage: {fraud_percentage*100:.2f}%")
print(f"• Best detection method: {best_method} (F1: {performance_results[best_method]['f1']:.3f})")

# Manufacturing quality insights
defect_rate = quality_df['is_defective'].mean()
total_products_value = len(quality_df) * 100 # $100 per product
defect_cost = quality_df['is_defective'].sum() * 500 # $500 cost per defect

print(f"\nManufacturing Quality Analysis:")
print(f"• Overall defect rate: {defect_rate*100:.2f}%")
print(f"• Total production value: ${total_products_value:,.0f}")
print(f"• Defect cost impact: ${defect_cost:,.0f}")

# ROI calculation
fraud_prevention_value = fraud_transaction_value * 0.80 # Prevent 80% of fraud
quality_improvement_value = defect_cost * 0.70 # Prevent 70% of defects
monitoring_system_cost = 250_000 # Implementation cost
annual_operational_cost = 50_000 # Annual monitoring cost

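total_benefits = fraud_prevention_value + quality_improvement_value
net_annual_benefits = total_benefits - annual_operational_cost
# First-year ROI: net annual benefit minus the one-time implementation cost,
# expressed relative to that implementation cost
roi = (net_annual_benefits - monitoring_system_cost) / monitoring_system_cost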

print(f"\nSTATISTICAL ANOMALY DETECTION ROI:")
print(f"• Fraud prevention value: ${fraud_prevention_value:,.0f}")
print(f"• Quality improvement value: ${quality_improvement_value:,.0f}")
print(f"• Total annual benefits: ${total_benefits:,.0f}")
print(f"• Implementation cost: ${monitoring_system_cost:,.0f}")
print(f"• Annual operational cost: ${annual_operational_cost:,.0f}")
print(f"• Net annual benefits: ${net_annual_benefits:,.0f}")
print(f"• ROI: {roi*100:.0f}%")
print(f"• Payback period: {monitoring_system_cost/net_annual_benefits*12:.1f} months")

print(f"\nCross-Reference Learning Path:")
print(f"• Foundation: Tier3_Statistics.ipynb (statistical concepts)")
print(f"• Comparison: Tier6_IsolationForest.ipynb (tree-based anomaly detection)")
print(f"• Advanced: Tier6_OneClassSVM.ipynb (boundary-based methods)")
print(f"• Specialized: Advanced_TimeSeriesAnomaly.ipynb (time series focus)")
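
The ROI and payback figures printed above follow from a few lines of arithmetic. A minimal sketch with placeholder inputs is shown below; the dollar amounts passed in are invented for illustration and are not this notebook's results, and `first_year_roi` is a hypothetical helper.

```python
def first_year_roi(total_benefits, annual_operating_cost, implementation_cost):
    """Return (net annual benefit, first-year ROI, payback period in months)."""
    net_annual = total_benefits - annual_operating_cost
    roi = (net_annual - implementation_cost) / implementation_cost
    # Payback is only meaningful when the system produces a positive net benefit
    if net_annual > 0:
        payback_months = implementation_cost / net_annual * 12
    else:
        payback_months = float("inf")
    return net_annual, roi, payback_months


# Placeholder inputs, chosen only to make the arithmetic easy to follow
net, roi, payback = first_year_roi(
    total_benefits=550_000,
    annual_operating_cost=50_000,
    implementation_cost=250_000,
)
print(f"Net annual benefit: ${net:,.0f}")
print(f"First-year ROI: {roi * 100:.0f}%")
print(f"Payback period: {payback:.1f} months")
```

Guarding the payback calculation avoids a division by zero or a negative payback period when the net annual benefit is not positive.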