# 04 – Anomaly Detection & Quality Signal Generation

This notebook loads the weekly device-level time series and applies:
- Z-score anomaly detection (statistical spike detection)
- Isolation Forest (machine-learning-based anomaly detection)
- A combined `strong_signal` flag
- Export of a Tableau-ready Quality Signals dataset

These outputs will power the final Tableau dashboard.


In [1]:
# Load the weekly dataset

import pandas as pd
from pathlib import Path

weekly_path = Path("/Volumes/Personal Drive/GitHub/Proactive-Device-Quality-Signal-Detection/Dataset/weekly_complaint_timeseries.csv")

weekly = pd.read_csv(weekly_path, parse_dates=["week_start"])
weekly.head()


|   | device_brand | device_model | week_start | total_reviews | total_complaints | complaint_rate | roll_mean | roll_std |
|---|---|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 13 | 2022-10-24 | 17 | 2 | 0.117647 | NaN | NaN |
| 1 | Apple | iPhone 13 | 2022-10-31 | 11 | 3 | 0.272727 | NaN | NaN |
| 2 | Apple | iPhone 13 | 2022-11-07 | 16 | 5 | 0.3125 | 0.234291 | 0.102956 |
| 3 | Apple | iPhone 13 | 2022-11-14 | 9 | 5 | 0.555556 | 0.314607 | 0.181299 |
| 4 | Apple | iPhone 13 | 2022-11-21 | 7 | 3 | 0.428571 | 0.3374 | 0.165074 |
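The displayed rows hint at how `roll_mean` and `roll_std` were built upstream: row 2's `roll_mean` (0.234291) is the mean of the first three `complaint_rate` values, and row 7's (0.416501, shown in the z-score output) is the mean of rows 2 through 7. That fits a 6-week rolling window with a 3-week minimum that *includes the current week*. The sketch below recomputes both columns under that assumption so it can be checked against the loaded file. The window sizes, the per-device grouping, and the helper name `recompute_rolling` are inferred or hypothetical, not taken from the earlier notebook.

```python
import pandas as pd

def recompute_rolling(df: pd.DataFrame, window: int = 6, min_periods: int = 3) -> pd.DataFrame:
    """Recompute rolling mean/std of complaint_rate per device, current week included.

    window=6 and min_periods=3 are assumptions inferred from the displayed rows.
    """
    keys = ["device_brand", "device_model"]
    out = df.sort_values(keys + ["week_start"]).copy()
    rate = out.groupby(keys)["complaint_rate"]
    out["roll_mean_check"] = rate.transform(
        lambda s: s.rolling(window, min_periods=min_periods).mean()
    )
    # pandas' rolling std is the sample std (ddof=1)
    out["roll_std_check"] = rate.transform(
        lambda s: s.rolling(window, min_periods=min_periods).std()
    )
    return out

# The ten iPhone 13 complaint rates shown in this notebook's z-score output
rates = [0.117647, 0.272727, 0.3125, 0.555556, 0.428571,
         0.25, 0.285714, 0.666667, 0.153846, 0.545455]
demo = pd.DataFrame({
    "device_brand": "Apple",
    "device_model": "iPhone 13",
    "week_start": pd.date_range("2022-10-24", periods=10, freq="7D"),
    "complaint_rate": rates,
})
result = recompute_rolling(demo)
print(result[["week_start", "roll_mean_check", "roll_std_check"]].round(6))
```

On the notebook's own data, comparing `recompute_rolling(weekly)["roll_mean_check"]` with `weekly["roll_mean"]` would confirm or reject the assumed window.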


## Z-Score Anomaly Detection

We compute:

\[
z = \frac{\text{complaint\_rate} - \text{roll\_mean}}{\text{roll\_std}}
\]

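A week is considered anomalous when:
- The spike is statistically large (`z_score > 2.0`); only upward deviations are flagged
- A rolling baseline exists: weeks where `roll_mean`/`roll_std` are missing (NaN), such as the first two iPhone 13 weeks, get `z_score = 0` and are never flagged

Z-score anomalies are simple and highly interpretable: a `z_score` of 2.5 means the week's complaint rate sits 2.5 rolling standard deviations above its rolling mean.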
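One property of this formula is worth checking before tuning the threshold. When the rolling window includes the current week, the z-score has a hard ceiling: for a window of n values and the sample standard deviation (ddof=1), no member can sit more than (n−1)/√n standard deviations above the window mean. The snippet computes that ceiling for a few window sizes and reproduces it with the most extreme case. The window contents are illustrative values, and whether the loaded `roll_mean`/`roll_std` include the current week is an assumption to verify.

```python
import numpy as np

def max_inclusive_z(n: int) -> float:
    """Largest z a value can reach against the mean and sample std of an n-value window containing it."""
    return (n - 1) / np.sqrt(n)

for n in (3, 4, 5, 6):
    print(n, round(max_inclusive_z(n), 4))

# The ceiling is reached when n-1 weeks are identical and a single week spikes
window = np.array([0.2, 0.2, 0.2, 0.2, 0.2, 0.9])
z_last = (window[-1] - window.mean()) / window.std(ddof=1)
print(round(z_last, 4))
```

For n = 6 the ceiling is about 2.04, so under an inclusive 6-week window `z_score > 2.0` can fire only when the other five weeks are nearly flat.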


In [3]:
import numpy as np

# Compute Z-score
weekly["z_score"] = (
    (weekly["complaint_rate"] - weekly["roll_mean"]) / weekly["roll_std"]
)

# Weeks without a rolling baseline give NaN, and a zero roll_std would give ±inf;
# treat both as non-anomalous (z = 0)
weekly["z_score"] = weekly["z_score"].replace([np.inf, -np.inf], 0).fillna(0)

# Flag upward spikes only (threshold can be tuned)
weekly["z_anomaly"] = weekly["z_score"] > 2.0

weekly[["device_brand", "device_model", "week_start", "complaint_rate", "roll_mean", "roll_std", "z_score", "z_anomaly"]].head(10)


|   | device_brand | device_model | week_start | complaint_rate | roll_mean | roll_std | z_score | z_anomaly |
|---|---|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 13 | 2022-10-24 | 0.117647 | NaN | NaN | 0.0 | False |
| 1 | Apple | iPhone 13 | 2022-10-31 | 0.272727 | NaN | NaN | 0.0 | False |
| 2 | Apple | iPhone 13 | 2022-11-07 | 0.3125 | 0.234291 | 0.102956 | 0.759632 | False |
| 3 | Apple | iPhone 13 | 2022-11-14 | 0.555556 | 0.314607 | 0.181299 | 1.329011 | False |
| 4 | Apple | iPhone 13 | 2022-11-21 | 0.428571 | 0.3374 | 0.165074 | 0.552304 | False |
| 5 | Apple | iPhone 13 | 2022-11-28 | 0.25 | 0.322834 | 0.151897 | -0.479493 | False |
| 6 | Apple | iPhone 13 | 2022-12-05 | 0.285714 | 0.350845 | 0.118264 | -0.55072 | False |
| 7 | Apple | iPhone 13 | 2022-12-12 | 0.666667 | 0.416501 | 0.165957 | 1.50741 | False |
| 8 | Apple | iPhone 13 | 2022-12-19 | 0.153846 | 0.390059 | 0.195798 | -1.206409 | False |
| 9 | Apple | iPhone 13 | 2022-12-26 | 0.545455 | 0.388376 | 0.194127 | 0.809156 | False |
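None of the first ten iPhone 13 weeks is flagged; even 2022-12-12, the highest rate shown (0.667), only reaches z ≈ 1.51. Part of the reason appears to be that the loaded rolling stats include the current week (row 2's `roll_mean` of 0.234291 is the mean of rows 0 to 2), so a spike inflates its own baseline. A common alternative is a trailing baseline built only from *previous* weeks, via `shift(1)` inside each device's series. This is a sketch of that swapped-in approach, not the notebook's method: the 6-week window and 3-week minimum are inferred from the displayed rows, and `trailing_zscore` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def trailing_zscore(df: pd.DataFrame, window: int = 6, min_periods: int = 3) -> pd.Series:
    """Z-score of each week against the previous `window` weeks of the same device.

    shift(1) drops the current week from its own baseline. window=6 and
    min_periods=3 are assumptions inferred from the displayed rows.
    """
    keys = ["device_brand", "device_model"]
    df = df.sort_values(keys + ["week_start"])
    rate = df.groupby(keys)["complaint_rate"]
    base_mean = rate.transform(lambda s: s.shift(1).rolling(window, min_periods=min_periods).mean())
    base_std = rate.transform(lambda s: s.shift(1).rolling(window, min_periods=min_periods).std())
    z = (df["complaint_rate"] - base_mean) / base_std
    # No baseline yet (NaN) or a flat baseline (NaN/inf): treat as non-anomalous, as the notebook does
    return z.replace([np.inf, -np.inf], np.nan).fillna(0.0)

# The ten iPhone 13 complaint rates from the table above
rates = [0.117647, 0.272727, 0.3125, 0.555556, 0.428571,
         0.25, 0.285714, 0.666667, 0.153846, 0.545455]
demo = pd.DataFrame({
    "device_brand": "Apple",
    "device_model": "iPhone 13",
    "week_start": pd.date_range("2022-10-24", periods=10, freq="7D"),
    "complaint_rate": rates,
})
demo["z_trailing"] = trailing_zscore(demo)
demo["z_anomaly_trailing"] = demo["z_trailing"] > 2.0
print(demo[["week_start", "complaint_rate", "z_trailing", "z_anomaly_trailing"]].round(3))
```

On these ten weeks the trailing version flags 2022-11-14 (z ≈ 3.12) and 2022-12-12 (z ≈ 2.67), both of which stay below 2.0 with the inclusive baseline. Whether that is the desired sensitivity is a judgment call for the threshold, not something this output settles.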


## Isolation Forest Anomaly Detection

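Isolation Forest is an unsupervised algorithm that builds many random trees, each splitting the data on a randomly chosen feature at a random threshold. Unusual observations are separated from the rest in fewer splits, so a short average path length across the trees marks an anomaly. Because it makes no distributional assumption and looks at all features jointly, it can catch unusual *combinations* (for example, many complaints from very few reviews) that a single-metric z-score misses.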
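For reference, the standard Isolation Forest anomaly score turns the average path length into a value between 0 and 1. Here $h(x)$ is the number of splits needed to isolate $x$ in one tree, $E[h(x)]$ its average over all trees, $n$ the number of rows each tree is built on (scikit-learn's `max_samples`, which defaults to min(256, number of rows)), and $c(n)$ the average path length used for normalization:

\[
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772
\]

A score close to 1 means $x$ is isolated in far fewer splits than a typical point; scores well below 0.5 indicate normal points. In scikit-learn, `score_samples` returns the negative of this score, and `fit_predict` labels as `-1` the rows that fall past the cut-off implied by `contamination`.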

We feed it multiple signals:

- complaint_rate  
- total_complaints  
- total_reviews  
- roll_mean  
- roll_std  

The model marks each point as:
- `1` → normal  
- `-1` → anomaly  
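A minimal illustration of these labels, on synthetic data rather than the complaint dataset: 199 points from a standard normal plus one planted point far from the rest. With `contamination=0.05`, scikit-learn places the decision threshold so that roughly 5% of the training rows are labeled `-1`, whether or not that many true anomalies exist. The data and variable names here are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(199, 2))
planted = np.array([[8.0, 8.0]])           # one obvious outlier
X_demo = np.vstack([normal, planted])

iso_demo = IsolationForest(contamination=0.05, random_state=42)
labels = iso_demo.fit_predict(X_demo)       # 1 = normal, -1 = anomaly
scores = iso_demo.score_samples(X_demo)     # lower = more anomalous

print("labels used:", sorted(set(labels.tolist())))
print("share flagged:", (labels == -1).mean())
print("planted point label:", labels[-1])
```

The planted point is flagged, but so are about nine ordinary points: `contamination` fixes the *share* of `-1` labels, so it behaves like a budget rather than a detection criterion.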


In [4]:
from sklearn.ensemble import IsolationForest

# Features used by Isolation Forest
features = ["complaint_rate", "total_complaints", "total_reviews", "roll_mean", "roll_std"]

# Weeks without a rolling baseline have NaN roll_mean/roll_std; they are filled with 0 here
X = weekly[features].fillna(0)

iso = IsolationForest(
    contamination=0.05,   # label the ~5% most isolated rows as anomalies (-1)
    random_state=42       # reproducible trees
)

weekly["iso_pred"] = iso.fit_predict(X)
weekly["iso_anomaly"] = weekly["iso_pred"] == -1

weekly[["complaint_rate", "total_complaints", "iso_pred", "iso_anomaly"]].head(10)




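|   | complaint_rate | total_complaints | iso_pred | iso_anomaly |
|---|---|---|---|---|
| 0 | 0.117647 | 2 | -1 | True |
| 1 | 0.272727 | 3 | -1 | True |
| 2 | 0.3125 | 5 | 1 | False |
| 3 | 0.555556 | 5 | 1 | False |
| 4 | 0.428571 | 3 | 1 | False |
| 5 | 0.25 | 3 | 1 | False |
| 6 | 0.285714 | 2 | 1 | False |
| 7 | 0.666667 | 8 | 1 | False |
| 8 | 0.153846 | 2 | 1 | False |
| 9 | 0.545455 | 6 | 1 | False |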
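Rows 0 and 1 are flagged even though their complaint rates (0.12 and 0.27) are unremarkable. They are the weeks with no rolling history: `fillna(0)` turns their missing `roll_mean`/`roll_std` into zeros, a combination only such history-less weeks have, which makes them easy to isolate. The likely effect is that the first weeks of each series get flagged regardless of their complaints. One option is to fit the model only on rows with a complete feature set and mark the rest as not anomalous. A sketch, assuming the same features and `contamination=0.05` as the cell above; `fit_iso_complete_rows` is a hypothetical helper, and the demo frame is random data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

FEATURES = ["complaint_rate", "total_complaints", "total_reviews", "roll_mean", "roll_std"]

def fit_iso_complete_rows(df: pd.DataFrame, features=FEATURES,
                          contamination: float = 0.05, random_state: int = 42) -> pd.Series:
    """Return a boolean Series, True where Isolation Forest flags the row.

    Rows with any missing feature (e.g. no rolling baseline yet) are not
    scored and come back as False instead of being zero-filled.
    """
    complete = df[features].notna().all(axis=1)
    iso = IsolationForest(contamination=contamination, random_state=random_state)
    pred = iso.fit_predict(df.loc[complete, features])
    flags = pd.Series(False, index=df.index)
    flags.loc[complete] = pred == -1
    return flags

# Random demo frame: two 30-week "devices", each missing rolling stats in its first two weeks
rng = np.random.default_rng(1)
n = 60
demo = pd.DataFrame({
    "complaint_rate": rng.uniform(0.1, 0.5, n),
    "total_complaints": rng.integers(1, 10, n),
    "total_reviews": rng.integers(10, 30, n),
    "roll_mean": rng.uniform(0.2, 0.4, n),
    "roll_std": rng.uniform(0.05, 0.2, n),
})
demo.loc[[0, 1, 30, 31], ["roll_mean", "roll_std"]] = np.nan
demo["iso_anomaly"] = fit_iso_complete_rows(demo)
print(demo["iso_anomaly"].sum(), "of", n, "rows flagged")
```

In the notebook this would read `weekly["iso_anomaly"] = fit_iso_complete_rows(weekly)`. Dropping incomplete rows also means `contamination` now applies to the scored rows only.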


## Combined Signal: `strong_signal`

We define a **strong quality signal** as:

- The Z-score detector marks the week as anomalous **AND**
- The Isolation Forest model also marks it anomalous

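Requiring agreement makes the flag more conservative: a week must be an upward spike relative to its own rolling baseline *and* stand out across the full feature set. This should reduce false positives at the cost of missing some real spikes; without labeled incidents, the size of that trade-off is not measured here.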
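Without ground truth, the clearest view of what the AND rule does is a count of how often each detector fires and how often they agree. A sketch using `pandas.crosstab`; the six-row frame below is made up for illustration and is not from the dataset.

```python
import pandas as pd

demo = pd.DataFrame({
    "z_anomaly":   [False, True,  True,  False, False, True],
    "iso_anomaly": [True,  True,  False, False, True,  True],
})
demo["strong_signal"] = demo["z_anomaly"] & demo["iso_anomaly"]

# Rows: z-score verdict; columns: Isolation Forest verdict; margins add totals
agreement = pd.crosstab(demo["z_anomaly"], demo["iso_anomaly"], margins=True)
print(agreement)
print("strong signals:", int(demo["strong_signal"].sum()))
```

On the notebook's data, `pd.crosstab(weekly["z_anomaly"], weekly["iso_anomaly"], margins=True)` gives the same view; the True/True cell is the number of strong signals.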


In [5]:
weekly["strong_signal"] = weekly["z_anomaly"] & weekly["iso_anomaly"]

weekly[["device_brand", "device_model", "week_start", "z_anomaly", "iso_anomaly", "strong_signal"]].head(10)


|   | device_brand | device_model | week_start | z_anomaly | iso_anomaly | strong_signal |
|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 13 | 2022-10-24 | False | True | False |
| 1 | Apple | iPhone 13 | 2022-10-31 | False | True | False |
| 2 | Apple | iPhone 13 | 2022-11-07 | False | False | False |
| 3 | Apple | iPhone 13 | 2022-11-14 | False | False | False |
| 4 | Apple | iPhone 13 | 2022-11-21 | False | False | False |
| 5 | Apple | iPhone 13 | 2022-11-28 | False | False | False |
| 6 | Apple | iPhone 13 | 2022-12-05 | False | False | False |
| 7 | Apple | iPhone 13 | 2022-12-12 | False | False | False |
| 8 | Apple | iPhone 13 | 2022-12-19 | False | False | False |
| 9 | Apple | iPhone 13 | 2022-12-26 | False | False | False |


## Export Tableau-Ready Quality Signals Dataset

The exported file feeds the Tableau dashboard and contains:
- Weekly complaint rate
- Rolling stats
- Z-score anomalies
- Isolation Forest anomalies
- Strong signal indicator


In [6]:
output_path = Path("/Volumes/Personal Drive/GitHub/Proactive-Device-Quality-Signal-Detection/Dataset/quality_signals_tableau.csv")

weekly.to_csv(output_path, index=False)

output_path


PosixPath('/Volumes/Personal Drive/GitHub/Proactive-Device-Quality-Signal-Detection/Dataset/quality_signals_tableau.csv')
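The export cell writes every column of `weekly`, including intermediate ones such as `iso_pred`, to an absolute path on one machine. If the dashboard only needs the fields listed above, a narrower file with a fixed column order may be easier to keep stable in Tableau. A sketch, assuming the column names used in this notebook; `export_for_tableau` and `TABLEAU_COLUMNS` are hypothetical names, and the demo writes a single row copied from the first iPhone 13 week to a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

TABLEAU_COLUMNS = [
    "device_brand", "device_model", "week_start",
    "total_reviews", "total_complaints", "complaint_rate",  # weekly complaint rate
    "roll_mean", "roll_std",                                 # rolling stats
    "z_score", "z_anomaly",                                  # Z-score anomalies
    "iso_anomaly",                                           # Isolation Forest anomalies
    "strong_signal",                                         # strong signal indicator
]

def export_for_tableau(df: pd.DataFrame, path: Path) -> Path:
    """Write only the dashboard columns, sorted by device and week."""
    missing = [c for c in TABLEAU_COLUMNS if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    out = df[TABLEAU_COLUMNS].sort_values(["device_brand", "device_model", "week_start"])
    path.parent.mkdir(parents=True, exist_ok=True)
    out.to_csv(path, index=False, date_format="%Y-%m-%d")
    return path

# Demo: the first iPhone 13 week from this notebook's outputs
row = {
    "device_brand": "Apple", "device_model": "iPhone 13",
    "week_start": pd.Timestamp("2022-10-24"), "total_reviews": 17, "total_complaints": 2,
    "complaint_rate": 0.117647, "roll_mean": float("nan"), "roll_std": float("nan"),
    "z_score": 0.0, "z_anomaly": False, "iso_pred": -1, "iso_anomaly": True,
    "strong_signal": False,
}
with tempfile.TemporaryDirectory() as tmp:
    written = export_for_tableau(pd.DataFrame([row]), Path(tmp) / "quality_signals_tableau.csv")
    exported = pd.read_csv(written)
print(list(exported.columns))
```

In the notebook, a call such as `export_for_tableau(weekly, Path("Dataset") / "quality_signals_tableau.csv")` would write relative to the working directory; whether that matches the repository layout is an assumption.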