# Support Intelligence & Risk Monitoring — Anomaly Detection & Alerting (T6)

This notebook demonstrates a **monitoring + alerting layer** for support/product signals.

## Goal
Detect unusual patterns such as:
- spike in ticket volume overall or for a specific queue
- increase in **high-priority** rate
- sudden surge of specific tags (e.g., outages / refunds / login issues)

## Important note about timestamps
Many public ticket datasets do not contain real timestamps.  
To demonstrate monitoring logic, we build a **pseudo-timeline**:
- Option A: generate synthetic `created_at` dates (recommended for a portfolio demo)
- Option B: monitor by **batch windows** (e.g., each 500 tickets = 1 interval)
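
Option B can be sketched as follows. This is a minimal illustration on a toy frame, not the notebook's pipeline: the helper name `add_batch_window` and the column name `batch_id` are assumptions introduced here, and the sketch presumes that row order is a reasonable proxy for arrival order.

```python
import numpy as np
import pandas as pd

def add_batch_window(frame: pd.DataFrame, batch_size: int = 500) -> pd.DataFrame:
    """Return a copy with a hypothetical `batch_id` column (rows 0..499 -> 0, 500..999 -> 1, ...)."""
    out = frame.copy()
    # Use positional order, not index labels, so a shuffled or filtered index still works.
    out["batch_id"] = np.arange(len(out)) // batch_size
    return out

# Toy data: 1,200 tickets with a made-up priority column.
toy = pd.DataFrame({"priority_norm": ["high", "low", "medium"] * 400})
toy = add_batch_window(toy, batch_size=500)

# One row per interval, analogous to a daily aggregate.
per_batch = toy.groupby("batch_id").agg(
    tickets_total=("priority_norm", "size"),
    tickets_high=("priority_norm", lambda s: (s == "high").sum()),
)
per_batch["high_rate"] = per_batch["tickets_high"] / per_batch["tickets_total"]
print(per_batch)
```

Because every full batch has the same size, `tickets_total` carries no signal under this option; rate-style metrics such as `high_rate` are the ones worth monitoring.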

Outputs:
- alert table `outputs/t6_alerts.csv`
- plots of time series + thresholds


## 0) Imports

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from dataclasses import dataclass


## 1) Load cleaned dataset (T3)

In [None]:
DATA_DIR = "data/processed"
CSV_PATH = os.path.join(DATA_DIR, "tickets_clean_en.csv")
PARQUET_PATH = os.path.join(DATA_DIR, "tickets_clean_en.parquet")

if os.path.exists(CSV_PATH):
    print("Loading CSV:", CSV_PATH)
    df = pd.read_csv(CSV_PATH)
elif os.path.exists(PARQUET_PATH):
    print("Loading Parquet:", PARQUET_PATH)
    df = pd.read_parquet(PARQUET_PATH)
else:
    raise FileNotFoundError("Run T3 first to create data/processed/tickets_clean_en.csv (recommended).")

print("Loaded shape:", df.shape)
df.head(3)


## 2) Build a pseudo timeline (created_at)

In [None]:
# Create synthetic timestamps only if the dataset has no real ones.
# Days are drawn uniformly at random, so the baseline is flat: this is enough
# to exercise the monitoring logic, but any alert reflects random variation.

rng = np.random.default_rng(42)
n = len(df)

if "created_at" not in df.columns:
    # Spread tickets over 120 days starting 2025-01-01
    start = np.datetime64("2025-01-01")
    days = rng.integers(0, 120, size=n)
    df["created_at"] = start + days.astype("timedelta64[D]")

# Ensure types
df["created_at"] = pd.to_datetime(df["created_at"])
df["priority_norm"] = df["priority_norm"].astype(str).str.lower().str.strip()
df["queue"] = df["queue"].fillna("Unknown").astype(str)
df["tags_str"] = df["tags_str"].fillna("").astype(str)

df[["created_at"]].head()


## 3) Aggregate metrics by day

We compute daily signals used for detection:
- `tickets_total`
- `tickets_high`
- `high_rate`
- top queue volumes (computed separately in section 5)


In [None]:
daily = (
    df.groupby(df["created_at"].dt.date)
      .agg(
          tickets_total=("created_at","size"),
          tickets_high=("priority_norm", lambda s: (s=="high").sum()),
      )
      .reset_index()
      .rename(columns={"created_at":"date"})
)
daily["date"] = pd.to_datetime(daily["date"])
daily["high_rate"] = daily["tickets_high"] / daily["tickets_total"]

daily.head()
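
One caveat with grouping by date: a day with no tickets produces no row at all, so a 14-row rolling window can silently span more than 14 calendar days. On a dense synthetic timeline gaps are unlikely, but with real timestamps they can occur. A minimal sketch of filling such gaps, using a toy frame (the data and the names `daily_toy` and `filled` are illustrative, not from the dataset):

```python
import pandas as pd

# Toy daily aggregate with a gap: 2025-01-03 has no tickets and is missing.
daily_toy = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-04"]),
    "tickets_total": [10, 12, 9],
    "tickets_high": [2, 3, 1],
})

# Reindex onto a complete calendar so every day has a row; missing days get zero counts.
full_range = pd.date_range(daily_toy["date"].min(), daily_toy["date"].max(), freq="D")
filled = (
    daily_toy.set_index("date")
    .reindex(full_range, fill_value=0)
    .rename_axis("date")
    .reset_index()
)

# high_rate is undefined on a zero-ticket day; keep it as NaN rather than dividing by zero.
nonzero_total = filled["tickets_total"].where(filled["tickets_total"] > 0)
filled["high_rate"] = filled["tickets_high"] / nonzero_total
print(filled)
```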


## 4) Simple spike detector (rolling z-score)

In [None]:
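def rolling_zscore(series, window=14):
    # z-score of each point vs the mean/std of the *preceding* window.
    # shift(1) keeps the current value out of its own baseline; otherwise a
    # spike inflates the mean/std it is compared against and, with ddof=0,
    # z can never exceed sqrt(window - 1) (about 3.6 for window=14).
    min_periods = max(3, window // 3)
    baseline = series.shift(1).rolling(window, min_periods=min_periods)
    mu = baseline.mean()
    sd = baseline.std(ddof=0)
    z = (series - mu) / sd.replace(0, np.nan)
    return z.fillna(0.0)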

daily["z_tickets"] = rolling_zscore(daily["tickets_total"], window=14)
daily["z_highrate"] = rolling_zscore(daily["high_rate"], window=14)

# thresholds (tune)
Z_TICKETS = 3.0
Z_HIGHRATE = 3.0

daily["alert_tickets_spike"] = daily["z_tickets"] >= Z_TICKETS
daily["alert_highrate_spike"] = daily["z_highrate"] >= Z_HIGHRATE

daily.tail()
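
When picking thresholds it helps to know how the baseline window is defined. If the current value is included in its own rolling mean and standard deviation (population std, `ddof=0`), the z-score of any point is bounded by sqrt(window - 1), roughly 3.6 for a 14-day window, however large the spike. The sketch below compares that with a baseline built only from preceding days. Both helper names and the toy series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def zscore_inclusive(series: pd.Series, window: int = 14) -> pd.Series:
    """Baseline window includes the current value."""
    roll = series.rolling(window, min_periods=window)
    return (series - roll.mean()) / roll.std(ddof=0).replace(0, np.nan)

def zscore_trailing(series: pd.Series, window: int = 14) -> pd.Series:
    """Baseline window covers only the preceding values (shifted by one step)."""
    roll = series.shift(1).rolling(window, min_periods=window)
    return (series - roll.mean()) / roll.std(ddof=0).replace(0, np.nan)

# Toy series: 30 quiet days alternating 98/102 tickets, then a tenfold spike on the last day.
toy = pd.Series([98.0, 102.0] * 15 + [1000.0])

z_in = zscore_inclusive(toy).iloc[-1]
z_tr = zscore_trailing(toy).iloc[-1]
print(f"inclusive z = {z_in:.2f} (cap = {np.sqrt(14 - 1):.2f})")
print(f"trailing  z = {z_tr:.2f}")
```

Under the inclusive definition, a threshold of 3.5 on a 14-day window is close to the maximum attainable score, so only extreme spikes can cross it.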


## 5) Queue-level monitoring (top queues)

In [None]:
top_queues = df["queue"].value_counts().head(6).index.tolist()

# Filter once, then group the filtered frame by its own dates
df_top = df[df["queue"].isin(top_queues)]

q_daily = (
    df_top
    .groupby([df_top["created_at"].dt.date, "queue"])
    .size()
    .reset_index(name="count")
    .rename(columns={"created_at":"date"})
)
q_daily["date"] = pd.to_datetime(q_daily["date"])

# pivot to wide format
q_wide = q_daily.pivot(index="date", columns="queue", values="count").fillna(0).sort_index()

# compute z-scores per queue
q_z = q_wide.apply(lambda s: rolling_zscore(s, window=14))

# identify spikes
QUEUE_Z = 3.5
q_spikes = (q_z >= QUEUE_Z)
q_spikes.tail()


## 6) Tag surge monitoring (top tags)

In [None]:
# explode tags from tags_str (which looks like 'tagA | tagB')
# Keep only top N tags for monitoring
df["tags_list"] = df["tags_str"].apply(lambda s: [t.strip() for t in str(s).split("|") if t.strip()])

# explode
tags_long = df.explode("tags_list").rename(columns={"tags_list":"tag"})
tags_long = tags_long[tags_long["tag"].notna() & (tags_long["tag"].astype(str).str.len() > 0)].copy()

top_tags = tags_long["tag"].value_counts().head(8).index.tolist()

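# Filter first, then group by the filtered frame's own dates. Grouping by a
# Series taken from the unfiltered frame forces an index re-alignment, which
# can fail on the duplicate index labels that explode() leaves behind.
tags_top = tags_long[tags_long["tag"].isin(top_tags)]

t_daily = (
    tags_top
    .groupby([tags_top["created_at"].dt.date, "tag"])
    .size()
    .reset_index(name="count")
    .rename(columns={"created_at":"date"})
)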
t_daily["date"] = pd.to_datetime(t_daily["date"])

t_wide = t_daily.pivot(index="date", columns="tag", values="count").fillna(0).sort_index()
t_z = t_wide.apply(lambda s: rolling_zscore(s, window=14))

TAG_Z = 3.5
t_spikes = (t_z >= TAG_Z)
t_spikes.tail()
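
To make the parsing step concrete, here is a small sketch on made-up tag strings (the sample values are invented; only the `'tagA | tagB'` format comes from the notebook). It also shows that `explode` repeats the original index label for each tag, which matters when aligning exploded rows with other Series.

```python
import pandas as pd

toy = pd.DataFrame({
    "ticket_id": [101, 102, 103],
    "tags_str": ["login | outage", "refund", ""],
})

# Split on '|', trim whitespace, and drop empty fragments.
toy["tags_list"] = toy["tags_str"].apply(
    lambda s: [t.strip() for t in str(s).split("|") if t.strip()]
)

# One row per (ticket, tag); a ticket with no tags becomes a single NaN row, removed here.
long = toy.explode("tags_list").rename(columns={"tags_list": "tag"})
long = long[long["tag"].notna()]

print(long[["ticket_id", "tag"]])
print("index labels:", long.index.tolist())
```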


## 7) Build alerts table

In [None]:
alerts = []

# global alerts
for _, r in daily.iterrows():
    if r["alert_tickets_spike"]:
        alerts.append({"date": r["date"], "signal": "tickets_total", "value": float(r["tickets_total"]), "z": float(r["z_tickets"]), "level": "global"})
    if r["alert_highrate_spike"]:
        alerts.append({"date": r["date"], "signal": "high_rate", "value": float(r["high_rate"]), "z": float(r["z_highrate"]), "level": "global"})

# queue alerts
for d in q_spikes.index:
    for q in q_spikes.columns:
        if bool(q_spikes.loc[d, q]):
            alerts.append({"date": d, "signal": "queue_volume", "value": float(q_wide.loc[d, q]), "z": float(q_z.loc[d, q]), "level": f"queue:{q}"})

# tag alerts
for d in t_spikes.index:
    for tg in t_spikes.columns:
        if bool(t_spikes.loc[d, tg]):
            alerts.append({"date": d, "signal": "tag_volume", "value": float(t_wide.loc[d, tg]), "z": float(t_z.loc[d, tg]), "level": f"tag:{tg}"})

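# Pass the columns explicitly so the sort does not fail when no alert fired
alert_cols = ["date", "signal", "value", "z", "level"]
alerts_df = pd.DataFrame(alerts, columns=alert_cols).sort_values(["date","signal","level"])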
print("Alerts:", len(alerts_df))
alerts_df.head(20)


## 8) Save alerts

In [None]:
OUT_DIR = "outputs"
os.makedirs(OUT_DIR, exist_ok=True)

out_csv = os.path.join(OUT_DIR, "t6_alerts.csv")
alerts_df.to_csv(out_csv, index=False, encoding="utf-8")
print("Saved alerts:", out_csv)


## 9) Plot — Global signals

In [None]:
# tickets_total with z-score threshold markers
fig, ax = plt.subplots(figsize=(11, 4.8))
ax.plot(daily["date"], daily["tickets_total"], marker="o", linewidth=2)
ax.set_title("Daily Ticket Volume (Global)")
ax.set_xlabel("date")
ax.set_ylabel("tickets_total")

# mark spikes
spike_dates = daily.loc[daily["alert_tickets_spike"], "date"]
spike_vals = daily.loc[daily["alert_tickets_spike"], "tickets_total"]
ax.scatter(spike_dates, spike_vals, s=80, color="tab:red", zorder=3, label=f"spike (z≥{Z_TICKETS})")
ax.legend()  # without this call the spike label is never shown

plt.tight_layout()
plt.show()

# high_rate
fig, ax = plt.subplots(figsize=(11, 4.8))
ax.plot(daily["date"], daily["high_rate"], marker="o", linewidth=2)
ax.set_title("Daily High-Priority Rate (Global)")
ax.set_xlabel("date")
ax.set_ylabel("high_rate")

spike_dates = daily.loc[daily["alert_highrate_spike"], "date"]
spike_vals = daily.loc[daily["alert_highrate_spike"], "high_rate"]
ax.scatter(spike_dates, spike_vals, s=80, color="tab:red", zorder=3, label=f"spike (z≥{Z_HIGHRATE})")
ax.legend()  # without this call the spike label is never shown

plt.tight_layout()
plt.show()


## 10) Plot — Queue volumes (top queues)

In [None]:
for q in top_queues:
    fig, ax = plt.subplots(figsize=(11, 3.8))
    ax.plot(q_wide.index, q_wide[q], marker="o", linewidth=2)
    ax.set_title(f"Daily Queue Volume — {q}")
    ax.set_xlabel("date")
    ax.set_ylabel("ticket count")

    spike_idx = q_spikes.index[q_spikes[q]]
    if len(spike_idx) > 0:
        ax.scatter(spike_idx, q_wide.loc[spike_idx, q], s=70, color="tab:red", zorder=3, label=f"spike (z≥{QUEUE_Z})")
        ax.legend()  # without this call the spike label is never shown

    plt.tight_layout()
    plt.show()


## 11) Interpretation (publication-ready)

- **Global volume spikes** can indicate incidents/outages or releases causing user friction.
- **High-priority rate spikes** indicate a shift in the severity mix (possibly a production incident).
- **Queue/tag spikes** localize the issue (e.g., login failures, billing refunds).
- Thresholds should be tuned to the organization’s tolerance for missed incidents vs alert fatigue.
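
One rough way to approach tuning is to count how many alerts each candidate threshold would have produced on historical scores. The sketch below does this on simulated z-scores. The random data, the injected incident days, and the candidate grid are all assumptions for illustration, not results from the ticket dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated z-scores for 120 days of normal operation, plus three injected "incident" days.
z = pd.Series(rng.normal(0.0, 1.0, size=120))
incident_days = [30, 75, 110]
z.iloc[incident_days] = [4.2, 5.0, 3.8]

rows = []
for threshold in [2.0, 2.5, 3.0, 3.5, 4.0]:
    flagged = z >= threshold
    rows.append({
        "threshold": threshold,
        "alerts": int(flagged.sum()),
        "incidents_caught": int(flagged.iloc[incident_days].sum()),
    })

sweep = pd.DataFrame(rows)
# Alerts that do not fall on an incident day count as noise in this toy setup.
sweep["false_alerts"] = sweep["alerts"] - sweep["incidents_caught"]
print(sweep)
```

Reading the table from top to bottom shows the trade-off directly: raising the threshold removes noisy alerts first and, past some point, starts missing injected incidents.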

Next: integrate this logic into a scheduled job (e.g., daily batch) that writes alerts to a DB and triggers notifications.
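
As a rough sketch of the persistence half of such a job, the snippet below appends alert rows to a SQLite table using only the standard library. The table name `alerts`, its schema, the `save_alerts` helper, and the sample rows are hypothetical. A production setup would likely use the team's own database and a scheduler such as cron, and the notification step is left out here.

```python
import sqlite3

def save_alerts(conn: sqlite3.Connection, alerts: list[dict]) -> int:
    """Insert alert rows, skipping ones already stored; return how many rows were new."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS alerts (
            date   TEXT NOT NULL,
            signal TEXT NOT NULL,
            level  TEXT NOT NULL,
            value  REAL,
            z      REAL,
            PRIMARY KEY (date, signal, level)
        )
        """
    )
    before = conn.total_changes
    # INSERT OR IGNORE makes re-runs over the same day idempotent.
    conn.executemany(
        "INSERT OR IGNORE INTO alerts (date, signal, level, value, z) "
        "VALUES (:date, :signal, :level, :value, :z)",
        alerts,
    )
    conn.commit()
    return conn.total_changes - before

# Made-up rows shaped like the columns of the alert table built in section 7.
sample = [
    {"date": "2025-03-01", "signal": "tickets_total", "level": "global", "value": 412.0, "z": 3.4},
    {"date": "2025-03-01", "signal": "queue_volume", "level": "queue:Billing", "value": 95.0, "z": 3.9},
]

conn = sqlite3.connect(":memory:")  # in-memory DB for the demo; use a file path in practice
first_run = save_alerts(conn, sample)
second_run = save_alerts(conn, sample)  # same rows again: nothing new is written
print(first_run, second_run)
```

Keying the table on `(date, signal, level)` is one possible design choice: it lets the job be re-run safely after a failure without duplicating alerts.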
