# Chapter 1b: Temporal Quality Assessment (Event Bronze Track)

**Purpose:** Run quality checks specific to event-level datasets to identify data issues before feature engineering.

**When to use this notebook:**
- After completing 01a_temporal_deep_dive.ipynb
- Your dataset has EVENT_LEVEL granularity
- You want to validate temporal data integrity before aggregation

| Check | What It Detects | Why It Matters for ML |
|-------|-----------------|----------------------|
| **TQ001** | Duplicate events (same entity + timestamp) | Inflates counts, skews aggregations, creates artificial sequence patterns |
| **TQ002** | Unexpected temporal gaps | Rolling features become misleading; "events in last 30d" drops during gaps |
| **TQ003** | Future dates | Data leakage: the model sees future information during training |
| **TQ004** | Ambiguous event ordering | Sequence features undefined when multiple events share timestamp |
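
The framework's check classes are not shown in this notebook, so the following is only a rough pandas sketch of the kind of condition each check code describes. The toy frame, the column names `customer_id` and `event_ts`, and the gap threshold are illustrative assumptions, not the library's implementation.

```python
import pandas as pd

# Toy event log; column names and values are illustrative assumptions.
events = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b", "b"],
    "event_ts": pd.to_datetime([
        "2024-01-01", "2024-01-01", "2024-01-02",
        "2024-01-03", "2024-01-20", "2030-01-01",
    ]),
})
reference_date = pd.Timestamp("2024-06-01")

# TQ001-style: rows that repeat an (entity, timestamp) pair.
duplicate_rows = events[events.duplicated(["customer_id", "event_ts"], keep="first")]

# TQ002-style: spacing between consecutive distinct timestamps that exceeds
# max_gap_multiple x the expected spacing (daily here).
expected_gap = pd.Timedelta(days=1)
max_gap_multiple = 3.0
in_window = events.loc[events["event_ts"] <= reference_date, "event_ts"]
spacing = in_window.drop_duplicates().sort_values().diff().dropna()
large_gaps = spacing[spacing > expected_gap * max_gap_multiple]

# TQ003-style: timestamps later than the reference date.
future_rows = events[events["event_ts"] > reference_date]

# TQ004-style: entity/timestamp groups holding more than one event, so their
# relative order cannot be recovered from the timestamp alone.
group_sizes = events.groupby(["customer_id", "event_ts"]).size()
ambiguous_groups = group_sizes[group_sizes > 1]

print(len(duplicate_rows), len(large_gaps), len(future_rows), len(ambiguous_groups))
```

In this sketch TQ001 and TQ004 fire on the same rows, because a repeated entity and timestamp pair is also an ordering tie. The real checks may define the two conditions differently.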

## 1b.1 Load Findings and Data

In [None]:
from pathlib import Path
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from customer_retention.analysis.auto_explorer import ExplorationFindings, RecommendationEngine
from customer_retention.analysis.visualization import ChartBuilder, display_figure
from customer_retention.core.config.column_config import ColumnType
from customer_retention.stages.profiling import (
    DuplicateEventCheck, TemporalGapCheck, FutureDateCheck, EventOrderCheck,
    TemporalQualityReporter, SegmentAwareOutlierAnalyzer
)
from customer_retention.stages.temporal import load_data_with_snapshot_preference

In [None]:
FINDINGS_DIR = Path("../experiments/findings")
findings_files = sorted(
    [f for f in FINDINGS_DIR.glob("*_findings.yaml") if "multi_dataset" not in f.name],
    key=lambda f: f.stat().st_mtime, reverse=True
)
if not findings_files:
    raise FileNotFoundError(f"No findings in {FINDINGS_DIR}. Run notebook 01 first.")

FINDINGS_PATH = str(findings_files[0])
findings = ExplorationFindings.load(FINDINGS_PATH)
print(f"Using: {FINDINGS_PATH}")

ts_meta = findings.time_series_metadata
ENTITY_COLUMN, TIME_COLUMN = ts_meta.entity_column, ts_meta.time_column
print(f"Entity: {ENTITY_COLUMN}, Time: {TIME_COLUMN}")

df, data_source = load_data_with_snapshot_preference(findings, output_dir="../experiments/findings")
charts = ChartBuilder()
print(f"Loaded {len(df):,} rows ({data_source})")

## 1b.2 Configure Quality Checks

In [None]:
REFERENCE_DATE = pd.Timestamp.now()  # or pd.Timestamp("2024-01-01")
EXPECTED_FREQUENCY = "D"  # D=daily, W=weekly, M=monthly, H=hourly
MAX_GAP_MULTIPLE = 3.0

print(f"Reference: {REFERENCE_DATE.date()}, Frequency: {EXPECTED_FREQUENCY}, Gap threshold: {MAX_GAP_MULTIPLE}x")

## 1b.3 Run Temporal Quality Checks

| Issue Type | ML Impact | Mitigation |
|------------|-----------|------------|
| Duplicates | Sum/count features inflated; artificial patterns in sequences | Deduplicate or add sequence index |
| Gaps | Rolling aggregations drop; recency features spike | Document gaps; add gap indicator feature |
| Future dates | Model trains on leaked future info | Filter to reference date; check timezone handling |
| Ordering | "Previous event" features undefined | Add tiebreaker column; use stable sort |
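
The mitigations in the table can be sketched in plain pandas. This is a minimal illustration on a toy frame; the column names and the `row_order`, `event_seq` and `gap_before` columns are hypothetical and are not produced by the framework.

```python
import pandas as pd

# Toy event log; all column names and values are illustrative assumptions.
events = pd.DataFrame({
    "customer_id": ["a", "a", "a", "a", "a", "b"],
    "event_ts": pd.to_datetime([
        "2024-01-05", "2024-01-01", "2024-01-01",
        "2024-01-01", "2024-01-20", "2030-01-01",
    ]),
    "amount": [10.0, 5.0, 5.0, 6.0, 7.0, 3.0],
})
reference_date = pd.Timestamp("2024-06-01")

# Future dates: keep only events at or before the reference date.
clean = events[events["event_ts"] <= reference_date]

# Duplicates: drop rows that are identical in every column.
clean = clean.drop_duplicates().reset_index(drop=True)

# Ordering: record arrival order as an explicit tiebreaker, then sort on it
# so events sharing a timestamp always come out in the same order.
clean = clean.assign(row_order=range(len(clean)))
clean = clean.sort_values(["customer_id", "event_ts", "row_order"]).reset_index(drop=True)
clean["event_seq"] = clean.groupby("customer_id").cumcount()

# Gaps: flag events preceded by more than 3 days of silence for that entity.
gap = clean.groupby("customer_id")["event_ts"].diff()
clean["gap_before"] = gap > pd.Timedelta(days=3)

print(clean[["customer_id", "event_ts", "amount", "event_seq", "gap_before"]])
```

Using arrival order as the tiebreaker only makes sense if the source preserves it. Where a true sequence number or ingestion timestamp exists, that is the better choice.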

In [None]:
checks = [
    DuplicateEventCheck(entity_column=ENTITY_COLUMN, time_column=TIME_COLUMN),
    TemporalGapCheck(time_column=TIME_COLUMN, expected_frequency=EXPECTED_FREQUENCY, max_gap_multiple=MAX_GAP_MULTIPLE),
    FutureDateCheck(time_column=TIME_COLUMN, reference_date=REFERENCE_DATE),
    EventOrderCheck(entity_column=ENTITY_COLUMN, time_column=TIME_COLUMN),
]
results = [check.run(df) for check in checks]
reporter = TemporalQualityReporter(results, len(df))
reporter.print_results()

## 1b.4 Quality Score

Each of the four checks contributes 25% of the score. A check scores 100 if it finds no issues; otherwise deductions are proportional to the percentage affected.

| Grade | Score | Action |
|-------|-------|--------|
| A | 90-100 | Proceed with confidence |
| B | 75-89 | Document issues, proceed with caution |
| C | 60-74 | Address issues before feature engineering |
| D | <60 | Investigation required |
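
One way to read the scoring rule above is sketched below. The exact deduction formula inside `TemporalQualityReporter` is not shown in this notebook, so the linear deduction and the helper names here are assumptions made for illustration only.

```python
def check_score(affected_pct: float) -> float:
    """Score one check: 100 with no issues, minus the affected percentage (assumed linear)."""
    return max(0.0, 100.0 - affected_pct)


def overall_score(affected_pcts: list[float]) -> float:
    """Equal-weight average; with four checks, each contributes 25%."""
    return sum(check_score(p) for p in affected_pcts) / len(affected_pcts)


def grade_for(score: float) -> str:
    """Map a 0-100 score to the grade bands listed in the table."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "D"


# Hypothetical percentages affected for TQ001..TQ004.
example_score = overall_score([2.0, 0.0, 10.0, 40.0])
print(f"{example_score:.0f}/100 (Grade {grade_for(example_score)})")
```

With equal weights, a single badly failing check can lower the grade by at most one band on its own. That is worth keeping in mind when a grade B hides one severe issue.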

In [None]:
reporter.print_score()
quality_score, grade, passed = reporter.quality_score, reporter.grade, reporter.passed

## 1b.5 Event Volume Analysis

| What to Look For | Indicates | Action |
|-----------------|-----------|--------|
| Missing bars | Data gaps (TQ002) | Document; add gap indicator |
| Declining trend | Population shrinkage or data cutoff | Check if intentional |
| Spikes | Campaigns, seasonality, or data issues | Investigate cause |
| Flat periods | Possible logging outages | Verify with data source |

In [None]:
df_temp = df.copy()
df_temp[TIME_COLUMN] = pd.to_datetime(df_temp[TIME_COLUMN])
time_span = (df_temp[TIME_COLUMN].max() - df_temp[TIME_COLUMN].min()).days

freq, label = ("D", "Daily") if time_span <= 90 else ("W", "Weekly") if time_span <= 365 else ("ME", "Monthly")
counts = df_temp.groupby(pd.Grouper(key=TIME_COLUMN, freq=freq)).size()

fig = go.Figure(go.Bar(x=counts.index, y=counts.values, marker_color="#4682B4"))
fig.update_layout(title=f"{label} Event Volume (gaps = missing bars)", height=300, template="plotly_white")
display_figure(fig)
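
Missing bars can also be listed programmatically rather than read off the chart. This small sketch uses a toy series at daily resolution; the column name `event_ts` is an assumption. When grouping with `pd.Grouper`, periods inside the observed range that contain no events get a count of 0.

```python
import pandas as pd

# Toy timestamps with no events on 2024-01-03 and 2024-01-04.
toy = pd.DataFrame({
    "event_ts": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-05"]),
})

daily = toy.groupby(pd.Grouper(key="event_ts", freq="D")).size()
empty_periods = daily[daily == 0]

print(empty_periods.index.strftime("%Y-%m-%d").tolist())
```

Periods before the first event or after the last one are not covered by this approach, so a data cutoff at either end still needs to be checked against the expected date range.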

## 1b.6 Target Distribution (Event Level)

| Imbalance Level | Ratio | Note |
|-----------------|-------|------|
| Mild | <3:1 | Standard methods work |
| Moderate | 3-10:1 | Use stratified splits |
| Severe | >10:1 | Resampling needed (see notebook 07) |

**Note:** This shows event-level distribution. Entity-level distribution (after aggregation) is what matters for modeling; it is analyzed in notebook 07.
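
To see why the event-level ratio can mislead, consider a toy log in which retained customers generate far more events than churned ones. The sketch assumes the target is constant per entity and uses hypothetical column names; `imbalance_level` is a helper written for this example that encodes the bands from the table.

```python
import pandas as pd


def imbalance_level(ratio: float) -> str:
    """Map a majority:minority ratio to the bands in the table."""
    if ratio < 3:
        return "mild"
    if ratio <= 10:
        return "moderate"
    return "severe"


# Toy data: two retained customers with many events, two churned with one each.
events = pd.DataFrame({
    "customer_id": ["a"] * 8 + ["b"] * 6 + ["c"] + ["d"],
    "retained": [1] * 14 + [0] * 2,
})

# Event level: every row counts once.
event_counts = events["retained"].value_counts()
event_ratio = event_counts.max() / event_counts.min()

# Entity level: one target value per customer (assumes it is constant per entity).
entity_counts = events.groupby("customer_id")["retained"].first().value_counts()
entity_ratio = entity_counts.max() / entity_counts.min()

print(f"Event level:  {event_ratio:.1f}:1 ({imbalance_level(event_ratio)})")
print(f"Entity level: {entity_ratio:.1f}:1 ({imbalance_level(entity_ratio)})")
```

The same customers look moderately imbalanced at event level and balanced at entity level, simply because activity volume differs between the classes.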

In [None]:
if findings.target_column and findings.target_column in df.columns:
    target_counts = df[findings.target_column].value_counts().sort_index()
    labels = [f"{'Retained' if v == 1 else 'Churned'} ({v})" for v in target_counts.index]
    
    fig = make_subplots(rows=1, cols=2, specs=[[{"type": "pie"}, {"type": "bar"}]])
    fig.add_trace(go.Pie(labels=labels, values=target_counts.values, hole=0.4,
        marker_colors=["#e74c3c", "#2ecc71"], textinfo="percent"), row=1, col=1)
    fig.add_trace(go.Bar(x=labels, y=target_counts.values, marker_color=["#e74c3c", "#2ecc71"],
        text=[f"{c:,}" for c in target_counts.values], textposition="inside", showlegend=False), row=1, col=2)
    fig.update_layout(height=350, template="plotly_white", title="Target Distribution (Event Level)")
    display_figure(fig)
    
    if len(target_counts) == 2:
        ratio = target_counts.max() / target_counts.min()
        print(f"\nEvent-level imbalance: {ratio:.1f}:1 (minority: {target_counts.idxmin()})")
        print("ℹ️ Note: Resampling strategies apply after aggregation - see notebook 07")
else:
    print("No target column detected.")

## 1b.7 Outlier Analysis

| Approach | When to Use | Why It Matters |
|----------|-------------|----------------|
| Global detection | Homogeneous data | Simple threshold works |
| Segment-aware | Data has natural groups | Avoids false positives when segments have different scales |

Segment-aware detection clusters entities by target (or other segment) and detects outliers within each group separately.
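
The difference between the two approaches can be illustrated with a simple IQR rule. This is a sketch under stated assumptions: `SegmentAwareOutlierAnalyzer` may use a different detection method, and the `segment` and `amount` columns are made up for the example.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Two segments on very different scales (illustrative values).
toy = pd.DataFrame({
    "segment": ["small"] * 8 + ["large"] * 2,
    "amount": [10, 11, 12, 13, 14, 15, 16, 17, 1000, 1100],
})

# Global detection: one set of thresholds for all rows.
global_flags = iqr_outliers(toy["amount"])

# Segment-aware detection: thresholds computed within each segment.
segment_flags = pd.Series(False, index=toy.index)
for _, group in toy.groupby("segment"):
    segment_flags.loc[group.index] = iqr_outliers(group["amount"])

print(f"Global: {int(global_flags.sum())}, segment-aware: {int(segment_flags.sum())}")
```

Here the global rule flags both rows of the large segment even though they are typical for that segment, which is the false-positive pattern the table describes.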

In [None]:
numeric_cols = [n for n, c in findings.columns.items()
    if c.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE]
    and n not in [ENTITY_COLUMN, TIME_COLUMN]]

if numeric_cols:
    analyzer = SegmentAwareOutlierAnalyzer(max_segments=5)
    result = analyzer.analyze(df, feature_cols=numeric_cols, segment_col=None, target_col=findings.target_column)
    
    print(f"Segments detected: {result.n_segments}")
    if result.n_segments > 1:
        data = [{"Feature": c, "Global": result.global_analysis[c].outliers_detected,
            "Segment": sum(s[c].outliers_detected for s in result.segment_analysis.values() if c in s)}
            for c in numeric_cols]
        display(pd.DataFrame(data))
        if result.segmentation_recommended:
            print("\n💡 Segment-specific outlier treatment recommended")
    else:
        print("Data appears homogeneous - using global outlier detection")
else:
    print("No numeric columns for outlier analysis.")

## 1b.8 Data Validation

| Check | Issue | Impact |
|-------|-------|--------|
| Binary fields | Values outside {0, 1} | Model crashes or silent errors |
| String consistency | Case/spacing variants ("Yes" vs "yes") | Inflated cardinality; split categories |
| Missing patterns | Systematic missingness | Bias in imputation |
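
The cell below covers the first two rows of the table. For the third, one simple way to look for systematic missingness is to compare missing rates across time periods. The sketch below uses a toy frame with hypothetical `event_ts` and `channel` columns.

```python
import pandas as pd

# Toy data in which `channel` is missing for every February event.
toy = pd.DataFrame({
    "event_ts": pd.to_datetime([
        "2024-01-10", "2024-01-20", "2024-02-05",
        "2024-02-15", "2024-03-01", "2024-03-09",
    ]),
    "channel": ["web", "app", None, None, "web", "app"],
})

# Share of missing values per calendar month.
month = toy["event_ts"].dt.to_period("M")
missing_by_month = toy["channel"].isna().groupby(month).mean()

print(missing_by_month)
```

A missing rate concentrated in one period points to a collection problem rather than random missingness, and imputing across that period would carry the bias into the features.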

In [None]:
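# Binary field validation: report the 0/1 split and count values outside {0, 1}
binary_cols = [n for n, c in findings.columns.items() if c.inferred_type == ColumnType.BINARY]
for col in binary_cols:
    c0, c1 = (df[col] == 0).sum(), (df[col] == 1).sum()
    total = c0 + c1
    other = df[col].notna().sum() - total  # non-null values that are neither 0 nor 1
    if total == 0:  # avoid division by zero when the column is not 0/1-encoded
        print(f"⚠️ {col}: no 0/1 values found ({other:,} other non-null values)")
        continue
    print(f"{'✓' if other == 0 else '⚠️'} {col}: 0={c0:,} ({c0/total*100:.1f}%), 1={c1:,} ({c1/total*100:.1f}%), other={other:,}")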

# Consistency check
issues = []
for col in df.select_dtypes(include=['object']).columns:
    if col in [ENTITY_COLUMN, TIME_COLUMN]: continue
    variants = {}
    for v in df[col].dropna().unique():
        key = str(v).lower().strip()
        variants.setdefault(key, []).append(v)
    issues.extend([{"Column": col, "Variants": vs} for vs in variants.values() if len(vs) > 1])

print(f"\n{'⚠️ Consistency issues: ' + str(len(issues)) if issues else '✅ No consistency issues'}")

## 1b.9 Recommendations

Framework-generated recommendations based on column-level issues detected during exploration.

In [None]:
rec_engine = RecommendationEngine()
recs = rec_engine.recommend_cleaning(findings)

if recs:
    for r in sorted(recs, key=lambda x: {"high": 0, "medium": 1, "low": 2}.get(x.severity, 3)):
        icon = {"high": "🔴", "medium": "🟡", "low": "🟢"}.get(r.severity, "⚪")
        print(f"{icon} [{r.severity.upper()}] {r.column_name}: {r.description}")
        label = r.strategy_label if r.strategy_label else r.strategy.replace("_", " ").title()
        print(f"   Strategy: {label}")
else:
    print("✅ No critical cleaning recommendations")

## 1b.10 Save Results

In [None]:
if not findings.metadata:
    findings.metadata = {}
findings.metadata["temporal_quality"] = reporter.to_dict()
findings.save(FINDINGS_PATH)
print(f"Saved to: {FINDINGS_PATH}")
print(f"Score: {quality_score:.0f}/100 (Grade {grade})")

---

## Summary: What We Learned

In this notebook, we validated temporal data quality:

1. **Temporal Quality Checks**: Detected duplicates, gaps, future dates, and ordering issues
2. **Quality Score**: Quantified overall data health with pass/fail grading
3. **Event Volume**: Visualized data coverage over time
4. **Target Distribution**: Checked event-level class balance
5. **Outlier Analysis**: Compared global vs segment-aware detection
6. **Data Validation**: Verified binary fields and string consistency

## Quality Score Interpretation

| Grade | Score | Meaning | Action |
|-------|-------|---------|--------|
| A | 90-100 | Excellent | Proceed with confidence |
| B | 75-89 | Good | Document issues, proceed |
| C | 60-74 | Fair | Address issues before aggregation |
| D | <60 | Poor | Investigation required |

---

## Next Steps

Continue with the **Event Bronze Track**:

1. **01c_temporal_patterns.ipynb**: Detect trends, seasonality, cohort effects
2. **01d_event_aggregation.ipynb**: Aggregate events to entity-level features

After 01d, continue with **Entity Bronze Track** (02 → 03 → 04) on aggregated data.