## Biometric Update Stress & Anomaly Analysis

### Overview
This analysis focuses on biometric Aadhaar updates to identify
system stress periods and anomalous spikes that may correspond
to policy changes, deadlines, or infrastructure strain.

---

### Objectives
- Analyze biometric update volumes over time
- Detect abnormal spikes using an Isolation Forest model
- Support proactive infrastructure and manpower planning

---

### Methodology
- Biometric update data is loaded from UIDAI ZIP files
- Temporal aggregation is performed at the yearly level
- Isolation Forest is used for anomaly detection
- Visual trends are plotted for interpretability
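The yearly aggregation step can be sketched with pandas alone. This is a minimal illustration, not the project's `add_time_features` helper: it assumes dates are stored as day-first strings and drops rows whose dates cannot be parsed. The column names `date` and `bio_updates` and the sample records are hypothetical.

```python
import pandas as pd

def aggregate_yearly(df: pd.DataFrame, date_col: str, value_col: str) -> pd.DataFrame:
    """Sum a count column per calendar year (sketch of the yearly aggregation step)."""
    # Parse day-first dates; unparseable values become NaT
    dates = pd.to_datetime(df[date_col], errors="coerce", dayfirst=True)
    mask = dates.notna()
    # Keep only rows with a valid date and attach the derived year
    valid = df.loc[mask].assign(year=dates[mask].dt.year)
    return (
        valid.groupby("year", as_index=False)[value_col]
        .sum()
        .sort_values("year", ignore_index=True)
    )

# Hypothetical daily records; column names are illustrative only
records = pd.DataFrame({
    "date": ["01-03-2023", "15-07-2023", "02-01-2024", "not a date", "20-11-2024"],
    "bio_updates": [120, 80, 200, 999, 50],
})
yearly = aggregate_yearly(records, "date", "bio_updates")
print(yearly)
```

Dropping unparseable dates rather than failing keeps a single malformed row from blocking the whole aggregation, at the cost of silently excluding those rows from the totals.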

---

### Key Outputs
- Year-wise biometric update volume
- Anomaly flags highlighting abnormal years
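The internals of `detect_anomalies` are not shown here. The sketch below assumes it behaves like scikit-learn's `IsolationForest`, which labels each row `-1` (anomaly) or `1` (normal). The function `flag_anomalies`, the `is_anomaly` column, and the synthetic series are hypothetical. Isolation Forest needs a reasonable number of rows to separate outliers, so a yearly series with only a few years is likely to give flags of limited value. The example therefore uses 24 hypothetical periods with one injected spike.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def flag_anomalies(df: pd.DataFrame, value_col: str,
                   contamination: float = 0.1, seed: int = 0) -> pd.DataFrame:
    """Add a boolean `is_anomaly` column using scikit-learn's IsolationForest."""
    model = IsolationForest(contamination=contamination, random_state=seed)
    # IsolationForest expects a 2-D feature matrix: one row per period
    features = df[[value_col]].to_numpy(dtype=float)
    labels = model.fit_predict(features)  # -1 = anomaly, 1 = normal
    out = df.copy()
    out["is_anomaly"] = labels == -1
    return out

# Hypothetical per-period totals with one obvious spike
rng = np.random.default_rng(42)
volumes = rng.normal(1000, 30, size=24).round()
volumes[17] = 5000  # injected spike, e.g. a deadline rush
series = pd.DataFrame({"period": range(24), "updates": volumes})

flagged = flag_anomalies(series, "updates")
print(flagged[flagged["is_anomaly"]])
```

`contamination` fixes the share of rows that will be flagged, so it should reflect how often genuine stress periods are expected rather than be left at an arbitrary default.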

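The date column is matched against a list of candidate names (`date`, `update_date`, `created_date`). If none of them is present, the analysis stops with an error instead of proceeding on undated records.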
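Matching candidate names works best when column headers are normalized first, so that `" Update_Date "` and `update_date` are treated as the same field. Whether `clean_dataframe` already does this is not shown. The helper `normalize_columns` below is hypothetical, and `pick_date_column` simply mirrors the candidate-list lookup in the notebook cell.

```python
import pandas as pd

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip and lower-case column names (hypothetical helper)."""
    return df.rename(columns=lambda c: str(c).strip().lower())

def pick_date_column(df: pd.DataFrame,
                     candidates=("date", "update_date", "created_date")) -> str:
    """Return the first candidate name present in df, or raise if none is found."""
    found = next((c for c in candidates if c in df.columns), None)
    if found is None:
        raise KeyError(f"none of {list(candidates)} found in {list(df.columns)}")
    return found

# Hypothetical raw extract with inconsistent header formatting
raw = pd.DataFrame({" Update_Date ": ["2024-01-05"], "Count": [3]})
df = normalize_columns(raw)
col = pick_date_column(df)
print(col)
```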


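```python
# ============================================================
# BIOMETRIC UPDATE & ANOMALY ANALYSIS
# ============================================================

import os
import sys

# Make the project-level `src` package importable from the notebooks folder
sys.path.append(os.path.abspath(".."))

import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.autolayout"] = True

try:
    from IPython.display import display
except ImportError:
    # Plain-Python fallback when running outside Jupyter
    def display(x):
        print(x)

from src.data_loader import load_uidai_zip
from src.data_cleaning import clean_dataframe
from src.feature_engineering import add_time_features
from src.anomaly_detection import detect_anomalies

print("Loading biometric update data...")

bio_df = load_uidai_zip("../data/raw/api_data_aadhar_biometric.zip")
bio_df = clean_dataframe(bio_df)

# ---------- HANDLE DATE ----------
# Use the first column whose name matches a known date field
possible_date_cols = ["date", "update_date", "created_date"]
date_col = next((c for c in possible_date_cols if c in bio_df.columns), None)

if date_col is None:
    raise RuntimeError("No date column found in biometric dataset")

# add_time_features is expected to add the `year` column used below
bio_df = add_time_features(bio_df, date_col)

display(bio_df.head())

# ---------- YEARLY AGGREGATION ----------
# Taking the first numeric column could pick an identifier such as a
# pincode or a derived time feature, so those are excluded first.
# The exclusion list is an assumption about the dataset's columns.
non_count_cols = {"pincode", "year", "month", "day", "quarter", "week", "dayofweek"}
count_cols = [c for c in bio_df.select_dtypes("number").columns
              if str(c).lower() not in non_count_cols]
if not count_cols:
    raise RuntimeError("No numeric update-count column found in biometric dataset")

# Assumes every remaining numeric column is an update count (e.g. per age band)
value_col = "total_updates"
bio_df[value_col] = bio_df[count_cols].sum(axis=1)

bio_yearly = bio_df.groupby("year", as_index=False)[value_col].sum()
print("\nYear-wise biometric updates:")
display(bio_yearly)

# ---------- ANOMALY DETECTION ----------
bio_yearly = detect_anomalies(bio_yearly, value_col)
print("\nAnomaly detection results:")
display(bio_yearly)

# ---------- VISUALIZATION ----------
plt.figure(figsize=(10, 5))
plt.plot(bio_yearly["year"], bio_yearly[value_col], marker="o")
plt.xticks(bio_yearly["year"])  # one tick per year, no fractional years
plt.title("Biometric Update Volume Over Time")
plt.xlabel("Year")
plt.ylabel("Update Count")
plt.show()

print("✅ Biometric & anomaly analysis completed successfully.")
```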
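The volume plot shows the trend but not which years were flagged. A minimal sketch for overlaying the flags follows. It assumes the anomaly output is a boolean column named `is_anomaly`, which is a hypothetical name since the column produced by `detect_anomalies` is not shown. `plot_with_anomalies` and the demo values are also hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_with_anomalies(df, x_col, value_col, flag_col="is_anomaly"):
    """Line plot of volumes with flagged points overlaid as red markers."""
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(df[x_col], df[value_col], marker="o", label="Updates")
    # Select only the rows whose boolean flag is True
    flagged = df[df[flag_col]]
    ax.scatter(flagged[x_col], flagged[value_col], color="red", s=100,
               zorder=3, label="Anomaly")
    ax.set_xticks(df[x_col])
    ax.set_xlabel(x_col.title())
    ax.set_ylabel("Update count")
    ax.legend()
    return fig, ax

# Hypothetical yearly totals with one flagged year
demo = pd.DataFrame({
    "year": [2021, 2022, 2023, 2024],
    "total_updates": [1000, 1100, 4000, 1050],
    "is_anomaly": [False, False, True, False],
})
fig, ax = plot_with_anomalies(demo, "year", "total_updates")
```

Returning the figure and axes, rather than calling `plt.show()` inside the function, lets the notebook display the plot while the same helper can also save it with `fig.savefig(...)`.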
