# Anomaly Detection and Key Insights

Objective:
- Identify unusual enrolment or update patterns
- Detect sudden spikes or drops across states and years
- Translate analytical findings into actionable insights


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [None]:
enrol_df = pd.read_csv("../data/processed/enrolment_cleaned.csv")
demo_df = pd.read_csv("../data/processed/demographic_update_cleaned.csv")
bio_df = pd.read_csv("../data/processed/biometric_update_cleaned.csv")


In [None]:
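# Row-level totals across age bands.
# Note: a NaN in any band makes that row's total NaN.
enrol_df["total_enrolment"] = (
    enrol_df["age_0_5"] +
    enrol_df["age_5_17"] +
    enrol_df["age_18_greater"]
)

# The update datasets have no 0-5 band, so only two columns are summed
demo_df["total_updates"] = (
    demo_df["demo_age_5_17"] +
    demo_df["demo_age_17_"]
)

bio_df["total_biometric_updates"] = (
    bio_df["bio_age_5_17"] +
    bio_df["bio_age_17_"]
)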

In [None]:
year_enrol = enrol_df.groupby("year")["total_enrolment"].sum()
year_demo = demo_df.groupby("year")["total_updates"].sum()
year_bio = bio_df.groupby("year")["total_biometric_updates"].sum()


In [None]:
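def detect_spikes(series, threshold=1.5):
    """Return values more than `threshold` standard deviations above the mean.

    Only upward spikes are flagged; drops below the mean are not detected.
    """
    mean = series.mean()
    std = series.std()  # sample standard deviation (ddof=1)
    upper_limit = mean + threshold * std
    return series[series > upper_limit]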

enrol_spikes = detect_spikes(year_enrol)
demo_spikes = detect_spikes(year_demo)
bio_spikes = detect_spikes(year_bio)

enrol_spikes, demo_spikes, bio_spikes
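
The objective also mentions drops, which an upper-limit check alone cannot surface. A minimal two-sided sketch is shown here; `detect_anomalies` is a hypothetical helper that is not part of the original notebook, and the yearly totals are invented purely for illustration.

```python
import numpy as np
import pandas as pd

def detect_anomalies(series: pd.Series, threshold: float = 1.5) -> pd.DataFrame:
    """Flag values more than `threshold` sample standard deviations
    away from the mean, in either direction (hypothetical helper)."""
    std = series.std()
    if not std > 0:
        # Constant or single-value series: nothing can be flagged
        z = pd.Series(0.0, index=series.index)
    else:
        z = (series - series.mean()) / std
    flagged = z[z.abs() > threshold]
    return pd.DataFrame({
        "value": series.loc[flagged.index],
        "z_score": flagged,
        "kind": np.where(flagged > 0, "spike", "drop"),
    })

# Invented yearly totals, used only to exercise the function
example = pd.Series(
    [100, 102, 98, 101, 180, 99, 20, 100],
    index=range(2015, 2023),
)
result = detect_anomalies(example)
print(result)
```

With only a handful of yearly observations, the mean and standard deviation are themselves pulled by the extreme values, so the flagged years are best treated as candidates for review rather than confirmed anomalies.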


In [None]:
plt.figure()
year_demo.plot(kind="line", marker="o", label="Demographic Updates")
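# Highlight detected spikes on top of the line and include them in the legend
plt.scatter(demo_spikes.index, demo_spikes.values, color="red", zorder=3, label="Detected spikes")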
plt.title("Anomalies in Demographic Updates")
plt.xlabel("Year")
plt.ylabel("Update Count")
plt.legend()
plt.tight_layout()
plt.show()


In [None]:
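# Total demographic updates per state (raw counts, not population-adjusted)
state_demo = demo_df.groupby("state")["total_updates"].sum()

# Tukey's rule: flag states lying more than 1.5 * IQR outside the quartiles
q1 = state_demo.quantile(0.25)
q3 = state_demo.quantile(0.75)
iqr = q3 - q1

outlier_states = state_demo[
    (state_demo < q1 - 1.5 * iqr) |
    (state_demo > q3 + 1.5 * iqr)
]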

outlier_states
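
The state totals above collapse all years together, so a sudden change within one state is not visible. One possible way to look at states and years jointly is a year-over-year percentage change per state, sketched below. The toy frame stands in for `demo_df`, its numbers are invented, and `change_limit` is a hypothetical cut-off rather than a value taken from the analysis.

```python
import pandas as pd

# Toy stand-in for demo_df; the numbers are invented for illustration only
toy_demo = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "year": [2021, 2022, 2023, 2021, 2022, 2023],
    "total_updates": [100, 110, 330, 200, 190, 60],
})

# Updates per state and year, sorted so each state's years are in order
state_year = (
    toy_demo.groupby(["state", "year"])["total_updates"].sum()
    .sort_index()
)

# Relative change versus the previous year, computed within each state
yoy_change = state_year.groupby(level="state").pct_change()

change_limit = 0.5  # hypothetical cut-off: a move of more than 50% in a year
sudden_moves = yoy_change[yoy_change.abs() > change_limit]
print(sudden_moves)
```

The first year of each state has no previous value, so its change is NaN and is never flagged.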


### Anomaly Interpretation

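- Sudden spikes in demographic and biometric updates may correspond to large-scale KYC or compliance drives; confirming this requires matching the spike years against known drive dates
- High update volumes in certain states may reflect greater population mobility or urban migration, though they can also reflect larger state populations, since the totals are not normalised
- Low enrolment combined with a high update-to-enrolment ratio suggests Aadhaar saturation rather than growth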


### Predictive Indicators

- A rising update-to-enrolment ratio can signal Aadhaar maturity in a region
- Repeated biometric updates may indicate ageing populations or child-to-adult transitions
- Sudden multi-state spikes may precede policy or regulatory interventions
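
The update-to-enrolment ratio mentioned above is not computed in the notebook. A rough sketch follows, assuming the enrolment data carries a `state` column like the demographic data does (the notebook only shows `state` on `demo_df`). The toy frames and their values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for enrol_df and demo_df; values are invented
toy_enrol = pd.DataFrame({
    "state": ["A", "B", "C"],
    "total_enrolment": [1000, 50, 0],
})
toy_updates = pd.DataFrame({
    "state": ["A", "B", "C"],
    "total_updates": [200, 400, 30],
})

state_enrol = toy_enrol.groupby("state")["total_enrolment"].sum()
state_updates = toy_updates.groupby("state")["total_updates"].sum()

# Mask zero enrolment so the ratio becomes NaN instead of infinity
ratio = (state_updates / state_enrol.where(state_enrol > 0)).rename(
    "update_to_enrolment_ratio"
)

# Highest ratios first; NaN entries sort to the end by default
ranked = ratio.sort_values(ascending=False)
print(ranked)
```

Computing the same ratio per year would show whether it is rising, which is the signal the first indicator refers to.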


### Recommendations

- Deploy mobile Aadhaar update units in high-update regions
- Increase assisted enrolment and update camps for elderly populations
- Use historical update trends to forecast staffing and infrastructure needs
- Prioritise awareness campaigns in low-enrolment, high-population districts
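
As a starting point for the forecasting recommendation, a straight-line trend can be fitted to yearly totals and extended by one year. This is a deliberately naive sketch: the yearly values are invented, the linear model is an assumption, and a real forecast would need more history and validation.

```python
import numpy as np
import pandas as pd

# Invented yearly update totals, standing in for a series like year_demo
toy_yearly = pd.Series(
    [100.0, 120.0, 140.0, 160.0, 180.0],
    index=[2019, 2020, 2021, 2022, 2023],
)

years = toy_yearly.index.to_numpy(dtype=float)
year_offset = years.mean()  # centre the years to keep the fit well conditioned

# Degree-1 least-squares fit: returns slope first, then intercept
slope, intercept = np.polyfit(years - year_offset, toy_yearly.to_numpy(), 1)

next_year = 2024
forecast = slope * (next_year - year_offset) + intercept
print(f"Naive linear forecast for {next_year}: {forecast:.1f}")
```

Years flagged as anomalies would distort such a fit, so they may need to be excluded or handled separately before extrapolating.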


### Final Summary

This study demonstrates how Aadhaar enrolment and update data can be transformed into meaningful societal and administrative insights. By combining enrolment, demographic, and biometric datasets, the analysis highlights lifecycle patterns, regional disparities, and operational signals. The findings support data-driven decision-making aimed at improving accessibility, efficiency, and inclusiveness of Aadhaar services.
