## Demographic Analysis of Aadhaar Updates

### Overview
This analysis examines demographic patterns in Aadhaar update data,
focusing on gender, age groups, and their combined interaction.
The objective is to identify inclusion gaps and demographic segments
that place higher demand on Aadhaar update infrastructure.

---

### Objectives
- Analyze gender-wise Aadhaar update distribution
- Study update behavior across age groups
- Perform trivariate analysis (Age × Gender × Updates)
- Identify vulnerable or high-frequency update cohorts

---

### Methodology
- Aadhaar demographic update data is loaded directly from the UIDAI ZIP archive with `load_uidai_zip`
- The data is cleaned and standardized with `clean_dataframe`
- Aggregations are performed with pandas `groupby` and `pivot_table`
- Results are visualized as pie and bar charts for interpretability
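
The aggregation step can be sketched on a small synthetic table. The column names `age_group`, `gender`, and `updates`, along with every value below, are hypothetical and chosen only for illustration; the real extract may use different names.

```python
import pandas as pd

# Synthetic stand-in for the cleaned data; column names and values are hypothetical.
toy = pd.DataFrame(
    {
        "age_group": ["0-17", "0-17", "18+", "18+", "18+"],
        "gender": ["F", "M", "F", "M", "M"],
        "updates": [10, 20, 30, 25, 15],
    }
)

# Univariate view: total updates per gender.
by_gender = toy.groupby("gender")["updates"].sum()

# Trivariate view: age group (rows) x gender (columns) x summed updates (cells).
age_by_gender = pd.pivot_table(
    toy, values="updates", index="age_group", columns="gender", aggfunc="sum"
)

print(by_gender)
print(age_by_gender)
```

Naming the value column explicitly, rather than taking whichever numeric column comes first, keeps identifiers or codes from being summed by accident.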

---

### Key Outputs
- Gender distribution of Aadhaar updates (summary table and pie chart)
- Age × Gender × Update interaction (pivot table and grouped bar chart)

All computations rely on reusable modules from the `src/` directory.
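
One possible way to surface high-frequency cohorts is to rank each age and gender combination by its share of total updates. The sketch below is independent of the `src/` modules; the column names `age_group`, `gender`, and `updates` and all values are synthetic and hypothetical.

```python
import pandas as pd

# Synthetic long-format data; column names and values are hypothetical.
toy = pd.DataFrame(
    {
        "age_group": ["0-17", "0-17", "18+", "18+", "18+"],
        "gender": ["F", "M", "F", "M", "M"],
        "updates": [10, 20, 30, 25, 15],
    }
)

# Total updates per (age_group, gender) cohort.
cohort_totals = toy.groupby(["age_group", "gender"])["updates"].sum()

# Each cohort's share of all updates, largest first.
cohort_share = (cohort_totals / cohort_totals.sum()).sort_values(ascending=False)

print(cohort_share.round(2))
```

A share only indicates where update volume concentrates. Judging whether a cohort is underserved would also need a population baseline, which this sketch does not include.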


In [None]:
# ============================================================
# DEMOGRAPHIC ANALYSIS: GENDER AND AGE × GENDER BREAKDOWNS
# ============================================================

import os
import sys

# Add the parent directory to the import path so the `src` package can be imported.
sys.path.append(os.path.abspath(".."))

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams["figure.autolayout"] = True
sns.set_theme(style="whitegrid")

try:
    from IPython.display import display
except ImportError:
    def display(x): print(x)

from src.data_loader import load_uidai_zip
from src.data_cleaning import clean_dataframe

print("Loading demographic update data...")

demo_df = load_uidai_zip("../data/raw/api_data_aadhar_demographic.zip")
demo_df = clean_dataframe(demo_df)

display(demo_df.head())

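# ---------- GENDER ANALYSIS ----------
# The gender column name may differ between extracts, so try the known variants.
gender_col = next((c for c in ["gender", "sex"] if c in demo_df.columns), None)

if gender_col:
    gender_summary = demo_df.groupby(gender_col).sum(numeric_only=True)
    print("\nGender-wise Aadhaar updates:")
    display(gender_summary)

    if gender_summary.shape[1] > 0:
        # The first numeric column is assumed to hold update counts;
        # check it against the summary above, since it could be an identifier instead.
        gender_summary.iloc[:, 0].plot(
            kind="pie",
            autopct="%1.1f%%",
            figsize=(6, 6),
            title="Gender Distribution of Aadhaar Updates"
        )
        plt.ylabel("")
        plt.show()
    else:
        print("⚠️ No numeric columns found: skipping gender pie chart")
else:
    print("⚠️ Gender column not found: skipping gender analysis")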

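# ---------- TRIVARIATE: AGE × GENDER × UPDATES ----------
age_col = next((c for c in ["age_group", "age"] if c in demo_df.columns), None)

# First numeric column that is not a grouping key; assumed to hold update counts.
# Excluding the keys prevents a numeric `age` column from being summed as the value.
value_col = next(
    (c for c in demo_df.select_dtypes("number").columns if c not in (age_col, gender_col)),
    None,
)

if age_col and gender_col and value_col:
    pivot = pd.pivot_table(
        demo_df,
        values=value_col,
        index=age_col,
        columns=gender_col,
        aggfunc="sum"
    )

    print("\nAge × Gender × Updates:")
    display(pivot)

    pivot.plot(kind="bar", figsize=(10, 5))
    plt.title("Age × Gender × Aadhaar Updates")
    plt.ylabel("Update Count")
    plt.show()
else:
    print("⚠️ Age, gender, or value column missing: skipping trivariate analysis")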

# Reached even when a step above was skipped, so report completion without claiming success.
print("✅ Demographic analysis finished.")
