# 🏛️ Aadhaar Pulse 2.0
## Unlocking Societal Trends in Aadhaar Enrolment and Updates

---

### UIDAI Data Hackathon 2026

---

## Executive Summary

**Aadhaar Pulse 2.0** treats India's identity ecosystem as a living sensor of socio-economic dynamics. By analyzing enrolment and update patterns across **10 months** and **36 states and union territories**, we derive actionable intelligence for:

1. **Service Optimization** - Identifying overloaded service centers
2. **Child Welfare Protection** - Detecting compliance gaps in mandatory biometric updates
3. **Resource Allocation** - Predicting seasonal demand patterns

### Key Findings

| Metric | Finding |
|--------|---------|
| Districts analyzed after data cleaning | **920** |
| States/UTs covered | **36** |
| Total records processed | **~4.9M** (4,938,837 across the three datasets) |
| Most stressed region | **Delhi** (59K+ transactions/PIN) |
| Highest child compliance risk | **Gujarat** (4 of top 5 at-risk districts) |


---

## 1. Problem Statement & Approach

### Problem Statement
> "Identify meaningful patterns, trends, anomalies, or predictive indicators and translate them into clear insights or solution frameworks that can support informed decision-making and system improvements."

### Our Approach: The Three Pillars

| Pillar | Metric | Problem Solved |
|--------|--------|----------------|
| **SAI** | Service Pressure Score | Where are centers overwhelmed? |
| **CLCS** | Child Compliance Z-Score | Which children are at risk of ID deactivation? |
| **DIH** | Demand Intensity Heatmap | When should resources be deployed? |

---

## 2. Datasets Used

| Dataset | Records | Columns | Description |
|---------|---------|---------|-------------|
| **Enrolment** | 1,006,029 | date, state, district, pincode, age_0_5, age_5_17, age_18_greater | New Aadhaar registrations |
| **Demographic Updates** | 2,071,700 | date, state, district, pincode, demo_age_5_17, demo_age_17_ | Address/name/DOB changes |
| **Biometric Updates** | 1,861,108 | date, state, district, pincode, bio_age_5_17, bio_age_17_ | Fingerprint/iris/face updates |

**Date Range:** March 2025 - December 2025 (10 months)


In [None]:
# Setup and Imports
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json
import urllib.request
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:,.2f}'.format)

# Define paths (point BASE_PATH at the folder that holds the cleaned CSV files)
BASE_PATH = "/Users/balamsanjay/Desktop/UDIAI-DataHackthon/"

print("✅ Setup complete!")


In [None]:
# Load pre-processed clean data
district_df = pd.read_csv(f"{BASE_PATH}aadhaar_pulse_district_clean.csv")
state_df = pd.read_csv(f"{BASE_PATH}aadhaar_pulse_state_clean.csv")
trends_df = pd.read_csv(f"{BASE_PATH}aadhaar_pulse_trends_clean.csv")

print("📊 Loaded Data:")
print(f"   Districts: {len(district_df):,} records")
print(f"   States: {len(state_df):,} records")
print(f"   Monthly Trends: {len(trends_df):,} records")
print("\n📋 Data Quality Summary:")
print(f"   Unique States: {district_df['state'].nunique()}")
print(f"   Unique Districts: {district_df['district'].nunique()}")
print(f"   Total Volume: {district_df['total_volume'].sum():,.0f} transactions")


---

## 3. Methodology

### 3.1 Data Cleaning & Preprocessing

**Challenges Identified:**
1. State name variations (50+ found → normalized to 36 official)
2. District duplicates (Bengaluru/Bangalore, 24 Parganas variations)
3. Garbage data ("100000", "?", "5th cross" as district names)

**Solution:** Created `cleaning_utils.py` with:
- `normalize_state_names()` - Maps all variations to official 36 State/UT names
- `normalize_district_names()` - Consolidates 100+ district name variations
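
The contents of `cleaning_utils.py` are not reproduced in this notebook. As a rough illustration of the kind of normalization described above, the sketch below trims whitespace, maps a few known aliases to official names, and blanks out values that contain no letters. The function name `normalize_state_names_sketch`, the `STATE_ALIASES` table, and the garbage rule are all assumptions made for this example, not the project's actual implementation.

```python
import pandas as pd

# Hypothetical alias table: illustrative entries only, not the project's real mapping.
STATE_ALIASES = {
    "orissa": "Odisha",
    "pondicherry": "Puducherry",
    "west bangal": "West Bengal",
}


def normalize_state_names_sketch(states: pd.Series) -> pd.Series:
    """Trim, collapse whitespace, and map known aliases to an official name."""
    cleaned = states.astype(str).str.strip().str.replace(r"\s+", " ", regex=True)
    key = cleaned.str.lower()
    # Assumed garbage rule: a value with no letters at all (e.g. "100000", "?").
    is_garbage = ~key.str.contains(r"[a-z]", regex=True)
    # Known aliases win; everything else falls back to title case.
    mapped = key.map(STATE_ALIASES).fillna(cleaned.str.title())
    # Garbage values become missing so they can be dropped or reviewed later.
    return mapped.mask(is_garbage)


raw = pd.Series(["  orissa ", "WEST  BENGAL", "100000", "Pondicherry"])
normalized = normalize_state_names_sketch(raw)
print(normalized.tolist())
```

Title-casing is only a fallback here; names with lowercase connectors (such as "and") would still need explicit alias entries.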

---

## 4. Data Analysis & Visualisation

### 4.1 Pillar 1: Service Accessibility Index (SAI)

**The Problem:** UIDAI knows *where* centers exist, but not *if they're overwhelmed*.

**Our Solution:** Service Pressure Score (SPS) = Total Transactions / Active PIN Codes

**Interpretation:**
- High SPS = Each PIN code serving unsustainable volume = **Service Desert**
- Low SPS = Adequate coverage
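
The pre-processed files already carry an `sps_score` column, so the derivation is not shown in this notebook. A minimal sketch of how the ratio could be computed from raw records follows. It assumes a hypothetical `transactions` column holding the per-row count and treats a PIN code as "active" if it appears at least once in a district; the toy values are invented.

```python
import pandas as pd

# Toy records in the shape of the raw files (values invented for illustration).
records = pd.DataFrame({
    "state": ["Delhi", "Delhi", "Delhi", "Goa", "Goa"],
    "district": ["North East Delhi"] * 3 + ["North Goa"] * 2,
    "pincode": [110053, 110053, 110094, 403001, 403002],
    "transactions": [500, 700, 300, 40, 60],  # hypothetical column name
})

# Total volume and number of distinct PIN codes per district.
sps = (
    records.groupby(["state", "district"])
    .agg(total_volume=("transactions", "sum"), active_pins=("pincode", "nunique"))
    .reset_index()
)

# Service Pressure Score = Total Transactions / Active PIN Codes
sps["sps_score"] = sps["total_volume"] / sps["active_pins"]
print(sps)
```

In this toy data, North East Delhi handles 1,500 transactions over 2 PIN codes (SPS 750), while North Goa handles 100 over 2 (SPS 50).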


In [None]:
# Pillar 1: Top 20 Districts by Service Pressure Score
top_pressure = district_df.nlargest(20, 'sps_score')

fig_sps = px.bar(
    top_pressure,
    x='district',
    y='sps_score',
    color='state',
    title='<b>Top 20 Districts by Service Pressure Score (SAI)</b><br><sup>Higher = More Transactions per PIN Code = Potential Service Bottleneck</sup>',
    labels={'sps_score': 'Service Pressure Score', 'district': 'District'},
    template='plotly_dark',
    color_discrete_sequence=px.colors.qualitative.Set2
)
fig_sps.update_layout(xaxis_tickangle=-45, height=500)
fig_sps.show()

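print("\n📊 SAI KEY INSIGHT:")
print("   Delhi is critically overwhelmed - all Delhi districts appear in top 10")
# Report whichever district actually ranks first instead of hard-coding its name
top_row = top_pressure.iloc[0]
print(f"   {top_row['district']}: {top_row['sps_score']:,.0f} transactions/PIN")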


### 4.2 Pillar 2: Child Lifecycle Compliance Score (CLCS)

**The Problem:** Children enrolled at ages 0-5 *must* update biometrics at ages 5 and 15. Failure leads to:
- ID deactivation
- Exclusion from school meals, scholarships, and welfare schemes

**Our Solution:** Z-Score Relative Benchmarking

- Compliance Share = Biometric Updates (5-17) / Total Child Activity
- Z-Score = (District Share - National Mean) / National Std Dev

**Interpretation:**
- Z-Score < -1.5σ = **High Risk Zone** (significantly below national average)
- Z-Score ≈ 0 = On par with national average
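
The two formulas above can be sketched in a few lines of pandas. This is an illustration on invented numbers, not the pipeline that produced `clcs_zscore`. It assumes the national mean and standard deviation are taken over district-level shares (unweighted) and uses the population standard deviation (`ddof=0`); the source does not state either choice.

```python
import pandas as pd

# Invented district totals; column names follow the dataset description.
df = pd.DataFrame({
    "district": ["A", "B", "C", "D"],
    "bio_age_5_17": [800, 600, 500, 100],
    "total_child_activity": [1000, 1000, 1000, 1000],
})

# Compliance Share = Biometric Updates (5-17) / Total Child Activity
df["compliance_share"] = df["bio_age_5_17"] / df["total_child_activity"]

# Z-Score = (District Share - National Mean) / National Std Dev
national_mean = df["compliance_share"].mean()
national_std = df["compliance_share"].std(ddof=0)  # assumption: population std
df["clcs_zscore"] = (df["compliance_share"] - national_mean) / national_std

# Flag districts below the -1.5 threshold used in this report.
df["high_risk"] = df["clcs_zscore"] < -1.5
print(df[["district", "compliance_share", "clcs_zscore", "high_risk"]])
```

Only district D, whose share of 0.10 sits far below the mean of 0.50, crosses the threshold in this toy example.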


In [None]:
# Pillar 2: Child Risk Scatter Plot
active_districts = district_df[district_df['total_child_activity'] > 1000]

fig_risk = px.scatter(
    active_districts,
    x='total_child_activity',
    y='clcs_zscore',
    color='state',
    size='total_volume',
    hover_name='district',
    title='<b>Child Compliance Risk Map</b><br><sup>Districts below -1.5σ are in HIGH RISK zone</sup>',
    labels={'clcs_zscore': 'Compliance Z-Score (σ)', 'total_child_activity': 'Total Child Activity'},
    template='plotly_dark',
    height=600
)

# Add risk threshold line
fig_risk.add_hline(y=-1.5, line_dash="dash", line_color="red", annotation_text="HIGH RISK THRESHOLD (-1.5σ)")
fig_risk.add_hline(y=0, line_dash="dot", line_color="gray", annotation_text="National Average")
fig_risk.show()

# Show top 10 at-risk districts
print("\n⚠️ TOP 10 AT-RISK DISTRICTS (Urgent Awareness Camps Needed):")
print("=" * 70)
risk_districts = active_districts.nsmallest(10, 'clcs_zscore')[['district', 'state', 'clcs_zscore', 'total_child_activity']]
for _, row in risk_districts.iterrows():
    print(f"   {row['district']:30} ({row['state']:15}) Z-Score: {row['clcs_zscore']:.2f}σ")


In [None]:
# India Choropleth Map - State-level Compliance (with J&K and Ladakh)
geojson_url = "https://gist.githubusercontent.com/jbrobst/56c13bbbf9d97d187fea01ca62ea5112/raw/e388c4cae20aa53cb5090210a42ebb9b765c0a36/india_states.geojson"

try:
    with urllib.request.urlopen(geojson_url) as url:
        india_states = json.loads(url.read().decode())
    
    # Map state names to GeoJSON ST_NM property names
    state_name_map = {
        'Andaman & Nicobar Islands': 'Andaman & Nicobar',
        'Dadra & Nagar Haveli': 'Dadra and Nagar Haveli and Daman and Diu',
        'Daman & Diu': 'Dadra and Nagar Haveli and Daman and Diu',
    }
    map_df = state_df.copy()
    map_df['state_geo'] = map_df['state'].map(lambda x: state_name_map.get(x, x))
    
    fig_map = px.choropleth(
        map_df,
        geojson=india_states,
        featureidkey='properties.ST_NM',
        locations='state_geo',
        color='clcs_zscore',
        color_continuous_scale='RdYlGn',
        range_color=[-2, 2],
        title='<b>India: Child Compliance Z-Score by State</b><br><sup>Red = Below National Avg | Green = Above National Avg</sup>',
        template='plotly_dark',
        hover_name='state'
    )
    
    # Show complete India map with J&K and Ladakh
    fig_map.update_geos(
        visible=False,
        fitbounds="locations",
        projection_type="natural earth"
    )
    fig_map.update_layout(
        height=700,
        geo=dict(
            lonaxis_range=[68, 98],  # Longitude range for India
            lataxis_range=[6, 38],   # Latitude range including J&K and Ladakh
        )
    )
    fig_map.show()
    
    print("\n📊 CLCS KEY INSIGHT:")
    print("   Gujarat has a child compliance crisis - 4 of top 5 at-risk districts")
    print("   Recommendation: Deploy 'School Aadhaar Camps' in Gujarat's rural districts")
except Exception as e:
    print(f"Map loading error: {e}")


### 4.3 Pillar 3: Demand Intensity Heatmap (DIH)

**The Problem:** Demand is seasonal, but resource allocation is static.

**Our Solution:** Analyze monthly volume patterns and tag seasonality:
- **School Rush:** June-August (new academic year)
- **Financial Year End:** March-April
- **Year End:** December
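
The `season_type` column is already present in the trends file. One plausible way to derive it from a calendar month is sketched below. The helper name `tag_season` is hypothetical, and the "Normal" label for all remaining months is inferred from the chart legend rather than stated in the tagging rules.

```python
import pandas as pd


def tag_season(month: int) -> str:
    """Map a calendar month (1-12) to the seasonality labels listed above."""
    if month in (6, 7, 8):
        return "School Rush"
    if month in (3, 4):
        return "Financial Year End"
    if month == 12:
        return "Year End"
    return "Normal"  # assumed default for months outside the three windows


# The ten months covered by the datasets: March 2025 through December 2025.
months = pd.period_range("2025-03", "2025-12", freq="M")
season_by_month = {str(p): tag_season(p.month) for p in months}

for label, season in season_by_month.items():
    print(label, season)
```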


In [None]:
# Pillar 3: National Monthly Trend with Seasonality
national_trend = trends_df.groupby(['month', 'season_type'])['volume'].sum().reset_index()
national_trend['month'] = national_trend['month'].astype(str)

fig_trend = px.bar(
    national_trend,
    x='month',
    y='volume',
    color='season_type',
    title='<b>National Activity Volume by Month</b><br><sup>Seasonality patterns detected</sup>',
    labels={'volume': 'Total Transactions', 'month': 'Month'},
    template='plotly_dark',
    color_discrete_map={
        'Normal': '#636EFA',
        'School Rush': '#EF553B',
        'Year End': '#FFA15A',
        'Financial Year End': '#00CC96'
    },
    height=450
)
fig_trend.show()

print("\n📊 DIH KEY INSIGHT:")
print("   June-August shows elevated activity across all regions (School Rush)")
print("   Recommendation: Pre-position Mobile Aadhaar Vans 2 weeks before June")


In [None]:
# District-Month Heatmap (Top 20 by Volume)
top_districts = district_df.nlargest(20, 'total_volume')['district'].tolist()
heatmap_data = trends_df[trends_df['district'].isin(top_districts)]
heatmap_pivot = heatmap_data.pivot_table(index='district', columns='month', values='volume', aggfunc='sum')

fig_heatmap = px.imshow(
    heatmap_pivot,
    labels=dict(x="Month", y="District", color="Volume"),
    title='<b>Demand Intensity Heatmap (Top 20 Districts)</b><br><sup>Darker = Higher activity</sup>',
    template='plotly_dark',
    aspect='auto',
    color_continuous_scale='YlOrRd'
)
fig_heatmap.update_layout(height=600)
fig_heatmap.show()


---

## 5. Trivariate Analysis: Combined Insights

Analyzing the relationship between Service Pressure, Child Compliance, and Volume across states.
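
To make the combined reading concrete, the sketch below flags states that are both above the median Service Pressure Score and below the national compliance average (Z-Score < 0). The state names and values are invented, and the `needs_intervention` column is a hypothetical helper, not part of `state_df`.

```python
import pandas as pd

# Invented state-level values for illustration only.
states = pd.DataFrame({
    "state": ["P", "Q", "R", "S"],
    "sps_score": [100.0, 900.0, 200.0, 800.0],
    "clcs_zscore": [0.5, -0.8, -0.3, 1.1],
})

# Quadrant boundaries: median SPS on the x-axis, Z-Score of 0 on the y-axis.
sps_median = states["sps_score"].median()
high_stress = states["sps_score"] > sps_median
low_compliance = states["clcs_zscore"] < 0

# High stress combined with low compliance marks the priority quadrant.
states["needs_intervention"] = high_stress & low_compliance
priority = states.loc[states["needs_intervention"], "state"].tolist()
print(priority)
```

Here only state Q qualifies: S is under heavy load but compliant, and R lags on compliance without the service pressure.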


In [None]:
# Trivariate: State × SPS × CLCS
fig_tri = px.scatter(
    state_df,
    x='sps_score',
    y='clcs_zscore',
    size='total_volume',
    color='num_districts',
    hover_name='state',
    title='<b>Trivariate Analysis: Service Pressure vs Child Compliance by State</b><br><sup>Size = Total Volume | Color = Number of Districts</sup>',
    labels={'sps_score': 'Service Pressure Score', 'clcs_zscore': 'Child Compliance Z-Score'},
    template='plotly_dark',
    height=550
)

# Add quadrant lines
fig_tri.add_vline(x=state_df['sps_score'].median(), line_dash="dash", line_color="gray")
fig_tri.add_hline(y=0, line_dash="dash", line_color="gray")
fig_tri.show()

print("\n📊 TRIVARIATE INSIGHT:")
print("   States in bottom-right quadrant (High Stress + Low Compliance) need immediate intervention")
print("   These regions have overwhelmed infrastructure AND are falling behind on child updates")


---

## 6. Summary & Recommendations

### Key Findings

| Finding | Impact | Recommendation |
|---------|--------|----------------|
| Delhi districts show 30K-59K transactions/PIN | Service bottleneck | Open 5+ new enrolment centers in East/North Delhi |
| Gujarat has 4 of top 5 at-risk districts | Child welfare crisis | Deploy School Aadhaar Camps in rural Gujarat |
| June-August shows "School Rush" pattern | Predictable demand spike | Pre-position Mobile Vans 2 weeks before June |
| Bihar's Pashchim Champaran worst nationally | Compliance gap | Targeted awareness campaign needed |

### Actionable Recommendations for UIDAI

**Immediate (0-3 months):**
- Deploy additional kits to Delhi's eastern divisions
- Launch awareness campaign in Gujarat's at-risk districts

**Short-term (3-6 months):**
- Implement "School Rush Readiness" protocol for May deployment
- Create child compliance dashboard for real-time monitoring

**Long-term (6-12 months):**
- Develop predictive model for resource allocation
- Integrate with school admission systems for proactive biometric updates

---

## 7. Technical Implementation

### Technology Stack
- **Language:** Python 3.13
- **Data Processing:** Pandas, NumPy
- **Visualization:** Plotly
- **Analysis:** Statistical Z-Score, Time Series

### Code Repository
All code available at: https://github.com/Sanjay-Balam/UIDAI-Data-Hackathon-2026

---

## Thank You!

*"Aadhaar Pulse 2.0 - Turning data into actionable intelligence for a more inclusive India."*
