# Aadhaar Service Activity Classification & Intervention Framework

## Overview
This notebook implements a **percentile-based classification framework**
using the Enrolment-to-Update Ratio (EUR) combined with **stability analysis**
to categorise regions and recommend **appropriate intervention types**.

The analysis is strictly based on **aggregated transactional data**
and does not infer individual compliance or unmet demand.

In [1]:
import pandas as pd
import numpy as np
import os
import sys

# Diagnostic Info
print(f"Python Version: {sys.version}")
print(f"Current Working Directory: {os.getcwd()}")

# Robust Path Detection
FILENAME = "master_aadhaar_data_fully_cleaned.csv"
possible_paths = [
    FILENAME,
    os.path.join("..", FILENAME),
    os.path.join("Desktop", "UIDAI Data Hackathon 2 ", FILENAME),
    os.path.join(os.path.expanduser("~"), "Desktop", "UIDAI Data Hackathon 2 ", FILENAME)
]

DATA_PATH = None
for path in possible_paths:
    if os.path.exists(path):
        DATA_PATH = path
        print(f"Found data at: {DATA_PATH}")
        break

if not DATA_PATH:
    print("CRITICAL: Could not find the data file. Please ensure it is in the same folder as this notebook or in its parent folder.")
    # If needed, set DATA_PATH manually to an absolute path here.
else:
    # Write the results next to the input file; os.path.join("", name) returns name unchanged.
    OUTPUT_PATH = os.path.join(os.path.dirname(DATA_PATH), "final_aadhaar_intervention_classification.csv")

Python Version: 3.12.8 (v3.12.8:2dc476bcb91, Dec  3 2024, 14:43:19) [Clang 13.0.0 (clang-1300.0.29.30)]
Current Working Directory: /Users/shreyasgurav/Desktop/UIDAI Data Hackathon 2 /notebooks
Found data at: ../master_aadhaar_data_fully_cleaned.csv
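
The same search can also be expressed with `pathlib`. The block below is a minimal sketch rather than the notebook's own code: `first_existing` is a hypothetical helper, and only the two relative candidates (notebook folder and parent folder) are checked.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate that is an existing file, or None (hypothetical helper)."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


FILENAME = "master_aadhaar_data_fully_cleaned.csv"
data_path = first_existing([Path(FILENAME), Path("..") / FILENAME])

if data_path is None:
    print("Data file not found next to the notebook or in its parent folder.")
else:
    # Place the output file in the same folder as the input file.
    output_path = data_path.with_name("final_aadhaar_intervention_classification.csv")
    print(f"Reading {data_path}, writing {output_path}")
```

Returning `None` instead of raising keeps the behaviour close to the cell above, which reports the problem and lets later cells decide whether to skip.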


## Data Loading
We start by loading the master dataset, which contains the pre-calculated `update_to_enrolment_ratio` column. The notebook copies this column into `EUR` for the rest of the analysis.

In [2]:
if not DATA_PATH:
    print("Error: No DATA_PATH found. Skipping load.")
else:
    df = pd.read_csv(DATA_PATH)
    print(f"Successfully loaded {len(df)} records.")
    df['EUR'] = df['update_to_enrolment_ratio']

Successfully loaded 2307730 records.
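
Later cells rely on four columns: `state`, `district`, `pincode` and `update_to_enrolment_ratio`. A quick check after loading can catch a wrong or outdated file early. The sketch below uses a hypothetical helper, `missing_columns`, and a made-up one-row frame in place of the real data.

```python
import pandas as pd

# Columns the later cells depend on (taken from the groupby and the EUR alias).
REQUIRED = ["state", "district", "pincode", "update_to_enrolment_ratio"]


def missing_columns(frame, required=REQUIRED):
    """List the required columns that are absent from the frame (hypothetical helper)."""
    return [col for col in required if col not in frame.columns]


# Made-up sample that deliberately lacks the ratio column.
sample = pd.DataFrame({"state": ["X"], "district": ["Y"], "pincode": [100001]})
missing = missing_columns(sample)
print(missing)
```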


## Step 3: Threshold Definition

We use **percentile-based thresholds** to define stress levels. This ties each category to the actual data distribution rather than to fixed EUR values.

**Categories:**
*   **Critical (top 30%, above the 70th percentile)**: High imbalance, potential service blockage.
*   **Warning (50th to 70th percentile)**: Elevated stress, needs monitoring.
*   **Normal (at or below the 50th percentile)**: Balanced operations.

*(Note: Thresholds are applied on the aggregated mean EUR per region)*

In [3]:
# Group by Region to get typical behavior
grouped = df.groupby(['state', 'district', 'pincode'])['EUR']
stability_df = grouped.agg(eur_mean='mean', eur_std='std').reset_index()

# Calculate Percentiles
stability_df['eur_percentile'] = stability_df['eur_mean'].rank(pct=True)

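# Define categories: Normal = bottom 50%, Warning = next 20%, Critical = top 30%
CRITICAL_PCT = 0.70  # percentile above which a region is Critical
WARNING_PCT = 0.50   # percentile above which a region is at least Warning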

stability_df['eur_category'] = pd.cut(
    stability_df['eur_percentile'],
    bins=[0, WARNING_PCT, CRITICAL_PCT, 1.0],
    labels=['Normal', 'Warning', 'Critical'],
    include_lowest=True
)

print(stability_df['eur_category'].value_counts())

eur_category
Normal      15940
Critical     9564
Warning      6376
Name: count, dtype: int64
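
To see how the percentile rank and the bin edges interact, the following sketch applies the same cut points (0.50 and 0.70) to ten made-up regions. The data and the `toy` names are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Hypothetical toy data: ten regions with made-up mean EUR values.
toy = pd.DataFrame({
    "region": list("ABCDEFGHIJ"),
    "eur_mean": [5, 12, 18, 25, 31, 40, 55, 70, 120, 300],
})

# Percentile rank in (0, 1]; tied values share an averaged rank by default.
toy["eur_percentile"] = toy["eur_mean"].rank(pct=True)

# Bins are right-inclusive: [0, 0.50] Normal, (0.50, 0.70] Warning, (0.70, 1.0] Critical.
toy["eur_category"] = pd.cut(
    toy["eur_percentile"],
    bins=[0, 0.50, 0.70, 1.0],
    labels=["Normal", "Warning", "Critical"],
    include_lowest=True,
)

toy_counts = toy["eur_category"].value_counts().to_dict()
print(toy_counts)
```

With ten distinct values the ranks are 0.1, 0.2, ..., 1.0, so the split is five Normal, two Warning and three Critical regions, mirroring the 50-20-30 proportions.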


## Step 4: Relative Stability Analysis

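Since the data exhibits high volatility, we use a **Relative Stability** approach rather than fixed thresholds. Stability is measured by the coefficient of variation (CoV = standard deviation / mean) of each region's EUR; a lower CoV means a more stable region.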

**Methodology:**
1.  **Critical Regions**: We identify the **Top 25% most stable** regions within this group as candidates for **Permanent Centres**.
2.  **Warning Regions**: We identify the **Top 50% most stable** regions within this group for **Semi-Permanent Support**.
3.  The remaining volatile regions are assigned to **Mobile Camps** or **Monitoring**.

In [4]:
# Compute CoV
stability_df['eur_cv'] = stability_df['eur_std'] / stability_df['eur_mean'].replace(0, np.nan)

# Dynamic Threshold Calculation
# 1. Permanent Centre Target: Top 25% stability within Critical Group
critical_subset = stability_df[stability_df['eur_category'] == 'Critical']
cov_thresh_critical = critical_subset['eur_cv'].quantile(0.25)

# 2. Semi-Permanent Target: Top 50% stability within Warning Group
warning_subset = stability_df[stability_df['eur_category'] == 'Warning']
cov_thresh_warning = warning_subset['eur_cv'].quantile(0.50)

print(f"Dynamic CoV Thresholds:")
print(f"  Critical (Permanent < {cov_thresh_critical:.2f})")
print(f"  Warning (Semi-Perm < {cov_thresh_warning:.2f})")

def label_stability_relative(row):
    cv = row['eur_cv']
    # CoV is undefined when the mean is zero or the region has a single record
    if pd.isna(cv):
        return 'Unknown'

    # Every region is labelled against the Critical-group threshold
    # (the stricter one), regardless of its own EUR category
    if cv <= cov_thresh_critical:
        return 'Stable'
    return 'Volatile'

stability_df['stability_label'] = stability_df.apply(label_stability_relative, axis=1)

print(stability_df['stability_label'].value_counts())

Dynamic CoV Thresholds:
  Critical (Permanent < 3.17)
stability_label
Stable      20940
Volatile     9435
Unknown      1505
Name: count, dtype: int64
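
The relative threshold can be illustrated on a small made-up group. The sketch below assumes eight hypothetical "Critical" regions and shows how a zero mean or a missing standard deviation leads to an undefined CoV, and how the 25th-percentile cut-off is then taken over the defined values only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: mean and standard deviation of EUR for eight regions.
crit = pd.DataFrame({
    "eur_mean": [100.0, 200.0, 50.0, 80.0, 0.0, 400.0, 150.0, 60.0],
    "eur_std": [50.0, 400.0, 25.0, 240.0, 0.0, 200.0, np.nan, 90.0],
})

# CoV = std / mean; a zero mean is replaced by NaN so the ratio is undefined.
crit["eur_cv"] = crit["eur_std"] / crit["eur_mean"].replace(0, np.nan)

# quantile() skips NaN, so the cut-off uses regions with a defined CoV only.
thresh = crit["eur_cv"].quantile(0.25)

# Lower CoV means more stable; undefined CoV is labelled "Unknown".
crit["stability_label"] = np.where(
    crit["eur_cv"].isna(),
    "Unknown",
    np.where(crit["eur_cv"] <= thresh, "Stable", "Volatile"),
)

label_counts = crit["stability_label"].value_counts().to_dict()
print(thresh, label_counts)
```

Three of the six defined CoV values tie at the threshold here, so more than a quarter of the group is labelled Stable. Ties at the cut-off have the same effect on real data.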


## Step 5: Classification Logic & Recommendations

We combine the **EUR Level** (Stress) and **Stability Score** to generate actionable infrastructure recommendations.

**Decision Matrix:**

| EUR Stress | Stability | Recommendation |
| :--- | :--- | :--- |
| **Critical** | **Stable** | **Permanent Centre** (Structural fix needed) |
| **Critical** | **Volatile** | **Temporary Mobile Camp** (Agile fix needed) |
| **Warning** | **Stable** | **Semi-Permanent Support** |
| **Warning** | **Volatile** | **Monitor Closely** |
| **Normal** | *Any* | **No Action** |

In [5]:
def get_recommendation_dynamic(row):
    cat = row['eur_category']
    cv = row['eur_cv']
    
    if cat == 'Critical':
        # Top 25% most stable within Critical -> Permanent.
        # A NaN CoV fails the comparison, so such regions fall through to the camp.
        if cv <= cov_thresh_critical:
            return 'Permanent Centre'
        return 'Temporary Mobile Camp'

    if cat == 'Warning':
        # Top 50% most stable within Warning -> Semi-Permanent.
        # A NaN CoV likewise falls through to monitoring.
        if cv <= cov_thresh_warning:
            return 'Semi-Permanent Support'
        return 'Monitor Closely'

    return 'No Action'

stability_df['recommended_action'] = stability_df.apply(get_recommendation_dynamic, axis=1)

print("Final Recommendation Counts:")
print(stability_df['recommended_action'].value_counts())

Final Recommendation Counts:
recommended_action
No Action                 15940
Temporary Mobile Camp      7174
Semi-Permanent Support     3188
Monitor Closely            3188
Permanent Centre           2390
Name: count, dtype: int64
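
On a frame of this size, a row-wise `apply` can be slow. One possible alternative, shown here only as a sketch, expresses the same decision matrix with `numpy.select`. The `recommend` helper, the `demo` frame and the threshold values 0.5 and 1.5 are assumptions for illustration, not the notebook's computed thresholds.

```python
import numpy as np
import pandas as pd


def recommend(df, crit_thresh, warn_thresh):
    """Vectorised sketch of the decision matrix (hypothetical helper)."""
    is_crit = (df["eur_category"] == "Critical").to_numpy()
    is_warn = (df["eur_category"] == "Warning").to_numpy()
    # Comparisons with NaN are False, so an undefined CoV never counts as stable.
    stable_crit = (df["eur_cv"] <= crit_thresh).to_numpy()
    stable_warn = (df["eur_cv"] <= warn_thresh).to_numpy()

    # np.select picks the first matching condition for each row.
    conditions = [is_crit & stable_crit, is_crit, is_warn & stable_warn, is_warn]
    choices = [
        "Permanent Centre",
        "Temporary Mobile Camp",
        "Semi-Permanent Support",
        "Monitor Closely",
    ]
    return pd.Series(np.select(conditions, choices, default="No Action"), index=df.index)


# Made-up rows covering every branch, including an undefined CoV.
demo = pd.DataFrame({
    "eur_category": ["Critical", "Critical", "Critical", "Warning", "Warning", "Normal"],
    "eur_cv": [0.4, 2.5, np.nan, 1.0, 3.0, 0.1],
})
demo["recommended_action"] = recommend(demo, crit_thresh=0.5, warn_thresh=1.5)
print(demo)
```

The ordering of `conditions` matters: the stable case is listed before the general category case so that it takes precedence.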


## Results & Export
We export the final classified list for visualisation and operational planning.

In [6]:
final_cols = [
    'state', 'district', 'pincode',
    'eur_mean', 'eur_std', 'eur_cv',
    'eur_percentile', 'eur_category',
    'stability_label', 'recommended_action'
]

output_df = stability_df[final_cols]
output_df.to_csv(OUTPUT_PATH, index=False)
print(f"Saved results to {OUTPUT_PATH}")
output_df.head()

Saved results to ../final_aadhaar_intervention_classification.csv


| | state | district | pincode | eur_mean | eur_std | eur_cv | eur_percentile | eur_category | stability_label | recommended_action |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | Andaman and Nicobar Islands | Andamans | 744101 | 175.268346 | 446.740322 | 2.548893 | 0.607089 | Warning | Stable | Semi-Permanent Support |
| 1 | Andaman and Nicobar Islands | Andamans | 744102 | 10.0 | 0.0 | 0.0 | 0.0867 | Normal | Stable | No Action |
| 2 | Andaman and Nicobar Islands | Andamans | 744103 | 31.674437 | 28.27331 | 0.892622 | 0.276882 | Normal | Stable | No Action |
| 3 | Andaman and Nicobar Islands | Andamans | 744105 | 30.876212 | 23.86329 | 0.77287 | 0.27431 | Normal | Stable | No Action |
| 4 | Andaman and Nicobar Islands | Andamans | 744106 | 29.91342 | 48.046338 | 1.60618 | 0.270797 | Normal | Stable | No Action |
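
For operational planning, the exported list could be summarised per state. The sketch below uses `pd.crosstab` on a few made-up rows; the state names "A" and "B" and the counts are purely illustrative.

```python
import pandas as pd

# Made-up rows standing in for the exported classification.
out = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B"],
    "recommended_action": [
        "No Action",
        "Permanent Centre",
        "No Action",
        "Temporary Mobile Camp",
        "Permanent Centre",
    ],
})

# Rows are states, columns are recommendations, cells are region counts.
summary = pd.crosstab(out["state"], out["recommended_action"])
print(summary)
```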
