# India's Aadhaar Stress, Vulnerability & Inclusion Observatory

**UIDAI Data Hackathon 2026**

---

## Executive Summary

This notebook presents a comprehensive analysis of India's Aadhaar ecosystem, focusing on:
- **Pattern Detection**: Identifying trends and anomalies in enrolment and update data
- **Vulnerability Assessment**: Quantifying regional fragility and stress levels
- **Predictive Analytics**: Forecasting demand and classifying risk zones
- **Policy Insights**: Providing actionable recommendations for policymakers

---

## 1. Setup and Imports

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Custom modules
import sys
sys.path.append('../src')

from data_loader import AadhaarDataLoader, quick_data_summary
from feature_engineering import AadhaarFeatureEngineer
from anomaly_detection import AnomalyDetector
from predictive_models import DemandForecaster, RiskClassifier, RegionClusterer
from visualizations import AadhaarVisualizer

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
plt.style.use('seaborn-v0_8-darkgrid')

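# Create output directories up front so the savefig, write_html and to_csv calls below do not fail
import os
for output_dir in ['../data/processed', '../outputs/figures', '../outputs/models']:
    os.makedirs(output_dir, exist_ok=True)

print("✓ All libraries imported successfully")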

## 2. Data Loading and Cleaning

Loading three official UIDAI datasets:
1. **Aadhaar Enrolment Dataset**: New enrolments by age group
2. **Aadhaar Demographic Update Dataset**: Demographic changes (name, DOB, gender, address)
3. **Aadhaar Biometric Update Dataset**: Biometric updates by age group
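
The loader's cleaning rules are not shown in this notebook. Below is a minimal sketch of what cleaning the enrolment file could involve. It assumes that each raw CSV has `date`, `state` and `district` columns plus the three age-group count columns used later (`age_0_5`, `age_5_17`, `age_18_greater`), that dates are written day-first, and that `total_enrolments` is the sum of the age groups. `clean_enrolment` is a hypothetical helper, not part of `AadhaarDataLoader`.

```python
import pandas as pd

AGE_COLS = ['age_0_5', 'age_5_17', 'age_18_greater']

def clean_enrolment(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; AadhaarDataLoader's real rules may differ."""
    df = raw.copy()
    # Parse day-first dates; unparseable values become NaT and are dropped below
    df['date'] = pd.to_datetime(df['date'], errors='coerce', dayfirst=True)
    # Normalise region names so ' tamil nadu' and 'Tamil Nadu' group together
    for col in ['state', 'district']:
        df[col] = df[col].astype(str).str.strip().str.title()
    # Coerce counts to integers; missing or malformed counts are treated as 0
    df[AGE_COLS] = df[AGE_COLS].apply(pd.to_numeric, errors='coerce').fillna(0).astype(int)
    df = df.dropna(subset=['date'])
    # Assumed definition: total enrolments = sum of the three age-group columns
    df['total_enrolments'] = df[AGE_COLS].sum(axis=1)
    # Rows that become identical after normalisation are treated as duplicates
    return df.drop_duplicates()
```

Dropping duplicates only after the names have been normalised means that rows differing only in spacing or capitalisation collapse into a single row.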

In [None]:
# Initialize data loader
loader = AadhaarDataLoader(data_dir='../data/raw')

# Load all datasets
print("Loading datasets...\n")
enrolment_df, demographic_df, biometric_df = loader.load_all_data()

print("\n" + "="*80)
print("DATA LOADING COMPLETE")
print("="*80)

In [None]:
# Display dataset summaries
quick_data_summary(enrolment_df, "Enrolment Data")
quick_data_summary(demographic_df, "Demographic Update Data")
quick_data_summary(biometric_df, "Biometric Update Data")

In [None]:
# Save cleaned data
loader.save_cleaned_data(output_dir='../data/processed')
print("✓ Cleaned data saved to ../data/processed/")

## 3. Exploratory Data Analysis (EDA)

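The exploratory analysis covers:
- Univariate distributions of enrolment counts (3.1)
- Bivariate relationships: age-group correlations and state-level totals (3.2)
- Temporal patterns (3.3)
- Geographic patterns, including a state × time enrolment heatmap that provides the trivariate view (3.4)
- Age-group composition (3.5)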

### 3.1 Univariate Analysis

In [None]:
# Summary statistics for enrolments
print("ENROLMENT STATISTICS")
print("="*80)
print(enrolment_df[['total_enrolments', 'age_0_5', 'age_5_17', 'age_18_greater']].describe())

# Distribution plots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

axes[0, 0].hist(enrolment_df['total_enrolments'], bins=50, edgecolor='black', alpha=0.7, color='steelblue')
axes[0, 0].set_title('Total Enrolments Distribution', fontweight='bold', fontsize=12)
axes[0, 0].set_xlabel('Total Enrolments')
axes[0, 0].set_ylabel('Frequency')

axes[0, 1].hist(enrolment_df['age_0_5'], bins=50, edgecolor='black', alpha=0.7, color='coral')
axes[0, 1].set_title('Age 0-5 Enrolments Distribution', fontweight='bold', fontsize=12)
axes[0, 1].set_xlabel('Enrolments (Age 0-5)')
axes[0, 1].set_ylabel('Frequency')

axes[1, 0].hist(enrolment_df['age_5_17'], bins=50, edgecolor='black', alpha=0.7, color='lightgreen')
axes[1, 0].set_title('Age 5-17 Enrolments Distribution', fontweight='bold', fontsize=12)
axes[1, 0].set_xlabel('Enrolments (Age 5-17)')
axes[1, 0].set_ylabel('Frequency')

axes[1, 1].hist(enrolment_df['age_18_greater'], bins=50, edgecolor='black', alpha=0.7, color='gold')
axes[1, 1].set_title('Age 18+ Enrolments Distribution', fontweight='bold', fontsize=12)
axes[1, 1].set_xlabel('Enrolments (Age 18+)')
axes[1, 1].set_ylabel('Frequency')

plt.tight_layout()
plt.savefig('../outputs/figures/univariate_enrolments.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n✓ Univariate analysis complete")

### 3.2 Bivariate Analysis

In [None]:
# Correlation analysis
print("CORRELATION ANALYSIS")
print("="*80)

# Enrolment correlations
enrol_corr = enrolment_df[['age_0_5', 'age_5_17', 'age_18_greater', 'total_enrolments']].corr()

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(enrol_corr, annot=True, fmt='.3f', cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8}, ax=ax)
ax.set_title('Enrolment Age Group Correlations', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.savefig('../outputs/figures/correlation_matrix.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n✓ Bivariate analysis complete")

In [None]:
# State-wise enrolment analysis
state_enrolments = enrolment_df.groupby('state')['total_enrolments'].sum().sort_values(ascending=False).head(15)

fig = px.bar(x=state_enrolments.index, y=state_enrolments.values,
             labels={'x': 'State', 'y': 'Total Enrolments'},
             title='Top 15 States by Total Enrolments',
             color=state_enrolments.values,
             color_continuous_scale='Blues')
fig.update_layout(showlegend=False, height=500)
fig.write_html('../outputs/figures/state_enrolments.html')
fig.show()

print("\nTop 15 States by Enrolments:")
print(state_enrolments)

### 3.3 Temporal Analysis

In [None]:
# Time series of total enrolments
ts_enrolments = enrolment_df.groupby('date')['total_enrolments'].sum().reset_index()

fig = go.Figure()
fig.add_trace(go.Scatter(x=ts_enrolments['date'], y=ts_enrolments['total_enrolments'],
                        mode='lines+markers', name='Total Enrolments',
                        line=dict(color='steelblue', width=3)))
fig.update_layout(title='Total Enrolments Over Time',
                 xaxis_title='Date',
                 yaxis_title='Total Enrolments',
                 hovermode='x unified',
                 height=500)
fig.write_html('../outputs/figures/temporal_enrolments.html')
fig.show()

print("✓ Temporal analysis complete")

### 3.4 Geographic Analysis

In [None]:
# State-level heatmap
visualizer = AadhaarVisualizer(output_dir='../outputs/figures')

# Create heatmap
visualizer.plot_heatmap(enrolment_df, 'total_enrolments', 
                       title='State-wise Enrolment Heatmap Over Time',
                       save_name='heatmap_enrolments')
plt.show()

print("✓ Geographic analysis complete")

### 3.5 Age Group Analysis

In [None]:
# Age group distribution
age_totals = {
    'Age 0-5': enrolment_df['age_0_5'].sum(),
    'Age 5-17': enrolment_df['age_5_17'].sum(),
    'Age 18+': enrolment_df['age_18_greater'].sum()
}

fig = px.pie(values=list(age_totals.values()), names=list(age_totals.keys()),
            title='Enrolment Distribution by Age Group',
            color_discrete_sequence=px.colors.qualitative.Set3)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.write_html('../outputs/figures/age_distribution.html')
fig.show()

print("\nAge Group Distribution:")
for age, count in age_totals.items():
    print(f"{age}: {count:,} ({count/sum(age_totals.values())*100:.1f}%)")

## 4. Feature Engineering

Creating advanced metrics:
1. **Growth Rate** (MoM, YoY)
2. **Rolling Averages** (3m, 6m, 12m)
3. **Volatility Index**
4. **Seasonal Index**
5. **Update-to-Enrolment Ratio**
6. **Biometric Stress Index (BSI)**
7. **Demographic Volatility Index (DVI)**
8. **Aadhaar Fragility Index (AFI)** - Composite Metric
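
Section 4.1 gives no formula for the rolling averages (item 2). Below is a minimal sketch of trailing rolling means computed per district. It assumes one row per district per month and a `date` column. The output column names (for example `total_enrolments_rolling_mean_3m`) are hypothetical and may not match what `AadhaarFeatureEngineer` produces.

```python
import pandas as pd

def add_rolling_averages(df, value_col='total_enrolments', windows=(3, 6, 12),
                         group_cols=('state', 'district')):
    """Trailing rolling means per region, assuming one row per region per month."""
    out = df.sort_values(list(group_cols) + ['date']).copy()
    grouped = out.groupby(list(group_cols))[value_col]
    for w in windows:
        # min_periods=w leaves the first w-1 months of each region as NaN
        out[f'{value_col}_rolling_mean_{w}m'] = grouped.transform(
            lambda s: s.rolling(window=w, min_periods=w).mean()
        )
    return out
```

Grouping before rolling ensures that the window never spans two different districts.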

In [None]:
# Create all features
print("Creating advanced features...\n")

enrol_featured, demo_featured, bio_featured, merged_featured = \
    AadhaarFeatureEngineer.create_all_features(enrolment_df, demographic_df, biometric_df)

print("\n" + "="*80)
print("FEATURE ENGINEERING COMPLETE")
print("="*80)
print(f"\nEnrolment Features: {enrol_featured.shape}")
print(f"Demographic Features: {demo_featured.shape}")
print(f"Biometric Features: {bio_featured.shape}")
print(f"Merged Features: {merged_featured.shape}")

In [None]:
# Display sample of engineered features
print("\nSample of Engineered Features:")
print("="*80)
feature_cols = [col for col in merged_featured.columns if any(x in col for x in 
                ['growth', 'volatility', 'seasonal', 'ratio', 'index'])]
print(merged_featured[['date', 'state', 'district'] + feature_cols[:10]].head())

### 4.1 Metric Formulas and Interpretations

#### 1. Growth Rate
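**Formula**: `Growth Rate = ((Value_t - Value_(t-n)) / Value_(t-n)) × 100`, where n = 1 gives month-on-month (MoM) growth and n = 12 gives year-on-year (YoY) growth for monthly data.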

**Interpretation**: Percentage change in enrolments/updates over time. Positive values indicate growth, negative indicate decline.
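
Below is a sketch of the growth-rate formula applied within each district. It assumes one row per district per period and a `date` column. `growth_rate` is a hypothetical helper. A zero baseline makes the formula undefined, so this version returns NaN rather than ±inf in that case.

```python
import numpy as np
import pandas as pd

def growth_rate(df, value_col='total_enrolments', n=1, group_cols=('state', 'district')):
    """Percentage change over n periods within each region (n=1: MoM, n=12: YoY on monthly data)."""
    df = df.sort_values(list(group_cols) + ['date'])
    # shift(n) inside each group so one district's history never feeds another's
    prev = df.groupby(list(group_cols))[value_col].shift(n)
    rate = (df[value_col] - prev) / prev * 100
    # A zero baseline gives +/-inf; report it as undefined (NaN) instead
    return rate.replace([np.inf, -np.inf], np.nan)

# Usage (result aligns on the original index):
# enrol['total_enrolments_growth_rate_1m'] = growth_rate(enrol, n=1)
```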

#### 2. Volatility Index
**Formula**: `Volatility = StdDev(Growth_Rate) over rolling window`

**Interpretation**: Measures instability in demand. Higher values indicate unpredictable patterns.
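
Below is a sketch of the volatility formula. It takes the rolling sample standard deviation of an existing growth-rate column within each district. The 6-period default window mirrors the `total_enrolments_volatility_6m` column used elsewhere in this notebook. Whether `AadhaarFeatureEngineer` computes that column in exactly this way is an assumption.

```python
import pandas as pd

def add_volatility(df, growth_col='total_enrolments_growth_rate_1m',
                   out_col='total_enrolments_volatility_6m', window=6,
                   group_cols=('state', 'district')):
    """Rolling sample standard deviation of the growth rate within each region."""
    df = df.sort_values(list(group_cols) + ['date']).copy()
    # min_periods=window: volatility is only reported once a full window is available
    df[out_col] = (df.groupby(list(group_cols))[growth_col]
                     .transform(lambda s: s.rolling(window, min_periods=window).std()))
    return df
```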

#### 3. Seasonal Index
**Formula**: `Seasonal Index = (Actual Value / 12-month MA) × 100`

**Interpretation**: Values >100 indicate above-average activity, <100 indicate below-average.
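
Below is a sketch of the seasonal index using a trailing 12-period moving average per district. The formula does not say whether the moving average is trailing or centred. Classical decomposition typically uses a centred average, so the trailing choice here is an assumption. `add_seasonal_index` and its output column name are hypothetical.

```python
import numpy as np
import pandas as pd

def add_seasonal_index(df, value_col='total_enrolments', window=12,
                       group_cols=('state', 'district')):
    """Seasonal Index = value / trailing `window`-period moving average x 100."""
    df = df.sort_values(list(group_cols) + ['date']).copy()
    ma = (df.groupby(list(group_cols))[value_col]
            .transform(lambda s: s.rolling(window, min_periods=window).mean()))
    si = df[value_col] / ma * 100
    # A zero moving average would give inf; treat it as undefined instead
    df[f'{value_col}_seasonal_index'] = si.replace([np.inf, -np.inf], np.nan)
    return df
```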

#### 4. Update-to-Enrolment Ratio
**Formula**: `Ratio = (Total Updates / Total Enrolments) × 100`

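**Interpretation**: A ratio above 100 means that a region processes more updates than new enrolments. Its workload is therefore dominated by maintaining existing Aadhaar records rather than issuing new ones.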
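
Below is a small worked sketch of the ratio. It assumes that `total_updates` is the sum of demographic and biometric updates. The notebook does not state this definition, and the input numbers are illustrative only. Regions with zero enrolments get NaN instead of an infinite ratio.

```python
import numpy as np
import pandas as pd

def update_to_enrolment_ratio(total_updates: pd.Series,
                              total_enrolments: pd.Series) -> pd.Series:
    """(Total Updates / Total Enrolments) x 100, NaN where enrolments are zero."""
    # Replacing 0 with NaN avoids infinite ratios for regions with no new enrolments
    return total_updates / total_enrolments.replace(0, np.nan) * 100

# Illustrative inputs; total_updates assumed to be demographic + biometric updates
demo = pd.Series([120, 40, 10])
bio = pd.Series([180, 20, 5])
enrol = pd.Series([150, 120, 0])
ratio = update_to_enrolment_ratio(demo + bio, enrol)
```

In the first region, 300 updates against 150 enrolments gives a ratio of 200. That region handles twice as many updates as new enrolments.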

#### 5. Biometric Stress Index (BSI)
**Formula**: `BSI = 0.5×(Bio_Update_Rate) + 0.3×(Bio_Volatility) + 0.2×(Age_Concentration)`

**Interpretation**: Composite score indicating biometric system stress. Higher values indicate regions under pressure.

#### 6. Demographic Volatility Index (DVI)
**Formula**: `DVI = 0.6×(Demo_Update_Freq) + 0.4×(Demo_Volatility)`

**Interpretation**: Measures demographic instability. High values indicate frequent demographic changes.
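
BSI and DVI share one technique: a weighted sum of components. The formulas do not say how the components are scaled. The sketch below assumes that each component is min-max rescaled to 0-100 across all rows before weighting, so the resulting scores are relative to the regions in the frame and not absolute. The column names are hypothetical snake_case versions of the formula terms.

```python
import math
import pandas as pd

def minmax_0_100(s: pd.Series) -> pd.Series:
    """Rescale a component to 0-100 across all rows; a constant column maps to 0."""
    lo, hi = s.min(), s.max()
    if hi == lo:
        return pd.Series(0.0, index=s.index)
    return (s - lo) / (hi - lo) * 100

def weighted_index(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted sum of normalised components; weights must sum to 1."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * minmax_0_100(df[col]) for col, w in weights.items())

# Weights from the BSI and DVI formulas; column names are hypothetical
BSI_WEIGHTS = {'bio_update_rate': 0.5, 'bio_volatility': 0.3, 'age_concentration': 0.2}
DVI_WEIGHTS = {'demo_update_freq': 0.6, 'demo_volatility': 0.4}

regions = pd.DataFrame({
    'bio_update_rate': [0.0, 5.0, 10.0],
    'bio_volatility': [2.0, 4.0, 6.0],
    'age_concentration': [0.1, 0.2, 0.3],
    'demo_update_freq': [1.0, 1.0, 3.0],
    'demo_volatility': [0.0, 2.0, 4.0],
})
regions['bsi'] = weighted_index(regions, BSI_WEIGHTS)
regions['dvi'] = weighted_index(regions, DVI_WEIGHTS)
```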

#### 7. Aadhaar Fragility Index (AFI)
**Formula**: `AFI = 0.3×BSI + 0.25×DVI + 0.25×(Update_Ratio) + 0.2×(Growth_Volatility)`

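**Interpretation** (each band includes its upper bound, consistent with the AFI > 50 and AFI > 75 alerts in Section 9.2):
- 0-25: Low Fragility (Stable)
- Above 25 up to 50: Moderate Fragility (Monitor)
- Above 50 up to 75: High Fragility (Intervention Needed)
- Above 75 up to 100: Critical Fragility (Urgent Action Required)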
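
Below is a sketch of the AFI and its category bands. It assumes that all four components are already on a 0-100 scale, which the 0-100 bands imply but the formula does not state. The component column names are hypothetical. `pd.cut` with its default right-closed bins makes AFI > 50 "High" and AFI > 75 "Critical".

```python
import pandas as pd

# Weights from the AFI formula; column names are hypothetical
AFI_WEIGHTS = {'bsi': 0.30, 'dvi': 0.25, 'update_ratio': 0.25, 'growth_volatility': 0.20}

def aadhaar_fragility_index(components: pd.DataFrame) -> pd.Series:
    """Weighted AFI, assuming every component is already on a 0-100 scale."""
    afi = sum(w * components[col] for col, w in AFI_WEIGHTS.items())
    # Clip guards against floating-point drift just outside [0, 100]
    return afi.clip(0, 100)

def fragility_category(afi: pd.Series) -> pd.Series:
    """Right-closed bands: (25, 50] is Moderate, (50, 75] High, (75, 100] Critical."""
    return pd.cut(afi, bins=[0, 25, 50, 75, 100],
                  labels=['Low', 'Moderate', 'High', 'Critical'],
                  include_lowest=True)

components = pd.DataFrame({
    'bsi': [10, 80, 100],
    'dvi': [10, 60, 100],
    'update_ratio': [10, 40, 100],
    'growth_volatility': [10, 20, 100],
})
afi = aadhaar_fragility_index(components)
categories = fragility_category(afi)
```

For the second region, 0.30×80 + 0.25×60 + 0.25×40 + 0.20×20 = 53, which places it in the High band.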

In [None]:
# Visualize key metrics
if 'aadhaar_fragility_index' in merged_featured.columns:
    fig = visualizer.plot_distribution(merged_featured, 'aadhaar_fragility_index',
                                      title='Aadhaar Fragility Index Distribution',
                                      save_name='afi_distribution')
    plt.show()
    
    # Fragility category breakdown
    fragility_counts = merged_featured['fragility_category'].value_counts()
    print("\nFragility Category Distribution:")
    print(fragility_counts)
    
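    # Map colours by category name: value_counts() sorts by frequency, so a
    # positional colour sequence would not line up with the categories
    fig = px.pie(values=fragility_counts.values, names=fragility_counts.index,
                 color=fragility_counts.index,
                 title='Records by Fragility Level',
                 color_discrete_map={'Low': 'green', 'Moderate': 'yellow',
                                     'High': 'orange', 'Critical': 'red'})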
    fig.write_html('../outputs/figures/fragility_categories.html')
    fig.show()

## 5. Anomaly Detection

Implementing multiple anomaly detection techniques:
1. **Rolling Z-score**: Statistical outlier detection
2. **STL Decomposition**: Seasonal-Trend-Residual analysis
3. **Isolation Forest**: ML-based outlier detection
4. **Local Outlier Factor (LOF)**: Density-based detection
5. **Change-point Detection**: Structural break identification
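
`AnomalyDetector`'s internals are not shown here. Below is one common variant of the rolling z-score method (item 1), using the same threshold of 3 as the detector configured below. The baseline is the mean and standard deviation of the preceding `window` points. It is shifted by one step so that a spike does not inflate its own baseline. `rolling_zscore_flags` is a hypothetical helper.

```python
import pandas as pd

def rolling_zscore_flags(values: pd.Series, window: int = 6, threshold: float = 3.0) -> pd.Series:
    """Return 1 where a point lies more than `threshold` SDs from its trailing baseline."""
    # shift(1): the baseline uses only the `window` points before the current one
    baseline = values.shift(1).rolling(window, min_periods=window)
    z = (values - baseline.mean()) / baseline.std()
    # NaN z-scores (too little history) compare False, so those points are not flagged;
    # a zero-SD baseline gives inf for any deviation, which is flagged
    return (z.abs() > threshold).astype(int)
```

To apply the method per district, use `df.groupby(['state', 'district'])['total_enrolments'].transform(rolling_zscore_flags)` on a frame sorted by date.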

In [None]:
# Initialize anomaly detector
detector = AnomalyDetector(threshold=3)

# Run all anomaly detection methods
print("Running anomaly detection...\n")
enrol_anomalies = detector.detect_all_anomalies(
    enrol_featured, 
    value_col='total_enrolments',
    feature_cols=['total_enrolments', 'total_enrolments_growth_rate_1m', 
                 'total_enrolments_volatility_6m', 'total_enrolments_seasonal_index']
)

print("\n" + "="*80)
print("ANOMALY DETECTION COMPLETE")
print("="*80)

In [None]:
# Anomaly summary
anomaly_summary = detector.get_anomaly_summary()
print("\nAnomaly Detection Summary:")
print(anomaly_summary)

In [None]:
# Visualize anomalies
# Select top states for visualization
top_states = enrol_anomalies.groupby('state')['total_enrolments'].sum().nlargest(5).index
enrol_top = enrol_anomalies[enrol_anomalies['state'].isin(top_states)]

fig = visualizer.plot_time_series(
    enrol_top,
    value_col='total_enrolments',
    title='Enrolments Over Time with Anomalies (Top 5 States)',
    group_col='state',
    anomaly_col='is_anomaly',
    save_name='anomalies_time_series'
)
fig.show()

print("✓ Anomaly visualization complete")

In [None]:
# Analyze anomalous regions
anomalous_regions = enrol_anomalies[enrol_anomalies['is_anomaly'] == 1]
print(f"\nTotal Anomalous Records: {len(anomalous_regions)}")
print(f"Affected States: {anomalous_regions['state'].nunique()}")
print(f"Affected Districts: {anomalous_regions['district'].nunique()}")

# Top anomalous states
top_anomalous = anomalous_regions.groupby('state').size().sort_values(ascending=False).head(10)
print("\nTop 10 States with Most Anomalies:")
print(top_anomalous)

## 6. Predictive Modeling

### 6.1 Demand Forecasting

In [None]:
# Initialize forecaster
forecaster = DemandForecaster()

# Prepare time series data
ts_data = forecaster.prepare_time_series(enrol_featured, 'total_enrolments', group_cols=['state'])

print("Time series data prepared for forecasting")
print(f"Shape: {ts_data.shape}")
print(f"Date range: {ts_data['date'].min()} to {ts_data['date'].max()}")

In [None]:
# ARIMA Forecasting (top 5 states)
top_states_forecast = ts_data[ts_data['state'].isin(top_states)]

print("Running ARIMA forecasting...\n")
arima_forecast = forecaster.arima_forecast(
    top_states_forecast,
    value_col='total_enrolments',
    periods=6,
    order=(1, 1, 1),
    group_col='state'
)

print("\n✓ ARIMA forecasting complete")
print(f"Forecast shape: {arima_forecast.shape}")

In [None]:
# Prophet Forecasting
print("Running Prophet forecasting...\n")
prophet_forecast = forecaster.prophet_forecast(
    top_states_forecast,
    value_col='total_enrolments',
    periods=6,
    group_col='state'
)

print("\n✓ Prophet forecasting complete")
if len(prophet_forecast) > 0:
    print(f"Forecast shape: {prophet_forecast.shape}")

In [None]:
# Visualize forecasts
if len(arima_forecast) > 0:
    fig = visualizer.plot_forecast(
        top_states_forecast,
        arima_forecast,
        value_col='total_enrolments',
        title='6-Month Enrolment Forecast (ARIMA)',
        group_col='state',
        save_name='arima_forecast'
    )
    fig.show()

if len(prophet_forecast) > 0:
    fig = visualizer.plot_forecast(
        top_states_forecast,
        prophet_forecast,
        value_col='total_enrolments',
        title='6-Month Enrolment Forecast (Prophet)',
        group_col='state',
        save_name='prophet_forecast'
    )
    fig.show()

print("✓ Forecast visualization complete")

### 6.2 Risk Classification

In [None]:
# Initialize risk classifier
classifier = RiskClassifier()

# Create risk labels
merged_risk = classifier.create_risk_labels(merged_featured, risk_col='aadhaar_fragility_index')

print("Risk labels created")
print("\nRisk Distribution:")
print(merged_risk['risk_label'].value_counts())

In [None]:
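# Train risk classifier
# Caveat: risk_label is derived from the AFI, and update_to_enrolment_ratio, BSI and DVI
# are inputs to the AFI formula, so strong performance partly reflects how the label is built
feature_cols_risk = ['total_enrolments', 'total_updates', 'update_to_enrolment_ratio',
                    'biometric_stress_index', 'demographic_volatility_index']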

# Filter valid features
feature_cols_risk = [col for col in feature_cols_risk if col in merged_risk.columns]

if len(feature_cols_risk) > 0:
    model, feature_importance = classifier.train_risk_classifier(
        merged_risk,
        feature_cols=feature_cols_risk,
        target_col='risk_label'
    )
    
    # Visualize feature importance
    fig = px.bar(feature_importance.head(10), x='importance', y='feature',
                orientation='h', title='Top 10 Risk Prediction Features',
                color='importance', color_continuous_scale='Reds')
    fig.update_layout(showlegend=False, height=500)
    fig.write_html('../outputs/figures/feature_importance.html')
    fig.show()

### 6.3 Regional Clustering

In [None]:
# Initialize clusterer
clusterer = RegionClusterer()

# Prepare clustering features
cluster_features = ['total_enrolments', 'total_updates', 'update_to_enrolment_ratio']
cluster_features = [col for col in cluster_features if col in merged_featured.columns]

# Aggregate by state for clustering
state_agg = merged_featured.groupby('state')[cluster_features].mean().reset_index()

print(f"Clustering {len(state_agg)} states based on {len(cluster_features)} features")

In [None]:
# KMeans clustering
state_clustered = clusterer.kmeans_clustering(state_agg, cluster_features, n_clusters=4)

print("\nClustered States:")
for cluster in sorted(state_clustered['cluster_kmeans'].unique()):
    states_in_cluster = state_clustered[state_clustered['cluster_kmeans'] == cluster]['state'].tolist()
    print(f"\nCluster {cluster}: {len(states_in_cluster)} states")
    print(f"States: {', '.join(states_in_cluster[:10])}{'...' if len(states_in_cluster) > 10 else ''}")

In [None]:
# Visualize clusters
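if len(cluster_features) >= 2:
    # Cast cluster IDs to strings so Plotly uses discrete colours, not a continuous scale
    plot_df = state_clustered.assign(cluster_kmeans=state_clustered['cluster_kmeans'].astype(str))
    fig = px.scatter(plot_df, x=cluster_features[0], y=cluster_features[1],
                    color='cluster_kmeans', hover_data=['state'],
                    title='State Clustering (KMeans)',
                    labels={'cluster_kmeans': 'Cluster'})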
    fig.update_layout(height=600)
    fig.write_html('../outputs/figures/state_clusters.html')
    fig.show()

## 7. Advanced Visualizations

In [None]:
# Risk ranking visualization
if 'aadhaar_fragility_index' in merged_featured.columns:
    latest_data = merged_featured.sort_values('date').groupby(['state', 'district']).tail(1)
    
    fig = visualizer.plot_risk_ranking(
        latest_data,
        risk_col='aadhaar_fragility_index',
        top_n=20,
        save_name='risk_ranking'
    )
    fig.show()

In [None]:
# Bubble chart: Multi-dimensional analysis
if all(col in merged_featured.columns for col in ['total_enrolments', 'total_updates', 
                                                   'update_to_enrolment_ratio', 'aadhaar_fragility_index']):
    # Fixed seed so the sampled bubble chart is reproducible across runs
    sample_data = merged_featured.sample(min(1000, len(merged_featured)), random_state=42)
    
    fig = visualizer.plot_bubble_chart(
        sample_data,
        x_col='total_enrolments',
        y_col='total_updates',
        size_col='update_to_enrolment_ratio',
        color_col='aadhaar_fragility_index',
        title='Multi-dimensional Analysis: Enrolments vs Updates vs Fragility',
        save_name='bubble_chart'
    )
    fig.show()

In [None]:
# Dashboard summary
dashboard_fig = visualizer.create_dashboard_summary(merged_featured)
dashboard_fig.write_html('../outputs/figures/dashboard_summary.html')
dashboard_fig.show()

print("✓ All visualizations complete")

## 8. Key Insights and Findings

### 8.1 Enrolment Patterns

In [None]:
print("KEY INSIGHTS")
print("="*80)

# Total statistics
print("\n1. OVERALL STATISTICS")
print(f"   Total Enrolments: {enrolment_df['total_enrolments'].sum():,}")
print(f"   Total States: {enrolment_df['state'].nunique()}")
print(f"   Total Districts: {enrolment_df['district'].nunique()}")
print(f"   Date Range: {enrolment_df['date'].min()} to {enrolment_df['date'].max()}")

# Growth trends
if 'total_enrolments_growth_rate_1m' in enrol_featured.columns:
    # Zero baselines can yield ±inf growth rates; exclude them from the summary statistics
    growth = enrol_featured['total_enrolments_growth_rate_1m'].replace([np.inf, -np.inf], np.nan)
    print("\n2. GROWTH TRENDS")
    print(f"   Average Monthly Growth Rate: {growth.mean():.2f}%")
    print(f"   Max Monthly Growth: {growth.max():.2f}%")
    print(f"   Min Monthly Growth: {growth.min():.2f}%")

# Volatility
if 'total_enrolments_volatility_6m' in enrol_featured.columns:
    avg_volatility = enrol_featured['total_enrolments_volatility_6m'].mean()
    print("\n3. VOLATILITY")
    print(f"   Average Volatility Index: {avg_volatility:.2f}")
    high_volatility = enrol_featured[enrol_featured['total_enrolments_volatility_6m'] > 
                                    enrol_featured['total_enrolments_volatility_6m'].quantile(0.9)]
    print(f"   States with records in the top volatility decile: {high_volatility['state'].nunique()}")

# Fragility
if 'aadhaar_fragility_index' in merged_featured.columns:
    print("\n4. FRAGILITY ASSESSMENT")
    fragility_stats = merged_featured.groupby('fragility_category').size()
    for category, count in fragility_stats.items():
        # Each row is one district-period record, so these are record counts, not distinct regions
        print(f"   {category} Fragility: {count:,} records ({count/len(merged_featured)*100:.1f}%)")

print("\n" + "="*80)

## 9. Decision Support Framework

### 9.1 Policy Recommendations

In [None]:
print("POLICY RECOMMENDATIONS")
print("="*80)

# Identify high-risk regions
if 'aadhaar_fragility_index' in merged_featured.columns:
    critical_regions = merged_featured[merged_featured['fragility_category'] == 'Critical']
    high_risk_regions = merged_featured[merged_featured['fragility_category'] == 'High']
    
    print("\n1. IMMEDIATE INTERVENTION REQUIRED")
    # Rows are district-period records, so count distinct state-district pairs
    print(f"   Critical Fragility Districts: {len(critical_regions[['state', 'district']].drop_duplicates())}")
    if len(critical_regions) > 0:
        # Keep each district's highest AFI so the top 5 are five different districts
        top_critical = (critical_regions
                        .sort_values('aadhaar_fragility_index', ascending=False)
                        .drop_duplicates(subset=['state', 'district'])
                        .head(5)[['state', 'district', 'aadhaar_fragility_index']])
        print("\n   Top 5 Critical Districts:")
        for _, row in top_critical.iterrows():
            print(f"   - {row['state']}, {row['district']}: AFI = {row['aadhaar_fragility_index']:.2f}")
        print("\n   Recommended Actions:")
        print("   • Deploy additional enrolment centers")
        print("   • Increase staff capacity")
        print("   • Implement mobile enrolment units")
        print("   • Provide alternative authentication mechanisms")
    
    print("\n2. MONITORING REQUIRED")
    print(f"   High Fragility Districts: {len(high_risk_regions[['state', 'district']].drop_duplicates())}")
    print("\n   Recommended Actions:")
    print("   • Increase monitoring frequency")
    print("   • Prepare contingency plans")
    print("   • Conduct awareness campaigns")
    print("   • Optimize resource allocation")

# Anomaly-based recommendations
if 'is_anomaly' in enrol_anomalies.columns:
    print("\n3. ANOMALY RESPONSE")
    print(f"   States with Anomalies: {enrol_anomalies[enrol_anomalies['is_anomaly']==1]['state'].nunique()}")
    print("\n   Recommended Actions:")
    print("   • Investigate root causes of anomalies")
    print("   • Check for system errors or data quality issues")
    print("   • Assess impact of policy changes")
    print("   • Implement early warning systems")

print("\n4. CAPACITY PLANNING")
print("   Based on forecasts:")
print("   • Prepare for projected demand surges")
print("   • Allocate resources to high-growth regions")
print("   • Plan infrastructure expansion")
print("   • Train additional personnel")

print("\n5. SYSTEM OPTIMIZATION")
print("   • Reduce biometric update frequency where possible")
print("   • Streamline demographic update processes")
print("   • Implement predictive maintenance")
print("   • Enhance digital infrastructure in fragile regions")

print("\n" + "="*80)

### 9.2 Early Warning Framework

In [None]:
print("EARLY WARNING FRAMEWORK")
print("="*80)

print("\nTHRESHOLDS FOR ALERTS:")
print("\n1. Fragility Index Alerts")
print("   • Yellow Alert: AFI > 50 (High Fragility)")
print("   • Red Alert: AFI > 75 (Critical Fragility)")

print("\n2. Growth Rate Alerts")
print("   • Surge Alert: Growth > 50% MoM")
print("   • Decline Alert: Growth < -20% MoM")

print("\n3. Volatility Alerts")
print("   • High Volatility: Volatility Index > 75th percentile")
print("   • Extreme Volatility: Volatility Index > 90th percentile")

print("\n4. Anomaly Alerts")
print("   • Single Method Detection: Monitor")
print("   • Multiple Method Detection: Investigate")
print("   • Persistent Anomalies: Urgent Action")

print("\nMONITORING FREQUENCY:")
print("   • Critical Regions: Daily")
print("   • High-Risk Regions: Weekly")
print("   • Moderate-Risk Regions: Monthly")
print("   • Low-Risk Regions: Quarterly")

print("\n" + "="*80)
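
The thresholds above can be expressed as row-level rules. The sketch below assumes a frame with hypothetical columns `afi` (0-100), `growth_mom` (percent), `volatility`, and `n_methods_flagged` (the number of detectors that flagged the row). The volatility percentiles are computed over whatever rows are passed in. The "Persistent Anomalies" rule is left out because the framework does not quantify persistence.

```python
import numpy as np
import pandas as pd

def early_warning_alerts(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the framework's alert thresholds to each row (hypothetical column names)."""
    out = pd.DataFrame(index=df.index)
    # np.select takes the first matching condition, so the stricter threshold goes first
    out['afi_alert'] = np.select([df['afi'] > 75, df['afi'] > 50],
                                 ['Red', 'Yellow'], default='None')
    out['growth_alert'] = np.select([df['growth_mom'] > 50, df['growth_mom'] < -20],
                                    ['Surge', 'Decline'], default='None')
    # Percentile thresholds are relative to the rows passed in
    p75, p90 = df['volatility'].quantile([0.75, 0.90])
    out['volatility_alert'] = np.select([df['volatility'] > p90, df['volatility'] > p75],
                                        ['Extreme', 'High'], default='None')
    out['anomaly_action'] = np.select([df['n_methods_flagged'] >= 2,
                                       df['n_methods_flagged'] == 1],
                                      ['Investigate', 'Monitor'], default='None')
    return out
```

Keeping each rule in its own column lets a dashboard combine alerts (for example, a Red AFI alert together with a Surge alert) without losing which threshold fired.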

## 10. Conclusion

### Summary

This analysis has provided a comprehensive view of India's Aadhaar ecosystem through:

1. **Deep Data Analysis**: Explored enrolment and update patterns across time, geography, and demographics

2. **Advanced Metrics**: Created 8 custom metrics, including composite indices such as the Aadhaar Fragility Index (AFI), to quantify system stress

3. **Anomaly Detection**: Identified unusual patterns using 5 different detection methods

4. **Predictive Models**: Forecasted demand and classified risk levels for proactive planning

5. **Actionable Insights**: Provided specific policy recommendations and early warning frameworks

### Impact

This observatory enables:
- **Proactive Resource Allocation**: Identify regions needing support before crises occur
- **Risk Mitigation**: Reduce Aadhaar-based exclusion through early intervention
- **System Optimization**: Improve efficiency and reduce stress on the Aadhaar infrastructure
- **Evidence-Based Policy**: Make data-driven decisions for better outcomes

### Future Work

1. **Real-time Monitoring**: Implement live dashboard with automated alerts
2. **Deep Learning**: Evaluate whether LSTM networks improve long-term forecast accuracy over ARIMA and Prophet
3. **Causal Analysis**: Identify root causes of fragility and anomalies
4. **Integration**: Connect with other government databases for holistic analysis
5. **Mobile App**: Create field app for on-ground data collection and intervention tracking

In [None]:
# Save final datasets
print("Saving final datasets...")

enrol_anomalies.to_csv('../data/processed/enrolment_with_anomalies.csv', index=False)
merged_featured.to_csv('../data/processed/merged_with_features.csv', index=False)

if len(arima_forecast) > 0:
    arima_forecast.to_csv('../outputs/models/arima_forecast.csv', index=False)

if len(prophet_forecast) > 0:
    prophet_forecast.to_csv('../outputs/models/prophet_forecast.csv', index=False)

print("\n" + "="*80)
print("ANALYSIS COMPLETE")
print("="*80)
print("\nAll outputs saved to:")
print("  • Processed data: ../data/processed/")
print("  • Visualizations: ../outputs/figures/")
print("  • Models: ../outputs/models/")
print("\nThank you for using the Aadhaar Observatory!")