# Derived Indicators Analysis

This notebook calculates and analyzes custom derived indicators for Aadhaar data.

## Indicators Calculated:
1. **Update Pressure Index (UPI)**: Total Updates / Total Enrollments
2. **Age Transition Rate**: Transition from child to adult category
3. **Demographic/Biometric Ratio**: Balance between update types
4. **Growth Rates**: Period-over-period changes
5. **Coverage Metrics**: Enrollment penetration
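
The ratio-style indicators above can be sketched as follows. This is a minimal illustration, not the code in `utils.indicators`: the helper `safe_ratio` is hypothetical, and the column names `demographic_updates` and `biometric_updates` are assumptions. A zero denominator is mapped to NaN so the ratio does not become infinite.

```python
import numpy as np
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Element-wise ratio that yields NaN where the denominator is zero."""
    return numerator / denominator.replace(0, np.nan)


# Toy data; column names are assumed for illustration only.
toy = pd.DataFrame({
    "district": ["A", "B", "C"],
    "total_enrolment": [1000, 0, 250],
    "demographic_updates": [300, 40, 100],
    "biometric_updates": [100, 10, 0],
})

toy["total_updates"] = toy["demographic_updates"] + toy["biometric_updates"]

# UPI = total updates / total enrollments
toy["update_pressure_index"] = safe_ratio(toy["total_updates"], toy["total_enrolment"])

# Demographic/biometric ratio = demographic updates / biometric updates
toy["demo_bio_ratio"] = safe_ratio(toy["demographic_updates"], toy["biometric_updates"])

print(toy)
```

District B has no enrollments and district C has no biometric updates, so the affected ratios come out as NaN rather than `inf`.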

In [None]:
import sys
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from utils.indicators import (
    calculate_all_indicators,
    get_top_districts,
    create_indicator_summary
)

print("✓ Libraries and utilities imported")

## 1. Load Data

In [None]:
df = pd.read_csv('../data/processed/unified_analytical_table.csv')
df['date'] = pd.to_datetime(df['date'])

print(f"Loaded {len(df):,} records")
df.head()

## 2. Calculate All Indicators

In [None]:
# Calculate all derived indicators
df_indicators = calculate_all_indicators(df, include_growth=True)

print("\n✅ Indicators calculated successfully!")
df_indicators.head()

## 3. Indicator Summary Statistics

In [None]:
# Generate summary
summary = create_indicator_summary(df_indicators)
print("\n📊 Indicator Summary Statistics:\n")
summary

## 4. Update Pressure Index Analysis

In [None]:
# Top districts by Update Pressure Index
top_upi_districts = get_top_districts(df_indicators, metric='update_pressure_index', n=15)

fig = px.bar(
    top_upi_districts,
    x='update_pressure_index',
    y='district',
    orientation='h',
    title='Top 15 Districts by Update Pressure Index',
    labels={'update_pressure_index': 'Update Pressure Index', 'district': 'District'},
    color='update_pressure_index',
    color_continuous_scale='Reds'
)

fig.update_layout(height=600)
fig.show()

print("\n🎯 Policy Insight:")
print("High UPI indicates districts with heavy update demand relative to enrollment base.")
print("These districts need additional staffing and resources for update processing.")

## 5. Temporal Evolution of Indicators

In [None]:
# Monthly evolution
monthly_indicators = df_indicators.groupby(df_indicators['date'].dt.to_period('M')).agg({
    'update_pressure_index': 'mean',
    'total_enrolment': 'sum',
    'total_updates': 'sum'
}).reset_index()

monthly_indicators['date'] = monthly_indicators['date'].dt.to_timestamp()

fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Update Pressure Index trend
axes[0].plot(monthly_indicators['date'], monthly_indicators['update_pressure_index'],
             marker='o', linewidth=2, markersize=6, color='darkred')
axes[0].set_title('Update Pressure Index Over Time', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Average UPI')
axes[0].grid(True, alpha=0.3)

# Enrollment vs Updates
ax2 = axes[1].twinx()
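# Bar width is in days on a datetime axis; the default (0.8) renders monthly bars as thin slivers
axes[1].bar(monthly_indicators['date'], monthly_indicators['total_enrolment'],
            width=20, alpha=0.7, label='Total Enrollments', color='blue')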
ax2.plot(monthly_indicators['date'], monthly_indicators['total_updates'],
         marker='s', linewidth=2, markersize=6, color='orange', label='Total Updates')

axes[1].set_title('Enrollments vs Updates Over Time', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Total Enrollments', color='blue')
ax2.set_ylabel('Total Updates', color='orange')
axes[1].legend(loc='upper left')
ax2.legend(loc='upper right')

plt.tight_layout()
plt.show()
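
Note that the monthly UPI plotted here is the mean of row-level ratios, which can differ from the ratio of monthly totals when rows have very different enrollment bases. The sketch below uses made-up toy numbers to show the gap; the column names `mean_of_ratios` and `ratio_of_sums` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy records: one tiny row (10 enrollments) with a very high ratio in January.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10", "2024-02-25"]),
    "total_enrolment": [1000, 10, 500, 500],
    "total_updates": [100, 10, 50, 150],
})
toy["update_pressure_index"] = toy["total_updates"] / toy["total_enrolment"]

monthly = toy.groupby(toy["date"].dt.to_period("M")).agg(
    mean_of_ratios=("update_pressure_index", "mean"),
    total_enrolment=("total_enrolment", "sum"),
    total_updates=("total_updates", "sum"),
)

# Volume-weighted alternative: divide the monthly totals.
monthly["ratio_of_sums"] = monthly["total_updates"] / monthly["total_enrolment"]

print(monthly[["mean_of_ratios", "ratio_of_sums"]])
```

In the toy January data the small row pulls the mean of ratios to 0.55, while the ratio of sums stays near 0.11. Which of the two is appropriate depends on whether every row should count equally or be weighted by volume.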

## 6. State-wise Indicator Comparison

In [None]:
# State comparison
state_indicators = df_indicators.groupby('state').agg({
    'update_pressure_index': 'mean',
    'total_enrolment': 'sum',
    'total_updates': 'sum',
    'demo_bio_ratio': 'mean'
}).reset_index()

state_indicators = state_indicators.sort_values('update_pressure_index', ascending=False).head(10)

fig = px.scatter(
    state_indicators,
    x='total_enrolment',
    y='total_updates',
    size='update_pressure_index',
    color='state',
    title='Top 10 States by UPI: Enrollments vs Updates (bubble size = UPI)',
    labels={'total_enrolment': 'Total Enrollments', 'total_updates': 'Total Updates'},
    hover_data=['update_pressure_index']
)

fig.update_layout(height=600)
fig.show()

## 7. Growth Rate Analysis

In [None]:
# Analyze growth rates
growth_cols = [col for col in df_indicators.columns if '_growth' in col]

if growth_cols:
    growth_summary = df_indicators[growth_cols].describe()
    print("\n📈 Growth Rate Summary:\n")
    print(growth_summary)
    
    # Plot growth distribution
    fig, axes = plt.subplots(1, len(growth_cols), figsize=(15, 5))
    
    if len(growth_cols) == 1:
        axes = [axes]
    
    for idx, col in enumerate(growth_cols):
        axes[idx].hist(df_indicators[col].dropna(), bins=30, edgecolor='black')
        axes[idx].set_title(f'{col.replace("_", " ").title()}', fontsize=10)
        axes[idx].set_xlabel('Growth Rate (%)')
        axes[idx].set_ylabel('Frequency')
    
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No growth rate columns found")

## 8. Key Findings

### Derived Indicator Insights:

1. **Update Pressure Index (UPI)**:
   - Identifies districts with high update-to-enrollment ratios
   - Critical for resource allocation and capacity planning
   - Temporal variation in UPI may point to seasonal patterns in update demand

2. **Growth Rates**:
   - Help identify emerging trends and changing patterns
   - Useful for predicting future resource needs
   - Can detect sudden spikes or drops requiring investigation

3. **Demographic/Biometric Ratio**:
   - Shows balance between different types of updates
   - Helps in workload distribution across centers
   - Can indicate shifts in update priorities
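
As a rough illustration of how period-over-period growth can surface sudden spikes or drops, the sketch below computes growth per district and flags large moves. It is an assumption-laden example: how `calculate_all_indicators` derives its `_growth` columns is not shown in this notebook, and both the `total_updates_growth` column name and the 100% threshold are arbitrary choices made here.

```python
import numpy as np
import pandas as pd

# Toy monthly totals for two districts.
toy = pd.DataFrame({
    "district": ["A"] * 4 + ["B"] * 4,
    "date": list(pd.date_range("2024-01-01", periods=4, freq="MS")) * 2,
    "total_updates": [100, 110, 330, 300, 50, 0, 20, 22],
})

toy = toy.sort_values(["district", "date"])

# Previous period's value within each district.
previous = toy.groupby("district")["total_updates"].shift(1)

# Percentage change versus the previous period.
toy["total_updates_growth"] = (toy["total_updates"] - previous) / previous * 100

# Growth from a zero base is undefined; treat it as missing instead of infinite.
toy["total_updates_growth"] = toy["total_updates_growth"].replace(
    [np.inf, -np.inf], np.nan
)

# Flag moves larger than the (assumed) threshold in either direction.
SPIKE_THRESHOLD_PCT = 100
toy["spike"] = toy["total_updates_growth"].abs() > SPIKE_THRESHOLD_PCT

print(toy)
```

Computing the shift inside each district keeps the first month of one district from being compared with the last month of another.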

### Policy Recommendations:
- **High UPI Districts**: Deploy additional staff and resources
- **Negative Growth**: Investigate causes in declining districts
- **Seasonal Patterns**: Plan for peak months in advance
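
One simple way to look for peak months is to pool all years by calendar month and compare the resulting UPI. The sketch below uses invented numbers and assumes only the `date`, `total_enrolment`, and `total_updates` columns; it is a starting point, not a full seasonality analysis.

```python
import pandas as pd

# Toy data covering two years, with higher update volume in April.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-04-15", "2023-10-15", "2024-04-15", "2024-10-15"]),
    "total_enrolment": [1000, 1000, 1000, 1000],
    "total_updates": [300, 100, 500, 100],
})

# Pool all years by calendar month (1 = January, ..., 12 = December).
by_month = toy.groupby(toy["date"].dt.month)[["total_updates", "total_enrolment"]].sum()
by_month["upi"] = by_month["total_updates"] / by_month["total_enrolment"]

peak_month = int(by_month["upi"].idxmax())

print(by_month)
print("Peak calendar month:", peak_month)
```

With only a year or two of data, a peak found this way may reflect a one-off event rather than a recurring pattern, so it is worth checking each year separately before planning around it.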

In [None]:
# Save enriched data
df_indicators.to_csv('../data/processed/data_with_indicators.csv', index=False)
print("\n✅ Indicator analysis complete!")
print("Enriched data saved to: data/processed/data_with_indicators.csv")
print("\nNext: Proceed to 04_anomaly_detection.ipynb")