# Problem Framing

## 1.1 Selected Problem: Migration Intelligence Tool

**Problem Statement:**

Use Aadhaar enrolment time-series data to detect high-velocity population influx anomalies that may indicate sudden mass settlements or administrative enrolment drives.


## 1.2 Why this problem?

- UIDAI data is naturally time-series and suitable for trend analysis.
- The data supports drill-down from State to District level, where findings are policy-relevant.
- The problem supports urban planning, governance, and service delivery.
- The entire analysis is aggregate-level, so no individual records are exposed.

## 1.3 What exactly will we measure?

We are not tracking individuals. We are tracking aggregate district-level volume shifts.

We aim to detect:
1. **Sudden Enrolment Spikes:** Identifying districts where enrolment counts jump significantly above their historical baseline.
2. **High-Volume Hubs:** Identifying districts handling the highest absolute load of new enrolments.
3. **Growth Velocity:** Measuring the speed of population inflow (percentage growth) to distinguish between steady growth and critical anomalies.
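
As a rough illustration of the second item (High-Volume Hubs), the sketch below ranks districts by absolute enrolment totals. The function name `top_hubs`, the cut-off `n`, and the sample district names and counts are hypothetical, chosen only for illustration; they are not taken from the UIDAI dataset.

```python
def top_hubs(district_totals, n=3):
    """Return the n districts with the highest absolute enrolment totals.

    district_totals: dict mapping district name -> total new enrolments
    (hypothetical input shape). n is an illustrative cut-off.
    """
    # Sort by total, largest first, and keep the first n entries.
    ranked = sorted(district_totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Made-up sample totals, for illustration only.
totals = {
    "District A": 5200,
    "District B": 870,
    "District C": 12400,
    "District D": 3100,
}
hubs = top_hubs(totals, n=2)
```

Ranking by absolute volume complements the growth-based view: a district can be a large hub without showing any unusual growth, and the reverse.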

## 2.1 Granularity Decision

We will analyse data at the following levels:

- **Primary Unit:** District level  
  (Most stable unit for migration and urbanisation signals)

- **Secondary Unit:** State level  
  (Used for validation and baseline comparison)

- **Optional Unit:** Pincode level  
  (Only for deep-dive analysis; excluded from core signals to avoid noise)


## 2.2 Time Aggregation Strategy

- All daily records will be aggregated to **monthly totals**
- Daily data is noisy (holidays, operational variation)
- Monthly aggregation gives policy-relevant trends

**Time format:** YYYY-MM
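
The aggregation step could look roughly like the sketch below, which sums daily counts into `YYYY-MM` buckets per district. The record layout `(district, date, count)` and the function name `aggregate_monthly` are assumptions made for illustration; the actual column names in the UIDAI dataset may differ.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(daily_records):
    """Sum daily enrolment counts into (district, 'YYYY-MM') totals.

    daily_records: iterable of (district, date, count) tuples.
    This schema is an assumption, not the dataset's actual layout.
    """
    totals = defaultdict(int)
    for district, day, count in daily_records:
        month_key = day.strftime("%Y-%m")  # time format: YYYY-MM
        totals[(district, month_key)] += count
    return dict(totals)


# Made-up sample records, for illustration only.
sample = [
    ("District A", date(2024, 1, 5), 120),
    ("District A", date(2024, 1, 26), 30),
    ("District A", date(2024, 2, 2), 200),
    ("District B", date(2024, 1, 15), 75),
]
monthly = aggregate_monthly(sample)
```

Keying the result by `(district, month)` matches the district-month unit used for the signal later in this document.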


## 3. Signal Blueprint (The Detection Logic)

We will construct a **district-month level anomaly signal** from the UIDAI dataset.

### Core Signal: "The Velocity Spike"

Instead of looking at raw numbers alone, we calculate the **Sudden Growth Velocity**:

1. **Metric:** Month-on-Month Percentage Growth
   - *Formula:* `((Current Enrolment - Previous Enrolment) / (Previous Enrolment + 1)) * 100`
   - *Smoothing:* The `+1` in the denominator avoids division by zero for zero-baseline districts (districts with no prior activity).

2. **Interpretation:**
   - **Low Velocity:** Standard administrative growth.
   - **High Velocity:** Potential migration event or administrative drive requiring attention.
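
A minimal sketch of the velocity calculation for a single district follows, with the +1 smoothing applied to the denominator as described above. The function names and the `threshold` that separates low from high velocity are hypothetical; this document does not fix a cut-off, so the value of 100% is only a placeholder.

```python
def growth_velocity(previous, current):
    """Month-on-month percentage growth with +1 added to the denominator."""
    return (current - previous) / (previous + 1) * 100


def velocity_series(monthly_counts):
    """Compute the velocity for each month from the second month onwards.

    monthly_counts: list of ('YYYY-MM', count) pairs in chronological order
    for one district (assumed input shape).
    """
    result = []
    for (_, prev), (month, curr) in zip(monthly_counts, monthly_counts[1:]):
        result.append((month, growth_velocity(prev, curr)))
    return result


def label(velocity, threshold=100.0):
    """Classify a velocity as 'high' or 'low'.

    threshold is an illustrative assumption, not a value from the source.
    """
    return "high" if velocity >= threshold else "low"


# Made-up counts for one district, for illustration only.
series = [("2024-01", 99), ("2024-02", 149), ("2024-03", 449)]
velocities = velocity_series(series)
labels = [(month, label(v)) for month, v in velocities]
```

With the smoothing, a district that goes from 0 to 50 enrolments yields a finite velocity of 5000% rather than a division-by-zero error. Such a district is still flagged strongly, so very small baselines may need separate handling.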

### Output Unit
The signal is computed at: **District × Month**