# Notebook 3: System-Level Risk Signals and Decision Framework for UPI

## Objective

Notebook 1 examined **daily UPI behavior** to surface short-term fluctuations and anomaly candidates.  
Notebook 2 evaluated **monthly macro patterns** to assess growth quality, concentration, and stability.

However, the problem statement requires more than separate analyses.

It asks for a **system-level judgment**:
> Are the observed patterns signs of risk, or evidence of a maturing digital infrastructure?

Notebook 3 provides that judgment.

---

## What this notebook does

This notebook **integrates insights from Notebook 1 and Notebook 2** to:
- Separate **signal from noise**
- Classify anomalies as **normal, operational, or structural**
- Translate metrics into **decision-ready conclusions**

The emphasis is on **interpretation, not computation**.

---

## Core questions answered

This notebook addresses four regulator-relevant questions:

1. **Signal vs Noise**  
   Do daily anomalies persist at the monthly level, or are they absorbed?

2. **Structural Dependency**  
   Is UPI dominance creating fragility or stability?

3. **Growth Quality**  
   Is UPI growth healthy, volatile, or misleading?

4. **Early Warning Framework**  
   What patterns should trigger monitoring or investigation?

---

## Approach

We combine:
- Daily system anomaly flags (Notebook 1)
- Monthly regime classification and concentration metrics (Notebook 2)
- Persistence and consistency logic across time scales

All conclusions are based on:
- Aggregated data
- Publicly observable system behavior
- Reproducible statistical logic

---

## What this notebook does *not* do

To remain aligned with scope and data limitations, this notebook does **not**:
- Perform transaction-level or bank-level forensics
- Attribute causality without evidence
- Make speculative claims

---

## Final outcome

By the end of this notebook, we produce:
- A **clear system-health assessment**
- A **risk taxonomy** distinguishing noise, operational stress, and structural risk
- A **reusable monitoring framework** suitable for regulators and policymakers

This is where analysis becomes **insight**, and insight becomes **decision**.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Phase 1: The Integration Framework & Cross-Scale Data Synthesis
### What we are doing:
- Bridging the gap between Daily (High-Res) and Monthly (Macro) data.
- Establishing the 'Joint Inference' logic required by the problem statement.
- Consolidating daily shock findings with monthly structural trends.
- Analyzing if instrument substitution effects successfully offset concentration risks.

### Why this step?

Before making system-level judgments, we need a **common factual base** that
captures all *daily-level stress signals* identified earlier.

This step loads the Notebook 2 monthly outputs and the **final daily anomaly output from Notebook 1**. The daily table serves as:
- The authoritative record of **when UPI exhibited abnormal behavior**, and
- The foundation for mapping daily noise to monthly structure.

In [3]:
import pandas as pd
import numpy as np

# Paths (update if needed)
NB1_ANOMALY_PATH = "/content/drive/MyDrive/Hotfoot/outputs/nb1_system_daily_anomalies.csv"
NB2_MONTHLY_PATH = "/content/drive/MyDrive/Hotfoot/outputs/nb2_month_classification.csv"
NB2_TOPK_PATH = "/content/drive/MyDrive/Hotfoot/outputs/nb2_top3_vs_top10_sensitivity.csv"
NB2_CONSISTENCY_PATH = "/content/drive/MyDrive/Hotfoot/outputs/nb2_top5_consistency_score.csv"

# Load datasets
daily_anomalies = pd.read_csv(NB1_ANOMALY_PATH, parse_dates=["date"])
monthly_classification = pd.read_csv(NB2_MONTHLY_PATH, parse_dates=["Month"])
topk_df = pd.read_csv(NB2_TOPK_PATH, parse_dates=["Month"])
consistency_df = pd.read_csv(NB2_CONSISTENCY_PATH)

daily_anomalies.head()

Unnamed: 0,date,UPI_Vol,UPI_SHARE,upi_zscore,upi_share_zscore,system_anomaly_flag,anomaly_run_length
0,2020-06-01,476.9671,0.468631,,,False,0
1,2020-06-02,476.78182,0.457646,,,False,0
2,2020-06-03,456.2593,0.515934,,,False,0
3,2020-06-04,463.04959,0.407367,,,False,0
4,2020-06-05,464.79398,0.466218,,,False,0


### What we observe

- The dataset contains **one row per calendar day**, spanning the full analysis period
- Each day includes:
  - Absolute UPI volume
  - UPI share of total digital transactions
  - Rolling z-scores for both volume and share
  - A binary **system-level anomaly flag**
  - The **persistence length** of any anomaly run
- Most early rows show:
  - `system_anomaly_flag = False`
  - `anomaly_run_length = 0`

### What this indicates

- This CSV is a **filtered, system-level signal table**, not raw transaction data
- An anomaly is recorded **only when UPI behavior is statistically extreme**, either:
  - In absolute volume, or
  - Relative to the rest of the digital ecosystem
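- The rows shown have empty z-scores, most likely because the rolling window has not filled yet, so their `False` flags reflect a missing baseline rather than confirmed normal behavior
- Whether sustained daily stress regimes exist has to be read from the full table, not from these first five rows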
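
The empty z-scores in the first rows are consistent with a rolling baseline that needs a warm-up period. The sketch below illustrates this on toy data, assuming a trailing window; `WINDOW` and the threshold of 3.0 are hypothetical, since Notebook 1's actual settings are not shown here.

```python
import pandas as pd

# Toy daily volume series (illustrative values, not real UPI data).
vol = pd.Series(
    [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 150.0],
    index=pd.date_range("2020-06-01", periods=7, freq="D"),
)

WINDOW = 5  # hypothetical window length

# Compare each day with the mean and std of the previous WINDOW days.
baseline = vol.shift(1).rolling(WINDOW)
zscore = (vol - baseline.mean()) / baseline.std()

# NaN compared with a threshold evaluates to False,
# so warm-up days can never be flagged.
flag = zscore.abs() > 3.0

print(pd.DataFrame({"vol": vol, "zscore": zscore, "flag": flag}))
```

The first `WINDOW` days have no z-score and therefore no flag, so a quiet start to the series says little about system behavior in that period.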

### Why this matters

This dataset is the **bridge between daily noise and monthly structure**.

It allows us to:
- Count how often daily anomalies occur
- Measure whether they persist
- Map daily stress signals onto monthly regimes

Without this table, **joint inference in Notebook 3 would not be possible**.

## Why Daily–Monthly Alignment Is Critical

Daily data captures **operational shocks**:
- outages
- reporting noise
- short-term stress

Monthly data captures **structural signals**:
- concentration
- dominance persistence
- ecosystem shifts

A valid inference requires checking:
- Do **daily system anomalies cluster into specific months?**
- Do those months also show **entity-level stress or dominance changes?**

We align daily anomalies into monthly buckets to test this.

In [4]:
# Add month column to daily anomalies
daily_anomalies["Month"] = daily_anomalies["date"].dt.to_period("M").dt.to_timestamp()

# Aggregate anomaly presence per month
monthly_system_anomaly = (
    daily_anomalies
    .groupby("Month")["system_anomaly_flag"]
    .any()
    .reset_index()
    .rename(columns={"system_anomaly_flag": "System_Anomaly"})
)

monthly_system_anomaly.head()

Unnamed: 0,Month,System_Anomaly
0,2020-06-01,False
1,2020-07-01,False
2,2020-08-01,False
3,2020-09-01,False
4,2020-10-01,False


### What we observe

- The data is now **aggregated at the monthly level**
- Each month has a single boolean flag:
  - `System_Anomaly = True` if **any daily anomaly occurred** in that month
  - `System_Anomaly = False` if the month was entirely normal
- In the early months shown:
  - No month contains even a single system-level anomaly
  - All values are `False`

### What this indicates

- In the months shown, **no daily anomaly was flagged at all**, so there is nothing to escalate yet
- Because the aggregation uses `.any()`, a **single flagged day is enough** to mark a month as `True`; the flag records presence, not persistence
- Whether a flagged month reflects sustained stress has to be judged separately, using `anomaly_run_length` or the number of flagged days

### Why this matters

This transformation is the first step in **signal vs noise separation**.

It ensures that:
- Daily flags and monthly classifications share a **common monthly key**
- A month with a daily anomaly but no monthly counterpart is **not** read as a macro alarm on its own
- Monthly classifications in Notebook 2 can now be **cross-validated with daily evidence**

This table is the **key linkage layer** that allows Notebook 3 to move from:
> *“Something odd happened on a day”*  
to  
> *“Does this actually matter at a system level?”*
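
A monthly flag built with `.any()` records presence only. One possible way to add persistence is sketched below on toy data. The threshold `MIN_RUN_DAYS` and the column names `anomaly_days`, `max_run`, `Any_Anomaly`, and `Persistent_Anomaly` are hypothetical, and the sketch assumes one row per calendar day.

```python
import pandas as pd

# Toy daily flags (illustrative): an isolated spike in June, a 3-day run in July.
daily = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-06-10", "2024-06-11", "2024-06-12",
        "2024-07-01", "2024-07-02", "2024-07-03", "2024-07-04",
    ]),
    "system_anomaly_flag": [False, True, False, True, True, True, False],
})
daily["Month"] = daily["date"].dt.to_period("M").dt.to_timestamp()


def longest_run(flags: pd.Series) -> int:
    """Length of the longest streak of consecutive True values."""
    flags = flags.astype(bool)
    run_id = (~flags).cumsum()             # constant within each True streak
    streaks = flags.groupby(run_id).sum()  # number of Trues per streak
    return int(streaks.max()) if len(streaks) else 0


MIN_RUN_DAYS = 2  # hypothetical persistence threshold

monthly = (
    daily.sort_values("date")
    .groupby("Month")["system_anomaly_flag"]
    .agg(anomaly_days="sum", max_run=longest_run)
    .reset_index()
)
monthly["Any_Anomaly"] = monthly["anomaly_days"] > 0
monthly["Persistent_Anomaly"] = monthly["max_run"] >= MIN_RUN_DAYS

print(monthly)
```

Both toy months are flagged under the `.any()` rule, but only July passes the persistence check. Runs that cross a month boundary are split by this grouping, so a production version would need to handle that case.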


### Why this step?

Daily anomaly detection (Notebook 1) and monthly regime analysis (Notebook 2)
capture **different layers of system behavior**.

Individually, neither is sufficient for decision-making.

This step **joins both layers** to answer a critical question:

> Does short-term system stress ever escalate into
> sustained, structural monthly risk?

The resulting table is the **central decision artifact** of Notebook 3.

In [5]:
# Merge system anomalies with monthly classification
joint_df = (
    monthly_classification
    .merge(monthly_system_anomaly, on="Month", how="left")
)

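# Months with no daily coverage come back as NaN after the left merge;
# treat them as "no anomaly" and restore a plain boolean dtype
joint_df["System_Anomaly"] = joint_df["System_Anomaly"].fillna(False).astype(bool)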

# Final classification logic
def classify_period(row):
    if row["System_Anomaly"] and row["MONTH_TYPE"] == "Anomalous":
        return "System + Entity Stress"
    if row["System_Anomaly"]:
        return "System Shock Only"
    if row["MONTH_TYPE"] == "Anomalous":
        return "Entity-Driven Shift"
    return "Stable / Normal"

joint_df["Final_Classification"] = joint_df.apply(classify_period, axis=1)

joint_df.head()

Unnamed: 0,Month,TOTAL_VOL_LAKH,TOTAL_VAL_CRORE,BANK_COUNT,AVG_TICKET_SIZE_INR,VOL_GROWTH_%,VAL_GROWTH_%,BANK_GROWTH_%,VOL_GROWTH_VOLATILITY,ROLLING_MEAN_12M,...,VOLATILITY_LEVEL,REGIME,VOL_GROWTH_Z,VAL_GROWTH_Z,RESIDUAL_Z,ANOMALY_FLAG,PREDICTABILITY_INDEX,MONTH_TYPE,System_Anomaly,Final_Classification
0,2024-04-01,570048.57305,13267080.0,2452,2327.358903,,,,,,...,Low Volatility,Low Growth – Low Volatility,-0.427408,-0.206729,1.781508,False,0.359517,Stable,False,Stable / Normal
1,2024-05-01,599317.82906,13433310.0,2448,2241.432652,5.13452,1.252946,-0.163132,,,...,Low Volatility,High Growth – Low Volatility,0.539197,-0.013164,-0.526355,False,0.655156,High Growth,False,Stable / Normal
2,2024-06-01,591350.9,12943090.0,2417,2188.7329,-1.329333,-3.649242,-1.26634,,,...,Low Volatility,Low Growth – Low Volatility,-0.677663,-0.770493,0.242933,False,0.804549,Stable,True,System Shock Only
3,2024-07-01,587523.3,13447000.0,2400,2288.759964,-0.647264,3.893246,-0.703351,,,...,Low Volatility,Low Growth – Low Volatility,-0.549259,0.394731,1.01222,False,0.496963,Stable,False,Stable / Normal
4,2024-08-01,607102.52,13290060.0,2401,2189.096044,3.332501,-1.167107,0.041667,,,...,Low Volatility,High Growth – Low Volatility,0.199956,-0.387033,2.550795,False,0.281627,High Growth,False,Stable / Normal
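
The left merge fills unmatched months with `False`, which makes "no daily data" look the same as "no anomaly". The sketch below shows one way to keep the two apart on toy tables; the `Daily_Coverage` column is a hypothetical name, and the values are illustrative.

```python
import pandas as pd

# Toy stand-ins for the two monthly tables.
monthly_classification = pd.DataFrame({
    "Month": pd.to_datetime(["2024-04-01", "2024-05-01", "2024-06-01"]),
    "MONTH_TYPE": ["Stable", "High Growth", "Stable"],
})
monthly_system_anomaly = pd.DataFrame({
    "Month": pd.to_datetime(["2024-05-01", "2024-06-01"]),
    "System_Anomaly": [False, True],
})

checked = monthly_classification.merge(
    monthly_system_anomaly,
    on="Month",
    how="left",
    validate="one_to_one",  # raises if either side has duplicate months
    indicator=True,         # adds a `_merge` column: "both" or "left_only"
)

# True when daily data exists for the month (hypothetical column name).
checked["Daily_Coverage"] = checked["_merge"] == "both"
checked["System_Anomaly"] = checked["System_Anomaly"].fillna(False).astype(bool)
checked = checked.drop(columns="_merge")

print(checked)
```

With this column, a `False` in `System_Anomaly` can be read as "no anomaly observed" only where `Daily_Coverage` is `True`.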


### Interpretation (Joint System + Monthly View)

This table is the **core evidence layer** of Notebook 3.  
Each row represents a month where **daily system signals** and **monthly structural metrics** are evaluated together.

The goal is to distinguish:
- Normal system evolution
- Short-term operational shocks
- True structural or persistent risk

---

### What we observe

- The **majority of months** are classified as **Stable / Normal**
- Even **high-growth months** are not automatically anomalous
- One month appears as **System Shock Only**
- **No month** shows combined **System + Entity Stress**
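
These observations can be checked with a simple tally. The sketch below uses a toy frame that mirrors only the five rows displayed above, so the counts are for those rows and not for the full table.

```python
import pandas as pd

# Toy stand-in for `joint_df`, mirroring the five displayed rows.
joint_df = pd.DataFrame({
    "MONTH_TYPE": ["Stable", "High Growth", "Stable", "Stable", "High Growth"],
    "Final_Classification": [
        "Stable / Normal", "Stable / Normal", "System Shock Only",
        "Stable / Normal", "Stable / Normal",
    ],
})

# Number of months per final class.
counts = joint_df["Final_Classification"].value_counts()

# Which monthly regimes coincide with which final class.
table = pd.crosstab(joint_df["MONTH_TYPE"], joint_df["Final_Classification"])

print(counts)
print(table)
```

Running the same two calls on the real `joint_df` gives the counts behind each statement in this list.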

---

### What this indicates

- Daily anomalies do occur, but are **absorbed without structural impact**
- High growth alone is **not a risk signal**
- Short-term operational disturbances **do not cascade**
- Entity dominance does **not amplify system shocks**

---

### Why this matters

This joint view prevents:
- Overreacting to daily noise
- Misclassifying growth as instability

It ensures that **only persistent, multi-layer evidence**
would justify escalation or intervention.

---

### Key takeaway

> **The system experiences stress, but does not accumulate fragility.**

UPI demonstrates:
- Stable system health  
- Isolated, manageable operational risk  
- No evidence of escalating or structural systemic risk  

This directly supports a **monitor-not-intervene** decision stance.
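
A monitoring stance like this can be written down as an explicit rule. The sketch below is hypothetical: the action names, the mapping, and the `escalate_after` rule are assumptions for illustration, not part of the analysis above. Only the four class labels come from `classify_period`.

```python
# Hypothetical mapping from the four labels produced by classify_period
# to a monitoring action.
ACTION_BY_CLASS = {
    "Stable / Normal": "monitor",
    "System Shock Only": "monitor",
    "Entity-Driven Shift": "review",
    "System + Entity Stress": "investigate",
}


def recommend_action(labels, escalate_after=2):
    """Return one action per month.

    A 'System Shock Only' label that repeats for `escalate_after`
    consecutive months is escalated to 'investigate' (hypothetical rule).
    """
    actions = []
    streak = 0
    for label in labels:
        streak = streak + 1 if label == "System Shock Only" else 0
        action = ACTION_BY_CLASS.get(label, "review")  # unknown labels get a review
        if streak >= escalate_after:
            action = "investigate"
        actions.append(action)
    return actions


labels = [
    "Stable / Normal",
    "System Shock Only",
    "System Shock Only",
    "Stable / Normal",
]
print(recommend_action(labels))
```

Keeping the rule in a single function makes the escalation criteria auditable and easy to adjust as thresholds are agreed.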

## Phase 3 — Risk & Anomaly Logic (Final Integration)

The objective of Phase 3 is to **convert analysis into a decision-ready output**.

Here we:
- Combine **daily system stress signals** (Notebook 1)
- With **monthly structural and entity-level indicators** (Notebook 2)
- Apply a **clear, rule-based logic** to distinguish:
  - Normal system behavior
  - Short-term operational shocks
  - Potential structural or systemic risk

This step produces the **single, authoritative inference table** for the entire project.

In [6]:
# Save FINAL joint inference table (MASTER OUTPUT)

FINAL_OUTPUT_PATH = (
    "/content/drive/MyDrive/Hotfoot/outputs/"
    "nb3_final_joint_system_inference.csv"
)

joint_df.to_csv(FINAL_OUTPUT_PATH, index=False)

print("Final system-level inference CSV saved to:")
print(FINAL_OUTPUT_PATH)

Final system-level inference CSV saved to:
/content/drive/MyDrive/Hotfoot/outputs/nb3_final_joint_system_inference.csv


## Final System-Level Inference Output

The CSV generated above is the **primary deliverable of this analysis**.

Each row represents a **monthly decision unit**, answering the question:

> *Does this period reflect normal behavior, a temporary system shock, or structural risk?*

---

### What this file contains

The final table consolidates:
- Daily anomaly evidence (system stress signals)
- Monthly growth, volatility, and concentration metrics
- A unified classification logic applied consistently across time

This ensures conclusions are based on **combined evidence**, not isolated indicators.

---

### What we observe

- Most months are classified as **Stable / Normal**
- Occasional **System Shock Only** periods appear, but do not persist
- No periods show **compounding system + entity stress**

---

### Why this matters

This framework prevents:
- Overreaction to short-term noise
- Misclassification of growth as instability
- Escalation without persistent, multi-layer evidence

It supports **proportionate, evidence-based decision-making**.

---

### Key takeaway

> This CSV is where analysis becomes **decision**.

It is designed to be:
- Readable by policymakers  
- Auditable by analysts  
- Reusable for ongoing system monitoring  

All prior notebooks exist to justify and validate this final inference.

## Final Assessment: Risk Taxonomy and System Health

### Risk Taxonomy

Based on combined evidence across daily and monthly analyses, observed risks fall into four clear categories:

**1. Competitive Risk**  
- Persistent Top-3 / Top-5 dominance  
- High consistency of leading entities across months  

**2. Operational Risk**  
- Short-lived daily anomalies  
- Clear substitution into non-UPI instruments  
- No lasting monthly or structural impact  

**3. Structural Ecosystem Risk**  
- Gradually rising concentration indices  
- Declining marginal predictability with scale  
- Sustained dominance across multiple growth regimes  

**4. Data & Reporting Risk**  
- Isolated single-day spikes  
- Partial-month artifacts  
- Non-persistent statistical deviations  

This taxonomy ensures **proportionate interpretation** and avoids conflating noise with risk.

---

### Cross-Notebook Synthesis

**Key conclusions across all notebooks:**

- UPI growth remains strong but is **increasingly concentrated**
- Dominance is **persistent**, not episodic
- System-level anomalies are **rare and short-lived**
- Substitution behavior confirms **operational resilience**
- Growth is transitioning from **explosive adoption to mature scaling**

**Net result:**  
The ecosystem exhibits **high concentration without fragility**.

---

### Closing Assessment

There is **no evidence of systemic instability** in India’s digital payments ecosystem.

Observed stress events are:
- Absorbed quickly  
- Operational rather than structural  
- Not amplified by entity dominance  

However, the **persistence of concentration** indicates that future risk lies not in transaction failure, but in **competitive dependency**.

The system has moved from rapid expansion to **structural maturity**.

**Forward-looking implication:**  
Regulatory and platform focus should gradually shift from growth enablement toward  
**resilience, redundancy, and competition preservation**.

This completes the end-to-end analytical framework.
