# 00D — Structural Flow Data Loader (BoP)

**Purpose**: Extract Balance of Payments (BoP) flow series from the RBI macro dataset to identify "Structural Liquidity Regimes", contrasting volatile portfolio (FPI) flows with sticky direct-investment (FDI) flows.

**Inputs**: `../data_processed/rbi_macro_all_long.parquet`

**Outputs**:
- `../data_processed/flow_regime_monthly.parquet`

---

## 1. Load RBI Data

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

PROCESSED_PATH = Path('../data_processed')
RBI_DATA_PATH = PROCESSED_PATH / 'rbi_macro_all_long.parquet'

if RBI_DATA_PATH.exists():
    df = pd.read_parquet(RBI_DATA_PATH)
    print(f"Loaded {len(df):,} observations from RBI data")
else:
    raise FileNotFoundError("RBI data not found. Run 00B first.")

Loaded 41,884 observations from RBI data
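
The later cells rely on three columns of the long-format table (`Date`, `series_name`, `value`) and on `Date` being a datetime so that `resample()` works. A minimal sketch of a schema check follows; the helper name `check_long_schema` and the sample rows are hypothetical and not part of the original pipeline.

```python
import pandas as pd

# Columns the later cells of this notebook read from the long-format table.
REQUIRED_COLUMNS = {"Date", "series_name", "value"}

def check_long_schema(frame):
    """Hypothetical helper: fail early if the long table is not usable."""
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise KeyError(f"Missing columns: {sorted(missing)}")
    if not pd.api.types.is_datetime64_any_dtype(frame["Date"]):
        raise TypeError("'Date' must be a datetime column for resample() to work")
    return frame

# Made-up rows, only to show the expected shape of the input.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-03-31", "2024-06-30"]),
    "series_name": ["BoP - PORTFOLIO INVESTMENT IN INDIA - NET USD"] * 2,
    "value": [1.0, 2.0],
})
checked = check_long_schema(sample)
```

In the notebook this would be called as `check_long_schema(df)` right after loading.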


## 2. Extract Key Flow Metrics

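We focus on **USD Net Inflows** for:
1.  **Portfolio Investment** (Hot Money / FPI, formerly called FII)
2.  **Direct Investment** (Sticky Capital / FDI)

A third series, total investment inflows, is loaded as an aggregate proxy when it is available.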

In [2]:
# Define the flow variables we found in the discovery phase
FLOW_VARS = {
    'FPI_NET_USD': 'BoP - PORTFOLIO INVESTMENT IN INDIA - NET USD',
    'FDI_NET_USD': 'BoP - FOREIGN DIRECT INVESTMENT (INDIA+ABROAD) - NET USD',
    # Aggregate proxy, used only if present in the RBI data; no fallback derivation is implemented here
    'TOTAL_CAPITAL_NET_USD': 'Total Investment Inflows (US $ Million)'
}

# Filter for these series
flow_df = df[df['series_name'].isin(FLOW_VARS.values())].copy()
flow_df['series_mapped'] = flow_df['series_name'].map({v: k for k, v in FLOW_VARS.items()})

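# Pivot to wide format (Date x series); pivot_table averages any duplicate (Date, series) rows
flow_wide = flow_df.pivot_table(index='Date', columns='series_mapped', values='value')

# Resample to month-end ('ME' needs pandas >= 2.2; older versions use 'M').
# BoP series are often quarterly, though some may be monthly.
# Forward-fill at most 2 months so each quarterly value is carried into the two
# following months without bridging a genuinely missing quarter.
flow_monthly = flow_wide.resample('ME').last().ffill(limit=2)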

print("Flow Data Preview:")
print(flow_monthly.tail())

Flow Data Preview:
series_mapped  FDI_NET_USD  FPI_NET_USD  TOTAL_CAPITAL_NET_USD
Date                                                          
2025-07-31             NaN          NaN                    NaN
2025-08-31             NaN          NaN                    NaN
2025-09-30             NaN          NaN                    NaN
2025-10-31             NaN          NaN                    NaN
2025-11-30             NaN          NaN                 237.23
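
To make the effect of `ffill(limit=2)` concrete, here is a small sketch on a made-up quarterly series with one quarter deliberately missing. It passes `pd.offsets.MonthEnd()` to `resample()`, which should behave like the `'ME'` alias used above while also running on older pandas versions; the values are illustrative only.

```python
import pandas as pd

# Hypothetical quarterly observations; Q3 (2024-09-30) is deliberately missing.
quarterly = pd.Series(
    [10.0, 20.0, 40.0],
    index=pd.to_datetime(["2024-03-31", "2024-06-30", "2024-12-31"]),
    name="FPI_NET_USD",
)

# Month-end grid: months without an observation start out as NaN.
sparse = quarterly.resample(pd.offsets.MonthEnd()).last()

# Carry each observation forward by at most two months.
monthly = sparse.ffill(limit=2)

print(monthly)
```

April and May inherit the March value, July and August inherit the June value, and September to November stay `NaN` because the fill limit is exhausted. A missing quarter therefore remains visible as a gap instead of being papered over.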


## 3. Compute Liquidity Z-Scores

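Absolute flow sizes grow with the economy (GDP), so the same dollar amount means different things over time: a $1B inflow that was large in 2010 is close to noise by 2026.
We normalize each series with a **Rolling Z-Score** (the current value minus its trailing 36-month mean, divided by its trailing 36-month standard deviation) so that "Regimes" are defined relative to recent history.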

- **Positive Z-Score**: Excess Liquidity (Bullish Risk Assets)
- **Negative Z-Score**: Liquidity Crunch (Bearish/Correction Risk)
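
The scaling argument can be checked on synthetic numbers. The sketch below applies the same rolling formula as `calculate_z_score` to two made-up histories (in USD million): one where monthly flows swing around 500, and one where they swing around 5,000. The helper `last_z` and both histories are assumptions for illustration, not RBI data.

```python
import pandas as pd

def last_z(values, window=36):
    """Rolling z-score of the final observation (illustrative helper)."""
    s = pd.Series(values, dtype=float)
    roll = s.rolling(window=window, min_periods=12)
    z = (s - roll.mean()) / roll.std()
    return z.iloc[-1]

# 35 months of "normal" flows followed by one final month.
early_history = [400, 600] * 17 + [400] + [1500]    # final month: about 1,000 above normal
late_history = [4000, 6000] * 17 + [4000] + [6000]  # final month: about 1,000 above the mean

z_early = last_z(early_history)
z_late = last_z(late_history)

print(f"early regime z-score: {z_early:.2f}")
print(f"late regime z-score:  {z_late:.2f}")
```

A deviation of roughly 1,000 is an extreme reading against the small-flow history and an ordinary month against the large-flow history, which is the reason for working with z-scores rather than raw USD amounts.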

In [3]:
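def calculate_z_score(series, window=36):
    """Rolling z-score over a trailing window (default 36 months = 3 years).

    The window includes the current observation. At least 12 non-missing
    observations are required; otherwise the result is NaN.
    """
    roll = series.rolling(window=window, min_periods=12)
    # A flat window has zero std; map it to NaN so the result is NaN rather than +/-inf
    std = roll.std().replace(0, np.nan)
    return (series - roll.mean()) / std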

if not flow_monthly.empty:
    # Snapshot the raw series names before adding derived columns
    base_cols = list(flow_monthly.columns)

    # Calculate Z-Scores
    for col in base_cols:
        flow_monthly[f'{col}_ZSCORE'] = calculate_z_score(flow_monthly[col])

    # 12-month rolling sum (momentum). Caveat: for quarterly series forward-filled
    # to monthly, each quarterly value is counted up to three times in this sum.
    for col in base_cols:
        flow_monthly[f'{col}_12M_SUM'] = flow_monthly[col].rolling(12).sum()

    print("\nComputed Z-Scores and Cumulative Flows.")
else:
    print("No flow data available to compute metrics.")


Computed Z-Scores and Cumulative Flows.
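
One point worth checking when interpreting the `_12M_SUM` columns: if a series is quarterly and has been forward-filled to monthly, a 12-month rolling sum adds each quarterly value up to three times. The sketch below shows this on a made-up series of 300 per quarter, along with one possible alternative that sums only the actual observations. The alternative is an assumption about what "12M cumulative flow" is meant to capture, not the notebook's method.

```python
import pandas as pd

# Hypothetical quarterly flow of 300 per quarter over five quarters.
quarterly = pd.Series(
    [300.0] * 5,
    index=pd.to_datetime(
        ["2023-03-31", "2023-06-30", "2023-09-30", "2023-12-31", "2024-03-31"]
    ),
)

# Month-end grid with NaN between quarterly observations.
sparse = quarterly.resample(pd.offsets.MonthEnd()).last()

# Same treatment as the notebook: forward-fill, then sum over 12 months.
filled = sparse.ffill(limit=2)
naive_12m = filled.rolling(12).sum().iloc[-1]

# Alternative sketch: sum only real observations inside the 12-month window.
observed_12m = sparse.rolling(12, min_periods=1).sum().iloc[-1]

print(f"12M sum on forward-filled data: {naive_12m:.0f}")
print(f"12M sum on observed data only:  {observed_12m:.0f}")
```

The forward-filled sum is three times the four-quarter total. That does not affect the sign of the momentum signal, but the level should not be read as an annual flow unless the underlying series is genuinely monthly.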


## 4. Export

In [4]:
if not flow_monthly.empty:
    output_path = PROCESSED_PATH / 'flow_regime_monthly.parquet'
    flow_monthly.to_parquet(output_path)
    print(f"✓ Saved: {output_path}")
    print(f"  Shape: {flow_monthly.shape}")
    print(f"  Columns: {list(flow_monthly.columns)}")
else:
    print("Skipping export due to empty data.")

✓ Saved: ..\data_processed\flow_regime_monthly.parquet
  Shape: (179, 9)
  Columns: ['FDI_NET_USD', 'FPI_NET_USD', 'TOTAL_CAPITAL_NET_USD', 'FDI_NET_USD_ZSCORE', 'FPI_NET_USD_ZSCORE', 'TOTAL_CAPITAL_NET_USD_ZSCORE', 'FDI_NET_USD_12M_SUM', 'FPI_NET_USD_12M_SUM', 'TOTAL_CAPITAL_NET_USD_12M_SUM']
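
A downstream notebook could turn the exported z-scores into regime labels using the sign convention from section 3. The sketch below is one way to do that; the label strings, the `label_regime` helper, and the sample values are hypothetical, and a real classifier would probably use thresholds rather than the bare sign.

```python
import numpy as np
import pandas as pd

def label_regime(z):
    """Map z-scores to labels by sign; missing z-scores stay missing."""
    labels = pd.Series("neutral", index=z.index, dtype=object)
    labels[z > 0] = "excess_liquidity"   # positive z-score: excess liquidity
    labels[z < 0] = "liquidity_crunch"   # negative z-score: liquidity crunch
    labels[z.isna()] = None              # not enough history to classify
    return labels

# Made-up values standing in for the FPI_NET_USD_ZSCORE column.
z_scores = pd.Series([1.2, -0.4, np.nan, 0.0], name="FPI_NET_USD_ZSCORE")
regimes = label_regime(z_scores)

print(regimes.tolist())
```

Keeping missing z-scores unlabeled matters here, because the first months of each series (fewer than 12 observations) and any unfilled gaps have no defined regime.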
