# 01A — Frequency Decision Matrix

**Purpose**: Decide the correct analysis frequency for each variable, so that series sampled at different frequencies are never compared directly and produce spurious results

**Principle**: When a macro variable and a market variable differ in frequency, analyze both at the lower frequency
- Daily = behavior/noise
- Monthly = structure/signal

## ⚠️ CRITICAL: Point-in-Time Architecture (UPGRADE 2)

All macro data has a **publication lag**. January CPI is released around the 12th of February, so using it to predict January returns is look-ahead bias!

| Data Type | Actual Release Lag | Backtest Lag |
|-----------|-------------------|---------------|
| Stock Returns | 0 (real-time) | 0 |
| CPI/WPI | ~2 weeks after month end | **1 month** |
| IIP | ~6 weeks after month end (12th of M+2) | **2 months** |
| GDP | ~2 months after quarter end | **1 quarter** |
| Trade Balance | ~15 days after month end | **1 month** |
| Forex Reserves | Weekly, ~1 week lag | **0** |
| Global Market Data | Real-time | **0** |

**Output**: Frequency constitution table with publication lags
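
To make the look-ahead problem concrete, the sketch below computes the first date on which a monthly observation could have been known. The release day (12th of the following month) is the CPI schedule used later in this notebook; the function name `first_available` and the sample index values are hypothetical and only serve the illustration.

```python
import pandas as pd

def first_available(period_end: pd.Timestamp, release_day: int = 12,
                    months_after: int = 1) -> pd.Timestamp:
    """Date on which a monthly figure is assumed to be published."""
    # First day of the release month, then forward to the release day
    release_month = (period_end.to_period('M') + months_after).to_timestamp()
    return release_month + pd.Timedelta(days=release_day - 1)

# Hypothetical CPI index values, stamped at month end
cpi = pd.Series(
    [100.0, 100.6, 101.1],
    index=pd.to_datetime(['2024-01-31', '2024-02-29', '2024-03-31']),
    name='CPI',
)

availability = pd.DataFrame({
    'value': cpi,
    'available_from': [first_available(d) for d in cpi.index],
})
print(availability)
```

Every `available_from` date falls after the month it describes, which is why the January value cannot be paired with January returns.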

---

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

PROCESSED_PATH = Path('../data_processed')

# Load coverage data
rbi_coverage = pd.read_parquet(PROCESSED_PATH / 'rbi_macro_coverage.parquet')
print(f"RBI series: {len(rbi_coverage)}")

RBI series: 99


## 1. Frequency Rules (Macro Constitution)

### Core Principles

| Principle | Rule |
|-----------|------|
| **1. Never mix frequencies** | Don't correlate monthly CPI (forward-filled to daily) with daily NIFTY; bring both to one frequency first |
| **2. Lower frequency dominates** | Monthly macro drives weekly returns, not vice versa |
| **3. Lead-lag structure matters** | Macro leads markets (1-3 months typical) |
| **4. Transformations must be consistent** | YoY for inflation, MoM for IIP, levels for rates |
| **5. Point-in-Time compliance** | Lag macro data by publication delay |
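
Principles 1 and 4 can be illustrated with a short sketch: the daily index is collapsed to one observation per month before it is compared with a monthly macro series, and each series gets its own transformation. All data here is synthetic and the variable names are hypothetical; only the alignment pattern is the point.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily index levels (a stand-in for NIFTY closes)
days = pd.bdate_range('2020-01-01', '2023-12-29')
nifty_daily = pd.Series(
    10000 * np.exp(np.cumsum(rng.normal(0, 0.01, len(days)))),
    index=days, name='NIFTY',
)

# Collapse daily closes to the last close of each month, indexed by monthly period
nifty_monthly = nifty_daily.groupby(nifty_daily.index.to_period('M')).last()
nifty_ret_1m = nifty_monthly.pct_change() * 100   # 1M % return

# Synthetic monthly CPI index levels on the same monthly period index
months = pd.period_range('2020-01', '2023-12', freq='M')
cpi = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0.004, 0.002, len(months)))),
    index=months, name='CPI',
)
cpi_yoy = cpi.pct_change(12) * 100                # YoY % for inflation

# Both series are now monthly, so rows line up one-to-one
aligned = pd.concat({'nifty_ret_1m': nifty_ret_1m, 'cpi_yoy': cpi_yoy}, axis=1).dropna()
print(aligned.shape)
```

The publication lag of principle 5 is deliberately left out here so that the frequency step can be read on its own.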

In [2]:
# UPGRADE 2: Publication Lag Map (Point-in-Time Compliance)
# This is the MOST IMPORTANT file in the entire system
# Getting this wrong = amazing backtest, losing money live

PUBLICATION_LAG_MAP = {
    # === INDIA MACRO (Released with lag) ===
    'CPI': {'lag_months': 1, 'typical_release': '12th of following month'},
    'WPI': {'lag_months': 1, 'typical_release': '14th of following month'},
    'IIP': {'lag_months': 2, 'typical_release': '12th of M+2'},
    'GDP': {'lag_months': 3, 'typical_release': '~2 months after quarter end'},
    'M3': {'lag_months': 0, 'typical_release': 'Weekly, few days lag'},
    'credit': {'lag_months': 0, 'typical_release': 'Weekly, few days lag'},
    'forex_reserves': {'lag_months': 0, 'typical_release': 'Weekly'},
    'trade_balance': {'lag_months': 1, 'typical_release': 'Mid-month'},
    'repo_rate': {'lag_months': 0, 'typical_release': 'Announcement day'},
    'gsec_10y': {'lag_months': 0, 'typical_release': 'Real-time'},
    
    # === GLOBAL MARKET DATA (Real-time) ===
    'SP500': {'lag_months': 0, 'typical_release': 'Real-time'},
    'NASDAQ100': {'lag_months': 0, 'typical_release': 'Real-time'},
    'VIX': {'lag_months': 0, 'typical_release': 'Real-time'},
    'DXY': {'lag_months': 0, 'typical_release': 'Real-time'},
    'USDINR': {'lag_months': 0, 'typical_release': 'Real-time'},
    'BRENT': {'lag_months': 0, 'typical_release': 'Real-time'},
    'GOLD': {'lag_months': 0, 'typical_release': 'Real-time'},
    'US10Y': {'lag_months': 0, 'typical_release': 'Real-time'},
    
    # === INDIA SECTORS (Real-time) ===
    'NIFTY': {'lag_months': 0, 'typical_release': 'Real-time'},
}

print("Publication Lag Map:")
for k, v in PUBLICATION_LAG_MAP.items():
    lag = v['lag_months']
    status = '⚠️ LAGGED' if lag > 0 else '✓ Real-time'
    print(f"  {k}: {lag} months {status}")

Publication Lag Map:
  CPI: 1 months ⚠️ LAGGED
  WPI: 1 months ⚠️ LAGGED
  IIP: 2 months ⚠️ LAGGED
  GDP: 3 months ⚠️ LAGGED
  M3: 0 months ✓ Real-time
  credit: 0 months ✓ Real-time
  forex_reserves: 0 months ✓ Real-time
  trade_balance: 1 months ⚠️ LAGGED
  repo_rate: 0 months ✓ Real-time
  gsec_10y: 0 months ✓ Real-time
  SP500: 0 months ✓ Real-time
  NASDAQ100: 0 months ✓ Real-time
  VIX: 0 months ✓ Real-time
  DXY: 0 months ✓ Real-time
  USDINR: 0 months ✓ Real-time
  BRENT: 0 months ✓ Real-time
  GOLD: 0 months ✓ Real-time
  US10Y: 0 months ✓ Real-time
  NIFTY: 0 months ✓ Real-time
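
One way the map could be consumed downstream is to shift each monthly column by its `lag_months`, so that the row for month *t* only holds values already published by then. This is a minimal sketch, not the alignment code of the next notebook: `apply_publication_lags` and the toy frame are hypothetical, the map is abbreviated to three entries, and the default of one month for unmapped columns is an assumption.

```python
import pandas as pd

# Abbreviated copy of the publication lag map
lag_map = {
    'CPI': {'lag_months': 1},
    'IIP': {'lag_months': 2},
    'repo_rate': {'lag_months': 0},
}

def apply_publication_lags(monthly: pd.DataFrame, lag_map: dict,
                           default_lag: int = 1) -> pd.DataFrame:
    """Shift each column down by its publication lag (one row = one month)."""
    shifted = {}
    for col in monthly.columns:
        # Columns missing from the map fall back to the assumed default lag
        lag = lag_map.get(col, {}).get('lag_months', default_lag)
        shifted[col] = monthly[col].shift(lag)
    return pd.DataFrame(shifted, index=monthly.index)

# Toy monthly data (made-up values)
monthly = pd.DataFrame(
    {
        'CPI': [1.0, 2.0, 3.0, 4.0],
        'IIP': [10.0, 20.0, 30.0, 40.0],
        'repo_rate': [6.5, 6.5, 6.25, 6.25],
    },
    index=pd.period_range('2024-01', periods=4, freq='M'),
)

pit = apply_publication_lags(monthly, lag_map)
print(pit)
```

After the shift, the March row carries February CPI and January IIP, while the repo rate is unchanged.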


In [3]:
# Define the macro constitution
FREQUENCY_CONSTITUTION = {
    # === INDIAN MACRO ===
    'inflation': {
        'series_patterns': ['CPI', 'WPI', 'inflation', 'price index'],
        'native_frequency': 'monthly',
        'use_frequency': 'monthly',
        'transformation': 'YoY %',
        'lag_range': '1-3 months',
        'publication_lag_months': 1,  # UPGRADE 2
        'economic_role': 'Cost structure, RBI policy driver'
    },
    'growth': {
        'series_patterns': ['GDP', 'IIP', 'production', 'output', 'PMI'],
        'native_frequency': 'monthly/quarterly',
        'use_frequency': 'monthly',
        'transformation': 'YoY % or MoM %',
        'lag_range': '1-2 months',
        'publication_lag_months': 2,  # UPGRADE 2: IIP is 2 months lag
        'economic_role': 'Earnings growth driver'
    },
    'rates': {
        'series_patterns': ['repo', 'reverse repo', 'policy rate', 'yield', 'G-Sec'],
        'native_frequency': 'daily/weekly',
        'use_frequency': 'monthly',
        'transformation': 'Level + Δ (bps)',
        'lag_range': '0-1 months',
        'publication_lag_months': 0,  # Real-time
        'economic_role': 'Discount rate, valuation'
    },
    'liquidity': {
        'series_patterns': ['M1', 'M2', 'M3', 'money supply', 'credit', 'deposit'],
        'native_frequency': 'weekly/fortnightly',
        'use_frequency': 'monthly',
        'transformation': 'YoY %',
        'lag_range': '1-3 months',
        'publication_lag_months': 0,  # Weekly data, minimal lag
        'economic_role': 'Market liquidity, flow driver'
    },
    'fx': {
        'series_patterns': ['USD', 'INR', 'exchange', 'forex', 'reserves'],
        'native_frequency': 'daily',
        'use_frequency': 'weekly/monthly',
        'transformation': '% change, YoY',
        'lag_range': '0-1 months',
        'publication_lag_months': 0,  # Real-time
        'economic_role': 'Import cost, FII flows'
    },
    'trade': {
        'series_patterns': ['export', 'import', 'trade balance', 'CAD'],
        'native_frequency': 'monthly',
        'use_frequency': 'monthly',
        'transformation': 'Level, YoY %',
        'lag_range': '1-2 months',
        'publication_lag_months': 1,  # UPGRADE 2
        'economic_role': 'Sector earnings, FX pressure'
    },
    
    # === GLOBAL MACRO (All real-time) ===
    'global_equity': {
        'series_patterns': ['SP500', 'NASDAQ', 'MSCI', 'CSI', 'STOXX', 'NIKKEI'],
        'native_frequency': 'daily',
        'use_frequency': 'weekly/monthly',
        'transformation': '% return (1W, 1M, 3M)',
        'lag_range': '0-2 weeks',
        'publication_lag_months': 0,
        'economic_role': 'Risk sentiment, EM flows'
    },
    'global_fx': {
        'series_patterns': ['DXY', 'USDINR', 'USDCNY', 'USDJPY'],
        'native_frequency': 'daily',
        'use_frequency': 'weekly',
        'transformation': '% change',
        'lag_range': '0-1 weeks',
        'publication_lag_months': 0,
        'economic_role': 'EM pressure, carry trade'
    },
    'commodities': {
        'series_patterns': ['BRENT', 'WTI', 'GOLD', 'COPPER', 'SILVER'],
        'native_frequency': 'daily',
        'use_frequency': 'weekly',
        'transformation': '% change, YoY',
        'lag_range': '0-4 weeks',
        'publication_lag_months': 0,
        'economic_role': 'Input costs, inflation driver'
    },
    'us_rates': {
        'series_patterns': ['US10Y', 'US2Y', 'YIELD_CURVE', 'Fed'],
        'native_frequency': 'daily',
        'use_frequency': 'monthly',
        'transformation': 'Level + Δ (bps)',
        'lag_range': '0-1 months',
        'publication_lag_months': 0,
        'economic_role': 'Global discount rate, EM flows'
    },
    'global_liquidity': {
        'series_patterns': ['VIX', 'INDIAVIX', 'HYG', 'HY_SPREAD'],
        'native_frequency': 'daily',
        'use_frequency': 'weekly',
        'transformation': 'Level + Δ',
        'lag_range': '0-1 weeks',
        'publication_lag_months': 0,
        'economic_role': 'Global leverage, risk appetite'
    },
    
    # === INDIA INDICES ===
    'india_indices': {
        'series_patterns': ['NIFTY', 'BANK', 'IT', 'PHARMA', 'AUTO', 'FMCG'],
        'native_frequency': 'daily',
        'use_frequency': 'monthly (for macro correlation)',
        'transformation': '% return (1M)',
        'lag_range': 'N/A (dependent variable)',
        'publication_lag_months': 0,
        'economic_role': 'Target variable for macro analysis'
    }
}

print("Frequency constitution defined with", len(FREQUENCY_CONSTITUTION), "categories")

Frequency constitution defined with 12 categories
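
The `transformation` entries above are labels rather than code. A hedged sketch of how three of them might be computed on a monthly series follows; the helper names are hypothetical and the sample numbers are made up.

```python
import pandas as pd

def yoy_pct(s: pd.Series) -> pd.Series:
    """Year-over-year % change of a monthly series (12 rows back)."""
    return s.pct_change(12) * 100

def mom_pct(s: pd.Series) -> pd.Series:
    """Month-over-month % change of a monthly series."""
    return s.pct_change(1) * 100

def delta_bps(rate_pct: pd.Series) -> pd.Series:
    """Month-over-month change of a rate quoted in percent, in basis points."""
    return rate_pct.diff() * 100

idx = pd.period_range('2023-01', periods=13, freq='M')
cpi = pd.Series([100.0 + i for i in range(13)], index=idx)   # made-up index levels
repo = pd.Series([6.25] * 6 + [6.50] * 7, index=idx)         # made-up policy rate, %

print(yoy_pct(cpi).iloc[-1])    # YoY % for the last month
print(mom_pct(cpi).iloc[1])     # MoM % for the second month
print(delta_bps(repo).iloc[6])  # bps change in the month the rate moves
```

Note that `yoy_pct` assumes a gap-free monthly index; with missing months, "12 rows back" is no longer "12 months back".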


## 2. Create Frequency Decision Table

In [4]:
# Convert constitution to DataFrame
freq_table = pd.DataFrame([
    {
        'category': cat,
        'series_patterns': ', '.join(v['series_patterns']),
        'native_frequency': v['native_frequency'],
        'use_frequency': v['use_frequency'],
        'transformation': v['transformation'],
        'lag_range': v['lag_range'],
        'publication_lag_months': v.get('publication_lag_months', 0),
        'economic_role': v['economic_role']
    }
    for cat, v in FREQUENCY_CONSTITUTION.items()
])

freq_table

| | category | series_patterns | native_frequency | use_frequency | transformation | lag_range | publication_lag_months | economic_role |
|---|----------|-----------------|------------------|---------------|----------------|-----------|------------------------|---------------|
| 0 | inflation | CPI, WPI, inflation, price index | monthly | monthly | YoY % | 1-3 months | 1 | Cost structure, RBI policy driver |
| 1 | growth | GDP, IIP, production, output, PMI | monthly/quarterly | monthly | YoY % or MoM % | 1-2 months | 2 | Earnings growth driver |
| 2 | rates | repo, reverse repo, policy rate, yield, G-Sec | daily/weekly | monthly | Level + Δ (bps) | 0-1 months | 0 | Discount rate, valuation |
| 3 | liquidity | M1, M2, M3, money supply, credit, deposit | weekly/fortnightly | monthly | YoY % | 1-3 months | 0 | Market liquidity, flow driver |
| 4 | fx | USD, INR, exchange, forex, reserves | daily | weekly/monthly | % change, YoY | 0-1 months | 0 | Import cost, FII flows |
| 5 | trade | export, import, trade balance, CAD | monthly | monthly | Level, YoY % | 1-2 months | 1 | Sector earnings, FX pressure |
| 6 | global_equity | SP500, NASDAQ, MSCI, CSI, STOXX, NIKKEI | daily | weekly/monthly | % return (1W, 1M, 3M) | 0-2 weeks | 0 | Risk sentiment, EM flows |
| 7 | global_fx | DXY, USDINR, USDCNY, USDJPY | daily | weekly | % change | 0-1 weeks | 0 | EM pressure, carry trade |
| 8 | commodities | BRENT, WTI, GOLD, COPPER, SILVER | daily | weekly | % change, YoY | 0-4 weeks | 0 | Input costs, inflation driver |
| 9 | us_rates | US10Y, US2Y, YIELD_CURVE, Fed | daily | monthly | Level + Δ (bps) | 0-1 months | 0 | Global discount rate, EM flows |


## 3. Map RBI Series to Categories

In [5]:
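def categorize_series(series_name: str) -> str:
    """Map a series to the first category with a pattern found in its name.

    Matching is a case-insensitive substring test, so short patterns
    (e.g. 'IT') can also match inside longer words.
    """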
    series_lower = series_name.lower()
    
    for category, config in FREQUENCY_CONSTITUTION.items():
        for pattern in config['series_patterns']:
            if pattern.lower() in series_lower:
                return category
    return 'other'

# Apply to RBI coverage
rbi_coverage['category'] = rbi_coverage['series_name'].apply(categorize_series)

# Add use_frequency and transformation
rbi_coverage['use_frequency'] = rbi_coverage['category'].map(
    {cat: v['use_frequency'] for cat, v in FREQUENCY_CONSTITUTION.items()}
).fillna('monthly')

rbi_coverage['transformation'] = rbi_coverage['category'].map(
    {cat: v['transformation'] for cat, v in FREQUENCY_CONSTITUTION.items()}
).fillna('Level')

rbi_coverage['lag_range'] = rbi_coverage['category'].map(
    {cat: v['lag_range'] for cat, v in FREQUENCY_CONSTITUTION.items()}
).fillna('1-3 months')

# UPGRADE 2: Add publication lag
rbi_coverage['publication_lag_months'] = rbi_coverage['category'].map(
    {cat: v.get('publication_lag_months', 0) for cat, v in FREQUENCY_CONSTITUTION.items()}
).fillna(1)  # Default 1 month lag for unknown series

print("Category distribution:")
rbi_coverage['category'].value_counts()

Category distribution:


category
fx               26
other            24
liquidity        14
india_indices    13
inflation        10
rates             8
trade             3
growth            1
Name: count, dtype: int64

## 4. Identify Key Usable Series

In [6]:
# Filter to usable series (5+ years, categorized)
usable = rbi_coverage[
    (rbi_coverage['years'] >= 5) & 
    (rbi_coverage['category'] != 'other')
].copy()

# Sort by category and coverage
usable = usable.sort_values(['category', 'years'], ascending=[True, False])

print(f"Usable categorized series: {len(usable)}")
print("\nTop series by category:")
usable.groupby('category').head(3)[['series_name', 'category', 'frequency', 'years', 'use_frequency', 'transformation', 'publication_lag_months']]

Usable categorized series: 66

Top series by category:


| | series_name | category | frequency | years | use_frequency | transformation | publication_lag_months |
|---|-------------|----------|-----------|-------|---------------|----------------|------------------------|
| 26 | BoP - TRANSFER OFFICIAL,DEBIT USD | fx | quarterly | 15.496235 | weekly/monthly | % change, YoY | 0.0 |
| 27 | BoP - SERVICES, GOVERNMENT SERVICES NOT INCLUD... | fx | quarterly | 15.496235 | weekly/monthly | % change, YoY | 0.0 |
| 28 | BoP - PORTFOLIO FOREIGN INVESTMENT ABROAD NET USD | fx | quarterly | 15.496235 | weekly/monthly | % change, YoY | 0.0 |
| 78 | Index of Industrial Production | growth | monthly | 8.167009 | monthly | YoY % or MoM % | 2.0 |
| 2 | CURRENCY WITH THE PUBLIC | india_indices | monthly | 74.839151 | monthly (for macro correlation) | % return (1M) | 0.0 |
| 6 | MARKET CAPITALISATION - BSE | india_indices | monthly | 45.670089 | monthly (for macro correlation) | % return (1M) | 0.0 |
| 11 | FISCAL DEFICIT | india_indices | monthly | 25.670089 | monthly (for macro correlation) | % return (1M) | 0.0 |
| 33 | ALL INDIA HOUSE PRICE INDEX (Base-Year : Q1:20... | inflation | quarterly | 14.754278 | monthly | YoY % | 1.0 |
| 38 | WPI-Monthly-MANUFACTURED PRODUCTS | inflation | monthly | 13.667351 | monthly | YoY % | 1.0 |
| 39 | WPI-Monthly-ALL COMMODITY | inflation | monthly | 13.667351 | monthly | YoY % | 1.0 |
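
In the output above, `CURRENCY WITH THE PUBLIC`, `MARKET CAPITALISATION - BSE` and `FISCAL DEFICIT` are listed under `india_indices`. That is consistent with `categorize_series` matching substrings: the short pattern `IT` occurs inside "WITH", "CAPITALISATION" and "DEFICIT". The sketch below shows one possible alternative that only accepts whole-word matches. It is not the notebook's method; `categorize_series_strict` is a hypothetical name and the pattern dictionary is an abbreviated subset.

```python
import re

# Abbreviated, hypothetical subset of the pattern table
patterns = {
    'inflation': ['CPI', 'WPI', 'price index'],
    'india_indices': ['NIFTY', 'BANK', 'IT', 'PHARMA'],
}

def categorize_series_strict(series_name: str, patterns: dict) -> str:
    """Return the first category with a pattern that matches as a whole word."""
    for category, pats in patterns.items():
        for pat in pats:
            # The lookarounds require a non-word character (or the string edge)
            # on both sides of the pattern, so 'IT' no longer matches 'WITH'
            regex = rf'(?<!\w){re.escape(pat)}(?!\w)'
            if re.search(regex, series_name, flags=re.IGNORECASE):
                return category
    return 'other'

for name in ['CURRENCY WITH THE PUBLIC', 'FISCAL DEFICIT',
             'NIFTY IT', 'WPI-Monthly-ALL COMMODITY']:
    print(name, '->', categorize_series_strict(name, patterns))
```

Whole-word matching is still case-insensitive here, so a name containing the English word "it" would be picked up by the `IT` pattern; an explicit allow-list per category would be stricter.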


## 5. Create Master Series Registry

In [7]:
# Create a clean registry of all series to be used
series_registry = (
    # RBI Macro series
    usable[['series_name', 'category', 'frequency', 'use_frequency', 'transformation', 'lag_range', 'publication_lag_months', 'years']]
    .assign(source='RBI')
    .reset_index(drop=True)
)

# Add global series manually (UPGRADE 2: all with publication_lag = 0)
global_series = pd.DataFrame([
    # Equity
    {'series_name': 'SP500', 'category': 'global_equity', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% return', 'lag_range': '0-2 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'NASDAQ100', 'category': 'global_equity', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% return', 'lag_range': '0-2 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'MSCI_EM', 'category': 'global_equity', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% return', 'lag_range': '0-2 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'CSI300', 'category': 'global_equity', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% return', 'lag_range': '0-2 weeks', 'publication_lag_months': 0, 'years': 10, 'source': 'Yahoo'},
    # FX
    {'series_name': 'USDINR', 'category': 'global_fx', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% change', 'lag_range': '0-1 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'DXY', 'category': 'global_fx', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% change', 'lag_range': '0-1 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    # Commodities
    {'series_name': 'BRENT', 'category': 'commodities', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% change', 'lag_range': '0-4 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'GOLD', 'category': 'commodities', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% change', 'lag_range': '0-4 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'COPPER', 'category': 'commodities', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': '% change', 'lag_range': '0-4 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    # Rates
    {'series_name': 'US10Y', 'category': 'us_rates', 'frequency': 'daily', 'use_frequency': 'monthly', 'transformation': 'Level + Δ', 'lag_range': '0-1 months', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'US_YIELD_CURVE', 'category': 'us_rates', 'frequency': 'daily', 'use_frequency': 'monthly', 'transformation': 'Level + Δ', 'lag_range': '0-1 months', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    # UPGRADE 1: Global Liquidity
    {'series_name': 'VIX', 'category': 'global_liquidity', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': 'Level + Δ', 'lag_range': '0-1 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
    {'series_name': 'INDIAVIX', 'category': 'global_liquidity', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': 'Level + Δ', 'lag_range': '0-1 weeks', 'publication_lag_months': 0, 'years': 10, 'source': 'Yahoo'},
    {'series_name': 'CREDIT_RISK_SENTIMENT', 'category': 'global_liquidity', 'frequency': 'daily', 'use_frequency': 'weekly', 'transformation': 'Level + Δ', 'lag_range': '0-1 weeks', 'publication_lag_months': 0, 'years': 15, 'source': 'Yahoo'},
])

series_registry = pd.concat([series_registry, global_series], ignore_index=True)

print(f"Total series in registry: {len(series_registry)}")
print("\nBy category:")
series_registry.groupby('category').size().sort_values(ascending=False)

Total series in registry: 80

By category:


category
fx                  19
india_indices       13
liquidity           13
inflation           10
rates                8
global_equity        4
global_liquidity     3
commodities          3
global_fx            2
trade                2
us_rates             2
growth               1
dtype: int64

## 6. Export Frequency Constitution

In [8]:
# Save frequency decision table
freq_table.to_parquet(PROCESSED_PATH / 'frequency_constitution.parquet', index=False)
print("✓ Saved frequency_constitution.parquet")

# Save series registry
series_registry.to_parquet(PROCESSED_PATH / 'series_registry.parquet', index=False)
print("✓ Saved series_registry.parquet")

# Also save as CSV for easy viewing
series_registry.to_csv(PROCESSED_PATH / 'series_registry.csv', index=False)
print("✓ Saved series_registry.csv")

# UPGRADE 2: Save publication lag map
lag_df = pd.DataFrame([
    {'series': k, 'lag_months': v['lag_months'], 'typical_release': v['typical_release']}
    for k, v in PUBLICATION_LAG_MAP.items()
])
lag_df.to_parquet(PROCESSED_PATH / 'publication_lag_map.parquet', index=False)
lag_df.to_csv(PROCESSED_PATH / 'publication_lag_map.csv', index=False)
print("✓ Saved publication_lag_map.parquet/csv")

✓ Saved frequency_constitution.parquet


✓ Saved series_registry.parquet
✓ Saved series_registry.csv
✓ Saved publication_lag_map.parquet/csv
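
The notebook now holds two sources of publication lags: the per-series map and the per-category `publication_lag_months`. A small consistency check, sketched below under stated assumptions, can flag series whose category lag is shorter than their own. The three dictionaries are abbreviated copies of values defined earlier; the series-to-category assignment follows the `series_patterns` lists, and the variable names are hypothetical.

```python
import pandas as pd

# Abbreviated copies of the two lag sources defined in this notebook
series_lags = {'CPI': 1, 'WPI': 1, 'IIP': 2, 'GDP': 3}
category_lags = {'inflation': 1, 'growth': 2}
# Category each series would fall into according to the series_patterns lists
category_of = {'CPI': 'inflation', 'WPI': 'inflation', 'IIP': 'growth', 'GDP': 'growth'}

check = pd.DataFrame({
    'series_lag': pd.Series(series_lags),
    'category': pd.Series(category_of),
})
check['category_lag'] = check['category'].map(category_lags)

# A category lag shorter than the series lag would admit data too early
check['too_early'] = check['category_lag'] < check['series_lag']
print(check[check['too_early']])
```

With these values only GDP is flagged: it needs 3 months in the lag map but would inherit 2 months from the `growth` category.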


## 7. Summary: The Macro Constitution

### Frequency Rules Locked ✓

| Variable Type | Use Frequency | Transformation | Lag Range | **Publication Lag** |
|--------------|---------------|----------------|------------|--------------------|
| Inflation (CPI, WPI) | Monthly | YoY % | 1-3 months | **1 month** |
| Growth (IIP, GDP) | Monthly | YoY/MoM % | 1-2 months | **2 months** (IIP); GDP: **3 months** per lag map |
| Rates (Repo, G-Sec) | Monthly | Level + Δ | 0-1 months | 0 |
| Liquidity (M3, Credit) | Monthly | YoY % | 1-3 months | 0 |
| FX (USD/INR) | Weekly | % change | 0-1 months | 0 |
| Trade Balance | Monthly | YoY % | 1-2 months | **1 month** |
| Global Equity | Weekly | % return | 0-2 weeks | 0 |
| Commodities | Weekly | % change | 0-4 weeks | 0 |
| US Rates | Monthly | Level + Δ | 0-1 months | 0 |
| **VIX / Liquidity** | Weekly | Level + Δ | 0-1 weeks | 0 |
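
The Lag Range column gives the window in which a variable is expected to lead returns. A sketch of how such a window might be scanned is shown below, using synthetic data in which the macro series leads returns by two months by construction. `lagged_correlations` is a hypothetical helper, not part of this notebook.

```python
import numpy as np
import pandas as pd

def lagged_correlations(macro: pd.Series, returns: pd.Series, lags: range) -> pd.Series:
    """Correlation of returns in month t with the macro value from month t - lag."""
    return pd.Series({lag: returns.corr(macro.shift(lag)) for lag in lags}, name='corr')

rng = np.random.default_rng(1)
idx = pd.period_range('2015-01', periods=120, freq='M')

# Synthetic macro series and returns that respond to it two months later, plus noise
macro = pd.Series(rng.normal(size=120), index=idx)
noise = pd.Series(rng.normal(scale=0.5, size=120), index=idx)
returns = 0.8 * macro.shift(2) + noise

corrs = lagged_correlations(macro, returns, range(0, 4))
print(corrs.round(2))
print('strongest lag:', corrs.abs().idxmax())
```

In a point-in-time setup the scan would presumably start at the publication lag rather than at zero, since shorter lags use data that was not yet released.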

### UPGRADE 2 Applied ✓
- CPI/WPI: **1-month lag** (released ~12th-14th of the following month)
- IIP: **2-month lag** (released ~12th of M+2)
- GDP: **3-month lag** (released ~2 months after quarter end, lagged a full quarter)
- Trade: **1-month lag**
- All market data: **0 lag** (real-time)

> ⚠️ **CRITICAL**: If you backtest using January CPI to predict January Nifty returns, your backtest will look amazing but you will lose money live.

### Key Outputs
- `frequency_constitution.parquet` — The rules
- `series_registry.parquet` — All approved series with their transformations
- `publication_lag_map.parquet` — Point-in-time lag requirements

**Next notebook:** `01B_index_macro_alignment.ipynb`