# Factor IC Analysis

The **Information Coefficient (IC)** measures the cross-sectional correlation between a factor’s signal and subsequent returns. In factor research, IC quantifies predictive power on each rebalancing date and helps compare factor efficacy.

**Spearman rank correlation** is the standard choice because it is robust to outliers and captures monotonic relationships without assuming linearity or normally distributed data. Both properties matter for financial cross sections, where fat-tailed returns and non-linear signal-return relationships are common.
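
To make the robustness argument concrete, here is a minimal sketch on a hypothetical toy cross section (the `signal` and `fwd_ret` arrays are invented for illustration and are not drawn from the dataset used in this notebook). It compares Pearson and Spearman correlation when the relationship is monotonic but non-linear and one return is an extreme outlier.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Hypothetical toy cross section: 10 assets, returns increase with the signal
signal = np.arange(1.0, 11.0)
fwd_ret = signal ** 3      # monotonic but non-linear in the signal
fwd_ret[-1] = 1e6          # extreme outlier; it is still the largest value

# Pearson is pulled around by the outlier and the curvature
pearson_ic = pearsonr(signal, fwd_ret)[0]

# Spearman only looks at ranks, which the outlier does not change
spearman_ic = spearmanr(signal, fwd_ret)[0]

# Spearman is the Pearson correlation computed on ranks
rank_pearson = pearsonr(rankdata(signal), rankdata(fwd_ret))[0]

print(f"Pearson:  {pearson_ic:.3f}")
print(f"Spearman: {spearman_ic:.3f}")
```

Because the ordering of assets is perfectly preserved, the Spearman IC stays at 1 while the Pearson correlation drops well below it.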

In [None]:
# Load libraries
from pathlib import Path

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Project-root-safe path resolution
try:
    ROOT = Path(__file__).resolve().parents[1]
except NameError:
    ROOT = Path.cwd()

data_path = ROOT / '00_data' / 'features' / 'factors.parquet'
df = pd.read_parquet(data_path)

# Basic dataset diagnostics
print('Shape:', df.shape)
print('Columns:', df.columns.tolist())

# Identify key columns
date_col = 'date' if 'date' in df.columns else 'trade_date'
ticker_col = 'ticker' if 'ticker' in df.columns else 'symbol'

df[date_col] = pd.to_datetime(df[date_col])
print('Date range:', df[date_col].min(), '→', df[date_col].max())
print('Number of tickers:', df[ticker_col].nunique())

In [None]:
# Compute daily IC time series per factor
factors = ['mom_20d', 'mom_60d', 'vol_20d', 'zscore_20d_price']
target = 'ret_fwd_5d'

ic_records = []

for date, group in df.groupby(date_col):
    row = {date_col: date}
    for factor in factors:
        sub = group[[factor, target]].dropna()
        # Require a minimum cross section to avoid noisy ICs
        if len(sub) >= 10:
            ic = spearmanr(sub[factor], sub[target]).correlation
        else:
            ic = np.nan
        row[factor] = ic
    ic_records.append(row)

ic_daily = pd.DataFrame(ic_records).set_index(date_col).sort_index()
ic_daily.head()

In [None]:
# Aggregate IC statistics
summary = []

for factor in factors:
    series = ic_daily[factor].dropna()
    n_obs = series.shape[0]
    mean_ic = series.mean()
    std_ic = series.std(ddof=1)
    t_stat = mean_ic / (std_ic / np.sqrt(n_obs)) if n_obs > 1 and std_ic != 0 else np.nan
    summary.append({
        'factor': factor,
        'mean_ic': mean_ic,
        'std_ic': std_ic,
        't_stat': t_stat,
        'n_obs': n_obs,
    })

ic_overall = pd.DataFrame(summary)
# Sort by absolute mean IC, strongest factor first
ic_overall = ic_overall.sort_values('mean_ic', key=lambda s: s.abs(), ascending=False)
ic_overall

In [None]:
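# Temporal stability: monthly average ICs
# Group by calendar month via a PeriodIndex; this avoids the 'M' resample alias,
# which pandas 2.2+ deprecates in favor of 'ME'. Months without data are omitted.
ic_monthly = ic_daily.groupby(ic_daily.index.to_period('M')).mean()
ic_monthly.index = ic_monthly.index.astype(str)
ic_monthly.head()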

In [None]:
# Save outputs (Parquet only)
output_overall = ROOT / '05_reports' / 'ic_overall.parquet'
output_monthly = ROOT / '05_reports' / 'ic_by_month.parquet'

# Create the report directory if it does not exist yet
output_overall.parent.mkdir(parents=True, exist_ok=True)

ic_overall.to_parquet(output_overall, index=False)
ic_monthly.to_parquet(output_monthly)

print('Saved:', output_overall)
print('Saved:', output_monthly)

## Interpretation

- **Strongest factor:** The factor with the largest absolute mean IC, which is the first row of `ic_overall` (the table is sorted by absolute mean IC, descending).
- **Weakest factor:** The factor with the smallest absolute mean IC, which is the last row of `ic_overall`.
- **Sign consistency & stability:** Review the monthly ICs in `ic_monthly` to see whether each factor's sign holds through time or drifts and flips, which indicates instability.
- **Limitations:** These ICs ignore transaction costs, implementation constraints, and portfolio construction effects; they measure only raw cross-sectional predictiveness.
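
One way to put a number on sign consistency is the share of months whose IC has the same sign as the factor's overall mean. The sketch below uses a hypothetical `ic_monthly_demo` frame with invented factor names and values, purely for illustration; it does not reproduce results from this notebook. The same two lines could be applied to `ic_monthly` after dropping missing months.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly ICs for illustration only (not results from this notebook)
ic_monthly_demo = pd.DataFrame(
    {
        "factor_a": [0.04, 0.03, 0.05, 0.02, 0.01, 0.03],    # always positive
        "factor_b": [0.06, -0.05, 0.04, -0.03, 0.05, -0.04],  # flips every month
    },
    index=["2023-01", "2023-02", "2023-03", "2023-04", "2023-05", "2023-06"],
)

# Sign of each factor's average monthly IC (+1 or -1)
mean_sign = np.sign(ic_monthly_demo.mean())

# Share of months whose IC sign matches the sign of the mean.
# NaN months would count as mismatches, so drop them first on real data.
sign_consistency = np.sign(ic_monthly_demo).eq(mean_sign, axis="columns").mean()

print(sign_consistency)
```

In this toy data both factors have a positive mean IC, but `factor_a` agrees with that sign in every month while `factor_b` does so in only half of them, which is the kind of instability the monthly review is meant to surface.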