# Step 7: Universe Construction

This notebook:
1. Filters the monthly signal to the Russell 3000 universe
2. Applies tradable filters (price, market cap, etc.)
3. Creates a monthly tradable universe for backtesting

**Note**: A full implementation requires CRSP data for returns, prices, and market caps.
This notebook provides only the framework; the price, market-cap, and return filters are left as placeholders.
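
As a rough illustration of what the tradable filters could look like once CRSP-style data is joined in, here is a minimal sketch. The column names `price`, `market_cap`, and `share_code`, the helper `apply_tradable_filters`, and the default thresholds are all assumptions for illustration; they do not come from this notebook's data.

```python
import pandas as pd


def apply_tradable_filters(df, min_price=5.0, min_market_cap=0.0,
                           common_share_codes=(10, 11)):
    """Keep firm-months that pass basic tradability screens.

    Hypothetical columns: 'price', 'market_cap', 'share_code'.
    Rows with missing values fail the comparisons and are dropped.
    """
    mask = (
        # abs() because CRSP reports a negative price when it is a bid/ask midpoint
        (df["price"].abs() > min_price)
        & (df["market_cap"] > min_market_cap)
        # Share codes 10 and 11 are assumed here to mean ordinary common shares
        & df["share_code"].isin(common_share_codes)
    )
    return df[mask].copy()


# Toy data standing in for a signal table already joined with price data
sample = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "month_end": pd.to_datetime(["2020-01-31"] * 4),
    "price": [12.0, 3.5, 40.0, 8.0],
    "market_cap": [5e8, 2e8, 1e10, 0.0],
    "share_code": [10, 11, 12, 11],
})
tradable = apply_tradable_filters(sample)
print(tradable["ticker"].tolist())
```

In this toy example only `AAA` survives: `BBB` fails the price screen, `CCC` has a non-common share code, and `DDD` has zero market cap.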


In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
import json

# Load config
BASE_DIR = Path('/Users/david/Desktop/MATH-GA 2707/Moving Target')
CONFIG_DIR = BASE_DIR / 'configs'
INTERMEDIATE_DIR = BASE_DIR / 'data' / 'intermediate'

with open(CONFIG_DIR / 'base.json', 'r') as f:
    config = json.load(f)

for key in config['data']:
    config['data'][key] = Path(config['data'][key])

# Load Russell 3000 tickers
# The header is on line 10 of the file, so skip the first 9 lines and let line 10 become the header row
russell_df = pd.read_csv(config['data']['russell_3000_file'], skiprows=9)
# If there is no 'Ticker' column, show what was read and fall back to
# treating the first column as the ticker column
if 'Ticker' not in russell_df.columns:
    print(f"Available columns: {list(russell_df.columns)}")
    if len(russell_df.columns) > 0:
        russell_df = russell_df.rename(columns={russell_df.columns[0]: 'Ticker'})

russell_df = russell_df[russell_df['Ticker'].notna()].copy()
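# Remove stray quotes first, then strip, so whitespace inside the quotes is also removed
russell_df['Ticker'] = russell_df['Ticker'].astype(str).str.replace('"', '', regex=False).str.strip()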
russell_df = russell_df[russell_df['Ticker'] != ''].copy()
russell_tickers = set(russell_df['Ticker'].unique())

print(f"Russell 3000 universe: {len(russell_tickers)} tickers")

# Load monthly signal
df_monthly = pd.read_parquet(config['data']['monthly_signal'])
print(f"\nLoaded {len(df_monthly)} monthly signal observations")


Russell 3000 universe: 2611 tickers

Loaded 218052 monthly signal observations
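
Before filtering, it can help to check how well the index tickers line up with the tickers in the signal table. The sketch below is one way to do that; the helper `ticker_coverage` is hypothetical, and upper-casing both sides before comparing is an assumption that the notebook itself does not make.

```python
import pandas as pd


def ticker_coverage(index_tickers, signal_tickers):
    """Summarize the overlap between an index ticker list and signal tickers.

    Both sides are stripped and upper-cased before comparison (an assumption).
    Missing values on the signal side are ignored.
    """
    index_set = {str(t).strip().upper() for t in index_tickers}
    signal_clean = pd.Series(list(signal_tickers), dtype="object").dropna()
    signal_set = {str(t).strip().upper() for t in signal_clean}
    return {
        "index_tickers": len(index_set),
        "signal_tickers": len(signal_set),
        "matched": len(index_set & signal_set),
        "index_only": sorted(index_set - signal_set),
        "signal_only": sorted(signal_set - index_set),
    }


# Toy inputs; in the notebook these would be russell_tickers and df_monthly['ticker']
report = ticker_coverage({"AAPL", "MSFT", "BRK.B"}, ["aapl", "MSFT ", "TSLA", None])
print(report)
```

A large `signal_only` list would indicate that the index filter removes many signal rows, while an empty one means the filter leaves the signal table unchanged.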


In [2]:
# Filter to Russell 3000
df_universe = df_monthly[df_monthly['ticker'].isin(russell_tickers)].copy()
print(f"After Russell 3000 filter: {len(df_universe)} observations")

# Apply additional filters
# In production, you would:
# 1. Join with CRSP to get prices, market caps
# 2. Filter: price > $5, common shares only, etc.
# 3. Require valid return next month

# For now, we only mark which observations have a valid MT signal
df_universe['has_valid_signal'] = df_universe['MT_asof'].notna()
df_universe['has_valid_return'] = True  # Placeholder - would check CRSP returns

# Additional filters would go here:
# df_universe = df_universe[df_universe['price'] > config['trading']['min_price']]
# df_universe = df_universe[df_universe['shares_outstanding'] > 0]
# etc.

print("\nUniverse Statistics:")
print(f"  Observations with valid MT signal: {df_universe['has_valid_signal'].sum()}")
print(f"  Unique firms: {df_universe['firm_id'].nunique()}")
print(f"  Month range: {df_universe['month_end'].min()} to {df_universe['month_end'].max()}")


After Russell 3000 filter: 218052 observations
After filtering to valid MT signals: 92144 observations

MT Signal Statistics:
  MT mean: 0.8643
  MT median: 1.0000
  MT min: 0.0000
  MT max: 1.0000

  Using MT threshold: 1.0000 (median)
  Observations above threshold: 70617

Final Universe Statistics:
  Total firm-month observations: 92144
  Unique firms: 1411
  Unique tickers: 1411
  Month range: 2010-08-31 00:00:00 to 2026-05-31 00:00:00
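
The "valid return next month" requirement mentioned in the cell above could be sketched as follows once a monthly return column is available. The column name `ret` and the helper `flag_valid_forward_return` are hypothetical; `firm_id` and `month_end` are the columns already used in this notebook. The sketch assumes `month_end` holds calendar month-end timestamps.

```python
import pandas as pd


def flag_valid_forward_return(df, ret_col="ret"):
    """Flag firm-months whose next calendar month has a non-missing return.

    A row is valid only if the firm's next observation falls exactly one
    month-end later and its return (hypothetical column `ret_col`) is present.
    """
    out = df.sort_values(["firm_id", "month_end"]).copy()
    grouped = out.groupby("firm_id")
    next_month = grouped["month_end"].shift(-1)   # next observed month per firm
    next_ret = grouped[ret_col].shift(-1)         # return in that next month
    expected_next = out["month_end"] + pd.offsets.MonthEnd(1)
    out["has_valid_return"] = next_month.eq(expected_next) & next_ret.notna()
    return out


# Toy panel: firm 1 has a gap in March, firm 2 has a missing February return
panel = pd.DataFrame({
    "firm_id": [1, 1, 1, 2, 2],
    "month_end": pd.to_datetime(
        ["2020-01-31", "2020-02-29", "2020-04-30", "2020-01-31", "2020-02-29"]
    ),
    "ret": [0.01, 0.02, 0.03, 0.05, None],
})
flagged = flag_valid_forward_return(panel)
print(flagged[["firm_id", "month_end", "has_valid_return"]])
```

Only the first row (firm 1, January) is flagged valid here: the gap in March, the missing February return for firm 2, and the last observation of each firm all fail the check.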


In [3]:
# Save universe
output_file = config['data']['universe_monthly']
df_universe.to_parquet(output_file, index=False, engine='pyarrow')
print(f"Saved tradable universe to: {output_file}")
print(f"  Total firm-month observations: {len(df_universe)}")
print(f"  With valid signal: {df_universe['has_valid_signal'].sum()}")


Saved tradable universe to: /Users/david/Desktop/MATH-GA 2707/Moving Target/data/intermediate/universe_monthly.parquet
  Total firm-month observations: 92144
  With valid signal: 92144
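
A quick sanity check on the saved table can catch duplicate firm-month rows before they reach the backtest. The sketch below assumes each (`firm_id`, `month_end`) pair should appear at most once; that uniqueness requirement and the helper `check_universe` are assumptions, not something the notebook states.

```python
import pandas as pd


def check_universe(df, key_cols=("firm_id", "month_end")):
    """Return basic integrity statistics for a firm-month universe table."""
    keys = list(key_cols)
    # keep=False marks every row that belongs to a duplicated key
    dup_mask = df.duplicated(subset=keys, keep=False)
    firms_per_month = df.groupby("month_end")["firm_id"].nunique()
    return {
        "rows": len(df),
        "duplicate_rows": int(dup_mask.sum()),
        "months": int(df["month_end"].nunique()),
        "firms_per_month": firms_per_month,
    }


# Toy table with one duplicated firm-month; in the notebook this could be
# pd.read_parquet(output_file)
toy = pd.DataFrame({
    "firm_id": [1, 1, 2, 2],
    "month_end": pd.to_datetime(
        ["2020-01-31", "2020-01-31", "2020-01-31", "2020-02-29"]
    ),
})
stats = check_universe(toy)
print(stats["rows"], stats["duplicate_rows"], stats["months"])
```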
