## Feature Engineering Overview

Build new features to improve our model’s predictive power. Focus on rolling returns, volatility, momentum, and market regime signals.

- Calculate rolling returns and volatility
- Add sector and ETF-relative features
- Build momentum and reversal indicators
- Merge in market cap, ETF, and VIX signals
- Prepare final wide dataset for model input


## 1. Load Cleaned OHLCV and SPY Data

We start from our single cleaned stock file and left-merge SPY daily closes by date, so every stock row carries the same-day SPY close for market regime and relative features. The left merge keeps all stock rows; any date missing from the SPY file gets NaN in `close_spy`.


In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv('ohlcv_master_clean.csv', parse_dates=['date'])
spy = pd.read_csv('spy_ohlcv.csv', parse_dates=['date'])
spy = spy.rename(columns={
    'px_open': 'spy_open',
    'px_high': 'spy_high',
    'px_low': 'spy_low',
    'px_last': 'close_spy',
    'px_volume': 'spy_volume'
})

df = df.merge(spy[['date','close_spy']], on='date', how='left')
print("Merged SPY close onto stocks. Example rows:")
print(df.head())


Merged SPY close onto stocks. Example rows:
        date final_ticker     open     high      low    close  px_volume  \
0 2005-04-15           AA  66.1789  67.4331  65.2831  65.6190  3797001.0   
1 2005-04-18           AA  65.6414  67.0747  65.3951  66.2685  2941698.0   
2 2005-04-19           AA  66.1341  67.0747  65.0815  65.8206  3017338.0   
3 2005-04-20           AA  65.6638  65.7758  64.3313  64.7232  2400296.0   
4 2005-04-21           AA  65.1263  66.2909  64.3201  66.1789  2780416.0   

   close_spy  
0     114.15  
1     114.50  
2     115.41  
3     113.80  
4     116.01  
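
A quick way to confirm that a date-keyed merge like this behaves as intended is to let pandas validate it. The sketch below uses made-up toy frames (`stocks_demo`, `spy_demo` are illustrative names, not objects from this notebook) and assumes the right-hand table should have exactly one row per date.

```python
import pandas as pd

# Toy stand-ins for the stock panel and the SPY file (values are made up)
stocks_demo = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-02', '2024-01-04']),
    'final_ticker': ['AAA', 'AAA', 'BBB', 'BBB'],
    'close': [10.0, 10.5, 20.0, 19.0],
})
spy_demo = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-02', '2024-01-03']),
    'close_spy': [470.0, 468.0],
})

merged_demo = stocks_demo.merge(
    spy_demo, on='date', how='left',
    validate='many_to_one',  # raises MergeError if spy_demo has duplicate dates
    indicator=True,          # adds a '_merge' column flagging unmatched rows
)

# Stock dates that found no SPY row end up as 'left_only'
unmatched_dates = merged_demo.loc[merged_demo['_merge'] == 'left_only', 'date'].unique()
print("Stock dates with no SPY close:", len(unmatched_dates))
```

The `_merge` column can be dropped once the check passes.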


## 2. Merge in VIX Data

We add VIX (market volatility index) by date. This gives us a signal for market risk appetite and stress, useful for regime features and model context.


In [None]:
import yfinance as yf

# Grab VIX from Yahoo. Sometimes the columns change, so we check what’s actually there.
vix = yf.download('^VIX', start=df['date'].min(), end=df['date'].max())

# If there’s a multi-level header, flatten it out.
if isinstance(vix.columns, pd.MultiIndex):
    vix.columns = ['_'.join(col).strip() for col in vix.columns.values]

# Put the index back to a normal column so it's easy to merge later.
vix = vix.reset_index()

# Quick sanity check—print the columns so we can see what we got from Yahoo.
print("VIX columns after reset_index:", vix.columns.tolist())

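# Find whatever the 'close' column is actually named and rename it for merging.
# Prefer an exact or 'Close_...' match; fall back to the first column containing 'close'
# so that a later 'Adj Close' column cannot silently override it.
close_col = None
for col in vix.columns:
    name = col.lower()
    if name == 'close' or name.startswith('close_'):
        close_col = col
        break
    if close_col is None and 'close' in name:
        close_col = col
if not close_col:
    raise ValueError("No close column found in VIX data. Columns: " + str(vix.columns))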

# Now rename to 'vix_close' so the merge is simple and clear.
vix = vix.rename(columns={'Date':'date', close_col:'vix_close'})
vix['date'] = pd.to_datetime(vix['date'])
df['date'] = pd.to_datetime(df['date'])

# Build a daily VIX return and a 10-day rolling standard deviation of the VIX level on the
# VIX series itself, before merging. Computing them on the stacked stock panel would mix
# rows from different tickers at every ticker boundary.
vix = vix.sort_values('date')
vix['vix_return_1d'] = vix['vix_close'].pct_change(1, fill_method=None)
vix['vix_vol_10'] = vix['vix_close'].rolling(10).std()

# Merge VIX and its derived features onto our main data by date.
df = df.merge(vix[['date', 'vix_close', 'vix_return_1d', 'vix_vol_10']], on='date', how='left')

# Spot check that the columns all landed where we want them.
print(df[['date','vix_close','vix_return_1d','vix_vol_10']].head())




VIX columns after reset_index: ['Date', 'Close_^VIX', 'High_^VIX', 'Low_^VIX', 'Open_^VIX', 'Volume_^VIX']
        date  vix_close  vix_return_1d  vix_vol_10
0 2005-04-15  17.740000            NaN         NaN
1 2005-04-18  16.559999      -0.066516         NaN
2 2005-04-19  14.960000      -0.096618         NaN
3 2005-04-20  16.920000       0.131016         NaN
4 2005-04-21  14.410000      -0.148345         NaN
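
One thing worth keeping in mind with date-level series such as VIX: on a stacked panel (many tickers, one row per ticker per date), `pct_change` and `rolling` run straight down the rows, so the first row of each ticker is compared with the last row of the previous ticker. The toy example below, with made-up values and illustrative names, contrasts deriving the return on the date series before the merge with deriving it on the merged panel.

```python
import pandas as pd

dates = pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04'])
vix_demo = pd.DataFrame({'date': dates, 'vix_close': [10.0, 11.0, 12.1]})

# Stacked panel: two tickers, three dates each
panel_demo = pd.DataFrame({
    'date': list(dates) * 2,
    'final_ticker': ['AAA'] * 3 + ['BBB'] * 3,
})

# Derive the return on the date-level series, then merge it in
vix_demo['vix_return_1d'] = vix_demo['vix_close'].pct_change(fill_method=None)
panel_demo = panel_demo.merge(vix_demo, on='date', how='left')

# Naive alternative: derive the return on the stacked panel
naive_return = panel_demo['vix_close'].pct_change(fill_method=None)

# Row 3 is the first BBB row: correct value is NaN, naive value compares
# 2024-01-02 with AAA's 2024-01-04 row
print(panel_demo.loc[3, 'vix_return_1d'], naive_return.iloc[3])
```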


## 3. Merge in Sector ETF Data

We join the daily close and 5-day return of each of the 11 Select Sector SPDR ETFs (XLK, XLF, and so on, one per GICS sector). This lets us model both absolute and relative sector moves as potential predictors.


In [None]:
import yfinance as yf
import warnings

warnings.simplefilter(action='ignore', category=FutureWarning)

sector_etfs = ['XLK', 'XLF', 'XLV', 'XLY', 'XLC', 'XLI', 'XLE', 'XLRE', 'XLP', 'XLB', 'XLU']

for etf in sector_etfs:
    # Pull daily data for the ETF, flatten any multi-level headers just in case
    etf_data = yf.download(etf, start=df['date'].min(), end=df['date'].max())
    if isinstance(etf_data.columns, pd.MultiIndex):
        etf_data.columns = ['_'.join(col).strip() for col in etf_data.columns.values]
    etf_data = etf_data.reset_index()
    print(f"{etf} columns after reset_index:", etf_data.columns.tolist())

    # Figure out what the close column is actually called
    close_col = None
    for col in etf_data.columns:
        if 'close' in col.lower():
            close_col = col
            break
    if not close_col:
        raise ValueError(f"No close column found for {etf}. Columns: {etf_data.columns}")

    # Rename only the date column so it's consistent for the merge
    if 'Date' in etf_data.columns:
        etf_data = etf_data.rename(columns={'Date': 'date'})
    etf_data['date'] = pd.to_datetime(etf_data['date'])
    df['date'] = pd.to_datetime(df['date'])

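    # Compute the 5-day return on the ETF's own date-sorted series before merging;
    # doing it on the stacked stock panel would mix rows across ticker boundaries
    ret_col = f"{etf}_ret_5d"
    etf_data = etf_data.sort_values('date')
    etf_data[ret_col] = etf_data[close_col].pct_change(5, fill_method=None)

    # Merge the close column (under its native name) and the 5-day return into our DataFrame
    df = df.merge(etf_data[['date', close_col, ret_col]], on='date', how='left')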



XLK columns after reset_index: ['Date', 'Close_XLK', 'High_XLK', 'Low_XLK', 'Open_XLK', 'Volume_XLK']
XLF columns after reset_index: ['Date', 'Close_XLF', 'High_XLF', 'Low_XLF', 'Open_XLF', 'Volume_XLF']
XLV columns after reset_index: ['Date', 'Close_XLV', 'High_XLV', 'Low_XLV', 'Open_XLV', 'Volume_XLV']
XLY columns after reset_index: ['Date', 'Close_XLY', 'High_XLY', 'Low_XLY', 'Open_XLY', 'Volume_XLY']
XLC columns after reset_index: ['Date', 'Close_XLC', 'High_XLC', 'Low_XLC', 'Open_XLC', 'Volume_XLC']
XLI columns after reset_index: ['Date', 'Close_XLI', 'High_XLI', 'Low_XLI', 'Open_XLI', 'Volume_XLI']
XLE columns after reset_index: ['Date', 'Close_XLE', 'High_XLE', 'Low_XLE', 'Open_XLE', 'Volume_XLE']
XLRE columns after reset_index: ['Date', 'Close_XLRE', 'High_XLRE', 'Low_XLRE', 'Open_XLRE', 'Volume_XLRE']
XLP columns after reset_index: ['Date', 'Close_XLP', 'High_XLP', 'Low_XLP', 'Open_XLP', 'Volume_XLP']
XLB columns after reset_index: ['Date', 'Close_XLB', 'High_XLB', 'Low_XLB', 'Open_XLB', 'Volume_XLB']
XLU columns after reset_index: ['Date', 'Close_XLU', 'High_XLU', 'Low_XLU', 'Open_XLU', 'Volume_XLU']
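
Some sector ETFs have shorter histories than a stock file that starts in 2005 (XLRE launched in 2015 and XLC in 2018), so their columns are NaN for earlier dates after a left merge. A minimal sketch for finding the first date each ETF column has data, using a made-up toy frame (`panel_demo` and its values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy panel: one ETF with full history, one that starts later
panel_demo = pd.DataFrame({
    'date': pd.bdate_range('2024-01-01', periods=5),
    'Close_XLK': [1.0, 2.0, 3.0, 4.0, 5.0],
    'Close_XLC': [np.nan, np.nan, np.nan, 4.0, 5.0],
})

etf_cols = [c for c in panel_demo.columns if c.startswith('Close_')]

# first_valid_index() returns the index label of the first non-NaN value
first_dates = {
    c: panel_demo.loc[panel_demo[c].first_valid_index(), 'date']
    for c in etf_cols
}
for col, first_date in first_dates.items():
    print(col, "starts on", first_date.date())
```

Knowing these start dates matters later, because any step that requires all ETF features to be present will discard every row before the latest start date.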


## 4a. Feature Engineering for Each Ticker

We now engineer all technical indicators, momentum, volatility, and regime signals for each ticker. These are industry-standard predictors used in quant equity models.


In [None]:
frames = []
for ticker, group in df.groupby('final_ticker'):
    # Always sort by date, so all our rolling windows and shifts are aligned in time
    group = group.sort_values('date').reset_index(drop=True)

    # 1-day, 5-day, and 21-day returns for each ticker (basic momentum and mean reversion signals)
    group['return_1d'] = group['close'].pct_change(1)
    group['return_5d'] = group['close'].pct_change(5)
    group['return_21d'] = group['close'].pct_change(21)

    # 10-day price momentum (used a lot in cross-sectional quant models)
    group['momentum_10'] = group['close'] / group['close'].shift(10) - 1

    # Simple moving averages for trend direction—SMA 10 and SMA 20 are staples for short- and medium-term bias
    group['sma_10'] = group['close'].rolling(10).mean()
    group['sma_20'] = group['close'].rolling(20).mean()

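    # Rolling 10- and 20-day standard deviation of the close price. This is dispersion in
    # price units, not return volatility; useful for regime filters or risk targeting
    group['vol_10'] = group['close'].rolling(10).std()
    group['vol_20'] = group['close'].rolling(20).std()

    # RSI-14: classic overbought/oversold oscillator. Gains and losses are averaged with a
    # simple 14-day rolling mean (Cutler's variant), not Wilder's exponential smoothing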
    delta = group['close'].diff()
    up = delta.clip(lower=0)
    down = -delta.clip(upper=0)
    gain = up.rolling(14).mean()
    loss = down.rolling(14).mean()
    rs = gain / loss
    group['rsi_14'] = 100 - (100/(1+rs))

    # MACD: Trend-following momentum indicator—captures the difference between fast and slow EMAs
    group['ema_12'] = group['close'].ewm(span=12, adjust=False).mean()
    group['ema_26'] = group['close'].ewm(span=26, adjust=False).mean()
    group['macd'] = group['ema_12'] - group['ema_26']
    group['macd_signal'] = group['macd'].ewm(span=9, adjust=False).mean()

    # Rolling 10-day average volume, and "volume above average" as a relative activity signal
    group['vol_avg_10'] = group['px_volume'].rolling(10).mean()
    group['vol_above_avg'] = (group['px_volume'] - group['vol_avg_10']) / group['vol_avg_10']

    # Donchian channels: Highest high and lowest low over the last 20 days—breakout or range signals
    group['donchian_high_20'] = group['high'].rolling(20).max()
    group['donchian_low_20'] = group['low'].rolling(20).min()

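    # True range: the largest of high-low, |high - previous close|, and |low - previous close|.
    # Using the previous close is what captures overnight gaps; with the same-day close the
    # result would always collapse to high - low.
    prev_close = group['close'].shift(1)
    group['true_range'] = pd.concat([
        group['high'] - group['low'],
        (group['high'] - prev_close).abs(),
        (group['low'] - prev_close).abs()
    ], axis=1).max(axis=1)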

    # Future 5-day return for supervised learning (this is our target)
    group['future_return_5d'] = group['close'].shift(-5) / group['close'] - 1

    # Excess return over SPY—measures if the ticker is outperforming the broad market over 5 days
    group['excess_return_5d'] = group['return_5d'] - group['close_spy'].pct_change(5)

    # Rolling beta and correlation vs. SPY for each ticker (measures sensitivity and co-movement with the market)
    stock_return = group['close'].pct_change()
    spy_return = group['close_spy'].pct_change()
    group['beta_60d'] = stock_return.rolling(60).cov(spy_return) / spy_return.rolling(60).var()
    group['corr_60d'] = stock_return.rolling(60).corr(spy_return)

    frames.append(group)

# Concatenate all tickers back into a single DataFrame
df_feat = pd.concat(frames, ignore_index=True)
print("Feature engineering complete. Columns:", df_feat.columns.tolist())


Feature engineering complete. Columns: ['date', 'final_ticker', 'open', 'high', 'low', 'close', 'px_volume', 'close_spy', 'vix_close', 'vix_return_1d', 'vix_vol_10', 'Close_XLK', 'XLK_ret_5d', 'Close_XLF', 'XLF_ret_5d', 'Close_XLV', 'XLV_ret_5d', 'Close_XLY', 'XLY_ret_5d', 'Close_XLC', 'XLC_ret_5d', 'Close_XLI', 'XLI_ret_5d', 'Close_XLE', 'XLE_ret_5d', 'Close_XLRE', 'XLRE_ret_5d', 'Close_XLP', 'XLP_ret_5d', 'Close_XLB', 'XLB_ret_5d', 'Close_XLU', 'XLU_ret_5d', 'return_1d', 'return_5d', 'return_21d', 'momentum_10', 'sma_10', 'sma_20', 'vol_10', 'vol_20', 'rsi_14', 'ema_12', 'ema_26', 'macd', 'macd_signal', 'vol_avg_10', 'vol_above_avg', 'donchian_high_20', 'donchian_low_20', 'true_range', 'future_return_5d', 'excess_return_5d', 'beta_60d', 'corr_60d']
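
For reference, true range is usually defined against the previous bar's close, which is what lets it pick up overnight gaps. The toy bars below (made-up prices, illustrative names) show how it differs from the plain intraday range when a stock gaps up.

```python
import pandas as pd

# Three made-up daily bars; the second bar gaps up from a close of 100 to a low of 103
bars = pd.DataFrame({
    'high':  [101.0, 105.0, 104.0],
    'low':   [99.0, 103.0, 100.0],
    'close': [100.0, 104.0, 101.0],
})

prev_close = bars['close'].shift(1)

# Row-wise maximum of the three candidate ranges; NaNs (first row) are skipped
true_range = pd.concat([
    bars['high'] - bars['low'],
    (bars['high'] - prev_close).abs(),
    (bars['low'] - prev_close).abs(),
], axis=1).max(axis=1)

intraday_range = bars['high'] - bars['low']
print(pd.DataFrame({'true_range': true_range, 'intraday_range': intraday_range}))
```

On the gap day the true range is 5.0 while the intraday range is only 2.0.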


## 4b. Clean Up Duplicate and Messy Columns

After merging in VIX and sector ETF data, we are left with raw yfinance column names such as `Close_XLK`, and repeated merges can also leave `*_x` / `*_y` duplicates. Before moving forward, we clean these up:

- Drop redundant columns left over from merges (`*_x`, `*_y`) along with the raw `Close_<ETF>` price levels, so only the `<ETF>_ret_5d` returns remain as sector features.
- Rename any remaining ETF close column to a consistent `XLK_close`, `XLF_close` pattern. Because the `Close_*` columns are dropped first, this step finds nothing to rename in this run.
- Double-check the final column list to make sure it’s tidy and ready for modeling.

This keeps our dataset organized and ensures our feature selection and modeling steps are using the right data.


In [None]:
# First, let's get rid of the duplicate columns left over from all our merges
# We'll keep the main vix_close and only one clean ETF close column for each sector
drop_cols = [
    'vix_close_x', 'vix_close_y',
    'XLK_close_x', 'XLK_close_y',
]

# Any other columns from yfinance that start with 'Close_' (like 'Close_XLK') can go too
drop_cols += [col for col in df_feat.columns if col.startswith('Close_')]

# Drop everything in our drop list if it actually made it into the DataFrame
df_feat = df_feat.drop(columns=[col for col in drop_cols if col in df_feat.columns])

# Now, standardize any ETF close column that is still present to '<ETF>_close'.
# Note: the 'Close_<ETF>' columns were dropped just above, so in this run the loop is a no-op
# and only the '<ETF>_ret_5d' columns carry sector information downstream.
sector_etfs = ['XLK', 'XLF', 'XLV', 'XLY', 'XLC', 'XLI', 'XLE', 'XLRE', 'XLP', 'XLB', 'XLU']
for etf in sector_etfs:
    # Match any case or suffix variant that could be the close for this ETF
    candidates = [col for col in df_feat.columns if (col.lower().endswith(f'{etf.lower()}_close') or col == f'Close_{etf}')]
    if candidates:
        df_feat = df_feat.rename(columns={candidates[0]: f'{etf}_close'})

# Last check—print out our column names so we can see at a glance that things are tidy
print("Columns after cleaning:", df_feat.columns.tolist())


Columns after cleaning: ['date', 'final_ticker', 'open', 'high', 'low', 'close', 'px_volume', 'close_spy', 'vix_close', 'vix_return_1d', 'vix_vol_10', 'XLK_ret_5d', 'XLF_ret_5d', 'XLV_ret_5d', 'XLY_ret_5d', 'XLC_ret_5d', 'XLI_ret_5d', 'XLE_ret_5d', 'XLRE_ret_5d', 'XLP_ret_5d', 'XLB_ret_5d', 'XLU_ret_5d', 'return_1d', 'return_5d', 'return_21d', 'momentum_10', 'sma_10', 'sma_20', 'vol_10', 'vol_20', 'rsi_14', 'ema_12', 'ema_26', 'macd', 'macd_signal', 'vol_avg_10', 'vol_above_avg', 'donchian_high_20', 'donchian_low_20', 'true_range', 'future_return_5d', 'excess_return_5d', 'beta_60d', 'corr_60d']
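
If the ETF price levels were wanted as features, one option would be to rename the raw yfinance columns instead of dropping them. This is only a sketch on a made-up one-row frame, not what the cell above does; `demo` and `etfs_demo` are illustrative names.

```python
import pandas as pd

demo = pd.DataFrame({'Close_XLK': [100.0], 'Close_XLF': [40.0], 'XLK_ret_5d': [0.01]})
etfs_demo = ['XLK', 'XLF']

# Build the rename map only for columns that actually exist
rename_map = {
    f'Close_{etf}': f'{etf}_close'
    for etf in etfs_demo
    if f'Close_{etf}' in demo.columns
}
renamed = demo.rename(columns=rename_map)
print(renamed.columns.tolist())
```

Whether raw price levels are useful inputs is a separate question, since they are non-stationary; returns are often the safer choice.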


## 5. Pull and Merge yfinance Fundamentals

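For each ticker, we grab core fundamental ratios and metadata from yfinance. SPY and the sector ETFs are skipped in the download and their fundamental columns are set to missing, since they are funds rather than operating companies. Note that `Ticker.info` returns a current snapshot, so the same value is attached to every historical row of a ticker; these columns act as static descriptors rather than point-in-time fundamentals.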


In [None]:
import yfinance as yf

# 'sector_etfs' is already defined above, so we just use it here
tickers = sorted([
    t for t in df_feat['final_ticker'].unique()
    if t.isalpha() and len(t) <= 5 and t not in ['SPY'] + sector_etfs
])

all_info = []
for tkr in tickers:
    try:
        # Pull info from yfinance—ignore if there's a failure
        yf_ticker = yf.Ticker(tkr)
        info = yf_ticker.info
        all_info.append({
            "final_ticker": tkr,
            "market_cap": info.get("marketCap"),
            "trailing_pe": info.get("trailingPE"),
            "forward_pe": info.get("forwardPE"),
            "price_to_book": info.get("priceToBook"),
            "dividend_yield": info.get("dividendYield"),
            "beta": info.get("beta"),
            "sector": info.get("sector"),
            "industry": info.get("industry"),
        })
    except Exception as e:
        print(f"Couldn’t pull fundamentals for {tkr}: {e}")

df_fund = pd.DataFrame(all_info)
print("Pulled fundamentals for", len(df_fund), "tickers")

# Merge on ticker symbol
df_feat = df_feat.merge(df_fund, on='final_ticker', how='left')

# Set fundamental columns to NaN for SPY and all sector ETFs (these aren't real companies)
fundamental_cols = [
    "market_cap", "trailing_pe", "forward_pe", "price_to_book", "dividend_yield", "beta"
]
index_like = ['SPY'] + sector_etfs
for col in fundamental_cols:
    if col in df_feat.columns:
        df_feat.loc[df_feat['final_ticker'].isin(index_like), col] = np.nan

print("After merging in the fundamentals, here are our columns:", df_feat.columns.tolist())


ERROR:yfinance:HTTP Error 404: 
ERROR:yfinance:HTTP Error 404: 


Pulled fundamentals for 162 tickers
After merging in the fundamentals, here are our columns: ['date', 'final_ticker', 'open', 'high', 'low', 'close', 'px_volume', 'close_spy', 'vix_close', 'vix_return_1d', 'vix_vol_10', 'XLK_ret_5d', 'XLF_ret_5d', 'XLV_ret_5d', 'XLY_ret_5d', 'XLC_ret_5d', 'XLI_ret_5d', 'XLE_ret_5d', 'XLRE_ret_5d', 'XLP_ret_5d', 'XLB_ret_5d', 'XLU_ret_5d', 'return_1d', 'return_5d', 'return_21d', 'momentum_10', 'sma_10', 'sma_20', 'vol_10', 'vol_20', 'rsi_14', 'ema_12', 'ema_26', 'macd', 'macd_signal', 'vol_avg_10', 'vol_above_avg', 'donchian_high_20', 'donchian_low_20', 'true_range', 'future_return_5d', 'excess_return_5d', 'beta_60d', 'corr_60d', 'market_cap', 'trailing_pe', 'forward_pe', 'price_to_book', 'dividend_yield', 'beta', 'sector', 'industry']
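
With `sector` available, the ETF-relative features mentioned in the overview could be built by subtracting each stock's own sector ETF return from its return. The sketch below is an assumption-heavy illustration: the `sector_to_etf` mapping is hypothetical and incomplete, the sector labels are assumed to look like the ones shown, and the data is made up.

```python
import pandas as pd

# Hypothetical mapping from sector label to sector ETF (illustrative, incomplete)
sector_to_etf = {'Technology': 'XLK', 'Energy': 'XLE'}

demo = pd.DataFrame({
    'final_ticker': ['AAA', 'BBB', 'CCC'],
    'sector': ['Technology', 'Energy', None],
    'return_5d': [0.03, -0.01, 0.02],
    'XLK_ret_5d': [0.02, 0.02, 0.02],
    'XLE_ret_5d': [-0.04, -0.04, -0.04],
})

def own_sector_return(row):
    """Look up the 5-day return of the ETF matching this row's sector, or NaN."""
    etf = sector_to_etf.get(row['sector'])
    return row[f'{etf}_ret_5d'] if etf is not None else float('nan')

demo['sector_ret_5d'] = demo.apply(own_sector_return, axis=1)
demo['sector_rel_5d'] = demo['return_5d'] - demo['sector_ret_5d']
print(demo[['final_ticker', 'sector_rel_5d']])
```

A row-wise `apply` is used for readability; on a panel of this size a vectorized lookup would be faster.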


## 6a. Drop Rows Missing Key Features or Target

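Rolling statistics need a warm-up window, the 5-day forward target is undefined for the last five rows of each ticker, and some tickers have no fundamentals, so certain rows will have missing values (NaNs).

To keep our modeling clean and reliable, we remove any rows that lack critical input features or the target variable. This step ensures the model only trains and evaluates on complete cases without data gaps. Two side effects are worth knowing: because the fundamental columns are part of the required set, any ticker with a missing fundamental value drops out entirely (including SPY and the sector ETFs, whose fundamentals were set to NaN), and because XLC (launched 2018) and XLRE (launched 2015) are required too, every row before their first available return is removed.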


In [None]:
# Rolling windows and missing fundamentals mean some rows are going to have NaNs
# We'll drop anything missing a key input or our target, so the model's always working with clean data

feature_cols = [
    'return_1d','return_5d','return_21d','momentum_10','sma_10','sma_20','vol_10','vol_20','rsi_14',
    'macd','macd_signal','vol_avg_10','vol_above_avg','donchian_high_20','donchian_low_20','true_range',
    'excess_return_5d','beta_60d','corr_60d',
    'vix_close','vix_return_1d','vix_vol_10'
]

# Add sector ETF features: '<ETF>_ret_5d', plus '<ETF>_close' if any survived the cleanup in 4b
feature_cols += [
    c for c in df_feat.columns
    if any(c.startswith(etf) and (c.endswith('_close') or c.endswith('_ret_5d')) for etf in sector_etfs)
]

# Add any fundamentals that made it through
feature_cols += [c for c in fundamental_cols if c in df_feat.columns]

# Drop any rows missing one of our inputs or the target
df_feat = df_feat.dropna(subset=feature_cols + ['future_return_5d'])
print("Shape after dropping NaNs:", df_feat.shape)


Shape after dropping NaNs: (202771, 52)
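
Before a `dropna` like this, it can help to see which required column is responsible for the most missing rows. A minimal sketch on a made-up frame (`demo` and `required` are illustrative names):

```python
import pandas as pd

demo = pd.DataFrame({
    'a': [1.0, None, 3.0, 4.0],
    'b': [None, None, 1.0, 2.0],
    'target': [0.1, 0.2, 0.3, None],
})
required = ['a', 'b', 'target']

# Count missing values per required column, worst offender first
missing_counts = demo[required].isna().sum().sort_values(ascending=False)
print(missing_counts)

# Rows that survive when every required column must be present
kept = demo.dropna(subset=required)
print("Rows kept:", len(kept), "of", len(demo))
```

A single sparse column can dominate the row loss, in which case dropping that feature may be preferable to dropping the rows.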


## 6b. Save Final Engineered Dataset with Fundamentals

At this point, all feature engineering, merging of market data, sector ETFs, VIX, and fundamental metrics is complete.

Saving this comprehensive dataset to a CSV file ensures a stable, reproducible input for future modeling steps. It prevents the need to rerun costly data preparation and allows us to load consistent data for model training, testing, and analysis.


In [None]:
# Save the complete feature set including fundamentals for future modeling runs
df_feat.to_csv('engine_final_allfeatures_wfund.csv', index=False)
print("Saved full feature set with fundamentals to 'engine_final_allfeatures_wfund.csv'")


Saved full feature set with fundamentals to 'engine_final_allfeatures_wfund.csv'


## 7. XGBoost Feature Selection

To see which features the model relies on most when predicting 5-day returns, we use XGBoost to rank them by importance. This method handles tabular data well and runs efficiently, helping us focus on the most predictive signals.

We train the model on individual stocks only (excluding SPY and all sector ETFs) to avoid bias. Any remaining missing values are filled with medians before training. The result is a ranking of the features the fitted model uses most. Note that the train/validation split is random rather than chronological and the importances come from the training fit alone, so the ranking is a rough screen rather than an out-of-sample result.


In [None]:
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Define all tickers to exclude from training (SPY and sector ETFs)
excluded_tickers = ['SPY', 'XLK', 'XLF', 'XLV', 'XLY', 'XLC', 'XLI', 'XLE', 'XLRE', 'XLP', 'XLB', 'XLU']

# Filter to keep only real stocks for training
train_df = df_feat[~df_feat['final_ticker'].isin(excluded_tickers)].copy()

# Keep only features that actually exist in the DataFrame
feature_cols_final = [c for c in feature_cols if c in train_df.columns]
X = train_df[feature_cols_final]
y = train_df['future_return_5d']

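# XGBoost handles NaNs natively and the dropna above should leave none;
# the median fill is only a safeguard
X = X.fillna(X.median())

# Random 80/20 split (not chronological). X_val and y_val are held out but not used below.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)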

# XGBoost for feature importance—fast, robust, works well with tabular data
xgb_model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    n_jobs=-1,
    random_state=42
)
xgb_model.fit(X_train, y_train)

importances = pd.Series(xgb_model.feature_importances_, index=X.columns).sort_values(ascending=False)
print("Top features by XGBoost:", importances.head(20))


Top features by XGBoost: vix_close           0.066131
XLE_ret_5d          0.061586
vix_return_1d       0.055130
vix_vol_10          0.051415
XLK_ret_5d          0.047226
XLP_ret_5d          0.047152
XLF_ret_5d          0.041691
XLI_ret_5d          0.041051
XLY_ret_5d          0.039377
XLRE_ret_5d         0.039078
XLV_ret_5d          0.038666
XLU_ret_5d          0.035815
XLB_ret_5d          0.032433
XLC_ret_5d          0.032077
donchian_low_20     0.028617
return_21d          0.027415
momentum_10         0.025069
beta_60d            0.019131
excess_return_5d    0.018977
market_cap          0.018668
dtype: float32
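
Because the target looks 5 days ahead, a chronological split with a gap between training and validation dates is one common way to keep overlapping targets from leaking across the boundary. The sketch below is an alternative to the random split, not what this notebook ran; the data is made up and the names (`demo`, `horizon`) are illustrative.

```python
import pandas as pd

horizon = 5  # matches the 5-day forward target

# Toy panel: 20 business days, two tickers per day
dates = pd.bdate_range('2024-01-01', periods=20)
demo = pd.DataFrame({
    'date': dates.repeat(2),
    'final_ticker': ['AAA', 'BBB'] * 20,
})

unique_dates = pd.Index(demo['date'].unique()).sort_values()
split_idx = int(len(unique_dates) * 0.8)

# Validation: the last 20% of dates. Training: everything earlier, minus a
# gap of `horizon` dates so no training target reaches into validation.
val_dates = unique_dates[split_idx:]
train_dates = unique_dates[:split_idx - horizon]

train_demo = demo[demo['date'].isin(train_dates)]
val_demo = demo[demo['date'].isin(val_dates)]
print(len(train_demo), "train rows,", len(val_demo), "validation rows")
```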


## 8. Save and Report Top Features

We export the top 10 and top 20 features from XGBoost so we can use them for modeling, ablation tests, or documentation. Having these lists saved separately makes it easy to reproduce results, swap features, or compare different model runs.


In [None]:
# Save top 10 and top 20 features to disk for downstream runs or ablation testing
top10 = importances.head(10).index.tolist()
top20 = importances.head(20).index.tolist()
pd.Series(top10).to_csv('top10_features_xgb.csv', index=False)
pd.Series(top20).to_csv('top20_features_xgb.csv', index=False)

print("Top 10 features:", top10)
print("Top 20 features:", top20)


Top 10 features: ['vix_close', 'XLE_ret_5d', 'vix_return_1d', 'vix_vol_10', 'XLK_ret_5d', 'XLP_ret_5d', 'XLF_ret_5d', 'XLI_ret_5d', 'XLY_ret_5d', 'XLRE_ret_5d']
Top 20 features: ['vix_close', 'XLE_ret_5d', 'vix_return_1d', 'vix_vol_10', 'XLK_ret_5d', 'XLP_ret_5d', 'XLF_ret_5d', 'XLI_ret_5d', 'XLY_ret_5d', 'XLRE_ret_5d', 'XLV_ret_5d', 'XLU_ret_5d', 'XLB_ret_5d', 'XLC_ret_5d', 'donchian_low_20', 'return_21d', 'momentum_10', 'beta_60d', 'excess_return_5d', 'market_cap']
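
For downstream runs, a saved feature list can be read back into a plain Python list. The sketch below writes to a temporary directory and gives the column an explicit name (`feature`, an illustrative choice); a Series saved without a name gets a default header instead, so the column label would need adjusting when reading the files written above.

```python
import os
import tempfile

import pandas as pd

top_features = ['vix_close', 'XLE_ret_5d', 'vix_return_1d']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'top_features_demo.csv')

    # Naming the Series gives the CSV a readable header
    pd.Series(top_features, name='feature').to_csv(path, index=False)

    # Read the single column back as a list
    reloaded = pd.read_csv(path)['feature'].tolist()

print(reloaded)
```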


## 9. Feature Importance Summary

* XGBoost's importance ranking suggests that, in this run, most of the model's weight for 5-day returns sits on broad market regime indicators and sector-level momentum. The three VIX features plus the 5-day returns of all 11 sector ETFs fill the top 14 slots. This fits market intuition: short-term moves are largely driven by overall volatility and sector flows rather than isolated stock factors. One caveat is that these features take the same value for every ticker on a given date and the split was random, so part of their weight may reflect the model recognizing dates rather than a durable signal.

* Stock-specific signals like the 20-day Donchian low, 21-day return, 10-day momentum, and rolling beta still contribute but play a secondary role, each with an importance below 0.03. Market cap making the top 20 hints at persistent size-related differences.

* In short, macro and sector conditions rank as the primary drivers for short-term return prediction here. We will move forward using these top features for model building and training.
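
To put rough numbers on the summary, the top-20 importances reported above can be summed by feature family. The values are copied from the XGBoost output in section 7; the grouping rule (`family_of`) is an illustrative assumption.

```python
# Top-20 importances as printed in section 7
top20_importances = {
    'vix_close': 0.066131, 'XLE_ret_5d': 0.061586, 'vix_return_1d': 0.055130,
    'vix_vol_10': 0.051415, 'XLK_ret_5d': 0.047226, 'XLP_ret_5d': 0.047152,
    'XLF_ret_5d': 0.041691, 'XLI_ret_5d': 0.041051, 'XLY_ret_5d': 0.039377,
    'XLRE_ret_5d': 0.039078, 'XLV_ret_5d': 0.038666, 'XLU_ret_5d': 0.035815,
    'XLB_ret_5d': 0.032433, 'XLC_ret_5d': 0.032077, 'donchian_low_20': 0.028617,
    'return_21d': 0.027415, 'momentum_10': 0.025069, 'beta_60d': 0.019131,
    'excess_return_5d': 0.018977, 'market_cap': 0.018668,
}

def family_of(name):
    """Assign a feature to a family by its naming pattern."""
    if name.startswith('vix_'):
        return 'vix'
    if name.startswith('XL') and name.endswith('_ret_5d'):
        return 'sector_etf'
    return 'stock_specific'

family_totals = {}
for name, value in top20_importances.items():
    family = family_of(name)
    family_totals[family] = family_totals.get(family, 0.0) + value

for family, total in sorted(family_totals.items(), key=lambda kv: -kv[1]):
    print(f"{family}: {total:.3f}")
```

On these numbers the sector ETF returns account for roughly 0.46 of total importance, VIX features for about 0.17, and the six stock-specific features in the top 20 for about 0.14.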
