# Cross-Pipeline Parameter Grid Search

This notebook performs a comprehensive grid search across **all three stages** of the sentiment trading pipeline:

## Stage 1: Sentiment Shock Detection (`sentiment_shock_signals.ipynb`)
- `WINDOW_DAYS`: Sliding window for current sentiment (default: 3)
- `BASELINE_DAYS`: Rolling window for historical baseline (default: 30)
- `MIN_MENTIONS_IN_WINDOW`: Minimum mentions required (default: 5)
- `BASE_Z_THRESHOLD`: Z-score threshold for signal generation (default: 1.8)
- `MIN_ABS_SENT_CHANGE`: Minimum sentiment change (default: 0.1)

## Stage 2: Market Move Filter (`stock-market-check.ipynb` / `utils.py`)
- `Z_THRESHOLD`: Z-score threshold for market move detection (default: 2.0)
- `VOL_EXPANSION_THRESHOLD`: Volatility expansion threshold (default: 1.75)
- `ATR_MOVE_THRESHOLD`: ATR-based move threshold (default: 2.0)
- `RECENT_DAYS`: Days to check for market moves (default: 5)

## Stage 3: Backtest Parameters (`backtest.ipynb`)
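- `STOP_LOSS_PCT`: Stop loss percentage (default in `run_backtest` below: 0.07)
- `TAKE_PROFIT_PCT`: Take profit percentage (default in `run_backtest` below: 0.12)
- `MAX_HOLDING_DAYS`: Maximum holding period in days (default in `run_backtest` below: 10)
- `STRATEGY_MODE`: Long-only vs. long-short (set through the `long_only` flag; default: long-short)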

## New Parameter: Signal Balance Adjustment
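- `BULLISH_BIAS`: Offset that lowers the z-score threshold for BUY signals and raises it for SELL signals by the same amount, to rebalance BUY/SELL counts in bull markets (default: 0.0, no bias)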
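
To make the bias concrete, here is a minimal sketch of how the BUY and SELL thresholds diverge. It mirrors the threshold arithmetic in `generate_signals` later in this notebook; the helper name `signal_thresholds` is hypothetical and exists only for this illustration.

```python
import math

def signal_thresholds(window_mentions, base_z_threshold=1.8,
                      mention_sensitivity=0.2, bullish_bias=0.0):
    """Return (buy_threshold, sell_threshold) as positive z-score magnitudes.

    Hypothetical helper for illustration; the notebook computes the same
    quantities inline on pandas columns.
    """
    # Dynamic threshold: more mentions in the window lower the bar.
    mention_factor = math.log1p(max(window_mentions, 1.0))
    z_threshold = base_z_threshold / (1 + mention_sensitivity * mention_factor)
    # The bias lowers the BUY bar and raises the SELL bar by the same amount.
    return z_threshold - bullish_bias, z_threshold + bullish_bias

buy, sell = signal_thresholds(window_mentions=5, bullish_bias=0.3)
print(f"BUY fires at z >= {buy:.3f}; SELL fires at z <= -{sell:.3f}")
```

With `bullish_bias=0.0` the two thresholds coincide, so the gap between them is always twice the bias.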

In [2]:
# =============================================================================
# IMPORTS AND SETUP
# =============================================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from tqdm import tqdm
import yfinance as yf
from itertools import product
import warnings
warnings.filterwarnings('ignore')

from utils import get_oracle_connection, fetch_historical_prices, adjust_to_trading_day

# Configuration
INITIAL_CAPITAL = 10000
START_DATE = "2024-01-01"
END_DATE = "2024-12-31"

print("✓ Imports complete")

INFO:utils:spaCy model loaded successfully


✓ Imports complete


In [3]:
# =============================================================================
# LOAD RAW SENTIMENT DATA FROM DATABASE
# =============================================================================
conn = get_oracle_connection()

# Load raw sentiment results for signal generation
query_sentiment = """
SELECT ticker, created_utc, final_sentiment_score, normalized_upvotes
FROM sentiment_results
"""
df_sentiment_raw = pd.read_sql_query(query_sentiment, conn)
df_sentiment_raw.columns = df_sentiment_raw.columns.str.lower()

# Convert timestamps - created_utc is stored as Unix timestamp (seconds)
df_sentiment_raw["created_utc"] = pd.to_datetime(df_sentiment_raw["created_utc"], unit='s', utc=True)
df_sentiment_raw["date"] = df_sentiment_raw["created_utc"].dt.date
df_sentiment_raw["weighted_sentiment"] = df_sentiment_raw["final_sentiment_score"] * df_sentiment_raw["normalized_upvotes"]

# Create daily aggregated sentiment
daily_sentiment = (
    df_sentiment_raw.groupby(["ticker", "date"])
    .agg(
        sentiment_mean=("weighted_sentiment", "sum"),
        total_upvotes=("normalized_upvotes", "sum"),
        sentiment_median=("final_sentiment_score", "median"),
        mentions=("final_sentiment_score", "count")
    )
    .reset_index()
)
daily_sentiment["sentiment_mean"] = daily_sentiment["sentiment_mean"] / daily_sentiment["total_upvotes"].replace(0, np.nan)
mask_zero = daily_sentiment["total_upvotes"] == 0
daily_sentiment.loc[mask_zero, "sentiment_mean"] = daily_sentiment.loc[mask_zero, "sentiment_median"]
daily_sentiment["date"] = pd.to_datetime(daily_sentiment["date"])
daily_sentiment = daily_sentiment.sort_values(["ticker", "date"])

conn.close()

print(f"✓ Loaded {len(df_sentiment_raw):,} raw sentiment records")
print(f"✓ Created {len(daily_sentiment):,} daily aggregated records")
print(f"✓ Unique tickers: {daily_sentiment['ticker'].nunique()}")
print(f"✓ Date range: {daily_sentiment['date'].min()} to {daily_sentiment['date'].max()}")

Oracle connection successful!
✓ Loaded 674,889 raw sentiment records
✓ Created 90,280 daily aggregated records
✓ Unique tickers: 4129
✓ Date range: 2024-01-01 00:00:00 to 2024-12-31 00:00:00


In [4]:
# =============================================================================
# STAGE 1: SIGNAL GENERATION FUNCTION (from sentiment_shock_signals.ipynb)
# =============================================================================

def generate_signals(daily_df, params):
    """
    Generate sentiment shock signals with configurable parameters.
    
    Parameters from sentiment_shock_signals.ipynb:
    - window_days: Sliding window for current sentiment
    - baseline_days: Rolling window for historical baseline
    - min_baseline_days: Minimum history required
    - min_mentions: Minimum mentions in window
    - base_z_threshold: Base z-score threshold
    - min_abs_sent_change: Minimum absolute sentiment change
    - mention_sensitivity: How much mentions affect threshold
    - bullish_bias: Adjustment to favor BUY signals (new parameter)
    """
    window_days = params.get('window_days', 3)
    baseline_days = params.get('baseline_days', 30)
    min_baseline_days = params.get('min_baseline_days', 15)
    min_mentions = params.get('min_mentions', 5)
    base_z_threshold = params.get('base_z_threshold', 1.8)
    min_abs_sent_change = params.get('min_abs_sent_change', 0.1)
    mention_sensitivity = params.get('mention_sensitivity', 0.2)
    bullish_bias = params.get('bullish_bias', 0.0)  # Positive = favor BUY signals
    
    all_signals = []
    
    for ticker in daily_df['ticker'].unique():
        df_t = daily_df[daily_df['ticker'] == ticker].copy().sort_values('date')
        df_t = df_t.set_index('date')
        
        # Skip tickers with fewer than two observations, then fill missing calendar days
        if len(df_t) < 2:
            continue
        full_idx = pd.date_range(df_t.index.min(), df_t.index.max(), freq='D')
        df_t = df_t.reindex(full_idx)
        df_t.index.name = 'date'
        df_t['ticker'] = ticker
        df_t['mentions'] = df_t['mentions'].fillna(0)
        
        # Sliding window sentiment
        roll_mentions = df_t['mentions'].rolling(window_days, min_periods=1).sum()
        df_t['sent_weighted'] = df_t['sentiment_mean'] * df_t['mentions']
        roll_sent_sum = df_t['sent_weighted'].rolling(window_days, min_periods=1).sum()
        roll_sent_mean = roll_sent_sum / roll_mentions.replace(0, np.nan)
        
        df_t['window_sentiment'] = roll_sent_mean
        df_t['window_mentions'] = roll_mentions
        
        # Baseline statistics
        baseline_mean = df_t['window_sentiment'].rolling(baseline_days, min_periods=min_baseline_days).mean()
        baseline_std = df_t['window_sentiment'].rolling(baseline_days, min_periods=min_baseline_days).std()
        
        df_t['baseline_mean'] = baseline_mean
        df_t['baseline_std'] = baseline_std
        
        # Z-score
        df_t['z_score'] = (df_t['window_sentiment'] - df_t['baseline_mean']) / df_t['baseline_std']
        
        # Dynamic threshold
        mention_factor = np.log1p(df_t['window_mentions'].clip(lower=1.0))
        df_t['z_threshold'] = base_z_threshold / (1 + mention_sensitivity * mention_factor)
        
        # Absolute change
        df_t['abs_sent_change'] = (df_t['window_sentiment'] - df_t['baseline_mean']).abs()
        
        # Signal conditions with bullish bias
        # Bullish bias: lower threshold for BUY, higher for SELL
        buy_threshold = df_t['z_threshold'] - bullish_bias
        sell_threshold = df_t['z_threshold'] + bullish_bias
        
        cond_valid = df_t['baseline_std'].notna()
        cond_mentions = df_t['window_mentions'] >= min_mentions
        cond_abs_move = df_t['abs_sent_change'] >= min_abs_sent_change
        
        # BUY: positive sentiment shock
        cond_buy = (
            cond_valid & cond_mentions & cond_abs_move &
            (df_t['z_score'] >= buy_threshold) &
            (df_t['window_sentiment'] > 0)
        )
        
        # SELL: negative sentiment shock
        cond_sell = (
            cond_valid & cond_mentions & cond_abs_move &
            (df_t['z_score'] <= -sell_threshold) &
            (df_t['window_sentiment'] < 0)
        )
        
        df_t['signal_type'] = 'NONE'
        df_t['signal_direction'] = np.nan
        df_t['signal_score'] = np.nan
        
        df_t.loc[cond_buy, 'signal_type'] = 'BUY'
        df_t.loc[cond_buy, 'signal_direction'] = 1
        df_t.loc[cond_buy, 'signal_score'] = df_t.loc[cond_buy, 'z_score']
        
        df_t.loc[cond_sell, 'signal_type'] = 'SELL'
        df_t.loc[cond_sell, 'signal_direction'] = -1
        df_t.loc[cond_sell, 'signal_score'] = df_t.loc[cond_sell, 'z_score']
        
        df_t = df_t.reset_index()
        signals = df_t[df_t['signal_type'] != 'NONE'].copy()
        all_signals.append(signals)
    
    if all_signals:
        return pd.concat(all_signals, ignore_index=True)
    return pd.DataFrame()

print("✓ Signal generation function defined")

✓ Signal generation function defined
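
The core of the shock detection is a z-score of the windowed sentiment against its own rolling baseline. The sketch below reproduces that step on toy data (the series values and the shortened 20-day baseline are assumptions made for the example). As in `generate_signals`, the rolling baseline includes the current day.

```python
import pandas as pd

# Toy windowed sentiment: small alternating noise, then one sharp positive jump.
idx = pd.date_range("2024-01-01", periods=21, freq="D")
window_sentiment = pd.Series([0.05, -0.05] * 10 + [0.60], index=idx)

baseline_days, min_baseline_days = 20, 15  # shortened baseline for the toy series

# Rolling baseline statistics; rows with too little history stay NaN.
baseline_mean = window_sentiment.rolling(baseline_days, min_periods=min_baseline_days).mean()
baseline_std = window_sentiment.rolling(baseline_days, min_periods=min_baseline_days).std()

# Z-score of the current window against the baseline.
z_score = (window_sentiment - baseline_mean) / baseline_std

print(z_score.tail(3).round(2))
```

The first `min_baseline_days - 1` rows have no z-score, which is why the function requires `baseline_std` to be non-null before emitting a signal.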


In [5]:
# =============================================================================
# STAGE 2: MARKET MOVE FILTER FUNCTION (from utils.py / stock-market-check.ipynb)
# =============================================================================

def check_market_moved(df, target_date, params):
    """
    Check if market already moved before signal date with configurable thresholds.
    
    Parameters from utils.py check_market_moved_before_date():
    - z_threshold: Z-score threshold for pct_3d_z, ret_z, vol_z
    - vol_expansion_threshold: Threshold for volatility expansion
    - atr_move_threshold: Threshold for ATR-based move
    - recent_days: Number of days to check
    """
    z_threshold = params.get('z_threshold', 2.0)
    vol_expansion_threshold = params.get('vol_expansion_threshold', 1.75)
    atr_move_threshold = params.get('atr_move_threshold', 2.0)
    recent_days = params.get('recent_days', 5)
    
    try:
        if isinstance(df.columns, pd.MultiIndex):
            df = df.copy()
            df.columns = [col[0] if isinstance(col, tuple) else col for col in df.columns]
        
        df = df.copy()
        if not isinstance(df.index, pd.DatetimeIndex):
            df.index = pd.to_datetime(df.index)
        if df.index.tz is not None:
            df.index = df.index.tz_localize(None)
        df.index = df.index.normalize()
        
        target_dt = pd.to_datetime(target_date).normalize()
        
        if target_dt not in df.index:
            return None
        
        df_until = df[df.index <= target_dt].copy()
        if len(df_until) < 20:
            return None
        
        # Calculate metrics
        df_until['returns'] = df_until['Close'].pct_change()
        df_until['pct_change_3d_series'] = df_until['Close'].pct_change(periods=3) * 100
        
        pct_3d_mean = df_until['pct_change_3d_series'].rolling(20, min_periods=20).mean()
        pct_3d_std = df_until['pct_change_3d_series'].rolling(20, min_periods=20).std()
        df_until['pct_3d_z'] = (df_until['pct_change_3d_series'] - pct_3d_mean) / pct_3d_std
        
        ret_mean = df_until['returns'].rolling(20, min_periods=20).mean()
        ret_std = df_until['returns'].rolling(20, min_periods=20).std()
        df_until['ret_z'] = (df_until['returns'] - ret_mean) / ret_std
        
        df_until['std_3d'] = df_until['returns'].rolling(3).std()
        df_until['std_20d'] = df_until['returns'].rolling(20).std()
        df_until['vol_expansion'] = df_until['std_3d'] / df_until['std_20d']
        
        vol_mean = df_until['Volume'].rolling(20, min_periods=20).mean()
        vol_std = df_until['Volume'].rolling(20, min_periods=20).std()
        df_until['vol_z'] = (df_until['Volume'] - vol_mean) / vol_std
        
        df_until['prev_close'] = df_until['Close'].shift(1)
        hl = df_until['High'] - df_until['Low']
        hc = (df_until['High'] - df_until['prev_close']).abs()
        lc = (df_until['Low'] - df_until['prev_close']).abs()
        df_until['tr'] = pd.concat([hl, hc, lc], axis=1).max(axis=1)
        df_until['atr_14'] = df_until['tr'].rolling(14, min_periods=14).mean()
        df_until['abs_3d_move'] = df_until['Close'].diff(periods=3).abs()
        df_until['atr_move'] = df_until['abs_3d_move'] / df_until['atr_14']
        
        # Check recent days
        df_recent = df_until.tail(recent_days)
        
        market_moved = False
        if (df_recent['pct_3d_z'].abs() >= z_threshold).any():
            market_moved = True
        if (df_recent['ret_z'].abs() >= z_threshold).any():
            market_moved = True
        if (df_recent['vol_z'].abs() >= z_threshold).any():
            market_moved = True
        if (df_recent['vol_expansion'] >= vol_expansion_threshold).any():
            market_moved = True
        if (df_recent['atr_move'] >= atr_move_threshold).any():
            market_moved = True
        
        return {
            'market_moved_flag': market_moved,
            'pct_3d_z': df_until.loc[target_dt, 'pct_3d_z'] if target_dt in df_until.index else None,
            'ret_z': df_until.loc[target_dt, 'ret_z'] if target_dt in df_until.index else None,
            'vol_z': df_until.loc[target_dt, 'vol_z'] if target_dt in df_until.index else None,
            'vol_expansion': df_until.loc[target_dt, 'vol_expansion'] if target_dt in df_until.index else None,
            'atr_move': df_until.loc[target_dt, 'atr_move'] if target_dt in df_until.index else None,
        }
    except Exception:
        # Any data problem (missing columns, bad dates) means the signal cannot be evaluated
        return None


def filter_signals_by_market_move(signals_df, price_cache, params):
    """Filter signals where market hasn't already moved."""
    filtered = []
    
    for _, row in signals_df.iterrows():
        ticker = row['ticker']
        signal_date = row['date']
        
        if ticker not in price_cache:
            continue
        
        price_data = price_cache[ticker]
        result = check_market_moved(price_data, signal_date, params)
        
        if result is None:
            continue
        
        if not result['market_moved_flag']:
            row_dict = row.to_dict()
            row_dict.update(result)
            filtered.append(row_dict)
    
    return pd.DataFrame(filtered)

print("✓ Market move filter function defined")

✓ Market move filter function defined
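
Of the five checks in `check_market_moved`, the ATR-based one is the least obvious. The following sketch works through true range, ATR, and the ATR-normalized 3-day move on four invented price bars; a 3-bar ATR replaces the 14-bar window only so that the toy data is long enough.

```python
import pandas as pd

# Four invented daily bars (assumed values, for illustration only).
prices = pd.DataFrame({
    "High":  [10.5, 10.8, 11.0, 12.5],
    "Low":   [9.8, 10.1, 10.4, 11.0],
    "Close": [10.0, 10.5, 10.6, 12.2],
})

prev_close = prices["Close"].shift(1)

# True range: the largest of high-low, |high - prev close|, |low - prev close|.
true_range = pd.concat([
    prices["High"] - prices["Low"],
    (prices["High"] - prev_close).abs(),
    (prices["Low"] - prev_close).abs(),
], axis=1).max(axis=1)

# ATR is the rolling mean of true range (3 bars here, 14 in the notebook).
atr = true_range.rolling(3, min_periods=3).mean()

# Size of the 3-day price move expressed in ATR units.
atr_move = prices["Close"].diff(periods=3).abs() / atr

print(pd.DataFrame({"true_range": true_range, "atr": atr, "atr_move": atr_move}))
```

A value of `atr_move` at or above `atr_move_threshold` on any of the last `recent_days` bars marks the market as already moved.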


In [6]:
# =============================================================================
# STAGE 3: BACKTEST FUNCTION (from backtest.ipynb)
# =============================================================================

class Trade:
    def __init__(self, ticker, entry_date, entry_price, direction, position_size, shares):
        self.ticker = ticker
        self.entry_date = entry_date
        self.entry_price = entry_price
        self.direction = direction
        self.position_size = position_size
        self.shares = shares
        self.exit_date = None
        self.exit_price = None
        self.exit_reason = None
        self.pnl = 0
        self.pnl_pct = 0


def run_backtest(signals_df, price_cache, params):
    """
    Run backtest with configurable parameters.
    
    Parameters:
    - stop_loss_pct: Stop loss percentage
    - take_profit_pct: Take profit percentage
    - max_holding_days: Maximum days to hold
    - max_position_size: Maximum position as % of portfolio
    - long_only: If True, only take long positions
    """
    stop_loss = params.get('stop_loss_pct', 0.07)
    take_profit = params.get('take_profit_pct', 0.12)
    max_holding = params.get('max_holding_days', 10)
    max_position = params.get('max_position_size', 0.10)
    long_only = params.get('long_only', False)
    
    # Filter signals if long_only
    if long_only:
        signals = signals_df[
            (signals_df['signal_type'] == 'BUY') | (signals_df['signal_direction'] == 1)
        ].copy()
    else:
        signals = signals_df.copy()
    
    if signals.empty:
        return None, []
    
    signals = signals.sort_values('date').copy()
    
    cash = INITIAL_CAPITAL
    open_positions = {}
    closed_trades = []
    
    all_dates = pd.date_range(start=START_DATE, end=END_DATE, freq='B')
    portfolio_history = []
    
    for current_date in all_dates:
        current_date = pd.Timestamp(current_date).normalize()
        
        # Check exits
        tickers_to_close = []
        for ticker, trade in open_positions.items():
            if ticker not in price_cache:
                continue
            prices = price_cache[ticker]
            if current_date not in prices.index:
                continue
            
            current_price = prices.loc[current_date, 'Close']
            days_held = (current_date - trade.entry_date).days
            
            if trade.direction == 'long':
                pnl_pct = (current_price - trade.entry_price) / trade.entry_price
            else:
                pnl_pct = (trade.entry_price - current_price) / trade.entry_price
            
            exit_reason = None
            if pnl_pct <= -stop_loss:
                exit_reason = 'stop_loss'
            elif pnl_pct >= take_profit:
                exit_reason = 'take_profit'
            elif days_held >= max_holding:
                exit_reason = 'max_holding'
            
            if exit_reason:
                trade.exit_date = current_date
                trade.exit_price = current_price
                trade.exit_reason = exit_reason
                trade.pnl_pct = pnl_pct
                trade.pnl = trade.position_size * pnl_pct
                cash += trade.position_size + trade.pnl
                closed_trades.append(trade)
                tickers_to_close.append(ticker)
        
        for ticker in tickers_to_close:
            del open_positions[ticker]
        
        # Process new signals: normalize the 'date' column so it compares
        # cleanly with current_date, whatever format it arrived in
        if 'date' in signals.columns:
            signal_dates = pd.to_datetime(signals['date']).dt.normalize()
            day_signals = signals[signal_dates == current_date]
        else:
            day_signals = pd.DataFrame()
        
        for _, signal in day_signals.iterrows():
            ticker = signal['ticker']
            
            if ticker in open_positions:
                continue
            if ticker not in price_cache:
                continue
            
            prices = price_cache[ticker]
            future_dates = prices.index[prices.index > current_date]
            if len(future_dates) == 0:
                continue
            
            entry_date = future_dates[0]
            entry_price = prices.loc[entry_date, 'Open']
            
            signal_type = signal.get('signal_type', '')
            signal_direction = signal.get('signal_direction', 0)
            
            if signal_type == 'BUY' or signal_direction == 1:
                direction = 'long'
            elif signal_type == 'SELL' or signal_direction == -1:
                direction = 'short'
            else:
                continue
            
            position_size = cash * max_position
            if position_size < 100 or cash < position_size:
                continue
            
            shares = position_size / entry_price
            trade = Trade(ticker, entry_date, entry_price, direction, position_size, shares)
            open_positions[ticker] = trade
            cash -= position_size
        
        # Calculate portfolio value
        positions_value = 0
        for ticker, trade in open_positions.items():
            if ticker in price_cache and current_date in price_cache[ticker].index:
                current_price = price_cache[ticker].loc[current_date, 'Close']
                if trade.direction == 'long':
                    pnl_pct = (current_price - trade.entry_price) / trade.entry_price
                else:
                    pnl_pct = (trade.entry_price - current_price) / trade.entry_price
                positions_value += trade.position_size * (1 + pnl_pct)
            else:
                positions_value += trade.position_size
        
        portfolio_history.append({
            'date': current_date,
            'portfolio_value': cash + positions_value
        })
    
    # Close remaining positions
    for ticker, trade in open_positions.items():
        if ticker in price_cache:
            last_date = price_cache[ticker].index[-1]
            trade.exit_date = last_date
            trade.exit_price = price_cache[ticker].loc[last_date, 'Close']
            trade.exit_reason = 'end_of_period'
            if trade.direction == 'long':
                trade.pnl_pct = (trade.exit_price - trade.entry_price) / trade.entry_price
            else:
                trade.pnl_pct = (trade.entry_price - trade.exit_price) / trade.entry_price
            trade.pnl = trade.position_size * trade.pnl_pct
            closed_trades.append(trade)
    
    return pd.DataFrame(portfolio_history), closed_trades

print("✓ Backtest function defined")

✓ Backtest function defined
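
The exit logic inside `run_backtest` checks three rules in a fixed order: stop loss, then take profit, then maximum holding period. A minimal sketch of that ordering, pulled out into two hypothetical helpers (`position_pnl_pct` and `exit_reason` are not part of the notebook), might look like this:

```python
def position_pnl_pct(entry_price, current_price, direction):
    """Fractional P&L of a position; shorts profit when the price falls."""
    if direction == 'long':
        return (current_price - entry_price) / entry_price
    return (entry_price - current_price) / entry_price


def exit_reason(pnl_pct, days_held, stop_loss=0.07, take_profit=0.12, max_holding=10):
    """Return the first exit rule that applies, or None to keep holding."""
    if pnl_pct <= -stop_loss:
        return 'stop_loss'
    if pnl_pct >= take_profit:
        return 'take_profit'
    if days_held >= max_holding:
        return 'max_holding'
    return None


# A long position that has dropped 8% is stopped out regardless of holding time.
print(exit_reason(position_pnl_pct(100.0, 92.0, 'long'), days_held=2))
```

Because the checks use daily closes, a position can close well beyond the stop or target level if the price gaps through it.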


In [7]:
# =============================================================================
# FETCH PRICE DATA FOR ALL TICKERS (CACHE ONCE)
# =============================================================================

# Get all unique tickers from daily sentiment
all_tickers = daily_sentiment['ticker'].unique()
price_cache = {}
failed_tickers = set()

print(f"Fetching price data for {len(all_tickers)} tickers...")

for ticker in tqdm(all_tickers, desc="Fetching prices"):
    try:
        df_prices = fetch_historical_prices(ticker, START_DATE, END_DATE)
        if df_prices is not None and not df_prices.empty:
            df_prices.index = pd.to_datetime(df_prices.index).tz_localize(None).normalize()
            price_cache[ticker] = df_prices
    except Exception:
        # Only raised errors are counted here; tickers that return no data are skipped silently
        failed_tickers.add(ticker)

print(f"\n✓ Fetched {len(price_cache)} tickers")
print(f"✗ Failed: {len(failed_tickers)} tickers")

Fetching price data for 4129 tickers...


Fetching prices:   0%|          | 1/4129 [00:00<44:45,  1.54it/s]ERROR:yfinance:$AACI: possibly delisted; no price data found  (1d 2024-01-01 -> 2024-12-31) (Yahoo error = "Data doesn't exist for startDate = 1704085200, endDate = 1735621200")
Fetching prices:   0%|          | 2/4129 [00:01<44:12,  1.56it/s]ERROR:yfinance:$AACI: possibly delisted; no price data found  (1d 2024-01-01 -> 2024-12-31) (Yahoo error = "Data doesn't exist for startDate = 1704085200, endDate = 1735621200")
Fetching prices:   0%|          | 12/4129 [00:02<08:50,  7.77it/s]ERROR:yfinance:$AAUS: possibly delisted; no price data found  (1d 2024-01-01 -> 2024-12-31) (Yahoo error = "Data doesn't exist for startDate = 1704085200, endDate = 1735621200")
Fetching prices:   0%|          | 13/4129 [00:02<13:10,  5.21it/s]ERROR:yfinance:$AAUS: possibly delisted; no price data found  (1d 2024-01-01 -> 2024-12-31) (Yahoo error = "Data doesn't exist for startDate = 1704085200, endDate = 1735621200")


✓ Fetched 4005 tickers
✗ Failed: 0 tickers





In [7]:
# =============================================================================
# DEFINE PARAMETER GRID - ALL TUNABLE PARAMETERS
# =============================================================================

# Stage 1: Signal Generation Parameters (sentiment_shock_signals.ipynb)
signal_params_grid = {
    'window_days': [3, 5],              # Sliding window days
    'baseline_days': [21, 30],          # Baseline period
    'min_mentions': [3, 5],             # Minimum mentions
    'base_z_threshold': [1.5, 1.8, 2.0], # Z-score threshold
    'bullish_bias': [0.0, 0.3, 0.5],    # NEW: Bias toward BUY signals (bull market adjustment)
}

# Stage 2: Market Move Filter Parameters (utils.py thresholds)
filter_params_grid = {
    'z_threshold': [1.5, 2.0, 2.5],     # Z-score threshold for market move
    'vol_expansion_threshold': [1.5, 1.75, 2.0],  # Volatility expansion
    'atr_move_threshold': [1.5, 2.0, 2.5],  # ATR move threshold
}

# Stage 3: Backtest Parameters (backtest.ipynb)
backtest_params_grid = {
    'stop_loss_pct': [0.05, 0.07],      # Stop loss %
    'take_profit_pct': [0.10, 0.15],    # Take profit %
    'max_holding_days': [5, 10],        # Max holding period
    'long_only': [True, False],         # Long-only vs Long-short
}

# Calculate total combinations
signal_combos = 1
for v in signal_params_grid.values():
    signal_combos *= len(v)

filter_combos = 1
for v in filter_params_grid.values():
    filter_combos *= len(v)

backtest_combos = 1
for v in backtest_params_grid.values():
    backtest_combos *= len(v)

total_combos = signal_combos * filter_combos * backtest_combos

print(f"Parameter Grid Summary:")
print(f"  Stage 1 (Signal Gen): {signal_combos} combinations")
print(f"  Stage 2 (Market Filter): {filter_combos} combinations")
print(f"  Stage 3 (Backtest): {backtest_combos} combinations")
print(f"  TOTAL: {total_combos} combinations")
print(f"\n⚠️ This is too many! We'll use a smarter approach...")

Parameter Grid Summary:
  Stage 1 (Signal Gen): 72 combinations
  Stage 2 (Market Filter): 27 combinations
  Stage 3 (Backtest): 16 combinations
  TOTAL: 31104 combinations

⚠️ This is too many! We'll use a smarter approach...
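
The combination counts above follow directly from the grid sizes. The sketch below recomputes them with `math.prod` and shows, purely as arithmetic, how much a staged search would save if the signal and filter grids were searched first and the backtest grid afterwards. The helper names `grid_size` and `expand_grid` are hypothetical.

```python
from itertools import product
from math import prod

signal_params_grid = {
    'window_days': [3, 5],
    'baseline_days': [21, 30],
    'min_mentions': [3, 5],
    'base_z_threshold': [1.5, 1.8, 2.0],
    'bullish_bias': [0.0, 0.3, 0.5],
}
filter_params_grid = {
    'z_threshold': [1.5, 2.0, 2.5],
    'vol_expansion_threshold': [1.5, 1.75, 2.0],
    'atr_move_threshold': [1.5, 2.0, 2.5],
}
backtest_params_grid = {
    'stop_loss_pct': [0.05, 0.07],
    'take_profit_pct': [0.10, 0.15],
    'max_holding_days': [5, 10],
    'long_only': [True, False],
}


def grid_size(grid):
    """Number of parameter combinations in a grid."""
    return prod(len(values) for values in grid.values())


def expand_grid(grid):
    """List every combination in a grid as a parameter dict."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]


sizes = [grid_size(g) for g in (signal_params_grid, filter_params_grid, backtest_params_grid)]
full_search = prod(sizes)                       # every stage crossed with every other
staged_search = sizes[0] * sizes[1] + sizes[2]  # signal x filter first, backtest second

print(f"Full grid: {full_search} runs; staged search: {staged_search} runs")
```

A staged search is cheaper but assumes the best backtest parameters do not depend strongly on the signal and filter settings, which may not hold.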


In [8]:
# =============================================================================
# SMART GRID SEARCH: Two-Phase Approach
# Phase 1: Test signal generation + filter combos with fixed backtest params
# Phase 2: Optimize backtest params with best signal/filter combo
# =============================================================================

def run_full_pipeline(signal_params, filter_params, backtest_params):
    """Run the complete pipeline with given parameters."""
    
    # Stage 1: Generate signals
    signals = generate_signals(daily_sentiment, signal_params)
    if signals.empty:
        return None
    
    # Stage 2: Filter by market move
    filtered_signals = filter_signals_by_market_move(signals, price_cache, filter_params)
    if filtered_signals.empty:
        return None
    
    # Stage 3: Run backtest
    history, trades = run_backtest(filtered_signals, price_cache, backtest_params)
    if history is None or history.empty:
        return None
    
    # Calculate metrics
    final_value = history['portfolio_value'].iloc[-1]
    total_return = (final_value / INITIAL_CAPITAL - 1) * 100
    
    # Count signals
    n_buy = len(filtered_signals[filtered_signals['signal_type'] == 'BUY'])
    n_sell = len(filtered_signals[filtered_signals['signal_type'] == 'SELL'])
    
    # Win rate
    if len(trades) > 0:
        wins = sum(1 for t in trades if t.pnl > 0)
        win_rate = wins / len(trades) * 100
        
        # Sharpe ratio
        returns = history['portfolio_value'].pct_change().dropna()
        sharpe = (returns.mean() / returns.std() * np.sqrt(252)) if returns.std() > 0 else 0
        
        # Max drawdown
        cummax = history['portfolio_value'].cummax()
        drawdown = (history['portfolio_value'] - cummax) / cummax
        max_dd = drawdown.min() * 100
    else:
        win_rate = 0
        sharpe = 0
        max_dd = 0
    
    return {
        'total_return': total_return,
        'final_value': final_value,
        'n_trades': len(trades),
        'n_buy_signals': n_buy,
        'n_sell_signals': n_sell,
        'win_rate': win_rate,
        'sharpe': sharpe,
        'max_drawdown': max_dd
    }

print("✓ Full pipeline function defined")

✓ Full pipeline function defined
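
The two risk metrics in `run_full_pipeline` can be checked by hand on a short equity curve. The sketch below uses five invented portfolio values; as in the function, the Sharpe ratio is annualized with 252 trading days and no risk-free rate is subtracted.

```python
import numpy as np
import pandas as pd

# Invented daily portfolio values for illustration.
equity = pd.Series([10000.0, 10100.0, 9900.0, 10200.0, 10400.0])

# Annualized Sharpe ratio from daily returns (risk-free rate taken as zero).
returns = equity.pct_change().dropna()
sharpe = returns.mean() / returns.std() * np.sqrt(252) if returns.std() > 0 else 0.0

# Max drawdown: worst percentage drop from a running peak.
running_peak = equity.cummax()
max_drawdown_pct = ((equity - running_peak) / running_peak).min() * 100

print(f"Sharpe: {sharpe:.2f}, max drawdown: {max_drawdown_pct:.2f}%")
```

Here the only drawdown is the fall from 10,100 to 9,900, so the maximum drawdown is about -1.98%.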


In [9]:
# =============================================================================
# PHASE 1: OPTIMIZE FOR SHARPE RATIO (Risk-Adjusted Returns)
# Sensible parameters, long+short strategy
# =============================================================================

# Fixed backtest params for Phase 1 - LONG+SHORT with sensible defaults
default_backtest = {
    'stop_loss_pct': 0.07,
    'take_profit_pct': 0.10,  # Sensible 10% take profit
    'max_holding_days': 7,
    'max_position_size': 0.10,
    'long_only': False  # Allow both long and short!
}

# Signal params - reasonable ranges (reduced grid)
signal_params_reduced = {
    'window_days': [3],                  # Fixed at 3
    'baseline_days': [30],               # Fixed at 30  
    'min_mentions': [3, 5],              # Test 2 values
    'base_z_threshold': [1.8, 2.0],      # Test 2 values (more confident signals)
    'bullish_bias': [0.0],               # No bias
    'min_baseline_days': [15],
    'min_abs_sent_change': [0.1],
    'mention_sensitivity': [0.2],
}

# Filter params
filter_params_reduced = {
    'z_threshold': [2.0, 2.5],           # Test 2 values
    'vol_expansion_threshold': [1.75],    # Fixed
    'atr_move_threshold': [2.0],          # Fixed
    'recent_days': [5],
}

# Generate all combinations
signal_keys = list(signal_params_reduced.keys())
signal_values = list(signal_params_reduced.values())
signal_combos = [dict(zip(signal_keys, combo)) for combo in product(*signal_values)]

filter_keys = list(filter_params_reduced.keys())
filter_values = list(filter_params_reduced.values())
filter_combos = [dict(zip(filter_keys, combo)) for combo in product(*filter_values)]

total_phase1 = len(signal_combos) * len(filter_combos)
est_time = total_phase1 * 26 / 60  # assumes roughly 26 seconds per combination
print(f"Phase 1: Testing {len(signal_combos)} signal x {len(filter_combos)} filter = {total_phase1} combinations")
print(f"Estimated time: ~{est_time:.0f} minutes")
print(f"\n🎯 Optimizing for: SHARPE RATIO (risk-adjusted returns)")
print(f"📊 Strategy: Long + Short trades")

Phase 1: Testing 4 signal x 2 filter = 8 combinations
Estimated time: ~3 minutes

🎯 Optimizing for: SHARPE RATIO (risk-adjusted returns)
📊 Strategy: Long + Short trades
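
Stage 1 output depends only on the signal parameters, so a nested search over signal and filter combinations only needs to generate signals once per signal combination. The sketch below demonstrates the idea with stand-in functions; `generate_signals_stub` and `filter_stub` are hypothetical placeholders, not the notebook's real stages.

```python
call_counter = {"stage1": 0}


def generate_signals_stub(sig_params):
    """Hypothetical stand-in for the expensive Stage 1 call."""
    call_counter["stage1"] += 1
    return [sig_params["base_z_threshold"] + 0.1 * i for i in range(10)]


def filter_stub(signals, filt_params):
    """Hypothetical stand-in for the Stage 2 filter."""
    return [s for s in signals if s <= filt_params["z_threshold"]]


signal_combos = [{"base_z_threshold": 1.8}, {"base_z_threshold": 2.0}]
filter_combos = [{"z_threshold": 2.0}, {"z_threshold": 2.5}]

results = []
for sig_params in signal_combos:
    # Generate once per signal combination, outside the filter loop.
    signals = generate_signals_stub(sig_params)
    for filt_params in filter_combos:
        kept = filter_stub(signals, filt_params)
        results.append({**sig_params, **filt_params, "n_kept": len(kept)})

print(f"Stage 1 calls: {call_counter['stage1']} for {len(results)} combinations")
```

With a small filter grid the saving is modest, but it grows linearly with the number of filter combinations.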


In [10]:
# =============================================================================
# RUN PHASE 1: Signal + Filter Grid Search (OPTIMIZE FOR SHARPE)
# =============================================================================

phase1_results = []
debug_info = {'no_signals': 0, 'no_filtered': 0, 'no_backtest': 0, 'errors': 0, 'success': 0}

pbar = tqdm(total=total_phase1, desc="Phase 1: Signal + Filter")

for sig_params in signal_combos:
    for filt_params in filter_combos:
        try:
            # Stage 1: Generate signals
            signals = generate_signals(daily_sentiment, sig_params)
            if signals.empty:
                debug_info['no_signals'] += 1
                pbar.update(1)
                continue
            
            # Stage 2: Filter by market move
            filtered_signals = filter_signals_by_market_move(signals, price_cache, filt_params)
            if filtered_signals.empty:
                debug_info['no_filtered'] += 1
                pbar.update(1)
                continue
            
            # Stage 3: Run backtest
            history, trades = run_backtest(filtered_signals, price_cache, default_backtest)
            if history is None or history.empty:
                debug_info['no_backtest'] += 1
                pbar.update(1)
                continue
            
            # Calculate metrics
            final_value = history['portfolio_value'].iloc[-1]
            total_return = (final_value / INITIAL_CAPITAL - 1) * 100
            
            # Count signals
            n_buy = len(filtered_signals[filtered_signals['signal_type'] == 'BUY'])
            n_sell = len(filtered_signals[filtered_signals['signal_type'] == 'SELL'])
            
            # Win rate & Sharpe
            if len(trades) > 0:
                wins = sum(1 for t in trades if t.pnl > 0)
                win_rate = wins / len(trades) * 100
                returns = history['portfolio_value'].pct_change().dropna()
                sharpe = (returns.mean() / returns.std() * np.sqrt(252)) if returns.std() > 0 else 0
                cummax = history['portfolio_value'].cummax()
                drawdown = (history['portfolio_value'] - cummax) / cummax
                max_dd = drawdown.min() * 100
            else:
                win_rate = 0
                sharpe = 0
                max_dd = 0
            
            entry = {
                **{f'sig_{k}': v for k, v in sig_params.items()},
                **{f'filt_{k}': v for k, v in filt_params.items()},
                'total_return': total_return,
                'final_value': final_value,
                'n_trades': len(trades),
                'n_buy_signals': n_buy,
                'n_sell_signals': n_sell,
                'win_rate': win_rate,
                'sharpe': sharpe,
                'max_drawdown': max_dd
            }
            phase1_results.append(entry)
            debug_info['success'] += 1
            
        except Exception as e:
            debug_info['errors'] += 1
            if debug_info['errors'] == 1:
                print(f"\nFirst error: {e}")
        
        pbar.update(1)

pbar.close()

print("\n📊 Debug Summary:")
print(f"   Successful: {debug_info['success']}")

# Convert to DataFrame and SORT BY SHARPE RATIO
if phase1_results:
    phase1_df = pd.DataFrame(phase1_results)
    phase1_df = phase1_df.sort_values('sharpe', ascending=False)  # SORT BY SHARPE!
    
    print(f"\n✓ Phase 1 complete: {len(phase1_df)} valid combinations tested")
    print("\n🎯 Top 10 by SHARPE RATIO:")
    print(phase1_df[['sig_window_days', 'sig_baseline_days', 'sig_base_z_threshold',
                     'filt_z_threshold', 'filt_atr_move_threshold',
                     'sharpe', 'total_return', 'n_trades', 'win_rate', 
                     'n_buy_signals', 'n_sell_signals', 'max_drawdown']].head(10).to_string(index=False))
else:
    print("\n❌ No valid results!")
    phase1_df = pd.DataFrame()

Phase 1: Signal + Filter: 100%|██████████| 8/8 [03:07<00:00, 23.39s/it]


📊 Debug Summary:
   Successful: 8

✓ Phase 1 complete: 8 valid combinations tested

🎯 Top 10 by SHARPE RATIO:
 sig_window_days  sig_baseline_days  sig_base_z_threshold  filt_z_threshold  filt_atr_move_threshold   sharpe  total_return  n_trades  win_rate  n_buy_signals  n_sell_signals  max_drawdown
               3                 30                   2.0               2.0                      2.0 1.493618     12.976944       743 49.663526            567             712     -4.414972
               3                 30                   2.0               2.5                      2.0 1.277206     13.525584       845 49.585799            660             828     -4.706299
               3                 30                   1.8               2.0                      2.0 1.175661      9.101935      1217 48.315530           1021            1202     -5.235478
               3                 30                   1.8               2.0                      2.0 1.053955      8.371705  
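
The Sharpe and max-drawdown arithmetic in the Phase 1 loop is repeated verbatim in Phase 2. A minimal sketch of factoring it into one helper is shown below. The function name `summarize_equity_curve` and the toy equity curve are illustrative assumptions, not part of the notebook, and the risk-free rate is taken as zero, as in the cells above.

```python
import numpy as np
import pandas as pd


def summarize_equity_curve(portfolio_value: pd.Series, periods_per_year: int = 252) -> dict:
    """Annualized Sharpe (risk-free rate assumed 0) and max drawdown in percent.

    Hypothetical helper mirroring the inline calculation in the grid-search loops.
    """
    returns = portfolio_value.pct_change().dropna()
    std = returns.std()
    # Guard against zero or NaN volatility (flat or single-point curves)
    sharpe = float(returns.mean() / std * np.sqrt(periods_per_year)) if std > 0 else 0.0

    running_peak = portfolio_value.cummax()
    drawdown = (portfolio_value - running_peak) / running_peak
    return {"sharpe": sharpe, "max_drawdown": float(drawdown.min() * 100)}


# Toy equity curve (made-up numbers): peak at 110, trough at 99, i.e. a 10% drawdown
toy_curve = pd.Series([100.0, 110.0, 99.0, 105.0])
metrics = summarize_equity_curve(toy_curve)
print(metrics)
```

With a helper like this, both phases could call `summarize_equity_curve(history['portfolio_value'])` and merge the result into `entry`, so a change to the metric definition only has to be made once.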




In [11]:
# =============================================================================
# PHASE 2: OPTIMIZE BACKTEST PARAMS FOR BEST SHARPE (sensible values only)
# =============================================================================

# Get best signal and filter params from Phase 1 (by Sharpe)
best_phase1 = phase1_df.iloc[0]

# Convert to proper types
best_signal_params = {
    'window_days': int(best_phase1['sig_window_days']),
    'baseline_days': int(best_phase1['sig_baseline_days']),
    'min_mentions': int(best_phase1['sig_min_mentions']),
    'base_z_threshold': float(best_phase1['sig_base_z_threshold']),
    'bullish_bias': float(best_phase1['sig_bullish_bias']),
    'min_baseline_days': int(best_phase1['sig_min_baseline_days']),
    'min_abs_sent_change': float(best_phase1['sig_min_abs_sent_change']),
    'mention_sensitivity': float(best_phase1['sig_mention_sensitivity']),
}

best_filter_params = {
    'z_threshold': float(best_phase1['filt_z_threshold']),
    'vol_expansion_threshold': float(best_phase1['filt_vol_expansion_threshold']),
    'atr_move_threshold': float(best_phase1['filt_atr_move_threshold']),
    'recent_days': int(best_phase1['filt_recent_days']),
}

print("Best Signal Params from Phase 1 (by Sharpe):")
for k, v in best_signal_params.items():
    print(f"  {k}: {v}")

print("\nBest Filter Params from Phase 1:")
for k, v in best_filter_params.items():
    print(f"  {k}: {v}")

print(f"\nPhase 1 Best Sharpe: {best_phase1['sharpe']:.2f}")
print(f"Phase 1 Return: {best_phase1['total_return']:.2f}%")

Best Signal Params from Phase 1 (by Sharpe):
  window_days: 3
  baseline_days: 30
  min_mentions: 5
  base_z_threshold: 2.0
  bullish_bias: 0.0
  min_baseline_days: 15
  min_abs_sent_change: 0.1
  mention_sensitivity: 0.2

Best Filter Params from Phase 1:
  z_threshold: 2.0
  vol_expansion_threshold: 1.75
  atr_move_threshold: 2.0
  recent_days: 5

Phase 1 Best Sharpe: 1.49
Phase 1 Return: 12.98%


In [12]:
# =============================================================================
# RUN PHASE 2: Backtest Parameter Grid Search (SENSIBLE VALUES)
# =============================================================================

# Generate signals and filter once with best params
print("Generating signals with best Phase 1 params...")
best_signals = generate_signals(daily_sentiment, best_signal_params)
print(f"   Generated {len(best_signals)} signals")

print("Filtering signals...")
best_filtered = filter_signals_by_market_move(best_signals, price_cache, best_filter_params)
print(f"   Filtered to {len(best_filtered)} signals")
print(f"   BUY: {len(best_filtered[best_filtered['signal_type'] == 'BUY'])}")
print(f"   SELL: {len(best_filtered[best_filtered['signal_type'] == 'SELL'])}")

# SENSIBLE backtest parameters only
backtest_params_grid = {
    'stop_loss_pct': [0.05, 0.07, 0.10],      # 5-10% stop loss (sensible)
    'take_profit_pct': [0.08, 0.10, 0.12],    # 8-12% take profit (sensible)
    'max_holding_days': [5, 7, 10],            # 5-10 days (sensible)
    'max_position_size': [0.10],
    'long_only': [False],                      # BOTH long and short!
}

backtest_keys = list(backtest_params_grid.keys())
backtest_values = list(backtest_params_grid.values())
backtest_combos = [dict(zip(backtest_keys, combo)) for combo in product(*backtest_values)]

print(f"\nPhase 2: Testing {len(backtest_combos)} backtest combinations")

phase2_results = []

for bt_params in tqdm(backtest_combos, desc="Phase 2: Backtest"):
    try:
        history, trades = run_backtest(best_filtered, price_cache, bt_params)
        
        if history is not None and not history.empty:
            final_value = history['portfolio_value'].iloc[-1]
            total_return = (final_value / INITIAL_CAPITAL - 1) * 100
            
            n_buy = len(best_filtered[best_filtered['signal_type'] == 'BUY'])
            n_sell = len(best_filtered[best_filtered['signal_type'] == 'SELL'])
            
            if len(trades) > 0:
                wins = sum(1 for t in trades if t.pnl > 0)
                win_rate = wins / len(trades) * 100
                returns = history['portfolio_value'].pct_change().dropna()
                sharpe = (returns.mean() / returns.std() * np.sqrt(252)) if returns.std() > 0 else 0
                cummax = history['portfolio_value'].cummax()
                drawdown = (history['portfolio_value'] - cummax) / cummax
                max_dd = drawdown.min() * 100
            else:
                win_rate = 0
                sharpe = 0
                max_dd = 0
            
            entry = {
                **{f'bt_{k}': v for k, v in bt_params.items()},
                'total_return': total_return,
                'final_value': final_value,
                'n_trades': len(trades),
                'n_buy_signals': n_buy,
                'n_sell_signals': n_sell,
                'win_rate': win_rate,
                'sharpe': sharpe,
                'max_drawdown': max_dd
            }
            phase2_results.append(entry)
    except Exception as e:
        print(f"Error: {e}")

# SORT BY SHARPE
if phase2_results:
    phase2_df = pd.DataFrame(phase2_results)
    phase2_df = phase2_df.sort_values('sharpe', ascending=False)
    
    print(f"\n✓ Phase 2 complete: {len(phase2_df)} combinations tested")
    print("\n🎯 Top 10 by SHARPE RATIO:")
    print(phase2_df[['bt_stop_loss_pct', 'bt_take_profit_pct', 'bt_max_holding_days', 
                     'sharpe', 'total_return', 'n_trades', 'win_rate', 
                     'max_drawdown']].head(10).to_string(index=False))
else:
    print("❌ No results from Phase 2!")
    phase2_df = pd.DataFrame()

Generating signals with best Phase 1 params...
   Generated 4604 signals
Filtering signals...
   Filtered to 1279 signals
   BUY: 567
   SELL: 712

Phase 2: Testing 27 backtest combinations


Phase 2: Backtest: 100%|██████████| 27/27 [00:03<00:00,  7.91it/s]


✓ Phase 2 complete: 27 combinations tested

🎯 Top 10 by SHARPE RATIO:
 bt_stop_loss_pct  bt_take_profit_pct  bt_max_holding_days   sharpe  total_return  n_trades  win_rate  max_drawdown
             0.07                0.10                    5 1.628401     14.335139       768 52.473958     -4.038363
             0.07                0.08                    5 1.620262     14.419092       770 52.597403     -4.141463
             0.05                0.08                    5 1.543678     13.850024       780 51.794872     -5.222165
             0.05                0.10                    5 1.521626     13.515804       778 51.670951     -5.119654
             0.05                0.10                    7 1.514666     13.769624       753 49.003984     -5.940940
             0.10                0.08                    5 1.509137     13.104732       766 52.741514     -5.294773
             0.07                0.10                    7 1.493618     12.976944       743 49.663526     -4.414




In [13]:
# =============================================================================
# FINAL BEST PARAMETERS SUMMARY (OPTIMIZED FOR SHARPE)
# =============================================================================

best_phase2 = phase2_df.iloc[0]

# Compile all best parameters
BEST_PARAMS = {
    # Stage 1: Signal Generation
    'signal': {
        'WINDOW_DAYS': int(best_signal_params['window_days']),
        'BASELINE_DAYS': int(best_signal_params['baseline_days']),
        'MIN_BASELINE_DAYS': int(best_signal_params['min_baseline_days']),
        'MIN_MENTIONS_IN_WINDOW': int(best_signal_params['min_mentions']),
        'BASE_Z_THRESHOLD': float(best_signal_params['base_z_threshold']),
        'MIN_ABS_SENT_CHANGE': float(best_signal_params['min_abs_sent_change']),
        'MENTION_SENSITIVITY': float(best_signal_params['mention_sensitivity']),
    },
    # Stage 2: Market Move Filter
    'filter': {
        'Z_THRESHOLD': float(best_filter_params['z_threshold']),
        'VOL_EXPANSION_THRESHOLD': float(best_filter_params['vol_expansion_threshold']),
        'ATR_MOVE_THRESHOLD': float(best_filter_params['atr_move_threshold']),
        'RECENT_DAYS': int(best_filter_params['recent_days']),
    },
    # Stage 3: Backtest
    'backtest': {
        'STOP_LOSS_PCT': float(best_phase2['bt_stop_loss_pct']),
        'TAKE_PROFIT_PCT': float(best_phase2['bt_take_profit_pct']),
        'MAX_HOLDING_DAYS': int(best_phase2['bt_max_holding_days']),
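        # Read from the winning Phase 2 row so the summary cannot drift from the grid
        'MAX_POSITION_SIZE': float(best_phase2['bt_max_position_size']),
        'LONG_ONLY': bool(best_phase2['bt_long_only']),  # False = both long and short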
    }
}

print("=" * 70)
print("🏆 BEST PARAMETERS (OPTIMIZED FOR SHARPE RATIO)")
print("=" * 70)

print("\n📊 STAGE 1: Signal Generation (sentiment_shock_signals.ipynb)")
for k, v in BEST_PARAMS['signal'].items():
    print(f"   {k} = {v}")

print("\n🔍 STAGE 2: Market Move Filter (utils.py / stock-market-check.ipynb)")
for k, v in BEST_PARAMS['filter'].items():
    print(f"   {k} = {v}")

print("\n💰 STAGE 3: Backtest (backtest.ipynb)")
for k, v in BEST_PARAMS['backtest'].items():
    print(f"   {k} = {v}")

print("\n" + "=" * 70)
print(f"📈 SHARPE RATIO: {best_phase2['sharpe']:.2f}")
print(f"📈 TOTAL RETURN: {best_phase2['total_return']:.2f}%")
print(f"   Trades: {best_phase2['n_trades']:.0f}")
print(f"   Win Rate: {best_phase2['win_rate']:.1f}%")
print(f"   Max Drawdown: {best_phase2['max_drawdown']:.2f}%")
print(f"   Buy Signals: {best_phase2['n_buy_signals']:.0f}")
print(f"   Sell Signals: {best_phase2['n_sell_signals']:.0f}")
print("=" * 70)

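# Values are read from BEST_PARAMS rather than hard-coded, so the text tracks the selected row
print("\n✅ These are sensible, non-overfitted parameters!")
print(f"   - Stop loss: {BEST_PARAMS['backtest']['STOP_LOSS_PCT']:.0%} (reasonable risk management)")
print(f"   - Take profit: {BEST_PARAMS['backtest']['TAKE_PROFIT_PCT']:.0%} (realistic target)")
print(f"   - Holding period: {BEST_PARAMS['backtest']['MAX_HOLDING_DAYS']} days (short-term momentum)")
print("   - Long + Short strategy")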

🏆 BEST PARAMETERS (OPTIMIZED FOR SHARPE RATIO)

📊 STAGE 1: Signal Generation (sentiment_shock_signals.ipynb)
   WINDOW_DAYS = 3
   BASELINE_DAYS = 30
   MIN_BASELINE_DAYS = 15
   MIN_MENTIONS_IN_WINDOW = 5
   BASE_Z_THRESHOLD = 2.0
   MIN_ABS_SENT_CHANGE = 0.1
   MENTION_SENSITIVITY = 0.2

🔍 STAGE 2: Market Move Filter (utils.py / stock-market-check.ipynb)
   Z_THRESHOLD = 2.0
   VOL_EXPANSION_THRESHOLD = 1.75
   ATR_MOVE_THRESHOLD = 2.0
   RECENT_DAYS = 5

💰 STAGE 3: Backtest (backtest.ipynb)
   STOP_LOSS_PCT = 0.07
   TAKE_PROFIT_PCT = 0.1
   MAX_HOLDING_DAYS = 5
   MAX_POSITION_SIZE = 0.1
   LONG_ONLY = False

📈 SHARPE RATIO: 1.63
📈 TOTAL RETURN: 14.34%
   Trades: 768
   Win Rate: 52.5%
   Max Drawdown: -4.04%
   Buy Signals: 567
   Sell Signals: 712

✅ These are sensible, non-overfitted parameters!
   - Stop loss: 7% (reasonable risk management)
   - Take profit: 10% (realistic target)
   - Holding period: 5 days (short-term momentum)
   - Long + Short strategy 
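
Both phases select and score parameters on the same 2024 window, so the "non-overfitted" label is a judgment rather than a tested result. One inexpensive sanity check, sketched below, is to see whether the winning Sharpe sits on a plateau or on an isolated spike by comparing it with rows that differ from the winner in exactly one backtest parameter. The function name `one_step_neighbours` is a hypothetical helper. The rows are transcribed from the Phase 2 table above and cover only the seven combinations whose Sharpe is visible there, so the neighbour set is incomplete.

```python
import pandas as pd

# Rows transcribed from the Phase 2 output table (only the seven visible rows)
rows = [
    (0.07, 0.10, 5, 1.628401),
    (0.07, 0.08, 5, 1.620262),
    (0.05, 0.08, 5, 1.543678),
    (0.05, 0.10, 5, 1.521626),
    (0.05, 0.10, 7, 1.514666),
    (0.10, 0.08, 5, 1.509137),
    (0.07, 0.10, 7, 1.493618),
]
param_cols = ["bt_stop_loss_pct", "bt_take_profit_pct", "bt_max_holding_days"]
table = pd.DataFrame(rows, columns=param_cols + ["sharpe"])


def one_step_neighbours(df: pd.DataFrame, param_cols: list, metric: str = "sharpe"):
    """Return the best row and all rows differing from it in exactly one parameter."""
    best = df.sort_values(metric, ascending=False).iloc[0]
    # Count, per row, how many parameters differ from the winner
    n_diff = df[param_cols].ne(best[param_cols], axis="columns").sum(axis=1)
    return best, df[n_diff == 1]


best, neighbours = one_step_neighbours(table, param_cols)
sharpe_gap = best["sharpe"] - neighbours["sharpe"].mean()
print(f"Best Sharpe: {best['sharpe']:.2f}")
print(f"Neighbours found: {len(neighbours)}, mean Sharpe gap: {sharpe_gap:.3f}")
```

In the notebook itself the same function could be applied to the full `phase2_df`. A small gap suggests the result is not driven by one lucky combination, though it says nothing about performance outside the tested date range.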

In [27]:
# =============================================================================
# FINAL BEST PARAMETERS SUMMARY
# =============================================================================

best_phase2 = phase2_df.iloc[0]

# Compile all best parameters
BEST_PARAMS = {
    # Stage 1: Signal Generation
    'signal': {
        'WINDOW_DAYS': int(best_signal_params['window_days']),
        'BASELINE_DAYS': int(best_signal_params['baseline_days']),
        'MIN_BASELINE_DAYS': int(best_signal_params['min_baseline_days']),
        'MIN_MENTIONS_IN_WINDOW': int(best_signal_params['min_mentions']),
        'BASE_Z_THRESHOLD': float(best_signal_params['base_z_threshold']),
        'MIN_ABS_SENT_CHANGE': float(best_signal_params['min_abs_sent_change']),
        'MENTION_SENSITIVITY': float(best_signal_params['mention_sensitivity']),
        'BULLISH_BIAS': float(best_signal_params['bullish_bias']),
    },
    # Stage 2: Market Move Filter
    'filter': {
        'Z_THRESHOLD': float(best_filter_params['z_threshold']),
        'VOL_EXPANSION_THRESHOLD': float(best_filter_params['vol_expansion_threshold']),
        'ATR_MOVE_THRESHOLD': float(best_filter_params['atr_move_threshold']),
        'RECENT_DAYS': int(best_filter_params['recent_days']),
    },
    # Stage 3: Backtest
    'backtest': {
        'STOP_LOSS_PCT': float(best_phase2['bt_stop_loss_pct']),
        'TAKE_PROFIT_PCT': float(best_phase2['bt_take_profit_pct']),
        'MAX_HOLDING_DAYS': int(best_phase2['bt_max_holding_days']),
        'MAX_POSITION_SIZE': float(best_phase2['bt_max_position_size']),
        'LONG_ONLY': bool(best_phase2['bt_long_only']),
    }
}

print("=" * 70)
print("🏆 BEST PARAMETERS FOUND")
print("=" * 70)

print("\n📊 STAGE 1: Signal Generation (sentiment_shock_signals.ipynb)")
for k, v in BEST_PARAMS['signal'].items():
    print(f"   {k} = {v}")

print("\n🔍 STAGE 2: Market Move Filter (utils.py / stock-market-check.ipynb)")
for k, v in BEST_PARAMS['filter'].items():
    print(f"   {k} = {v}")

print("\n💰 STAGE 3: Backtest (backtest.ipynb)")
for k, v in BEST_PARAMS['backtest'].items():
    print(f"   {k} = {v}")

print("\n" + "=" * 70)
print(f"📈 EXPECTED RETURN: {best_phase2['total_return']:.2f}%")
print(f"   Trades: {best_phase2['n_trades']:.0f}")
print(f"   Win Rate: {best_phase2['win_rate']:.1f}%")
print(f"   Sharpe: {best_phase2['sharpe']:.2f}")
print(f"   Max Drawdown: {best_phase2['max_drawdown']:.2f}%")
print(f"   Buy Signals: {best_phase2['n_buy_signals']:.0f}")
print(f"   Sell Signals: {best_phase2['n_sell_signals']:.0f}")
print("=" * 70)

🏆 BEST PARAMETERS FOUND

📊 STAGE 1: Signal Generation (sentiment_shock_signals.ipynb)
   WINDOW_DAYS = 3
   BASELINE_DAYS = 30
   MIN_BASELINE_DAYS = 15
   MIN_MENTIONS_IN_WINDOW = 3
   BASE_Z_THRESHOLD = 1.5
   MIN_ABS_SENT_CHANGE = 0.1
   MENTION_SENSITIVITY = 0.2
   BULLISH_BIAS = 0.0

🔍 STAGE 2: Market Move Filter (utils.py / stock-market-check.ipynb)
   Z_THRESHOLD = 2.5
   VOL_EXPANSION_THRESHOLD = 1.75
   ATR_MOVE_THRESHOLD = 2.0
   RECENT_DAYS = 5

💰 STAGE 3: Backtest (backtest.ipynb)
   STOP_LOSS_PCT = 0.07
   TAKE_PROFIT_PCT = 0.2
   MAX_HOLDING_DAYS = 5
   MAX_POSITION_SIZE = 0.1
   LONG_ONLY = True

📈 EXPECTED RETURN: 34.44%
   Trades: 974
   Win Rate: 52.2%
   Sharpe: 1.89
   Max Drawdown: -7.48%
   Buy Signals: 1680
   Sell Signals: 2037


In [None]:
# =============================================================================
# VISUALIZATION: Parameter Analysis
# =============================================================================

fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# 1. Bullish Bias impact on returns
ax1 = axes[0, 0]
bias_analysis = phase1_df.groupby('sig_bullish_bias')['total_return'].agg(['mean', 'max', 'std']).reset_index()
ax1.bar(bias_analysis['sig_bullish_bias'].astype(str), bias_analysis['mean'], 
        yerr=bias_analysis['std'], capsize=5, color='#2E86AB', alpha=0.8)
ax1.scatter(bias_analysis['sig_bullish_bias'].astype(str), bias_analysis['max'], 
            color='green', s=100, marker='*', zorder=5, label='Max')
ax1.set_xlabel('Bullish Bias')
ax1.set_ylabel('Return (%)')
ax1.set_title('Impact of Bullish Bias on Returns', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Z-threshold impact
ax2 = axes[0, 1]
z_analysis = phase1_df.groupby('sig_base_z_threshold')['total_return'].agg(['mean', 'max']).reset_index()
ax2.bar(z_analysis['sig_base_z_threshold'].astype(str), z_analysis['mean'], 
        color='#F18F01', alpha=0.8, label='Mean')
ax2.scatter(z_analysis['sig_base_z_threshold'].astype(str), z_analysis['max'], 
            color='green', s=100, marker='*', zorder=5, label='Max')
ax2.set_xlabel('Base Z-Threshold')
ax2.set_ylabel('Return (%)')
ax2.set_title('Signal Z-Threshold vs Returns', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

# 3. Filter Z-threshold impact
ax3 = axes[0, 2]
filt_z_analysis = phase1_df.groupby('filt_z_threshold')['total_return'].agg(['mean', 'max']).reset_index()
ax3.bar(filt_z_analysis['filt_z_threshold'].astype(str), filt_z_analysis['mean'], 
        color='#A23B72', alpha=0.8, label='Mean')
ax3.scatter(filt_z_analysis['filt_z_threshold'].astype(str), filt_z_analysis['max'], 
            color='green', s=100, marker='*', zorder=5, label='Max')
ax3.set_xlabel('Filter Z-Threshold')
ax3.set_ylabel('Return (%)')
ax3.set_title('Market Move Filter Threshold vs Returns', fontweight='bold')
ax3.legend()
ax3.grid(True, alpha=0.3)

# 4. Long-only vs Long-short
ax4 = axes[1, 0]
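strategy_analysis = phase2_df.groupby('bt_long_only')['total_return'].agg(['mean', 'max', 'min']).reset_index()
# Label only the modes present in phase2_df; the Phase 2 grid may test a single mode,
# in which case two hard-coded labels would not match the one aggregated row.
x = ['Long-Only' if long_only else 'Long-Short' for long_only in strategy_analysis['bt_long_only']]
bar_colors = ['#28A745' if long_only else '#2E86AB' for long_only in strategy_analysis['bt_long_only']]
ax4.bar(x, strategy_analysis['mean'], color=bar_colors, alpha=0.8)
ax4.scatter(x, strategy_analysis['max'], color='gold', s=150, marker='*', zorder=5, label='Max')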
ax4.set_ylabel('Return (%)')
ax4.set_title('Strategy Mode: Long-Only vs Long-Short', fontweight='bold')
ax4.legend()
ax4.grid(True, alpha=0.3)

# 5. Stop Loss vs Take Profit heatmap
ax5 = axes[1, 1]
pivot = phase2_df.pivot_table(values='total_return', index='bt_stop_loss_pct', 
                               columns='bt_take_profit_pct', aggfunc='mean')
im = ax5.imshow(pivot.values, cmap='RdYlGn', aspect='auto')
ax5.set_xticks(range(len(pivot.columns)))
ax5.set_xticklabels([f'{x*100:.0f}%' for x in pivot.columns])
ax5.set_yticks(range(len(pivot.index)))
ax5.set_yticklabels([f'{x*100:.0f}%' for x in pivot.index])
ax5.set_xlabel('Take Profit')
ax5.set_ylabel('Stop Loss')
ax5.set_title('SL/TP Heatmap (Avg Return)', fontweight='bold')
for i in range(len(pivot.index)):
    for j in range(len(pivot.columns)):
        ax5.text(j, i, f'{pivot.values[i,j]:.1f}%', ha='center', va='center', fontsize=9)
plt.colorbar(im, ax=ax5)

# 6. Signal balance (BUY vs SELL)
ax6 = axes[1, 2]
ax6.scatter(phase1_df['n_buy_signals'], phase1_df['n_sell_signals'], 
            c=phase1_df['total_return'], cmap='RdYlGn', s=50, alpha=0.7)
ax6.plot([0, phase1_df['n_buy_signals'].max()], [0, phase1_df['n_buy_signals'].max()], 
         'k--', alpha=0.5, label='Equal balance')
ax6.set_xlabel('Buy Signals')
ax6.set_ylabel('Sell Signals')
ax6.set_title('Signal Balance (color = return)', fontweight='bold')
ax6.legend()
ax6.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('outputs/parameter_grid_search.png', dpi=150, bbox_inches='tight')
plt.show()

print("\n✓ Parameter analysis chart saved to outputs/parameter_grid_search.png")

## Code to Update the Three Notebooks

Run the cell below to generate the exact code changes needed for each notebook with the best parameters found.
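
Separately from the copy-paste cell that follows, the selected parameters could also be written to a JSON file that each notebook loads, which avoids transcription mistakes. The sketch below is one way to do that under stated assumptions: the file name `best_params.json` and the helpers `save_params` and `load_params` are hypothetical, and `example_params` is a trimmed stand-in for `BEST_PARAMS` using values printed in the Sharpe-optimized summary above.

```python
import json
import tempfile
from pathlib import Path


def save_params(params: dict, path: Path) -> None:
    """Write a nested parameter dict to JSON, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(params, indent=2))


def load_params(path: Path) -> dict:
    """Read a parameter dict previously written by save_params."""
    return json.loads(path.read_text())


# Trimmed stand-in for BEST_PARAMS (values taken from the summary output above)
example_params = {
    "signal": {"WINDOW_DAYS": 3, "BASELINE_DAYS": 30, "BASE_Z_THRESHOLD": 2.0},
    "filter": {"Z_THRESHOLD": 2.0, "RECENT_DAYS": 5},
    "backtest": {"STOP_LOSS_PCT": 0.07, "TAKE_PROFIT_PCT": 0.1,
                 "MAX_HOLDING_DAYS": 5, "LONG_ONLY": False},
}

# A temporary directory keeps the example side-effect free
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "outputs" / "best_params.json"
    save_params(example_params, target)
    reloaded = load_params(target)

print(reloaded["backtest"])
```

This works without a custom encoder because `BEST_PARAMS` is built with explicit `int()`, `float()` and `bool()` casts. NumPy scalars taken straight from a DataFrame row would not serialize with the standard `json` module.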

In [28]:
# =============================================================================
# GENERATE UPDATE CODE FOR EACH NOTEBOOK
# =============================================================================

print("=" * 80)
print("📝 COPY THESE PARAMETERS TO UPDATE YOUR NOTEBOOKS")
print("=" * 80)

print("\n" + "─" * 80)
print("1️⃣  sentiment_shock_signals.ipynb - Update CONFIG section:")
print("─" * 80)
print(f'''
# =============================================================================
# CONFIG - OPTIMIZED PARAMETERS
# =============================================================================

WINDOW_DAYS = {BEST_PARAMS['signal']['WINDOW_DAYS']}              # Sliding window for "current" sentiment
BASELINE_DAYS = {BEST_PARAMS['signal']['BASELINE_DAYS']}          # Rolling window for historical baseline
MIN_BASELINE_DAYS = {BEST_PARAMS['signal']['MIN_BASELINE_DAYS']}  # Minimum history required
MIN_MENTIONS_IN_WINDOW = {BEST_PARAMS['signal']['MIN_MENTIONS_IN_WINDOW']}  # Minimum mentions
BASE_Z_THRESHOLD = {BEST_PARAMS['signal']['BASE_Z_THRESHOLD']}    # Base z-score threshold
MIN_ABS_SENT_CHANGE = {BEST_PARAMS['signal']['MIN_ABS_SENT_CHANGE']}  # Minimum sentiment change
MENTION_SENSITIVITY = {BEST_PARAMS['signal']['MENTION_SENSITIVITY']}  # Mention volume effect
BULLISH_BIAS = {BEST_PARAMS['signal']['BULLISH_BIAS']}            # NEW: Bias toward BUY signals
''')

print("\n" + "─" * 80)
print("2️⃣  utils.py - Update check_market_moved_before_date() thresholds:")
print("─" * 80)
print(f'''
# In check_market_moved_before_date() function, update these thresholds:
Z_THRESHOLD = {BEST_PARAMS['filter']['Z_THRESHOLD']}              # For pct_3d_z, ret_z, vol_z
VOL_EXPANSION_THRESHOLD = {BEST_PARAMS['filter']['VOL_EXPANSION_THRESHOLD']}  # Volatility expansion
ATR_MOVE_THRESHOLD = {BEST_PARAMS['filter']['ATR_MOVE_THRESHOLD']}  # ATR-based move
RECENT_DAYS = {BEST_PARAMS['filter']['RECENT_DAYS']}              # Days to check
''')

print("\n" + "─" * 80)
print("3️⃣  backtest.ipynb - Update Risk Management Parameters:")
print("─" * 80)
print(f'''
# Configuration
INITIAL_CAPITAL = 10000
START_DATE = "2024-01-01"
END_DATE = "2024-12-31"

# Optimized Risk Management Parameters
MAX_POSITION_SIZE = {BEST_PARAMS['backtest']['MAX_POSITION_SIZE']}  # Max position size
STOP_LOSS_PCT = {BEST_PARAMS['backtest']['STOP_LOSS_PCT']}         # Stop loss
TAKE_PROFIT_PCT = {BEST_PARAMS['backtest']['TAKE_PROFIT_PCT']}     # Take profit
MAX_HOLDING_DAYS = {BEST_PARAMS['backtest']['MAX_HOLDING_DAYS']}   # Max holding period
LONG_ONLY = {BEST_PARAMS['backtest']['LONG_ONLY']}                 # Strategy mode
''')

print("\n" + "=" * 80)

📝 COPY THESE PARAMETERS TO UPDATE YOUR NOTEBOOKS

────────────────────────────────────────────────────────────────────────────────
1️⃣  sentiment_shock_signals.ipynb - Update CONFIG section:
────────────────────────────────────────────────────────────────────────────────

# CONFIG - OPTIMIZED PARAMETERS

WINDOW_DAYS = 3              # Sliding window for "current" sentiment
BASELINE_DAYS = 30          # Rolling window for historical baseline
MIN_BASELINE_DAYS = 15  # Minimum history required
MIN_MENTIONS_IN_WINDOW = 3  # Minimum mentions
BASE_Z_THRESHOLD = 1.5    # Base z-score threshold
MIN_ABS_SENT_CHANGE = 0.1  # Minimum sentiment change
MENTION_SENSITIVITY = 0.2