# What Are We Trading And Why?

This notebook turns raw signals into an explainable thesis card.

It combines:
- Signal anatomy (trend alignment + horizon contributions)
- Trade inventory (which symbols, which side, how strong)
- Evidence of edge (signed forward returns)


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
from IPython.display import display

pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

from QuantConnect import *
from QuantConnect.Research import QuantBook
from config import TEAM_ID

qb = QuantBook()
print('QuantBook initialized')


def read_csv_from_store(key):
    try:
        if not qb.ObjectStore.ContainsKey(key):
            print(f'ObjectStore key not found: {key}')
            return None
        content = qb.ObjectStore.Read(key)
        if not content:
            print(f'Empty ObjectStore key: {key}')
            return None
        return pd.read_csv(StringIO(content))
    except Exception as e:
        print(f'Error reading {key}: {e}')
        return None


## Data Loading & Attribution Feature Engineering

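Loads signal and position data, then engineers several diagnostic columns: ATR-normalized distances from price to each SMA, individual horizon contributions to the composite score (weighted 0.5 short, 0.3 medium, 0.2 long), an implied magnitude computed as tanh of the composite score to cross-check the logged values, and a consensus score measuring how many of the three trend horizons agree in direction. Each signal is also labeled with a setup (trend, pullback, or mixed) from the ordering of price and the three SMAs. Forward 5-day signed returns are merged in from the combined signal and position price history, with position prices taking precedence when both sources report the same symbol and date; the horizon is five price observations ahead, which matches five trading days only when that history is daily and gap-free. The head display confirms that all engineered columns are present before analysis proceeds.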
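As a worked example of the attribution arithmetic, the sketch below scores one hypothetical signal with the same 0.5/0.3/0.2 weights the attribution cell assumes when reconstructing the composite score. The helper names `composite_score` and `implied_magnitude` and the input prices are illustrative, not taken from the strategy code.

```python
import math

# Assumed horizon weights, matching the 0.5 / 0.3 / 0.2 used in the attribution cell.
WEIGHTS = {'short': 0.5, 'medium': 0.3, 'long': 0.2}


def composite_score(price, sma_short, sma_medium, sma_long, atr):
    """Weighted sum of ATR-normalized price-to-SMA distances (NaN if ATR is zero)."""
    if atr == 0:
        return float('nan')
    distances = {
        'short': (price - sma_short) / atr,
        'medium': (price - sma_medium) / atr,
        'long': (price - sma_long) / atr,
    }
    return sum(WEIGHTS[h] * distances[h] for h in WEIGHTS)


def implied_magnitude(score):
    """Squash the unbounded composite score into (-1, 1)."""
    return math.tanh(score)


# Hypothetical long signal: price above all three SMAs, ATR of 2.0
score = composite_score(price=105.0, sma_short=103.0, sma_medium=100.0, sma_long=95.0, atr=2.0)
print(f"composite={score:.2f}, implied magnitude={implied_magnitude(score):.3f}")
```

Here the long horizon contributes the most (1.0 of 2.25) despite carrying the smallest weight, because price sits furthest from the long SMA in ATR units; tanh then compresses the score to roughly 0.98, which lands in the `strong` tier.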

## Signal Attribution — 4-Panel Explainability Overview

These four plots together show which trades the strategy selects and why. The top-left bar chart ranks the most signal-active symbols, colored green or red by net directional bias; the top-right shows which trend horizon (short, medium, long) contributes the most to the composite score on average, in absolute terms; the bottom-left breaks signal counts down by market setup and direction; and the bottom-right scatter cross-checks the tanh transformation by plotting composite score against logged magnitude. If the logged magnitude is tanh of the composite score, the points trace a single S-shaped curve; points off that curve mean the logged values were produced with different weights or inputs. Together the panels answer one question: are signals coming from where we expect, and in the form we intend?
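The bottom-right scatter is a visual check; a numeric summary of the same comparison can make drift easier to spot between backtests. The sketch below is a hypothetical helper, `tanh_consistency`, with an illustrative 0.05 tolerance (neither is part of the original notebook), that reports how many logged magnitudes fall within that tolerance of tanh(composite score).

```python
import numpy as np
import pandas as pd


def tanh_consistency(composite, logged, tol=0.05):
    """Compare logged magnitudes with tanh(composite); tol is an illustrative threshold."""
    # Convert to plain arrays first so mismatched indexes cannot misalign rows
    comp = pd.Series(np.asarray(composite, dtype=float))
    mag = pd.Series(np.asarray(logged, dtype=float))
    valid = comp.notna() & mag.notna()
    gap = (mag[valid] - np.tanh(comp[valid])).abs()
    if gap.empty:
        return {'n': 0, 'share_within_tol': float('nan'), 'max_abs_gap': float('nan')}
    return {
        'n': int(valid.sum()),
        'share_within_tol': float((gap <= tol).mean()),
        'max_abs_gap': float(gap.max()),
    }


# Three rows follow tanh exactly; the last was logged with a different mapping.
composite = [0.0, 1.0, -1.0, 2.0]
logged = [0.0, np.tanh(1.0), -np.tanh(1.0), 0.5]
print(tanh_consistency(composite, logged))
```

Applied to this notebook's frame it would be called as `tanh_consistency(df['composite_score'], df['magnitude'])`; a low share within tolerance suggests the strategy logs magnitude with different weights, a different normalization, or an extra transformation.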

## Edge by Setup, Consensus, and Tier Component Tables

This section tests whether different trend configurations carry different forward-return edge by producing three tables: edge by market setup (trend vs pullback vs mixed), edge by consensus bucket (all three horizons agree vs partial vs low), and average component contributions by tier. The setup and consensus tables report signal count, hit rate, mean signed 5-day return, and average signal strength for direct comparison; the tier table shows how much each horizon contributes at each strength level. Comparing setup and consensus buckets indicates whether the strategy's entry conditions are filtering for higher-quality trend setups, although buckets with few signals can show large averages by chance.
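A signal whose 5-day forward window runs past the end of the price history has no forward return, and counting it as a miss biases the hit rate downward. The hypothetical `edge_table` helper below computes hit rate only over rows that have a forward return, and adds a standard error of the mean (an extra column, not in the original tables) as a rough gauge of how noisy each bucket's average is. The `min_n` default mirrors the 10-signal cutoff the thesis card uses.

```python
import numpy as np
import pandas as pd


def edge_table(frame, by, ret_col='signed_ret_5d', min_n=10):
    """Per-bucket edge stats computed only over rows that have a forward return."""
    rows = []
    for key, grp in frame.groupby(by):
        r = grp[ret_col].dropna()
        n = len(r)
        rows.append({
            by: key,
            'signals': n,
            'hit_rate_pct': 100 * (r > 0).mean() if n else np.nan,
            'avg_signed_ret': r.mean() if n else np.nan,
            # Standard error of the mean: a rough noise gauge, not a significance test
            'std_err': r.std(ddof=1) / np.sqrt(n) if n > 1 else np.nan,
        })
    out = pd.DataFrame(rows, columns=[by, 'signals', 'hit_rate_pct', 'avg_signed_ret', 'std_err'])
    out['eligible'] = out['signals'] >= min_n
    return out.sort_values('avg_signed_ret', ascending=False).reset_index(drop=True)


# Toy data: bucket 'a' has one signal whose forward window is not yet complete (NaN).
toy = pd.DataFrame({
    'setup': ['a', 'a', 'a', 'a', 'b', 'b'],
    'signed_ret_5d': [0.01, -0.02, 0.03, np.nan, 0.02, 0.02],
})
print(edge_table(toy, 'setup', min_n=3))
```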

## Signal Thesis Card

This final section synthesizes the notebook findings into a three-row thesis card answering: what are we trading, why these trades, and where is edge strongest. Supporting evidence includes what fraction of all signals come from full trend-aligned setups and how concentrated the signal book is in the top 10 symbols. The thesis card is designed as a concise one-page summary that can be reviewed before deploying or tuning the strategy.
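The concentration figure on the thesis card is a top-N share. The sketch below computes it from a list of signal symbols and, as an optional extra not in the original notebook, an effective number of symbols (the inverse of the Herfindahl index of signal shares), which summarizes concentration without having to pick N. Both function names are hypothetical.

```python
import pandas as pd


def top_n_share(symbols, n=10):
    """Fraction of all signals that belong to the n most frequent symbols."""
    counts = pd.Series(symbols).value_counts()
    if counts.empty:
        return float('nan')
    return float(counts.head(n).sum() / counts.sum())


def effective_symbol_count(symbols):
    """Inverse Herfindahl index of signal shares: 1.0 means all signals on one symbol."""
    shares = pd.Series(symbols).value_counts(normalize=True)
    if shares.empty:
        return float('nan')
    return float(1.0 / (shares ** 2).sum())


signal_symbols = ['AAA'] * 6 + ['BBB'] * 3 + ['CCC']
print(top_n_share(signal_symbols, n=2), effective_symbol_count(signal_symbols))
```

With this notebook's frame, `top_n_share(df['symbol'], n=10)` computes the same quantity as the card's top-10 share; an effective count well below the number of distinct symbols means a few names dominate the signal book.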

In [None]:
df_signals = read_csv_from_store(f'{TEAM_ID}/signals.csv')
df_positions = read_csv_from_store(f'{TEAM_ID}/positions.csv')

if df_signals is None:
    raise ValueError('signals.csv is required. Run a backtest with signal logging enabled.')

required_cols = ['date', 'symbol', 'direction', 'magnitude', 'price', 'sma_short', 'sma_medium', 'sma_long', 'atr']
missing = [c for c in required_cols if c not in df_signals.columns]
if missing:
    raise ValueError(f'signals.csv missing required columns: {missing}')

df = df_signals.copy()
df['date'] = pd.to_datetime(df['date'])
for col in ['magnitude', 'price', 'sma_short', 'sma_medium', 'sma_long', 'atr']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

df['direction'] = df['direction'].astype(str).str.title()
df['direction'] = np.where(df['direction'].isin(['Up', 'Down']), df['direction'], np.where(df['magnitude'] >= 0, 'Up', 'Down'))
df['direction_sign'] = np.where(df['direction'].eq('Up'), 1.0, -1.0)
df['abs_magnitude'] = df['magnitude'].abs()
df['tier'] = np.select(
    [df['abs_magnitude'] >= 0.7, df['abs_magnitude'] >= 0.3],
    ['strong', 'moderate'],
    default='weak'
)

safe_atr = df['atr'].replace(0, np.nan)
df['dist_short'] = (df['price'] - df['sma_short']) / safe_atr
df['dist_medium'] = (df['price'] - df['sma_medium']) / safe_atr
df['dist_long'] = (df['price'] - df['sma_long']) / safe_atr

df['contrib_short'] = 0.5 * df['dist_short']
df['contrib_medium'] = 0.3 * df['dist_medium']
df['contrib_long'] = 0.2 * df['dist_long']
df['composite_score'] = df['contrib_short'] + df['contrib_medium'] + df['contrib_long']
df['implied_magnitude'] = np.tanh(df['composite_score'])
df['magnitude_gap'] = df['magnitude'] - df['implied_magnitude']

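# Count how many horizons sit on the same side of their SMA as the signal direction (0-3).
# The absolute sum of three nonzero signs can only be 1 or 3, so it cannot separate
# partial from full agreement; NaN distances count as disagreement.
horizon_signs = np.sign(df[['dist_short', 'dist_medium', 'dist_long']])
df['consensus_score'] = horizon_signs.eq(df['direction_sign'], axis=0).sum(axis=1)
df['consensus_bucket'] = np.select(
    [df['consensus_score'] >= 3, df['consensus_score'] == 2],
    ['full_consensus', 'partial_consensus'],
    default='low_consensus'
)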

trend_up = (df['price'] > df['sma_short']) & (df['sma_short'] > df['sma_medium']) & (df['sma_medium'] > df['sma_long'])
trend_down = (df['price'] < df['sma_short']) & (df['sma_short'] < df['sma_medium']) & (df['sma_medium'] < df['sma_long'])
pullback_up = (df['price'] < df['sma_short']) & (df['price'] > df['sma_medium']) & (df['sma_medium'] > df['sma_long'])
pullback_down = (df['price'] > df['sma_short']) & (df['price'] < df['sma_medium']) & (df['sma_medium'] < df['sma_long'])

df['setup'] = np.select(
    [trend_up, trend_down, pullback_up, pullback_down],
    ['trend_up', 'trend_down', 'pullback_up', 'pullback_down'],
    default='mixed'
)

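# Build forward returns from the merged price history.
# shift(-5) looks 5 rows ahead per symbol, which equals 5 trading days only if the
# history is daily and gap-free; sparse signal-only prices stretch the horizon.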
price_parts = [
    df[['date', 'symbol', 'price']].assign(source='signals', source_priority=1)
]
if df_positions is not None and {'date', 'symbol', 'price'}.issubset(df_positions.columns):
    px = df_positions[['date', 'symbol', 'price']].copy()
    px['date'] = pd.to_datetime(px['date'])
    px['price'] = pd.to_numeric(px['price'], errors='coerce')
    price_parts.append(px.assign(source='positions', source_priority=0))

prices = pd.concat(price_parts, ignore_index=True)
prices = prices.dropna(subset=['date', 'symbol', 'price'])
prices = prices.sort_values(['symbol', 'date', 'source_priority'])
# Keep one price per symbol/date; positions (priority 0) win over signal prices (priority 1)
prices = prices.drop_duplicates(['symbol', 'date'], keep='first')
prices = prices.sort_values(['symbol', 'date'])
prices['fwd_ret_5d'] = prices.groupby('symbol')['price'].shift(-5) / prices['price'] - 1.0

df = df.merge(prices[['date', 'symbol', 'fwd_ret_5d']], on=['date', 'symbol'], how='left')
df['signed_ret_5d'] = df['direction_sign'] * df['fwd_ret_5d']

print(f"signal rows: {len(df):,}")
print(f"symbols: {df['symbol'].nunique()}")
print(f"mean |magnitude gap|: {df['magnitude_gap'].abs().mean():.4f}")
display(df.head())
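Because `shift(-5)` counts rows rather than days, a gap in a symbol's price history stretches the horizon. A date-based alternative, sketched here under the assumption that a calendar-day horizon is acceptable, uses `pandas.merge_asof` to find the first observed price on or after `date + horizon_days`. The function name, the 7-calendar-day default (roughly five trading days), and the `max_gap_days` parameter, which discards matches landing too far beyond the target date, are all illustrative choices.

```python
import pandas as pd


def forward_return_by_date(prices, horizon_days=7, max_gap_days=None):
    """Forward return to the first observed price on or after date + horizon_days."""
    px = prices[['date', 'symbol', 'price']].dropna()
    left = px.assign(target_date=px['date'] + pd.Timedelta(days=horizon_days))
    left = left.sort_values('target_date')  # merge_asof requires sorted join keys
    right = px.rename(columns={'date': 'fwd_date', 'price': 'fwd_price'}).sort_values('fwd_date')
    tolerance = pd.Timedelta(days=max_gap_days) if max_gap_days is not None else None
    out = pd.merge_asof(
        left, right,
        left_on='target_date', right_on='fwd_date',
        by='symbol', direction='forward', tolerance=tolerance,
    )
    out['fwd_ret'] = out['fwd_price'] / out['price'] - 1.0
    return out.sort_values(['symbol', 'date']).reset_index(drop=True)


# Toy history with a weekend-style gap between Jan 3 and Jan 8
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-08', '2024-01-09'])
toy_px = pd.DataFrame({'date': dates, 'symbol': 'AAA', 'price': [100.0, 101.0, 102.0, 110.0, 111.0]})
out = forward_return_by_date(toy_px, horizon_days=5, max_gap_days=3)
print(out[['date', 'fwd_date', 'fwd_ret']])
```

To use it here, one could pass the deduplicated `prices` frame and merge `fwd_ret` back on `date` and `symbol` in place of the row-based column; the last signals for each symbol get NaN until enough future prices exist.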


In [None]:
fig, axes = plt.subplots(2, 2, figsize=(18, 12))

symbol_counts = (
    df.groupby('symbol', as_index=False)
      .agg(signals=('date', 'count'), net_direction=('direction_sign', 'mean'))
      .sort_values('signals', ascending=False)
      .head(12)
)
colors = np.where(symbol_counts['net_direction'] >= 0, '#2ca02c', '#d62728')
axes[0, 0].bar(symbol_counts['symbol'], symbol_counts['signals'], color=colors)
axes[0, 0].set_title('Most Signal-Active Symbols (green=net long, red=net short)')
axes[0, 0].set_ylabel('Signal count')
axes[0, 0].tick_params(axis='x', rotation=35)
axes[0, 0].grid(axis='y', alpha=0.3)

component_strength = pd.DataFrame({
    'component': ['short', 'medium', 'long'],
    'mean_abs_contribution': [
        df['contrib_short'].abs().mean(),
        df['contrib_medium'].abs().mean(),
        df['contrib_long'].abs().mean()
    ]
})
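# Plain matplotlib bars with a seaborn palette: passing palette= without hue= is deprecated in seaborn >= 0.13
axes[0, 1].bar(
    component_strength['component'],
    component_strength['mean_abs_contribution'],
    color=sns.color_palette('Blues_d', n_colors=len(component_strength))
)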
axes[0, 1].set_title('Average Absolute Contribution To Composite Score')
axes[0, 1].set_xlabel('Component')
axes[0, 1].set_ylabel('Contribution units')
axes[0, 1].grid(axis='y', alpha=0.3)

setup_counts = (
    df.groupby(['setup', 'direction'])['symbol']
      .count()
      .reset_index(name='signals')
)
order = ['trend_up', 'pullback_up', 'mixed', 'pullback_down', 'trend_down']
sns.barplot(data=setup_counts, x='setup', y='signals', hue='direction', order=order, ax=axes[1, 0])
axes[1, 0].set_title('Signal Setups By Direction')
axes[1, 0].set_xlabel('Setup')
axes[1, 0].set_ylabel('Signals')
axes[1, 0].tick_params(axis='x', rotation=20)
axes[1, 0].grid(axis='y', alpha=0.3)

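sns.scatterplot(data=df, x='composite_score', y='magnitude', hue='tier', alpha=0.7, ax=axes[1, 1])
# Reference curve: points should lie on it if magnitude == tanh(composite_score)
x_ref = np.linspace(df['composite_score'].min(), df['composite_score'].max(), 200)
axes[1, 1].plot(x_ref, np.tanh(x_ref), color='black', linewidth=1)
axes[1, 1].set_title('Composite Score vs Logged Magnitude (black line = tanh)')
axes[1, 1].set_xlabel('Composite score (weighted ATR distances)')
axes[1, 1].set_ylabel('Logged magnitude')
axes[1, 1].grid(alpha=0.3)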

plt.tight_layout()
plt.show()
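The setup labels behind the bottom-left panel come from the `np.select` rule in the loading cell. A scalar restatement of that rule is easier to unit-test and makes the pullback definitions explicit: a pullback only requires price to sit between the short and medium SMAs, with the medium SMA on the trend side of the long SMA. `classify_setup` is an illustrative name, not a function from the strategy.

```python
def classify_setup(price, sma_short, sma_medium, sma_long):
    """Scalar version of the notebook's setup rule; NaN inputs fall through to 'mixed'."""
    if price > sma_short > sma_medium > sma_long:
        return 'trend_up'
    if price < sma_short < sma_medium < sma_long:
        return 'trend_down'
    # Price dipped below the short SMA but holds above a rising medium/long structure
    if sma_medium < price < sma_short and sma_medium > sma_long:
        return 'pullback_up'
    # Price bounced above the short SMA but stays below a falling medium/long structure
    if sma_short < price < sma_medium and sma_medium < sma_long:
        return 'pullback_down'
    return 'mixed'


for row in [(105, 103, 100, 95), (101, 103, 100, 95), (90, 93, 95, 100), (96, 93, 97, 100)]:
    print(row, classify_setup(*row))
```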


In [None]:
def hit_rate(s):
    # Share of positive signed returns among rows that have a forward return (NaN excluded)
    s = s.dropna()
    return (s > 0).mean() if len(s) else np.nan


setup_edge = (
    df.groupby('setup', as_index=False)
      .agg(
          signals=('signed_ret_5d', 'count'),
          hit_rate_5d=('signed_ret_5d', hit_rate),
          avg_signed_ret_5d=('signed_ret_5d', 'mean'),
          median_signed_ret_5d=('signed_ret_5d', 'median'),
          avg_abs_magnitude=('abs_magnitude', 'mean')
      )
      .sort_values('avg_signed_ret_5d', ascending=False)
)
setup_edge['hit_rate_5d'] = 100 * setup_edge['hit_rate_5d']

consensus_edge = (
    df.groupby('consensus_bucket', as_index=False)
      .agg(
          signals=('signed_ret_5d', 'count'),
          hit_rate_5d=('signed_ret_5d', hit_rate),
          avg_signed_ret_5d=('signed_ret_5d', 'mean'),
          avg_abs_magnitude=('abs_magnitude', 'mean')
      )
      .sort_values('avg_signed_ret_5d', ascending=False)
)
consensus_edge['hit_rate_5d'] = 100 * consensus_edge['hit_rate_5d']

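# Multiply by direction_sign so positive values mean the component supports the trade;
# raw means would let long and short signals cancel each other out.
component_by_tier = (
    df.assign(
        aligned_short=df['contrib_short'] * df['direction_sign'],
        aligned_medium=df['contrib_medium'] * df['direction_sign'],
        aligned_long=df['contrib_long'] * df['direction_sign'],
    )
      .groupby('tier', as_index=False)
      .agg(
          avg_aligned_contrib_short=('aligned_short', 'mean'),
          avg_aligned_contrib_medium=('aligned_medium', 'mean'),
          avg_aligned_contrib_long=('aligned_long', 'mean'),
          avg_abs_magnitude=('abs_magnitude', 'mean')
      )
)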

print('Edge by setup (5D signed return)')
display(setup_edge)

print('Edge by horizon-consensus bucket')
display(consensus_edge)

print('Average component contributions by tier')
display(component_by_tier)
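The consensus buckets above depend on how agreement is scored. Because the absolute sum of three nonzero signs can only equal 1 or 3, a score that counts agreement with the trade direction is more useful for a three-level bucket. A standalone sketch of that count follows; `horizon_agreement` and `consensus_bucket` are illustrative names, and treating a NaN distance as disagreement is an assumption.

```python
import numpy as np


def horizon_agreement(dist_short, dist_medium, dist_long, direction_sign):
    """Count, per signal, how many horizon distances share the signal's sign (0-3)."""
    dists = np.column_stack([dist_short, dist_medium, dist_long]).astype(float)
    sign = np.asarray(direction_sign, dtype=float).reshape(-1, 1)
    # NaN distances give NaN signs, which never compare equal, so they count as disagreement
    return (np.sign(dists) == sign).sum(axis=1)


def consensus_bucket(agreement):
    """Map a 0-3 agreement count to the notebook's three bucket labels."""
    agreement = np.asarray(agreement)
    return np.select(
        [agreement >= 3, agreement == 2],
        ['full_consensus', 'partial_consensus'],
        default='low_consensus',
    )


agree = horizon_agreement(
    [1.0, 1.0, -1.0, np.nan, -0.5],   # short-horizon distances
    [2.0, -2.0, -2.0, 1.0, -1.0],     # medium-horizon distances
    [3.0, 3.0, 3.0, 1.0, -2.0],       # long-horizon distances
    [1, 1, 1, 1, -1],                 # signal direction: +1 long, -1 short
)
print(agree, consensus_bucket(agree))
```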


In [None]:
total_signals = len(df)
setup_share = (
    df['setup'].value_counts(normalize=True)
      .rename_axis('setup')
      .reset_index(name='share')
)
setup_share['share'] = 100 * setup_share['share']

symbol_share = (
    df['symbol'].value_counts(normalize=True)
      .head(10)
      .sum()
)

best_setup = None
if not df['signed_ret_5d'].dropna().empty:
    ranked_setup = setup_edge[setup_edge['signals'] >= 10]
    if not ranked_setup.empty:
        best_setup = ranked_setup.iloc[0]

answer_card = pd.DataFrame([
    {
        'question': 'What are we trading?',
        'answer': 'A concentrated subset of symbols with persistent trend-aligned signals.',
        'evidence': f"Top 10 symbols represent {100 * symbol_share:.1f}% of all signals ({total_signals:,} total)."
    },
    {
        'question': 'Why these trades?',
        'answer': 'Because price/SMA structure and weighted horizon contributions imply directional edge.',
        'evidence': f"Full trend setups share {setup_share.loc[setup_share['setup'].isin(['trend_up', 'trend_down']), 'share'].sum():.1f}% of signals."
    },
    {
        'question': 'Where is edge strongest?',
        'answer': 'Edge is strongest in specific setup/consensus buckets, not uniformly across all signals.',
        'evidence': (
            f"Best eligible setup: {best_setup['setup']} with avg signed 5D return {100 * best_setup['avg_signed_ret_5d']:.2f}% "
            f"and hit rate {best_setup['hit_rate_5d']:.1f}% (n={int(best_setup['signals'])})."
            if best_setup is not None else
            'Not enough 5D return coverage to rank setups yet.'
        )
    }
])

print('Signal Thesis Card')
display(answer_card)

print('Setup share (%)')
display(setup_share)
