# Order Lifecycle

Lifecycle analytics using `{TEAM_ID}/order_events.csv`.

This tracks:
- Submitted, partially filled, filled, canceled counts
- Fill ratio and time-to-final-status
- Final status by execution tier

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
from IPython.display import display

pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

from QuantConnect import *
from QuantConnect.Research import QuantBook
from config import TEAM_ID

qb = QuantBook()
print('QuantBook initialized')


def read_csv_from_store(key):
    try:
        if not qb.ObjectStore.ContainsKey(key):
            print(f'ObjectStore key not found: {key}')
            return None
        content = qb.ObjectStore.Read(key)
        if not content:
            print(f'Empty ObjectStore key: {key}')
            return None
        return pd.read_csv(StringIO(content))
    except Exception as e:
        print(f'Error reading {key}: {e}')
        return None


## Data Loading — Order Events Log

Loads the order events log from ObjectStore, parses the `tier` and `week_id` values from each order's `tag` string, and displays the first few rows to confirm the schema. Each row is one status update (submitted, partially filled, filled, or canceled) for a single order. The `tier` column parsed from the tag is the key dimension for segmenting the lifecycle statistics that follow; orders without a tier tag are labeled `unknown`.
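As a minimal sketch of the tag parsing, the example below runs the same kind of regex extraction on a few made-up tag strings. The `key=value;key=value` layout and the sample values are assumptions inferred from the parsing regex, not a confirmed log format.

```python
import re

import numpy as np
import pandas as pd


def parse_tag_value(tag, key):
    """Return the value stored under `key` in a 'k1=v1;k2=v2' tag, or NaN if absent."""
    if pd.isna(tag):
        return np.nan
    m = re.search(rf'\b{re.escape(key)}=([^;]+)', str(tag))
    return m.group(1) if m else np.nan


# Hypothetical tags: the real strings depend on how the algorithm tags its orders.
tags = pd.Series(['tier=strong;week_id=2024-W05', 'week_id=2024-W06', None])

tiers = tags.apply(lambda t: parse_tag_value(t, 'tier')).fillna('unknown')
week_ids = tags.apply(lambda t: parse_tag_value(t, 'week_id'))

print(tiers.tolist())
print(week_ids.tolist())
```

Tags with no `tier` entry fall back to `unknown`, which is why an `unknown` bucket can appear in the tier charts.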

## Order-Level Summary — Fill Ratio and Days to Final

This cell pivots from status-event rows to one row per order, computing fill ratio, days-to-final-status, and final outcome for each order ID. The resulting `order_summary` DataFrame is the primary analysis unit — one order with its full lifecycle condensed into a single record. The head preview confirms that `fill_ratio` and `days_to_final` are populated before charts are generated.
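To make the pivot concrete, here is a small sketch on invented events for two orders. It assumes `fill_quantity` holds the incremental quantity filled by each event (not a running total), which is what summing it per order implies; the sample rows are illustrative only.

```python
import numpy as np
import pandas as pd

# Invented events: order 1 fills in two steps, order 2 is canceled unfilled.
events = pd.DataFrame({
    'order_id': [1, 1, 1, 2, 2],
    'date': pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04',
                            '2024-01-02', '2024-01-04']),
    'status': ['Submitted', 'PartiallyFilled', 'Filled', 'Submitted', 'Canceled'],
    'quantity': [100, 100, 100, -50, -50],
    'fill_quantity': [0, 40, 60, 0, 0],
})

# A stable sort keeps the logged order of events that share a date.
events = events.sort_values('date', kind='stable')
summary = events.groupby('order_id').agg(
    quantity=('quantity', 'first'),
    submitted_at=('date', 'min'),
    final_at=('date', 'max'),
    final_status=('status', 'last'),
    filled_qty=('fill_quantity', 'sum'),
)

# Fill ratio = |filled| / |ordered|, guarded against zero-quantity orders.
abs_qty = summary['quantity'].abs().replace(0, np.nan)
summary['fill_ratio'] = (summary['filled_qty'].abs() / abs_qty).fillna(0.0).clip(0, 1)
summary['days_to_final'] = (summary['final_at'] - summary['submitted_at']).dt.days

print(summary[['final_status', 'fill_ratio', 'days_to_final']])
```

If the log instead recorded cumulative fills, the sum would overstate `filled_qty` and the clip to 1 would hide it, so the column's meaning is worth confirming against the logging code.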

## Order Lifecycle — 4-Panel Chart

These four panels summarize the order lifecycle from submission to final resolution:

- Top-left: counts by final status (filled, canceled, etc.).
- Top-right: the fill-ratio distribution, revealing how often orders are only partially filled.
- Bottom-left: the final status mix as a stacked proportion by tier, testing whether strong-tier orders fill more reliably.
- Bottom-right: time to final status by tier, checking whether strong-tier market-price limits resolve faster than wide weak-tier limits.

## Cancel Rate and Fill Ratio Scorecard by Tier

This summary table reports cancel rate and average fill ratio for each signal tier, combining lifecycle performance into a compact scorecard. A higher cancel rate for weak-tier orders is expected given their 1.5% limit offset, but if strong-tier orders also show elevated cancels it may indicate the stale-limit cancellation threshold is too aggressive. Compare average fill ratios across tiers to confirm the tiered limit-offset design achieves meaningfully different fill outcomes.
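The scorecard arithmetic can be sketched on a handful of invented orders. The tier labels match the notebook, but the rows themselves are made up for illustration.

```python
import pandas as pd

# Invented order-level rows: two strong-tier orders, four weak-tier orders.
orders = pd.DataFrame({
    'tier': ['strong', 'strong', 'weak', 'weak', 'weak', 'weak'],
    'final_status': ['Filled', 'Filled', 'Canceled', 'Filled', 'Canceled', 'Canceled'],
    'fill_ratio': [1.0, 1.0, 0.0, 1.0, 0.5, 0.0],
})

# Cancel rate is the mean of a boolean flag, so it is the share of canceled orders.
orders['is_canceled'] = orders['final_status'].str.contains('Canceled', case=False, na=False)
scorecard = orders.groupby('tier').agg(
    orders=('final_status', 'size'),
    cancel_rate=('is_canceled', 'mean'),
    avg_fill_ratio=('fill_ratio', 'mean'),
)

print(scorecard)
```

In this toy data the weak tier has a 75% cancel rate and a 0.375 average fill ratio, while the strong tier fills completely. Note that a canceled order can still carry a partial fill, so the two columns are not simple complements.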

In [None]:
# DIAGNOSTIC: Check what's happening with order_events.csv
print("=== Checking ObjectStore ===")
print(f"Key exists: {qb.ObjectStore.ContainsKey(f'{TEAM_ID}/order_events.csv')}")

if qb.ObjectStore.ContainsKey(f'{TEAM_ID}/order_events.csv'):
    print("\nAttempting to read...")
    try:
        content = qb.ObjectStore.Read(f'{TEAM_ID}/order_events.csv')
        print(f"Content type: {type(content)}")
        print(f"Content length: {len(content) if content else 0}")
        if content:
            print(f"First 200 chars: {content[:200]}")
            # Try parsing
            df_test = pd.read_csv(StringIO(content))
            print(f"\nSuccessfully parsed! Rows: {len(df_test)}, Columns: {list(df_test.columns)}")
    except Exception as e:
        print(f"ERROR: {type(e).__name__}: {e}")
        import traceback
        traceback.print_exc()
else:
    print("\n=== Available keys in ObjectStore: ===")
    # List all keys to see what's actually available
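    try:
        # Enumerating the store may yield key/value pairs rather than plain strings,
        # so take .Key when it is present and fall back to the item itself.
        all_keys = [str(getattr(kv, 'Key', kv)) for kv in qb.ObjectStore.GetEnumerator()]
        print(f"Total keys: {len(all_keys)}")
        wolfpack_keys = [k for k in all_keys if 'wolfpack' in k.lower()]
        print(f"Wolfpack-related keys: {wolfpack_keys}")
    except Exception as e:
        print(f"Could not enumerate ObjectStore keys: {type(e).__name__}: {e}")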

In [None]:
import re

df_events = read_csv_from_store(f'{TEAM_ID}/order_events.csv')
if df_events is None:
    raise ValueError('order_events.csv is required. Run a backtest with order-event logging enabled.')

df_events['date'] = pd.to_datetime(df_events['date'])
for col in ['quantity', 'fill_quantity', 'fill_price', 'limit_price']:
    if col in df_events.columns:
        df_events[col] = pd.to_numeric(df_events[col], errors='coerce').fillna(0.0)

def parse_tag_value(tag, key):
    if pd.isna(tag):
        return np.nan
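    # Escape the key and require a word boundary so 'tier' cannot match inside a longer key
    m = re.search(rf'\b{re.escape(key)}=([^;]+)', str(tag))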
    return m.group(1) if m else np.nan

df_events['tier'] = df_events['tag'].apply(lambda t: parse_tag_value(t, 'tier')).fillna('unknown')
df_events['week_id'] = df_events['tag'].apply(lambda t: parse_tag_value(t, 'week_id')).fillna('')

print(f'order events: {len(df_events):,}')
display(df_events.head())


In [None]:
# Aggregate each order lifecycle
grp = df_events.sort_values('date').groupby('order_id', as_index=False)

order_summary = grp.agg(
    symbol=('symbol', 'first'),
    tier=('tier', 'first'),
    order_type=('order_type', 'first'),
    quantity=('quantity', 'first'),
    submitted_at=('date', 'min'),
    final_at=('date', 'max')
)

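# Stable sort: events sharing a date keep their logged order, so the last row is the true final status
final_status = (
    df_events.sort_values('date', kind='stable')
             .groupby('order_id')
             .tail(1)[['order_id', 'status']]
             .rename(columns={'status': 'final_status'})
)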

fills = (
    df_events.groupby('order_id', as_index=False)['fill_quantity']
             .sum()
             .rename(columns={'fill_quantity': 'filled_qty'})
)

order_summary = order_summary.merge(final_status, on='order_id', how='left')
order_summary = order_summary.merge(fills, on='order_id', how='left')

order_summary['abs_qty'] = order_summary['quantity'].abs().replace(0, np.nan)
order_summary['fill_ratio'] = (order_summary['filled_qty'].abs() / order_summary['abs_qty']).fillna(0.0).clip(0, 1)
order_summary['days_to_final'] = (order_summary['final_at'] - order_summary['submitted_at']).dt.days.fillna(0)

display(order_summary.head())


In [None]:
fig, axes = plt.subplots(2, 2, figsize=(16, 10))

status_counts = order_summary['final_status'].value_counts().sort_values(ascending=False)
status_counts.plot(kind='bar', ax=axes[0, 0], color='#1f77b4')
axes[0, 0].set_title('Final Order Status Counts')
axes[0, 0].set_ylabel('Count')
axes[0, 0].grid(axis='y', alpha=0.3)

sns.histplot(order_summary['fill_ratio'], bins=20, ax=axes[0, 1], color='#2ca02c')
axes[0, 1].set_title('Fill Ratio Distribution')
axes[0, 1].set_xlabel('Fill ratio')
axes[0, 1].grid(alpha=0.3)

tier_status = pd.crosstab(order_summary['tier'], order_summary['final_status'], normalize='index')
tier_status = tier_status.reindex(['strong', 'moderate', 'weak', 'exit', 'unknown']).dropna(how='all')
tier_status.plot(kind='bar', stacked=True, ax=axes[1, 0], colormap='tab20')
axes[1, 0].set_title('Final Status Mix by Tier')
axes[1, 0].set_ylabel('Share')
axes[1, 0].grid(axis='y', alpha=0.3)

sns.boxplot(data=order_summary, x='tier', y='days_to_final', order=['strong', 'moderate', 'weak', 'exit', 'unknown'], ax=axes[1, 1])
axes[1, 1].set_title('Days to Final Status by Tier')
axes[1, 1].set_xlabel('Tier')
axes[1, 1].set_ylabel('Days')
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()


In [None]:
cancel_rate = (
    order_summary.assign(is_canceled=order_summary['final_status'].astype(str).str.contains('Canceled', case=False, na=False))
                .groupby('tier', as_index=False)
                .agg(orders=('order_id', 'count'),
                     cancel_rate=('is_canceled', 'mean'),
                     avg_fill_ratio=('fill_ratio', 'mean'))
                .sort_values('orders', ascending=False)
)
display(cancel_rate)


## Investigation: Days-to-Cancellation Analysis

Hypothesis: Cancelled orders are being killed at exactly 2 days due to the 2-open-check rule.
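One simple way to quantify the hypothesis is the share of cancellations that land on exactly day 2. The helper below is a hypothetical sketch (the name `share_at_day` and the sample values are not part of the notebook); it could be applied to `cancelled_orders['days_to_cancel']` per tier.

```python
import pandas as pd


def share_at_day(days_to_cancel, day=2):
    """Fraction of non-missing cancellations that occurred exactly `day` days after submission."""
    s = pd.Series(days_to_cancel, dtype='float64').dropna()
    if s.empty:
        return float('nan')
    return float((s == day).mean())


# Invented sample: four of six cancellations happen on day 2.
sample = [2, 2, 2, 1, 5, 2]
print(f"Share cancelled at day 2: {share_at_day(sample):.2f}")
```

A share close to 1 would support the rule-driven explanation, while a share similar to neighbouring days would not. Because `days_to_final` is truncated to whole days, an order cancelled slightly under 48 hours after submission shows up as day 1.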

In [None]:
# Filter to cancelled orders only
cancelled_orders = order_summary[
    order_summary['final_status'].str.contains('Canceled', case=False, na=False)
].copy()

# Calculate days to cancellation
cancelled_orders['days_to_cancel'] = cancelled_orders['days_to_final']

# Plot distribution by tier
fig, ax = plt.subplots(figsize=(12, 6))
for tier in ['strong', 'moderate', 'weak']:
    tier_data = cancelled_orders[cancelled_orders['tier'] == tier]['days_to_cancel']
    if len(tier_data) > 0:
        ax.hist(tier_data, bins=20, alpha=0.5, label=f'{tier} (n={len(tier_data)})')
ax.set_xlabel('Days to Cancellation')
ax.set_ylabel('Count')
ax.set_title('Days-to-Cancellation Distribution by Tier')
ax.legend()
ax.grid(alpha=0.3)
plt.show()

# Print median/mean by tier
print("\nDays to Cancellation by Tier:")
display(cancelled_orders.groupby('tier')['days_to_cancel'].describe())

## Fill Progression by Order Age

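Shows whether the average fill ratio drops and the cancellation rate jumps for orders that reach their final status at about 2 days. A sharp change at that age would point to orders being cancelled by the rule rather than simply going unfilled.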

In [None]:
# Group orders by days_to_final and tier
fill_by_age = order_summary.groupby(['tier', 'days_to_final'], as_index=False).agg(
    count=('order_id', 'count'),
    avg_fill_ratio=('fill_ratio', 'mean'),
    cancel_rate=('final_status', lambda x: x.str.contains('Canceled', case=False, na=False).mean())
)

# Plot for moderate tier
moderate_data = fill_by_age[fill_by_age['tier'] == 'moderate'].sort_values('days_to_final')

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Fill ratio by age
axes[0].plot(moderate_data['days_to_final'], moderate_data['avg_fill_ratio'],
             marker='o', linewidth=2, markersize=6)
axes[0].set_xlabel('Days to Final Status')
axes[0].set_ylabel('Average Fill Ratio')
axes[0].set_title('Moderate Tier: Fill Ratio by Order Age')
axes[0].grid(alpha=0.3)
axes[0].axvline(x=2, color='red', linestyle='--', label='2-check threshold')
axes[0].legend()

# Cancellation rate by age
axes[1].plot(moderate_data['days_to_final'], moderate_data['cancel_rate'],
             marker='o', linewidth=2, markersize=6, color='red')
axes[1].set_xlabel('Days to Final Status')
axes[1].set_ylabel('Cancellation Rate')
axes[1].set_title('Moderate Tier: Cancellation Rate by Order Age')
axes[1].grid(alpha=0.3)
axes[1].axvline(x=2, color='red', linestyle='--', label='2-check threshold')
axes[1].legend()

plt.tight_layout()
plt.show()

print("\nModerate Tier Order Age Statistics:")
display(moderate_data[['days_to_final', 'count', 'avg_fill_ratio', 'cancel_rate']].head(10))

## Price Movement at Cancellation

Shows whether cancelled orders had favorable price moves but were still cancelled (mechanism issue vs market issue).
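The sign convention matters here, so a short sketch may help: the move is expressed so that a positive number is favorable for the order's side (a price drop for a buy, a price rise for a sell). The function name and sample prices below are hypothetical.

```python
import numpy as np
import pandas as pd


def favorable_move_pct(submit_price, cancel_price, quantity):
    """Percent price change from submit to cancel, signed so positive is favorable.

    Buys (quantity > 0) benefit from a falling price, sells (quantity < 0) from a rising one.
    """
    submit_price = pd.Series(submit_price, dtype='float64')
    cancel_price = pd.Series(cancel_price, dtype='float64')
    quantity = pd.Series(quantity, dtype='float64')
    raw = (cancel_price - submit_price) / submit_price.replace(0, np.nan) * 100
    # Flip the sign for buys; leave sells as they are.
    return raw.where(quantity <= 0, -raw)


moves = favorable_move_pct(
    submit_price=[100.0, 100.0, 100.0],
    cancel_price=[99.0, 101.0, 102.0],
    quantity=[10, -10, 10],  # buy, sell, buy
)
print(moves.round(3).tolist())
```

The first two orders both show a favorable 1% move and the third an adverse 2% move. A cancelled order whose favorable move exceeds its limit offset is one the market could have filled.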

In [None]:
# Use the logged market_price_at_submit and positions.csv for price tracking

# 1. Get the market price at submission from each order's earliest event
# Note: Don't select 'symbol' here since cancelled_orders already has it
events_by_date = df_events.sort_values('date', kind='stable')
submit_prices = events_by_date.groupby('order_id', as_index=False).first()[
    ['order_id', 'date', 'quantity', 'market_price_at_submit']
].rename(columns={'date': 'submit_date', 'quantity': 'submit_qty'})

# 2. Get each order's last event date (the cancellation date for cancelled orders)
cancel_dates = events_by_date.groupby('order_id', as_index=False).last()[
    ['order_id', 'date']
].rename(columns={'date': 'cancel_date'})

# 3. Merge with cancelled_orders (which already has symbol from order_summary)
cancelled_with_events = (
    cancelled_orders
    .merge(submit_prices, on='order_id', how='left')
    .merge(cancel_dates, on='order_id', how='left')
)

# 4. Load positions.csv to get market prices at cancellation
df_positions = read_csv_from_store(f'{TEAM_ID}/positions.csv')
if df_positions is not None:
    df_positions['date'] = pd.to_datetime(df_positions['date'])

    # Get price at cancellation by matching (symbol, cancel_date) with positions.
    # Keep one positions row per (symbol, date) so the merge cannot duplicate orders.
    cancel_prices = (
        cancelled_with_events[['order_id', 'symbol', 'cancel_date']]
        .merge(
            df_positions[['symbol', 'date', 'price']].drop_duplicates(['symbol', 'date']),
            left_on=['symbol', 'cancel_date'],
            right_on=['symbol', 'date'],
            how='left'
        )
        .rename(columns={'price': 'cancel_market_price'})
        [['order_id', 'cancel_market_price']]
    )

    cancelled_with_events = cancelled_with_events.merge(cancel_prices, on='order_id', how='left')

    # 5. Calculate price movement in favorable direction
    cancelled_with_events['price_change_pct'] = (
        (cancelled_with_events['cancel_market_price'] - cancelled_with_events['market_price_at_submit'])
        / cancelled_with_events['market_price_at_submit']
        * 100
    )

    # Flip the sign for buys so that a positive value is always a favorable move:
    # a buy benefits when the price drops, a sell when it rises
    cancelled_with_events.loc[cancelled_with_events['submit_qty'] > 0, 'price_change_pct'] *= -1

    # 6. Plot distribution for moderate tier
    moderate_cancelled = cancelled_with_events[cancelled_with_events['tier'] == 'moderate']
    valid_data = moderate_cancelled['price_change_pct'].dropna()

    if len(valid_data) > 0:
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.hist(valid_data, bins=30, alpha=0.7, color='orange')
        ax.axvline(x=0.5, color='green', linestyle='--', linewidth=2, label='0.5% favorable (would fill)')
        ax.axvline(x=-0.5, color='red', linestyle='--', linewidth=2, label='0.5% adverse (would not fill)')
        ax.axvline(x=0, color='gray', linestyle='-', alpha=0.3, label='No movement')
        ax.set_xlabel('Price Change % (Favorable Direction)')
        ax.set_ylabel('Count')
        ax.set_title('Moderate Tier: Price Movement at Cancellation (Fixed - Using Actual Market Prices)')
        ax.legend()
        ax.grid(alpha=0.3)
        plt.show()

        # Print stats
        print("\nModerate Tier Cancelled Orders - Price Movement Stats:")
        print(f"Total cancelled orders analyzed: {len(valid_data):,}")
        print(f"Orders with >0.5% favorable move: {(valid_data > 0.5).sum():,} ({(valid_data > 0.5).mean()*100:.1f}%)")
        print(f"Orders with <-0.5% adverse move: {(valid_data < -0.5).sum():,} ({(valid_data < -0.5).mean()*100:.1f}%)")
        print(f"Orders in neutral range [-0.5%, 0.5%]: {((valid_data >= -0.5) & (valid_data <= 0.5)).sum():,} ({((valid_data >= -0.5) & (valid_data <= 0.5)).mean()*100:.1f}%)")
        print(f"\nMean price movement: {valid_data.mean():.3f}%")
        print(f"Median price movement: {valid_data.median():.3f}%")
    else:
        print("No valid price data available")
else:
    print("positions.csv not found - cannot calculate price movement")

## Scaling Day Analysis

Determine which days of the 5-day scaling window see the most cancellations (requires week_id in order tags).
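The day-in-window index is derived by dense-ranking event dates within each `(symbol, week_id)` group. The sketch below uses invented events to show what that ranking produces; the `week_id` format shown is an assumption.

```python
import pandas as pd

# Invented events for one scaling week; AAA has no event on 2024-01-31.
events = pd.DataFrame({
    'symbol': ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'week_id': ['2024-W05'] * 6,
    'date': pd.to_datetime(['2024-01-29', '2024-01-29', '2024-01-30', '2024-02-01',
                            '2024-01-30', '2024-01-31']),
})

# Dense rank: identical dates share a rank and ranks increase without gaps.
events['week_day'] = (
    events.groupby(['symbol', 'week_id'])['date']
          .rank(method='dense')
          .astype(int)
)

print(events)
```

AAA's events are ranked 1, 1, 2, 3 and BBB's 1, 2. The rank counts only dates that have events, so AAA's Thursday event is labelled day 3 even though it is the fourth trading day of the week, and BBB's first event is labelled day 1 although it occurs on Tuesday. If calendar position within the window matters, the index would need to be computed from the window's start date instead.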

In [None]:
# Calculate the scaling day of each cancelled order (requires week_id in order tags)
# week_id was filled with '' when the tag was missing, so test for non-empty values
has_week_id = 'week_id' in df_events.columns and (df_events['week_id'].fillna('') != '').any()
if has_week_id:
    print("week_id found in order events - proceeding with analysis")
    
    # Group by week_id and symbol, then rank dates to get day within week
    scaling_analysis = df_events[df_events['week_id'].fillna('') != ''].sort_values('date', kind='stable').copy()
    scaling_analysis['week_day'] = (
        scaling_analysis.groupby(['symbol', 'week_id'])['date']
        .rank(method='dense')
        .astype(int)
    )
    
    # Use the day of each order's last event (its cancellation) so every order is counted once
    order_week_day = scaling_analysis.groupby('order_id', as_index=False)['week_day'].last()
    cancelled_by_day = (
        cancelled_orders.merge(order_week_day, on='order_id', how='left')
        .dropna(subset=['week_day'])
        .astype({'week_day': int})
        .groupby(['tier', 'week_day'], as_index=False)
        .agg(cancelled_count=('order_id', 'count'))
    )
    
    # Plot by tier
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # All tiers together
    total_by_day = cancelled_by_day.groupby('week_day', as_index=False)['cancelled_count'].sum()
    axes[0].bar(total_by_day['week_day'], total_by_day['cancelled_count'], color='steelblue')
    axes[0].set_xlabel('Day in Scaling Window')
    axes[0].set_ylabel('Cancellation Count')
    axes[0].set_title('All Tiers: Cancellations by Day in 5-Day Scaling Window')
    axes[0].set_xticks(range(1, 6))
    axes[0].grid(axis='y', alpha=0.3)
    
    # Moderate tier specifically
    moderate_by_day = cancelled_by_day[cancelled_by_day['tier'] == 'moderate']
    if len(moderate_by_day) > 0:
        axes[1].bar(moderate_by_day['week_day'], moderate_by_day['cancelled_count'], color='orange')
        axes[1].set_xlabel('Day in Scaling Window')
        axes[1].set_ylabel('Cancellation Count')
        axes[1].set_title('Moderate Tier: Cancellations by Day in 5-Day Scaling Window')
        axes[1].set_xticks(range(1, 6))
        axes[1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print summary table
    print("\nCancellations by Scaling Day and Tier:")
    pivot = cancelled_by_day.pivot(index='week_day', columns='tier', values='cancelled_count').fillna(0).astype(int)
    display(pivot)
    
else:
    print("NOTE: week_id not available in order events")
    print("This analysis requires backtest re-run with week_id logging in order tags")
    print("\nCurrent order event columns:")
    print(df_events.columns.tolist())
    print("\nSample tag values:")
    display(df_events[['order_id', 'tag']].head(10))

## Summary: Validating the 2-Check Cancellation Hypothesis

### Expected Results if Hypothesis is **Confirmed**:

1. **Days-to-Cancellation**: Peak at 2 days for moderate tier
2. **Fill Progression**: Sharp cliff at 2 days where cancellation rate spikes
3. **Price Movement**: Some cancelled orders had favorable moves >0.5% (mechanism issue, not market)
4. **Scaling Days**: Early-week orders (days 1-2) cancelled before completion

### If Hypothesis is **Rejected**:
- Cancellations show no peak at 2 days, or cancelled orders rarely had favorable price moves, which points to a different root cause
- May need to investigate: limit pricing, liquidity issues, or other factors

### Next Steps Based on Results:
- **Hypothesis confirmed** → Implement Option 2: signal-aware cancellation (only cancel previous rebalance cycles)
- **Hypothesis rejected** → Investigate alternative approaches (adjust offsets, increase checks, etc.)