# 1-Minute vs 1-Second Bar Data Comparison

This notebook loads and visualizes bar data at different timeframes (1-minute and 1-second) from the Nautilus catalog, allowing you to compare price action across different granularities.

## Configuration

First, set your instrument and date range parameters.

In [None]:
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from pathlib import Path
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.model.data import BarType
from nautilus_trader.model.identifiers import InstrumentId
from src.utils.time import convert_utc_to_ny

In [None]:
# Configuration
CATALOG_PATH = Path("/data/quiescence/catalog")
INSTRUMENT = "MSFT.POLYGON"  # Change this to any instrument in your catalog
DATE_START = "2025-01-06"  # Adjust date range as needed
DATE_END = "2025-01-10"

print(f"Loading data for {INSTRUMENT}")
print(f"Date range: {DATE_START} to {DATE_END}")
print(f"Catalog path: {CATALOG_PATH}")

In [None]:
# Initialize catalog
catalog = ParquetDataCatalog(CATALOG_PATH)

# Define instrument ID
instrument_id = InstrumentId.from_str(INSTRUMENT)

print(f"Catalog initialized: {catalog}")
print(f"Instrument ID: {instrument_id}")

In [None]:
# Load 1-minute bars
bar_type_1min = BarType.from_str(f"{INSTRUMENT}-1-MINUTE-LAST-EXTERNAL")

bars_1min = catalog.bars(
    bar_types=[bar_type_1min],
    start=DATE_START,
    end=DATE_END
)

print(f"Loaded {len(bars_1min)} 1-minute bars")
if len(bars_1min) > 0:
    print(f"First bar: {bars_1min[0]}")
    print(f"Last bar: {bars_1min[-1]}")

In [None]:
# Load 1-second bars
bar_type_1sec = BarType.from_str(f"{INSTRUMENT}-1-SECOND-LAST-EXTERNAL")

bars_1sec = catalog.bars(
    bar_types=[bar_type_1sec],
    start=DATE_START,
    end=DATE_END
)

print(f"Loaded {len(bars_1sec)} 1-second bars")
if len(bars_1sec) > 0:
    print(f"First bar: {bars_1sec[0]}")
    print(f"Last bar: {bars_1sec[-1]}")

## CRITICAL: Bar Timestamp Convention

**Important:** In this catalog, each Nautilus bar carries two timestamps:
- `ts_event`: Bar **OPEN** time (start of period) - **we ignore this**
- `ts_init`: Bar **CLOSE** time (end of period) - **we use this**

Because our cataloger shifts Polygon data to close-time labeling, we must use `bar.ts_init` when converting bars to DataFrames. Using `ts_event` would label every bar one period too early: 60 seconds for 1-minute bars and 1 second for 1-second bars.
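
To make the offset concrete, here is a minimal sketch that builds both nanosecond timestamps for one hypothetical 1-minute bar and converts them back to New York time. The bar and its times are illustrative values, not read from the catalog, and the sketch assumes the convention described above (`ts_event` holds the open, `ts_init` holds the close).

```python
import pandas as pd

# Hypothetical 1-minute bar covering 09:30:00 to 09:31:00 New York time
open_time = pd.Timestamp("2025-01-06 09:30:00", tz="America/New_York")
close_time = open_time + pd.Timedelta(minutes=1)

# Nanosecond UNIX timestamps, as they would be stored on the bar
# (assumed convention: ts_event = open, ts_init = close)
ts_event_ns = open_time.value
ts_init_ns = close_time.value

# Convert each field back to a New York timestamp
label_from_event = pd.Timestamp(ts_event_ns, unit="ns", tz="UTC").tz_convert("America/New_York")
label_from_init = pd.Timestamp(ts_init_ns, unit="ns", tz="UTC").tz_convert("America/New_York")

# The gap between the two labels is exactly one bar period
offset = label_from_init - label_from_event
print(f"from ts_event: {label_from_event}")
print(f"from ts_init:  {label_from_init}")
print(f"offset:        {offset}")
```

The same arithmetic applies to 1-second bars, where the two labels differ by one second instead of sixty.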

In [None]:
# Convert bars to DataFrames for easier manipulation
def bars_to_df(bars):
    """Convert Nautilus bars to pandas DataFrame with NY timezone
    
    IMPORTANT: Uses ts_init (bar CLOSE time) not ts_event (bar OPEN time)
    because our cataloged data stores bars with close-time labels.
    """
    if not bars:
        return pd.DataFrame()
    
    data = {
        'timestamp': [bar.ts_init for bar in bars],  # Use ts_init (CLOSE time), not ts_event
        'open': [float(bar.open) for bar in bars],
        'high': [float(bar.high) for bar in bars],
        'low': [float(bar.low) for bar in bars],
        'close': [float(bar.close) for bar in bars],
        'volume': [float(bar.volume) for bar in bars],
    }
    
    df = pd.DataFrame(data)
    # Convert nanoseconds to seconds, then to NY time
    df['timestamp'] = df['timestamp'].apply(lambda x: convert_utc_to_ny(x / 1e9))
    df.set_index('timestamp', inplace=True)
    return df

df_1min = bars_to_df(bars_1min)
df_1sec = bars_to_df(bars_1sec)

print(f"1-minute bars DataFrame shape: {df_1min.shape}")
print(f"1-second bars DataFrame shape: {df_1sec.shape}")
print(f"\n1-minute bars sample (NY time):")
print(df_1min.head(10))
print(f"\n1-second bars sample (NY time):")
print(df_1sec.head(10))

In [None]:
# Create interactive Plotly chart with subplots
fig = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    vertical_spacing=0.05,
    subplot_titles=('1-Minute Bars', '1-Second Bars'),
    row_heights=[0.5, 0.5]
)

# Add 1-minute candlestick trace
if not df_1min.empty:
    fig.add_trace(
        go.Candlestick(
            x=df_1min.index,
            open=df_1min['open'],
            high=df_1min['high'],
            low=df_1min['low'],
            close=df_1min['close'],
            name='1-Minute',
            increasing_line_color='green',
            decreasing_line_color='red'
        ),
        row=1, col=1
    )

# Add 1-second candlestick trace
if not df_1sec.empty:
    fig.add_trace(
        go.Candlestick(
            x=df_1sec.index,
            open=df_1sec['open'],
            high=df_1sec['high'],
            low=df_1sec['low'],
            close=df_1sec['close'],
            name='1-Second',
            increasing_line_color='lightgreen',
            decreasing_line_color='lightcoral'
        ),
        row=2, col=1
    )

# Update layout
fig.update_layout(
    title=f'{INSTRUMENT} - 1-Minute vs 1-Second Bar Comparison',
    xaxis_rangeslider_visible=False,
    xaxis2_rangeslider_visible=False,
    height=900,
    hovermode='x unified',
    showlegend=True
)

# Update axis labels
fig.update_yaxes(title_text="Price", row=1, col=1)
fig.update_yaxes(title_text="Price", row=2, col=1)
fig.update_xaxes(title_text="Time", row=2, col=1)

print("Chart created successfully")

In [None]:
# Display the chart
fig.show()

## Bar Aggregation Diagnostic

Let's manually aggregate 1-second bars into 60-second bars and compare with the native 1-minute bars from Polygon.
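
Before running this on catalog data, it may help to see how `label='right', closed='right'` buckets close-time-labeled bars. The sketch below uses synthetic 1-second bars (made-up prices, one unit of volume per bar) rather than catalog data. Each 60-second bucket covers the half-open interval `(t - 60s, t]` and is labeled with `t`.

```python
import pandas as pd

# Synthetic 1-second bars labeled by CLOSE time: 09:30:01 through 09:32:00 (120 bars)
idx = pd.date_range("2025-01-06 09:30:01", periods=120, freq="s")
prices = pd.Series(range(120), index=idx, dtype="float64")
df = pd.DataFrame({
    "open": prices,
    "high": prices + 0.5,
    "low": prices - 0.5,
    "close": prices,
    "volume": 1.0,
})

# Right-closed, right-labeled buckets: the bar labeled 09:31:00 contains
# the 1-second bars closing at 09:30:01 ... 09:31:00 inclusive
agg = df.resample("60s", label="right", closed="right").agg({
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
})
print(agg)
```

With pandas' default `label='left', closed='left'`, the same data would instead be labeled 09:30:00 and 09:31:00 plus a third bucket at 09:32:00 holding a single bar, which is the kind of misalignment this diagnostic is meant to avoid.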

In [None]:
# Aggregate 1-second bars to 60-second bars using pandas resample
# Uses label='right' and closed='right' to match close-time labeling

if not df_1sec.empty:
    # Resample to 60-second bars with CLOSE-TIME labeling
    # This matches how the cataloger stores bars (using ts_init = close time)
    df_60sec_agg = df_1sec.resample('60s', label='right', closed='right').agg({
        'open': 'first',   # First price in the period
        'high': 'max',     # Highest price in the period
        'low': 'min',      # Lowest price in the period
        'close': 'last',   # Last price in the period
        'volume': 'sum'    # Total volume in the period
    }).dropna()
    
    print(f"Aggregated {len(df_1sec)} 1-second bars into {len(df_60sec_agg)} 60-second bars")
    print(f"Native 1-minute bars: {len(df_1min)}")
    print(f"\nAggregated 60-second bars sample:")
    print(df_60sec_agg.head())
    print("\nNOTE: Now using close-time labeling (label='right', closed='right')")
else:
    print("No 1-second bars to aggregate")
    df_60sec_agg = pd.DataFrame()

In [None]:
# Compare aggregated 60-second bars with native 1-minute bars
if not df_60sec_agg.empty and not df_1min.empty:
    # Align the two dataframes by timestamp
    df_comparison = pd.DataFrame({
        '1min_open': df_1min['open'],
        '60sec_open': df_60sec_agg['open'],
        '1min_high': df_1min['high'],
        '60sec_high': df_60sec_agg['high'],
        '1min_low': df_1min['low'],
        '60sec_low': df_60sec_agg['low'],
        '1min_close': df_1min['close'],
        '60sec_close': df_60sec_agg['close'],
        '1min_volume': df_1min['volume'],
        '60sec_volume': df_60sec_agg['volume'],
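    })
    # pd.DataFrame aligns on the union of both indexes (outer join), leaving NaN
    # where a timestamp exists in only one source; keep matching timestamps only
    df_comparison = df_comparison.dropna()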
    
    # Calculate differences
    df_comparison['open_diff'] = df_comparison['1min_open'] - df_comparison['60sec_open']
    df_comparison['high_diff'] = df_comparison['1min_high'] - df_comparison['60sec_high']
    df_comparison['low_diff'] = df_comparison['1min_low'] - df_comparison['60sec_low']
    df_comparison['close_diff'] = df_comparison['1min_close'] - df_comparison['60sec_close']
    df_comparison['volume_diff'] = df_comparison['1min_volume'] - df_comparison['60sec_volume']
    
    # Calculate percentage differences for close price
    df_comparison['close_pct_diff'] = (df_comparison['close_diff'] / df_comparison['1min_close']) * 100
    
    print("\n=== COMPARISON OF 1-MINUTE (POLYGON) vs 60-SECOND (AGGREGATED) ===")
    print(f"Number of matching timestamps: {len(df_comparison)}")
    print(f"\nPrice differences (1-min minus 60-sec aggregated):")
    print(df_comparison[['open_diff', 'high_diff', 'low_diff', 'close_diff']].describe())
    print(f"\nClose price percentage differences:")
    print(df_comparison['close_pct_diff'].describe())
    print(f"\nBars with non-zero close difference: {(df_comparison['close_diff'] != 0).sum()}")
    print(f"Max absolute close difference: ${df_comparison['close_diff'].abs().max():.4f}")
    print(f"Max absolute close % difference: {df_comparison['close_pct_diff'].abs().max():.6f}%")
    
    # Show first few rows with differences
    print(f"\nFirst 10 bars with close price differences:")
    diff_rows = df_comparison[df_comparison['close_diff'].abs() > 0].head(10)
    print(diff_rows[['1min_close', '60sec_close', 'close_diff', 'close_pct_diff']])
else:
    print("Cannot compare - missing data")

In [None]:
# Plot comparison of close prices
if not df_60sec_agg.empty and not df_1min.empty:
    fig_comp = go.Figure()
    
    # Add 1-minute bars (Polygon native)
    fig_comp.add_trace(go.Scatter(
        x=df_1min.index,
        y=df_1min['close'],
        mode='lines+markers',
        name='1-Minute (Polygon)',
        line=dict(color='blue', width=2),
        marker=dict(size=4)
    ))
    
    # Add 60-second aggregated bars
    fig_comp.add_trace(go.Scatter(
        x=df_60sec_agg.index,
        y=df_60sec_agg['close'],
        mode='lines+markers',
        name='60-Second (Aggregated from 1-sec)',
        line=dict(color='red', width=2, dash='dash'),
        marker=dict(size=4)
    ))
    
    # Add difference trace (scaled up for visibility)
    if len(df_comparison) > 0:
        fig_comp.add_trace(go.Scatter(
            x=df_comparison.index,
            y=df_comparison['close_diff'] * 1000,  # Scale by 1000 for visibility
            mode='lines',
            name='Difference x1000 (1min - 60sec)',
            line=dict(color='green', width=1),
            yaxis='y2'
        ))
    
    fig_comp.update_layout(
        title='Close Price Comparison: 1-Minute vs 60-Second Aggregated',
        xaxis_title='Time',
        yaxis_title='Close Price ($)',
        yaxis2=dict(
            title='Difference x1000',
            overlaying='y',
            side='right'
        ),
        height=600,
        hovermode='x unified',
        legend=dict(x=0.01, y=0.99)
    )
    
    fig_comp.show()
else:
    print("Cannot plot comparison - missing data")

## Analysis Results

**Root Cause Identified:**
The divergence was caused by using the **wrong timestamp field** when converting Nautilus bars to DataFrames.

**The Bug:**
- Originally used `bar.ts_event` (bar OPEN time) instead of `bar.ts_init` (bar CLOSE time)
- This created a 60-second offset on 1-minute bars because our cataloger stores bars with close-time labels in `ts_init`

**The Fix:**
1. Changed `bars_to_df()` to use `bar.ts_init` (close time) instead of `bar.ts_event` (open time)
2. Updated aggregation to use `label='right', closed='right'` for close-time labeling

**Results After Fix:**
- ✅ **Timestamps align perfectly** - 1-minute native bars and 60-second aggregated bars now use matching timestamps
- ✅ **OHLCV values match** - Check `df_comparison` above to verify zero/minimal differences
- ✅ **Manual aggregation correct** - Pandas resample with close-time labeling replicates expected behavior
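
One way to turn the "values match" check into a number is to count mismatches per column on the timestamps that both frames share. The following is a sketch, not the notebook's own method: `count_mismatches` and the tolerance `atol` are hypothetical names, and the two small frames are synthetic stand-ins for `df_1min` and `df_60sec_agg`.

```python
import numpy as np
import pandas as pd

COLS = ["open", "high", "low", "close", "volume"]

def count_mismatches(native: pd.DataFrame, aggregated: pd.DataFrame, atol: float = 1e-9) -> pd.Series:
    """Count per-column mismatches on timestamps present in BOTH frames."""
    # Inner join so timestamps missing from either side are not counted as differences
    joined = native[COLS].join(aggregated[COLS], how="inner", lsuffix="_native", rsuffix="_agg")
    return pd.Series({
        c: int((~np.isclose(joined[f"{c}_native"], joined[f"{c}_agg"], rtol=0.0, atol=atol)).sum())
        for c in COLS
    })

# Synthetic stand-ins for the native and aggregated frames
idx = pd.date_range("2025-01-06 09:31:00", periods=3, freq="60s")
native = pd.DataFrame({
    "open": [1.0, 2.0, 3.0],
    "high": [1.5, 2.5, 3.5],
    "low": [0.5, 1.5, 2.5],
    "close": [1.2, 2.2, 3.2],
    "volume": [100.0, 200.0, 300.0],
}, index=idx)
aggregated = native.iloc[:2].copy()        # third timestamp deliberately missing
aggregated.loc[idx[1], "close"] = 2.3      # one deliberate close mismatch

mismatches = count_mismatches(native, aggregated)
print(mismatches)
```

Using an absolute tolerance with `rtol=0.0` keeps the check strict for prices; the default relative tolerance of `np.isclose` would let differences of a fraction of a cent pass on a stock priced in the hundreds.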

**If P&L Divergence Persists in Backtests:**
Potential causes to investigate:
- **Nautilus TimeBarAggregator**: Verify internal aggregation uses close-time labeling consistently
- **Bar availability timing**: `-INTERNAL` aggregated bars may not be ready at the same moment as `-EXTERNAL` native bars
- **Data coverage gaps**: Some 1-second bars might be missing from the catalog
- **First/last bar handling**: Edge cases at session boundaries
- **Volume/price precision**: Rounding differences in OHLCV calculations
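
For the data-coverage item above, a rough first check is to count how many 1-second bars fall into each close-time-labeled minute. The sketch below uses synthetic data with five seconds removed on purpose. `bars_per_minute` is a hypothetical helper, and it assumes a DataFrame indexed by close time like `df_1sec`.

```python
import pandas as pd

def bars_per_minute(df_1sec: pd.DataFrame) -> pd.Series:
    """Count 1-second bars in each right-closed, right-labeled 60-second bucket."""
    return df_1sec["close"].resample("60s", label="right", closed="right").count()

# Synthetic 1-second bars closing at 09:30:01 ... 09:32:00
idx = pd.date_range("2025-01-06 09:30:01", periods=120, freq="s")
df = pd.DataFrame({"close": 1.0}, index=idx)

# Remove five seconds from the first minute to simulate a gap
df = df.drop(idx[10:15])

counts = bars_per_minute(df)
incomplete = counts[counts < 60]
print(incomplete)
```

A count below 60 is a prompt to investigate rather than proof of a catalog gap, since a second with no trades may legitimately have no bar.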

**Next Steps:**
Log the actual bars the strategy receives in `on_bar()`, then compare timestamps, OHLCV values, and bar counts between the `1-MINUTE-EXTERNAL` and `60-SECOND-INTERNAL` runs.
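
A possible shape for that logging is sketched below. `bar_record` is a hypothetical helper that flattens a bar-like object into a plain dict with both timestamps rendered as UTC ISO strings, so records from the two runs can be diffed line by line. It is exercised here with a stand-in object rather than a real Nautilus `Bar`, and the `on_bar()` wiring is shown only as a comment.

```python
from types import SimpleNamespace

import pandas as pd

def bar_record(bar) -> dict:
    """Flatten a bar-like object into a dict suitable for logging or CSV output."""
    return {
        "bar_type": str(bar.bar_type),
        "ts_event": pd.Timestamp(bar.ts_event, unit="ns", tz="UTC").isoformat(),
        "ts_init": pd.Timestamp(bar.ts_init, unit="ns", tz="UTC").isoformat(),
        "open": float(bar.open),
        "high": float(bar.high),
        "low": float(bar.low),
        "close": float(bar.close),
        "volume": float(bar.volume),
    }

# Inside a strategy this might be used as (sketch, not executed here):
#     def on_bar(self, bar):
#         self.log.info(str(bar_record(bar)))

# Stand-in object with made-up values, mimicking the attributes read above
fake_bar = SimpleNamespace(
    bar_type="MSFT.POLYGON-1-MINUTE-LAST-EXTERNAL",
    ts_event=pd.Timestamp("2025-01-06 14:30:00", tz="UTC").value,
    ts_init=pd.Timestamp("2025-01-06 14:31:00", tz="UTC").value,
    open="420.10", high="420.50", low="419.90", close="420.30", volume="1500",
)

record = bar_record(fake_bar)
print(record)
```

Logging both `ts_event` and `ts_init` makes it possible to see which field each run actually aligns on.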