# Day 2: Pandas TimeSeries & Point-in-Time Data

## Week 1 - Python for Quantitative Finance

### 🎯 Learning Objectives
- Master pandas DatetimeIndex for financial time series
- Understand and prevent look-ahead bias
- Learn resampling, alignment, and data cleaning techniques
- Handle missing data appropriately for backtesting

### ⏱️ Time Allocation
- Theory review: 30 min
- Guided exercises: 90 min
- Practice problems: 60 min
- Interview prep: 30 min

---

> ⚠️ **CRITICAL CONCEPT**: Point-in-time (PIT) data management is what separates amateur backtests from professional ones. Getting this wrong will lead to false alpha signals.

**Author**: ML Quant Finance Mastery  
**Difficulty**: Foundation  
**Prerequisites**: Day 1 - NumPy

## 1. Setup and Data Loading

In [1]:
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Download real market data using yfinance
tickers = ['AAPL', 'MSFT', 'GOOGL', 'SPY', 'JPM']
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)

print("📥 Downloading data from Yahoo Finance...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False, auto_adjust=True)
df = data['Close'].dropna()

print(f"✅ Data loaded: {df.shape[0]} days, {len(tickers)} stocks")
print(f"📅 Index type: {type(df.index).__name__}")
print(f"📅 Date range: {df.index[0].strftime('%Y-%m-%d')} to {df.index[-1].strftime('%Y-%m-%d')}")
df.head()

📥 Downloading data from Yahoo Finance...
✅ Data loaded: 1255 days, 5 stocks
📅 Index type: DatetimeIndex
📅 Date range: 2021-01-25 to 2026-01-22


| Date | AAPL | GOOGL | JPM | MSFT | SPY |
|------|------|-------|-----|------|-----|
| 2021-01-25 | 139.125839 | 94.003731 | 116.348389 | 220.243134 | 358.819 |
| 2021-01-26 | 139.359436 | 94.682121 | 115.872849 | 222.929871 | 358.258881 |
| 2021-01-27 | 138.28862 | 90.264984 | 112.596916 | 223.476791 | 349.50296 |
| 2021-01-28 | 133.450562 | 91.965141 | 114.578331 | 229.262817 | 352.508606 |
| 2021-01-29 | 128.456741 | 90.682831 | 113.310211 | 222.57486 | 345.451599 |


## 2. DatetimeIndex Fundamentals

The DatetimeIndex is what makes pandas powerful for financial time series. It enables:
- Automatic date alignment between different series
- Easy slicing by date ranges
- Resampling to different frequencies
- Business day awareness
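
As a minimal sketch of the first and last points (automatic alignment and business-day awareness), the following uses two tiny synthetic series rather than the downloaded data. The names `a`, `b`, `total`, and `expected_days` are illustrative only.

```python
import pandas as pd

# Two synthetic series on overlapping but different business-day ranges
a = pd.Series([1.0, 2.0, 3.0], index=pd.bdate_range("2023-01-02", periods=3))
b = pd.Series([10.0, 20.0, 30.0], index=pd.bdate_range("2023-01-03", periods=3))

# Arithmetic aligns on the index; dates present in only one series become NaN
total = a + b

# bdate_range skips weekends (it knows nothing about exchange holidays)
expected_days = pd.bdate_range("2023-01-06", "2023-01-10")

print(total)
print(expected_days.strftime("%a %Y-%m-%d").tolist())
```

Only the two dates shared by both series produce a number. Comparing an expected business-day calendar with the actual index is one simple way to spot holidays or missing rows.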

In [2]:
# DatetimeIndex properties
print("📅 DATETIMEINDEX PROPERTIES")
print("=" * 50)
print(f"Index dtype:     {df.index.dtype}")
print(f"Frequency:       {df.index.freq}")  # None = irregular (daily trading)
print(f"Timezone:        {df.index.tz}")     # None = naive
print(f"Is monotonic:    {df.index.is_monotonic_increasing}")

# Date slicing - multiple ways
print("\n📊 DATE SLICING EXAMPLES")
print("-" * 50)

# Method 1: String slicing using .loc (required for index-based slicing)
data_2023 = df.loc['2023']
print(f"2023 data:       {len(data_2023)} days")

# Method 2: Range slicing
data_q1_2023 = df.loc['2023-01':'2023-03']
print(f"Q1 2023:         {len(data_q1_2023)} days")

# Method 3: Specific date
specific_day = df.loc['2023-06-15']
print(f"June 15, 2023:   {specific_day['AAPL']:.2f} (AAPL)")

# Method 4: Date range with loc
data_range = df.loc['2023-06-01':'2023-06-30']
print(f"June 2023:       {len(data_range)} days")

📅 DATETIMEINDEX PROPERTIES
Index dtype:     datetime64[ns]
Frequency:       None
Timezone:        None
Is monotonic:    True

📊 DATE SLICING EXAMPLES
--------------------------------------------------
2023 data:       250 days
Q1 2023:         62 days
June 15, 2023:   183.78 (AAPL)
June 2023:       21 days


## 3. Point-in-Time (PIT) Data: The Critical Concept

### What is Look-Ahead Bias?

Look-ahead bias occurs when your backtest uses information that would **not have been available** at the time the trading decision was made.

**Example**: Computing a moving average that includes today's close, then using the resulting signal to capture today's return, which had already been realized before that close was known.

### Why Does It Matter?

> "The most common mistake in backtesting is look-ahead bias. It makes strategies look much better than they actually perform." - Marcos López de Prado

In [3]:
# DEMONSTRATION: Look-ahead bias in moving averages

# Calculate returns
returns = df['AAPL'].pct_change()

# ❌ WRONG: same-bar SMA includes today's close, so it is only known after the close
sma_20_wrong = df['AAPL'].rolling(20).mean()

# ✅ CORRECT: lagged SMA uses closes through yesterday only
sma_20_correct = df['AAPL'].rolling(20).mean().shift(1)

# Generate signals
signal_wrong = (df['AAPL'] > sma_20_wrong).astype(int)
signal_correct = (df['AAPL'] > sma_20_correct).astype(int)

# Compare performance
def calculate_strategy_return(returns, signal):
    """Calculate strategy returns (signal applied next day)"""
    # Signal from day t applied to return from t to t+1
    strategy_returns = signal.shift(1) * returns
    return strategy_returns.dropna()

wrong_returns = calculate_strategy_return(returns, signal_wrong)
correct_returns = calculate_strategy_return(returns, signal_correct)

print("📊 LOOK-AHEAD BIAS DEMONSTRATION")
print("=" * 60)
print(f"\n❌ WRONG (with look-ahead):")
print(f"   Annual Return: {wrong_returns.mean() * 252 * 100:.2f}%")
print(f"   Sharpe Ratio:  {wrong_returns.mean() / wrong_returns.std() * np.sqrt(252):.2f}")

print(f"\n✅ CORRECT (point-in-time):")
print(f"   Annual Return: {correct_returns.mean() * 252 * 100:.2f}%")
print(f"   Sharpe Ratio:  {correct_returns.mean() / correct_returns.std() * np.sqrt(252):.2f}")

# Check how many signals differ
diff_signals = (signal_wrong != signal_correct).sum()
print(f"\n⚠️ Number of different signals: {diff_signals} ({diff_signals/len(signal_wrong)*100:.1f}%)")

📊 LOOK-AHEAD BIAS DEMONSTRATION

❌ WRONG (with look-ahead):
   Annual Return: 16.60%
   Sharpe Ratio:  0.93

✅ CORRECT (point-in-time):
   Annual Return: 14.37%
   Sharpe Ratio:  0.80

⚠️ Number of different signals: 18 (1.4%)
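
Note that `calculate_strategy_return` above lags both signals by one bar before multiplying by returns, so the gap in that cell mostly reflects the extra day of lag in the SMA. The sketch below isolates the bias itself: the same signal is applied once to the same-day return (look-ahead) and once to the next-day return (point-in-time). It uses a synthetic random walk, which is an assumption made so the example runs without a download. On such data a lagged signal should earn roughly nothing, so any sizeable positive return in the unlagged version comes from the bias.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices (illustrative only, not market data)
rng = np.random.default_rng(0)
rets = pd.Series(rng.normal(0.0, 0.01, 2000),
                 index=pd.bdate_range("2015-01-01", periods=2000))
prices = 100 * (1 + rets).cumprod()
daily_ret = prices.pct_change()

# Signal that is only known after the close of day t
signal = (prices > prices.rolling(20).mean()).astype(int)

# Look-ahead: the day-t signal earns the day-t return,
# which was realized before the signal existed
biased = (signal * daily_ret).dropna()

# Point-in-time: the day-t signal earns the return from t to t+1
lagged = (signal.shift(1) * daily_ret).dropna()

print(f"Biased mean daily return: {biased.mean():.5f}")
print(f"Lagged mean daily return: {lagged.mean():.5f}")
```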


## 4. Resampling: Changing Data Frequency

Financial data often needs to be converted between frequencies:
- Daily → Weekly (reduce noise)
- Daily → Monthly (risk reporting)
- Minute → Daily (OHLCV aggregation)

In [4]:
# Resampling examples

# Daily to Weekly (weeks ending on Friday)
weekly_close = df['AAPL'].resample('W-FRI').last()
weekly_return = weekly_close.pct_change()

# Daily to Monthly ('ME' = month end; requires pandas >= 2.2, older versions use 'M')
monthly_close = df['AAPL'].resample('ME').last()
# Monthly open/high/low/close built from the daily closes
monthly_ohlc = df['AAPL'].resample('ME').ohlc()

# Daily to Quarterly ('QE' = quarter end; older versions use 'Q')
quarterly_close = df['AAPL'].resample('QE').last()

print("📊 RESAMPLING EXAMPLES")
print("=" * 50)
print(f"Daily data:     {len(df)} observations")
print(f"Weekly data:    {len(weekly_close)} observations")
print(f"Monthly data:   {len(monthly_close)} observations")
print(f"Quarterly data: {len(quarterly_close)} observations")

print("\n📅 Monthly closing prices (last 6 months):")
print(monthly_close.tail(6).round(2).to_string())

📊 RESAMPLING EXAMPLES
Daily data:     1255 observations
Weekly data:    261 observations
Monthly data:   61 observations
Quarterly data: 21 observations

📅 Monthly closing prices (last 6 months):
Date
2025-08-31    231.92
2025-09-30    254.38
2025-10-31    270.11
2025-11-30    278.85
2025-12-31    271.86
2026-01-31    250.07
Freq: ME
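
The Minute → Daily (OHLCV) conversion listed at the top of this section is not exercised by the cell above, so here is a minimal sketch of it. The one-minute bars are synthetic, and the column names `price` and `volume` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute bars for two sessions (illustrative data only)
rng = np.random.default_rng(1)
idx = pd.date_range("2024-01-02 09:30", periods=390, freq="min").append(
    pd.date_range("2024-01-03 09:30", periods=390, freq="min"))
bars = pd.DataFrame({
    "price": 100 + rng.normal(0, 0.05, len(idx)).cumsum(),
    "volume": rng.integers(100, 1000, len(idx)),
}, index=idx)

# Price collapses to open/high/low/close; volume is summed per day
daily = bars["price"].resample("D").ohlc()
daily["volume"] = bars["volume"].resample("D").sum()

print(daily.round(2))
```

Each column uses the aggregation that matches its meaning: first, max, min, and last for price, and a sum for volume. Applying `.last()` to every column would silently discard most of the day's volume.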


## 5. Handling Missing Data

Missing data handling is critical in finance. Common causes:
- Market holidays
- Trading halts
- Delistings
- Data provider issues

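**Key Principle**: Forward-fill (ffill) is point-in-time safe because it only carries past values forward. Backward-fill (bfill) and interpolation pull in values from the future, which creates look-ahead bias!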
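
Forward-filling without a cap can carry a price forward for a long time. As a sketch of one way to keep that visible, the example below caps the fill and counts how many consecutive bars each value has been missing. The series is synthetic, and `stale_days` is a hypothetical helper column, not a pandas feature.

```python
import numpy as np
import pandas as pd

# Synthetic price series with a one-day gap and a four-day gap
idx = pd.bdate_range("2024-01-01", periods=10)
px = pd.Series([100, np.nan, 102, 103, np.nan, np.nan, np.nan, np.nan, 108, 109],
               index=idx, dtype=float)

# Carry the last known price forward for at most 2 consecutive bars
filled = px.ffill(limit=2)

# Count consecutive missing bars; the counter resets at every valid observation
is_gap = px.isna()
stale_days = is_gap.astype(int).groupby((~is_gap).cumsum()).cumsum()

report = pd.DataFrame({"raw": px, "filled": filled, "stale_days": stale_days})
print(report)
```

The four-day gap is only half filled, so the remaining NaNs can be dropped or handled explicitly instead of being traded on a stale price.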

In [5]:
# Create sample data with missing values
df_with_gaps = df['AAPL'].copy()

# Introduce some artificial gaps
np.random.seed(42)
missing_idx = np.random.choice(len(df_with_gaps), size=20, replace=False)
df_with_gaps.iloc[missing_idx] = np.nan

print("📊 MISSING DATA HANDLING")
print("=" * 50)
print(f"Original: {df_with_gaps.notna().sum()} valid, {df_with_gaps.isna().sum()} missing")

# ✅ Forward fill (point-in-time safe)
df_ffill = df_with_gaps.ffill()

# ❌ Backward fill (creates look-ahead bias!)
df_bfill = df_with_gaps.bfill()

# ⚠️ Linear interpolation (also looks ahead: it uses the next valid value)
df_interpolate = df_with_gaps.interpolate(method='linear')

# Compare at a missing point
missing_example_idx = missing_idx[0]
actual_value = df['AAPL'].iloc[missing_example_idx]

print(f"\n📅 Example at index {missing_example_idx}:")
print(f"   Actual value:     {actual_value:.2f}")
print(f"   ✅ Forward fill:  {df_ffill.iloc[missing_example_idx]:.2f}")
print(f"   ❌ Backward fill: {df_bfill.iloc[missing_example_idx]:.2f}")
print(f"   ⚠️ Interpolate:   {df_interpolate.iloc[missing_example_idx]:.2f}")

# Forward fill limit
df_ffill_limit = df_with_gaps.ffill(limit=3)  # Only fill up to 3 consecutive NaNs
print(f"\n💡 Best practice: Use ffill with limit to avoid stale data")

📊 MISSING DATA HANDLING
Original: 1235 valid, 20 missing

📅 Example at index 1196:
   Actual value:     268.74
   ✅ Forward fill:  268.55
   ❌ Backward fill: 269.44
   ⚠️ Interpolate:   268.99

💡 Best practice: Use ffill with limit to avoid stale data


## 6. Rolling Windows & Expanding Windows

In [6]:
# Rolling window statistics
returns = df['AAPL'].pct_change()

# Rolling mean and std
rolling_mean = returns.rolling(window=20).mean()
rolling_std = returns.rolling(window=20).std()
rolling_sharpe = rolling_mean / rolling_std * np.sqrt(252)

# Expanding window (cumulative from start)
expanding_mean = returns.expanding().mean()
expanding_std = returns.expanding().std()

# Exponentially weighted (recent data weighted more)
ewm_mean = returns.ewm(span=20).mean()
ewm_std = returns.ewm(span=20).std()

print("📊 WINDOW STATISTICS")
print("=" * 50)
print(f"\n{'Statistic':<25} {'Rolling 20d':<15} {'Expanding':<15} {'EWM 20':<15}")
print("-" * 70)
print(f"{'Mean (last value)':<25} {rolling_mean.iloc[-1]*100:.4f}%    {expanding_mean.iloc[-1]*100:.4f}%    {ewm_mean.iloc[-1]*100:.4f}%")
print(f"{'Std (last value)':<25} {rolling_std.iloc[-1]*100:.4f}%    {expanding_std.iloc[-1]*100:.4f}%    {ewm_std.iloc[-1]*100:.4f}%")

print(f"\n💡 Use case guidance:")
print(f"   Rolling:   Fixed lookback (technical indicators)")
print(f"   Expanding: Growing history (all-time metrics)")
print(f"   EWM:       Recent-weighted (adaptive to regime changes)")

📊 WINDOW STATISTICS

Statistic                 Rolling 20d     Expanding       EWM 20         
----------------------------------------------------------------------
Mean (last value)         -0.3957%    0.0619%    -0.4172%
Std (last value)          0.9976%    1.7441%    1.1668%

💡 Use case guidance:
   Rolling:   Fixed lookback (technical indicators)
   Expanding: Growing history (all-time metrics)
   EWM:       Recent-weighted (adaptive to regime changes)
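
To make the "recent-weighted" idea concrete, the sketch below reproduces an exponentially weighted mean by hand. It assumes the recursive form that pandas uses when `adjust=False`. The cell above uses the default `adjust=True`, which differs in the early observations, so this illustrates the weighting idea rather than replicating that cell. The input is synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
x = pd.Series(rng.normal(0, 0.01, 250))   # synthetic daily returns

span = 20
alpha = 2 / (span + 1)                     # pandas' span-to-alpha mapping

# Recursive form used when adjust=False:
#   y_0 = x_0,  y_t = (1 - alpha) * y_{t-1} + alpha * x_t
manual = np.empty(len(x))
manual[0] = x.iloc[0]
for t in range(1, len(x)):
    manual[t] = (1 - alpha) * manual[t - 1] + alpha * x.iloc[t]

ewm_recursive = x.ewm(span=span, adjust=False).mean()

# A rolling mean, by contrast, weights the last `span` points equally
rolling_last = x.rolling(span).mean().iloc[-1]

print(np.allclose(manual, ewm_recursive.to_numpy()))
print(np.isclose(rolling_last, x.iloc[-span:].mean()))
```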


## 7. Multi-Asset Alignment

When working with multiple assets, ensure proper alignment:
- Different trading calendars (US vs UK)
- Different data start dates
- Missing data on different days

In [7]:
# Demonstrate alignment
aapl = df['AAPL']
spy = df['SPY']

# Calculate correlation under correct and incorrect alignment
# Note: these are correlations of price LEVELS, which trending series inflate;
# returns are the usual input for correlation estimates
# Method 1: Series.corr aligns on the index and uses only dates present in both series
correlation_inner = aapl.corr(spy)

# Method 2: What happens with misaligned data?
# Shift one series to simulate an off-by-one-day join
aapl_shifted = aapl.shift(1)  # Yesterday's AAPL paired with today's SPY

# This correlation is WRONG (it compares different days)
correlation_wrong = aapl_shifted.corr(spy)

print("📊 ALIGNMENT DEMONSTRATION")
print("=" * 50)
print(f"Correct correlation (same-day): {correlation_inner:.4f}")
print(f"Wrong correlation (misaligned):  {correlation_wrong:.4f}")

print(f"\n💡 Key point: pandas auto-aligns by index")
print(f"   Always verify your data is properly aligned!")

# Useful alignment functions
print(f"\n📋 Common alignment operations:")
print(f"   df.align()    - Align two DataFrames")
print(f"   df.reindex()  - Align to a specific index")
print(f"   df.dropna()   - Remove rows with any NaN")
print(f"   df.dropna(how='all') - Remove only all-NaN rows")

📊 ALIGNMENT DEMONSTRATION
Correct correlation (same-day): 0.9310
Wrong correlation (misaligned):  0.9287

💡 Key point: pandas auto-aligns by index
   Always verify your data is properly aligned!

📋 Common alignment operations:
   df.align()    - Align two DataFrames
   df.reindex()  - Align to a specific index
   df.dropna()   - Remove rows with any NaN
   df.dropna(how='all') - Remove only all-NaN rows
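
The calendar mismatch mentioned at the start of this section (for example US vs UK) can be sketched with two synthetic series that each skip a different day. The "holidays" here are made up for illustration and do not correspond to real exchange calendars.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
days = pd.bdate_range("2024-05-20", periods=10)

# Hypothetical calendars: each market is closed on a different day
us = pd.Series(100 + rng.normal(0, 1, 10).cumsum(), index=days, name="US")
uk = pd.Series(50 + rng.normal(0, 1, 10).cumsum(), index=days, name="UK")
us = us.drop(days[5])   # pretend US holiday
uk = uk.drop(days[2])   # pretend UK holiday

outer = pd.concat([us, uk], axis=1)                 # union of dates, NaN where closed
inner = pd.concat([us, uk], axis=1, join="inner")   # only dates both markets traded

# Returns on the common calendar, so each row covers the same interval for both
common_returns = inner.pct_change().dropna()

print(outer.isna().sum().to_dict(), len(inner), len(common_returns))
```

Computing returns after restricting to common dates keeps each pair of returns on the same interval. Computing them per market first and joining afterwards would pair a two-day return in one market with a one-day return in the other.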


## 8. Practice: Build a Point-in-Time Signal Generator

Create a function that generates trading signals without any look-ahead bias.

In [8]:
def generate_pit_signals(prices: pd.Series, 
                          short_window: int = 10,
                          long_window: int = 50) -> pd.DataFrame:
    """
    Generate point-in-time safe trading signals using moving average crossover.
    
    All indicators are shifted by 1 day to ensure no look-ahead bias.
    
    Parameters:
    -----------
    prices : pd.Series
        Price series with DatetimeIndex
    short_window : int
        Short moving average period
    long_window : int  
        Long moving average period
        
    Returns:
    --------
    pd.DataFrame with columns: price, sma_short, sma_long, signal, position,
    returns, strategy_returns
    """
    result = pd.DataFrame(index=prices.index)
    result['price'] = prices
    
    # Calculate indicators and SHIFT to make point-in-time
    result['sma_short'] = prices.rolling(short_window).mean().shift(1)
    result['sma_long'] = prices.rolling(long_window).mean().shift(1)
    
    # Generate signal based on yesterday's indicators
    result['signal'] = 0
    result.loc[result['sma_short'] > result['sma_long'], 'signal'] = 1
    result.loc[result['sma_short'] < result['sma_long'], 'signal'] = -1
    
    # Position equals the signal; the one-day trading lag is applied below via
    # .shift(1). Combined with the lagged indicators this is a conservative two-day lag.
    result['position'] = result['signal']
    
    # Calculate returns
    result['returns'] = result['price'].pct_change()
    result['strategy_returns'] = result['position'].shift(1) * result['returns']
    
    return result

# Apply to AAPL
signals = generate_pit_signals(df['AAPL'], short_window=10, long_window=50)

# Evaluate
total_return = (1 + signals['strategy_returns'].dropna()).prod() - 1
buy_hold_return = (1 + signals['returns'].dropna()).prod() - 1
sharpe = signals['strategy_returns'].mean() / signals['strategy_returns'].std() * np.sqrt(252)

print("📊 POINT-IN-TIME SIGNAL PERFORMANCE")
print("=" * 50)
print(f"Strategy Return: {total_return*100:.2f}%")
print(f"Buy & Hold:      {buy_hold_return*100:.2f}%")
print(f"Strategy Sharpe: {sharpe:.2f}")
print(f"\n✅ All signals are point-in-time safe!")

📊 POINT-IN-TIME SIGNAL PERFORMANCE
Strategy Return: 13.89%
Buy & Hold:      79.74%
Strategy Sharpe: 0.23

✅ All signals are point-in-time safe!
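
A claim like "point-in-time safe" can be checked mechanically. One common approach, sketched here under simplifying assumptions, is a truncation test: recompute the signal on a shortened history and confirm that the values already produced do not change. The helper names (`lagged_signal`, `leaky_signal`, `is_point_in_time`) and the deterministic price path are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

def lagged_signal(prices: pd.Series) -> pd.Series:
    """Crossover signal built from indicators lagged by one bar."""
    fast = prices.rolling(5).mean().shift(1)
    slow = prices.rolling(20).mean().shift(1)
    return (fast > slow).astype(int)

def leaky_signal(prices: pd.Series) -> pd.Series:
    """Deliberately broken: a centered window reads future prices."""
    smooth = prices.rolling(5, center=True).mean()
    return (prices > smooth).astype(int)

def is_point_in_time(signal_fn, prices: pd.Series, cut: int) -> bool:
    """True if the first `cut` signal values are unchanged when later data is removed."""
    full = signal_fn(prices)
    truncated = signal_fn(prices.iloc[:cut])
    return bool(full.iloc[:cut].equals(truncated))

# Deterministic, smoothly rising synthetic prices (illustrative only)
prices = pd.Series(100 + np.sqrt(np.arange(60.0)),
                   index=pd.bdate_range("2024-01-01", periods=60))

print("lagged:", is_point_in_time(lagged_signal, prices, cut=40))
print("leaky: ", is_point_in_time(leaky_signal, prices, cut=40))
```

A single cut point is enough here because the price path is deterministic. On real data it is safer to repeat the check at several cut points.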


## 9. Summary & Key Takeaways

### ✅ What You Learned Today

1. **DatetimeIndex** enables powerful date slicing and alignment
2. **Point-in-time data** is critical - lag indicators with `.shift(1)` so a signal never uses data from the bar it trades on
3. **Forward-fill** is safe; **backward-fill** and interpolation create look-ahead bias
4. **Resampling** converts between frequencies (daily → weekly → monthly)
5. **Rolling vs Expanding vs EWM** - each has specific use cases
6. **Alignment** is automatic but must be verified

### 🎯 Interview Tips

- Know the difference between `.shift()` and `.diff()`
- Explain look-ahead bias and how to prevent it
- Understand when to use ffill vs dropna
- Be able to resample and aggregate data correctly
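
For the first tip, a small sketch on made-up prices shows how `.shift()`, `.diff()`, and `.pct_change()` relate to each other:

```python
import pandas as pd

# Made-up prices for illustration
prices = pd.Series([100.0, 102.0, 101.0, 105.0],
                   index=pd.bdate_range("2024-01-01", periods=4))

lagged = prices.shift(1)          # yesterday's price, aligned to today's date
change = prices.diff()            # today's price minus yesterday's price
simple_ret = prices.pct_change()  # change divided by yesterday's price

print(pd.DataFrame({"price": prices, "shift(1)": lagged,
                    "diff()": change, "pct_change()": simple_ret}))
```

In other words, `diff()` is `prices - prices.shift(1)`: `.shift()` moves values along the index without changing them, while `.diff()` subtracts the shifted values.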

### 📚 Tomorrow's Preview

**Day 3: Returns, Volatility & Risk Metrics**
- Deep dive into return calculations
- Volatility modeling (GARCH preview)
- VaR and Expected Shortfall
- Drawdown analysis

## 🔴 PROS & CONS: Pandas for Financial Time Series

### ✅ PROS

| Advantage | Details | Real-World Use |
|-----------|---------|----------------|
| **DatetimeIndex** | Native date handling, auto-alignment | Essential for multi-asset analysis |
| **Point-in-Time Safe** | `.shift()` makes lagging an indicator a one-liner, which helps prevent look-ahead bias | Critical for backtesting integrity |
| **Resampling** | Easy frequency conversion | Risk reports, signal generation |
| **Missing Data** | Built-in `ffill`, `dropna` | Handle market holidays cleanly |
| **Rolling Windows** | `.rolling()`, `.expanding()`, `.ewm()` | Technical indicators, vol estimation |
| **SQL-like Operations** | Merge, join, groupby | Factor analysis, portfolio grouping |

### ❌ CONS

| Limitation | Details | Workaround |
|------------|---------|------------|
| **Memory Intensive** | Many operations return copies (`inplace=True` usually still copies internally) | Downcast dtypes, process in chunks, or enable Copy-on-Write |
| **Slow for Large Data** | Single-threaded | Use Polars, Dask, or Vaex |
| **Timezone Confusion** | Naive vs aware datetimes | Always use `tz_localize` and `tz_convert` |
| **Chained Indexing** | `df[x][y]` warnings | Use `.loc[]` or `.iloc[]` |
| **Learning Curve** | Many ways to do same thing | Follow consistent patterns |
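
For the timezone row, a minimal sketch of the `tz_localize` / `tz_convert` workaround. It assumes the naive timestamps were recorded in New York local time, and the values are placeholders.

```python
import pandas as pd

# Naive timestamps, assumed here to be New York wall-clock times (placeholder values)
closes = pd.Series([1.0, 2.0],
                   index=pd.to_datetime(["2024-01-02 16:00", "2024-01-03 16:00"]))

ny = closes.tz_localize("America/New_York")   # attach a zone; wall-clock time unchanged
utc = ny.tz_convert("UTC")                    # same instants, expressed in UTC

print(ny.index)
print(utc.index)
```

`tz_localize` declares which zone naive timestamps belong to, while `tz_convert` re-expresses already-aware timestamps in another zone. Mixing the two up is a common source of off-by-hours joins.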

### 🎯 Real-World Usage

**WHERE PANDAS IS USED:**
- ✅ Quant research teams at hedge funds and banks
- ✅ Risk management systems
- ✅ Alpha research & backtesting
- ✅ Data cleaning pipelines
- ✅ Regulatory reporting

**THIS IS NOT JUST THEORY:**
Pandas is one of the most widely used tools for quant data manipulation, and lagging features with `.shift(1)` for point-in-time safety is standard practice in professional backtests.

## 🚀 TODAY'S TRADING SIGNAL (Point-in-Time Analysis)

In [9]:
# =============================================================================
# TODAY'S TRADING SIGNAL - Moving Average Crossover (Point-in-Time Safe)
# =============================================================================

print("=" * 70)
print("📊 TODAY'S MA CROSSOVER SIGNALS - Point-in-Time Safe")
print("=" * 70)
print(f"Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
print()

def generate_today_signal(prices, ticker, short=10, long=50):
    """Generate today's trading signal using PIT-safe moving averages"""
    sma_short = prices.rolling(short).mean()
    sma_long = prices.rolling(long).mean()
    
    # Yesterday's values (what we'd know at market open today)
    short_yesterday = sma_short.iloc[-2]
    long_yesterday = sma_long.iloc[-2]
    
    # Latest close in the data (displayed for reference; not used in the signal)
    current_price = prices.iloc[-1]
    
    # Trend direction
    trend = "BULLISH" if short_yesterday > long_yesterday else "BEARISH"
    
    # Signal strength (how far above/below)
    spread = (short_yesterday - long_yesterday) / long_yesterday * 100
    
    return {
        'ticker': ticker,
        'price': current_price,
        'sma_short': short_yesterday,
        'sma_long': long_yesterday,
        'trend': trend,
        'spread': spread
    }

print("📈 CURRENT PRICES & MA STATUS:")
print("-" * 70)
print(f"{'Ticker':<8} {'Price':>10} {'SMA(10)':>12} {'SMA(50)':>12} {'Trend':>10} {'Spread':>10}")
print("-" * 70)

signals_today = []
for ticker in tickers:
    sig = generate_today_signal(df[ticker], ticker)
    signals_today.append(sig)
    trend_emoji = "🟢" if sig['trend'] == "BULLISH" else "🔴"
    print(f"{sig['ticker']:<8} ${sig['price']:>9.2f} ${sig['sma_short']:>11.2f} ${sig['sma_long']:>11.2f} {trend_emoji} {sig['trend']:<8} {sig['spread']:>+9.2f}%")

print("\n" + "=" * 70)
print("🎯 TRADING RECOMMENDATIONS FOR TODAY")
print("=" * 70)

for sig in signals_today:
    print(f"\n{'='*25} {sig['ticker']} {'='*25}")
    
    if sig['trend'] == "BULLISH" and sig['spread'] > 2:
        print(f"   Signal: 🟢 STRONG BUY")
        print(f"   Action: Consider CALL options or long shares")
        print(f"   Confidence: HIGH (SMA10 well above SMA50)")
    elif sig['trend'] == "BULLISH" and sig['spread'] > 0:
        print(f"   Signal: 🟡 WEAK BUY")
        print(f"   Action: Consider small position or wait for pullback")
        print(f"   Confidence: MODERATE (Recently crossed)")
    elif sig['trend'] == "BEARISH" and sig['spread'] < -2:
        print(f"   Signal: 🔴 STRONG SELL")
        print(f"   Action: Consider PUT options or exit longs")
        print(f"   Confidence: HIGH (SMA10 well below SMA50)")
    else:
        print(f"   Signal: 🟠 WEAK SELL / CAUTIOUS")
        print(f"   Action: Reduce position or stay on sidelines")
        print(f"   Confidence: MODERATE")
    
    print(f"   Reasoning: {sig['spread']:+.2f}% spread between SMA10/SMA50")

print("\n" + "=" * 70)
print("⚠️ DISCLAIMER: This is educational analysis using moving average crossovers.")
print("   MA crossover is a trend-following strategy - it lags and can whipsaw.")
print("   Always combine with other indicators and proper risk management.")
print("=" * 70)

📊 TODAY'S MA CROSSOVER SIGNALS - Point-in-Time Safe
Analysis Date: 2026-01-22 23:48

📈 CURRENT PRICES & MA STATUS:
----------------------------------------------------------------------
Ticker        Price      SMA(10)      SMA(50)      Trend     Spread
----------------------------------------------------------------------
AAPL     $   250.07 $     256.81 $     270.59 🔴 BEARISH      -5.09%
MSFT     $   451.64 $     466.32 $     482.55 🔴 BEARISH      -3.36%
GOOGL    $   331.07 $     329.28 $     310.91 🟢 BULLISH      +5.91%
SPY      $   690.40 $     689.93 $     680.26 🟢 BULLISH      +1.42%
JPM      $   305.55 $     315.57 $     313.49 🟢 BULLISH      +0.67%

🎯 TRADING RECOMMENDATIONS FOR TODAY

   Signal: 🔴 STRONG SELL
   Action: Consider PUT options or exit longs
   Confidence: HIGH (SMA10 well below SMA50)
   Reasoning: -5.09% spread between SMA10/SMA50

   Signal: 🔴 STRONG SELL
   Action: Consider PUT options or exit longs
   Confidence: HIGH (SMA10 well