# Day 8: Moving Average Smoothing

## SMA, WMA, and EMA - Understanding Lag and Smoothing Trade-offs

Moving averages are fundamental tools in time series analysis for smoothing noise and identifying trends. Today we'll explore three popular techniques and understand the crucial trade-off between noise reduction and responsiveness to recent changes.

### Import Required Libraries

Import NumPy, Pandas, Matplotlib, and Plotly for data manipulation and visualization.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

### Load Gold Price Data

In [2]:
# Load gold price data
gold = pd.read_csv("../data/gold_prices.csv", parse_dates=["Date"], index_col="Date")
gold['Price'] = (gold['Price'].astype(float) * 10.8).round(0)

# Use last 500 days for focused analysis
gold_recent = gold.iloc[-500:].copy()
print(f"Data Shape: {gold_recent.shape}")
print(f"Date Range: {gold_recent.index.min()} to {gold_recent.index.max()}")

Data Shape: (500, 1)
Date Range: 2024-01-12 00:00:00 to 2026-01-09 00:00:00


## Moving Average Fundamentals

### What are Moving Averages?

Moving averages are smoothing techniques that reduce noise in time series by calculating rolling averages of data points. The key trade-off is:
- **More smoothing (larger window)**: Removes more noise but lags behind price changes
- **Less smoothing (smaller window)**: Responds quickly to changes but retains more noise
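The trade-off can be seen on idealized synthetic inputs. This is a minimal sketch that assumes two inputs which are not part of the gold dataset. A straight-line trend with slope 1 makes lag easy to read off, and pure white noise makes smoothing easy to measure. For an SMA of window $n$, the gap on the ramp should be $(n-1)/2$ periods, and the noise standard deviation should shrink by roughly $1/\sqrt{n}$.

```python
import numpy as np
import pandas as pd

# Synthetic inputs (assumption, not the gold series):
# a slope-1 ramp to read off lag, and unit-variance white noise to read off smoothing.
n_obs = 20_000
ramp = pd.Series(np.arange(n_obs, dtype=float))
noise = pd.Series(np.random.default_rng(0).standard_normal(n_obs))

results = {}
for window in [5, 20, 50]:
    # On a slope-1 ramp the gap (price - SMA) equals the lag in periods
    gap = (ramp - ramp.rolling(window).mean()).iloc[-1]
    # Std of the smoothed noise relative to the raw noise (NaN warm-up is skipped)
    ratio = noise.rolling(window).mean().std() / noise.std()
    results[window] = (gap, ratio)
    print(f"window={window:>3}  lag={gap:5.1f} periods  "
          f"noise std ratio={ratio:.3f} (1/sqrt(n) = {1 / np.sqrt(window):.3f})")
```

Going from a window of 5 to 50 cuts the leftover noise by about $\sqrt{10} \approx 3.2$. Over the same change, the lag grows roughly twelvefold, from 2 to 24.5 periods.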

## 1. Simple Moving Average (SMA)

**Formula**: $$SMA_t = \frac{1}{n}\sum_{i=0}^{n-1} p_{t-i}$$

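SMA gives equal weight ($1/n$) to every value in the window. It is simple to compute. However, because all $n$ points count equally, its output reflects prices that are on average $(n-1)/2$ periods old, so it trails price moves noticeably.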
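Because every weight is equal, an SMA can be computed in $O(N)$ time for any window size from a running (cumulative) sum. The sketch below uses a helper name, `sma_cumsum`, that is ours and not part of pandas. It checks the helper against `Series.rolling(...).mean()`. On very long series, a plain cumulative sum can accumulate floating-point error, and pandas' rolling implementation guards against this more carefully.

```python
import numpy as np
import pandas as pd

def sma_cumsum(values, window):
    """SMA via a cumulative sum. NaN for the first window - 1 positions,
    matching pandas' default min_periods=window."""
    x = np.asarray(values, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))      # csum[k] = sum of x[:k]
    out = np.full(x.shape, np.nan)
    # sum of x[t-window+1 .. t] = csum[t+1] - csum[t+1-window]
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
print(sma_cumsum(prices, 3))                 # [nan nan 11. 12. 13. 14.]
print(prices.rolling(3).mean().to_numpy())   # same values from pandas
```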


In [None]:
# Calculate Simple Moving Averages with different window sizes
sma_10 = gold_recent['Price'].rolling(window=10).mean()
sma_20 = gold_recent['Price'].rolling(window=20).mean()
sma_50 = gold_recent['Price'].rolling(window=50).mean()

print("Simple Moving Average (SMA) Calculated")
print(f"SMA-10 non-null values: {sma_10.notna().sum()}")
print(f"SMA-20 non-null values: {sma_20.notna().sum()}")
print(f"SMA-50 non-null values: {sma_50.notna().sum()}")


Simple Moving Average (SMA) Calculated
SMA-10 non-null values: 491
SMA-20 non-null values: 481
SMA-50 non-null values: 451


## 2. Weighted Moving Average (WMA)

**Formula**: $$WMA_t = \frac{\sum_{i=0}^{n-1} w_i \cdot p_{t-i}}{\sum_{i=0}^{n-1} w_i}$$

WMA gives more weight to recent observations. The common linear scheme sets $w_i = n - i$, so the most recent price ($i = 0$) gets weight $n$ and the oldest ($i = n-1$) gets weight 1.
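Because the weights are fixed, the linear WMA is a convolution with a constant kernel, and it can be vectorized instead of calling a Python function per window. This is a sketch with a helper name of our own, `wma_convolve`. `np.convolve` flips its second argument, so the kernel is reversed to put weight $n$ on the newest price.

```python
import numpy as np
import pandas as pd

def wma_convolve(values, window):
    """Linear WMA via convolution: newest point gets weight `window`, oldest gets 1."""
    x = np.asarray(values, dtype=float)
    weights = np.arange(1, window + 1, dtype=float)   # 1 (oldest) ... window (newest)
    kernel = weights[::-1] / weights.sum()            # reversed because convolution flips it
    out = np.full(x.shape, np.nan)
    out[window - 1:] = np.convolve(x, kernel, mode="valid")
    return out

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
ref = prices.rolling(3).apply(lambda x: np.dot(x, np.arange(1, 4)) / 6, raw=True)
print(wma_convolve(prices, 3))
print(np.allclose(wma_convolve(prices, 3), ref, equal_nan=True))   # True
```

On this slope-1 ramp, each WMA-3 value sits $2/3$ of a period behind the latest price (11.33 when the price is 12). This matches the linear WMA's average lag of $(n-1)/3$.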


In [None]:
# Calculate Weighted Moving Average
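def weighted_moving_average(series, window):
    """Calculate WMA with linearly increasing weights (oldest = 1, newest = window)"""
    # Rolling windows are ordered oldest -> newest, so weights 1..window line up correctly
    weights = np.arange(1, window + 1)
    # raw=True passes a NumPy array instead of a Series, which is much faster
    wma = series.rolling(window).apply(lambda x: np.dot(x, weights) / weights.sum(), raw=True)
    return wma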

wma_10 = weighted_moving_average(gold_recent['Price'], 10)
wma_20 = weighted_moving_average(gold_recent['Price'], 20)
wma_50 = weighted_moving_average(gold_recent['Price'], 50)

print("Weighted Moving Average (WMA) Calculated")
print(f"WMA-10 non-null values: {wma_10.notna().sum()}")
print(f"WMA-20 non-null values: {wma_20.notna().sum()}")
print(f"WMA-50 non-null values: {wma_50.notna().sum()}")


Weighted Moving Average (WMA) Calculated
WMA-10 non-null values: 491
WMA-20 non-null values: 481
WMA-50 non-null values: 451


## 3. Exponential Moving Average (EMA)

**Formula**: $$EMA_t = \alpha \cdot p_t + (1-\alpha) \cdot EMA_{t-1}$$

where $\alpha = \frac{2}{n+1}$

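EMA weights decay geometrically with age: the observation $i$ periods old gets weight $\alpha(1-\alpha)^i$. The newest point carries weight $\alpha = 2/(n+1)$, almost twice the $1/n$ an SMA gives it. An EMA therefore starts reacting to a fresh move sooner while still smoothing noise. The recursion needs only the previous EMA value, so it is also cheap to update. With `adjust=False`, pandas seeds the recursion with $EMA_0 = p_0$. This is why all 500 values below are non-null. Even so, the earliest values (roughly the first few spans) are still pulled toward that starting price.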
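The recursion can be written out directly. The helper `ema_recursive` in this minimal sketch is ours, and it assumes the same seeding that pandas uses for `ewm(span=..., adjust=False)`: the first EMA value equals the first price.

```python
import numpy as np
import pandas as pd

def ema_recursive(values, span):
    """EMA_t = alpha * p_t + (1 - alpha) * EMA_{t-1}, seeded with EMA_0 = p_0."""
    alpha = 2.0 / (span + 1)
    x = np.asarray(values, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]                        # seed with the first observation
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

prices = pd.Series([10.0, 12.0, 11.0, 13.0])
print(ema_recursive(prices, span=3))     # alpha = 0.5 -> [10. 11. 11. 12.]
print(prices.ewm(span=3, adjust=False).mean().to_numpy())   # same values from pandas
```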


In [None]:
# Calculate Exponential Moving Average
# Series.ewm(...).mean() is pandas' built-in EMA; adjust=False uses the recursive formula above
ema_10 = gold_recent['Price'].ewm(span=10, adjust=False).mean()
ema_20 = gold_recent['Price'].ewm(span=20, adjust=False).mean()
ema_50 = gold_recent['Price'].ewm(span=50, adjust=False).mean()

print("Exponential Moving Average (EMA) Calculated")
print(f"EMA-10 non-null values: {ema_10.notna().sum()}")
print(f"EMA-20 non-null values: {ema_20.notna().sum()}")
print(f"EMA-50 non-null values: {ema_50.notna().sum()}")

# Show smoothing factor
for span in [10, 20, 50]:
    alpha = 2 / (span + 1)
    print(f"EMA-{span}: α = {alpha:.4f}")


Exponential Moving Average (EMA) Calculated
EMA-10 non-null values: 500
EMA-20 non-null values: 500
EMA-50 non-null values: 500
EMA-10: α = 0.1818
EMA-20: α = 0.0952
EMA-50: α = 0.0392
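`Series.ewm` accepts the same smoothing factor through several equivalent parameters: `span`, `alpha`, and `com` ("center of mass"). In pandas, $\alpha = 2/(\text{span}+1)$ and $\alpha = 1/(1+\text{com})$, so span 20 corresponds to $\text{com} = (20-1)/2 = 9.5$. That center of mass is the average age of the EMA weights. The sketch below checks the equivalence on synthetic random prices, which stand in for the gold series as an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic prices (assumption): any series works for this equivalence check
prices = pd.Series(np.random.default_rng(3).normal(2000, 20, 250))

span = 20
alpha = 2 / (span + 1)     # smoothing factor implied by span
com = (span - 1) / 2       # center of mass implied by span

by_span = prices.ewm(span=span, adjust=False).mean()
by_alpha = prices.ewm(alpha=alpha, adjust=False).mean()
by_com = prices.ewm(com=com, adjust=False).mean()

print(np.allclose(by_span, by_alpha), np.allclose(by_span, by_com))   # True True
```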


## Comparison: All Three Methods Side-by-Side

Let's visualize how SMA, WMA, and EMA compare for the same window size.


In [11]:
# Create comparison visualization
fig = make_subplots(
    rows=1, cols=1,
    specs=[[{"secondary_y": False}]]
)

# Add price line
fig.add_trace(
    go.Scatter(
        x=gold_recent.index,
        y=gold_recent['Price'],
        mode='lines',
        name='Actual Price',
        line=dict(color='rgba(100, 100, 100, 0.4)', width=1),
        hovertemplate='<b>Date:</b> %{x}<br><b>Price:</b> $%{y:.2f}<extra></extra>'
    )
)

# Add SMA-20
fig.add_trace(
    go.Scatter(
        x=gold_recent.index,
        y=sma_20,
        mode='lines',
        name='SMA-20',
        line=dict(color='#FF6B6B', width=2),
        hovertemplate='<b>Date:</b> %{x}<br><b>SMA:</b> $%{y:.2f}<extra></extra>'
    )
)

# Add WMA-20
fig.add_trace(
    go.Scatter(
        x=gold_recent.index,
        y=wma_20,
        mode='lines',
        name='WMA-20',
        line=dict(color='#4ECDC4', width=2),
        hovertemplate='<b>Date:</b> %{x}<br><b>WMA:</b> $%{y:.2f}<extra></extra>'
    )
)

# Add EMA-20
fig.add_trace(
    go.Scatter(
        x=gold_recent.index,
        y=ema_20,
        mode='lines',
        name='EMA-20',
        line=dict(color='#FFD700', width=2),
        hovertemplate='<b>Date:</b> %{x}<br><b>EMA:</b> $%{y:.2f}<extra></extra>'
    )
)

fig.update_layout(
    title='<b>Comparison: SMA vs WMA vs EMA (Window=20)</b>',
    hovermode='x unified',
    height=600,
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    template='plotly_dark',
    font=dict(size=11)
)

fig.show()


## Analyzing Lag and Smoothing Trade-offs

Let's quantify the trade-offs between responsiveness and smoothness.
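The lag figures in this section use the **mean lag** of each filter. This is the weighted average age of the prices it combines, with age $i = 0$ for the newest price. Writing each method's weights by age gives closed forms:

$$\bar{L} = \frac{\sum_{i} i\, w_i}{\sum_{i} w_i}$$

$$\text{SMA: } w_i = 1 \;(i = 0,\dots,n-1) \quad\Rightarrow\quad \bar{L} = \frac{1}{n}\sum_{i=0}^{n-1} i = \frac{n-1}{2}$$

$$\text{WMA: } w_i = n - i \quad\Rightarrow\quad \bar{L} = \frac{\sum_{i=0}^{n-1} i(n-i)}{\sum_{i=0}^{n-1}(n-i)} = \frac{n(n-1)(n+1)/6}{n(n+1)/2} = \frac{n-1}{3}$$

$$\text{EMA: } w_i = \alpha(1-\alpha)^i \;(i = 0,1,\dots) \quad\Rightarrow\quad \bar{L} = \sum_{i=0}^{\infty} i\,\alpha(1-\alpha)^i = \frac{1-\alpha}{\alpha} = \frac{n-1}{2} \;\text{ for } \alpha = \frac{2}{n+1}$$

For $n = 20$, these give 9.5, 6.33, and 9.5 periods. The choice $\alpha = 2/(n+1)$ is exactly what makes an EMA's mean lag equal to that of an SMA with the same window.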


In [None]:
# Calculate lag metrics
def theoretical_mean_lag(kind, window):
    """Mean lag (center of mass of the weights) in periods.

    kind: 'sma' (equal weights), 'wma' (linear weights n..1) or 'ema' (span=window).
    """
    if kind == 'sma':
        return (window - 1) / 2
    if kind == 'wma':
        return (window - 1) / 3
    if kind == 'ema':
        alpha = 2 / (window + 1)
        return (1 - alpha) / alpha  # simplifies to (window - 1) / 2
    raise ValueError(f"unknown kind: {kind}")

def calculate_smoothing_effectiveness(price, ma):
    """Calculate the reduction in return volatility (smoothing effectiveness)"""
    # Remove NaN values
    valid_idx = ~(price.isna() | ma.isna())
    price_clean = price[valid_idx].values
    ma_clean = ma[valid_idx].values
    
    # Volatility = standard deviation of period-over-period returns
    price_returns = np.diff(price_clean) / price_clean[:-1]
    ma_returns = np.diff(ma_clean) / ma_clean[:-1]
    
    price_vol = np.std(price_returns)
    ma_vol = np.std(ma_returns)
    
    # Smoothing effectiveness: 1 - (MA volatility / Price volatility)
    smoothing = 1 - (ma_vol / price_vol)
    return smoothing, price_vol, ma_vol

print("=" * 70)
print("LAG AND SMOOTHING ANALYSIS (Window Size = 20)")
print("=" * 70)

# Theoretical mean lag
lag_sma = theoretical_mean_lag('sma', 20)
lag_wma = theoretical_mean_lag('wma', 20)
lag_ema = theoretical_mean_lag('ema', 20)

print("\nTheoretical Mean Lag (center of mass of the weights):")
print(f"  SMA-20: {lag_sma:.2f} periods")
print(f"  WMA-20: {lag_wma:.2f} periods")
print(f"  EMA-20: {lag_ema:.2f} periods")

# Smoothing effectiveness
smooth_sma, vol_price, vol_sma = calculate_smoothing_effectiveness(gold_recent['Price'], sma_20)
smooth_wma, _, vol_wma = calculate_smoothing_effectiveness(gold_recent['Price'], wma_20)
smooth_ema, _, vol_ema = calculate_smoothing_effectiveness(gold_recent['Price'], ema_20)

print("\nSmoothing Effectiveness (reduction in return volatility):")
print(f"  Original Volatility: {vol_price:.6f}")
print(f"  SMA-20 Volatility: {vol_sma:.6f} ({smooth_sma*100:.2f}% smoother)")
print(f"  WMA-20 Volatility: {vol_wma:.6f} ({smooth_wma*100:.2f}% smoother)")
print(f"  EMA-20 Volatility: {vol_ema:.6f} ({smooth_ema*100:.2f}% smoother)")

# Tracking error (mean absolute distance from price): smaller = follows price more closely
dist_sma = np.nanmean(np.abs(gold_recent['Price'] - sma_20))
dist_wma = np.nanmean(np.abs(gold_recent['Price'] - wma_20))
dist_ema = np.nanmean(np.abs(gold_recent['Price'] - ema_20))

print("\nTracking Error (Mean Absolute Distance from Price):")
print(f"  SMA-20: ${dist_sma:.2f}")
print(f"  WMA-20: ${dist_wma:.2f}")
print(f"  EMA-20: ${dist_ema:.2f}")


LAG AND SMOOTHING ANALYSIS (Window Size = 20)

Theoretical Mean Lag (center of mass of the weights):
  SMA-20: 9.50 periods
  WMA-20: 6.33 periods
  EMA-20: 9.50 periods

Smoothing Effectiveness (reduction in return volatility):
  Original Volatility: 0.011310
  SMA-20 Volatility: 0.001984 (82.45% smoother)
  WMA-20 Volatility: 0.002390 (78.87% smoother)
  EMA-20 Volatility: 0.002116 (81.05% smoother)

Tracking Error (Mean Absolute Distance from Price):
  SMA-20: $65.62
  WMA-20: $50.26
  EMA-20: $56.42
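The closed-form mean lags can be checked numerically. The check sends a unit impulse through each smoother and measures the center of mass of its response. This is a sketch on synthetic input (a single spike) that reuses the pandas calls from this notebook. The helper name `mean_lag_from_impulse` is ours. The spike is placed after a warm-up stretch so that the rolling windows are already full when it arrives.

```python
import numpy as np
import pandas as pd

def mean_lag_from_impulse(smoother, length=3000, start=200):
    """Center of mass (in periods) of a smoother's response to a unit impulse."""
    impulse = pd.Series(np.zeros(length))
    impulse.iloc[start] = 1.0
    # response[k] = weight the smoother puts on a price k periods old
    response = np.nan_to_num(smoother(impulse).to_numpy()[start:])
    ages = np.arange(len(response))
    return np.sum(ages * response) / np.sum(response)

n = 20
w = np.arange(1, n + 1, dtype=float)   # linear WMA weights, oldest -> newest
smoothers = {
    "SMA": lambda s: s.rolling(n).mean(),
    "WMA": lambda s: s.rolling(n).apply(lambda x: np.dot(x, w) / w.sum(), raw=True),
    "EMA": lambda s: s.ewm(span=n, adjust=False).mean(),
}

lags = {name: mean_lag_from_impulse(f) for name, f in smoothers.items()}
for name, lag in lags.items():
    print(f"{name}-{n}: mean lag = {lag:.2f} periods")
```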


## Window Size Effects: Lag vs. Smoothing Trade-off

Let's explore how different window sizes affect this critical trade-off.


In [None]:
# Analyze different window sizes
windows = [5, 10, 20, 50, 100]
lag_analysis = []
smooth_analysis = []

for window in windows:
    sma = gold_recent['Price'].rolling(window).mean()
    ema = gold_recent['Price'].ewm(span=window, adjust=False).mean()
    
    # Mean lag (center of mass of the weights), in periods.
    # With alpha = 2 / (window + 1), the EMA's mean lag (1 - alpha) / alpha
    # simplifies to (window - 1) / 2, the same as the SMA's.
    sma_lag = (window - 1) / 2
    ema_lag = (window - 1) / 2
    
    # Smoothing
    smooth_sma, _, vol_sma = calculate_smoothing_effectiveness(gold_recent['Price'], sma)
    smooth_ema, _, vol_ema = calculate_smoothing_effectiveness(gold_recent['Price'], ema)
    
    lag_analysis.append({'Window': window, 'SMA_Lag': sma_lag, 'EMA_Lag': ema_lag})
    smooth_analysis.append({'Window': window, 'SMA_Vol': vol_sma, 'EMA_Vol': vol_ema})

lag_df = pd.DataFrame(lag_analysis)
smooth_df = pd.DataFrame(smooth_analysis)

print("\n" + "=" * 70)
print("WINDOW SIZE EFFECTS ON LAG")
print("=" * 70)
print(lag_df.to_string(index=False))

print("\n" + "=" * 70)
print("WINDOW SIZE EFFECTS ON VOLATILITY")
print("=" * 70)
print(smooth_df.to_string(index=False))



WINDOW SIZE EFFECTS ON LAG
 Window  SMA_Lag  EMA_Lag
      5      2.0      2.0
     10      4.5      4.5
     20      9.5      9.5
     50     24.5     24.5
    100     49.5     49.5

WINDOW SIZE EFFECTS ON VOLATILITY
 Window  SMA_Vol  EMA_Vol
      5 0.004669 0.004708
     10 0.003071 0.003163
     20 0.001984 0.002116
     50 0.001166 0.001231
    100 0.000587 0.000811
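WMA keeps more noise than SMA or EMA at the same window (78.87% vs 82.45% and 81.05% smoother at window 20 above). The reason can be seen on idealized i.i.d. noise in the price level. This is an assumption: real prices and returns are not i.i.d. The check is also a simpler quantity than the return-volatility tables, so it explains the ordering rather than reproducing the numbers. For independent noise, a normalized weighted average has variance $\sum w_i^2 / (\sum w_i)^2$ times the noise variance. The helper name `noise_variance_ratio` is ours.

```python
import numpy as np

def noise_variance_ratio(weights):
    """Var(MA) / Var(noise) for i.i.d. noise: sum(w^2) / (sum w)^2."""
    w = np.asarray(weights, dtype=float)
    return np.sum(w ** 2) / np.sum(w) ** 2

n = 20
alpha = 2 / (n + 1)
sma_w = np.ones(n)
wma_w = np.arange(n, 0, -1, dtype=float)         # n (newest) ... 1 (oldest)
ema_w = alpha * (1 - alpha) ** np.arange(2000)   # geometric weights, truncated

for name, w in [("SMA", sma_w), ("WMA", wma_w), ("EMA", ema_w)]:
    print(f"{name}-{n}: variance ratio = {noise_variance_ratio(w):.4f}")
```

Analytically, the SMA ratio is $1/n$. The linear WMA ratio is $2(2n+1)/(3n(n+1))$, about 0.065 for $n = 20$. The EMA ratio is $\alpha/(2-\alpha)$, which equals $1/n$ for $\alpha = 2/(n+1)$. So the span parameterization matches an EMA to an SMA of the same window in white-noise reduction as well as in mean lag.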


## Visualizing Lag-Smoothing Trade-off

Create a dual-axis plot showing how lag and smoothing evolve with window size.


In [None]:
# Create trade-off visualization
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add lag lines
fig.add_trace(
    go.Scatter(
        x=lag_df['Window'],
        y=lag_df['SMA_Lag'],
        mode='lines+markers',
        name='SMA Lag (periods)',
        line=dict(color='#FF6B6B', width=2),
        marker=dict(size=8),
        hovertemplate='<b>Window:</b> %{x}<br><b>Lag:</b> %{y:.1f} periods<extra></extra>'
    ),
    secondary_y=False
)

fig.add_trace(
    go.Scatter(
        x=lag_df['Window'],
        y=lag_df['EMA_Lag'],
        mode='lines+markers',
        name='EMA Lag (periods)',
        line=dict(color='#FFD700', width=2, dash='dash'),
        marker=dict(size=8),
        hovertemplate='<b>Window:</b> %{x}<br><b>Lag:</b> %{y:.1f} periods<extra></extra>'
    ),
    secondary_y=False
)

# Add smoothing lines
fig.add_trace(
    go.Scatter(
        x=smooth_df['Window'],
        y=smooth_df['SMA_Vol'],
        mode='lines+markers',
        name='SMA Volatility',
        line=dict(color='#FF6B6B', width=2),
        marker=dict(size=8, symbol='diamond'),
        hovertemplate='<b>Window:</b> %{x}<br><b>Volatility:</b> %{y:.6f}<extra></extra>'
    ),
    secondary_y=True
)

fig.add_trace(
    go.Scatter(
        x=smooth_df['Window'],
        y=smooth_df['EMA_Vol'],
        mode='lines+markers',
        name='EMA Volatility',
        line=dict(color='#FFD700', width=2, dash='dash'),
        marker=dict(size=8, symbol='diamond'),
        hovertemplate='<b>Window:</b> %{x}<br><b>Volatility:</b> %{y:.6f}<extra></extra>'
    ),
    secondary_y=True
)

fig.update_layout(
    title='<b>Lag-Smoothing Trade-off: How Window Size Affects Both</b>',
    height=600,
    hovermode='x unified',
    template='plotly_dark',
    font=dict(size=11)
)

fig.update_xaxes(title_text='Window Size (periods)')
fig.update_yaxes(title_text='Lag (periods)', secondary_y=False)
fig.update_yaxes(title_text='Return Volatility', secondary_y=True)

fig.show()


## Method Comparison with Different Window Sizes

Visualize how each method behaves as we increase the window size.


In [None]:
# Create 2x2 comparison grid
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Window=5', 'Window=20', 'Window=50', 'Window=100'),
    specs=[[{}, {}], [{}, {}]]
)

windows_viz = [5, 20, 50, 100]
positions = [(1, 1), (1, 2), (2, 1), (2, 2)]

for window, (row, col) in zip(windows_viz, positions):
    sma = gold_recent['Price'].rolling(window).mean()
    wma = weighted_moving_average(gold_recent['Price'], window)
    ema = gold_recent['Price'].ewm(span=window, adjust=False).mean()
    
    # Add price
    fig.add_trace(
        go.Scatter(
            x=gold_recent.index,
            y=gold_recent['Price'],
            mode='lines',
            name='Price',
            line=dict(color='rgba(100,100,100,0.3)', width=1),
            showlegend=(row==1 and col==1),
            hovertemplate='<b>Price:</b> $%{y:.0f}<extra></extra>'
        ),
        row=row, col=col
    )
    
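    # Add SMA
    fig.add_trace(
        go.Scatter(
            x=gold_recent.index,
            y=sma,
            mode='lines',
            name='SMA',
            legendgroup='SMA',  # one legend click toggles SMA in every panel
            line=dict(color='#FF6B6B', width=1.5),
            showlegend=(row==1 and col==1),
            hovertemplate='<b>SMA:</b> $%{y:.0f}<extra></extra>'
        ),
        row=row, col=col
    )
    
    # Add WMA
    fig.add_trace(
        go.Scatter(
            x=gold_recent.index,
            y=wma,
            mode='lines',
            name='WMA',
            legendgroup='WMA',
            line=dict(color='#4ECDC4', width=1.5),
            showlegend=(row==1 and col==1),
            hovertemplate='<b>WMA:</b> $%{y:.0f}<extra></extra>'
        ),
        row=row, col=col
    )
    
    # Add EMA
    fig.add_trace(
        go.Scatter(
            x=gold_recent.index,
            y=ema,
            mode='lines',
            name='EMA',
            legendgroup='EMA',
            line=dict(color='#FFD700', width=1.5),
            showlegend=(row==1 and col==1),
            hovertemplate='<b>EMA:</b> $%{y:.0f}<extra></extra>'
        ),
        row=row, col=col
    )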

fig.update_layout(
    title='<b>Moving Average Comparison: Different Window Sizes</b><br><sub>Notice how larger windows increase lag but reduce noise</sub>',
    height=900,
    hovermode='x unified',
    template='plotly_dark',
    font=dict(size=10)
)

fig.show()


## Key Insights and Practical Recommendations

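### SMA (Simple Moving Average)
- **Pros**: Easy to understand and compute; equal weighting; strongest smoothing of the three at window 20 on this data (82.45% lower return volatility)
- **Cons**: Noticeable lag; slow to respond because the newest price gets only $1/n$ of the weight
- **Best for**: Long-term trend identification where recent noise should be down-weighted
- **Lag**: $(n-1)/2$ periods on average (9.5 for $n = 20$)

### WMA (Weighted Moving Average)
- **Pros**: Reduces lag by favoring recent data; tracked price most closely here (mean absolute distance of $50.26 vs $65.62 for SMA-20)
- **Cons**: Slightly more computation; keeps more noise than SMA or EMA of the same window (78.87% smoother vs 82.45% and 81.05%)
- **Best for**: Responsive trend analysis when some extra noise is acceptable
- **Lag**: $(n-1)/3$ periods on average, one third less than SMA (6.33 for $n = 20$)

### EMA (Exponential Moving Average)
- **Pros**: Largest weight on the newest observation ($\alpha = 2/(n+1)$ vs $1/n$), so it starts turning first after a fresh move; the recursive update needs only the previous value; no NaN warm-up
- **Cons**: Infinite memory, so old shocks never fully drop out; a single recent outlier moves it more than it moves an SMA; early values depend on the starting seed
- **Best for**: Short-term trend following, quick signal detection
- **Lag**: $(1-\alpha)/\alpha = (n-1)/2$ periods on average, the same as SMA-$n$; its advantage lies in how quickly it starts reacting, not in its average lag

### Trade-off Summary
- **Larger window** → More smoothing but more lag
- **Smaller window** → Less lag but more noise
- **Responsiveness** (window 20, mean absolute distance from price on this sample): WMA > EMA > SMA
- **Smoothing** (window 20, reduction in return volatility on this sample): SMA > EMA > WMA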

### Practical Guidelines
1. Use **SMA** for identifying support/resistance levels (longer window)
2. Use **WMA** for balanced trend analysis (medium window)
3. Use **EMA** for real-time trend following and quick signals
4. Combine multiple moving averages for confirmation (e.g., MA crossovers)
5. Choose window based on your trading timeframe and risk tolerance
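Guideline 4 mentions MA crossovers. The sketch below is one simple version of that idea: a fast SMA crossing a slow SMA. Crossover rules vary widely (EMA pairs, confirmation filters, and so on), so this is only an illustrative assumption, not a trading rule. The helper `crossover_signals` is hypothetical, and it is tested on a synthetic V-shaped price path rather than the gold data.

```python
import numpy as np
import pandas as pd

def crossover_signals(price, fast=10, slow=50):
    """Hypothetical helper: +1 on the bar where SMA-`fast` crosses above
    SMA-`slow`, -1 where it crosses below, 0 elsewhere."""
    fast_ma = price.rolling(fast).mean()
    slow_ma = price.rolling(slow).mean()
    above = (fast_ma > slow_ma).astype(int)     # 1 while fast is above slow
    signal = above.diff().fillna(0)             # +1 / -1 when that state changes
    # Ignore bars where either average (now or one bar ago) is still warming up
    valid = fast_ma.notna() & slow_ma.notna()
    signal = signal.where(valid & valid.shift(1, fill_value=False), 0)
    return signal.astype(int)

# Synthetic V-shaped series (assumption): a decline followed by a rally
price = pd.Series(np.concatenate([np.linspace(100, 50, 100), np.linspace(50, 150, 100)]))
signals = crossover_signals(price, fast=5, slow=20)
print(signals[signals != 0])
```

On this path, the fast average stays below the slow one throughout the decline. It crosses above exactly once, a few bars after the bottom, and that delay is the lag discussed throughout this notebook.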
