# Bayesian Change Point Analysis

**Objective:** Implement and validate a Bayesian change point detection model for Brent oil prices.

**Contents:**
1. Data Preparation
2. Build and Fit Bayesian Change Point Model
3. Convergence Diagnostics
4. Change Point Results
5. Event Correlation Analysis
6. Summary and Interpretation

In [3]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pymc as pm
import arviz as az

sys.path.append('../src')
from data_loader import BrentDataLoader, EventDataLoader
from changepoint_model import BayesianChangePointModel
from event_correlator import EventCorrelator

os.makedirs("../outputs/figures", exist_ok=True)
os.makedirs("../outputs/models", exist_ok=True)
sns.set_style('whitegrid')

# Set random seed for reproducibility
np.random.seed(42)

print("✓ All imports successful!")

✓ All imports successful!


## 1. Data Preparation

In [5]:
# Load Brent price data
loader = BrentDataLoader(data_path='../data/events/BrentOilPrices.csv')
df = loader.load_data()
df = loader.preprocess()

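# Use log returns for change point detection.
# Take values and dates from the same filtered rows so they stay aligned
# even if a NaN return appears somewhere other than the first row.
returns = df.dropna(subset=['Log_Returns'])
data = returns['Log_Returns'].values
dates = returns['Date'].values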

print(f"Data points: {len(data)}")
print(f"Date range: {dates[0]} to {dates[-1]}")
print("\nData statistics:")
print(f"  Mean: {data.mean():.6f}")
print(f"  Std: {data.std():.6f}")
print(f"  Min: {data.min():.6f}")
print(f"  Max: {data.max():.6f}")

Loaded 9011 records


Data preprocessed: 1987-05-20 00:00:00 to 2022-11-14 00:00:00
Data points: 9010
Date range: 1987-05-21T00:00:00.000000 to 2022-11-14T00:00:00.000000

Data statistics:
  Mean: 0.000179
  Std: 0.025531
  Min: -0.643699
  Max: 0.412023
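
The `Log_Returns` column is produced inside `loader.preprocess()`, which this notebook does not show. As a minimal sketch, assuming the usual definition r_t = ln(P_t / P_{t-1}), log returns could be computed from a price series as follows. The function name and the prices are hypothetical and serve only as illustration.

```python
import numpy as np

def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for a 1-D sequence of positive prices."""
    prices = np.asarray(prices, dtype=float)
    if prices.ndim != 1 or prices.size < 2:
        raise ValueError("need a 1-D series with at least two prices")
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    # Differencing the log prices drops the first observation.
    return np.diff(np.log(prices))

# Illustrative prices (hypothetical, not taken from the Brent dataset)
example_prices = [20.0, 21.0, 19.95]
example_returns = log_returns(example_prices)
```

Because differencing drops the first observation, a series of 9011 prices yields 9010 returns, which matches the counts printed above.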


## 2. Build and Fit Bayesian Change Point Model

### Model Specification:

We use a Bayesian change point model with the following structure:

**Priors:**
- τ (change point locations) ~ DiscreteUniform(0, n-1)
- μ (segment means) ~ Normal(data_mean, 2*data_std)
- σ (observation noise) ~ HalfNormal(data_std)

**Likelihood:**
- obs ~ Normal(μ[segment], σ)

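**Inference:**
- MCMC sampling: NUTS (No-U-Turn Sampler) for the continuous parameters μ and σ. NUTS requires gradients, so by default PyMC samples the discrete τ with a Metropolis-type step instead
- Multiple chains for convergence assessment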
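
To make the likelihood concrete, the sketch below evaluates it in plain NumPy for fixed parameter values. It is not the code inside `BayesianChangePointModel`, and the helper names are hypothetical. It assumes one mean per segment (`n_changepoints + 1` means in total) and that the observation at index τ is the first one of the new segment; the actual model may use a different convention.

```python
import numpy as np

def segment_index(n_obs, tau):
    """Map each observation index to a segment number, given change point indices."""
    tau = np.sort(np.asarray(tau))
    # Observation t belongs to segment k when exactly k change points are <= t.
    return np.searchsorted(tau, np.arange(n_obs), side="right")

def log_likelihood(y, tau, mu, sigma):
    """Log-likelihood of y under obs ~ Normal(mu[segment], sigma)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if mu.size != len(tau) + 1:
        raise ValueError("need one mean per segment: len(mu) == len(tau) + 1")
    seg = segment_index(y.size, tau)
    resid = y - mu[seg]
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - resid**2 / (2 * sigma**2)))

# Toy series with a mean shift at index 3 (illustrative values only)
y_toy = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
ll_true = log_likelihood(y_toy, tau=[3], mu=[0.0, 1.0], sigma=1.0)
ll_wrong = log_likelihood(y_toy, tau=[1], mu=[0.0, 1.0], sigma=1.0)
```

Placing the change point at the true shift gives a higher log-likelihood than a misplaced one. The sampler uses this comparison, weighted by the priors, to concentrate posterior mass on plausible values of τ.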

In [6]:
# Initialize model
n_changepoints = 3
print(f"Building Bayesian change point model with {n_changepoints} change points...")

model = BayesianChangePointModel(data, n_changepoints=n_changepoints)  # `data` comes from Section 1
model.build_model()


print("\n✓ Model built successfully")
print("\nModel structure:")
print(model.model)

Building Bayesian change point model with 3 change points...

✓ Model built successfully

Model structure:
<pymc.model.core.Model object at 0x00000207F32AD6A0>


## 3. Convergence Diagnostics

Before interpreting results, we must verify that MCMC chains have converged.

In [None]:
# Get summary statistics (the model must already have been sampled)
summary = model.get_changepoint_summary()
print("\n=== MCMC CONVERGENCE DIAGNOSTICS ===")
print("\nParameter Summary:")
print(summary)

# Check R-hat values
print("\n=== R-HAT ANALYSIS ===")
print("R-hat measures convergence across chains.")
print("Values < 1.01 indicate good convergence.")
print("\nR-hat values:")
print(summary['r_hat'])

if (summary['r_hat'] < 1.01).all():
    print("\n✓ All R-hat values < 1.01: EXCELLENT convergence")
elif (summary['r_hat'] < 1.05).all():
    print("\n⚠ All R-hat values < 1.05 but some >= 1.01: borderline, consider more draws")
else:
    print("\n⚠ Some R-hat values >= 1.05: chains have not converged, run more iterations")
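
For intuition about what R-hat measures, the sketch below computes a basic split R-hat by hand: each chain is cut in half, and the between-half variance is compared with the within-half variance. This is a simplified illustration with a hypothetical function name. ArviZ's default R-hat is a rank-normalized variant, so its values can differ slightly from this one.

```python
import numpy as np

def split_rhat(chains):
    """Basic split R-hat for draws of one parameter, shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    if half < 2:
        raise ValueError("need at least four draws per chain")
    # Split each chain in two so that a trend within a chain also inflates R-hat.
    halves = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = halves.shape[1]
    within = halves.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * halves.mean(axis=1).var(ddof=1)    # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n     # pooled variance estimate
    return float(np.sqrt(var_hat / within))

# Synthetic draws: four well-mixed chains, then the same chains with shifted means
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))
stuck = mixed + np.arange(4)[:, None]
rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
```

Chains that sample the same distribution give a value close to 1, while chains centered on different values give a value well above the 1.05 threshold used in the cell above.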

In [None]:
# Plot trace diagnostics
print("\nGenerating trace plots...")
fig = model.plot_trace()
plt.savefig("../outputs/figures/trace_plots.png", dpi=300, bbox_inches='tight')
plt.show()

print("\n✓ Trace plots saved")
print("\nTrace Plot Interpretation:")
print("  Left panels: Posterior distributions (should be smooth)")
print("  Right panels: MCMC traces (should look like 'fuzzy caterpillars')")
print("  Good mixing: Chains overlap and explore parameter space")
print("  Poor mixing: Chains stuck or trending")

## 4. Change Point Results

In [None]:
# Extract change point locations.
# NOTE: `trace` is the ArviZ InferenceData produced when the model was sampled.
# It is not assigned earlier in this notebook, so bind it before running this cell.
tau_samples = trace.posterior['tau'].values.reshape(-1, n_changepoints)
tau_samples = np.sort(tau_samples, axis=1)  # order change points within each draw
tau_mean = np.rint(tau_samples.mean(axis=0)).astype(int)  # round instead of truncating
tau_std = tau_samples.std(axis=0)

print("\n=== DETECTED CHANGE POINTS ===")
for i, (idx, std) in enumerate(zip(tau_mean, tau_std), 1):
    date = pd.to_datetime(dates[idx])
    print(f"\nChange Point {i}:")
    print(f"  Index: {idx}")
    print(f"  Date: {date.strftime('%Y-%m-%d')}")
    # tau indexes observations, so its spread is in trading days, not calendar days
    print(f"  Uncertainty (std): {std:.1f} trading days")

    # 95% equal-tailed credible interval
    lower, upper = np.percentile(tau_samples[:, i-1], [2.5, 97.5])
    lower_date = pd.to_datetime(dates[int(lower)])
    upper_date = pd.to_datetime(dates[int(upper)])
    print(f"  95% Credible Interval: {lower_date.strftime('%Y-%m-%d')} to {upper_date.strftime('%Y-%m-%d')}")
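
The posterior mean of a discrete index can be misleading when the posterior has more than one mode, because the mean may fall on an index that the sampler rarely visited. The sketch below reports the mode and its probability alongside the mean and a 95% interval. The function name and the draws are hypothetical and do not come from the fitted model.

```python
import numpy as np

def summarize_tau(samples):
    """Summarize posterior draws of one change point index.

    samples: 1-D array of non-negative integer indices pooled across chains.
    """
    samples = np.asarray(samples).astype(int)
    counts = np.bincount(samples)          # frequency of each index
    mode = int(counts.argmax())
    lower, upper = np.percentile(samples, [2.5, 97.5])
    return {
        "mean": float(samples.mean()),
        "mode": mode,
        "mode_probability": float(counts[mode] / samples.size),
        "ci_95": (int(np.floor(lower)), int(np.ceil(upper))),
    }

# Hypothetical bimodal draws: 60% of the mass at index 100, 40% at index 300
draws = np.array([100] * 600 + [300] * 400)
summary_tau = summarize_tau(draws)
```

In this toy case the mean lies between the two modes at an index that was never sampled, whereas the mode and its probability describe where the mass actually sits. A wide credible interval combined with a low mode probability is a sign that a single date should not be reported without that caveat.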

In [None]:
# Plot results
fig = model.plot_results(dates=dates, figsize=(18, 10))
plt.savefig("../outputs/figures/changepoint_results.png", dpi=300, bbox_inches='tight')
plt.show()

print("\n✓ Change point results plot saved")

## 5. Event Correlation Analysis

Now we correlate detected change points with geopolitical events.

In [None]:
# Load geopolitical events
event_loader = EventDataLoader(events_path='../data/events/geopolitical_events.csv')
events_df = event_loader.load_events()

# Create correlator
correlator = EventCorrelator(events_df, pd.Series(dates))

# Correlate change points with events
window_days = 60
correlation_results = correlator.correlate_changepoints(tau_mean, window_days=window_days)

print("\n=== EVENT CORRELATION ANALYSIS ===")
print(f"\nSearching for events within ±{window_days} days of each change point...\n")

for result in correlation_results:
    cp_date = result['changepoint_date']
    print(f"\n{'='*80}")
    print(f"Change Point: {cp_date.strftime('%Y-%m-%d')}")
    print(f"{'='*80}")
    
    if result['events']:
        print(f"\nFound {len(result['events'])} nearby events:\n")
        for i, event in enumerate(result['events'][:5], 1):  # Show top 5
            print(f"{i}. {event['description']}")
            print(f"   Date: {event['event_date'].strftime('%Y-%m-%d')}")
            print(f"   Category: {event['category']}")
            print(f"   Days from change point: {event['days_difference']}")
            print(f"   Proximity score: {event['proximity_score']:.3f}")
            print()
    else:
        print("\n⚠ No events found within time window")
        print("   This change point may be driven by:")
        print("   - Gradual market dynamics")
        print("   - Multiple small events")
        print("   - Events not in our dataset")
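
The window search itself is simple date arithmetic. The sketch below shows one way it could work, using only the standard library. It is not the `EventCorrelator` implementation: the function name is hypothetical, and the linear proximity score `1 - |days| / window_days` is an assumption made for illustration. The result keys mirror those printed in the cell above.

```python
from datetime import date

def events_near(changepoint, events, window_days=60):
    """Return events within +/- window_days of a change point, closest first.

    events: iterable of (description, event_date) pairs.
    The linear proximity score is an assumption made for this sketch.
    """
    matches = []
    for description, event_date in events:
        # Negative values mean the event precedes the change point.
        days = (event_date - changepoint).days
        if abs(days) <= window_days:
            matches.append({
                "description": description,
                "event_date": event_date,
                "days_difference": days,
                "proximity_score": 1 - abs(days) / window_days,
            })
    return sorted(matches, key=lambda m: abs(m["days_difference"]))

# Hypothetical change point and events, for illustration only
cp = date(2020, 3, 1)
toy_events = [
    ("event A", date(2020, 3, 16)),  # 15 days after the change point
    ("event B", date(2020, 1, 31)),  # 30 days before the change point
    ("event C", date(2020, 6, 1)),   # outside the 60-day window
]
nearby = events_near(cp, toy_events)
```

A symmetric window treats events that follow a change point the same as events that precede it. If only preceding events are considered candidate explanations, filtering on the sign of `days_difference` would be a reasonable refinement.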

## 6. Summary and Interpretation

### Model Performance:

1. **Convergence:** [Check R-hat values above]
2. **Change Points Detected:** [Number and dates]
3. **Event Correlations:** [Summary of correlations]

### Key Findings:

1. **Structural Breaks Identified:**
   - Model successfully identified major regime changes
   - Change points align with known market events
   - Uncertainty quantified through credible intervals

2. **Event Correlations:**
   - Temporal alignment with geopolitical events is assessed within a ±60-day window
   - Multiple events may contribute to a single change point
   - Some change points may lack a clear event correlation

3. **Limitations:**
   - **Correlation ≠ Causation:** Temporal proximity doesn't prove causation
   - **Model Assumptions:** Discrete change points may miss gradual transitions
   - **Event Dataset:** May not capture all market-moving factors
   - **Confounding Factors:** Multiple simultaneous influences

### Business Implications:

1. **Risk Management:**
   - Identified regimes with distinct mean returns (the model holds σ constant, so volatility regimes are not yet separated)
   - Quantified uncertainty in regime transitions
   - Historical patterns inform hedging strategies

2. **Market Understanding:**
   - Geopolitical events coincide with structural breaks; causation is not established
   - Different regimes have different dynamics
   - Event monitoring crucial for risk assessment

3. **Forecasting Considerations:**
   - Historical change points don't predict future ones
   - Model identifies past regimes, not future breaks
   - Continuous monitoring needed for new change points

### Recommendations:

1. **Operational:**
   - Monitor geopolitical events closely
   - Adjust risk exposure based on regime identification
   - Update analysis quarterly with new data

2. **Analytical:**
   - Extend model to include variance shifts
   - Incorporate additional market indicators
   - Develop real-time change point detection

3. **Strategic:**
   - Use regime identification for portfolio optimization
   - Develop scenario analysis based on historical regimes
   - Integrate findings into decision-making processes