# Task 1: Data Exploration and Workflow Definition

## Objectives
1. Define the data analysis workflow
2. Understand the model and the data
3. Analyze time series properties
4. Research and compile event data

In [3]:
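import sys
import os

# Add the project root (one level up) to sys.path so that `src` is importable
system_path = os.path.abspath('..')
if system_path not in sys.path:
    sys.path.append(system_path)

# Reload edited project modules automatically
%load_ext autoreload
%autoreload 2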

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from src.utils.data_loader import load_brent_data, load_events_data
from src.analysis.time_series_analysis import TimeSeriesAnalyzer

plt.style.use('seaborn-v0_8')

## 1. Data Analysis Workflow

### Workflow Steps:
1. **Data Collection & Preprocessing**
   - Load Brent oil price data
   - Research and compile major events
   - Clean and prepare data

2. **Exploratory Data Analysis**
   - Analyze time series properties
   - Check for stationarity
   - Identify trends and volatility patterns

3. **Change Point Modeling**
   - Implement Bayesian change point detection
   - Identify structural breaks
   - Validate model convergence

4. **Event Association**
   - Match change points with events
   - Quantify impact
   - Statistical validation

5. **Results Communication**
   - Interactive dashboard
   - Executive summary
   - Technical report
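
Step 4 above (matching change points with events) can be sketched with `pandas.merge_asof`, which pairs each change point with the nearest event inside a tolerance window. This is only an illustration: the change point dates and the 30-day window are hypothetical placeholders, not model output, and the event rows are copied from the first rows of the events table in this notebook.

```python
import pandas as pd

# Event dates copied from the first rows of the events table.
events = pd.DataFrame({
    "Date": pd.to_datetime(["1990-08-02", "1991-01-17", "2001-09-11",
                            "2003-03-20", "2008-09-15"]),
    "Event": ["Gulf War Begins", "Gulf War Air Campaign", "9/11 Attacks",
              "Iraq War Begins", "Lehman Brothers Collapse"],
})

# Hypothetical change point dates (placeholders, not model output).
change_points = pd.DataFrame({
    "change_point": pd.to_datetime(["1990-08-06", "2008-09-22", "2015-06-01"]),
})

# Pair each change point with the nearest event within 30 days (assumed window).
matched = pd.merge_asof(
    change_points.sort_values("change_point"),
    events.sort_values("Date"),
    left_on="change_point",
    right_on="Date",
    direction="nearest",
    tolerance=pd.Timedelta(days=30),
)

# Positive lag: the change point falls after the event.
matched["lag_days"] = (matched["change_point"] - matched["Date"]).dt.days
print(matched[["change_point", "Event", "lag_days"]])
```

Change points with no event inside the window keep a missing `Event`, which makes unexplained breaks easy to list separately.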

In [9]:
# Load data (requires ../data/raw/BrentOilPrices.csv and ../data/events_data.csv)
df = load_brent_data('../data/raw/BrentOilPrices.csv')
events_df = load_events_data('../data/events_data.csv')

print("Events Data:")
print(events_df.head())
print(f"\nTotal events: {len(events_df)}")

Events Data:
        Date                     Event      Category  \
0 1990-08-02           Gulf War Begins  Geopolitical   
1 1991-01-17     Gulf War Air Campaign  Geopolitical   
2 2001-09-11              9/11 Attacks  Geopolitical   
3 2003-03-20           Iraq War Begins  Geopolitical   
4 2008-09-15  Lehman Brothers Collapse      Economic   

                                       Description Impact_Expected  
0          Iraq invades Kuwait leading to Gulf War            High  
1  Coalition forces begin air strikes against Iraq            High  
2           Terrorist attacks in the United States            High  
3                   US-led invasion of Iraq begins            High  
4              Global financial crisis intensifies            High  

Total events: 15


In [10]:
df

           Date  Price  log_returns
0    1987-05-20  18.63          NaN
1    1987-05-21  18.45    -0.009709
2    1987-05-22  18.55     0.005405
3    1987-05-25  18.60     0.002692
4    1987-05-26  18.63     0.001612
...         ...    ...          ...
9006 2022-11-08  96.85    -0.030706
9007 2022-11-09  93.05    -0.040026
9008 2022-11-10  94.25     0.012814
9009 2022-11-11  96.37     0.022244
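
The `log_returns` values are consistent with the usual definition, log(P_t / P_{t-1}), which is why the first row has no value. How `load_brent_data` computes the column is not shown in this notebook, so the following is only a sketch that recomputes the first few values from the prices displayed above.

```python
import numpy as np
import pandas as pd

# First five prices from the DataFrame shown above.
prices = pd.Series([18.63, 18.45, 18.55, 18.60, 18.63], name="Price")

# Log return = difference of consecutive log prices; the first value is NaN.
log_returns = np.log(prices).diff()
print(log_returns.round(6).tolist())
```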


## 2. Assumptions and Limitations

### Key Assumptions:
- The oil price series may contain structural breaks
- Major geopolitical and economic events can cause structural changes
- Log returns are closer to stationary than raw price levels
- Change points occur at discrete time points
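
The stationarity assumption can be checked with a unit-root test. Below is a minimal sketch of a plain (non-augmented) Dickey-Fuller regression in NumPy, run on simulated data rather than the Brent series. The function name `dickey_fuller_stat` and the simulation parameters are illustrative assumptions, and -2.86 is the approximate large-sample 5% critical value for the constant-only case. For real analysis, an augmented test such as `statsmodels.tsa.stattools.adfuller` is the more usual choice.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant + lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_rho

# Simulated stand-in for a price series: a geometric random walk.
rng = np.random.default_rng(0)
sim_returns = rng.normal(0.0, 0.02, size=2000)
sim_prices = 20.0 * np.exp(np.cumsum(sim_returns))

CRIT_5PCT = -2.86  # approximate asymptotic 5% critical value (constant, no trend)
for name, series in [("prices", sim_prices), ("log returns", sim_returns)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject unit root" if stat < CRIT_5PCT else "cannot reject unit root"
    print(f"{name}: DF stat = {stat:.2f} ({verdict})")
```

A statistic well below the critical value rejects the unit root, which is the expected outcome for returns; price levels usually do not reject it.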

### Limitations:
- **Correlation vs. causation**: A change point that coincides in time with an event does not prove that the event caused it
- The model assumes a single change point (it can be extended to several)
- External factors are not captured in the model
- Markets may already price in anticipated events, so the price response can precede the event date

## 3. Purpose of Change Point Models

Change point models help identify:
- **Structural breaks** in time series data
- **Regime changes** in market behavior
- **Timing** of significant shifts
- **Magnitude** of parameter changes
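
To make these ideas concrete, here is a minimal sketch of a single change point in the mean, evaluated on a grid of candidate locations with a uniform prior over the location. It plugs in segment sample means and treats the noise scale as known, so it is a simplified stand-in for a full Bayesian model, not the model used in this project. The data are simulated, and `true_tau`, the segment means, and `sigma` are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(42)
n, true_tau, sigma = 200, 120, 1.0

# Simulated series: mean 0 before the change point, mean 3 after it.
y = np.concatenate([rng.normal(0.0, sigma, true_tau),
                    rng.normal(3.0, sigma, n - true_tau)])

min_seg = 5  # require a few points on each side of a candidate
taus = np.arange(min_seg, n - min_seg)
log_post = np.empty(len(taus))
for i, tau in enumerate(taus):
    before, after = y[:tau], y[tau:]
    sse = ((before - before.mean()) ** 2).sum() + ((after - after.mean()) ** 2).sum()
    log_post[i] = -0.5 * sse / sigma**2  # Gaussian log-likelihood up to a constant

# Normalise to an approximate posterior over tau (uniform prior).
post = np.exp(log_post - log_post.max())
post /= post.sum()

tau_map = taus[post.argmax()]
print("most probable change point index:", tau_map)
print("mean before / after:", y[:tau_map].mean().round(2), y[tau_map:].mean().round(2))
```

The location of the peak gives the timing, the spread of `post` reflects uncertainty about it, and the difference between the segment means gives the magnitude of the shift.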

### Expected Outputs:
- Change point dates with credible intervals
- Before/after parameter estimates (mean, volatility)
- Posterior distributions for all parameters
- Model diagnostics and convergence metrics
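
As an illustration of the first output, the sketch below turns posterior draws of a change point index into a date with a 95% equal-tailed interval. The draws are simulated placeholders standing in for sampler output, and the business-day calendar is an assumption made only so that indices map to dates.

```python
import numpy as np
import pandas as pd

# Assumed calendar: business days starting at the first date in the data.
dates = pd.date_range("1987-05-20", periods=1000, freq="B")

# Placeholder posterior draws of the change point index (not real model output).
rng = np.random.default_rng(1)
tau_draws = np.clip(np.round(rng.normal(500, 8, size=4000)),
                    0, len(dates) - 1).astype(int)

# Equal-tailed 95% interval and posterior median, as integer indices.
lo, med, hi = np.percentile(tau_draws, [2.5, 50, 97.5]).astype(int)
print(f"change point: {dates[med].date()} "
      f"(95% interval: {dates[lo].date()} to {dates[hi].date()})")
```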