# Task 1: Laying the Foundation for Analysis
## Change Point Analysis and Statistical Modeling of Brent Oil Prices

**Objective**: Define the data analysis workflow and develop a thorough understanding of the model and data.

**Due Date**: Interim Submission - Sunday, 08 Feb 2026, 8:00 PM UTC

In [None]:
# Cell 1: Import Libraries and Configure Logging
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import logging
import sys
from pathlib import Path

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler('task_1_analysis.log')
    ]
)

logger = logging.getLogger(__name__)
logger.info('Task 1: Laying the Foundation - Analysis Started')
logger.info(f'Timestamp: {datetime.now()}')

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

## Part 1: Define the Data Analysis Workflow

### Step 1.1: Document Analysis Pipeline

In [None]:
# Cell 2: Define Analysis Workflow
class DataAnalysisWorkflow:
    """
    Modularized workflow for Brent oil price analysis.
    Follows data science best practices with structured steps.
    """
    
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        self.workflow_steps = []
    
    def document_workflow(self):
        """
        Define and document the complete analysis pipeline.
        """
        self.workflow_steps = [
            {
                'step': 1,
                'phase': 'Data Loading & Preparation',
                'tasks': [
                    'Load Brent oil price CSV',
                    'Convert date column to datetime format',
                    'Handle missing values',
                    'Sort data chronologically',
                    'Validate data quality'
                ],
                'dependencies': None,
                'tools': ['pandas', 'numpy']
            },
            {
                'step': 2,
                'phase': 'Exploratory Data Analysis (EDA)',
                'tasks': [
                    'Analyze time series properties (trend, seasonality)',
                    'Test stationarity (ADF test)',
                    'Calculate log returns',
                    'Analyze volatility patterns',
                    'Visualize price movements and shocks'
                ],
                'dependencies': [1],
                'tools': ['statsmodels', 'matplotlib', 'seaborn']
            },
            {
                'step': 3,
                'phase': 'Research & Event Compilation',
                'tasks': [
                    'Research major geopolitical events',
                    'Research OPEC policy decisions',
                    'Research economic/sanctions events',
                    'Compile structured event dataset',
                    'Create event CSV with dates and descriptions'
                ],
                'dependencies': None,
                'tools': ['csv', 'pandas']
            },
            {
                'step': 4,
                'phase': 'Model Understanding',
                'tasks': [
                    'Study Bayesian change point theory',
                    'Review PyMC documentation',
                    'Understand switch points and priors',
                    'Plan model architecture',
                    'Document assumptions and limitations'
                ],
                'dependencies': [2],
                'tools': ['PyMC', 'documentation']
            },
            {
                'step': 5,
                'phase': 'Change Point Modeling',
                'tasks': [
                    'Build Bayesian change point model in PyMC',
                    'Define priors and likelihood',
                    'Run MCMC sampling',
                    'Check convergence diagnostics',
                    'Extract and interpret posterior distributions'
                ],
                'dependencies': [1, 4],
                'tools': ['PyMC', 'arviz']
            },
            {
                'step': 6,
                'phase': 'Event Association & Interpretation',
                'tasks': [
                    'Map detected change points to dates',
                    'Match change points with historical events',
                    'Quantify price impact per event',
                    'Calculate percentage changes',
                    'Formulate hypotheses about causation'
                ],
                'dependencies': [3, 5],
                'tools': ['pandas', 'numpy', 'visualization']
            },
            {
                'step': 7,
                'phase': 'Dashboard Development',
                'tasks': [
                    'Design Flask API endpoints',
                    'Create React frontend components',
                    'Build interactive visualizations',
                    'Implement event highlighting',
                    'Add date range filtering'
                ],
                'dependencies': [5, 6],
                'tools': ['Flask', 'React', 'Recharts']
            },
            {
                'step': 8,
                'phase': 'Reporting & Communication',
                'tasks': [
                    'Create comprehensive report',
                    'Generate summary statistics',
                    'Produce final visualizations',
                    'Write executive summary',
                    'Document limitations and future work'
                ],
                'dependencies': [5, 6, 7],
                'tools': ['pandas', 'matplotlib', 'documentation']
            }
        ]
        
        self.logger.info(f'Workflow documented with {len(self.workflow_steps)} major phases')
        return self.workflow_steps
    
    def display_workflow(self):
        """
        Display the workflow in a readable format.
        """
        print('\n' + '='*80)
        print('DATA ANALYSIS WORKFLOW - BRENT OIL PRICES')
        print('='*80 + '\n')
        
        for step in self.workflow_steps:
            print(f"\nStep {step['step']}: {step['phase'].upper()}")
            print('-' * 60)
            for task in step['tasks']:
                print(f"  • {task}")
            print(f"  Tools: {', '.join(step['tools'])}")
            if step['dependencies']:
                print(f"  Dependencies: Steps {', '.join(map(str, step['dependencies']))}")
        print('\n' + '='*80)

# Execute workflow documentation
workflow = DataAnalysisWorkflow()
workflow.document_workflow()
workflow.display_workflow()
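The dependency lists above define a directed graph, so a valid execution order can be derived instead of read off by hand. The following is a minimal sketch that assumes Python 3.9+ for the standard-library `graphlib` module. The names `WORKFLOW_DEPENDENCIES`, `dependency_map`, and `execution_order` are illustrative and are not part of the `DataAnalysisWorkflow` class. The same call raises `graphlib.CycleError` if a later edit introduces a circular dependency.

```python
from graphlib import TopologicalSorter

# Prerequisites transcribed from the workflow definition (step -> steps it depends on).
WORKFLOW_DEPENDENCIES = {
    1: set(), 2: {1}, 3: set(), 4: {2},
    5: {1, 4}, 6: {3, 5}, 7: {5, 6}, 8: {5, 6, 7},
}

def dependency_map(steps):
    """Convert workflow step dicts into {step: set_of_prerequisites}."""
    return {s["step"]: set(s["dependencies"] or []) for s in steps}

def execution_order(dependencies):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())

order = execution_order(WORKFLOW_DEPENDENCIES)
print("One valid execution order:", order)
```

In the notebook, `execution_order(dependency_map(workflow.workflow_steps))` would run the same check against the live workflow definition.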

### Step 1.2: Load and Validate Data

In [None]:
# Cell 3: Data Loading Module
class DataLoader:
    """
    Modular class for loading and validating Brent oil price data.
    """
    
    def __init__(self, filepath):
        self.filepath = filepath
        self.logger = logging.getLogger(__name__)
        self.df = None
    
    def load_data(self):
        """
        Load CSV data and perform initial validation.
        """
        try:
            self.logger.info(f'Loading data from {self.filepath}')
            self.df = pd.read_csv(self.filepath)
            self.logger.info(f'Data loaded successfully. Shape: {self.df.shape}')
            return self.df
        except Exception as e:
            self.logger.error(f'Error loading data: {str(e)}')
            raise
    
    def preprocess_data(self):
        """
        Convert date column and handle missing values.
        """
        try:
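            # Convert date column. Most rows look like '20-May-87'; rows that do not
            # match are retried with the 'Apr 22, 2020' layout (the fallback format is
            # an assumption; adjust it to the layouts actually present in the file).
            self.logger.info('Converting Date column to datetime format')
            raw_dates = self.df['Date']
            parsed = pd.to_datetime(raw_dates, format='%d-%b-%y', errors='coerce')
            fallback = pd.to_datetime(raw_dates, format='%b %d, %Y', errors='coerce')
            self.df['Date'] = parsed.fillna(fallback)
            unparsed = int(self.df['Date'].isna().sum())
            if unparsed:
                self.logger.warning(f'{unparsed} dates could not be parsed and will be dropped')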
            
            # Sort by date
            self.logger.info('Sorting data by date')
            self.df = self.df.sort_values('Date').reset_index(drop=True)
            
            # Convert price to numeric
            self.df['Price'] = pd.to_numeric(self.df['Price'], errors='coerce')
            
            # Check for missing values
            missing_count = self.df.isnull().sum().sum()
            if missing_count > 0:
                self.logger.warning(f'Found {missing_count} missing values')
                self.df = self.df.dropna()
                self.logger.info(f'After removing missing values: {self.df.shape}')
            else:
                self.logger.info('No missing values found')
            
            return self.df
        except Exception as e:
            self.logger.error(f'Error preprocessing data: {str(e)}')
            raise
    
    def validate_data(self):
        """
        Validate data quality and consistency.
        """
        self.logger.info('Validating data quality')
        
        validation_results = {
            'total_records': len(self.df),
            'date_range': f"{self.df['Date'].min().date()} to {self.df['Date'].max().date()}",
            'price_range': f"${self.df['Price'].min():.2f} - ${self.df['Price'].max():.2f}",
            'null_values': self.df.isnull().sum().to_dict(),
            'data_types': self.df.dtypes.to_dict()
        }
        
        print('\nDATA VALIDATION RESULTS')
        print('=' * 50)
        for key, value in validation_results.items():
            print(f'{key}: {value}')
        
        self.logger.info('Data validation completed successfully')
        return validation_results

# Execute data loading
data_loader = DataLoader('BrentOilPrices.csv')
df = data_loader.load_data()
df = data_loader.preprocess_data()
validation = data_loader.validate_data()

print('\nFirst few records:')
print(df.head(10))
print('\nLast few records:')
print(df.tail(10))
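`validate_data` reports ranges and null counts. It does not check ordering, duplicate dates, non-positive prices (which would break the log-return calculation), or unusually long gaps between observations. The following is a minimal sketch of those extra checks. The function name `check_series_integrity` and the 7-day gap threshold are assumptions chosen for illustration.

```python
import pandas as pd

def check_series_integrity(df, date_col="Date", price_col="Price", max_gap_days=7):
    """Extra integrity checks for a daily price series (thresholds are assumptions)."""
    dates = df[date_col]
    gaps = dates.diff().dt.days                 # calendar days between consecutive rows
    return {
        "is_sorted": bool(dates.is_monotonic_increasing),
        "duplicate_dates": int(dates.duplicated().sum()),
        # log(P_t / P_{t-1}) is undefined for zero or negative prices
        "non_positive_prices": int((df[price_col] <= 0).sum()),
        "max_gap_days": float(gaps.max()) if len(gaps) > 1 else 0.0,
        "gaps_over_threshold": int((gaps > max_gap_days).sum()),
    }
```

Applied to the loaded data, `check_series_integrity(df)` would complement the output of `validate_data`. Weekend gaps of 3 days are normal, so only longer gaps are counted.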

### Step 1.3: Research and Compile Major Events

In [None]:
# Cell 4: Event Compilation Module
class EventCompiler:
    """
    Module for compiling major geopolitical and economic events
    that may have impacted Brent oil prices.
    """
    
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        self.events = []
    
    def compile_events(self):
        """
        Compile a curated list of major events from 1987-2022.
        Focus on: geopolitical conflicts, OPEC decisions, sanctions, and economic shocks.
        """
        self.events = [
            {
                'date': '1990-08-02',
                'event_name': 'Iraq Invasion of Kuwait',
                'category': 'Geopolitical Conflict',
                'description': 'Iraqi invasion triggers major oil supply concerns and price spike'
            },
            {
                'date': '1991-01-17',
                'event_name': 'Gulf War Begins',
                'category': 'Armed Conflict',
                'description': 'Operation Desert Storm commences, oil markets volatile'
            },
            {
                'date': '1997-07-02',
                'event_name': 'Asian Financial Crisis',
                'category': 'Economic Crisis',
                'description': 'Currency collapse spreads across Asia, demand shock for commodities'
            },
            {
                'date': '2001-09-11',
                'event_name': 'September 11 Attacks',
                'category': 'Terrorism/Geopolitical',
                'description': 'Terrorist attacks in US impact global markets and economic outlook'
            },
            {
                'date': '2003-03-20',
                'event_name': 'Iraq War Begins',
                'category': 'Armed Conflict',
                'description': 'US-led invasion of Iraq, supply disruptions expected'
            },
            {
                'date': '2004-01-01',
                'event_name': 'Oil Prices Begin Sustained Rise',
                'category': 'Market Trend',
                'description': 'Start of the 2004-2008 oil boom, driven by supply constraints and demand'
            },
            {
                'date': '2008-07-11',
                'event_name': 'Oil Prices Peak',
                'category': 'Price Peak',
                'description': 'Brent crude reaches its record high (futures touched about $147.50/barrel intraday)'
            },
            {
                'date': '2008-09-15',
                'event_name': 'Lehman Brothers Collapse',
                'category': 'Financial Crisis',
                'description': 'Global financial crisis triggers demand collapse and oil price crash'
            },
            {
                'date': '2011-02-17',
                'event_name': 'Libyan Civil War',
                'category': 'Geopolitical Conflict',
                'description': 'Uprising in Libya disrupts major oil production, prices spike'
            },
            {
                'date': '2014-06-01',
                'event_name': 'Oil Price Decline Begins',
                'category': 'Market Trend',
                'description': 'Brent begins a sustained slide from about $115/barrel as US shale supply grows and demand weakens'
            },
            {
                'date': '2014-11-27',
                'event_name': 'OPEC Declines to Cut Production',
                'category': 'OPEC Policy',
                'description': 'OPEC decision to maintain production accelerates oil price collapse'
            },
            {
                'date': '2016-01-20',
                'event_name': 'Oil Prices Hit Bottom',
                'category': 'Price Low',
                'description': 'Brent crude falls to ~$26/barrel during global supply glut'
            },
            {
                'date': '2016-11-30',
                'event_name': 'OPEC Announces Production Cuts',
                'category': 'OPEC Policy',
                'description': 'OPEC agrees to reduce production to support prices'
            },
            {
                'date': '2020-02-01',
                'event_name': 'COVID-19 Pandemic Begins',
                'category': 'Health Crisis',
                'description': 'Global pandemic triggers economic shutdown and oil demand collapse'
            },
            {
                'date': '2020-04-20',
                'event_name': 'WTI Futures Turn Negative',
                'category': 'Price Anomaly',
                'description': 'May 2020 WTI futures settle at about -$37/barrel; Brent falls sharply but stays positive'
            }
        ]
        
        self.logger.info(f'Compiled {len(self.events)} major events')
        return self.events
    
    def export_to_csv(self, filepath='major_events.csv'):
        """
        Export compiled events to CSV for reference.
        """
        try:
            events_df = pd.DataFrame(self.events)
            events_df.to_csv(filepath, index=False)
            self.logger.info(f'Events exported to {filepath}')
            return events_df
        except Exception as e:
            self.logger.error(f'Error exporting events: {str(e)}')
            raise
    
    def display_events(self):
        """
        Display events in a formatted table.
        """
        events_df = pd.DataFrame(self.events)
        print('\nMAJOR EVENTS IMPACTING BRENT OIL PRICES (1987-2022)')
        print('=' * 100)
        print(events_df.to_string(index=False))
        print('=' * 100)
        return events_df

# Execute event compilation
event_compiler = EventCompiler()
events = event_compiler.compile_events()
events_df = event_compiler.display_events()
events_df = event_compiler.export_to_csv()

# Summary statistics
print('\nEVENT CATEGORY SUMMARY')
print(events_df['category'].value_counts())
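Several compiled dates fall on days with no price observation. For example, 2014-06-01 and 2020-02-01 are weekend days, and New Year's Day 2004 typically has no quote. Events therefore need to be mapped onto the trading calendar before they can be compared with prices or change points. The following is a minimal sketch using `pandas.merge_asof` with `direction='forward'`, which attaches each event to the first trading day on or after its date. The function name and the `event_date` and `trading_date` columns are hypothetical.

```python
import pandas as pd

def align_events_to_trading_days(events, prices, date_col="Date"):
    """Attach each event to the first trading day on or after its calendar date.

    Events on weekends or holidays move forward to the next price row;
    events after the last price row get NaT in 'trading_date'.
    """
    ev = events.copy()
    ev["event_date"] = pd.to_datetime(ev["date"]).astype("datetime64[ns]")
    ev = ev.sort_values("event_date")  # merge_asof requires sorted keys
    trading = (
        prices[[date_col]]
        .rename(columns={date_col: "trading_date"})
        .astype({"trading_date": "datetime64[ns]"})
        .sort_values("trading_date")
    )
    return pd.merge_asof(
        ev,
        trading,
        left_on="event_date",
        right_on="trading_date",
        direction="forward",
    )
```

In the notebook this would be called as `align_events_to_trading_days(events_df, df)`. Both keys are cast to `datetime64[ns]` so that `merge_asof` sees matching dtypes.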

### Step 1.4: Document Assumptions and Limitations

In [None]:
# Cell 5: Assumptions and Limitations Documentation
class AssumptionsAndLimitations:
    """
    Document key assumptions and limitations of the analysis.
    """
    
    def __init__(self):
        self.logger = logging.getLogger(__name__)
    
    def document(self):
        doc = {
            'assumptions': [
                {
                    'area': 'Data Quality',
                    'assumption': 'Brent oil price data is accurate and complete',
                    'rationale': 'Data sourced from established financial databases',
                    'risk': 'Low - industry standard data source'
                },
                {
                    'area': 'Model Specification',
                    'assumption': 'A single change point exists in the mean price',
                    'rationale': 'Simplified model for interpretability; extended models can test multiple change points',
                    'risk': 'Medium - real data may have multiple regime shifts'
                },
                {
                    'area': 'Statistical Independence',
                    'assumption': 'Daily price changes are conditionally independent given the regime',
                    'rationale': 'Simplifying assumption for tractable inference',
                    'risk': 'Medium - oil prices show autocorrelation and clustering'
                },
                {
                    'area': 'Normality',
                    'assumption': 'Log returns approximately follow a normal distribution',
                    'rationale': 'Standard assumption in financial modeling',
                    'risk': 'Medium - market data often exhibits heavy tails'
                },
                {
                    'area': 'Event Timing',
                    'assumption': 'Events occur on documented dates with immediate market impact',
                    'rationale': 'Consistent with the efficient-market hypothesis (prices rapidly incorporate new information)',
                    'risk': 'High - market reactions may lag or anticipate events'
                },
                {
                    'area': 'Causal Attribution',
                    'assumption': 'Detected change points can be attributed to identified events',
                    'rationale': 'For hypothesis generation and interpretation',
                    'risk': 'High - correlation does not imply causation'
                }
            ],
            'limitations': [
                {
                    'category': 'Temporal Scope',
                    'limitation': 'Analysis covers 1987-2022; pre-1987 dynamics may differ',
                    'impact': 'Results apply primarily to modern energy markets'
                },
                {
                    'category': 'Univariate Analysis',
                    'limitation': 'Only examines Brent oil prices; ignores other commodities and macroeconomic variables',
                    'impact': 'Cannot capture spillovers or multivariate relationships'
                },
                {
                    'category': 'Event Data Quality',
                    'limitation': 'Event dates are approximate; exact market impact timing is uncertain',
                    'impact': 'Change point may lead or lag reported event date'
                },
                {
                    'category': 'Model Simplicity',
                    'limitation': 'The basic Bayesian change point model assumes a constant variance within each regime',
                    'impact': 'Cannot capture volatility clustering or other time-varying volatility inside a regime'
                },
                {
                    'category': 'Inference Uncertainty',
                    'limitation': 'Posterior estimates carry parameter uncertainty plus Monte Carlo error from finite MCMC sampling',
                    'impact': 'Results should be interpreted probabilistically, not deterministically'
                },
                {
                    'category': 'Confounding Factors',
                    'limitation': 'Cannot isolate individual event effects when multiple events occur simultaneously',
                    'impact': 'Attribution to specific causes becomes ambiguous'
                }
            ],
            'correlation_vs_causation': {
                'key_distinction': 'A detected change point coinciding with an event does NOT prove causation',
                'correlation': 'Temporal association between change point and event',
                'causation': 'Event directly caused the price regime shift',
                'requirements_for_causation': [
                    'Temporal precedence (event must occur before effect)',
                    'Covariation (change point timing matches event)',
                    'No plausible alternative explanations',
                    'Mechanism (clear economic rationale)',
                    'Dose-response relationship (larger shocks produce larger effects)'
                ],
                'approach': 'This analysis identifies correlations and formulates hypotheses; causation requires additional evidence (e.g. instrumental-variable models, natural experiments, expert validation)'
            }
        }
        
        return doc
    
    def export_to_file(self, doc, filepath='assumptions_and_limitations.txt'):
        """
        Export detailed documentation to file.
        """
        with open(filepath, 'w') as f:
            f.write('ASSUMPTIONS AND LIMITATIONS DOCUMENTATION\n')
            f.write('='*80 + '\n\n')
            
            f.write('KEY ASSUMPTIONS\n')
            f.write('-'*80 + '\n')
            for i, assumption in enumerate(doc['assumptions'], 1):
                f.write(f"\n{i}. {assumption['area']}\n")
                f.write(f"   Assumption: {assumption['assumption']}\n")
                f.write(f"   Rationale: {assumption['rationale']}\n")
                f.write(f"   Risk Level: {assumption['risk']}\n")
            
            f.write('\n\nKEY LIMITATIONS\n')
            f.write('-'*80 + '\n')
            for i, limitation in enumerate(doc['limitations'], 1):
                f.write(f"\n{i}. {limitation['category']}\n")
                f.write(f"   Limitation: {limitation['limitation']}\n")
                f.write(f"   Impact: {limitation['impact']}\n")
            
            f.write('\n\nCORRELATION VS CAUSATION\n')
            f.write('-'*80 + '\n')
            cv = doc['correlation_vs_causation']
            f.write(f"\nKey Distinction: {cv['key_distinction']}\n\n")
            f.write(f"Correlation: {cv['correlation']}\n")
            f.write(f"Causation: {cv['causation']}\n\n")
            f.write("Requirements for Establishing Causation:\n")
            for j, req in enumerate(cv['requirements_for_causation'], 1):
                f.write(f"  {j}. {req}\n")
            f.write(f"\nApproach: {cv['approach']}\n")
        
        self.logger.info(f'Assumptions and limitations exported to {filepath}')

# Execute documentation
doc_module = AssumptionsAndLimitations()
doc_content = doc_module.document()
doc_module.export_to_file(doc_content)

# Display summary
print('\nKEY ASSUMPTIONS SUMMARY')
print('='*80)
for assumption in doc_content['assumptions']:
    print(f"\n{assumption['area']}:")
    print(f"  • {assumption['assumption']}")
    print(f"  • Risk: {assumption['risk']}")

print('\n\nCORRELATION VS CAUSATION - KEY POINT')
print('='*80)
print(doc_content['correlation_vs_causation']['key_distinction'])
print(f"\nThis analysis: {doc_content['correlation_vs_causation']['approach']}")
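Taken together, the Model Specification, Statistical Independence, and Normality assumptions describe a concrete model. Under that model, observations are independent Normal draws with mean mu1 before an unknown index tau and mean mu2 from tau onward. Task 2 will sample tau with MCMC in PyMC. For a single change point, the posterior over tau can also be enumerated exactly, which gives a cheap cross-check.

The following is a minimal sketch that assumes a known noise level sigma, flat priors on the two means, and a uniform prior on tau. These are simplifications made here, not the project's final model. Integrating out the means gives p(tau | y) proportional to (n1 * n2)^(-1/2) * exp(-(S1 + S2) / (2 * sigma^2)). Here n_k is the size of regime k and S_k is its within-segment sum of squared deviations. The function name `changepoint_posterior` is hypothetical.

```python
import numpy as np

def changepoint_posterior(y, sigma):
    """Exact posterior over a single change-point index tau (simplified sketch).

    Model: y[t] ~ Normal(mu1, sigma) for t < tau, Normal(mu2, sigma) for t >= tau,
    tau ~ Uniform{1, ..., n-1}, flat priors on mu1 and mu2, sigma known.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 2:
        raise ValueError("need at least two observations")
    csum, csum2 = np.cumsum(y), np.cumsum(y ** 2)
    taus = np.arange(1, n)                 # tau = number of points in regime 1
    n1 = taus.astype(float)
    n2 = n - n1
    s1, s1_sq = csum[taus - 1], csum2[taus - 1]   # sums over y[:tau]
    s2, s2_sq = csum[-1] - s1, csum2[-1] - s1_sq  # sums over y[tau:]
    # Within-segment sums of squares: sum(y^2) - (sum y)^2 / n_k
    ss = (s1_sq - s1 ** 2 / n1) + (s2_sq - s2 ** 2 / n2)
    log_post = -0.5 * np.log(n1 * n2) - ss / (2.0 * sigma ** 2)
    log_post -= log_post.max()             # stabilise before exponentiating
    post = np.exp(log_post)
    return taus, post / post.sum()

# Synthetic series with a mean shift after 60 points (illustration only)
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(3.0, 1.0, 40)])
taus, post = changepoint_posterior(y, sigma=1.0)
print("MAP change point index:", taus[np.argmax(post)])
```

For the real series, `sigma` could be set to the sample standard deviation of the input. The input `y` would be log returns or prices, depending on what the stationarity tests support.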

## Part 2: Time Series Properties Analysis

In [None]:
# Cell 6: Time Series Properties Analysis
class TimeSeriesAnalyzer:
    """
    Analyze key properties of the Brent oil price time series.
    """
    
    def __init__(self, data):
        self.data = data
        self.logger = logging.getLogger(__name__)
    
    def calculate_log_returns(self):
        """
        Calculate log returns for stationarity analysis.
        """
        self.logger.info('Calculating log returns')
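        # r_t = ln(P_t / P_{t-1}). The first observation has no predecessor, so it
        # stays NaN instead of being filled with an artificial zero return
        # (pandas statistics skip NaN, and the stationarity tests call dropna()).
        self.data['Log_Returns'] = np.log(self.data['Price'] / self.data['Price'].shift(1))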
        return self.data
    
    def plot_price_series(self):
        """
        Visualize raw price series over time.
        """
        plt.figure(figsize=(16, 6))
        plt.plot(self.data['Date'], self.data['Price'], linewidth=1.5, color='darkblue')
        plt.title('Brent Crude Oil Prices (1987-2022)', fontsize=14, fontweight='bold')
        plt.xlabel('Year')
        plt.ylabel('Price (USD per barrel)')
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.savefig('01_brent_price_series.png', dpi=300, bbox_inches='tight')
        plt.show()
        self.logger.info('Price series plot saved')
    
    def plot_log_returns(self):
        """
        Visualize log returns and volatility clustering.
        """
        fig, axes = plt.subplots(2, 1, figsize=(16, 10))
        
        # Log returns
        axes[0].plot(self.data['Date'], self.data['Log_Returns'], linewidth=0.5, color='darkgreen')
        axes[0].set_title('Daily Log Returns of Brent Oil Prices', fontsize=12, fontweight='bold')
        axes[0].set_ylabel('Log Return')
        axes[0].grid(True, alpha=0.3)
        
        # Rolling volatility
        rolling_vol = self.data['Log_Returns'].rolling(window=30).std()
        axes[1].plot(self.data['Date'], rolling_vol, linewidth=1, color='darkred')
        axes[1].set_title('30-Day Rolling Volatility', fontsize=12, fontweight='bold')
        axes[1].set_ylabel('Standard Deviation')
        axes[1].set_xlabel('Year')
        axes[1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.savefig('02_log_returns_volatility.png', dpi=300, bbox_inches='tight')
        plt.show()
        self.logger.info('Log returns and volatility plot saved')
    
    def summary_statistics(self):
        """
        Calculate descriptive statistics.
        """
        self.logger.info('Calculating summary statistics')
        
        stats = {
            'Price': {
                'Mean': self.data['Price'].mean(),
                'Median': self.data['Price'].median(),
                'Std Dev': self.data['Price'].std(),
                'Min': self.data['Price'].min(),
                'Max': self.data['Price'].max(),
                'Range': self.data['Price'].max() - self.data['Price'].min(),
                'CV': (self.data['Price'].std() / self.data['Price'].mean()) * 100
            },
            'Log_Returns': {
                'Mean': self.data['Log_Returns'].mean(),
                'Median': self.data['Log_Returns'].median(),
                'Std Dev': self.data['Log_Returns'].std(),
                'Min': self.data['Log_Returns'].min(),
                'Max': self.data['Log_Returns'].max(),
                'Skewness': self.data['Log_Returns'].skew(),
                # pandas returns excess (Fisher) kurtosis: 0 for a normal distribution
                'Excess Kurtosis': self.data['Log_Returns'].kurtosis()
            }
        }
        
        print('\nTIME SERIES SUMMARY STATISTICS')
        print('='*60)
        print('\nPRICE STATISTICS:')
        for key, value in stats['Price'].items():
            if key == 'CV':
                print(f'  {key}: {value:.2f}%')
            else:
                print(f'  {key}: {value:.4f}')
        
        print('\nLOG RETURNS STATISTICS:')
        for key, value in stats['Log_Returns'].items():
            print(f'  {key}: {value:.6f}')
        
        return stats

# Execute time series analysis
ts_analyzer = TimeSeriesAnalyzer(df.copy())
df = ts_analyzer.calculate_log_returns()
ts_analyzer.plot_price_series()
ts_analyzer.plot_log_returns()
stats = ts_analyzer.summary_statistics()
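The log-return skewness and kurtosis printed above are the ingredients of the Jarque-Bera test, a direct check of the Normality assumption. The statistic is JB = n/6 * (S^2 + K^2/4), where S is the skewness and K is the excess kurtosis, and it is compared against a chi-squared distribution with 2 degrees of freedom. The following is a minimal sketch that computes JB by hand and cross-checks it against `scipy.stats.jarque_bera`. SciPy is an added dependency that is not listed in the workflow tools.

JB uses population (biased) moment estimators, so its S and K differ slightly from the pandas `skew()` and `kurtosis()` values, which apply small-sample corrections. The Student-t sample below is synthetic and stands in for the real returns. The imports avoid the name `stats`, which the notebook already uses for the summary dictionary.

```python
import numpy as np
from scipy.stats import chi2, jarque_bera

def jarque_bera_manual(x):
    """Return (JB statistic, asymptotic p-value) for a 1-D sample, ignoring NaNs."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]                       # e.g. the undefined first log return
    d = x - x.mean()
    m2 = np.mean(d ** 2)                      # population variance
    skewness = np.mean(d ** 3) / m2 ** 1.5
    excess_kurtosis = np.mean(d ** 4) / m2 ** 2 - 3.0
    jb = x.size / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    p_value = chi2.sf(jb, df=2)               # chi-squared(2) under normality
    return jb, p_value

# Synthetic heavy-tailed "returns" (Student-t with 3 degrees of freedom)
rng = np.random.default_rng(42)
returns = 0.01 * rng.standard_t(df=3, size=5000)

jb, p = jarque_bera_manual(returns)
jb_ref, p_ref = jarque_bera(returns)
print(f"manual JB = {jb:.1f} (p = {p:.3g}); scipy JB = {jb_ref:.1f}")
```

On the real data, `jarque_bera_manual(df['Log_Returns'])` would test the Normality assumption directly. A very small p-value would support the heavy-tails risk noted in the assumptions.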

### Step 2.1: Stationarity Testing

In [None]:
# Cell 7: Stationarity Testing
from statsmodels.tsa.stattools import adfuller, kpss

class StationarityTester:
    """
    Test for stationarity using ADF and KPSS tests.
    """
    
    def __init__(self, data):
        self.data = data
        self.logger = logging.getLogger(__name__)
    
    def adf_test(self, series, name):
        """
        Perform Augmented Dickey-Fuller test.
        Null hypothesis: unit root (non-stationary)
        """
        self.logger.info(f'Running ADF test on {name}')
        result = adfuller(series.dropna(), autolag='AIC')
        
        return {
            'name': name,
            'test': 'ADF',
            'test_statistic': result[0],
            'p_value': result[1],
            'critical_values': result[4],
            'stationary': result[1] < 0.05
        }
    
    def kpss_test(self, series, name):
        """
        Perform KPSS test.
        Null hypothesis: stationarity
        """
        self.logger.info(f'Running KPSS test on {name}')
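        # statsmodels interpolates the KPSS p-value from a lookup table, so it is
        # truncated to [0.01, 0.10] and an InterpolationWarning may be emitted
        result = kpss(series.dropna(), regression='c')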
        
        return {
            'name': name,
            'test': 'KPSS',
            'test_statistic': result[0],
            'p_value': result[1],
            'critical_values': result[3],
            'stationary': result[1] > 0.05
        }
    
    def perform_tests(self):
        """
        Perform all stationarity tests.
        """
        results = []
        
        # Test on price series
        results.append(self.adf_test(self.data['Price'], 'Price Series'))
        results.append(self.kpss_test(self.data['Price'], 'Price Series'))
        
        # Test on log returns
        results.append(self.adf_test(self.data['Log_Returns'], 'Log Returns'))
        results.append(self.kpss_test(self.data['Log_Returns'], 'Log Returns'))
        
        return results
    
    def display_results(self, results):
        """
        Display test results in readable format.
        """
        print('\nSTATIONARITY TEST RESULTS')
        print('='*80)
        
        for result in results:
            print(f"\n{result['name']} - {result['test']} Test")
            print('-'*60)
            print(f"  Test Statistic: {result['test_statistic']:.6f}")
            print(f"  P-value: {result['p_value']:.6f}")
            print(f"  Stationary (α=0.05): {result['stationary']}")
            
            if result['test'] == 'ADF':
                print("  Interpretation: Reject H0 (unit root) at 5% level - likely stationary" if result['stationary']
                      else "  Interpretation: Fail to reject H0 (unit root) - likely non-stationary")
            else:
                print("  Interpretation: Fail to reject H0 (stationarity) at 5% level - likely stationary" if result['stationary']
                      else "  Interpretation: Reject H0 (stationarity) - likely non-stationary")
        
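        print('\n' + '='*80)
        print('\nSUMMARY FOR MODELING:')
        # Summarize the actual test outcomes instead of hard-coding the conclusion
        for name in dict.fromkeys(r['name'] for r in results):
            verdicts = [f"{r['test']}: {'stationary' if r['stationary'] else 'non-stationary'}"
                        for r in results if r['name'] == name]
            print(f"  • {name}: {', '.join(verdicts)}")
        print('  → Prices are typically non-stationary and log returns stationary; if the tests agree,')
        print('    use log returns (or differencing) in the change point model')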

# Execute stationarity tests
stationarity_tester = StationarityTester(df)
test_results = stationarity_tester.perform_tests()
stationarity_tester.display_results(test_results)
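ADF and KPSS have opposite null hypotheses, so their verdicts are most informative when read together. The following is a minimal sketch of a common four-case heuristic. When the two tests agree, the answer is clear. When ADF fails to reject a unit root while KPSS does not reject stationarity, the series is often read as trend-stationary. The reverse pattern is often read as difference-stationary.

Conflicting results on the price series are worth noting here. Structural breaks, which are the change points this project is looking for, are known to reduce the power of unit-root tests. The helper names are illustrative. `joint_verdicts` expects result dictionaries shaped like those returned by `StationarityTester.perform_tests`.

```python
def combine_adf_kpss(adf_stationary, kpss_stationary):
    """Heuristic joint reading of ADF (H0: unit root) and KPSS (H0: stationarity)."""
    if adf_stationary and kpss_stationary:
        return "stationary (both tests agree)"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary (both tests agree)"
    if kpss_stationary:
        # ADF cannot reject a unit root, KPSS cannot reject stationarity
        return "conflicting: possibly trend-stationary; detrend and retest"
    # ADF rejects a unit root, KPSS rejects stationarity
    return "conflicting: possibly difference-stationary; difference and retest"

def joint_verdicts(results):
    """Group StationarityTester result dicts by series name and combine them."""
    by_series = {}
    for r in results:
        by_series.setdefault(r["name"], {})[r["test"]] = r["stationary"]
    return {
        name: combine_adf_kpss(tests["ADF"], tests["KPSS"])
        for name, tests in by_series.items()
        if "ADF" in tests and "KPSS" in tests
    }

# Hypothetical results in the same shape as perform_tests() output
example = [
    {"name": "Price Series", "test": "ADF", "stationary": False},
    {"name": "Price Series", "test": "KPSS", "stationary": False},
    {"name": "Log Returns", "test": "ADF", "stationary": True},
    {"name": "Log Returns", "test": "KPSS", "stationary": True},
]
for name, verdict in joint_verdicts(example).items():
    print(f"{name}: {verdict}")
```

In the notebook, `joint_verdicts(test_results)` would apply the same reading to the real test output.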

## Summary and Next Steps

In [None]:
# Cell 8: Task 1 Summary and Deliverables Checklist
print('\n' + '='*80)
print('TASK 1 COMPLETION SUMMARY')
print('='*80)

print(f'''
✓ DELIVERABLES COMPLETED:

1. Data Analysis Workflow
   - {len(workflow.workflow_steps)}-phase workflow documented
   - All steps, tools, and dependencies defined
   - Ready for implementation

2. Data Preparation & Validation
   - Data loaded: {len(df)} records
   - Date range: {df['Date'].min().date()} to {df['Date'].max().date()}
   - Price range: ${df['Price'].min():.2f} - ${df['Price'].max():.2f}
   - Missing values: Handled
   - Data quality: Validated ✓

3. Major Events Compilation
   - {len(events_df)} key geopolitical and economic events compiled
   - Categories: Conflicts, OPEC decisions, financial crises
   - Exported to: major_events.csv

4. Assumptions & Limitations Documentation
   - {len(doc_content['assumptions'])} core assumptions documented with risk levels
   - {len(doc_content['limitations'])} limitations identified with impact assessment
   - Correlation vs Causation framework explained
   - File: assumptions_and_limitations.txt

5. Time Series Analysis
   - Summary statistics calculated
   - Stationarity tests completed
   - Visualizations generated:
     * Price series plot
     * Log returns and volatility plots

6. Ready for Next Phase:
   - Data prepared and validated
   - Event data compiled and structured
   - Assumptions documented
   - Properties understood for modeling

NEXT STEPS:
→ Move to Task 2: Bayesian Change Point Modeling
→ Build PyMC model with identified properties
→ Run MCMC sampling and analyze results

FILES GENERATED:
- major_events.csv
- assumptions_and_limitations.txt
- 01_brent_price_series.png
- 02_log_returns_volatility.png
- task_1_analysis.log
''')

print('='*80)
logger.info('Task 1: Foundation Analysis - COMPLETED')
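Workflow step 6 will pair detected change points with the compiled events. The Event Timing assumption warns that the market may react before or after the documented date. The following is a minimal sketch of tolerance-window matching with `pandas.merge_asof(direction='nearest')` that reports the signed lag in days. The function name and the 30-day default window are arbitrary assumptions. A match within the window is only a temporal association, not evidence of causation.

```python
import pandas as pd

def match_change_points(change_points, events, window_days=30):
    """Pair each change-point date with the nearest event within +/- window_days.

    lag_days > 0: the change point follows the event (delayed reaction).
    lag_days < 0: the change point precedes the event (anticipation or another cause).
    Change points with no event inside the window keep NaN event fields.
    """
    cps = pd.DataFrame({
        "change_point": pd.to_datetime(pd.Series(change_points)).astype("datetime64[ns]")
    }).sort_values("change_point")
    ev = events.assign(
        event_date=pd.to_datetime(events["date"]).astype("datetime64[ns]")
    ).sort_values("event_date")
    matched = pd.merge_asof(
        cps,
        ev,
        left_on="change_point",
        right_on="event_date",
        direction="nearest",
        tolerance=pd.Timedelta(days=window_days),
    )
    matched["lag_days"] = (matched["change_point"] - matched["event_date"]).dt.days
    return matched
```

Once Task 2 produces change-point dates, `match_change_points(cp_dates, events_df)` would return one row per change point. In that call, `cp_dates` is a hypothetical list of dates. Each row includes the nearest event and its lag, or NaN when no event falls inside the window.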