# Oil Price Analysis

- Loading Oil Price Data

- Understanding the Data

- Simple Summary Statistics

- Handling Missing Values

- Handling Duplicates

- Handling Outliers, if Any

- Data Visualization

Import the necessary libraries and modules

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
# Add the 'scripts' directory to the Python path for module imports
sys.path.append(os.path.abspath(os.path.join('..', 'scripts')))

# Set max rows and columns to display
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)

# Import project modules (assumed to be defined in ../scripts)
from logger import SetupLogger
from data_preprocessor import DataPreprocessor
from data_visualizer import DataVisualizer

# Configure logging
logger = SetupLogger(log_file='../logs/notebooks.log').get_logger()

# Set default figure size for all plots
plt.rcParams['figure.figsize'] = (14, 7)


**Data Collection**

- Download the Brent oil price data

In [None]:
# Google Drive URL of the dataset
url = 'https://drive.google.com/file/d/1dJfhjX57bjvFnc1HHYVUhyW939QjIQE5/view?usp=drive_link'

# Setup the data preprocessor class
processor = DataPreprocessor(url, logger=logger)
# Load the data
price_data = processor.load_data()
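
The URL above is a Google Drive share link, which points at a viewer page rather than at the file itself. How `DataPreprocessor.load_data` resolves it is not shown in this notebook. As a minimal sketch, assuming the commonly used `uc?export=download` pattern, a hypothetical helper `drive_direct_url` could extract the file ID and build a direct-download link:

```python
import re


def drive_direct_url(share_url: str) -> str:
    """Hypothetical helper: convert a Drive share link to a direct-download link."""
    # The file ID sits between '/d/' and the next '/' in a share link
    match = re.search(r"/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError(f"No file ID found in {share_url!r}")
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"


share_url = (
    "https://drive.google.com/file/d/"
    "1dJfhjX57bjvFnc1HHYVUhyW939QjIQE5/view?usp=drive_link"
)
direct_url = drive_direct_url(share_url)
print(direct_url)
```

Large files may still trigger a confirmation page on Drive, so this pattern is best treated as a starting point.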

**Note**: Always check the <a href="../logs/notebooks.log">log</a> file for any log messages.

In [None]:
# Explore the first 10 rows
price_data.head(10)

In [None]:
# Explore the last 10 rows
price_data.tail(10)

### Data Cleaning and Preprocessing

Inspect the dataset for completeness and structure

In [None]:
processor.inspect(price_data)
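
The internals of `processor.inspect` live in the project's scripts and are not shown here. A rough sketch of the checks such an inspection typically covers (shape, data types, missing values, duplicate rows) might look like the following. The function name `summarize_frame` and the toy values are illustrative assumptions, not part of the project or the dataset.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect basic completeness and structure checks."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        # duplicated() compares column values only, not the index
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy frame with one missing value and one repeated price (not real data)
sample = pd.DataFrame({"Price": [1.0, 2.0, None, 2.0]})
summary = summarize_frame(sample)
print(summary)
```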

### Exploratory Data Analysis

In [None]:
# Create an instance of the DataVisualizer class
visualizer = DataVisualizer(price_data, logger = logger)

**Detect Outliers**

- Box plot

In [None]:
visualizer.plot_box()
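
A box plot conventionally draws its whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles and marks anything outside as an outlier. Whether `plot_box` uses that default is an assumption. The same rule can be checked numerically with a small sketch, where `iqr_outliers` is a hypothetical helper and the series holds toy values:

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Toy values (not real prices): only the last point lies far from the rest
toy = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 100.0])
flagged = iqr_outliers(toy)
print(flagged)
```

An empty result from such a check on the price column would agree with a box plot that shows no points beyond the whiskers.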

- No outliers detected in the price data

- Descriptive Statistics

In [None]:
# Visualize the price distribution
visualizer.plot_price_distribution()


**Time Series Analysis**
- Overall price trend

    - Create a line graph to visualize price trends over time.

In [None]:
visualizer.plot_price_over_time()


- **Seasonality Analysis**
    - Aggregate prices by year and visualize seasonal patterns.

In [None]:
visualizer.plot_yearly_average()
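
The aggregation behind a yearly-average plot can be sketched with a `groupby` on the index year. This assumes the prices are indexed by a `DatetimeIndex`, which this notebook does not confirm; the values below are toy data. Grouping by calendar month instead is one way to look for within-year patterns, which yearly averages smooth away.

```python
import pandas as pd

# Toy series with a DatetimeIndex (not real prices)
toy = pd.Series(
    [10.0, 20.0, 30.0, 50.0],
    index=pd.to_datetime(["2000-01-03", "2000-06-01", "2001-01-02", "2001-06-01"]),
    name="Price",
)

# Average price per calendar year
yearly_mean = toy.groupby(toy.index.year).mean()

# Average price per calendar month, pooled across years
monthly_mean = toy.groupby(toy.index.month).mean()

print(yearly_mean)
print(monthly_mean)
```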

- **Rolling Volatility Analysis**

    - Rolling standard deviation (e.g., 30-day) line plot of prices.

In [None]:
# 30-day rolling volatility
visualizer.plot_rolling_volatility(30)

In [None]:
# 7-day rolling volatility
visualizer.plot_rolling_volatility(7)
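
The rolling standard deviation described above can be computed directly with pandas. The sketch below assumes `plot_rolling_volatility(n)` wraps something similar, which is not confirmed by this notebook; `rolling_volatility` is a hypothetical name and the series holds toy values.

```python
import pandas as pd


def rolling_volatility(prices: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: sample standard deviation over a trailing window."""
    return prices.rolling(window=window).std()


# Toy values (not real prices)
toy = pd.Series([1.0, 2.0, 3.0, 5.0, 8.0])
vol = rolling_volatility(toy, window=3)
print(vol)
```

The first `window - 1` entries are `NaN` because a full window is not yet available. A shorter window reacts faster to price shocks, while a longer one gives a smoother curve.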

**Seasonal Decomposition**

In [None]:
# Apply seasonal decomposition (model='multiplicative' suits seasonal effects proportional to the price level)
# period=365 assumes one observation per calendar day; adjust it if the data only covers trading days
result = seasonal_decompose(price_data['Price'], model='multiplicative', period=365)

# Plot decomposition
fig, axes = plt.subplots(4, 1, figsize=(15, 12), sharex=True)
components = ['Original Series', 'Trend', 'Seasonal', 'Residual']
series = [price_data['Price'], result.trend, result.seasonal, result.resid]

for ax, comp, ser in zip(axes, components, series):
    ax.plot(price_data.index, ser, label=comp, color='b' if comp == 'Original Series' else 'purple')
    ax.set_title(f'{comp} of Price Series')
    ax.set_ylabel('Price' if comp == 'Original Series' else '')
    ax.grid(True)
    ax.legend(loc='upper left')

# Improve layout and display the plot
fig.suptitle('Seasonal Decomposition of Brent Oil Price', fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.96])  # Adjust layout to fit the main title
plt.show()
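
For reference, the multiplicative model used above treats the observed price $y_t$ as a product of trend $T_t$, seasonal $S_t$, and residual $R_t$ components, whereas the additive model sums them. Taking logarithms turns the multiplicative form into an additive one:

$$
y_t = T_t \times S_t \times R_t
\quad\Longleftrightarrow\quad
\log y_t = \log T_t + \log S_t + \log R_t
$$

Under this model the seasonal and residual panels are ratios centered around 1 rather than price differences centered around 0.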

**Stationarity Analysis of Time Series Data**

- Apply the Augmented Dickey-Fuller (ADF) Test

In [None]:
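def test_stationarity(series, title, label, alpha=0.05):
    """Run the ADF test on `series`, print the results, and plot the series.

    The null hypothesis of the ADF test is that the series has a unit root
    (is non-stationary); a p-value below `alpha` rejects it.
    """
    adf_result = adfuller(series)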
    
    # Print ADF results
    print('ADF Statistic:', adf_result[0])
    print('p-value:', adf_result[1])
    print('Critical Values:')
    for key, value in adf_result[4].items():
        print(f'   {key}: {value}')
    
    # Interpretation
    if adf_result[1] < alpha:
        print("The ADF test suggests the series is stationary.")
    else:
        print("The ADF test suggests the series is not stationary.")

    # Plot the differenced series
    plt.figure(figsize=(10, 6))
    plt.plot(series, label=label)
    plt.title(title)
    plt.xlabel('Time')
    plt.ylabel('Differenced Price')
    plt.legend()
    plt.grid()
    plt.show()
    
    return adf_result[0], adf_result[1]  # Returning ADF statistic and p-value for further use

# Assuming 'price_data' is your DataFrame with a 'Price' column
data = price_data['Price']

# First differencing (lag 1)
data_diff = data.diff().dropna()
test_stationarity(data_diff, title='First Differenced Brent Oil Prices', label='First Differenced Series')

In [None]:
# Log transformation followed by first differencing (log returns)
log_data = np.log(price_data['Price'])
log_data_diff = log_data.diff().dropna()
test_stationarity(log_data_diff, title='Log Differenced Brent Oil Prices', label='Log Differenced Series')
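
Differencing the log of the price yields log returns, which approximate percentage changes for small moves and add up across periods. A small sketch on toy values (not real prices) illustrates both properties:

```python
import numpy as np
import pandas as pd

# Toy values (not real prices)
toy = pd.Series([100.0, 101.0, 99.0, 102.0])

log_returns = np.log(toy).diff().dropna()
simple_returns = toy.pct_change().dropna()

# Largest gap between log returns and simple percentage returns
max_gap = float((log_returns - simple_returns).abs().max())

# Log returns sum to the log of the overall price ratio
total_log_return = float(log_returns.sum())

print(max_gap, total_log_return)
```

This is one reason log differencing is a common choice for price series whose fluctuations scale with the price level.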
