# Task 3: Forecasting Brent Oil Prices

This notebook develops a forecasting model for Brent oil prices. The workflow is:
1. Checking the time series for stationarity with the Augmented Dickey-Fuller (ADF) test.
2. Splitting the data chronologically into training and test sets.
3. Building and evaluating a time series forecasting model, starting with ARIMA.

In [None]:
# Import necessary libraries
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from pmdarima import auto_arima

# Set the working directory to the project root so the relative data path resolves
os.chdir(r'c:\users\ermias.tadesse\10x\Oil-Price-Insights')

# Load the Brent Oil Prices dataset, keeping only the Price column and dropping missing values
file_path = 'Data/Raw/BrentOilPrices.csv'
data = pd.read_csv(file_path)
data = data[['Price']].dropna()


In [4]:
# Perform the ADF test for stationarity
result = adfuller(data['Price'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])

# Interpretation of the result
if result[1] > 0.05:
    print("The data is likely non-stationary. Consider differencing for stationary models.")
else:
    print("The data is likely stationary.")


ADF Statistic: -1.993856011392467
p-value: 0.2892735048934032
The data is likely non-stationary. Consider differencing for stationary models.


In [5]:
# Split chronologically into train and test sets (first 80% train, last 20% test)
train_size = int(len(data) * 0.8)
train, test = data.iloc[:train_size], data.iloc[train_size:]

print(f"Training data points: {len(train)}, Testing data points: {len(test)}")


Training data points: 7208, Testing data points: 1803
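The split above is positional rather than random, which matters for time series: shuffling would let the model see observations from the future of the test period. A minimal sketch of the same idea as a reusable helper follows; `chronological_split` is a hypothetical name, and the placeholder series only mirrors the row count reported above.

```python
import numpy as np
import pandas as pd


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into train/test parts without shuffling."""
    split_at = int(len(series) * train_fraction)
    return series.iloc[:split_at], series.iloc[split_at:]


# Placeholder series with the same number of rows as the notebook's data (7208 + 1803)
series = pd.Series(np.arange(9011, dtype=float), name="Price")
train_part, test_part = chronological_split(series)

print(len(train_part), len(test_part))
# Every training observation precedes every test observation
print(train_part.index.max() < test_part.index.min())
```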


### Step 3: Building the ARIMA Model

We will build an ARIMA model for forecasting Brent oil prices. The steps include:
1. Using `auto_arima` to find the best (p, d, q) parameters.
2. Fitting the ARIMA model with the training data.
3. Forecasting and evaluating the model performance on the test data.


In [None]:
# Use auto_arima to find the best (p, d, q) order for a non-seasonal ARIMA model
stepwise_model = auto_arima(train, start_p=1, start_q=1,
                            max_p=3, max_q=3,
                            d=None,          # let auto_arima choose the differencing order
                            seasonal=False,  # non-seasonal, so m and start_P are not needed
                            trace=True,
                            error_action='ignore',
                            suppress_warnings=True,
                            stepwise=True)

print(stepwise_model.summary())


ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

In [7]:
# Fit the ARIMA model with optimal parameters
p, d, q = stepwise_model.order
arima_model = ARIMA(train, order=(p, d, q))
arima_model_fit = arima_model.fit()

# Forecasting
forecast = arima_model_fit.forecast(steps=len(test))
forecast.index = test.index

# Plotting the forecast against actual prices
plt.figure(figsize=(12, 6))
plt.plot(train, label='Training Data')
plt.plot(test, label='Actual Prices')
plt.plot(forecast, label='Forecasted Prices', color='red')
plt.title("ARIMA Model Forecast vs Actual Prices")
plt.xlabel("Observation index")  # the Date column was dropped at load time, so the x-axis is the row number
plt.ylabel("Price")
plt.legend()
plt.show()


NameError: name 'stepwise_model' is not defined

In [8]:
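# Evaluate the model's performance on the test set
mse = mean_squared_error(test, forecast)
rmse = np.sqrt(mse)  # same units as the price, easier to interpret than MSE
mae = mean_absolute_error(test, forecast)

print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"Mean Absolute Error: {mae}")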


NameError: name 'forecast' is not defined
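Once a forecast is available, error metrics are easier to judge against a simple baseline. The sketch below compares a forecast with a naive "last observed price" baseline using NumPy only. All numbers are hypothetical toy values, not results from the Brent data, and the helper names `mae_score` and `rmse_score` are made up for illustration.

```python
import numpy as np


def mae_score(actual, predicted):
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - predicted)))


def rmse_score(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical toy values, for illustration only
actual = np.array([10.0, 12.0, 11.0, 13.0])          # test prices
model_forecast = np.array([10.5, 11.5, 11.5, 12.0])  # model output
last_train_price = 10.0                               # final training observation

# Naive baseline: repeat the last training price over the whole test horizon
naive_forecast = np.full_like(actual, last_train_price)

print("Model MAE:", mae_score(actual, model_forecast))   # 0.625
print("Naive MAE:", mae_score(actual, naive_forecast))   # 1.5
print("Model RMSE:", rmse_score(actual, model_forecast))
print("Naive RMSE:", rmse_score(actual, naive_forecast))
```

A model that does not beat the naive baseline on the test set is adding little beyond carrying the last price forward.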