# Time Series Forecasting with ARIMA

This notebook demonstrates a step-by-step approach to forecasting time series data using the ARIMA model. 
We cover data preparation, stationarity checking, parameter selection, model training, forecasting, 
and evaluation against actual data for 2023.


In [1]:
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error


## Data Preparation

Load and prepare the time series data for analysis.


In [2]:
df_2019 = pd.read_csv('2019.csv')
df_2020 = pd.read_csv('2020.csv')
df_2021 = pd.read_csv('2021.csv')
df_2022 = pd.read_csv('2022.csv')
df_2023 = pd.read_csv('2023.csv')

df_train = pd.concat([df_2019, df_2020, df_2021, df_2022], ignore_index=True)
# Forward-fill missing values; fillna(method='ffill') is deprecated in recent pandas
df_train = df_train.ffill()
df_2023 = df_2023.ffill()

In [3]:
# Aggregate values per date. 'Date' and 'Value' are assumed column names;
# if this raises a KeyError, inspect df_train.columns and adjust them.
df_train['Date'] = pd.to_datetime(df_train['Date'])
data_agg = df_train.groupby('Date')['Value'].sum()


KeyError: 'Date'
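
The `KeyError` above indicates that the loaded CSVs have no column literally named `Date`. The following is a minimal sketch of one way to build a date-indexed series. `to_daily_series` is a hypothetical helper, and `Date` and `Value` are assumed column names; swap in whatever `df_train.columns` actually reports.

```python
import pandas as pd

def to_daily_series(df, date_col="Date", value_col="Value"):
    """Sum value_col per date and return a Series with a sorted DatetimeIndex.

    date_col and value_col are assumed names; check df.columns first.
    """
    missing = [c for c in (date_col, value_col) if c not in df.columns]
    if missing:
        raise KeyError(f"Missing columns {missing}; available: {list(df.columns)}")
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    return out.groupby(date_col)[value_col].sum().sort_index()

# Toy frame standing in for the real CSVs
demo = pd.DataFrame({
    "Date": ["2022-01-02", "2022-01-01", "2022-01-01"],
    "Value": [5.0, 1.0, 2.0],
})
demo_series = to_daily_series(demo)
print(demo_series)
```

Raising an explicit error that lists the available columns makes a naming mismatch easier to diagnose than the bare `KeyError`.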

## Stationarity Check

Check whether the aggregated series is stationary using the Augmented Dickey-Fuller (ADF) test.


In [None]:
def check_stationarity(timeseries):
    result = adfuller(timeseries, autolag='AIC')
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'    {key}: {value}')

check_stationarity(data_agg)


## Parameter Selection

Use ACF and PACF plots of the first-differenced series to choose the ARIMA orders p and q.


In [None]:
plt.figure(figsize=(12, 6))
plt.subplot(121)
plot_acf(data_agg.diff().dropna(), ax=plt.gca(), lags=20)
plt.subplot(122)
plot_pacf(data_agg.diff().dropna(), ax=plt.gca(), lags=20)
plt.tight_layout()
plt.show()
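
As a reading aid for the plots: a sharp cutoff in the PACF after some lag is commonly taken as a hint for the AR order p, and a cutoff in the ACF as a hint for the MA order q. The sketch below recomputes the sample ACF with NumPy and flags lags outside the approximate 95% band of ±1.96/√N. `sample_acf` and `significant_lags` are illustrative names, the series is synthetic, and this simple white-noise band may differ from the one `plot_acf` draws.

```python
import numpy as np

def sample_acf(x, nlags=20):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

def significant_lags(x, nlags=20):
    """Lags whose sample ACF falls outside the approximate 95% white-noise band."""
    bound = 1.96 / np.sqrt(len(x))
    acf_vals = sample_acf(x, nlags)
    return [k for k in range(1, nlags + 1) if abs(acf_vals[k]) > bound]

# Synthetic AR(1) series with coefficient 0.7
rng = np.random.default_rng(42)
noise = rng.normal(size=1000)
ar1 = np.zeros(1000)
for t in range(1, 1000):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]

acf_vals = sample_acf(ar1, nlags=5)
print(np.round(acf_vals, 2))
print(significant_lags(ar1, nlags=5))
```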


## Model Training and Forecasting

Fit an ARIMA(1, 1, 1) model to the aggregated series and forecast the next 12 periods.


In [None]:
model = ARIMA(data_agg, order=(1, 1, 1))
model_fit = model.fit()
forecast = model_fit.forecast(steps=12)

plt.figure(figsize=(14, 7))
plt.plot(data_agg.index, data_agg, label='Historical Aggregate Value')
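# Future dates for the 12 forecast steps (daily frequency by default); slicing off the first
# date drops the last observed one, replacing the `closed='right'` argument removed in pandas 2.0
future_index = pd.date_range(data_agg.index[-1], periods=13)[1:]
plt.plot(future_index, forecast, label='Forecasted Value', marker='o')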
plt.title('ARIMA Forecast')
plt.xlabel('Date')
plt.ylabel('Aggregated Value')
plt.legend()
plt.show()


## Evaluation

Compare the forecasted values against actual data for 2023.


In [None]:
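# Aggregate the 2023 data the same way as the training data (same assumed column names)
actual_2023 = df_2023.groupby('Date')['Value'].sum()
# Compare over the overlapping horizon only; the forecast covers 12 steps
n = min(len(actual_2023), len(forecast))
mse = mean_squared_error(actual_2023.iloc[:n], forecast.iloc[:n])
mae = mean_absolute_error(actual_2023.iloc[:n], forecast.iloc[:n])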

print(f'MSE: {mse}')
print(f'MAE: {mae}')
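
MSE is expressed in squared units, so RMSE is often easier to compare against the scale of the data. A small sketch with made-up arrays follows; `forecast_errors` is a hypothetical helper, and `actual` and `predicted` are placeholders rather than notebook output.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MSE, RMSE and MAE over the overlapping horizon."""
    n = min(len(actual), len(predicted))
    a = np.asarray(actual, dtype=float)[:n]
    p = np.asarray(predicted, dtype=float)[:n]
    err = a - p
    mse = float(np.mean(err ** 2))
    return {"mse": mse, "rmse": float(np.sqrt(mse)), "mae": float(np.mean(np.abs(err)))}

actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0]  # shorter on purpose to show the truncation

metrics = forecast_errors(actual, predicted)
print(metrics)
```

Truncating both inputs to the shared length avoids the shape mismatch that arises when the actual data covers more periods than the forecast.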
