<a target="_blank" href="https://colab.research.google.com/github/ZHAW-ZAV/TSO-FS25-students/blob/main/03_forecasting/03_03_arima.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import statsmodels.api as sm

import sys
import os

IN_COLAB = "google.colab" in sys.modules

The code above imports the required libraries and checks whether the notebook is running in Colab. Do not modify it.

***

# TSO Semester Week 4: ARIMA Models

This exercise covers the topics from the *"ARIMA Models"* section of the **TSO forecasting script**. You will import and process time series data, explore it visually, and fit *ARIMA* and *SARIMA* models for prediction.

This exercise consists of the following five parts:
1. Importing and Processing Time Series Data
2. Time Series Visualization
3. Autocorrelation Plot (ACF)
4. ARIMA model
5. SARIMA model

***
## PART 1: Importing and Processing Time Series Data


### Tasks:
1. Import the *CO2* data set available through the *statsmodels* library, which contains monthly observations of CO2 levels starting from 1959-01.
2. Transform the *pandas* dataframe as follows:
- Create a *Month* column that holds the datetime timestamp of each observation, and set it as the index.
- Drop the initial time column, which is a float and not a datetime object.
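The transformation described above could look roughly like the sketch below. It runs on a small synthetic table, not the real data: the column names `time` and `value` and the variable `toy` are assumptions chosen for illustration, so check the actual column names with `data.head()` first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CO2 table: a decimal-year `time` column
# and a `value` column (synthetic numbers, not real measurements).
n_months = 24
toy = pd.DataFrame({
    "time": 1959 + np.arange(n_months) / 12,
    "value": 315 + 0.1 * np.arange(n_months),
})

# One timestamp per row, starting in 1959-01, at month-start frequency.
toy["Month"] = pd.date_range(start="1959-01-01", periods=len(toy), freq="MS")

# Use the datetime column as index and drop the float time column.
toy = toy.set_index("Month").drop(columns="time")

print(toy.head())
```

Building the index with `pd.date_range` avoids rounding issues that can appear when converting decimal years to dates by hand, but it relies on the assumption that the series has no missing months.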

### Import CO2 dataset

In [None]:
# Load the CO2 dataset
data = sm.datasets.get_rdataset('co2', 'datasets').data
data.head()

### Transform the dataframe

In [None]:
# Transform the time column to timestamp and declare it as index

# Visualize the dataframe

***
## PART 2: Time Series Visualization

### Tasks:
1. Add a column to the data for the *rolling mean* over 12 months.
2. Add a column to the data for the *rolling standard deviation* over 12 months.
3. Plot the original time series and the rolling mean.
4. Add the rolling standard deviation as a shaded band around the rolling mean using matplotlib's *fill_between* function.
5. Add appropriate labels and grid lines to enhance readability of your plots.
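As a rough illustration of these steps, the sketch below computes 12-month rolling statistics on a synthetic monthly series and draws the standard deviation as a band. The series, the column names `rolling_mean` and `rolling_std`, and the variable `toy` are assumptions made for this example only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic monthly series (linear trend + yearly cycle), for illustration only.
idx = pd.date_range("1959-01-01", periods=120, freq="MS")
t = np.arange(len(idx))
toy = pd.DataFrame({"value": 315 + 0.1 * t + 3 * np.sin(2 * np.pi * t / 12)}, index=idx)

# 12-month rolling statistics; the first 11 rows are NaN (incomplete window).
toy["rolling_mean"] = toy["value"].rolling(window=12).mean()
toy["rolling_std"] = toy["value"].rolling(window=12).std()

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(toy.index, toy["value"], label="Original series")
ax.plot(toy.index, toy["rolling_mean"], label="Rolling mean (12 months)")
# Shaded band: rolling mean plus/minus one rolling standard deviation.
ax.fill_between(
    toy.index,
    toy["rolling_mean"] - toy["rolling_std"],
    toy["rolling_mean"] + toy["rolling_std"],
    alpha=0.3,
    label="Rolling mean +/- 1 std",
)
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.grid(True)
ax.legend()
plt.show()
```

A window of 12 months spans exactly one yearly cycle, so the rolling mean averages out a yearly seasonal pattern and mostly shows the trend.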

In [None]:
# Calculate the rolling mean and rolling standard deviation over 12 months

# Plot the original time series, the rolling mean and the rolling standard deviation
plt.figure(figsize=(20, 6))
### Plot TS
### Plot rolling mean
### Plot rolling std
plt.title("Original Time Series of CO2 Concentration")
plt.xlabel('Time')
plt.ylabel('CO2 Concentration in ppmv')
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.legend()
plt.show()

### Questions:
1. Does the time series have a trend?
2. Does the time series have a seasonal component?
3. Does the variance depend on the time of observation?
4. Is the time series stationary?

*** 
## PART 3: Autocorrelation Plot (ACF)

### Tasks:
1. Use the *statsmodels* function called *plot_acf* to plot the autocorrelation plot of the time series.
2. Add appropriate labels and grid lines to enhance readability of your plots.


In [None]:
# import the autocorrelation function plot
from statsmodels.graphics.tsaplots import plot_acf

# Plot the autocorrelogram
fig, ax = plt.subplots(figsize=(12, 6))
### Plot ACF

plt.xlabel('Lags h', fontsize = 14)
plt.title('Autocorrelation Function with Confidence Interval', fontsize = 14)
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

### Questions - What can you conclude about:
- The Trend?
- The Seasonality?
- The Stationarity?

### Make assumptions about the type of model to use:
- AR models / ARMA models (without the integrate part) / ARIMA models / SARIMA models?

*** 
## PART 4: ARIMA Model

### Tasks:
1. Use the *auto_arima* function from the *pmdarima* library to fit an ARIMA model on the time series.
2. Print the summary of the fitting.
3. Plot the initial time series, the fitted model, the prediction for the next 48 months and the 95% confidence interval for the prediction.
4. Use the *plot_diagnostics* method to evaluate the residuals of the fitted model.

In [None]:
# import package
import pmdarima as pm

# Automatically select the best ARIMA model
model_arima = 

# Print the results of the fitting
print(model_arima.summary())

In [None]:
forecast_steps = 48
# Make a prediction

# Extend the index for plotting
forecast_index = pd.date_range(start=data.index[-1] + pd.Timedelta(days=30),
                               periods=forecast_steps, freq='M')

# Plot the original data and forecasted data
plt.figure(figsize=(12, 6))
### Plot original TS
### Plot ARIMA Fitted values
### Plot forecast
### Plot confidence interval using fill between
plt.title(f"Non seasonal ARIMA{model_arima.order} Model Forecast")
plt.legend()
plt.xlabel("Time")
plt.ylabel("Values")
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

In [None]:
# Evaluate the residuals of the model with plot_diagnostic
### Plot diagnostics
plt.show()

### Questions:
- Which ARIMA order is found? Does it make sense? Do you think the model is suitable?
- What can you say about the fit of the model regarding the true time series? What can you say about the prediction?
- What can you say about the white noise nature of the residuals? Should we explore a more complex model?

***
## PART 5: SARIMA Model

### Tasks:
1. Use the *auto_arima* function from the *pmdarima* library to fit a SARIMA model on the time series.
2. Print the summary of the fitting.
3. Plot the initial time series, the fitted model, the prediction for the next 48 months and the 95% confidence interval for the prediction.
4. Use the *plot_diagnostics* method to evaluate the residuals of the fitted model.

In [None]:
# Fit the best SARIMA model
model_sarima = 

# Print the results of the fitting
print(model_sarima.summary())


In [None]:
forecast_steps = 48
# Make a prediction

# Extend the index for plotting
forecast_index = pd.date_range(start=data.index[-1] + pd.Timedelta(days=30),
                               periods=forecast_steps, freq='M')

# Plot the original data and forecast
plt.figure(figsize=(12, 6))
### Plot original TS
### Plot Fitted values
### Plot forecast
### Plot confidence interval
plt.title(f"SARIMA{model_sarima.order}{model_sarima.seasonal_order} Model Forecast")
plt.legend()
plt.xlabel("Time")
plt.ylabel("Values")
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

In [None]:
# Analyse the residuals of the model with plot_diagnostic

plt.show()

### Questions:
- What are the orders for the non-seasonal and the seasonal part of the SARIMA? How do they compare to the fitted ARIMA?
- What can you say about the fitted model and the prediction? How do the results compare with those of the non-seasonal ARIMA?
- What can you say about the residuals?