<a target="_blank" href="https://colab.research.google.com/github/ZHAW-ZAV/TSO-FS25-students/blob/main/03_forecasting/03_02_stationarity.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import statsmodels.api as sm

import sys
import os

IN_COLAB = "google.colab" in sys.modules

The code above imports the required libraries; do not modify it.

***

# TSO Semester Week 4: Stationarity

In this exercise, we cover the topics of the **TSO forecasting script**, Sections *"Stationarity, Autocorrelation, and Differencing"*. The exercise therefore focuses on working with time series data, processing it, and exploring the stationarity property.

This exercise consists of the following seven parts:
1. Importing and Processing Time Series Data
2. Time Series Visualization
3. Autocorrelation Plot (ACF)
4. Differencing
5. Seasonal Differencing
6. Transforming into Stationary Time Series
7. Testing Stationarity

***
## PART 1: Importing and Processing Time Series Data


### Tasks:
1. Import the *El Nino - Sea Surface Temperatures* data set available in the *statsmodels* library.
2. Transform the *pandas* dataframe as follows:
- Convert the *YEAR* column to a datetime object
- Set the *YEAR* column as the dataframe index
- Stack the data so that all time series values are in one single column
- Convert the multi-index to a single index by combining year and month
- The final dataframe should have one index column describing the year and month, and one column containing the associated values.
3. Display the first few rows of the data set to verify the transformation.
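The reshaping steps listed above can be tried out on a small table before applying them to the real data. The following is a minimal sketch on a hypothetical two-year, two-month toy dataframe (the names `toy`, `long_format` and `month_number` are assumptions for illustration, not part of the exercise); it is one possible approach, not the reference solution.

```python
import pandas as pd

# Hypothetical toy table in a wide layout: one row per year, one column per month
toy = pd.DataFrame({"YEAR": [2000, 2001], "JAN": [1.0, 3.0], "FEB": [2.0, 4.0]})

# Convert the year to a datetime object and use it as the index
toy["YEAR"] = pd.to_datetime(toy["YEAR"].astype(int).astype(str), format="%Y")
toy = toy.set_index("YEAR")

# stack() moves the month columns into a second index level (year, month)
long_format = toy.stack().to_frame("Temp")

# Collapse each (year, month) pair into a single timestamp
month_number = {"JAN": 1, "FEB": 2}
long_format.index = pd.DatetimeIndex(
    [
        pd.Timestamp(year=year.year, month=month_number[month], day=1)
        for year, month in long_format.index
    ],
    name="Month",
)

print(long_format)
```

The result has one row per month in chronological order, which is the layout the later parts of the exercise rely on.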

### Import El Nino Dataset

In [None]:
data = sm.datasets.elnino.load_pandas().data
data.head()

### Transform the El Nino dataframe

In [None]:
#Transform YEAR column to datetime

#Set the index to the YEAR column

#Stack the data to have one single column of values
data = pd.DataFrame(data.stack())

#Transform the multi-index to a single index by aggregating year and month
months_dict = {"JAN":"01", "FEB":"02", "MAR":"03", "APR":"04", "MAY":"05", "JUN":"06", "JUL":"07", "AUG":"08", "SEP":"09", "OCT":"10", "NOV":"11", "DEC":"12"}
data.index = data.index.map(lambda x: x[0].strftime('%Y') + '-' + months_dict[x[1]])
data.index = pd.to_datetime(data.index, format='%Y-%m')

#Rename the index to Month and the column to Temp

#Visualize the data

***
## PART 2: Time Series Visualization

### Tasks:
1. Add a column to the data for the *rolling mean* over 12 months.
2. Add a column to the data for the *rolling standard deviation* over 12 months.
3. Plot the original time series and the rolling mean.
4. Add the rolling standard deviation as a band around the rolling mean using matplotlib's *fill_between* function.
5. Add appropriate labels and grid lines to enhance readability of your plots.
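As an illustration of the tasks above, the sketch below computes a 12-month rolling mean and standard deviation on a synthetic monthly series and draws the band with `fill_between`. The dataframe `demo` and its column names are assumptions made for this example; the synthetic data only stands in for the sea surface temperatures.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic monthly series (assumption): yearly cycle plus noise
rng = np.random.default_rng(0)
n = 60
demo = pd.DataFrame(
    {"value": np.sin(2 * np.pi * np.arange(n) / 12) + rng.normal(0, 0.2, n)},
    index=pd.date_range("2000-01-01", periods=n, freq="MS"),
)

# Rolling statistics over a 12-month window; the first 11 entries are NaN
demo["roll_mean"] = demo["value"].rolling(window=12).mean()
demo["roll_std"] = demo["value"].rolling(window=12).std()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(demo.index, demo["value"], label="value")
ax.plot(demo.index, demo["roll_mean"], label="12-month rolling mean")
# Shaded band of one rolling standard deviation around the rolling mean
ax.fill_between(
    demo.index,
    demo["roll_mean"] - demo["roll_std"],
    demo["roll_mean"] + demo["roll_std"],
    alpha=0.3,
    label="± 1 rolling std",
)
ax.grid(True)
ax.legend()
```

A window of 12 covers exactly one yearly cycle, so the rolling mean averages out the seasonal pattern and mainly shows the level of the series.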

In [None]:
# Calculate the rolling mean and rolling standard deviation over 12 months

# Plot the time series, the rolling mean and the rolling standard deviation
plt.figure(figsize=(20, 6))
###Temp
###Rolling Mean
###Rolling std
plt.title("Original Time Series of Sea Surface Temperature")
plt.xlabel('Time')
plt.ylabel('Temperature in °C')
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.legend()
plt.show()

### Questions:
1. Does the time series have a trend?
2. Does the time series have a seasonal component?
3. Does the variance depend on the time of observation?
4. Is the time series stationary?

*** 
## PART 3: Autocorrelation Plot (ACF)

### Tasks:
1. Use the *statsmodels* function called *plot_acf* to plot the autocorrelation plot of the time series.
2. Add appropriate labels and grid lines to enhance readability of your plots.


In [None]:
# import the autocorrelation function plot
from statsmodels.graphics.tsaplots import plot_acf

# Plot the autocorrelogram
fig, ax = plt.subplots(figsize=(12, 6))
###plot acf

# Enhance the plot
plt.xlabel('Lags h', fontsize = 14)
plt.title('Autocorrelation Function with Confidence Interval', fontsize = 14)
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

### Questions: What can you conclude about:
1. The Trend?
2. The Seasonality?
3. The Stationarity?

*** 
## PART 4: Differencing

### Tasks:
1. Apply first-order differencing to the time series.
2. Plot the differenced time series and its rolling average over 12 months.
3. Plot the ACF plot of the differenced time series.
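First-order differencing replaces each value by its change from the previous one, $y'_t = y_t - y_{t-1}$. The sketch below uses the *pandas* method `diff` on two synthetic, noise-free series (assumptions for illustration) to show that one difference removes a linear trend while a quadratic trend needs two.

```python
import numpy as np
import pandas as pd

t = np.arange(48)
index = pd.date_range("2000-01-01", periods=len(t), freq="MS")

# Synthetic series (assumptions): a linear and a quadratic trend without noise
linear = pd.Series(2.0 * t + 5.0, index=index)
quadratic = pd.Series(t.astype(float) ** 2, index=index)

# diff() computes y_t - y_(t-1); the first value has no predecessor and becomes NaN
linear_diff1 = linear.diff().dropna()
quadratic_diff1 = quadratic.diff().dropna()
quadratic_diff2 = quadratic.diff().diff().dropna()

print(linear_diff1.unique())        # constant: the linear trend is gone
print(quadratic_diff1.head(3))      # still increasing after one difference
print(quadratic_diff2.unique())     # constant after the second difference
```

Each additional difference shortens the series by one observation, which is one reason not to difference more often than necessary.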


In [None]:
# Difference the Time Series

# Plot the differenced time series and its rolling mean
plt.figure(figsize=(20, 6))
###Differenced TS
###Rolling mean of differenced TS
plt.title("Differenced Sea Surface Temperature")
plt.xlabel('Time')
plt.ylabel('Temperature in °C')
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.legend()
plt.show()

In [None]:
# Plot the autocorrelogram of the differenced time series
fig, ax = plt.subplots(figsize=(12, 6))
###Plot ACF

# Enhance the plot
plt.xlabel('Lags h', fontsize = 14)
plt.title('Autocorrelation Function with Confidence Interval', fontsize = 14)
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

### Questions:
- Is the Trend removed? The Seasonality?

### Task:
1. Difference again until the trend is removed. Use the ACF plot.

In [None]:
# Difference the data repeatedly until the trend is removed

# Plot the autocorrelogram of the differenced time series
fig, ax = plt.subplots(figsize=(12, 6))
###Plot ACF

plt.xlabel('Lags h', fontsize = 14)
plt.title('Autocorrelation Function with Confidence Interval', fontsize = 14)
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

### Questions:
- How many orders of differencing are needed to remove the trend?
- Is the data stationary?

***
## PART 5: Seasonal Differencing

### Tasks:
1. Perform a seasonal differencing to remove the seasonality.
2. Plot the transformed time series.
3. Plot the ACF.
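Seasonal differencing subtracts the value observed one season earlier, $y'_t = y_t - y_{t-m}$, with $m = 12$ for monthly data with a yearly cycle. A minimal sketch on a synthetic, noise-free series (the pattern and slope are assumptions chosen for illustration):

```python
import numpy as np
import pandas as pd

n = 60
t = np.arange(n)

# Synthetic series (assumption): a repeating 12-month pattern on top of a linear trend
pattern = np.array([0, 1, 3, 5, 6, 5, 3, 1, 0, -1, -2, -1], dtype=float)
series = pd.Series(
    0.1 * t + pattern[t % 12],
    index=pd.date_range("2000-01-01", periods=n, freq="MS"),
)

# diff(12) computes y_t - y_(t-12); the first 12 values become NaN
seasonal_diff = series.diff(12).dropna()

# The repeating pattern cancels; what remains is the trend increase over 12 months
print(seasonal_diff.round(6).unique())
```

Here the seasonal pattern disappears completely because it repeats exactly; with real data some seasonal structure usually remains in the ACF.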

In [None]:

# Seasonally difference the original data to get rid of the seasonality

# Plot the seasonally differenced time series and its rolling mean
plt.figure(figsize=(20, 6))
### Plot differenced TS
### Plot rolling mean
plt.title("Seasonally Differenced Sea Surface Temperature")
plt.xlabel('Time')
plt.ylabel('Temperature in °C')
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.legend()
plt.show()

In [None]:
# Plot the autocorrelogram of the seasonally differenced time series
fig, ax = plt.subplots(figsize=(12, 6))
### Plot ACF

plt.xlabel('Lags h', fontsize = 14)
plt.title('Autocorrelation Function with Confidence Interval', fontsize = 14)
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

### Questions:
- Is the seasonality removed?

### Tasks:
1. Apply seasonal differencing repeatedly until the seasonality is removed. Plot the ACF.

In [None]:
# Seasonally difference the data again until the seasonality seems removed

# Plot the autocorrelogram of the seasonally differenced time series
fig, ax = plt.subplots(figsize=(12, 6))
### Plot ACF

plt.xlabel('Lags h', fontsize = 14)
plt.title('Autocorrelation Function with Confidence Interval', fontsize = 14)
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

### Questions:
- What phenomenon do you observe?
- Which seasonal difference order do you select?

***
## PART 6: Make the time series stationary

### Tasks:
1. Combine the differencing and the seasonal differencing to remove both the trend and the seasonality and make the data stationary.
2. Plot the stationarized Time Series and its rolling mean over 12 months.
3. Plot the ACF.
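The two operations can be chained with consecutive calls to `diff`. The sketch below, on a synthetic series with trend, yearly cycle and noise (all assumptions for illustration), also checks that the order of the two differences does not matter, since both lead to $y_t - y_{t-1} - y_{t-12} + y_{t-13}$.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption): trend + yearly cycle + noise
rng = np.random.default_rng(2)
n = 120
t = np.arange(n)
series = pd.Series(
    0.05 * t + np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.2, n),
    index=pd.date_range("2000-01-01", periods=n, freq="MS"),
)

# Seasonal difference followed by a first-order difference, and the reverse order
seasonal_then_first = series.diff(12).diff().dropna()
first_then_seasonal = series.diff().diff(12).dropna()

# Both orders give the same values; 13 observations are lost in total
print(len(series), len(seasonal_then_first))
print(np.allclose(seasonal_then_first, first_then_seasonal))
```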


In [None]:
# Combine differencing and seasonal differencing to remove both trend and seasonality and make the data stationary

# Plot the stationarized time series and its rolling mean
plt.figure(figsize=(20, 6))
### Plot stationary TS
### Plot Rolling mean of stationary TS
plt.title("Stationarized Sea Surface Temperature")
plt.xlabel('Time')
plt.ylabel('Temperature in °C')
plt.grid(True)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.legend()
plt.show()

### Questions:
- Does it look stationary? 

***
## PART 7: Test for Stationarity

### Tasks:
1. Use the KPSS test from *statsmodels* to check the stationarity of the transformed time series.

In [None]:
#Import KPSS test
from statsmodels.tsa.stattools import kpss

#Run KPSS test on the stationarized data


print(f"KPSS Statistic: {stat}")
print(f"p-value: {p_value}")

# Interpret the p-value
if p_value < 0.05:
    print("You tell me")
else:
    print("You tell me")