# About Unit 3

Welcome back to the Marquette University AIM time series analysis curriculum! In this unit you will learn about stationarity and unit roots, two very important characteristics of time series. We will work through several examples to break down what stationarity is prior to the next unit where we start to model and predict future values.

# Getting Started

**Import Packages**

Run the following code to bring the necessary packages into your environment. Ensure you are running a Python 3 kernel. We will not need to import any data for this unit, since we can simply generate random series.

In [None]:
!pip install arch

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss
from arch.unitroot import PhillipsPerron
from statsmodels.tsa.seasonal import seasonal_decompose

# Stationarity

Stationarity is a fundamental concept in time series analysis that refers to the property of a time series where its statistical characteristics, such as mean, variance, and autocorrelation, remain constant over time. A stationary time series is easier to model and predict since its behavior does not change over time.

Mathematically, a time series $X_t$ is (weakly) stationary if:
1. $E[X_t] = \mu$ (mean is constant over time),
2. $\text{Var}[X_t] = \sigma^2$ (variance is constant over time),
3. Covariance $\text{Cov}[X_t, X_{t+k}]$ depends only on $k$ (the lag), not $t$.
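
As a rough illustration of conditions 1 and 2, the sketch below simulates many independent paths of two processes and compares the variance across paths at an early and a late time step. The seed, path count, and time indices are arbitrary choices made for this example. White noise should show roughly the same variance at both times, while the variance of a random walk (a cumulative sum of the same shocks) should grow with time.

```python
import numpy as np

# Assumed settings for illustration only
rng = np.random.default_rng(42)
n_paths, n_steps = 2000, 200
shocks = rng.standard_normal((n_paths, n_steps))

white_noise = shocks                      # stationary: i.i.d. N(0, 1) at every step
random_walk = np.cumsum(shocks, axis=1)   # non-stationary: variance grows with t

for name, paths in [("white noise", white_noise), ("random walk", random_walk)]:
    # Variance taken across paths at a fixed time step
    var_early = paths[:, 49].var()
    var_late = paths[:, 199].var()
    print(f"{name}: variance at t=50 is {var_early:.2f}, at t=200 is {var_late:.2f}")
```

For the random walk, the variance at step $t$ is $t\sigma^2$, which violates condition 2 even though the mean stays at zero.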

**Types of Stationarity**
1. **Strict Stationarity**: The joint distribution of any collection of observations $(X_{t_1}, \dots, X_{t_n})$ is unchanged when every time index is shifted by the same amount.
2. **Weak Stationarity (Second-order Stationarity)**: Only the mean and variance must be constant over time, and the autocovariance must depend only on the lag. This is the most commonly used type of stationarity in time series analysis.

**Why Does Stationarity Matter?**

Most statistical models for time series analysis assume stationarity because:
- It simplifies the modeling process.
- Non-stationary data often leads to spurious results when conducting hypothesis testing or forecasting.

**Testing for Stationarity**

There are several statistical tests available to determine whether a time series is stationary:

1. **Augmented Dickey-Fuller (ADF) Test**:
   - The null hypothesis $H_0$: The time series has a unit root (non-stationary).
   - The alternative hypothesis $H_1$: The time series is stationary.

2. **KPSS Test** (Kwiatkowski-Phillips-Schmidt-Shin):
   - The null hypothesis $H_0$: The time series is stationary.
   - The alternative hypothesis $H_1$: The time series is non-stationary.

3. **Phillips-Perron Test**:
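   - Tests the same null hypothesis as ADF (the series has a unit root), but corrects the test statistic for serial correlation and heteroscedasticity instead of adding lagged difference terms to the regression.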
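
Because ADF and KPSS have opposite null hypotheses, it is easy to misread their p-values. The sketch below shows one way to combine the two results into a single reading. The helper `combine_adf_kpss` and its default threshold of 0.05 are assumptions made for this example and are not part of statsmodels.

```python
def combine_adf_kpss(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into one plain-language reading (hypothetical helper)."""
    # ADF: a small p-value rejects the unit root null, which supports stationarity
    adf_rejects_unit_root = adf_pvalue < alpha
    # KPSS: a small p-value rejects the stationarity null
    kpss_rejects_stationarity = kpss_pvalue < alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "both tests point to stationarity"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "both tests point to a unit root"
    return "tests disagree; inspect the series further"


print(combine_adf_kpss(adf_pvalue=0.001, kpss_pvalue=0.10))
print(combine_adf_kpss(adf_pvalue=0.60, kpss_pvalue=0.01))
```

When the tests disagree, plotting the series and checking for a deterministic trend is usually a sensible next step.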


In [None]:
# Generate random data
data = np.random.randn(100)

# Augmented Dickey-Fuller (ADF) test
result = adfuller(data)
print("ADF Statistic:", result[0])
print("p-value:", result[1])
if result[1] < 0.05:
    print("The time series is stationary")
else:
    print("The time series is non-stationary")
print()

# KPSS Test
result = kpss(data, regression='c')
print("KPSS Statistic:", result[0])
print("p-value:", result[1])
print("Critical Values:", result[3])
if result[1] < 0.05:
    print("The series is non-stationary")
else:
    print("The series is stationary")
print()

# Phillips-Perron Test
pp_test = PhillipsPerron(data)
print(pp_test)

**Transforming Non-stationary Series**

If a time series is non-stationary, you can apply transformations to make it stationary. Here are some common techniques:

1. **Differencing**:
   - Subtract the previous observation from the current observation:
$$
Y_t = X_t - X_{t-1}
$$
   - First-order differencing is usually sufficient, but higher-order differencing can be applied if needed.

2. **Log Transformation**:
   - Useful for stabilizing variance:
$$
Y_t = \log(X_t)
$$
   - Caution: Only applicable for positive values.

3. **Detrending**:
   - Remove trends from the data by subtracting a fitted trend line.

4. **Seasonal Adjustment**:
   - Remove seasonal effects by computing seasonal indices or using decomposition methods.
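
To make the first two transformations concrete, here is a small worked example on a hand-picked series that grows by 10% each step. The numbers are made up for illustration. Plain differencing leaves increments that keep growing, while differencing the logged series gives a constant increment, which is why the two transformations are often combined.

```python
import numpy as np

# Hypothetical series growing 10% per step
x = np.array([100.0, 110.0, 121.0, 133.1])

# First-order differencing: Y_t = X_t - X_{t-1}
first_diff = np.diff(x)          # increments grow: 10, 11, 12.1

# Log transform followed by differencing
log_diff = np.diff(np.log(x))    # each increment is log(1.1)

# Differencing can be undone with a cumulative sum and the first value
rebuilt = x[0] + np.cumsum(first_diff)

print("First differences:", first_diff)
print("Log differences:  ", log_diff)
print("Rebuilt series:   ", rebuilt)
```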

In [None]:
# Generate non-stationary data
non_stationary_data = np.cumsum(np.random.randn(100))

# Differencing
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
plt.plot(non_stationary_data, label="Original Data")
stationary_data = np.diff(non_stationary_data)
plt.plot(stationary_data, label="Differenced Data")
plt.legend()
plt.title("Differencing")

# Log Transformation
# Shift the series so every value is at least 1; the log is only defined for positive values
plt.subplot(2, 2, 2)
log_transformed_data = np.log(non_stationary_data - min(non_stationary_data) + 1)
plt.plot(non_stationary_data, label="Original Data")
plt.plot(log_transformed_data, label="Log Transformed Data")
plt.legend()
plt.title("Log Transformation")

# Detrending
time = np.arange(len(non_stationary_data))
trend = np.polyfit(time, non_stationary_data, 1)
trend_line = np.polyval(trend, time)
detrended_data = non_stationary_data - trend_line
plt.subplot(2, 2, 3)
plt.plot(non_stationary_data, label="Original Data")
plt.plot(trend_line, label="Trend Line")
plt.plot(detrended_data, label="Detrended Data")
plt.legend()
plt.title("Detrending")

# Seasonal Adjustment
decomposed = seasonal_decompose(non_stationary_data, model='additive', period=12)
plt.subplot(2, 2, 4)
plt.plot(non_stationary_data, label="Original Data")
plt.plot(decomposed.trend, label="Trend")
plt.plot(decomposed.seasonal, label="Seasonality")
plt.plot(decomposed.resid, label="Residuals")
plt.legend()
plt.title("Seasonal Adjustment")

plt.tight_layout()
plt.show()

**Unit Roots**

A unit root is a characteristic of a non-stationary process in which each value carries the full effect of the previous one, so shocks never die out: the series does not revert to a fixed mean, and its variance grows over time. A time series with a unit root exhibits random walk behavior.

**Autoregressive Representation**

Consider the AR(1) process: $X_t = \phi X_{t-1} + \epsilon_t$

If $|\phi| < 1$: The series is stationary.

If $|\phi| = 1$: The series has a unit root (non-stationary).

If $|\phi| > 1$: The series is explosive.
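
The three cases can be compared with a short simulation. This is a minimal sketch: the helper `simulate_ar1`, the seed, and the chosen values of $\phi$ are assumptions made for the example. All three paths are driven by the same shocks, so any difference in behavior comes from $\phi$ alone.

```python
import numpy as np

def simulate_ar1(phi, n=200, seed=0):
    """Simulate X_t = phi * X_{t-1} + eps_t starting from X_0 = 0 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

# Stationary, unit root, and explosive cases
paths = {phi: simulate_ar1(phi) for phi in (0.5, 1.0, 1.05)}

for phi, x in paths.items():
    print(f"phi = {phi}: largest |X_t| over the sample = {np.abs(x).max():.1f}")
```

The stationary path stays close to zero, the unit root path wanders without settling, and the explosive path grows roughly geometrically.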

In [None]:
# Example of an AR(1) process with unit root behavior
from statsmodels.tsa.arima_process import ArmaProcess

# AR(1) process with phi=1 (unit root)
# ArmaProcess takes the lag-polynomial coefficients, so phi=1 is written as [1, -phi] = [1, -1]
ar_params = [1, -1]
ma_params = [1]
ar_process = ArmaProcess(ar_params, ma_params)
unit_root_data = ar_process.generate_sample(nsample=100)
plt.plot(unit_root_data, label="AR(1) with Unit Root")
plt.legend()
plt.title("AR(1) Process with Unit Root")
plt.show()

# Augmented Dickey-Fuller (ADF) test on the simulated unit root series
result = adfuller(unit_root_data)
print("ADF Statistic:", result[0])
print("p-value:", result[1])
if result[1] < 0.05:
    print("The time series is stationary")
else:
    print("The time series is non-stationary")
print()
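
The ADF test builds on a simple regression. Subtracting $X_{t-1}$ from both sides of the AR(1) equation gives $\Delta X_t = (\phi - 1) X_{t-1} + \epsilon_t$, so a unit root corresponds to a zero coefficient on $X_{t-1}$. The following is a minimal sketch of that regression in NumPy, with a constant but without the lagged-difference terms that the "augmented" version adds. `dickey_fuller_tstat` is a hypothetical helper written for this example, not a statsmodels function. Its t-statistic has to be compared with Dickey-Fuller critical values (roughly -2.86 at the 5% level for a regression with a constant), not with the usual normal ones.

```python
import numpy as np

def dickey_fuller_tstat(x):
    """Regress diff(x) on a constant and lagged x; return (coefficient, t-statistic)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                      # Delta X_t
    lagged = x[:-1]                      # X_{t-1}
    X = np.column_stack([np.ones_like(lagged), lagged])

    # Ordinary least squares fit
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    dof = len(dx) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)

    gamma = beta[1]                      # estimate of (phi - 1)
    return gamma, gamma / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(500)
random_walk = np.cumsum(rng.standard_normal(500))

gamma_wn, t_wn = dickey_fuller_tstat(white_noise)
gamma_rw, t_rw = dickey_fuller_tstat(random_walk)
print(f"White noise: coefficient = {gamma_wn:.3f}, t-statistic = {t_wn:.2f}")
print(f"Random walk: coefficient = {gamma_rw:.3f}, t-statistic = {t_rw:.2f}")
```

For white noise the coefficient should be close to -1 with a strongly negative t-statistic, while for the random walk the coefficient should be close to zero. In practice, prefer `adfuller`, which also handles lag selection and p-values.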

# Conclusion

This concludes Unit 3 of the AIM Time Series Analysis curriculum. After completing this unit, you should have a foundational understanding of stationarity, unit roots, and how to test for these features in your data sets. Stay tuned for Unit 4, where we will begin forecasting with AR, MA, and ARIMA models!