<a href="https://colab.research.google.com/github/BI-DS/EBA-3530/blob/main/Lecture_2/random_walk.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import numpy as np
from matplotlib import pyplot as plt

# Time series

Let's start with some basic examples of time series data and models.

## 1) White noise

Remember that in time series models we assume that the dependent variable at time $t$ is described by its previous value $y_{t-1}$ and an error term $\epsilon_t$ that is white noise.

* 1.1) Simulate 10,000 realizations of white noise, which for simplicity is assumed to be Gaussian with expectation 0 and variance 1.

In [None]:
# @title 1.1) White noise suggested solution

T = 10000
samples = np.random.normal(size=T)

plt.plot(samples)
plt.xlabel('time unit')
plt.title('White Noise')
plt.show()
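As a quick sanity check on the simulation above, the sketch below verifies that the draws behave like white noise: sample mean close to 0, sample variance close to 1, and essentially no correlation between consecutive values. The seeded generator `rng` and the names `sample_mean`, `sample_var` and `lag1_corr` are assumptions made for this illustration and are not part of the suggested solution.

```python
import numpy as np

# Seeded generator so the check is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)

T = 10_000
samples = rng.normal(loc=0.0, scale=1.0, size=T)

# Sample moments should be close to the theoretical values 0 and 1
sample_mean = samples.mean()
sample_var = samples.var(ddof=1)

# Lag-1 sample correlation between e_t and e_{t-1}; should be close to 0
lag1_corr = np.corrcoef(samples[:-1], samples[1:])[0, 1]

print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}, lag-1 corr = {lag1_corr:.3f}")
```

The exact numbers depend on the seed, but with 10,000 draws all three should sit within a few hundredths of their theoretical values.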

## 2) Random walk

The simplest model for time series is the random walk, which *says* that today's value of a variable $y_t$ is described by its previous value plus white noise, i.e.

$$
y_t = y_{t-1} + \epsilon_t
$$


* 2.1) Simulate a random walk process where the white noise $\epsilon_t \in \{-1,0,1\}$, i.e. $\epsilon_t$ takes each of these values with equal probability.

In [None]:
# @title 2.1) Random walk suggested solution

y0 = 0
# I'm taking an easy approach here.
# Instead of Gaussian white noise, I only allow the
# noise to take 3 possible values (-1, 0, 1). Otherwise, you
# need to use something more fancy to get the
# same plot :)
step_set = [-1,0,1]
steps = np.random.choice(a=step_set, size=T)

# use cumsum to add each step to the previous value,
# as in a random walk, i.e. y_t = y_{t-1} + eps_t
path = np.r_[y0, steps].cumsum(0)

plt.plot(path)
plt.xlabel('time unit')
plt.title('Random Walk')
plt.show()
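To see why `cumsum` implements the recursion $y_t = y_{t-1} + \epsilon_t$, the following sketch builds the same path twice: once vectorized, as in the solution above, and once with an explicit loop. The seeded generator `rng`, the shorter length `T = 1_000`, and the names `path_cumsum` and `path_loop` are assumptions introduced only for this comparison.

```python
import numpy as np

# Seeded generator so both constructions use the same steps (arbitrary seed)
rng = np.random.default_rng(1)

T = 1_000
y0 = 0
steps = rng.choice([-1, 0, 1], size=T)

# Vectorized version: running sum of the start value and all steps
path_cumsum = np.r_[y0, steps].cumsum()

# Explicit recursion: y_t = y_{t-1} + eps_t
path_loop = np.empty(T + 1, dtype=steps.dtype)
path_loop[0] = y0
for t in range(1, T + 1):
    path_loop[t] = path_loop[t - 1] + steps[t - 1]

# Both constructions give the same path
print(np.array_equal(path_cumsum, path_loop))
```

The loop makes the dependence on the previous value explicit, while `cumsum` does the same work in a single call.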

## 3) Specifying Autoregressive (AR) models in `statsmodels`

The `statsmodels` package uses a general notation for ARMA (autoregressive moving average) models. For this course, it is sufficient to know that the general notation `statsmodels` uses is:

\begin{align}
(1-\beta L)y_t &= \epsilon_t \\
y_t-\beta Ly_t &= \epsilon_t \\
y_t &= \beta L y_{t} + \epsilon_t
\end{align}

The **lag representation** $L y_t$ is defined as $L y_t = y_{t-1}$, hence
\begin{align}
y_t &= \beta y_{t-1} + \epsilon_t
\end{align}

The parameters in `statsmodels` are specified as the coefficients of the lag polynomial $(1-\beta L)$. Hence, if you want $\beta = +0.8$ in the model, the value you pass to `statsmodels` is $-0.8$ 😲.

When you use the class `ArmaProcess` in `statsmodels` you must also specify the moving average (MA) term. Use a value of 1 for that term (that will include the error term $\epsilon_t$ in the model). To specify the autoregressive (AR) term, pass the coefficients of the expression $(1-\beta L)$, i.e. $[1, -\beta]$, in a `numpy` array.
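The sign convention described above can be checked directly. The sketch below builds the lag polynomial for $\beta = 0.8$ and reads the coefficient back through the `arcoefs` and `isstationary` properties of `ArmaProcess`. The variable names `beta`, `ar`, `ma` and `process` are chosen for this illustration only.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

beta = 0.8

# Coefficients of the lag polynomial (1 - beta * L): note the minus sign
ar = np.array([1, -beta])
# MA side: only the contemporaneous error term
ma = np.array([1])

process = ArmaProcess(ar, ma)

# arcoefs reports the AR coefficients with the sign flipped back,
# i.e. as they appear in y_t = beta * y_{t-1} + eps_t
print(process.arcoefs)

# The process is stationary when |beta| < 1
print(process.isstationary)
```

If `-beta` were replaced with `+beta` in `ar`, the simulated model would instead be $y_t = -0.8y_{t-1} + \epsilon_t$, which is an easy mistake to make.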

* 3.1) Simulate data from the AR(1) model

$$
y_t = 0.8y_{t-1} + \epsilon_t
$$

and make plots for the AR(1) process and its corresponding autocorrelation function.

In [None]:
# @title 3.1) Simulate AR(1) suggested solution

# import the module for simulating data
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf

# Plot 1: AR parameter = +0.8 (no intercept term)
fig, ax = plt.subplots(2,1)
# specify ar terms as explained above
ar1 = np.array([1, -0.8])
# specify ma terms as explained above
ma1 = np.array([1])
# this is the ar model (no intercept)
AR_1 = ArmaProcess(ar1,ma1)
# simulate data
# NOTE, you can specify the scale parameter which is
# the standard deviation for the white noise
ar1_data = AR_1.generate_sample(nsample=1000, scale=1)
# plot data
ax[0].plot(ar1_data)
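# plot acf (alpha=1 shrinks the confidence band to zero width,
# so only the autocorrelations themselves are drawn)
plot_acf(ar1_data, alpha=1, lags=20, ax=ax[1])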
plt.tight_layout()
plt.show()