# Time Series Analysis 

## Topics

* Stylized Facts
    * Stylized Facts
    * Log-Returns
* Stationarity
    * Stationarity
    * Asymptotics of Stationary Sequences
    * Standard Facts on Conditional Expectation
    * MDS
    * Wold Decomposition
* AR Processes
    * ACF
    * Bartlett's Formula
    * Ljung-Box Test
    * AR(1)
    * Causal Processes
    * AR(2)
    * Weak Stationarity of AR(p)
    * Partial Correlation Coefficients
    * PACF
* MA Processes
    * MA(q)
    * Invertibility of MA Processes
    * Formal Notations
* ARMA Models
    * ARMA Models
    * ARMA(1, 1)
    * ARMA(p, q) Analysis
    * ARIMA, Differencing to Obtain Stationarity
    * Dickey-Fuller Test
    * Parameter Estimation
    * Yule Walker Equations
    * Likelihood Methods
    * statsmodels
    * Forecasting
* Non-Stationary to Stationary
    * Box-Cox Transformation
    * Trend and Seasonal Components
    * Differencing
* ARCH/GARCH Modeling
    * Motivation
    * ARCH(1)
    * AR(1)/ARCH(1)
    * ARCH(p)
    * ARCH Properties
    * ARCH and Stylized Facts
    * Weaknesses of ARCH Model
    * From ARCH to GARCH
    * GARCH(1, 1)
    * Fitting ARCH to S&P 500 Data
    * GARCH(p, q)
    * GARCH Forecasting
    * Engle Test for ARCH Effects
    * GARCH Forecasting Example in Risk Management
    * Other Volatility Models
* Multivariate Time Series
    * Multivariate Time Series
    * Vector Autoregressive Processes
    * Stationarity of VAR(1) Processes
* Cointegration
    * Cointegration 
    * Johansen Test
    * Cryptocurrency Example
* State Space Modeling
    * State Space Models
    * Kalman Recursions: Kalman Prediction & Filtering
    * Example (Linear Regression)
    * AR, MA in State Space Form
    * Bayesian Background to Kalman Methods
    * Stochastic Volatility

## Notes

* Stylized Facts
    * Stylized Facts
    * Log-Returns
* Stationarity
    * Stationarity
        * Weakly stationary
            * mean and variance are constant over time
            * Cov$(X_s, X_t)$ only depends on the lag $|s-t|$
            * weak stationarity $+$ jointly normal distributions $\implies$ strict stationarity
        * WN$(0, \sigma^2)$
            * weakly stationary process with mean 0
            * ACovF is $\{\sigma^2, 0, 0, \ldots\}$
    * Asymptotics of Stationary Sequences
    * Standard Facts on Conditional Expectation
    * MDS
        * Martingale: $E(X_{t+1}|\mathcal F_t) = X_t\quad\forall t\ge 0$
        * MDS: $E(X_{t+1}|\mathcal F_t) = 0\quad\forall t\ge 0$, hence $E(X_{t+1}) = 0$.
        * 3 types of noise processes: iid noise, MDS, and white noise (uncorrelated, weakly stationary)
        * {iid, zero mean} $\subset$ {MDS}
        * {Common Finite Variance MDS} $\subset$ {White Noise Processes}
        * an MDS with common finite variance satisfies a central limit theorem (under stationarity and ergodicity)
    * Wold Decomposition
        * If $\cap_{j=1}^\infty \mathcal F_{t-j} = \{\emptyset, \Omega\}$ (the trivial $\sigma$-algebra), every weakly stationary $X_t$ is MA($\infty$)
            \begin{align*}
            &X_t = \mu + \sum^\infty_{j=0} \psi_j\epsilon_{t-j},\\
            &\psi_0 = 1,\quad\sum_{j=0}^\infty \psi_j^2 < \infty.
            \end{align*}
* AR Processes
    * ACF
    * Bartlett's Formula
    * Ljung-Box Test
        \begin{align*}
        &H_0: \rho(1)=\rho(2)=\cdots=\rho(m)=0\\
        &H_1: \text{at least one of $\rho(i)$ is nonzero, $1\le i\le m$}
        \end{align*}
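        * A minimal sketch of the test statistic, assuming the usual form $Q = n(n+2)\sum_{k=1}^m \hat\rho(k)^2/(n-k)$, compared against a $\chi^2_m$ quantile for a raw series. The helper names, the simulated AR(1) input and the choice $m=5$ are illustrative assumptions, not part of the notes.
        ```python
        import random

        def sample_acf(x, max_lag):
            """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag)."""
            n = len(x)
            mean = sum(x) / n
            d = [v - mean for v in x]
            gamma0 = sum(v * v for v in d) / n
            return [sum(d[t] * d[t + h] for t in range(n - h)) / n / gamma0
                    for h in range(1, max_lag + 1)]

        def ljung_box_q(x, m):
            """Q = n (n + 2) * sum_{k=1}^m rho_hat(k)^2 / (n - k)."""
            n = len(x)
            rho = sample_acf(x, m)
            return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))

        # Illustrative input: an AR(1) with phi_1 = 0.6, which is clearly autocorrelated.
        rng = random.Random(0)
        x, prev = [], 0.0
        for _ in range(500):
            prev = 0.6 * prev + rng.gauss(0.0, 1.0)
            x.append(prev)

        CHI2_95_DF5 = 11.07  # 95% quantile of the chi-square distribution with 5 d.o.f.
        q_stat = ljung_box_q(x, m=5)
        print("Q =", q_stat, "reject H0:", q_stat > CHI2_95_DF5)
        ```
        * When the test is applied to residuals of a fitted ARMA$(p, q)$ model, the degrees of freedom are usually reduced to $m - p - q$.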
    * AR(1)
        \begin{align*}
        X_t = \phi_0 + \phi_1 X_{t-1} + \epsilon_t
        \end{align*}
        * stationary iff $|\phi_1| < 1$
        * $E(X_t) = \phi_0/(1-\phi_1), \quad |\phi_1| < 1$
        * $Var(X_t) = \sigma_{\epsilon}^2/(1-\phi_1^2), \quad |\phi_1| < 1$
        * $\gamma(h) = \phi_1^{|h|}\frac{\sigma_{\epsilon}^2}{1-\phi_1^2}, \quad |\phi_1| < 1$
        * $\rho(h) = \phi_1^{|h|}, \quad |\phi_1| < 1$
        * if either $E(X_0)$ or $Var(X_0)$ differs from its stationary value but $|\phi_1| < 1$, then the process is only asymptotically stationary
        * remove the mean: define $\mu = \phi_0/(1-\phi_1), Y_t = X_t - \mu$
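        * The AR(1) moments above can be checked by simulation. This is a rough sketch with Gaussian noise and arbitrary illustrative parameters ($\phi_0 = 1$, $\phi_1 = 0.6$, $\sigma_\epsilon = 1$); the function names are hypothetical.
        ```python
        import random

        def simulate_ar1(phi0, phi1, sigma, n, burn_in=500, seed=0):
            """Simulate X_t = phi0 + phi1 * X_{t-1} + eps_t with Gaussian eps_t."""
            rng = random.Random(seed)
            x = phi0 / (1.0 - phi1)  # start at the stationary mean
            path = []
            for t in range(n + burn_in):
                x = phi0 + phi1 * x + rng.gauss(0.0, sigma)
                if t >= burn_in:  # drop the burn-in so the start value is forgotten
                    path.append(x)
            return path

        def sample_autocorr(x, h):
            """Sample autocorrelation at lag h."""
            n = len(x)
            mean = sum(x) / n
            den = sum((v - mean) ** 2 for v in x)
            return sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / den

        phi0, phi1, sigma = 1.0, 0.6, 1.0
        x = simulate_ar1(phi0, phi1, sigma, n=20000)
        sample_mean = sum(x) / len(x)
        sample_var = sum((v - sample_mean) ** 2 for v in x) / len(x)

        print("mean:", sample_mean, "theory:", phi0 / (1 - phi1))
        print("var: ", sample_var, "theory:", sigma**2 / (1 - phi1**2))
        print("rho1:", sample_autocorr(x, 1), "theory:", phi1)
        print("rho2:", sample_autocorr(x, 2), "theory:", phi1**2)
        ```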
    * Causal Processes
    * AR(2)
        \begin{align*}
        X_t &= \phi_0 + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \epsilon_t, \\
        \mu &= \frac{\phi_0}{1 - \phi_1 - \phi_2}\\
        Y_t &= X_t - \mu
        \end{align*}
        * assume moment structure is constant
        * ACF: Multiply $Y_{t+h} = \phi_1Y_{t+h-1} + \phi_2Y_{t+h-2} + \epsilon_{t+h}$ by $Y_t$, take expectation, and divide by $\gamma(0)$: 
        \begin{align*}
        \rho(h) = \begin{cases}
        1 &\mbox{ if } h=0\\
        \phi_1/(1-\phi_2) &\mbox{ if } h=1\\
        \phi_1\rho(h-1) + \phi_2\rho(h-2) &\mbox{ if } h\ge 2
        \end{cases}
        \end{align*}
        * AR polynomial: substitute $z=1/\lambda$ into the characteristic polynomial $\lambda^2 - \phi_1\lambda - \phi_2$ of the recurrence relation and multiply by $z^2$
        \begin{align*}
        \phi(z) = 1 - \phi_1 z - \phi_2 z^2
        \end{align*}
        * $X_t$ is stationary iff all roots of $\phi(z) = 0$ (the characteristic roots) have modulus strictly greater than 1
        * recurrence relation is $\phi(B)\rho(h) = 0$
        * matrix form: 
        \begin{align*}
        \mathbf X_t &= (X_t, X_{t-1})^T, \quad\mathbf \mu = (\phi_0, 0)^T, \quad\mathbf\epsilon_t = (\epsilon_t, 0)^T, \\
        \mathbf X_t &= \mathbf \mu + \mathbf M \mathbf X_{t-1} + \mathbf\epsilon_t, \\
        \mathbf M &= \begin{pmatrix}
        \phi_1 & \phi_2\\
        1 & 0
        \end{pmatrix}
        \end{align*}
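        * The ACF recursion and the root condition for AR(2) can be evaluated directly. A small sketch, assuming $\phi_2 \ne 0$ so that $\phi(z)$ is a genuine quadratic; the parameter values are illustrative.
        ```python
        import cmath

        def ar2_acf(phi1, phi2, max_lag):
            """rho(0), ..., rho(max_lag) from the AR(2) recursion."""
            rho = [1.0, phi1 / (1.0 - phi2)]
            for h in range(2, max_lag + 1):
                rho.append(phi1 * rho[h - 1] + phi2 * rho[h - 2])
            return rho[: max_lag + 1]

        def ar2_roots(phi1, phi2):
            """Roots of phi(z) = 1 - phi1 * z - phi2 * z^2 (assumes phi2 != 0)."""
            disc = cmath.sqrt(phi1**2 + 4.0 * phi2)
            return ((-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2))

        def ar2_is_stationary(phi1, phi2):
            """Stationary iff every root of phi(z) lies strictly outside the unit circle."""
            return all(abs(z) > 1.0 for z in ar2_roots(phi1, phi2))

        rho = ar2_acf(0.5, 0.3, 5)
        print(rho)
        print(ar2_roots(0.5, 0.3), ar2_is_stationary(0.5, 0.3))
        print(ar2_roots(0.5, 0.6), ar2_is_stationary(0.5, 0.6))
        ```
        * Equivalently, the eigenvalues of $\mathbf M$ are the reciprocals of these roots, so stationarity requires all eigenvalues of $\mathbf M$ to lie strictly inside the unit circle.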
    * Weak Stationarity of AR$(p)$
        * assuming constant mean, 
        \begin{align*}
        X_t &= \phi_0 + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t, \\
        \mu &= \frac{\phi_0}{1 - \sum_{i=1}^p\phi_i}, \quad\sum_{i=1}^p\phi_i < 1\\
        Y_t &= X_t - \mu
        \end{align*}
        * AR polynomial
        \begin{align*}
        \phi(z) = 1 - \sum_{i=1}^p \phi_i z^i
        \end{align*}
        * matrix form
        \begin{align*}
        \mathbf M = \begin{pmatrix}
        \phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p\\
        1 & 0 & \cdots & 0 & 0 \\
        0 & 1 & \cdots & 0 & 0 \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        0 & 0 & \cdots & 1 & 0 
        \end{pmatrix}
        \end{align*}
    * Partial Correlation Coefficients $\rho(X, Y|\vec Z)$
        1. regress $X$ on $\vec Z$
        2. regress $Y$ on $\vec Z$
        3. compute correlation coefficient of the residuals
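        * The three steps can be sketched as follows. For brevity this assumes a single conditioning variable $Z$ (the notes allow a vector $\vec Z$) and an intercept in each regression; the toy data are constructed so that the answer is known.
        ```python
        def ols_residuals(y, z):
            """Residuals from regressing y on an intercept and one regressor z."""
            n = len(y)
            my, mz = sum(y) / n, sum(z) / n
            szz = sum((v - mz) ** 2 for v in z)
            slope = sum((a - my) * (b - mz) for a, b in zip(y, z)) / szz
            intercept = my - slope * mz
            return [a - intercept - slope * b for a, b in zip(y, z)]

        def correlation(u, v):
            """Pearson correlation coefficient."""
            n = len(u)
            mu, mv = sum(u) / n, sum(v) / n
            cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
            var_u = sum((a - mu) ** 2 for a in u)
            var_v = sum((b - mv) ** 2 for b in v)
            return cov / (var_u * var_v) ** 0.5

        def partial_correlation(x, y, z):
            """Correlation of the residuals of x and y after regressing each on z."""
            return correlation(ols_residuals(x, z), ols_residuals(y, z))

        z = [1.0, 2.0, 3.0, 4.0]
        u = [1.0, -1.0, -1.0, 1.0]  # orthogonal to both the intercept and z
        x = [2.0 * zi + ui for zi, ui in zip(z, u)]
        y = [2.0 * zi - ui for zi, ui in zip(z, u)]

        raw = correlation(x, y)                 # positive, driven by the common z
        partial = partial_correlation(x, y, z)  # close to -1 once z is removed
        print(raw, partial)
        ```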
    * PACF for AR$(p)$
        * estimate $\hat \phi_{k, k}$ in 
        \begin{align*}
        X_t = \phi_{0, k} + \phi_{1, k}X_{t-1} + \cdots + \phi_{k, k}X_{t-k} + \epsilon_{k, t}
        \end{align*}
        * for an AR$(p)$, $\phi_{k, k} = 0$ for all $k > p$, so $p$ is chosen as the largest $k$ for which the test rejects $\phi_{k, k} = 0$
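        * As an alternative to running the regressions, the PACF can be computed from the ACF with the Durbin-Levinson recursion. This is a swapped-in technique, not the regression approach of the notes; the sketch below applies it to the theoretical ACF of an illustrative AR(2), where the PACF should cut off after lag 2.
        ```python
        def pacf_from_acf(rho, max_lag):
            """Durbin-Levinson recursion; rho[h] is the ACF at lag h with rho[0] == 1."""
            pacf = [1.0]
            prev = []  # prev[j] is the coefficient on lag j + 1 in the order-(k-1) fit
            for k in range(1, max_lag + 1):
                num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
                den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
                phi_kk = num / den
                prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
                pacf.append(phi_kk)
            return pacf

        # Theoretical ACF of an AR(2) with phi_1 = 0.5, phi_2 = 0.3 (illustrative values).
        phi1, phi2 = 0.5, 0.3
        rho = [1.0, phi1 / (1.0 - phi2)]
        for h in range(2, 7):
            rho.append(phi1 * rho[h - 1] + phi2 * rho[h - 2])

        pacf = pacf_from_acf(rho, 5)
        print(pacf)  # lag 2 recovers phi_2; lags 3 and above are numerically zero
        ```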
* MA Processes
    * MA$(q)$
        \begin{align*}
        X_t = \mu + \sum_{i=1}^q \theta_i\epsilon_{t-i} + \epsilon_t
        \end{align*}
        * weakly stationary for all $\{\theta_i\}$
        * $E(X_t) = \mu$
        * $Var(X_t) = \sigma_\epsilon^2(1 + \sum_{i=1}^q \theta_i^2), \quad\forall t$
        \begin{align*}
        \gamma(h) &= \begin{cases}
        \sigma_\epsilon^2 \sum_{i=0}^{q-|h|} \theta_i\theta_{i+|h|} &\mbox{ if }|h|\le q\\
        0 &\mbox{ if }|h|>q
        \end{cases},\quad\theta_0 = 1,\\
        \rho(h) &= \gamma(h)/\gamma(0)
        \end{align*}
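        * A short sketch of the MA$(q)$ autocovariance, using the convention $\theta_0 = 1$ so that $\gamma(h) = \sigma_\epsilon^2\sum_{i=0}^{q-|h|}\theta_i\theta_{i+|h|}$ for $|h|\le q$ and $0$ otherwise. The coefficients are illustrative.
        ```python
        def ma_autocov(theta, sigma2, h):
            """Autocovariance of X_t = mu + eps_t + sum_i theta_i * eps_{t-i}.

            theta = [theta_1, ..., theta_q]; theta_0 = 1 is prepended internally.
            """
            coeffs = [1.0] + list(theta)
            q = len(theta)
            h = abs(h)
            if h > q:
                return 0.0  # the ACovF cuts off after lag q
            return sigma2 * sum(coeffs[i] * coeffs[i + h] for i in range(q - h + 1))

        theta, sigma2 = [0.5, -0.4], 2.0
        gammas = [ma_autocov(theta, sigma2, h) for h in range(4)]
        rhos = [g / gammas[0] for g in gammas]
        print(gammas)
        print(rhos)
        ```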
    * Invertibility of MA Processes
        * the two MA(1) processes have the same ACF: 
        \begin{align*}
        Y^{(1)}_t &= \epsilon_t - \theta_1 \epsilon_{t-1}\\
        Y^{(2)}_t &= \epsilon_t - \frac{1}{\theta_1} \epsilon_{t-1}
        \end{align*}
        * write residuals as an AR process
        \begin{align*}
        \epsilon_t &= Y_t^{(1)} + \sum_{i=1}^\infty \theta_1^i Y_{t-i}^{(1)}\\
        \epsilon_t &= Y_t^{(2)} + \sum_{i=1}^\infty \frac{1}{\theta_1^i} Y_{t-i}^{(2)}
        \end{align*}
        * MA$(q)$ is invertible if the residuals can be represented by an AR process with convergent coefficients
        * MA polynomial 
        \begin{align*}
        \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q
        \end{align*}
        * An MA process is invertible iff all roots of $\theta(z) = 0$ have modulus greater than 1
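        * The identification problem for the two MA(1) processes $Y_t = \epsilon_t - \theta_1\epsilon_{t-1}$ and $Y_t = \epsilon_t - \theta_1^{-1}\epsilon_{t-1}$ can be seen numerically. A minimal sketch with an illustrative $\theta_1 = 0.5$: both share the same lag-1 autocorrelation, but only one is invertible.
        ```python
        def ma1_rho1(theta1):
            """Lag-1 autocorrelation of Y_t = eps_t - theta1 * eps_{t-1}."""
            return -theta1 / (1.0 + theta1**2)

        def ma1_is_invertible(theta1):
            """The MA polynomial 1 - theta1 * z has root 1 / theta1, so we need |theta1| < 1."""
            return abs(theta1) < 1.0

        theta1 = 0.5
        rho_a = ma1_rho1(theta1)
        rho_b = ma1_rho1(1.0 / theta1)
        print(rho_a, rho_b)  # identical up to rounding
        print(ma1_is_invertible(theta1), ma1_is_invertible(1.0 / theta1))
        ```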
    * Formal Notations
        * ARMA$(p, q): Y_t - \phi_1 Y_{t-1} - \cdots - \phi_pY_{t-p} = \epsilon_t + \theta_1\epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$
        \begin{align*}
        \phi(B)Y_t = \theta(B)\epsilon_t
        \end{align*}
        * ARIMA$(p, d, q)$
        \begin{align*}
        \phi(B)(1-B)^dY_t = \theta(B)\epsilon_t
        \end{align*}
* ARMA Models
    * ARMA Models
        \begin{align*}
        &X_t - \phi_0 - \phi_1 X_{t-1} - \cdots - \phi_pX_{t-p} = \epsilon_t + \theta_1\epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}\\
        &E(X_t) = \mu = \frac{\phi_0}{1-(\phi_1+\cdots+\phi_p)}
        \end{align*}
    * ARMA(1, 1)
        \begin{align*}
        X_t = \phi_0 + \phi_1X_{t-1} + \theta_1\epsilon_{t-1} + \epsilon_t
        \end{align*}
        * assuming stationary, 
        \begin{align*}
        E(X_t) &= \phi_0/(1-\phi_1)\\
        Var(X_t) &= \gamma(0) = \frac{(1 + \theta_1^2 + 2\phi_1\theta_1)\sigma_\epsilon^2}{1-\phi_1^2}\\
        \rho(1) &= \frac{\gamma(1)}{\gamma(0)} = \frac{(1 + \phi_1\theta_1)(\phi_1 + \theta_1)}{1 + \theta_1^2 + 2\phi_1\theta_1}\\
        \rho(h) &= \phi_1^{h-1}\rho(1), \quad h\ge 2
        \end{align*}
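        * The closed forms for $\gamma(0)$ and $\rho(1)$ can be cross-checked against the MA($\infty$) representation, whose weights for a causal ARMA(1, 1) are $\psi_0 = 1$, $\psi_j = \phi_1^{j-1}(\phi_1 + \theta_1)$ for $j\ge 1$. A sketch with illustrative parameters and a truncated series:
        ```python
        def arma11_psi(phi1, theta1, n_terms):
            """First n_terms MA(infinity) weights of a causal ARMA(1, 1)."""
            return [1.0] + [phi1 ** (j - 1) * (phi1 + theta1) for j in range(1, n_terms)]

        def arma11_closed_form(phi1, theta1, sigma2):
            """gamma(0) and rho(1) from the closed-form expressions."""
            gamma0 = (1 + theta1**2 + 2 * phi1 * theta1) * sigma2 / (1 - phi1**2)
            rho1 = (1 + phi1 * theta1) * (phi1 + theta1) / (1 + theta1**2 + 2 * phi1 * theta1)
            return gamma0, rho1

        phi1, theta1, sigma2 = 0.7, 0.4, 1.0
        psi = arma11_psi(phi1, theta1, 200)  # truncation error is negligible here
        gamma0_series = sigma2 * sum(p * p for p in psi)
        gamma1_series = sigma2 * sum(a * b for a, b in zip(psi, psi[1:]))
        gamma0, rho1 = arma11_closed_form(phi1, theta1, sigma2)

        print(gamma0_series, gamma0)
        print(gamma1_series / gamma0_series, rho1)
        ```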
    * ARMA$(p, q)$ Analysis
    * ARIMA, Differencing to Obtain Stationarity
        * $X_t$ is $\mathcal I(k)$ if $\nabla^{k-1}X_t$ is non-stationary but $\nabla^{k}X_t$ is stationary, where $\nabla = (1-B)$
        * $\mathbf X_t$ is $\mathcal I(k)$ if at least one of its coordinates is $\mathcal I(k)$ and all the others are $\mathcal I(j)$ for some $j\le k$
    * Dickey-Fuller Test
        \begin{align*}
        H_0 &: \text{a unit root is present}\\
        H_1 &: \text{no unit root}
        \end{align*}
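        * A bare-bones sketch of the simplest Dickey-Fuller regression $\Delta X_t = \delta X_{t-1} + e_t$ (no constant, no trend, no lagged differences), where $H_0$ corresponds to $\delta = 0$. The 5% critical value of roughly $-1.95$ for this case is quoted from standard tables and is an assumption here; in practice a library routine with proper critical values would be used.
        ```python
        import random

        def dickey_fuller_stat(x):
            """t-statistic of delta in dX_t = delta * X_{t-1} + e_t."""
            lagged = x[:-1]
            diffs = [b - a for a, b in zip(x[:-1], x[1:])]
            sxx = sum(v * v for v in lagged)
            delta = sum(l * d for l, d in zip(lagged, diffs)) / sxx
            rss = sum((d - delta * l) ** 2 for l, d in zip(lagged, diffs))
            s2 = rss / (len(diffs) - 1)  # residual variance, one estimated parameter
            return delta / (s2 / sxx) ** 0.5

        rng = random.Random(1)
        stationary, walk = [0.0], [0.0]
        for _ in range(500):
            e = rng.gauss(0.0, 1.0)
            stationary.append(0.5 * stationary[-1] + e)  # AR(1), no unit root
            walk.append(walk[-1] + e)                    # random walk, unit root

        CRITICAL_5PCT = -1.95  # approximate, no-constant case (assumption)
        stat_stationary = dickey_fuller_stat(stationary)
        stat_walk = dickey_fuller_stat(walk)
        print("AR(1):      ", stat_stationary, "reject H0:", stat_stationary < CRITICAL_5PCT)
        print("random walk:", stat_walk, "reject H0:", stat_walk < CRITICAL_5PCT)
        ```
        * The statistic does not follow a Student $t$ distribution under $H_0$, which is why a dedicated critical value is needed.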
    * Parameter Estimation: OLS for AR$(p)$
        \begin{align*}
        Y_t = \phi_1Y_{t-1} + \phi_2Y_{t-2} + \cdots + \phi_pY_{t-p} + \epsilon_t
        \end{align*}
        * assuming the errors are white noise, the least squares estimate $\hat \phi$ is asymptotically normal 
        \begin{align*}
        &\sqrt{n}(\hat\phi - \phi) \implies N_p(\mathbf 0, \sigma_\epsilon^2\mathbf \Gamma_p^{-1}), \\
        &\mathbf\Gamma_p = E(\mathbf Y^T\mathbf Y), \\
        &\mathbf Y = (Y_1, Y_2, \ldots, Y_p)
        \end{align*}
        * the $(i, j)$ element of $\mathbf\Gamma_p$ is $E(Y_iY_j) = \gamma(i-j)$
    * Yule Walker Equations
        * $\rho(k) = \phi_1\rho(k-1) + \phi_2\rho(k-2) + \cdots + \phi_p\rho(k-p), \quad\forall 1\le k\le p$
        * solve the $p\times p$ linear system to obtain an estimate $\hat \phi$: 
        \begin{align*}
        \mathbf \rho &= \mathbf R\mathbf \phi, \\
        \mathbf \rho &= (\rho(1), \rho(2), \ldots, \rho(p))^T, \\
        \mathbf R_{i, j} &= \rho(i-j) 
        \end{align*}
        * $\hat\phi$ can be used as the initial guess for the numerical optimization in MLE
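        * A minimal sketch of solving the Yule-Walker system $\mathbf\rho = \mathbf R\mathbf\phi$ with a small hand-written Gaussian elimination. Here the input is the theoretical ACF of an illustrative AR(2), so the true coefficients should be recovered; with data one would plug in sample autocorrelations instead.
        ```python
        def solve_linear(a, b):
            """Gaussian elimination with partial pivoting for a small dense system."""
            n = len(b)
            m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
            for col in range(n):
                pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
                m[col], m[pivot] = m[pivot], m[col]
                for r in range(col + 1, n):
                    f = m[r][col] / m[col][col]
                    for c in range(col, n + 1):
                        m[r][c] -= f * m[col][c]
            x = [0.0] * n
            for i in reversed(range(n)):  # back substitution
                x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
            return x

        def yule_walker(rho):
            """rho = [rho(1), ..., rho(p)]; returns [phi_1, ..., phi_p]."""
            p = len(rho)
            full = [1.0] + list(rho)
            r = [[full[abs(i - j)] for j in range(p)] for i in range(p)]  # R_ij = rho(i - j)
            return solve_linear(r, list(rho))

        # Theoretical rho(1), rho(2) of an AR(2) with phi = (0.5, 0.3).
        rho1 = 0.5 / (1.0 - 0.3)
        rho2 = 0.5 * rho1 + 0.3
        phi_hat = yule_walker([rho1, rho2])
        print(phi_hat)
        ```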
    * Likelihood Methods
    * statsmodels
    * Forecasting
* Non-Stationary to Stationary
    * Box-Cox Transformation
        * Box-Cox Transformation
        \begin{align*}
        X^{(\lambda)} = \begin{cases}
        (X^\lambda - 1)/\lambda &\mbox{ if }\lambda\ne 0\\
        \log(X) &\mbox{ if } \lambda = 0
        \end{cases}
        \end{align*}
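        * The transformation is easy to write down directly. A small sketch for a single positive value; the function name is illustrative, and the last line checks numerically that the $\lambda\ne 0$ branch approaches $\log(X)$ as $\lambda\to 0$.
        ```python
        import math

        def box_cox(x, lam):
            """Box-Cox transform of a positive number x with parameter lam."""
            if x <= 0:
                raise ValueError("Box-Cox requires x > 0")
            if lam == 0:
                return math.log(x)
            return (x**lam - 1.0) / lam

        print(box_cox(4.0, 0.5))   # square-root-type transform
        print(box_cox(3.0, 1.0))   # lam = 1 only shifts the data by 1
        print(box_cox(2.0, 1e-8), math.log(2.0))  # nearly equal
        ```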
    * Trend and Seasonal Components
        * $X_t = m_t + Y_t$
            * linear trend: $m_t = \beta_0 + \beta_1 t$
            * quadratic trend: $m_t = \beta_0 + \beta_1 t + \beta_2 t^2$
            * moving average smoother
            \begin{align*}
            \hat m_t &= \frac{1}{2q+1}\sum_{j=-q}^q X_{t+j}\\
            &= \frac{1}{2q+1}\sum_{j=-q}^q m_{t+j} + \frac{1}{2q+1}\sum_{j=-q}^q Y_{t+j}\\
            &\approx m_t + \text{small error}
            \end{align*}
        * seasonal component with period $d$
        \begin{align*}
        \hat X_t &= \beta_0 + \beta_1 t + \sum_{j=2}^d\beta_j l_j(t)\\
        l_j(t) &= \begin{cases}
        1 &\mbox{if $t \equiv j \pmod d$}\\
        0 &\mbox{otherwise}
        \end{cases}\quad \forall 1\le j\le d
        \end{align*}
        * for monthly data ($d = 12$) there is a January indicator, a February indicator, and so on
        * one indicator is omitted because the indicators sum to 1 for every $t$, which would make them collinear with the intercept $\beta_0$
    * Differencing 
        * $\nabla = (1-B)$ removes polynomial trends: $\nabla^k$ removes a trend of degree $k$, for example $\nabla^2$ removes quadratic trends
        * $\nabla_d = (1-B^d)$ can remove seasonal trend: if $X_t = \beta_0 + \beta_1 t + s_t + \epsilon_t$ where $s_t$ is the seasonal term such that $s_t = s_{t-d}$, then $\nabla_d X_t$ is weakly stationary
        * $\nabla_d \ne \nabla^d = (1-B)^d$
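        * Both differencing operators can be illustrated on noise-free toy series, so that what remains after differencing is exactly constant. The series and coefficients below are made up for illustration.
        ```python
        def difference(x, lag=1):
            """Apply (1 - B^lag) to the list x."""
            return [x[t] - x[t - lag] for t in range(lag, len(x))]

        # Quadratic trend 1 + 2t + 3t^2: two ordinary differences leave the constant 2 * 3.
        quadratic = [1.0 + 2.0 * t + 3.0 * t * t for t in range(10)]
        second_diff = difference(difference(quadratic))

        # Linear trend plus a period-4 seasonal term: one seasonal difference leaves beta_1 * d.
        d = 4
        season = [5.0, -1.0, -3.0, -1.0]
        series = [10.0 + 0.5 * t + season[t % d] for t in range(20)]
        seasonal_diff = difference(series, lag=d)

        # nabla_d is not nabla^d: d ordinary differences wipe out the linear trend entirely.
        ordinary_d_times = series
        for _ in range(d):
            ordinary_d_times = difference(ordinary_d_times)

        print(second_diff)
        print(seasonal_diff)
        print(ordinary_d_times)
        ```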
* ARCH/GARCH Modeling
    * Motivation
        * ARIMA has a time-varying conditional mean $E(X_t|\mathcal F_{t-1})$ but constant conditional variance $Var(X_t|\mathcal F_{t-1})$; GARCH is the opposite
        * deterministic models: $Var(X_t|\mathcal F_{t-1})$ is deterministic
        * stochastic volatility models: $Var(X_t|\mathcal F_{t-1})$ is a stochastic process
        * GARCH by itself does not explain the JPM GS situation
    * ARCH(1)
        \begin{align*}
        a_t &= \sigma_t\epsilon_t\\
        \sigma_t &= \sqrt{\omega + \alpha a_{t-1}^2}, \quad\omega > 0, 0\le \alpha < 1
        \end{align*}
        * $\epsilon_t$ is iid with mean 0 and variance 1
        * $E(a_t|\mathcal F_{t-1}) = 0$
        * $Var(a_t|\mathcal F_{t-1}) = \sigma_t^2 = \omega + \alpha a_{t-1}^2$
        * assuming weak stationarity, ARCH(1) is a white noise: $\gamma_a(0) = E(\sigma_t^2) = E(\omega + \alpha a_{t-1}^2) = \omega + \alpha\gamma_a(0)$, so 
        \begin{align*}
        \gamma_a(0) &= \frac{\omega}{1-\alpha}\\
        \gamma_a(h) &= 0
        \end{align*}
        * $\alpha$ controls the mean reversion of $\sigma^2_t$
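        * A simulation sketch of ARCH(1) with Gaussian $\epsilon_t$ and illustrative parameters $\omega = 0.7$, $\alpha = 0.3$ (so $\omega/(1-\alpha) = 1$). It checks that $a_t$ looks like white noise while $a_t^2$ is positively autocorrelated; the function names are hypothetical.
        ```python
        import random

        def simulate_arch1(omega, alpha, n, burn_in=1000, seed=0):
            """Simulate a_t = sigma_t * eps_t with sigma_t^2 = omega + alpha * a_{t-1}^2."""
            rng = random.Random(seed)
            a_prev = 0.0
            path = []
            for t in range(n + burn_in):
                sigma_t = (omega + alpha * a_prev**2) ** 0.5
                a_prev = sigma_t * rng.gauss(0.0, 1.0)
                if t >= burn_in:
                    path.append(a_prev)
            return path

        def autocorr(x, h):
            """Sample autocorrelation at lag h."""
            n = len(x)
            mean = sum(x) / n
            den = sum((v - mean) ** 2 for v in x)
            return sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / den

        omega, alpha = 0.7, 0.3
        a = simulate_arch1(omega, alpha, n=50000)
        a2 = [v * v for v in a]

        sample_var = sum(a2) / len(a2)  # the mean is zero, so E(a_t^2) is the variance
        acf_a = autocorr(a, 1)
        acf_a2 = autocorr(a2, 1)
        print("variance:", sample_var, "theory:", omega / (1 - alpha))
        print("lag-1 ACF of a_t:  ", acf_a)
        print("lag-1 ACF of a_t^2:", acf_a2)
        ```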
    * AR(1)/ARCH(1)
        \begin{align*}
        X_t = \mu + \beta(X_{t-1} - \mu) + a_t, \quad |\beta| < 1
        \end{align*}
        * $\rho_X(h) = \beta^{|h|}, \rho_{a^2}(h) = \alpha^{|h|}$
        * non-constant conditional mean and variance
    * ARCH(p)
    * ARCH(1) Properties
        * $a_t^2$ is an AR(1) if $E(\epsilon_t^4) < \infty$: 
        \begin{align*}
        a_t^2 = \omega + \alpha a_{t-1}^2 + \sigma_t^2(\epsilon_t^2 - 1), 
        \end{align*}
        * $\nu_t = \sigma_t^2(\epsilon_t^2 - 1)$ can be shown to be a white noise
        * when $\epsilon_t$ is iid $N(0, 1)$, the unconditional kurtosis > 3: Following AR(1) properties, we have 
        \begin{align*}
        E(a_t^2) &= \frac{\omega}{1-\alpha}, \\
        Var(a_t^2) &= \frac{2E(\sigma_t^4)}{1-\alpha^2}, \\
        E(\sigma_t^4) &= E((\omega + \alpha a_{t-1}^2)^2) \\
        &= \frac{\omega^2(1+\alpha)}{(1-3\alpha^2)(1-\alpha)}\\
        E(a_t^4) &= 3E(\sigma_t^4) = 3(E(a_t^2))^2\frac{1-\alpha^2}{1-3\alpha^2} > 3(E(a_t^2))^2, \quad 0 < \alpha^2 < 1/3
        \end{align*}
        * ARCH Effect: $a_t^2$ and $a_{t+h}^2$ are positively correlated
    * ARCH and Stylized Facts
        * ARCH does not support asymmetry or the leverage effect
    * Weaknesses of ARCH Model
    * From ARCH to GARCH
        \begin{align*}
        a_t &= \sigma_t\epsilon_t, \\
        \sigma^2_t &= \omega + \sum_{i=1}^p \alpha_i a_{t-i}^2 + \sum_{j=1}^q \beta_j\sigma_{t-j}^2, \quad\omega > 0, \alpha_i \ge 0, \beta_j \ge 0
        \end{align*}
        * $\epsilon_t$ is iid $N(0, 1)$
    * GARCH(1, 1) squared is ARMA(1, 1)
        \begin{align*}
        a_t^2 - c &= (\alpha + \beta)(a_{t-1}^2 - c) - \beta\eta_{t-1} + \eta_t, 
        \end{align*}
        * $c = \omega/(1-\alpha-\beta), \eta_t = a_t^2 - \sigma_t^2$
        * ARMA(1, 1) with mean $c$ and coefficients $\phi_1 = \alpha + \beta, \theta_1 = -\beta$
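        * The ARMA(1, 1) representation is an algebraic identity along every path, which can be verified numerically. A sketch with illustrative parameters and Gaussian $\epsilon_t$; only $a_t^2$ and $\sigma_t^2$ are tracked since the sign of $a_t$ does not matter here.
        ```python
        import random

        omega, alpha, beta = 0.1, 0.1, 0.8
        c = omega / (1.0 - alpha - beta)  # unconditional variance

        rng = random.Random(0)
        sigma2 = [c]                               # start sigma_t^2 at its long-run level
        a2 = [sigma2[0] * rng.gauss(0.0, 1.0) ** 2]
        for t in range(1, 200):
            s2 = omega + alpha * a2[-1] + beta * sigma2[-1]  # GARCH(1, 1) recursion
            sigma2.append(s2)
            a2.append(s2 * rng.gauss(0.0, 1.0) ** 2)

        eta = [a - s for a, s in zip(a2, sigma2)]  # eta_t = a_t^2 - sigma_t^2

        # Largest gap between the two sides of the ARMA(1, 1) equation.
        max_gap = max(
            abs((a2[t] - c) - ((alpha + beta) * (a2[t - 1] - c) - beta * eta[t - 1] + eta[t]))
            for t in range(1, 200)
        )
        print(max_gap)  # rounding error only
        ```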
    * Fitting ARCH to S&P 500 Data
    * GARCH$(p, q)$ squared is ARMA$(\max(p, q), q)$
        \begin{align*}
        a_t^2 - c &= \sum_{i=1}^{\max(p, q)}(\alpha_i + \beta_i)(a_{t-i}^2 - c) - \sum_{i=1}^{q}\beta_i\eta_{t-i} + \eta_t
        \end{align*}
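        * $c = \omega/(1-\sum_{i=1}^{\max(p, q)}(\alpha_i + \beta_i)), \eta_t = a_t^2 - \sigma_t^2$, with the convention $\alpha_i = 0$ for $i > p$ and $\beta_i = 0$ for $i > q$
        * given $\alpha_i \ge 0, \beta_j \ge 0$, $a_t$ is weakly stationary if $\sum_{i=1}^p\alpha_i + \sum_{j=1}^q\beta_j < 1$ (weak stationarity of $a_t^2$ additionally requires a finite fourth moment)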
    * GARCH Forecasting
        * 1-step ahead forecast of the conditional variance $\sigma_{t+1}^2$ is already given by the model
        * for GARCH(1, 1), let $\lambda = \alpha + \beta < 1$, the $k$-step ahead forecast is 
        \begin{align*}
        \hat \sigma_{t+k}^2 &= \omega + \lambda \hat \sigma_{t+k-1}^2\\
        &= \omega(1 + \lambda + \cdots + \lambda^{k-2}) + \lambda^{k-1} \hat \sigma_{t+1}^2 \\
        &\rightarrow \frac{\omega}{1-\lambda}\quad \text{ as }k\rightarrow \infty
        \end{align*}
        * the gap between the forecast and the long-run variance $\omega/(1-\lambda)$ shrinks by a factor $\lambda$ per step, so its half-life $T$ solves $\lambda^T = 1/2$, giving $T = -\frac{\log 2}{\log\lambda}$
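        * The forecast recursion, its closed form and the half-life can be sketched as below. The parameter values and the starting forecast $\hat\sigma_{t+1}^2 = 4$ are illustrative assumptions; the closed form used is $\hat\sigma_{t+k}^2 = \frac{\omega}{1-\lambda} + \lambda^{k-1}\left(\hat\sigma_{t+1}^2 - \frac{\omega}{1-\lambda}\right)$, which is the geometric sum above rewritten.
        ```python
        import math

        def garch11_forecast(omega, alpha, beta, sigma2_next, k):
            """k-step-ahead variance forecast by iterating the recursion."""
            lam = alpha + beta
            forecast = sigma2_next  # k = 1 is given by the model
            for _ in range(k - 1):
                forecast = omega + lam * forecast
            return forecast

        def garch11_forecast_closed(omega, alpha, beta, sigma2_next, k):
            """Same forecast via the closed form around the long-run variance."""
            lam = alpha + beta
            long_run = omega / (1.0 - lam)
            return long_run + lam ** (k - 1) * (sigma2_next - long_run)

        def half_life(alpha, beta):
            """Number of steps for the gap to the long-run variance to halve."""
            return -math.log(2.0) / math.log(alpha + beta)

        omega, alpha, beta = 0.05, 0.10, 0.85
        sigma2_next = 4.0
        long_run = omega / (1.0 - alpha - beta)

        for k in (1, 5, 10, 50):
            print(k, garch11_forecast(omega, alpha, beta, sigma2_next, k))
        print("long-run variance:", long_run)
        print("half-life:", half_life(alpha, beta))
        ```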
    * Engle Test for ARCH Effects
    * GARCH Forecasting Example in Risk Management
    * Other Volatility Models
        * GARCH-M (GARCH in mean)
        \begin{align*}
        X_t &= \mu + c\sigma_t^2 + a_t\\
        a_t &= \epsilon_t\sigma_t\\
        \sigma_t^2 &= \omega + \alpha a_{t-1}^2 + \beta \sigma_{t-1}^2
        \end{align*}
        * EGARCH
        \begin{align*}
        g(\epsilon_t) &= \theta\epsilon_t + \gamma(|\epsilon_t| - E(|\epsilon_t|))\\
        &= \begin{cases}
        (\theta + \gamma)\epsilon_t - \gamma E(|\epsilon_t|) &\mbox{ if } \epsilon_t\ge 0\\
        (\theta - \gamma)\epsilon_t - \gamma E(|\epsilon_t|) &\mbox{ if } \epsilon_t < 0
        \end{cases},\\
        a_t &= \sigma_t\epsilon_t\\
        \log(\sigma_t^2) &= \omega + \sum_{i=1}^p\beta_i \log(\sigma_{t-i}^2) + \sum_{j=1}^q g_j(\epsilon_{t-j})
        \end{align*}        
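        * The asymmetry of the EGARCH news function $g$ can be seen in a few lines. This sketch assumes $\epsilon_t\sim N(0, 1)$, for which $E(|\epsilon_t|) = \sqrt{2/\pi}$, writes the piecewise form with the constant term $-\gamma E(|\epsilon_t|)$, and uses illustrative values $\theta = -0.1$, $\gamma = 0.2$.
        ```python
        import math

        E_ABS_STD_NORMAL = math.sqrt(2.0 / math.pi)  # E|eps| for eps ~ N(0, 1)

        def egarch_g(eps, theta, gamma):
            """g(eps) = theta * eps + gamma * (|eps| - E|eps|)."""
            return theta * eps + gamma * (abs(eps) - E_ABS_STD_NORMAL)

        def egarch_g_piecewise(eps, theta, gamma):
            """Same function written with a different slope on each side of zero."""
            slope = theta + gamma if eps >= 0 else theta - gamma
            return slope * eps - gamma * E_ABS_STD_NORMAL

        theta, gamma = -0.1, 0.2
        g_neg = egarch_g(-1.0, theta, gamma)
        g_pos = egarch_g(1.0, theta, gamma)
        print("shock -1:", g_neg)
        print("shock +1:", g_pos)
        ```
        * With $\theta < 0$, a negative shock raises $\log(\sigma_t^2)$ by more than a positive shock of the same size, which is how EGARCH accommodates the leverage effect.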
* Multivariate Time Series
    * Multivariate Time Series
        * weakly stationary: mean vector and autocovariance function (now a matrix) are independent of $t$
        \begin{align*}
        \mathbf X_t &= (X_{1,t}, X_{2,t}, \ldots, X_{m,t})\\
        \mathbf \Gamma(t+h, t) &= E((\mathbf X_{t+h}-\mathbf \mu_{t+h})(\mathbf X_{t}-\mathbf \mu_{t})^T)\\
        \rho_{i, j}(h) &= \frac{\gamma_{i, j}(h)}{\sqrt{\gamma_{i, i}(0)\gamma_{j, j}(0)}}
        \end{align*}
        * the diagonal elements are the ACovF of the individual component time series
        * white noise: weakly stationary + zero mean + zero ACF $\forall h\ne 0$
        * $\rho_{i, j}(h) = \rho(X_{i,t+h}, X_{j, t}) = \rho_{j, i}(-h)$
        * the sample mean of a weakly stationary process converges and is asymptotically normal
    * Vector Autoregressive Processes 
        \begin{align*}
        \mathbf X_t = \mathbf a_0 + \sum_{i=1}^p\mathbf A_i\mathbf X_{t-i} + \epsilon_t
        \end{align*}
        * stationarity condition: roots of 
        \begin{align*}
        \det\left(I - \sum_{i=1}^p \mathbf A_i x^i\right) = 0
        \end{align*}
        have modulus strictly larger than 1
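        * For a VAR(1) the condition is equivalent to all eigenvalues of $\mathbf A_1$ lying strictly inside the unit circle, because $x$ solves $\det(I - \mathbf A_1 x) = 0$ exactly when $1/x$ is an eigenvalue of $\mathbf A_1$. A sketch for the bivariate case with illustrative matrices:
        ```python
        import cmath

        def eigenvalues_2x2(a):
            """Eigenvalues of a 2x2 matrix given as nested lists."""
            (a11, a12), (a21, a22) = a
            trace = a11 + a22
            det = a11 * a22 - a12 * a21
            disc = cmath.sqrt(trace * trace - 4.0 * det)
            return ((trace + disc) / 2.0, (trace - disc) / 2.0)

        def det_i_minus_ax(a, x):
            """det(I - A x) for a 2x2 matrix A and a scalar x."""
            (a11, a12), (a21, a22) = a
            return (1 - a11 * x) * (1 - a22 * x) - (a12 * x) * (a21 * x)

        def var1_is_stationary(a):
            """Stationary iff every eigenvalue of A has modulus strictly below 1."""
            return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(a))

        a_stable = [[0.5, 0.1], [0.2, 0.3]]
        a_unit_root = [[1.0, 0.0], [0.2, 0.5]]

        for lam in eigenvalues_2x2(a_stable):
            root = 1.0 / lam  # corresponding root of det(I - A x) = 0
            print(lam, abs(root), abs(det_i_minus_ax(a_stable, root)))
        print(var1_is_stationary(a_stable), var1_is_stationary(a_unit_root))
        ```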
* Cointegration
    * Cointegration 
    * Johansen Test
    * Cryptocurrency Example
* State Space Modeling
    * State Space Models
    * Kalman Recursions: Kalman Prediction & Filtering
    * Example (Linear Regression)
    * AR, MA in State Space Form
    * Bayesian Background to Kalman Methods
    * Stochastic Volatility 