# VARs and DMDs

# Content
- first-order vector autoregressions (VARs)
- dynamic mode decompositions (DMDs)
- connections between DMDs and first-order VARs

This lecture applies computational methods that we learned about in the lecture [Singular Value Decomposition](https://python.quantecon.org/svd_intro.html).

## First-Order Vector Autoregressions

We want to fit a **first-order vector autoregression**

$$
X_{t+1} = AX_t + C\epsilon_{t+1}, \quad \epsilon_{t+1}\perp X_t \tag{6.1}
$$

where $\epsilon_{t+1}$ is the time $t+1$ component of a sequence of i.i.d. $m\times 1$ random vectors with mean vector zero and identity covariance matrix and where the $m\times 1$ vector $X_t$ is

$$
X_t = 
\begin{bmatrix}
X_{1,t} & X_{2,t} & \cdots & X_{m,t}
\end{bmatrix}^T\tag{6.2}
$$

and where $\cdot^T$ again denotes complex transposition and $X_{i,t}$ is variable $i$ at time $t$. To fit equation (6.1), we organize our data in an $m\times (n+1)$ matrix $\tilde X$:

$$
\tilde X = 
\begin{bmatrix}
X_{1} | X_{2} | \cdots | X_{n} | X_{n+1}
\end{bmatrix}
$$

where for $t=1,\ldots,n+1$, the $m\times 1$ vector $X_t$ is given by (6.2). Thus, we want to estimate a system (6.1) that consists of $m$ least-squares regressions of **everything** on one lagged value of **everything**.

The $i$th equation of (6.1) is a regression of $X_{i,t+1}$ on the vector $X_t$. We proceed as follows.

From $\tilde X$, we form two $m\times n$ matrices

$$
X = 
\begin{bmatrix}
X_{1} | X_{2} | \cdots | X_{n}
\end{bmatrix}
$$
 
and

$$
X' = 
\begin{bmatrix}
X_{2} | X_{3} | \cdots | X_{n+1}
\end{bmatrix}
$$

 
Here $'$ is part of the name of the matrix $X'$ and does not indicate matrix transposition. We use $\cdot^T$ to denote matrix transposition or its extension to complex matrices. In forming $X$ and $X'$, we have in each case dropped a column from $\tilde X$, the last column in the case of $X$, and the first column in the case of $X'$.
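As a minimal sketch of this bookkeeping, assuming NumPy and a small made-up $\tilde X$ (the sizes `m`, `n` and the values are purely illustrative), dropping the last and first columns is just column slicing:

```python
import numpy as np

# Hypothetical data: m = 3 variables observed at n + 1 = 6 dates,
# stored column by column so that column t holds the vector X_t.
m, n = 3, 5
X_tilde = np.arange(m * (n + 1), dtype=float).reshape(m, n + 1)

# X drops the last column of X_tilde; X' (here X_prime) drops the first column.
X = X_tilde[:, :-1]        # columns X_1, ..., X_n
X_prime = X_tilde[:, 1:]   # columns X_2, ..., X_{n+1}

print(X.shape, X_prime.shape)  # (3, 5) (3, 5)
```

Column $t$ of `X_prime` is column $t+1$ of `X`, which is exactly the one-period lag structure in (6.1).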

Evidently, $X$ and $X'$ are both $m\times n$ matrices. We denote the rank of $X$ as $p\leq \min(m,n)$. Two cases that interest us are

- $n \gg m$, so that we have many more time series observations than variables
- $n \ll m$, so that we have many more variables than time series observations

At a general level that includes both of these special cases, a common formula describes the least squares estimator $\hat A$ of $A$. But important details differ. The common formula is

$$
\hat A = X'X^+ \tag{6.3}
$$

where $X^+$ is the pseudo-inverse of $X$. To read about the **Moore-Penrose pseudo-inverse**, please see [Moore-Penrose pseudo-inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse).
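To make formula (6.3) concrete, here is a minimal sketch that assumes NumPy's `np.linalg.pinv` for the Moore-Penrose pseudo-inverse and a hypothetical two-variable VAR (the matrix `A_true`, the shock scale, and the sample size are illustrative choices, not values from this lecture). It also checks the claim that row $i$ of $\hat A$ is the least-squares regression of $X_{i,t+1}$ on $X_t$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable VAR: m = 2 variables, n = 5_000 observations (n >> m).
m, n = 2, 5_000
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.3]])
C = 0.1 * np.eye(m)

# Simulate X_{t+1} = A X_t + C eps_{t+1}, storing X_t in column t.
X_tilde = np.zeros((m, n + 1))
for t in range(n):
    X_tilde[:, t + 1] = A_true @ X_tilde[:, t] + C @ rng.standard_normal(m)

X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Formula (6.3): A_hat = X' X^+
A_hat = X_prime @ np.linalg.pinv(X)

# Equation-by-equation least squares: regress row i of X' on the rows of X.
row_by_row = np.vstack([np.linalg.lstsq(X.T, X_prime[i], rcond=None)[0]
                        for i in range(m)])

print(np.round(A_hat, 2))
```

With this many observations, `A_hat` should land close to `A_true`, and it coincides with the stacked single-equation estimates.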

Applicable formulas for the pseudo-inverse differ for our two cases.

**Short-Fat Case:**

When $n \gg m$, so that we have many more time series observations $n$ than variables $m$ and when $X$ has linearly independent **rows**, $XX^T$ has an inverse and the pseudo-inverse $X^+$ is

$$
X^+ = X^T(XX^T)^{-1}
$$

Here $X^+$ is a **right-inverse** that verifies $XX^+ = I_{m\times m}$. In this case, our formula (6.3) for the least-squares estimator of the population matrix of regression coefficients $A$ becomes

$$
\hat A = X'X^T(XX^T)^{-1} \tag{6.4}
$$

This formula for least-squares regression coefficients is widely used in econometrics, for example to estimate vector autoregressions. The right side of formula (6.4) equals the empirical cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse of the empirical second moment matrix of $X_t$; the $1/n$ normalizations of the two moment matrices cancel.
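A minimal sketch of the short-fat formula, assuming NumPy and random placeholder data (`X` and `X_prime` below are not a fitted VAR, just matrices of the right shapes). It computes the right inverse with `np.linalg.solve` instead of forming $(XX^T)^{-1}$ explicitly, and checks the moment-matrix reading of (6.4):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical short-fat data: m = 3 variables, n = 200 observations.
m, n = 3, 200
X = rng.standard_normal((m, n))
X_prime = rng.standard_normal((m, n))

# Right inverse X^+ = X^T (X X^T)^{-1}. Because X X^T is symmetric,
# this equals (solve(X X^T, X))^T, which avoids an explicit inverse.
X_plus = np.linalg.solve(X @ X.T, X).T

# Formula (6.4)
A_hat = X_prime @ X_plus

# Same estimate as (cross second moment) times (second moment)^{-1}.
M_cross = X_prime @ X.T / n
M_xx = X @ X.T / n
A_hat_moments = M_cross @ np.linalg.inv(M_xx)

print(np.allclose(X @ X_plus, np.eye(m)))  # True: X X^+ = I
```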

**Tall-Skinny Case:**

When $m \gg n$, so that we have many more attributes $m$ than time series observations $n$ and when $X$ has linearly independent **columns**, $X^TX$ has an inverse and the pseudo-inverse $X^+$ is

$$
X^+ = (X^TX)^{-1}X^T
$$

Here $X^+$ is a **left-inverse** that verifies $X^+X = I_{n\times n}$. In this case, our formula (6.3) for a least-squares estimator of $A$ becomes

$$
\hat A = X'(X^TX)^{-1}X^T \tag{6.5}
$$

Please compare formulas (6.4) and (6.5) for $\hat A$. Here we are especially interested in formula (6.5). The $i$th row of $\hat A$ is a $1\times m$ vector of regression coefficients of $X_{i,t+1}$ on $X_{j,t}, j=1,\ldots,m$. If we use formula (6.5) to calculate $\hat A X$, we find that

$$
\hat A X = X'
$$

so that the regression equation **fits perfectly**. This is a typical outcome in an **underdetermined least-squares** model. To reiterate, in the **tall-skinny** case (described in [Singular Value Decomposition](https://python.quantecon.org/svd_intro.html)) in which we have a number $n$ of observations that is small relative to the number $m$ of attributes that appear in the vector $X_t$, we want to fit equation (6.1).
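The perfect-fit property can be checked numerically. This is a sketch assuming NumPy and hypothetical Gaussian data with many more variables than observations, so that $X$ has linearly independent columns with probability one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical tall-skinny data: m = 50 variables, n = 10 observations.
m, n = 50, 10
X = rng.standard_normal((m, n))
X_prime = rng.standard_normal((m, n))

# Left inverse X^+ = (X^T X)^{-1} X^T, computed with solve().
X_plus = np.linalg.solve(X.T @ X, X.T)   # n x m

# Formula (6.5): an m x m matrix
A_hat = X_prime @ X_plus

print(np.allclose(X_plus @ X, np.eye(n)))  # True: left inverse
print(np.allclose(A_hat @ X, X_prime))     # True: perfect fit
```

Even though `X_prime` here is pure noise, unrelated to `X`, the fit is exact, which illustrates why a perfect in-sample fit says little in the underdetermined case.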

We confront the facts that the least-squares estimator is underdetermined and that the regression equation fits perfectly. To proceed, we’ll want to calculate the pseudo-inverse $X^+$ efficiently. The pseudo-inverse $X^+$ will be a component of our estimator of $A$. As our estimator $\hat A$ of $A$ we want to form an $m\times m$ matrix that solves the least-squares best-fit problem

$$
\hat A = \operatorname*{argmin}_{\tilde A}\Vert X'-\tilde A X \Vert_F \tag{6.6}
$$

where $\Vert\cdot\Vert_F$ denotes the Frobenius (or Euclidean) norm of a matrix. The Frobenius norm is defined as

$$
\Vert A\Vert_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^m|A_{ij}|^2}
$$
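A quick worked check of the definition, assuming NumPy and an arbitrary $2\times 2$ example matrix: the sum of squared moduli of the entries is $1+4+9+16=30$, so the Frobenius norm is $\sqrt{30}$.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0, 4.0]])

# Frobenius norm from the definition: square root of the sum of squared moduli
fro_by_hand = np.sqrt(np.sum(np.abs(A) ** 2))

# NumPy's built-in Frobenius norm
fro_numpy = np.linalg.norm(A, 'fro')

print(fro_by_hand, fro_numpy)  # both equal sqrt(30), about 5.477
```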

The minimizer of the right side of equation (6.6) is

$$
\hat A = X'X^+ \tag{6.7}
$$
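As a sanity check on (6.7), the sketch below (assuming NumPy and hypothetical random data with $n > m$, so that the minimized residual is strictly positive) evaluates the objective in (6.6) at $X'X^+$ and at randomly perturbed matrices, and compares with `np.linalg.lstsq`, which solves the same problem after transposing it into the form $X^T \tilde A^T \approx X'^T$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data with n > m, so the fit is not perfect.
m, n = 4, 30
X = rng.standard_normal((m, n))
X_prime = rng.standard_normal((m, n))

def frobenius_loss(A_tilde):
    """Objective in (6.6): ||X' - A_tilde X||_F."""
    return np.linalg.norm(X_prime - A_tilde @ X, 'fro')

A_hat = X_prime @ np.linalg.pinv(X)   # formula (6.7)

# lstsq solves X^T B = X'^T column by column; then A_hat = B^T.
A_lstsq = np.linalg.lstsq(X.T, X_prime.T, rcond=None)[0].T

# Random perturbations of A_hat never lower the objective.
losses = [frobenius_loss(A_hat + 0.01 * rng.standard_normal((m, m)))
          for _ in range(100)]
print(frobenius_loss(A_hat) <= min(losses))  # True
```

The perturbation check works because the objective is a convex quadratic in $\tilde A$: at the minimizer, $\Vert X' - (\hat A + D)X\Vert_F^2 = \Vert X' - \hat A X\Vert_F^2 + \Vert DX\Vert_F^2$.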

where the (possibly huge) $n\times m$ matrix $X^+ = (X^TX)^{-1}X^T$ is again a pseudo-inverse of $X$. For some situations that we are interested in, $X^TX$ can be close to singular, a situation that can make some numerical algorithms inaccurate. To acknowledge that possibility, we’ll use efficient algorithms for constructing a **reduced-rank approximation** of $\hat A$ in formula (6.5).

Such an approximation to our vector autoregression will no longer fit perfectly. An efficient way to compute the pseudo-inverse $X^+$ is to start with a singular value decomposition

$$
X = U\Sigma V^T \tag{6.8}
$$

where we remind ourselves that for a **reduced** SVD, $X$ is an $m\times n$ matrix of data, $U$ is an $m\times p$ matrix, $\Sigma$ is a $p\times p$ matrix, and $V$ is an $n\times p$ matrix. We can efficiently construct the pertinent pseudo-inverse $X^+$ by recognizing the following string of equalities.
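A minimal sketch of a reduced SVD, assuming NumPy's `np.linalg.svd` with `full_matrices=False` and a hypothetical full-column-rank $X$. Note that NumPy returns $V^T$ rather than $V$, returns the singular values as a 1-D array rather than the diagonal matrix $\Sigma$, and keeps $\min(m,n)$ columns, which coincides with the rank $p$ only when $X$ has full rank:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical tall-skinny data matrix with full column rank, so p = n.
m, n = 8, 3
X = rng.standard_normal((m, n))

# Reduced SVD: U is m x p, Sigma is p x p, V is n x p.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
Sigma = np.diag(sigma)

print(U.shape, Sigma.shape, V.shape)  # (8, 3) (3, 3) (3, 3)
```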
 
$$
\begin{aligned}
X^+ &= (X^TX)^{-1}X^T\\
    &= (V\Sigma U^TU\Sigma V^T)^{-1}V\Sigma U^T\\
    &= (V\Sigma \Sigma V^T)^{-1}V\Sigma U^T\\
    &= V\Sigma^{-1} \Sigma^{-1} V^T V\Sigma U^T\\
    &= V\Sigma^{-1} U^T
\end{aligned}\tag{6.9}
$$
 
(Since we are in the $m \gg n$ case in which $V^TV = I_{p\times p}$ in a reduced SVD, we can use the preceding string of equalities for a reduced SVD as well as for a full SVD.) Thus, we shall construct a pseudo-inverse $X^+$ of $X$ by using a singular value decomposition of $X$ in equation (6.8) to compute
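The first and last lines of the string of equalities (6.9) can be compared numerically. This sketch assumes NumPy and a hypothetical tall-skinny $X$ with linearly independent columns:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical tall-skinny X with linearly independent columns.
m, n = 20, 4
X = rng.standard_normal((m, n))
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# First line of (6.9): the left-inverse formula
lhs = np.linalg.inv(X.T @ X) @ X.T
# Last line of (6.9): V Sigma^{-1} U^T
rhs = V @ np.diag(1 / sigma) @ U.T

print(np.allclose(V.T @ V, np.eye(n)))  # True: V^T V = I
print(np.allclose(lhs, rhs))            # True: the two expressions agree
```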

$$
X^+ = V\Sigma^{-1} U^T \tag{6.10}
$$

where the matrix $\Sigma^{-1}$ is constructed by replacing each non-zero element of $\Sigma$ with $\sigma_j^{-1}$. We can use formula (6.10) together with formula (6.7) to compute the matrix $\hat A$ of regression coefficients. Thus, our estimator $\hat A = X'X^+$ of the $m\times m$ matrix of coefficients $A$ is
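In floating point, singular values that are zero in exact arithmetic typically come out as tiny nonzero numbers, so "non-zero" needs a tolerance. The sketch below assumes NumPy; `pinv_via_svd` and its relative tolerance `rtol` are hypothetical names chosen for illustration. It builds $V\Sigma^{-1}U^T$ for a deliberately rank-deficient $X$ and compares the result with `np.linalg.pinv` using the same cutoff:

```python
import numpy as np

def pinv_via_svd(X, rtol=1e-10):
    """Hypothetical helper: X^+ = V Sigma^{-1} U^T as in (6.10).

    Singular values at or below rtol * (largest singular value) are
    treated as zero, and their reciprocals are set to zero.
    """
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    sigma_inv = np.zeros_like(sigma)
    keep = sigma > rtol * sigma.max()
    sigma_inv[keep] = 1.0 / sigma[keep]      # invert only the "non-zero" sigma_j
    return Vt.T @ np.diag(sigma_inv) @ U.T   # V Sigma^{-1} U^T

rng = np.random.default_rng(6)
# Hypothetical rank-deficient X: 10 x 4 but rank 2.
X = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 4))

X_plus = pinv_via_svd(X)
print(np.allclose(X_plus, np.linalg.pinv(X, rcond=1e-10)))
```

Without the tolerance, the reciprocals of the near-zero singular values would be enormous and would swamp the result.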

$$
\hat A = X'V\Sigma^{-1}U^T \tag{6.11}
$$
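Putting (6.8) and (6.11) together, here is a minimal end-to-end sketch, assuming NumPy and a hypothetical tall-skinny data matrix $\tilde X$ of random numbers. It forms $X$ and $X'$, computes $\hat A = X'V\Sigma^{-1}U^T$ from a reduced SVD, and confirms that it matches $X'X^+$ and fits perfectly:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical tall-skinny data: m = 40 variables, n = 6 observations.
m, n = 40, 6
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD of X, then formula (6.11): A_hat = X' V Sigma^{-1} U^T
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
A_hat = X_prime @ Vt.T @ np.diag(1 / sigma) @ U.T

print(A_hat.shape)                      # (40, 40)
print(np.allclose(A_hat @ X, X_prime))  # True: perfect fit in the tall-skinny case
```

When $m$ is very large, one option is to store the factors $X'V\Sigma^{-1}$ ($m\times p$) and $U^T$ ($p\times m$) rather than multiplying them out into the $m\times m$ matrix $\hat A$.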