# VARS AND DMDS

# Content
- first-order vector autoregressions (VARs)
- dynamic mode decompositions (DMDs)
- connections between DMDs and first-order VARs

This lecture applies computational methods that we learned about in the lecture [Singular Value Decomposition](https://python.quantecon.org/svd_intro.html).

## First-Order Vector Autoregressions

We want to fit a **first-order vector autoregression**

$$
\begin{equation}
X_{t+1} = AX_t + C\epsilon_{t+1}, \text{ } \epsilon_{t+1}\perp X_t
\end{equation}\tag{6.1}
$$

where $\epsilon_{t+1}$ is the time $t+1$ component of a sequence of i.i.d. $m\times 1$ random vectors with mean vector zero and identity covariance matrix and where the $m\times 1$ vector $X_t$ is

$$
X_t = 
\begin{bmatrix}
X_{1,t} & X_{2,t} & \cdots & X_{m,t}
\end{bmatrix}^T\tag{6.2}
$$

and where $\cdot^T$ again denotes complex transposition and $X_{i,t}$ is variable $i$ at time $t$. We want to fit equation (6.1). Our data are organized in an $m\times (n+1)$ matrix $\tilde X$ 

$$
\tilde X = 
\begin{bmatrix}
X_{1} | X_{2} | \cdots | X_{n} | X_{n+1}
\end{bmatrix}
$$

where for $t=1,...,n+1$, the $m\times 1$ vector $X_t$ is given by (6.2). Thus, we want to estimate a system (6.1) that consists of $m$ least squares regressions of **everything** on one lagged value of **everything**.

The $i$th equation of (6.1) is a regression of $X_{i,t+1}$ on the vector $X_t$. We proceed as follows.

From $\tilde X$, we form two $m\times n$ matrices

$$
X = 
\begin{bmatrix}
X_{1} | X_{2} | \cdots | X_{n}
\end{bmatrix}
$$
 
and

$$
X' = 
\begin{bmatrix}
X_{2} | X_{3} | \cdots | X_{n+1}
\end{bmatrix}
$$

 
Here $'$ is part of the name of the matrix $X'$ and does not indicate matrix transposition. We use $\cdot^T$ to denote matrix transposition or its extension to complex matrices. In forming $X$ and $X'$, we have in each case dropped a column from $\tilde X$, the last column in the case of $X$, and the first column in the case of $X'$.
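The construction of $X$ and $X'$ can be sketched in a few lines of NumPy. The array name `X_tilde`, the sizes `m` and `n`, and the random placeholder data are assumptions made only for illustration; they do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8                                  # hypothetical sizes: m variables, n transitions
X_tilde = rng.standard_normal((m, n + 1))    # placeholder data; columns are X_1, ..., X_{n+1}

X = X_tilde[:, :-1]        # drop the last column:  X_1, ..., X_n
X_prime = X_tilde[:, 1:]   # drop the first column: X_2, ..., X_{n+1}

print(X.shape, X_prime.shape)
```

Column $t$ of `X_prime` is the one-step-ahead counterpart of column $t$ of `X`, which is what the regression of $X_{t+1}$ on $X_t$ requires.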

Evidently, $X$ and $X'$ are both $m\times n$ matrices. We denote the rank of $X$ as $p\leq \min(m,n)$. Two cases that interest us are

- $n \gg m$, so that we have many more time series observations than variables
- $n \ll m$, so that we have many more variables than time series observations

At a general level that includes both of these special cases, a common formula describes the least squares estimator $\hat A$ of $A$. But important details differ. The common formula is

$$
\begin{equation}
\hat A = X'X^+
\end{equation}\tag{6.3}
$$

where $X^+$ is the pseudo-inverse of $X$. To read about the **Moore-Penrose pseudo-inverse**, please see [Moore-Penrose pseudo-inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse).

Applicable formulas for the pseudo-inverse differ for our two cases.

**Short-Fat Case:**

When $n \gg m$, so that we have many more time series observations $n$ than variables $m$, and when $X$ has linearly independent **rows**, $XX^T$ has an inverse and the pseudo-inverse $X^+$ is

$$
X^+ = X^T(XX^T)^{-1}
$$

Here $X^+$ is a **right-inverse** that verifies $XX^+ = I_{m\times m}$. In this case, our formula (6.3) for the least-squares estimator of the population matrix of regression coefficients $A$ becomes

$$
\begin{equation}
\hat A = X'X^T(XX^T)^{-1}
\end{equation}\tag{6.4}
$$

This formula for least-squares regression coefficients is widely used in econometrics, where it is used to estimate vector autoregressions. The right side of formula (6.4) is the empirical cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse of the empirical second moment matrix of $X_t$ (the $1/n$ scaling factors cancel).
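To make the short-fat case concrete, here is a minimal sketch that simulates a small VAR and checks that formula (6.4) agrees with the pseudo-inverse formula (6.3). The coefficient matrix `A_true`, the shock loading `C`, the sample size, and the seed are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 200                          # many more observations than variables
A_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.2],
                   [0.1, 0.0, 0.3]])   # hypothetical stable coefficient matrix
C = 0.1 * np.eye(m)                    # hypothetical shock loading

# Simulate X_{t+1} = A X_t + C eps_{t+1}
X_tilde = np.zeros((m, n + 1))
X_tilde[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_tilde[:, t + 1] = A_true @ X_tilde[:, t] + C @ rng.standard_normal(m)

X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

A_hat_formula = X_prime @ X.T @ np.linalg.inv(X @ X.T)   # formula (6.4)
A_hat_pinv = X_prime @ np.linalg.pinv(X)                 # formula (6.3)

print(np.max(np.abs(A_hat_formula - A_hat_pinv)))
```

Forming $(XX^T)^{-1}$ explicitly is done here only to mirror the formula; `np.linalg.pinv` or a least-squares solver is usually the numerically safer route.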

**Tall-Skinny Case:**

When $m \gg n$, so that we have many more attributes $m$ than time series observations $n$, and when $X$ has linearly independent **columns**, $X^TX$ has an inverse and the pseudo-inverse $X^+$ is

$$
X^+ = (X^TX)^{-1}X^T
$$

Here $X^+$ is a **left-inverse** that verifies $X^+X = I_{n\times n}$. In this case, our formula (6.3) for a least-squares estimator of $A$ becomes

$$
\begin{equation}
\hat A = X'(X^TX)^{-1}X^T
\end{equation}\tag{6.5}
$$

Please compare formulas (6.4) and (6.5) for $\hat A$. Here we are especially interested in formula (6.5). The $i$th row of $\hat A$ is a $1\times m$ vector of regression coefficients of $X_{i,t+1}$ on $X_{j,t},j=1,\ldots,m$. If we use formula (6.5) to calculate $\hat A X$, we find that

$$
\hat A X = X'
$$

so that the regression equation **fits perfectly**. This is a typical outcome in an **underdetermined least-squares** model. To reiterate, in the **tall-skinny** case (described in [Singular Value Decomposition](https://python.quantecon.org/svd_intro.html)) in which we have a number $n$ of observations that is small relative to the number $m$ of attributes that appear in the vector $X_t$, we want to fit equation (6.1).
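The perfect fit in the tall-skinny case is easy to confirm numerically. The sketch below uses random placeholder data with hypothetical sizes `m = 50` and `n = 6`, so that the columns of $X$ are linearly independent with probability one.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 6                              # many more variables than observations
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

X_plus = np.linalg.inv(X.T @ X) @ X.T     # left inverse, valid when columns are independent
A_hat = X_prime @ X_plus                  # formula (6.5)

left_inverse_ok = np.allclose(X_plus @ X, np.eye(n))   # X^+ X = I_n
perfect_fit = np.allclose(A_hat @ X, X_prime)          # A_hat X = X'
rank_A_hat = np.linalg.matrix_rank(A_hat)              # at most n, far below m

print(left_inverse_ok, perfect_fit, rank_A_hat)
```

Although $\hat A$ is $m\times m$, its rank cannot exceed $n$, which is one way of seeing that the data cannot pin down all $m^2$ coefficients.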

We confront the facts that the least squares estimator is underdetermined and that the regression equation fits perfectly. To proceed, we’ll want to calculate the pseudo-inverse $X^+$ efficiently, because it will be a component of our estimator of $A$. As our estimator $\hat A$ of $A$, we want to form an $m\times m$ matrix that solves the least-squares best-fit problem

$$
\begin{equation}
\hat A = \operatorname*{argmin}_{\tilde A}\Vert X'-\tilde A X \Vert_F
\end{equation}\tag{6.6}
$$

where $\Vert\cdot\Vert_F$ denotes the Frobenius (or Euclidean) norm of a matrix. For an $m\times n$ matrix $M$, such as the residual $X'-\tilde A X$, the Frobenius norm is defined as

$$
\Vert M\Vert_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^n|M_{ij}|^2}
$$
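As a quick numerical check of this definition, the sketch below computes the Frobenius norm of a rectangular matrix entry by entry and compares it with NumPy's built-in norm. The matrix `M_resid` is random placeholder data standing in for a residual matrix such as $X'-\tilde A X$; its name and size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 7
M_resid = rng.standard_normal((m, n))     # stand-in for a residual matrix X' - A_tilde X

frob_by_hand = np.sqrt(np.sum(np.abs(M_resid) ** 2))   # sqrt of the sum of squared entries
frob_numpy = np.linalg.norm(M_resid, ord="fro")        # NumPy's Frobenius norm

print(frob_by_hand, frob_numpy)
```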

The minimizer of the right side of equation (6.6) is

$$
\begin{equation}
\hat A = X'X^+
\end{equation}\tag{6.7}
$$

where the (possibly huge) $n\times m$ matrix $X^+ = (X^TX)^{-1}X^T$ is again a pseudo-inverse of $X$. In some situations that interest us, $X^TX$ can be close to singular, which can make some numerical algorithms inaccurate. To acknowledge that possibility, we’ll use efficient algorithms to construct a **reduced-rank approximation** of $\hat A$ in formula (6.5).

Such an approximation to our vector autoregression will no longer fit perfectly. An efficient way to compute the pseudo-inverse $X^+$ is to start with a singular value decomposition

$$
\begin{equation}
X = U\Sigma V^T
\end{equation}\tag{6.8}
$$

where we remind ourselves that for a **reduced** SVD, $X$ is an $m\times n$ matrix of data, $U$ is an $m\times p$ matrix, $\Sigma$ is a $p\times p$ matrix, and $V$ is an $n\times p$ matrix. We can efficiently construct the pertinent pseudo-inverse $X^+$ by recognizing the following string of equalities.
 
$$
\begin{align}
X^+ &= (X^TX)^{-1}X^T\\
    &= (V\Sigma U^TU\Sigma V^T)^{-1}V\Sigma U^T\\
    &= (V\Sigma \Sigma V^T)^{-1}V\Sigma U^T\\
    &= V\Sigma^{-1} \Sigma^{-1} V^T V\Sigma U^T\\
    &= V\Sigma^{-1} U^T
\end{align}\tag{6.9}
$$
 
(Since we are in the $m \gg n$ case in which $V^TV = I_{p\times p}$ in a reduced SVD, we can use the preceding string of equalities for a reduced SVD as well as for a full SVD.) Thus, we shall construct a pseudo-inverse $X^+$ of $X$ by using a singular value decomposition of $X$ in equation (6.8) to compute

$$
\begin{align}
X^+ = V\Sigma^{-1} U^T
\end{align}\tag{6.10}
$$

where the matrix $\Sigma^{-1}$ is constructed by replacing each non-zero element of $\Sigma$ with $\sigma_j^{-1}$. We can use formula (6.10) together with formula (6.7) to compute the matrix $\hat A$ of regression coefficients. Thus, our estimator $\hat A = X'X^+$ of the $m\times m$ matrix of coefficients $A$ is

$$
\begin{align}
\hat A = X'V\Sigma^{-1}U^T
\end{align}\tag{6.11}
$$
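Formulas (6.10) and (6.11) translate directly into NumPy. The following is a minimal sketch on random placeholder data with hypothetical sizes; it assumes that all singular values of $X$ are non-zero, so that every diagonal element of $\Sigma$ can be inverted.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD: U is m x n, sigma holds n singular values, Vt is n x n (here p = n)
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

X_plus = Vt.T @ np.diag(1.0 / sigma) @ U.T    # formula (6.10)
A_hat = X_prime @ X_plus                      # formula (6.11)

# Compare with the left-inverse formula and with NumPy's own pseudo-inverse
same_as_left_inverse = np.allclose(X_plus, np.linalg.inv(X.T @ X) @ X.T)
same_as_pinv = np.allclose(X_plus, np.linalg.pinv(X))
print(same_as_left_inverse, same_as_pinv)
```

The SVD route never forms $X^TX$, which avoids squaring the condition number of $X$.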

## Dynamic Mode Decomposition (DMD)

We turn to the $m \gg n$ **tall and skinny** case associated with **Dynamic Mode Decomposition**. Here an $m\times (n+1)$ data matrix $\tilde X$ contains many more attributes (or variables) $m$ than time periods $n+1$.

**Dynamic Mode Decomposition** (DMD) computes a rank $r < p$ approximation to the least squares regression coefficients $\hat A$ described by formula (6.11). We’ll build up gradually to a formulation that is useful in applications. We’ll do this by describing three alternative representations of our first-order linear dynamic system, i.e., our vector autoregression.

**Guide to three representations:** In practice, we’ll mainly be interested in Representation 3. We use the first two representations to present some useful intermediate steps that help us to appreciate what is under the hood of Representation 3. In applications, we’ll use only a small subset of **DMD modes** to approximate dynamics. We use such a small subset of DMD modes to construct a reduced-rank approximation to $A$.

To do that, we’ll want to use the **reduced** SVDs affiliated with Representation 3, not the **full** SVDs affiliated with Representations 1 and 2.
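To give a feel for what a reduced-rank approximation looks like, here is a hedged sketch. It assumes that the rank-$r$ approximation is obtained by keeping only the $r$ largest singular values of $X$ and the associated singular vectors; that truncation rule is a common choice but is our assumption here, as are the sizes and the random placeholder data.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 40, 6, 2                        # hypothetical sizes with r < p = n
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the r largest singular values and their singular vectors
U_r, sigma_r, Vt_r = U[:, :r], sigma[:r], Vt[:r, :]

A_hat_r = X_prime @ Vt_r.T @ np.diag(1.0 / sigma_r) @ U_r.T   # rank-r analogue of (6.11)
A_hat = X_prime @ np.linalg.pinv(X)                            # full-rank counterpart

fit_error_full = np.linalg.norm(X_prime - A_hat @ X)     # essentially zero: perfect fit
fit_error_r = np.linalg.norm(X_prime - A_hat_r @ X)      # positive: fit is no longer perfect
print(fit_error_full, fit_error_r)
```

With purely random data there is no low-dimensional structure to exploit, so the truncated fit is poor; in applications the hope is that a few modes capture most of the dynamics.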

## Representation 1

In this representation, we shall use a **full** SVD of $X$. We use the $m$ **columns** of $U$, and thus the $m$ **rows** of $U^T$, to define an $m\times 1$ vector $\tilde b_t$ as

$$
\begin{equation}
\tilde b_t = U^TX_t
\end{equation}\tag{6.12}
$$

The original data $X_t$ can be represented as

$$
\begin{equation}
X_t = U\tilde b_t
\end{equation}\tag{6.13}
$$

(Here we use $b$ to remind ourselves that we are creating a basis **vector**.) Since we are now using a **full** SVD, $UU^T = I_{m\times m}$. So it follows from equation (6.12) that we can reconstruct $X_t$ from $\tilde b_t$.

In particular,
- Equation (6.12) serves as an **encoder** that **rotates** the $m\times 1$ vector $X_t$ to become an $m\times 1$ vector $\tilde b_t$ 
- Equation (6.13) serves as a **decoder** that **reconstructs** the $m\times 1$ vector $X_t$ by rotating the $m\times 1$ vector $\tilde b_t$ 

Define a transition matrix for an $m\times 1$ basis vector $\tilde b_t$ by

$$
\begin{equation}
\tilde A = U^T\hat A U
\end{equation}\tag{6.14}
$$

We can recover $\hat A$ from

$$
\hat A = U\tilde A U^T
$$

Dynamics of the $m\times 1$ basis vector $\tilde b_t$ are governed by

$$
\tilde b_{t+1} = \tilde A\tilde b_t
$$

To construct forecasts $\bar X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders (i.e., rotators) to both sides of this equation and deduce

$$
\bar X_{t+1} = U\tilde A^t U^TX_1
$$

where we use $\bar X_{t+1}, t\geq 1$ to denote a forecast.
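Representation 1 can be sketched as follows, using random placeholder data and hypothetical sizes. The helper `forecast` is our own illustrative function, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 12, 4
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

A_hat = X_prime @ np.linalg.pinv(X)                    # formula (6.7)

U, sigma, Vt = np.linalg.svd(X, full_matrices=True)    # full SVD: U is m x m
A_tilde = U.T @ A_hat @ U                              # transition matrix (6.14)

X_1 = X[:, 0]
b_1 = U.T @ X_1          # encoder (6.12)
X_1_back = U @ b_1       # decoder (6.13)

def forecast(t):
    """Forecast of X_{t+1} conditional on X_1: U A_tilde^t U^T X_1."""
    return U @ np.linalg.matrix_power(A_tilde, t) @ b_1

print(np.allclose(X_1_back, X_1))
```

Because the tall-skinny regression fits perfectly, forecasts made inside the sample reproduce the observed columns of $\tilde X$; the interesting forecasts are those beyond period $n+1$.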

## Representation 2

This representation is related to one originally proposed by [Schmid, 2010](https://python.quantecon.org/zreferences.html#id19). It can be regarded as an intermediate step on the way to obtaining a related Representation 3, to be presented later.

As with Representation 1, we continue to
- use a **full** SVD and **not** a reduced SVD

As we observed and illustrated in [Singular Value Decomposition](https://python.quantecon.org/svd_intro.html)

- (a) for a full SVD, $UU^T = I_{m\times m}$ and $U^TU = I_{m\times m}$ are both identity matrices
- (b) for a reduced SVD of $X$ with $p < m$, $UU^T$ is not an identity matrix (although $U^TU = I_{p\times p}$).

As we shall see later, a full SVD is too confining for what we ultimately want to do, namely, cope with situations in which $UU^T$ is **not** an identity matrix because we use a reduced SVD of $X$. But for now, let’s proceed under the assumption that we are using a full SVD so that both identities in (a) hold.

Form an eigendecomposition of the $m\times m$ matrix $\tilde A = U^T\hat A U$ defined in equation (6.14):

$$
\begin{equation}
\tilde A = W\Lambda W^{-1}
\end{equation}\tag{6.15}
$$

where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is an $m\times m$ matrix whose columns are eigenvectors corresponding to rows (eigenvalues) in $\Lambda$. When $UU^T = I_{m\times m}$, as is true with a full SVD of $X$, it follows that

$$
\begin{equation}
\hat A = U \tilde A U^T = U W\Lambda W^{-1} U^T
\end{equation}\tag{6.16}
$$

According to equation (6.16), the diagonal matrix $\Lambda$ contains eigenvalues of $\hat A$ and corresponding eigenvectors of $\hat A$ are columns of the matrix $UW$. It follows that the systematic (i.e., not random) parts of the $X_t$ dynamics captured by our first-order vector autoregressions are described by

$$
X_{t+1} = U W\Lambda W^{-1} U^T X_t
$$

Multiplying both sides of the above equation by $W^{-1} U^T$ gives

$$
W^{-1} U^TX_{t+1} = \Lambda W^{-1} U^T X_t
$$

or

$$
\hat b_{t+1} = \Lambda\hat b_t
$$

where our **encoder** is

$$
\hat b_{t} = W^{-1} U^T X_t
$$

and our **decoder** is

$$
X_{t} = U W\hat b_t
$$

We can use this representation to construct a predictor $\bar X_{t+1}$ of $X_{t+1}$ conditional on $X_1$ via:

$$
\begin{equation}
\bar X_{t+1} = U W \Lambda^t W^{-1} U^T X_1
\end{equation}\tag{6.17}
$$
 
In effect, [Schmid, 2010](https://python.quantecon.org/zreferences.html#id19) defined an $m\times m$ matrix $\Phi_s$ as

$$
\begin{equation}
\Phi_s = U W 
\end{equation}\tag{6.18}
$$

and a generalized inverse

$$
\begin{equation}
\Phi_s^+ = W^{-1}U^T 
\end{equation}\tag{6.19}
$$

[Schmid, 2010](https://python.quantecon.org/zreferences.html#id19) then represented equation (6.17) as

$$
\begin{equation}
\bar X_{t+1} = \Phi_s \Lambda^t \Phi_s^+ X_1
\end{equation}\tag{6.20}
$$

Components of the basis vector $\hat b_t = W^{-1}U^T X_t \equiv \Phi_s^+ X_t$ are
DMD **projected modes**. To understand why they are called **projected modes**, notice that

$$
\Phi_s^+ = (\Phi_s^T\Phi_s)^{-1}\Phi_s^T
$$

so that the $m\times n$ matrix

$$
\hat b = \Phi_s^+ X
$$

is a matrix of regression coefficients of the $m\times n$ matrix $X$ on the $m\times m$ matrix $\Phi_s$. We’ll say more about this interpretation in a related context when we discuss Representation 3, which was suggested by [Tu et al., 2014](https://python.quantecon.org/zreferences.html#id28). It is more appropriate to use Representation 3 when, as is often the case in practice, we want to use a reduced SVD.
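The algebra of Representation 2 can be checked numerically. The sketch below is illustrative only. To keep the eigendecomposition well conditioned, it uses simulated data from a small hypothetical VAR with $n > m$, so that $\hat A$ has distinct eigenvalues; in the tall-skinny case $\hat A$ has a repeated zero eigenvalue, which makes a naive numerical check more delicate. The matrix `A_true`, the noise scale, and the helper `forecast` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 300
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.5, 0.1],
                   [0.0, 0.0, -0.3]])    # hypothetical coefficients with distinct eigenvalues

X_tilde = np.zeros((m, n + 1))
X_tilde[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_tilde[:, t + 1] = A_true @ X_tilde[:, t] + 0.1 * rng.standard_normal(m)
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

A_hat = X_prime @ np.linalg.pinv(X)

U, sigma, Vt = np.linalg.svd(X, full_matrices=True)   # full SVD: U is m x m
A_tilde = U.T @ A_hat @ U                             # (6.14)
eigvals, W = np.linalg.eig(A_tilde)                   # (6.15)
Lam = np.diag(eigvals)

Phi_s = U @ W                          # (6.18)
Phi_s_plus = np.linalg.inv(W) @ U.T    # (6.19)

def forecast(t, X_1):
    """Predictor (6.20) of X_{t+1} given X_1; real part taken because the data are real."""
    return np.real(Phi_s @ np.diag(eigvals ** t) @ Phi_s_plus @ X_1)

print(np.allclose(A_hat @ Phi_s, Phi_s @ Lam))   # columns of Phi_s are eigenvectors of A_hat
```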

## Representation 3

Departing from the procedures used to construct Representations 1 and 2, each of which deployed a **full** SVD, we now use a **reduced** SVD. Again, we let $p\leq \min(m,n)$ be the rank of $X$. Construct a **reduced** SVD

$$
X = \tilde U\tilde\Sigma \tilde V^T
$$

where now $\tilde U$ is $m\times p$, $\tilde\Sigma$ is $p\times p$, and $\tilde V^T$ is $p\times n$. Our minimum-norm least-squares approximator of $A$ now has representation

$$
\begin{equation}
\hat A = X'\tilde V\tilde \Sigma^{-1} \tilde U^T
\end{equation}\tag{6.21}
$$

**Computing Dominant Eigenvectors of $\hat A$**

We begin by paralleling a step used to construct Representation 1: we define a transition matrix for a rotated $p\times 1$ state $\tilde b_t$ by

$$
\begin{equation}
\tilde A = \tilde U^T\hat A\tilde U
\end{equation}\tag{6.22}
$$

**Interpretation as projection coefficients**

[Brunton and Kutz, 2022](https://python.quantecon.org/zreferences.html#id43) remark that $\tilde A$ can be interpreted in terms of a projection of $\hat A$ onto the $p$ modes in $\tilde U$. To verify this, first note that, because $\tilde U^T\tilde U = I$, it follows that

$$
\begin{equation}
\tilde A
= \tilde U^T\hat A\tilde U 
= \tilde U^T X'\tilde V\tilde\Sigma^{-1}\tilde U^T \tilde U 
= \tilde U^T X'\tilde V\tilde\Sigma^{-1}
\end{equation}\tag{6.23}
$$
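A practical payoff of equation (6.23) is that $\tilde A$ can be computed without ever forming the $m\times m$ matrix $\hat A$. Here is a minimal sketch on random placeholder data with hypothetical sizes; the explicit $\hat A$ is built only to confirm that the two routes agree.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 60, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD of X (with this data the rank p equals n)
U_t, sigma_t, Vt_t = np.linalg.svd(X, full_matrices=False)
p = sigma_t.size

# Right-hand side of (6.23): no m x m matrix is formed
A_tilde = U_t.T @ X_prime @ Vt_t.T @ np.diag(1.0 / sigma_t)

# For comparison only: the m x m matrix A_hat from (6.21)
A_hat = X_prime @ Vt_t.T @ np.diag(1.0 / sigma_t) @ U_t.T

print(A_tilde.shape, np.allclose(A_tilde, U_t.T @ A_hat @ U_t))
```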

Next, we’ll just compute the regression coefficients in a projection of $\hat A$ on $\tilde U$ using a standard least-squares formula

$$
(\tilde U^T \tilde U )^{-1}\tilde U^T\hat A
= (\tilde U^T \tilde U )^{-1}\tilde U^T X'\tilde V\tilde \Sigma^{-1} \tilde U^T
= \tilde U^T X'\tilde V\tilde \Sigma^{-1} \tilde U^T
= \tilde A\tilde U^T
$$

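Thus, we have verified that $\tilde A$ is a least-squares projection of $\hat A$ onto $\tilde U$.

**An Inverse Challenge**

Because we are using a reduced SVD, $\tilde U\tilde U^T \neq I$.

Consequently,

$$
\hat A \neq \tilde U\tilde A\tilde U^T
$$

so we can’t simply recover $\hat A$ from $\tilde A$ and $\tilde U$.

**A Blind Alley**

We can start by hoping for the best and proceeding to construct an eigendecomposition of the $p\times p$ matrix $\tilde A$:

$$
\begin{equation}
\tilde A = \tilde W\Lambda\tilde W^{-1}
\end{equation}\tag{6.24}
$$

where $\Lambda$ is a diagonal matrix of $p$ eigenvalues and the columns of $\tilde W$ are corresponding eigenvectors.

Mimicking our procedure in Representation 2, we cross our fingers and compute an $m\times p$ matrix

$$
\begin{equation}
\tilde\Phi_s = \tilde U\tilde W
\end{equation}\tag{6.25}
$$

that corresponds to (6.18) for a full SVD.

At this point, where $\hat A$ is given by formula (6.21), it is interesting to compute $\hat A\tilde\Phi_s$:

$$
\begin{align}
\hat A\tilde\Phi_s &= (X'\tilde V\tilde\Sigma^{-1}\tilde U^T)(\tilde U\tilde W)\\
    &= X'\tilde V\tilde\Sigma^{-1}\tilde W\\
    &\neq (\tilde U\tilde W)\Lambda\\
    &= \tilde\Phi_s\Lambda
\end{align}
$$

That $\hat A\tilde\Phi_s \neq \tilde\Phi_s\Lambda$ means that, unlike the corresponding situation in Representation 2, columns of $\tilde\Phi_s = \tilde U\tilde W$ are **not** eigenvectors of $\hat A$ corresponding to eigenvalues on the diagonal of matrix $\Lambda$.

**An Approach That Works**

Continuing our quest for eigenvectors of $\hat A$ that we **can** compute with a reduced SVD, let’s define an $m\times p$ matrix $\Phi$ as

$$
\begin{equation}
\Phi \equiv \hat A\tilde\Phi_s = X'\tilde V\tilde\Sigma^{-1}\tilde W
\end{equation}\tag{6.26}
$$

It turns out that columns of $\Phi$ **are** eigenvectors of $\hat A$.

This is a consequence of a result established by [Tu et al., 2014](https://python.quantecon.org/zreferences.html#id28) that we now present.

**Proposition** The $p$ columns of $\Phi$ are eigenvectors of $\hat A$.

**Proof:** From formula (6.26) we have

$$
\begin{align}
\hat A\Phi &= (X'\tilde V\tilde\Sigma^{-1}\tilde U^T)(X'\tilde V\tilde\Sigma^{-1}\tilde W)\\
    &= X'\tilde V\tilde\Sigma^{-1}\tilde A\tilde W\\
    &= X'\tilde V\tilde\Sigma^{-1}\tilde W\Lambda\\
    &= \Phi\Lambda
\end{align}
$$

so that

$$
\begin{equation}
\hat A\Phi = \Phi\Lambda
\end{equation}\tag{6.27}
$$

Let $\phi_i$ be the $i$th column of $\Phi$ and $\lambda_i$ be the corresponding $i$th eigenvalue of $\tilde A$ from decomposition (6.24).

Equating the $m\times 1$ vectors that appear on the two sides of equation (6.27) gives

$$
\hat A\phi_i = \lambda_i\phi_i
$$

This equation confirms that $\phi_i$ is an eigenvector of $\hat A$ that corresponds to eigenvalue $\lambda_i$ of both $\tilde A$ and $\hat A$.

This concludes the proof.

Also see [Brunton and Kutz, 2022](https://python.quantecon.org/zreferences.html#id43) (p. 238).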
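The contrast between the blind alley and the approach that works can be seen numerically. The sketch below uses random placeholder data with hypothetical sizes. The variable names (`Phi_tilde_s` for the matrix $\tilde U\tilde W$ built from the reduced SVD and the eigenvectors of $\tilde A$, and `Phi` for $X'\tilde V\tilde\Sigma^{-1}\tilde W$) are our own labels for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 30, 5
X_tilde = rng.standard_normal((m, n + 1))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U_t, sigma_t, Vt_t = np.linalg.svd(X, full_matrices=False)   # reduced SVD
Sigma_inv = np.diag(1.0 / sigma_t)

A_hat = X_prime @ Vt_t.T @ Sigma_inv @ U_t.T       # (6.21)
A_tilde = U_t.T @ X_prime @ Vt_t.T @ Sigma_inv     # (6.23)

eigvals, W_t = np.linalg.eig(A_tilde)              # eigendecomposition of the p x p matrix
Lam = np.diag(eigvals)

Phi_tilde_s = U_t @ W_t                            # blind alley: U_tilde W_tilde
Phi = X_prime @ Vt_t.T @ Sigma_inv @ W_t           # approach that works

# Residuals of the two candidate eigenvector equations
blind_alley_gap = np.linalg.norm(A_hat @ Phi_tilde_s - Phi_tilde_s @ Lam)
proposition_gap = np.linalg.norm(A_hat @ Phi - Phi @ Lam)
print(blind_alley_gap, proposition_gap)
```

The first residual is far from zero while the second vanishes up to rounding error, in line with the proposition. Only the $p\times p$ matrix $\tilde A$ needs an eigendecomposition, which is what makes the approach attractive when $m$ is large.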