# More Linear Algebra

In [None]:
import numpy as np
import scipy.linalg
import matplotlib.pyplot as plt

%matplotlib inline

# Special types of matrices

Some matrices have special properties. Let's take a look at a few of them.

## Toeplitz matrices

A **Toeplitz matrix** A is an m\*n matrix where all of the entries on a diagonal (from top left to bottom right) are the same. Here's what a 3\*2 Toeplitz matrix looks like:

$$ \begin{bmatrix}
a & b \\
c & a \\
d & c
\end{bmatrix} $$

We can generate these using scipy as follows:

In [None]:
A = scipy.linalg.toeplitz([1,2,3], [1,4,5,6]) # The first entry is the first column of A and the second is the first row.
                                              # A(1,1) is determined by the first list, so passing in (2,4,5,6) here wouldn't
                                              # change the output.
A

### Properties

- The set of m\*n Toeplitz matrices is a subspace of the vector space of m\*n matrices.

Computations with a Toeplitz matrix are much less expensive than with a generic matrix. The following operations can be performed in O(n<sup>2</sup>) time if A and B are n\*n Toeplitz matrices:

- Solving the system Ax = b (in scipy, we use solve_toeplitz, see below)
- Computing the determinant of A
- Computing the LU-decomposition of A (see below)
- Multiplying A and B (provided their dimensions are compatible)

In addition, A and B can be added in O(n) time.
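
As a rough illustration of the last point, the sketch below describes each Toeplitz matrix only by its first column and first row, adds those defining vectors (O(n) work), and compares the result with the dense sum. The helper `is_toeplitz` and the example vectors are assumptions made for this illustration; `is_toeplitz` is not a scipy function.

```python
import numpy as np
import scipy.linalg

def is_toeplitz(M):
    """Return True if every descending diagonal of M is constant."""
    M = np.asarray(M)
    return bool(np.all(M[1:, 1:] == M[:-1, :-1]))

# An n*n Toeplitz matrix is fixed by its first column and first row (2n - 1 numbers).
col_a, row_a = np.array([1, 2, 3]), np.array([1, 4, 5])
col_b, row_b = np.array([7, 0, 2]), np.array([7, 1, 1])

A = scipy.linalg.toeplitz(col_a, row_a)
B = scipy.linalg.toeplitz(col_b, row_b)

# Adding the defining vectors costs O(n); the dense sum A + B costs O(n^2).
S = scipy.linalg.toeplitz(col_a + col_b, row_a + row_b)
print(S)
print(np.array_equal(S, A + B), is_toeplitz(S))
```

The same check also illustrates the subspace property: the sum of two Toeplitz matrices is again Toeplitz.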

In [None]:
scipy.linalg.solve_toeplitz(([1,2,3], [1,4,5]), [1, 1, 1]) # The first two vectors are the 1st column and 1st row of A. 
                                                           # The third vector is b.

In [None]:
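C = np.array([[1, .5, -.4], [.3, 1, -.9], [0.0, -.5, 1]])  # arbitrary noise weighting matrix

V = C @ C.T  # associated covariance matrix
print('V = ', V)
lam, v = np.linalg.eig(V)  # eig returns the eigenvalues first, then the eigenvectors (as columns)
print('lam = ', lam)
print('v = ', v)
c1 = V[:, 0]  # first column of V
r1 = V[0, :]  # first row of V
print('c1 = ', c1)
print('r1 = ', r1)
Vt = scipy.linalg.toeplitz(c1, r1)  # Toeplitz matrix sharing V's first column and first row
print('V - Vt = ', V - Vt)  # nonzero entries show that V itself is not Toeplitz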

## Circulant matrices

A **circulant matrix** is a special type of square Toeplitz matrix, which is obtained as follows:

1. Start with the first row of our matrix, A. The entries can be any number.
2. Let's take n = length(first row of A) = matrix size.
3. Let's take k = 2.
4. To find the k<sup>th</sup> row, take row (k-1) and shift all the entries one spot to the right. The first element of row k should be the last element of row k-1.
5. Increment k by 1 unless k = n.

Let's look at this process for a 3\*3 matrix. We'll start with [1, 3, 5] as our first row.

$$ [1, 3, 5] \implies [5, 1, 3] \implies [3, 5, 1] $$

$$ A = \begin{bmatrix}
1 & 3 & 5 \\
5 & 1 & 3 \\
3 & 5 & 1
\end{bmatrix} $$
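
The shifting steps listed above can be sketched in numpy with `np.roll`. The helper name `circulant_from_first_row` is an assumption made for this illustration.

```python
import numpy as np

def circulant_from_first_row(first_row):
    """Stack successive right-shifts of first_row, one shift per row."""
    first_row = np.asarray(first_row)
    n = len(first_row)
    # np.roll(v, k) moves every entry k places to the right, wrapping around.
    return np.array([np.roll(first_row, k) for k in range(n)])

A = circulant_from_first_row([1, 3, 5])
print(A)
```

Each row is the previous one shifted one place to the right, so the second row starts with the last entry of the first row, and every descending diagonal is constant.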

We can also do this using scipy:

In [None]:
A = scipy.linalg.circulant([1, 2, 3]) # The parameter we pass to this function is the first column of A
A

### Properties

Circulant matrices have special properties in addition to those shared by all Toeplitz matrices.

The eigenvalues and eigenvectors of a circulant matrix are both easy to compute.

In fact, its eigenvectors are the Fourier modes (see more at https://en.wikipedia.org/wiki/Circulant_matrix#Eigenvectors_and_eigenvalues).

Its determinant is also easy to compute.

- The set of circulant matrices is a vector space.
- If A and B are circulant, so is A + B.
- If A and B are circulant, AB = BA and AB is circulant. In other words, circulant matrices are closed under matrix multiplication, and matrix multiplication is commutative.
- It is also easy to solve systems with circulant matrices (see below).
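
The properties above can be checked numerically. The sketch below assumes the circulant matrix is described by its first column and uses a hypothetical helper, `circulant_from_first_column`, built with `np.roll`. It tests that the Fourier modes are eigenvectors with eigenvalues given by the discrete Fourier transform of the first column, that two circulant matrices commute, and that the determinant equals the product of the eigenvalues.

```python
import numpy as np

def circulant_from_first_column(c):
    """Dense circulant matrix with first column c, so C[j, l] = c[(j - l) mod n]."""
    c = np.asarray(c, dtype=float)
    return np.column_stack([np.roll(c, k) for k in range(len(c))])

c = np.array([1.0, 2.0, 3.0])
C = circulant_from_first_column(c)
n = len(c)

lam = np.fft.fft(c)   # eigenvalues via the discrete Fourier transform
idx = np.arange(n)
modes = [np.exp(2j * np.pi * idx * k / n) for k in range(n)]   # Fourier modes
modes_are_eigenvectors = all(np.allclose(C @ f, lam[k] * f) for k, f in enumerate(modes))

D = circulant_from_first_column([4.0, 0.0, -1.0])
products_commute = np.allclose(C @ D, D @ C)
det_matches = np.isclose(np.prod(lam).real, np.linalg.det(C))
print(modes_are_eigenvectors, products_commute, det_matches)
```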

In [None]:
scipy.linalg.solve_circulant([1, 2, 3], [1, 1, 1]) 

# The first vector is the 1st column of A and the second is b in Ax = b

# the solution x is then returned by the above code

In [None]:
# let's instead solve A x = b the old fashioned way

A = scipy.linalg.circulant([1, 2, 3])

b = np.array([1, 1, 1])

x = np.linalg.solve(A, b)

x
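
Because the Fourier modes diagonalize a circulant matrix, the system can also be solved with FFTs. The following is a sketch of that idea under the assumption that no entry of `fft(c)` is zero; it is meant as an illustration of the technique rather than a description of scipy's internals.

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0])   # first column of the circulant matrix
b = np.array([1.0, 1.0, 1.0])

# Divide in the Fourier domain, then transform back.
x_fft = np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real

# Dense check: build the matrix explicitly and solve it the usual way.
C = np.column_stack([np.roll(c, k) for k in range(len(c))])
x_dense = np.linalg.solve(C, b)
print(x_fft, x_dense)
```

The FFT route costs O(n log n) operations, compared with O(n<sup>3</sup>) for a generic dense solve.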


### Covariogram matrix

In [None]:
V = scipy.linalg.circulant([1.0, .9, .81])
print('V =', V)

# Matrix factorization

An important problem in linear algebra is matrix factorization: writing a matrix as a product of simpler matrices.

Let's take a look at a few different decompositions:

## LU-decomposition

Suppose we have an n\*n matrix A. We seek to compute the following:

$$ A = LU $$

where L is a lower triangular n\*n matrix, and U is an upper triangular n\*n matrix.

Note that this isn't always possible (we'll discuss this below), in which case we decompose A as follows:

$$ PA = LU $$
where P is a permutation matrix.

How do we find L and U?

### Row echelon form
Let's start with U. One way to obtain U is by computing the **row echelon form** of A. To compute the row echelon form, we start with A. The operations we are allowed to use are called **row operations**, and are described below:

1. Interchange two rows of A.

$$ \begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i
\end{bmatrix} \implies 
\begin{bmatrix}
d & e & f \\
a & b & c \\
g & h & i
\end{bmatrix} $$

2. Multiply any row of A by a scalar.

$$ \begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i
\end{bmatrix} \implies 
\begin{bmatrix}
2a & 2b & 2c \\
d & e & f \\
g & h & i
\end{bmatrix}  $$

3. Add one row of A to another row.

$$ \begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i
\end{bmatrix} \implies 
\begin{bmatrix}
a & b & c \\
a+d & b+e & c+f \\
g & h & i
\end{bmatrix} $$

We want to use these operations to transform A into an upper triangular matrix U. Let's do an example:

$$ \begin{bmatrix}
3 & 1\\
-6 & -4\\
\end{bmatrix} \implies 
\begin{bmatrix}
6 & 2 \\
-6 & -4\\
\end{bmatrix} \implies
\begin{bmatrix}
6 & 2 \\
0 & -2\\
\end{bmatrix}$$

First, we multiply the top row by 2. Next, we add the top row to the bottom row and store the result on the bottom row, changing its leftmost element to zero. Notice that the row operations are the same as the addition/elimination method of solving systems of equations. The procedure of eliminating unknowns from equations (or here, numbers from the leftmost column) is the same.

Next, we need to find L. We can write out the sequence of row operations as matrix multiplications:

$$ \begin{bmatrix}
3 & 1\\
-6 & -4\\
\end{bmatrix} =
\begin{bmatrix}
0.5 & 0 \\
0 & 1\\
\end{bmatrix}
\begin{bmatrix}
6 & 2 \\
-6 & -4\\
\end{bmatrix}
= 
\begin{bmatrix}
0.5 & 0 \\
0 & 1\\
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
-1 & 1\\
\end{bmatrix}
\begin{bmatrix}
6 & 2 \\
0 & -2\\
\end{bmatrix}$$

Multiplying all the matrices except U together gives us L:

$$ L = \begin{bmatrix}
0.5 & 0 \\
0 & 1\\
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
-1 & 1\\
\end{bmatrix} =
\begin{bmatrix}
0.5 & 0 \\
-1 & 1\\
\end{bmatrix} $$

This works because all of the matrices are lower triangular. However, this is not always the case. If we have to swap rows at any point, this process will not work because the matrices that perform row swaps, called **permutation matrices**, are not triangular. In that case, we say that the matrix doesn't have an LU-decomposition. However, we can get around this issue by using the following equation:

$$ PA = LU $$

Why does this work? What we're saying here is that for the matrix A, there exists some permutation of the rows of A with an LU-decomposition. Proving this fact is relatively straightforward and involves showing that row operations can transform any matrix into an upper triangular one.
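
The elimination procedure can be sketched as a short function. This is a minimal version without row swaps, written for illustration only: the name `lu_no_pivot` is hypothetical, and the function scales L to have ones on its diagonal, which is a different (equally valid) scaling from the hand computation above.

```python
import numpy as np

def lu_no_pivot(A):
    """LU factorization without row swaps; L has a unit diagonal."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        if U[k, k] == 0:
            raise ValueError("zero pivot: a row swap (permutation) is needed")
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]     # multiplier that clears U[i, k]
            U[i, :] -= L[i, k] * U[k, :]    # row operation: R_i <- R_i - m * R_k
    return L, U

A = np.array([[3.0, 1.0], [-6.0, -4.0]])
L, U = lu_no_pivot(A)

# Hand-derived factors from the text
L_hand = np.array([[0.5, 0.0], [-1.0, 1.0]])
U_hand = np.array([[6.0, 2.0], [0.0, -2.0]])

print(np.allclose(L @ U, A), np.allclose(L_hand @ U_hand, A))
```

Both pairs of factors reproduce A, which shows that an LU-decomposition is only unique once a normalization (such as a unit diagonal on L) is fixed. A matrix such as [[0, 1], [1, 0]] triggers the `ValueError`, which is the case where the permutation P is required.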


Now, try computing the LU-decomposition for the following matrix:
$$ \begin{bmatrix}
1 & 2 & 4 \\
3 & 8 & 14 \\
2 & 6 & 13\\
\end{bmatrix} $$

### LU-decomposition using Python

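This snippet of code shows how to perform LU-decomposition on the matrix from the example. We use the lu function included in the scipy package. Note that scipy swaps rows (pivots) for numerical stability and returns factors satisfying A = PLU, so its L and U can differ from the ones we derived by hand: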

In [None]:
A = np.array([[3, 1], [-6, -4]])
P, L, U = scipy.linalg.lu(A) # P is a permutation matrix, L is lower triangular, U is upper triangular, and A = P @ L @ U
P, L, U

### Properties of LU-decomposition

Let's suppose that we have an n\*n matrix A that has an LU-decomposition and we wish to solve the following system:

$$ Ax = b $$

where x and b are vectors of length n.

Let's substitute in LU for A:
$$ LUx = b $$

Now let's define y = Ux:

$$ y = Ux $$
$$ Ly = b $$

Notice that we went from a single complex linear system to two simpler ones, making the solution easier to compute. Both of these systems involve triangular matrices, so we can use substitution to solve them quickly. This is especially useful in cases where the matrix A remains the same, but we want to solve the system for different values of b.
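
The two triangular solves can be sketched as follows, reusing the hand-derived L and U from the example above. The function names are assumptions made for this illustration.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a lower triangular L, working from the top row down."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U x = y for an upper triangular U, working from the bottom row up."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[0.5, 0.0], [-1.0, 1.0]])
U = np.array([[6.0, 2.0], [0.0, -2.0]])
A = L @ U

# The factorization is computed once and reused for several right-hand sides.
for b in (np.array([1.0, 0.0]), np.array([2.0, -5.0])):
    x = back_substitution(U, forward_substitution(L, b))
    print(x, np.allclose(A @ x, b))
```

Each substitution costs O(n<sup>2</sup>) operations, so once L and U are known, every additional right-hand side is cheap compared with factoring A again.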

## QR-decomposition

The QR-decomposition of an m\*n matrix A is as follows. We focus on the case m ≥ n, which is what the Gram-Schmidt construction below assumes.

$$ A = QR $$

Here, Q is an m\*m matrix with orthonormal columns, and R is an upper triangular m\*n matrix.

Let's start by finding Q.

### Gram-Schmidt process

The **Gram-Schmidt process** is a way to obtain a matrix with orthonormal columns from A. Let's look at how it works:

1. Set k = 1.
2. If k is 2 or more, subtract from column k its projections onto columns k-1, k-2, ... , 1.
3. Normalize column k.
4. Increment k by 1 and repeat from step 2 until every column has been processed.

Let's look at an example to make this easier to understand:

$$\begin{bmatrix}
1 & 4 \\
0 & 3 \\
1 & 0
\end{bmatrix}$$

We start by normalizing the first column.

$$\begin{bmatrix}
\frac{1}{\sqrt{2}} & 4 \\
0 & 3 \\
\frac{1}{\sqrt{2}} & 0
\end{bmatrix}$$

Next, we take the component of column 2 that is orthogonal to column 1. In other words, we're looking for orth<sub>col1</sub> col2.

$$\begin{bmatrix}
\frac{1}{\sqrt{2}} & 2\\
0 & 3 \\
\frac{1}{\sqrt{2}} & -2
\end{bmatrix}$$

Now, we normalize column 2.

$$\begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{2}{\sqrt{17}}\\
0 & \frac{3}{\sqrt{17}} \\
\frac{1}{\sqrt{2}} & \frac{-2}{\sqrt{17}}
\end{bmatrix}$$

Note that Q is an m\*m matrix. We need to add a third unit vector, orthogonal to the first two columns, to complete the basis of R<sup>3</sup>.

$$\begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{2}{\sqrt{17}} & \frac{-3}{\sqrt{34}}\\
0 & \frac{3}{\sqrt{17}} & \frac{2\sqrt{2}}{\sqrt{17}}\\
\frac{1}{\sqrt{2}} & \frac{-2}{\sqrt{17}} & \frac{3}{\sqrt{34}}
\end{bmatrix}$$

Now, we compute R from the following equation:

$$ R = Q^TA $$

Note that this process works for non-square matrices as well.
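
The worked example can be reproduced with a short classical Gram-Schmidt routine. This is a sketch that assumes the columns of A are linearly independent; the function name `gram_schmidt` is hypothetical, and it returns only the first n columns of Q (the "thin" factor), without the extra column that completes the basis.

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A (assumed linearly independent)."""
    A = np.array(A, dtype=float)
    Q = np.zeros_like(A)
    for k in range(A.shape[1]):
        v = A[:, k].copy()
        for j in range(k):
            v -= (Q[:, j] @ A[:, k]) * Q[:, j]   # remove the projection onto column j
        Q[:, k] = v / np.linalg.norm(v)          # normalize what is left
    return Q

A = np.array([[1, 4], [0, 3], [1, 0]])
Q = gram_schmidt(A)   # 3 x 2
R = Q.T @ A           # 2 x 2, upper triangular up to rounding

print(Q)
print(R)
```

The second column of Q is (2, 3, -2) divided by the square root of 17, matching the hand computation, and multiplying Q by R recovers A.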

### QR decomposition using Python

We can use the qr function in the scipy module to compute the QR decomposition:

In [None]:
A = np.array([np.array([1, 4]),
              np.array([0, 3]),
              np.array([1, 0])])

scipy.linalg.qr(A)

### Properties of QR-decomposition

- Q<sup>T</sup>Q = I. This holds because the columns of Q are orthonormal.
- The solution of Rx = Q<sup>T</sup>b is the least squares solution to Ax=b.
- The QR decomposition is not unique.
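
The least squares property can be checked with numpy. The sketch below uses `np.linalg.qr` in its default reduced mode, so Q is 3\*2 and R is 2\*2, and compares the result with `np.linalg.lstsq`. The right-hand side b is an arbitrary choice made for this illustration.

```python
import numpy as np

A = np.array([[1.0, 4.0], [0.0, 3.0], [1.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)               # reduced QR: Q is 3 x 2, R is 2 x 2
x_qr = np.linalg.solve(R, Q.T @ b)   # solve R x = Q^T b

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_qr, x_ls)

# Flipping the signs of Q and R gives another valid factorization of A.
print(np.allclose((-Q) @ (-R), A))
```

The last line illustrates why the decomposition is not unique: the signs of matching columns of Q and rows of R can be flipped together.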

## Cholesky decomposition

The Cholesky decomposition only exists for certain matrices. Let's talk about their properties.

1. The matrix must be **Hermitian**. This means that it is equal to its conjugate transpose (take the transpose, then take the conjugate of each element). Here, H denotes the conjugate transpose.

$$ A = A^H $$

2. The matrix must be **positive-definite**. An n\*n Hermitian matrix A is positive-definite if the following holds for every nonzero vector z of length n:

$$ z^HAz > 0 $$

Note that the expression above will always be real since A is a Hermitian matrix.

Now that we've established some conditions, let's talk about what Cholesky decomposition is. The **Cholesky decomposition** of a Hermitian positive-definite matrix A is as follows:

$$ A = LL^H $$

Here, L is a lower-triangular matrix. Its diagonal entries are real, positive numbers. L<sup>H</sup> is its conjugate transpose. 

How do we compute the Cholesky decomposition?

There are a few ways of doing this, but one of the least computationally intensive is to look at what happens when we multiply L and L<sup>H</sup>. We'll use a real 3x3 matrix to illustrate the process:

$$ A = LL^H =\begin{bmatrix}
a & 0 & 0 \\
b & c & 0 \\
d & e & f \\
\end{bmatrix}
\begin{bmatrix}
a & b & d \\
0 & c & e \\
0 & 0 & f \\
\end{bmatrix}= 
\begin{bmatrix}
a^2 & ab & ad \\
ab & b^2 + c^2 & bd + ce \\
ad & bd + ce & d^2 + e^2 + f^2 \\
\end{bmatrix} $$

We can now solve recursively for the entries of L from the coefficients of A. Note that we always take the positive square root to ensure that the elements on the main diagonal are positive.

$$ L_{11} = \sqrt{A_{11}} $$


$$ L_{21} = \frac{A_{21}}{L_{11}} $$


$$ L_{22} =  \sqrt{A_{22} - L_{21}^2} $$


$$ L_{31} = \frac{A_{31}}{L_{11}} $$


$$ L_{32} = \frac{A_{32} - L_{31}L_{21}}{L_{22}} $$


$$ L_{33} = \sqrt{A_{33} - L_{31}^2 - L_{32}^2} $$


We can use this to create formulas for the entries of L. The following formulas give us the correct entries for L:

$$ L_{j,j} = \sqrt{A_{j,j} - \sum\limits_{k=1}^{j-1}{L_{j,k}L_{j,k}^*}} $$
$$ L_{i,j} = \frac{1}{L_{j,j}}\left(A_{i,j} - \sum\limits_{k=1}^{j-1}{L_{i,k}L_{j,k}^*}\right), \quad i > j $$
$$ L_{i,j} = 0, \quad i < j $$

Here, the * operation is the conjugate of a complex number. Note that these formulas apply to any Hermitian positive-definite matrix A, not just to the real 3\*3 case.
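
The formulas translate almost line by line into code. The following is a sketch for illustration, assuming A is Hermitian; the function name `cholesky_lower` is hypothetical, and in practice the library routine is the better choice.

```python
import numpy as np

def cholesky_lower(A):
    """Lower triangular L with A = L L^H, computed column by column."""
    A = np.asarray(A)
    n = A.shape[0]
    L = np.zeros((n, n), dtype=complex)
    for j in range(n):
        # Diagonal entry: A_jj minus the squared moduli already placed in row j.
        s = A[j, j] - np.sum(L[j, :j] * np.conj(L[j, :j]))
        if s.real <= 0:
            raise ValueError("matrix is not positive-definite")
        L[j, j] = np.sqrt(s.real)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * np.conj(L[j, :j]))) / L[j, j]
    return L

A_real = np.array([[3.0, 1.0], [1.0, 4.0]])
A_complex = np.array([[2.0, 1j], [-1j, 2.0]])   # Hermitian with eigenvalues 1 and 3

L_real = cholesky_lower(A_real)
L_complex = cholesky_lower(A_complex)
print(L_real.real)
print(np.allclose(L_complex @ L_complex.conj().T, A_complex))
```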

Here's how to compute a Cholesky decomposition in Python. We simply use the built-in cholesky function in the scipy package:

In [None]:
A = np.array([np.array([3, 1]), 
              np.array([1, 4])])

scipy.linalg.cholesky(A, lower=True) # If lower = False, the default value, this returns L^H instead of L

### Properties of Cholesky decomposition

- The Cholesky decomposition of a Hermitian positive-definite matrix is unique.
- When A is real, L is also real and L<sup>H</sup> = L<sup>T</sup>.
- If A is Hermitian and positive-definite, L is invertible.
- The Cholesky decomposition can also be performed on Hermitian positive-semidefinite matrices, but it is not unique in that case. We also have to allow some of the diagonal entries of L to be zero.

## Diagonalization

A square n\*n matrix A is **diagonalizable** if its right eigenvectors form a basis of R<sup>n</sup>.

The diagonalization of A is as follows:

$$ A = PDP^{-1} $$

Here, P is an invertible matrix, and D is a diagonal matrix.

The elements on the main diagonal of D are the eigenvalues of A, and the columns of P are its eigenvectors. Let's do an example:

$$ A = \begin{bmatrix}
3 & 2 \\
3 & 4 
\end{bmatrix} $$

Let's start by computing the eigenvalues and eigenvectors of A. We'll use numpy to do this:

In [None]:
A = np.array([np.array([3, 2]),
              np.array([3, 4])])

eig = np.linalg.eig(A) # The first item returned is an array of eigenvalues, 
                       # and the second is an array of the corresponding eigenvectors
    
eig # Note that the eigenvectors are columns here, not rows, so (-0.71, -0.55) is not an eigenvector, but (-0.71, 0.71) is

Note that numpy returns eigenvectors of unit length.

First, we check that the eigenvectors form a basis of R<sup>2</sup>. Since this is R<sup>2</sup>, we only need to check that our vectors are not collinear. The absolute value of the dot product of (-0.71, 0.71) and (-0.55, -0.83) is different from 1 (the product of their lengths), so the vectors are not collinear.

In [None]:
abs(np.dot(eig[1][:,0], eig[1][:,1])) # We take the absolute value since collinear vectors can point in opposite directions

Since the eigenvectors are not collinear, we know that A is diagonalizable.

Let's rescale our eigenvectors to make calculations easier. Any nonzero multiple of an eigenvector is still an eigenvector, so this won't have an effect on our results.

$$ (-0.71, 0.71) \implies (1, -1) $$
$$ (-0.55, -0.83) \implies (2, 3) $$

The eigenvalues are 1 and 6. Then

$$ D = \begin{bmatrix}
1 & 0 \\
0 & 6 
\end{bmatrix} $$

Next, we want to find P. P is just a matrix of the eigenvectors. However, we have to make sure that we put the eigenvectors in the right order. In other words, our first column should be the eigenvector with eigenvalue 1, and the second should be the eigenvector with eigenvalue 6. In our example, we have

$$ P = \begin{bmatrix}
1 & 2 \\
-1 & 3 
\end{bmatrix} $$

### Properties of diagonalization

Taking powers of diagonal matrices is very easy. To compute D<sup>n</sup>, where D is diagonal, we can just take the nth power of each of the diagonal elements. For 2\*2 matrices, it looks like this:

$$ D = \begin{bmatrix}
a & 0 \\
0 & b 
\end{bmatrix} $$

$$ D^n = \begin{bmatrix}
a^n & 0 \\
0 & b^n 
\end{bmatrix} $$

This gives us an easy way to compute powers of diagonalizable matrices:

$$ A^n = (PDP^{-1})^n = PD^nP^{-1} $$

To prove this statement, we can use a proof by induction relying on the idea that

$$ A^n = (PDP^{-1})^n = (PDP^{-1})^{n-1} (PDP^{-1}) = PD^{n-1}P^{-1}PDP^{-1} = PD^nP^{-1} $$
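
As a quick numerical check of the example and of the power formula, the sketch below rebuilds A from P and D and compares the fifth power computed through the diagonalization with `np.linalg.matrix_power`. The exponent 5 is an arbitrary choice.

```python
import numpy as np

A = np.array([[3.0, 2.0], [3.0, 4.0]])
P = np.array([[1.0, 2.0], [-1.0, 3.0]])   # columns: eigenvectors for eigenvalues 1 and 6
D = np.diag([1.0, 6.0])
P_inv = np.linalg.inv(P)

A_rebuilt = P @ D @ P_inv

n = 5
D_pow = np.diag(np.diag(D) ** n)          # raise each diagonal entry to the n-th power
A_pow = P @ D_pow @ P_inv

print(np.allclose(A_rebuilt, A))
print(np.allclose(A_pow, np.linalg.matrix_power(A, n)))
```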

Note that diagonalization is not unique.

If A is real and symmetric, it is always diagonalizable, and we can write the diagonalization as

$$ A = PDP^T $$

where P is an orthogonal matrix (i.e. its columns are orthonormal). This works because all the eigenvectors of a symmetric matrix are orthogonal if their eigenvalues are different, and therefore the inverse and the transpose of P are the same.

The columns of P form an eigenbasis of R<sup>n</sup>.

## QZ-decomposition

To apply QZ-decomposition, also called **generalized Schur decomposition**, we start with square matrices A and B with complex entries. We factor them as follows:

$$ A = QSZ^* $$

$$ B = QTZ^* $$

Here, Q and Z are unitary matrices, and S and T are upper triangular. The * operation is the conjugate transpose.

Let's break this down. A **unitary matrix** U is simply one for which

$$ U^* = U^{-1} $$

In other words, the inverse and conjugate transpose of a unitary matrix are the same.

To compute a QZ decomposition in Python, we use scipy:

In [None]:
A = np.array([np.array([3, 2]),
              np.array([9, 4])])

B = np.array([np.array([3, 6]), 
              np.array([1, 4])])

scipy.linalg.qz(A, B) # This returns S, T, Q, and Z in that order

### Properties of QZ-decomposition

- The QZ-decomposition is not unique.
- If A and B are real, S is only quasi upper-triangular.
- All complex pairs of matrices have a QZ-decomposition.
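- The ratios of the diagonal elements of S and T give the generalized eigenvalues of the pair (A, B):

$$ \frac{S_{ii}}{T_{ii}} = \lambda_i $$
$$ Ax = \lambda_i Bx $$

The columns of Z play a role analogous to right eigenvectors: the first column of Z is a vector x satisfying the equation above for λ<sub>1</sub>, while the later columns are generalized Schur vectors rather than eigenvectors in general.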
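
These properties can be checked on the example matrices. The sketch below requests `output='complex'` so that S and T are genuinely upper triangular, rebuilds A and B from the factors, and compares the diagonal ratios with the generalized eigenvalues returned by `scipy.linalg.eigvals`.

```python
import numpy as np
import scipy.linalg

A = np.array([[3.0, 2.0], [9.0, 4.0]])
B = np.array([[3.0, 6.0], [1.0, 4.0]])

# With output='complex', S and T are upper triangular (not just quasi-triangular).
S, T, Q, Z = scipy.linalg.qz(A, B, output='complex')

lam_qz = np.diag(S) / np.diag(T)        # ratios of the diagonal entries
lam_ref = scipy.linalg.eigvals(A, B)    # generalized eigenvalues computed directly

z1 = Z[:, 0]                            # first column of Z
print(np.sort(lam_qz.real), np.sort(lam_ref.real))
print(np.allclose(A @ z1, lam_qz[0] * (B @ z1)))
```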

## A Control Problem

Here we'll use an $LU$ decomposition to solve a control problem.  

Let $ L $ be the **lag operator**, so that, for sequence $ \{x_t\} $ we have $ L x_t = x_{t-1} $.

More generally, let $ L^k x_t = x_{t-k} $ with $ L^0 x_t = x_t $ and

$$
d(L) = d_0 + d_1 L+ \ldots + d_m L^m
$$

where $ d_0, d_1, \ldots, d_m $ is a given scalar sequence.

Consider the discrete-time control problem


<a id='equation-oneone'></a>
$$
\max_{\{y_t\}}
\lim_{N \to \infty} \sum^N_{t=0} \beta^t\,
\left\{
     a_t y_t - {1 \over 2}\, hy^2_t - {1 \over 2} \,
         \left[ d(L)y_t \right]^2
\right\}, \tag{1}
$$

where

- $ h $ is a positive parameter and $ \beta \in (0,1) $ is a discount factor.  
- $ \{a_t\}_{t \geq 0} $ is a sequence of exponential order less than $ \beta^{-1/2} $, by which we mean $ \lim_{t \rightarrow \infty} \beta^{\frac{t}{2}} a_t = 0 $.  


Maximization in [(1)](#equation-oneone) is subject to initial conditions for $ y_{-1}, y_{-2}, \ldots, y_{-m} $.

Maximization is over infinite sequences $ \{y_t\}_{t \geq 0} $.

### Example

The formulation of the LQ problem given above is broad enough to encompass
many useful models.

As a simple illustration, recall that in [LQ Control: Foundations](https://python-intro.quantecon.org/lqcontrol.html) we consider a monopolist facing stochastic demand
shocks and adjustment costs.

Let’s consider a deterministic version of this problem, where the monopolist
maximizes the discounted sum

$$
\sum_{t=0}^{\infty} \beta^t \pi_t
$$

and

$$
\pi_t = p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2
\quad \text{with} \quad
p_t = \alpha_0 - \alpha_1 q_t + d_t
$$

In this expression, $ q_t $ is output, $ c $ is average cost of production, and $ d_t $ is a demand shock.

The term $ \gamma (q_{t+1} - q_t)^2 $ represents adjustment costs.

You will be able to confirm that the objective function can be rewritten as [(1)](#equation-oneone) when

- $ a_t := \alpha_0 + d_t - c $  
- $ h := 2 \alpha_1 $  
- $ d(L) := \sqrt{2 \gamma}(I - L) $  


Further examples of this problem for factor demand, economic growth, and government policy problems are given in ch. IX of [[Sar87]](https://python-programming.quantecon.org/zreferences.html#sargent1987).

## Finite Horizon Theory

We first study a finite $ N $ version of the problem.

Later we will study an infinite horizon problem solution as a limiting version of a finite horizon problem.

(This will require being careful because the limits as $ N \to \infty $ of the necessary and sufficient conditions for maximizing finite $ N $ versions of [(1)](#equation-oneone)
are not sufficient for maximizing [(1)](#equation-oneone).)

We begin by

1. fixing $ N > m $,  
1. differentiating the finite version of [(1)](#equation-oneone) with respect to $ y_0, y_1, \ldots, y_N $, and  
1. setting these derivatives to zero.  


For $ t=0, \ldots, N-m $ these first-order necessary conditions are the
*Euler equations*.

For $ t = N-m + 1, \ldots, N $, the first-order conditions are a set of
*terminal conditions*.

Consider the term

$$
\begin{aligned}
J
& = \sum^N_{t=0} \beta^t [d(L) y_t] [d(L) y_t]
\\
& = \sum^N_{t=0}
    \beta^t \, (d_0 \, y_t + d_1 \, y_{t-1} + \cdots + d_m \, y_{t-m}) \,
               (d_0 \, y_t + d_1 \, y_{t-1} + \cdots  + d_m\, y_{t-m})
\end{aligned}
$$

Differentiating $ J $ with respect to $ y_t $ for
$ t=0,\ 1,\ \ldots,\ N-m $ gives

$$
\begin{aligned}
{\partial {J} \over \partial y_t}
   & = 2 \beta^t \, d_0 \, d(L)y_t +
       2 \beta^{t+1} \, d_1\, d(L)y_{t+1} + \cdots +
       2 \beta^{t+m}\, d_m\, d(L) y_{t+m} \\
   & = 2\beta^t\, \bigl(d_0 + d_1 \, \beta L^{-1} + d_2 \, \beta^2\, L^{-2} +
       \cdots + d_m \, \beta^m \, L^{-m}\bigr)\, d (L) y_t\
\end{aligned}
$$

We can write this more succinctly as


<a id='equation-onetwo'></a>
$$
{\partial {J} \over \partial y_t}
    = 2 \beta^t \, d(\beta L^{-1}) \, d (L) y_t \tag{2}
$$

Differentiating $ J $ with respect to $ y_t $ for $ t = N-m + 1, \ldots, N $ gives


<a id='equation-onethree'></a>
$$
\begin{aligned}
 {\partial J \over \partial y_N}
 &= 2 \beta^N\, d_0 \, d(L) y_N \cr
   {\partial J \over \partial y_{N-1}}
 &= 2\beta^{N-1} \,\bigl[d_0 + \beta \,
   d_1\, L^{-1}\bigr] \, d(L)y_{N-1} \cr
   \vdots
 & \quad \quad \vdots \cr
   {\partial {J} \over \partial y_{N-m+1}}
 &= 2 \beta^{N-m+1}\,\bigl[d_0 + \beta
   L^{-1} \,d_1 + \cdots + \beta^{m-1}\, L^{-m+1}\, d_{m-1}\bigr]  d(L)y_{N-m+1}
\end{aligned} \tag{3}
$$

With these preliminaries under our belts, we are ready to differentiate [(1)](#equation-oneone).

Differentiating [(1)](#equation-oneone) with respect to $ y_t $ for $ t=0, \ldots, N-m $ gives the Euler equations


<a id='equation-onefour'></a>
$$
\bigl[h+d\,(\beta L^{-1})\,d(L)\bigr] y_t = a_t,
\quad t=0,\, 1,\, \ldots, N-m \tag{4}
$$

The system of equations [(4)](#equation-onefour) forms a linear *difference
equation* of order $ 2m $ that must hold for the values of $ t $ indicated.

Differentiating [(1)](#equation-oneone) with respect to $ y_t $ for $ t = N-m + 1, \ldots, N $ gives the terminal conditions


<a id='equation-onefive'></a>
$$
\begin{aligned}
\beta^N  (a_N - hy_N - d_0\,d(L)y_N)
&= 0  \cr
  \beta^{N-1} \left(a_{N-1}-hy_{N-1}-\Bigl(d_0 + \beta \, d_1\,
L^{-1}\Bigr)\, d(L)\, y_{N-1}\right)
& = 0 \cr
 \vdots & \vdots\cr
\beta^{N-m+1} \biggl(a_{N-m+1} - h y_{N-m+1} -(d_0+\beta L^{-1}
d_1+\cdots\  +\beta^{m-1} L^{-m+1} d_{m-1}) d(L) y_{N-m+1}\biggr)
& = 0
\end{aligned} \tag{5}
$$

In the finite $ N $ problem, we want simultaneously to solve [(4)](#equation-onefour) subject to the $ m $ initial conditions
$ y_{-1}, \ldots, y_{-m} $ and the $ m $ terminal conditions
[(5)](#equation-onefive).

These conditions uniquely pin down the solution of the finite $ N $ problem.

That is, for the finite $ N $ problem,
conditions [(4)](#equation-onefour) and [(5)](#equation-onefive) are necessary and sufficient for a maximum,
by concavity of the objective function.

Next, we describe how to obtain the solution using matrix methods.


<a id='fdlq'></a>

### Matrix Methods

Let’s look at how linear algebra can be used to tackle and shed light on the finite horizon LQ control problem.

#### A Single Lag Term

Let’s begin with the special case in which $ m=1 $.

We want to solve the system of $ N+1 $ linear equations


<a id='equation-oneff'></a>
$$
\begin{aligned}
\bigl[h & + d\, (\beta L^{-1})\, d\, (L) ] y_t = a_t, \quad
t = 0,\ 1,\ \ldots,\, N-1\cr
\beta^N & \bigl[a_N-h\, y_N-d_0\, d\, (L) y_N\bigr] = 0
\end{aligned} \tag{6}
$$

where $ d(L) = d_0 + d_1 L $.

These equations are to be solved for
$ y_0, y_1, \ldots, y_N $ as functions of
$ a_0, a_1, \ldots,  a_N $ and $ y_{-1} $.

Let

$$
\phi (L)
= \phi_0 + \phi_1 L + \beta \phi_1 L^{-1}
= h + d (\beta L^{-1}) d(L)
= (h + d_0^2 + \beta d_1^2) + d_1 d_0 L+ d_1 d_0 \beta L^{-1}
$$

Then we can represent [(6)](#equation-oneff) as the matrix equation


<a id='equation-onefourfive'></a>
$$
\left[
    \begin{matrix}
        (\phi_0-\beta d_1^2) & \phi_1 & 0 & 0 & \ldots & \ldots & 0 \cr
        \beta \phi_1 & \phi_0 & \phi_1 & 0 & \ldots & \dots & 0 \cr
        0 & \beta \phi_1 & \phi_0 & \phi_1 & \ldots & \ldots & 0 \cr
        \vdots &\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \cr
        0 & \ldots & \ldots & \ldots & \beta \phi_1 & \phi_0 &\phi_1 \cr
        0 & \ldots & \ldots & \ldots & 0 & \beta \phi_1 & \phi_0
    \end{matrix}
\right]
\left [
    \begin{matrix}
        y_N \cr y_{N-1} \cr y_{N-2} \cr \vdots \cr
        y_1 \cr y_0
    \end{matrix}
\right ] =
\left[
\begin{matrix}
    a_N \cr a_{N-1} \cr a_{N-2} \cr \vdots \cr a_1 \cr
    a_0 - \phi_1 y_{-1}
\end{matrix}
\right] \tag{7}
$$

or


<a id='equation-onefoursix'></a>
$$
W\bar y = \bar a \tag{8}
$$

Notice how we have chosen to arrange the $ y_t $’s in reverse
time order.

The matrix $ W $ on the left side of [(7)](#equation-onefourfive) is “almost” a
[Toeplitz matrix](https://en.wikipedia.org/wiki/Toeplitz_matrix) (where each
descending diagonal is constant).

There are two sources of deviation from the  form  of a Toeplitz matrix

1. The first element differs from the remaining diagonal elements, reflecting the terminal condition.  
1. The sub-diagonal elements equal $ \beta $ times the super-diagonal elements.  


The solution of [(8)](#equation-onefoursix) can be expressed in the form


<a id='equation-onefourseven'></a>
$$
\bar y = W^{-1} \bar a \tag{9}
$$

which represents each element $ y_t $ of $ \bar y $ as a function of the entire vector $ \bar a $.

That is, $ y_t $ is a function of past, present, and future values of $ a $’s, as well as of the initial condition $ y_{-1} $.
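
A sketch of this construction for $ m = 1 $ is given below. The parameter values, the forcing sequence, and the helper names `build_W` and `objective` are assumptions chosen for illustration. The diagonal term is computed by expanding $ h + d(\beta L^{-1}) d(L) $ directly, and the first diagonal element is set from the terminal condition for $ y_N $.

```python
import numpy as np

def build_W(h, d0, d1, beta, N):
    """Matrix W of the m = 1 system, with unknowns ordered (y_N, ..., y_0)."""
    phi1 = d0 * d1
    phi0 = h + d0**2 + beta * d1**2      # constant term of h + d(beta L^{-1}) d(L)
    W = (np.diag(np.full(N + 1, phi0))
         + np.diag(np.full(N, phi1), k=1)
         + np.diag(np.full(N, beta * phi1), k=-1))
    W[0, 0] = h + d0**2                  # terminal condition for y_N
    return W, phi1

def objective(y, a, y_m1, h, d0, d1, beta):
    """Finite-horizon objective; y and a are in natural time order t = 0..N."""
    y_lag = np.concatenate(([y_m1], y[:-1]))
    t = np.arange(len(y))
    return np.sum(beta**t * (a * y - 0.5 * h * y**2 - 0.5 * (d0 * y + d1 * y_lag)**2))

h, d0, d1, beta, N, y_m1 = 1.0, 1.0, -0.5, 0.95, 6, 0.3   # arbitrary illustrative values
a = np.linspace(1.0, 2.0, N + 1)      # a_0, ..., a_N

W, phi1 = build_W(h, d0, d1, beta, N)
a_bar = a[::-1].copy()                # reverse time order
a_bar[-1] -= phi1 * y_m1              # last entry is a_0 - phi_1 * y_{-1}
y = np.linalg.solve(W, a_bar)[::-1]   # back to natural order y_0, ..., y_N
print(y)
```

Because the objective is concave and quadratic, the solution of the linear system should make its gradient vanish, which gives a simple way to check the construction numerically.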


#### An Alternative Representation

An alternative way to express the solution to [(7)](#equation-onefourfive) or
[(8)](#equation-onefoursix) is in so-called **feedback-feedforward** form.

The idea here is to find a solution expressing $ y_t $ as a function of *past* $ y $’s and *current* and *future* $ a $’s.

To achieve this solution, one can use an [LU decomposition](https://en.wikipedia.org/wiki/LU_decomposition) of $ W $.

There always exists a decomposition of $ W $ of the form $ W= LU $
where

- $ L $ is an $ (N+1) \times (N+1) $ lower triangular matrix.  
- $ U $ is an $ (N+1) \times (N+1) $ upper triangular matrix.  


The factorization can be normalized so that the diagonal elements of $ U $ are unity.

Using the LU representation in [(9)](#equation-onefourseven), we obtain


<a id='equation-onefournine'></a>
$$
U \bar y = L^{-1} \bar a \tag{10}
$$

Since $ L^{-1} $ is lower triangular, this representation expresses
$ y_t $ as a function of

- lagged $ y $’s (via the term $ U \bar y $), and  
- current and future $ a $’s (via the term $ L^{-1} \bar a $)  


Because there are zeros everywhere in the matrix
on the left of [(7)](#equation-onefourfive) except on the diagonal, super-diagonal, and
sub-diagonal, the $ LU $ decomposition takes

- $ L $ to be zero except in the diagonal  and the leading sub-diagonal.  
- $ U $ to be zero except on the diagonal and the super-diagonal.  


Thus, [(10)](#equation-onefournine) has the form

$$
\left[
\begin{matrix}
    1& U_{12} & 0 & 0 & \ldots & 0 & 0 \cr
    0 & 1 & U_{23} & 0 & \ldots & 0 & 0 \cr
    0 & 0 & 1 & U_{34} & \ldots & 0 & 0 \cr
    0 & 0 & 0 & 1 & \ldots & 0 & 0\cr
    \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\cr
    0 & 0 & 0 & 0 & \ldots & 1 & U_{N,N+1} \cr
    0 & 0 & 0 & 0 & \ldots & 0 & 1
\end{matrix}
\right] \ \ \
\left[
\begin{matrix}
    y_N \cr y_{N-1} \cr y_{N-2} \cr y_{N-3} \cr \vdots \cr y_1 \cr y_0
\end{matrix}
\right] =
$$

$$
\quad
\left[
\begin{matrix}
    L^{-1}_{11} & 0 & 0 & \ldots & 0 \cr
    L^{-1}_{21} & L^{-1}_{22} & 0 & \ldots & 0 \cr
    L^{-1}_{31} & L^{-1}_{32} & L^{-1}_{33}& \ldots & 0 \cr
    \vdots & \vdots & \vdots & \ddots & \vdots\cr
    L^{-1}_{N,1} & L^{-1}_{N,2} & L^{-1}_{N,3} & \ldots & 0 \cr
    L^{-1}_{N+1,1} & L^{-1}_{N+1,2} & L^{-1}_{N+1,3} & \ldots &
    L^{-1}_{N+1\, N+1}
\end{matrix}
\right]
\left[
\begin{matrix}
    a_N \cr a_{N-1} \cr a_{N-2} \cr \vdots \cr a_1 \cr a_0 -
    \phi_1 y_{-1}
\end{matrix}
\right ]
$$

where $ L^{-1}_{ij} $ is the $ (i,j) $ element of $ L^{-1} $ and $ U_{ij} $ is the $ (i,j) $ element of $ U $.

Note how the left side for a given $ t $ involves  $ y_t $ and one lagged value $ y_{t-1} $ while the right side involves all future values of the forcing process $ a_t, a_{t+1}, \ldots, a_N $.
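
The feedback-feedforward form can be illustrated on a small tridiagonal system. The sketch below uses a hypothetical helper, `lu_unit_upper`, that factors $ W $ without row swaps and puts the unit diagonal on $ U $, as in the normalization described above. The numerical values of $ W $ and $ \bar a $ are arbitrary.

```python
import numpy as np

def lu_unit_upper(W):
    """LU without pivoting, scaled so that U has ones on its diagonal."""
    W = np.array(W, dtype=float)
    n = W.shape[0]
    L = np.zeros((n, n))
    U = np.eye(n)
    for k in range(n):
        # Column k of L first, then row k of U.
        L[k:, k] = W[k:, k] - L[k:, :k] @ U[:k, k]
        U[k, k + 1:] = (W[k, k + 1:] - L[k, :k] @ U[:k, k + 1:]) / L[k, k]
    return L, U

n = 5
W = 3.0 * np.eye(n) + 0.9 * (np.eye(n, k=1) + np.eye(n, k=-1))   # tridiagonal
W[0, 0] -= 0.5**2                  # first diagonal element differs
a_bar = np.ones(n)

L, U = lu_unit_upper(W)
y_bar = np.linalg.solve(W, a_bar)

lhs = U @ y_bar                    # row i: y_t + U[i, i+1] * y_{t-1}
rhs = np.linalg.solve(L, a_bar)    # L^{-1} a_bar without forming the inverse
print(np.allclose(lhs, rhs))
```

Here $ L $ is nonzero only on its diagonal and sub-diagonal and $ U $ only on its diagonal and super-diagonal, so each row of $ U \bar y $ involves a single lagged value.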


### A simple LQ control example

Let's generate a simple LQ control example and solve it with the $LU$ decomposition.

In [None]:
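c1 = np.array([3.0, .9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # (phi_0, phi_1, 0, ..., 0): first column of W
# c1 = np.array([1.0, .8, 0.0])
a = np.ones(8)  # forcing sequence in reverse time order (a_N, ..., a_0)

d1 = .5
ym1 = 0.0  # initial condition y_{-1}
a[7] = a[7] - c1[1] * ym1  # the last entry is a_0 - phi_1 * y_{-1}
r1 = c1  # first row equals first column, so the matrix is symmetric (this corresponds to beta = 1)
V1 = scipy.linalg.toeplitz(c1, r1)
V1[0, 0] = V1[0, 0] - d1**2  # the terminal condition modifies the first diagonal element
print('V1 = ', V1)
V1inv = scipy.linalg.inv(V1)
P1, L1, U1 = scipy.linalg.lu(V1)  # V1 = P1 @ L1 @ U1; scipy puts the unit diagonal on L1 rather than U1
L1inv = scipy.linalg.inv(L1)
print('L1 = ', L1)
print('L1inv = ', L1inv)
print('U1 = ', U1)
print('P1 = ', P1)
Vinva = V1inv @ a  # the solution W^{-1} a
print('Vinva =', Vinva)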

### Covariogram as Toeplitz matrix



To build the covariance matrix, specify one side of the autocovariogram and use it as both the first row and the first column of a Toeplitz matrix.

We use this to compute finite dimensional moving average and autoregressive representations.

Use this as a laboratory to show how you can violate "positive definiteness" of the autocovariance sequence.


In [None]:
c1 = np.array([1.0, .9, .81, .9**3, .9**4, .9**5])  # autocovariances at lags 0, ..., 5
r1 = c1
V0 = scipy.linalg.toeplitz(c1, r1)
print('V0 =', V0)
lam, vecs = np.linalg.eig(V0)  # eigenvalues first, then eigenvectors
print('eigenvalues =', lam)  # all positive, so V0 is positive definite

F = scipy.linalg.cholesky(V0)  # upper triangular by default, so V0 = F.T @ F

print('F.T = ', F.T)

FF = F.T @ F

print('V0 - FF = ', V0 - FF)  # should be numerically zero

AR = scipy.linalg.inv(F.T)   # Autoregressive representation

print("autoregressive representation")
print("AR = ", AR)

print("moving average representation")
print("MA = ", F.T)
print("notice striking first column and how it relates to the covariance matrix -- there is a theorem here")
print('V0 =', V0)
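
To see how positive definiteness can fail, the sketch below compares a geometrically decaying sequence with one that drops to zero abruptly. The second sequence and the helper name `toeplitz_from_autocov` are assumptions chosen for this illustration.

```python
import numpy as np

def toeplitz_from_autocov(c):
    """Symmetric Toeplitz matrix whose (i, j) entry is c[|i - j|]."""
    c = np.asarray(c, dtype=float)
    idx = np.abs(np.arange(len(c))[:, None] - np.arange(len(c))[None, :])
    return c[idx]

valid = toeplitz_from_autocov([1.0, 0.9, 0.81])    # geometric decay, 0.9**k
invalid = toeplitz_from_autocov([1.0, 0.9, 0.0])   # drops to zero too abruptly

results = {}
for name, V in (("valid", valid), ("invalid", invalid)):
    min_eig = np.linalg.eigvalsh(V).min()
    try:
        np.linalg.cholesky(V)
        cholesky_ok = True
    except np.linalg.LinAlgError:
        cholesky_ok = False
    results[name] = (min_eig, cholesky_ok)
    print(name, "min eigenvalue:", min_eig, "Cholesky succeeded:", cholesky_ok)
```

A negative eigenvalue means the sequence cannot be the autocovariance of any process, and the Cholesky factorization (and with it the moving average representation) fails.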


In [None]:
2.065/2.294  # ratio of sub-diagonal to diagonal entries of AR (in absolute value), approximately 0.9

In [None]:
.393/.435  # ratio of sub-diagonal to diagonal entries of MA, approximately 0.9