In [1]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

## Quick Mathematical Prelude

Consider a function of the form $f:{\rm I\!R}^{m\times n} \to {\rm I\!R}$, that is, a function that maps matrices to scalars. An example of such a function is $f(A) = \|Ax - b\|_2^2$, where $x$ and $b$ are constant vectors. The derivative of $f$ with respect to $A$, denoted by $\frac{\partial f(A)}{\partial A}$, is a matrix-valued function such that
$$\left(\frac{\partial f(A)}{\partial A}\right)_{i,j} = \frac{\partial f(A)}{\partial A_{i, j}}.$$
In what follows we will use the formula
$$\frac{\partial \|Ax-b\|_2^2}{\partial A} = 2(Ax-b)x^\intercal.$$
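
A finite-difference check is a quick way to confirm that the gradient of $\|Ax-b\|_2^2$ with respect to $A$ is $2(Ax-b)x^\intercal$, including the factor of 2. The sketch below is only an illustration: the dimensions, the random data, and the step size `eps` are arbitrary choices and are not part of the original notebook.

```python
import numpy as np

# Arbitrary test data (assumed for illustration only)
rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = rng.standard_normal(m)


def f(A):
    """Squared Euclidean norm of the residual Ax - b."""
    r = A @ x - b
    return r @ r


# Analytic gradient: 2 (Ax - b) x^T
grad_analytic = 2.0 * np.outer(A @ x - b, x)

# Central finite differences, perturbing one entry of A at a time
eps = 1e-6
grad_numeric = np.zeros_like(A)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad_numeric[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

max_abs_diff = np.max(np.abs(grad_analytic - grad_numeric))
print(max_abs_diff)
```

Since $f$ is quadratic in $A$, the central difference has no truncation error, so the two gradients should agree up to floating-point rounding.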

## Identification of linear systems

Consider the dynamical system
$$x_{t+1} = Ax_t + w_t,$$
with the following assumptions:
1. The noise, $w_t$, is iid, independent of the state, and has zero mean
2. We can directly measure the state, $x_t$, and we have collected a set of measurements $x_0, \ldots, x_N$
3. The matrix $A$ is unknown and we need to estimate it


### The least squares approach

At every time $t=0,\ldots, N-1$, the error is $w_t = x_{t+1} - Ax_t$. We can define the total error as
$$e = \sum_{t=0}^{N-1}\|w_t\|_2^2 = \sum_{t=0}^{N-1}\|Ax_t - x_{t+1}\|_2^2.$$
This is a function of $A$ and
$$\frac{\partial e(A)}{\partial A} = 2\sum_{t=0}^{N-1} (Ax_t - x_{t+1})x_t^\intercal = 2\left(\sum_{t=0}^{N-1} Ax_tx_t^\intercal - \sum_{t=0}^{N-1} x_{t+1}x_t^\intercal\right).$$
In order to determine the value of $A$ that minimises the error we will set the derivative to zero and solve for $A$; we have
$$A \cdot \sum_{t=0}^{N-1} x_tx_t^\intercal = \sum_{t=0}^{N-1} x_{t+1}x_t^\intercal.$$
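Here $n$ denotes the dimension of the state. The matrix $\sum_{t=0}^{N-1} x_tx_t^\intercal$ is a sum of $N$ rank-one matrices, so it can only be full rank if $N \geq n$. When $N \geq n$ and the states $x_0, \ldots, x_{N-1}$ span ${\rm I\!R}^n$, which is typically the case for noisy data, the matrix is full rank, thus invertible, and we can solve the above equation for $A$.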
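
The rank condition can be checked numerically. The following sketch uses random vectors as stand-ins for the states (an assumption made for illustration; `gram_rank` is a hypothetical helper, not part of the notebook) and compares the rank of $\sum_t x_tx_t^\intercal$ when there are fewer states than state dimensions with the rank when there are more.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # state dimension


def gram_rank(num_states):
    """Rank of sum_t x_t x_t^T for `num_states` random stand-in states."""
    # Columns of X play the role of x_0, ..., x_{N-1}
    X = rng.standard_normal((n, num_states))
    return np.linalg.matrix_rank(X @ X.T)


rank_too_few = gram_rank(2)   # N < n: the matrix cannot be invertible
rank_enough = gram_rank(10)   # N > n: full rank for generic data
print(rank_too_few, rank_enough)
```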

### Implementation
Let $X = [x_0 ~ \cdots ~ x_{N-1}]$ and $X^+ = [x_1 ~ \cdots ~ x_N]$. Then, the above equation becomes
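$$A XX^\intercal = X^{+}X^\intercal \Leftrightarrow XX^\intercal A^\intercal = XX^{+\intercal}.$$
This is a linear system with coefficient matrix $XX^\intercal$ and unknown $A^\intercal$. We can solve it with [`np.linalg.solve`](https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html) to determine $A^\intercal$ and then transpose the result to obtain $A$.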
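
As a minimal sketch of this step, the helper below (the name `estimate_A` and the small two-state test system are assumptions made for illustration) builds $X$ and $X^+$ from a state trajectory and solves for $A^\intercal$. On noise-free data the estimate should recover the true matrix up to rounding, which makes for a simple sanity check.

```python
import numpy as np


def estimate_A(states):
    """Least-squares estimate of A from states of shape (n, N+1).

    The columns of `states` are x_0, ..., x_N.
    """
    X = states[:, :-1]       # x_0, ..., x_{N-1}
    X_plus = states[:, 1:]   # x_1, ..., x_N
    # Solve (X X^T) A^T = X X_plus^T for A^T, then transpose
    return np.linalg.solve(X @ X.T, X @ X_plus.T).T


# Noise-free sanity check on a hypothetical 2x2 system
A_true = np.array([[0.9, 0.2],
                   [-0.1, 0.7]])
states = np.zeros((2, 11))
states[:, 0] = [1.0, -1.0]
for t in range(10):
    states[:, t + 1] = A_true @ states[:, t]

A_hat = estimate_A(states)
print(np.max(np.abs(A_hat - A_true)))
```

Using `np.linalg.solve` avoids forming the inverse of $XX^\intercal$ explicitly, which is generally the numerically preferable choice.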

## Example

Consider a system with
$$A = \begin{bmatrix}0.8 & -0.1 & 0.1 \\ -0.1 & 0.8 & -0.2 \\ 0 & 0 & 0.5\end{bmatrix},$$
and $w_t \overset{\text{iid}}{\sim} \mathcal{N}(0, Q)$ with $Q = 0.01 \cdot I_3$. Let us generate the states $x_0, \ldots, x_N$ with $N = 50$, starting from $x_0=(1, 1, 1)$.

In [8]:
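np.random.seed(1)  # fix the seed for reproducibility

# True system matrix (unknown to the estimator)
A = np.array([[0.8, -0.1, 0.1],
              [-0.1, 0.8, -0.2],
              [0, 0, 0.5]])
nx = A.shape[0]
n_samples = 50
Q = 0.01 * np.eye(nx)  # noise covariance

# Simulate the system; column t of X holds the state x_t
X = np.zeros((nx, n_samples+1))
X[:, 0] = np.array([1, 1, 1])
for t in range(n_samples):
    wt = np.random.multivariate_normal(np.zeros(nx), Q)
    X[:, t+1] = A @ X[:, t] + wt

# Note: these slices pair x_0, ..., x_{N-2} with x_1, ..., x_{N-1},
# so the last simulated transition is not used
XXt = X[:, :n_samples-1] @ X[:, :n_samples-1].T
XplusXt = X[:, :n_samples-1] @ X[:, 1:n_samples].T

# Solve (X X^T) A^T = X X^{+T} for A^T and transpose
A_LS_estimate = np.linalg.solve(XXt, XplusXt).T
print(A_LS_estimate - A)  # estimation error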

[[-0.04084323  0.10181821  0.03510555]
 [ 0.05443117 -0.12138291  0.02865516]
 [-0.16313922  0.19042983 -0.14296792]]


**Exercise:** Plot the estimation error against the number of samples.

In [None]:
# Your code goes here

## Exercise

Consider the system
$$x_{t+1} = \underbracket{\begin{bmatrix}0.8 & 0.1 & 0.1 \\ -0.1 & 0.8 & -0.2 \\ 0 & 0 & 0.5\end{bmatrix}}_{A}x_t + \begin{bmatrix}0\\0\\1\end{bmatrix}u_t + w_t,$$
where $w_t \overset{\text{iid}}{\sim} \mathcal{N}(0, Q)$ with $Q = 0.01 \cdot I_3$ and $u_t$ are *known* inputs. Suppose that the matrix $A$ is unknown; estimate it using the observations $x_0, \ldots, x_N$ and the inputs $u_0, \ldots, u_{N-1}$.

In [3]:
# Your code goes here