In [None]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext training_rl
%set_random_seed 12

In [None]:
%presentation_style

In [None]:
%load_latex_macros

In [None]:
%autoreload
import logging
import math
import os
import tempfile
import warnings
from pathlib import Path
from typing import Optional

import control as ct
import estimagic as em
import gymnasium as gym
import matplotlib.pyplot as plt
import mediapy as media
import numpy as np
from ipywidgets import interact, Output, widgets
from IPython.display import display, HTML
from numpy.typing import NDArray
from scipy.special import expit

from training_rl.environment import create_inverted_pendulum_environment

warnings.simplefilter("ignore", UserWarning)

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Introduction to Control Theory</div>

# Introduction

Control theory is a field of control engineering and applied mathematics that deals with the control of dynamical systems in engineered processes and machines. The objective is to develop a model or algorithm governing the application of system inputs to drive the system to a desired state, while minimizing any delay, overshoot, or steady-state error and ensuring a level of control stability; often with the aim to achieve a degree of optimality. 

## Control Theory and Reinforcement Learning

- System <-> Environment
- Controller (Regulator) <-> Agent (Policy)
- Control <-> Action
- Cost <-> Reward

# Inverted Pendulum

<div>
<div style="float: left; width: 60%; margin-right: 5%;">
    
An inverted pendulum is a pendulum that has its center of mass above its pivot point. It is unstable and without additional help will fall over.

The inverted pendulum is a classic problem in dynamics and control theory and is used as a benchmark for testing control strategies. It is often implemented with the pivot point mounted on a cart that can move horizontally under control of an electronic servo system as shown in the photo.
</div>
<div style="float: left; width: 30%;">
    <figure>
        <img src="_static/images/20_inverted_pendulum_photo.png" width="50%"/>
        <figcaption>
            Balancing cart, a simple robotics system circa 1976. <a href="#wiki_inverted_pendulum"><b id="wiki_inverted_pendulum-back">[Wiki Inverted Pendulum, 2023]</b></a>
        </figcaption>
    </figure>
</div>
</div>


In [None]:
%%html
<iframe width="800" height="600" src="https://www.youtube-nocookie.com/embed/AuAZ5zOP0yQ?si=1Lnyg2ghX6BJEEVX&amp;start=55" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

For the simulation we will use the [Inverted Pendulum](https://gymnasium.farama.org/environments/mujoco/inverted_pendulum/) environment from [gymnasium](https://gymnasium.farama.org/).

It has the following action and observation spaces:

<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Num</p></th>
<th class="head"><p>Action</p></th>
<th class="head"><p>Control Min</p></th>
<th class="head"><p>Control Max</p></th>
<th class="head"><p>Name (in corresponding XML file)</p></th>
<th class="head"><p>Joint</p></th>
<th class="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>Force applied on the cart</p></td>
<td><p>-3</p></td>
<td><p>3</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>Force (N)</p></td>
</tr>
</tbody>
</table>

<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Num</p></th>
<th class="head"><p>Observation</p></th>
<th class="head"><p>Min</p></th>
<th class="head"><p>Max</p></th>
<th class="head"><p>Name (in corresponding XML file)</p></th>
<th class="head"><p>Joint</p></th>
<th class="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>position of the cart along the linear surface</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<tr class="row-odd"><td><p>1</p></td>
<td><p>vertical angle of the pole on the cart</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>hinge</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<tr class="row-even"><td><p>2</p></td>
<td><p>linear velocity of the cart</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>velocity (m/s)</p></td>
</tr>
<tr class="row-odd"><td><p>3</p></td>
<td><p>angular velocity of the pole on the cart</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>hinge</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
</tbody>
</table>

In [None]:
env = create_inverted_pendulum_environment()
env.reset()
all_frames = []
for i in range(2):
    for _ in range(100):
        if i == 1:
            action = np.zeros_like(env.action_space.sample())
        else:
            action = env.action_space.sample()
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            all_frames.append(env.render())
            env.reset()
            break
env.close()

In [None]:
media.show_videos(all_frames, fps=1/env.dt)

## Questions

- What can we say about the system?
- How can we balance the pendulum?

## Naive Control

If we try to balance an elongated object on our hand, we intuitively try to move our hand in the same direction in which the object is falling. We can use this intuition to make a very simple and straightforward controller.

$$
u(t) = K \cdot \theta(t)
$$

with $K \gt 0$

## Exercise

Implement the naive controller described above and visualize the result.

What is the best value for the coefficient?

> **Hint** the second value in the `observation` vector represents the angle of the pendulum.
> You can use `theta = observation[[1]]` to get its value. 

##  Solution

In [None]:
def control_inverted_pendulum(K = widgets.FloatSlider(min=0.0, max=1000.0, step=10, value=10.0)):
    env = create_inverted_pendulum_environment()
    observation, _ = env.reset()
    for _ in range(100):
        theta = observation[[1]]
        action = K * theta
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            frames = env.render()
            env.reset()
            break
    env.close()
    media.show_video(frames, fps=1/env.dt)
interact(control_inverted_pendulum);

Let's try different values of $K$ and plot the different trajectories.

In [None]:
env = create_inverted_pendulum_environment()
initial_observation, _ = env.reset()
K_values = [0.1, 1.0, 2.0, 10.0]

all_observations = []

for K in K_values:
    env.set_state(initial_observation[:2], initial_observation[2:])
    observation = initial_observation.copy()
    observations = [observation]

    for _ in range(100):
        theta = observation[[1]]
        action = K * theta
        observation, _, terminated, truncated, _ = env.step(action)
        observations.append(observation)
        if terminated or truncated:
            env.reset()
            break
            
    observations = np.stack(observations)
    all_observations.append(observations)

In [None]:
for i, K in enumerate(K_values):
    plt.plot(
        np.arange(all_observations[i].shape[0]) * env.model.opt.timestep,
        all_observations[i][:, 1],
        label=f"{K=}"
    )
plt.legend()
plt.xlabel("Time")
plt.ylabel("Angle (rad)");

# Control Systems

There are two types of control loop:

- **Open-loop control (feedforward)**

  An open-loop control system operates without feedback, which means that the output is not measured or compared to the desired input. They are simple and inexpensive to implement. They are often used in systems where the output does not need to be precisely controlled. For example, a washing machine may use an open-loop control system to regulate the water level.

- **Closed-loop control (feedback)**

  A closed-loop control system, on the other hand, operates with feedback, meaning that the output is measured, and corrective action is taken to keep it close to the desired input. They are more complex and expensive to implement. However, they offer greater precision and accuracy in controlling the system's output. Closed-loop control systems are often used in critical applications, such as aerospace engineering or medical devices.
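
The difference between the two loop types can be sketched on a toy plant. The block below is only an illustration and is not related to the pendulum: the scalar plant `x(k+1) = x(k) + u(k) + d`, the constant disturbance `d`, the target value, and the feedback gain `0.5` are all assumptions chosen for readability.

```python
def simulate(controller, n_steps=50, disturbance=0.1):
    """Simulate the toy plant x(k+1) = x(k) + u(k) + d, starting from x(0) = 0."""
    x = 0.0
    for k in range(n_steps):
        u = controller(k, x)
        x = x + u + disturbance
    return x


target = 1.0


def feedforward(k, x):
    # Open loop: apply a pre-computed input and never look at the output x.
    return target if k == 0 else 0.0


def feedback(k, x):
    # Closed loop: act on the measured deviation from the target.
    return 0.5 * (target - x)


open_loop_error = abs(simulate(feedforward) - target)
closed_loop_error = abs(simulate(feedback) - target)
print(f"{open_loop_error=:.3f}, {closed_loop_error=:.3f}")
```

In this sketch the open-loop controller cannot react to the disturbance, so the error keeps growing, while the proportional feedback settles with a small constant offset.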

## Types of Systems

- Time-Invariant (TI) or Time-Variant (TV).
- Linear or Non-Linear.
- Continuous or Discrete.
- Deterministic or Stochastic.

# Controller Design

1. Define a mathematical model that represents the system.
2. Determine properties of this system:

   - Identifiability.
   - Stability.
   - Observability.
   - Controllability.

3. Determine the model's parameters, if they're not known already.
4. If it's a continuous-time system, discretize it to obtain a discrete-time system.
5. (Optional) Linearize the model around an operating point.
6. Design a controller to stabilize the system.
7. Simulate the closed-loop system in order to validate the controller design.
8. Use the controller with the actual system.

# Modeling

To design a control system, it is first necessary to gain an understanding of how the system operates.
This understanding is typically expressed in the form of a mathematical model which describes the
steady state and dynamic behavior of the system.

This aspect of Control Engineering is closer to model-based Reinforcement Learning.

## Modeling Approaches

<div>
<div style="float: left; width: 50%; margin-right: 5%;">
    
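- **White box**: Built mainly from prior knowledge about the system, such as first-principles physical laws.
- **Black box**: Built from statistical information in measured data, without assuming knowledge of the system's internal structure.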
- **Grey box**: Combines both approaches to relax the need to exactly model the underlying physics, while requiring considerably less data than a pure black box approach.

</div>
<div style="float: left; width: 40%;">
    <figure>
        <img src="_static/images/20_modeling_approaches.svg" width="100%"/>
        <figcaption>
            White box models are based mainly on knowledge about the system.
            Black box models are built on statistical information from the data.
            Grey box modeling combines the two approaches. <a href="#duun_henriksen_2013"><b id="duun_henriksen_2013-back">[Duun-Henriksen et al., 2013]</b></a>
        </figcaption>
    </figure>
</div>
</div>

## System Representation

- Transfer Function
- State-Space
- Input-Output

### Transfer Function Representation

A transfer function of a system, sub-system, or component is a mathematical function that models the system's output for each possible input.

Transfer functions are commonly used in the analysis of systems such as single-input single-output filters in the fields of signal processing, communication theory, and control theory. The term is often used exclusively to refer to linear time-invariant (LTI) systems. Most real systems have non-linear input/output characteristics, but many systems, when operated within nominal parameters (not "over-driven") have behavior close enough to linear that LTI system theory is an acceptable representation of the input/output behavior.

For continuous-time input signal $x(t)$ and output $y(t)$, dividing the Laplace transform of the output, $Y(s) = \mathcal{L}\left\{y(t)\right\}$, by the Laplace transform of the input, $X(s) = \mathcal{L}\left\{x(t)\right\}$, yields the system's transfer function $H(s)$:

$$
H(s) = \frac{Y(s)}{X(s)} = \frac{ \mathcal{L}\left\{y(t)\right\} }{ \mathcal{L}\left\{x(t)\right\}}
$$

which can be rearranged as:

$$
Y(s)=H(s)X(s)
$$

Where $s = \sigma + j \cdot \omega$ is a complex variable. When we're only interested in the steady-state response of the system then it is sufficient to set $\sigma = 0$ (thus $s = j \cdot \omega$), which reduces the Laplace transforms with complex arguments to Fourier transforms with real argument $\omega$.

The transfer function was the primary tool used in classical control engineering. However, it has proven to be unwieldy for the analysis of multiple-input multiple-output (MIMO) systems, and has been largely supplanted by state space representations for such systems. In spite of this, a transfer matrix can always be obtained for any linear system, in order to analyze its dynamics and other properties: each element of a transfer matrix is a transfer function relating a particular input variable to an output variable.
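
As a small illustration of evaluating a transfer function on the imaginary axis, the sketch below computes the gain and phase of a first-order lag at one frequency. The system $H(s) = 1 / (\tau s + 1)$ and the value $\tau = 1$ are assumptions made for this example; they are not part of the pendulum model.

```python
import cmath
import math


def H(s: complex, tau: float = 1.0) -> complex:
    """Transfer function of a first-order lag, H(s) = 1 / (tau * s + 1)."""
    return 1.0 / (tau * s + 1.0)


omega = 1.0  # rad/s, equal to the corner frequency 1 / tau
response = H(1j * omega)  # steady-state response: s = j * omega
gain = abs(response)
phase_deg = math.degrees(cmath.phase(response))
print(f"|H(j{omega})| = {gain:.4f}, phase = {phase_deg:.1f} deg")
```

At the corner frequency the gain is $1/\sqrt{2}$ and the phase lag is 45 degrees, which is the familiar behavior of a first-order low-pass filter.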

### State-Space Representation

A state-space representation is a mathematical model of a physical system specified as a set of input, output and state variables related by first-order (not involving second derivatives) differential equations or difference equations.

Such variables, called state variables, evolve over time in a way that depends on the values they have at any given instant and on the externally imposed values of input variables. Output variables’ values depend on the values of the state variables.

In the general case, for continuous-time systems:

$$
\dot{\mathbf{x}}(t) = f(\mathbf{x}(t), \mathbf{u}(t)) \\
\mathbf{y}(t) = g(\mathbf{x}(t), \mathbf{u}(t))
$$

for discrete-time systems:

$$
\mathbf{x}(k+1) = f(\mathbf{x}(k), \mathbf{u}(k)) \\
\mathbf{y}(k) = g(\mathbf{x}(k), \mathbf{u}(k))
$$

| <div style="width:290px">System type</div> | State-space model |
|:-------------|:-----------------:|
| Continuous time-invariant  | $$
\dot{\mathbf{x}}(t)=  A \mathbf{x}(t) + B \mathbf {u} (t)\\
\mathbf{y}(t) = C \mathbf{x}(t) + D \mathbf{u}(t)
$$ |
| Continuous time-variant | $$
\dot{\mathbf{x}}(t) = A(t) \mathbf{x}(t) + B(t) \mathbf{u}(t) \\
\mathbf{y}(t) = C(t) \mathbf{x}(t) + D(t) \mathbf{u}(t)
$$ |
| Discrete time-invariant | $$
\mathbf{x}(k+1) = A \mathbf{x}(k) + B \mathbf{u}(k) \\
\mathbf{y}(k) = C \mathbf{x}(k) + D \mathbf{u}(k)
$$ |
| Discrete time-variant | $$
\mathbf{x}(k+1) = A(k) \mathbf{x}(k) + B(k) \mathbf{u}(k) \\
\mathbf{y}(k) = C(k) \mathbf{x}(k) + D(k) \mathbf{u}(k)
$$ |
| Laplace domain of continuous time-invariant| $$
s\mathbf{X}(s) - \mathbf{x}(0) = A \mathbf{X}(s) + B \mathbf{U}(s) \\
\mathbf{Y}(s) = C \mathbf{X}(s) + D \mathbf{U}(s)
$$ |
| Z-domain of discrete time-invariant | $$
z\mathbf{X}(z) - z\mathbf{x}(0) = A \mathbf{X}(z) + B \mathbf{U}(z) \\
\mathbf{Y}(z) = C \mathbf{X}(z) + D \mathbf{U}(z)
$$ |

[<b id="wiki_state_space-back">[Wiki State-Space, 2023]</b>](#wiki_state_space)
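
To make the discrete time-invariant form concrete, here is a minimal sketch that iterates $\mathbf{x}(k+1) = A\mathbf{x}(k) + B\mathbf{u}(k)$, $\mathbf{y}(k) = C\mathbf{x}(k) + D\mathbf{u}(k)$ in plain Python. The system (a sampled double integrator with a hypothetical step of `dt = 0.1`) and the helper `step` are assumptions for illustration only.

```python
def step(A, B, C, D, x, u):
    """One step of a single-input, single-output discrete-time LTI system."""
    n = len(x)
    # Output equation uses the current state: y(k) = C x(k) + D u(k)
    y = sum(C[j] * x[j] for j in range(n)) + D * u
    # State update: x(k+1) = A x(k) + B u(k)
    x_next = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
    return x_next, y


dt = 0.1
A = [[1.0, dt], [0.0, 1.0]]  # position integrates velocity
B = [0.5 * dt**2, dt]        # effect of a constant acceleration over one step
C = [1.0, 0.0]               # only the position is measured
D = 0.0

x = [0.0, 0.0]
outputs = []
for k in range(10):
    x, y = step(A, B, C, D, x, u=1.0)
    outputs.append(y)
print(x, outputs[-1])
```

After ten steps with unit acceleration the state matches the closed-form result (position $t^2/2$, velocity $t$ at $t = 1$), because these matrices are the exact sampled version of the double integrator for a piecewise-constant input.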

### Input-Output Representation

$$
\mathbf{y}(k) = \mathbf{h}(\mathbf{y}(k-1), \mathbf{y}(k-2), \dots, \mathbf{u}(k), \mathbf{u}(k-1), \dots)
$$
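
A simple instance of this form is a linear difference equation in past outputs and recent inputs. The sketch below assumes a hypothetical scalar $h$ with made-up coefficients and computes its response to a unit step; nothing here is derived from the pendulum.

```python
def h(y_past, u_recent):
    """Hypothetical h: y(k) = 1.5 y(k-1) - 0.7 y(k-2) + 0.1 u(k) + 0.1 u(k-1)."""
    return 1.5 * y_past[0] - 0.7 * y_past[1] + 0.1 * u_recent[0] + 0.1 * u_recent[1]


n_steps = 200
u = [1.0] * n_steps  # unit step input
y = [0.0, 0.0]       # y(-2) and y(-1): the system starts at rest
u_prev = 0.0
for k in range(n_steps):
    y.append(h((y[-1], y[-2]), (u[k], u_prev)))
    u_prev = u[k]
steady_state = y[-1]
print(f"{steady_state=:.6f}")
```

No internal state vector appears explicitly: the output history plays that role. For these coefficients the step response settles at a gain of $(0.1 + 0.1) / (1 - 1.5 + 0.7) = 1$.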

# Inverted Pendulum Model


<div>
<figure style="float: left; width: 40%;">
    <img src="_static/images/20_inverted_pendulum.svg" width="50%"/>
    <figcaption>
        Inverted pendulum model <a href="#goodwin_control_2000"><b id="goodwin_control_2000-back">[Goodwin et al., 2000]</b></a>
    </figcaption>
</figure>
<div style="float: left;">
    
- $y(t)$: distance along the horizontal axis from some reference point.
- $\theta(t)$: angle of the pendulum.
- $M$: mass of the cart.
- $m$: mass of the pendulum (assumed to be concentrated at the tip).
- $l$: length of the pendulum.
- $f(t)$: force applied on the cart.
</div>
</div>

Application of Newtonian physics to this system leads to the following model:

$$
\ddot{y} = \frac{1}{\lambda_m + \sin^2\theta(t)} \left[
\frac{f(t)}{m} + \dot{\theta}^2(t) l \sin\theta(t) - g \cos\theta(t) \sin\theta(t)
\right]
\\
\ddot{\theta} = \frac{1}{l \left(\lambda_m + \sin^2\theta(t)\right)} \left[
-\frac{f(t)}{m}\cos\theta(t) - \dot{\theta}^2(t) l \sin\theta(t) \cos\theta(t) + (1 + \lambda_m) g \sin\theta(t)
\right]
$$

where $\lambda_m = \frac{M}{m}$ and $g$ is the gravitational acceleration.

We're only interested in controlling the pendulum's angle, so we can ignore the first equation.

We can convert this to state-space form with input $u(t) = f(t)$ and output
$\theta(t)$ by introducing:

$$
X(t) = \begin{bmatrix}
x_1(t) \\ x_2(t)
\end{bmatrix}
= \begin{bmatrix}
\theta(t) \\ \dot{\theta}(t) 
\end{bmatrix}
$$

$$
\dot{X}(t) = \begin{bmatrix}
\dot{x_1}(t) \\ \dot{x_2}(t)
\end{bmatrix} =
\begin{bmatrix}
x_2(t) \\
\frac{1}{l \left(\lambda_m + \sin^2 x_1(t)\right)} \left[
-\frac{u(t)}{m}\cos x_1(t) - x_2(t)^2 l \sin x_1(t) \cos x_1(t) + (1 + \lambda_m) g \sin x_1(t)
\right] \\
\end{bmatrix}
$$

These equations are nonlinear. However, for small departures of $x_1$ (i.e. $\theta$) from the
vertical position we can linearize about $x_1 = 0$, $x_2 = 0$ (using $\sin x_1 \approx x_1$, $\cos x_1 \approx 1$ and dropping higher-order terms):

$$
\dot{X}(t) = \begin{bmatrix}
\dot{x_1}(t) \\ \dot{x_2}(t)
\end{bmatrix} =
\begin{bmatrix}
x_2(t) \\
\frac{1}{l\lambda_m} \left[
-\frac{u(t)}{m} + (1 + \lambda_m) g x_1(t)
\right] \\
\end{bmatrix} =
\begin{bmatrix}
0 & 1\\
\frac{(M + m)g}{Ml} & 0\\
\end{bmatrix}
\begin{bmatrix}
x_1(t) \\ x_2(t)
\end{bmatrix}
+
\begin{bmatrix}
0 \\ -\frac{1}{Ml}
\end{bmatrix}
\begin{bmatrix}
u(t)\\
\end{bmatrix}
$$

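This leads to a linear state-space model. For the reduced state $X = \begin{bmatrix}\theta & \dot{\theta}\end{bmatrix}^T$ used in the rest of this notebook, the matrices are:

$$
A = \begin{bmatrix}
0 & 1\\
\frac{(M + m)g}{Ml} & 0\\
\end{bmatrix};
B = \begin{bmatrix}
0 \\ -\frac{1}{Ml}
\end{bmatrix};
C = \begin{bmatrix}
1 & 0 \\
\end{bmatrix};
D = \begin{bmatrix}
0
\end{bmatrix}
$$

For reference, if the cart position and velocity are kept in the state, $X = \begin{bmatrix}y & \theta & \dot{y} & \dot{\theta}\end{bmatrix}^T$, the dynamics and input matrices become:

$$
A = \begin{bmatrix}
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1\\
0 & -\frac{mg}{M} & 0 & 0 \\
0 & \frac{(M + m)g}{Ml} & 0 & 0\\
\end{bmatrix};
B = \begin{bmatrix}
0 \\ 0 \\ \frac{1}{M} \\ -\frac{1}{Ml}
\end{bmatrix}
$$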
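
The entries of the linearized model can be checked numerically by differentiating the nonlinear angular acceleration around the upright equilibrium. The sketch below is a sanity check under stated assumptions: the parameter values are hypothetical, and the nonlinear expression is written in the form whose linearization yields $\frac{(M + m)g}{Ml}$ and $-\frac{1}{Ml}$.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
g, l, m, M = 9.81, 0.6, 0.5, 2.0
lambda_m = M / m


def theta_ddot(theta, theta_dot, u):
    """Nonlinear angular acceleration of the pendulum for input force u."""
    numerator = (
        -u / m * math.cos(theta)
        - theta_dot**2 * l * math.sin(theta) * math.cos(theta)
        + (1 + lambda_m) * g * math.sin(theta)
    )
    return numerator / (l * (lambda_m + math.sin(theta) ** 2))


eps = 1e-6
# Central finite differences around the equilibrium (theta, theta_dot, u) = (0, 0, 0).
d_theta = (theta_ddot(eps, 0, 0) - theta_ddot(-eps, 0, 0)) / (2 * eps)
d_theta_dot = (theta_ddot(0, eps, 0) - theta_ddot(0, -eps, 0)) / (2 * eps)
d_u = (theta_ddot(0, 0, eps) - theta_ddot(0, 0, -eps)) / (2 * eps)

print(d_theta, (M + m) * g / (M * l))  # second row of A, first column
print(d_theta_dot)                     # second row of A, second column
print(d_u, -1 / (M * l))               # second entry of B
```

The three finite differences correspond to the second row of $A$ and the second entry of $B$ for the two-state model.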

## System Identification

### Structural Identifiability

### Parameter Estimation

Luckily for us, the system parameters are easy to extract from the simulation environment:

In [None]:
g = 9.81
l = env.model.geom_pos[2, 2]
m = env.model.body_mass[2]
M = env.model.body_mass[1]
lambda_m = M / m
print(f"{l=}, {m=}, {M=}")

In [None]:
# Dynamics matrix
A = np.array([
    [ 0, 1],
    [(1+lambda_m)*g/(lambda_m*l), 0],
])
# Input matrix
B = np.array([[
    0,
    -1/(M*l)
]]).transpose()
# Output matrices
C = np.array([
    [1, 0],
])
D = np.zeros(1)

In [None]:
inverted_pendulum = ct.ss(A, B, C, D)
inverted_pendulum

In [None]:
tf = ct.ss2tf(inverted_pendulum)
tf

We now discretize the system using the simulation environment's timestep:

In [None]:
inverted_pendulum = inverted_pendulum.sample(env.dt)
inverted_pendulum.name = "inverted_pendulum"
inverted_pendulum

In [None]:
tf = ct.ss2tf(inverted_pendulum)
tf
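
For the two-state pendulum model the sampled system can also be written in closed form, which shows where the discrete poles come from. The sketch below assumes a zero-order hold on the input (believed to be the default of `sample`, but treat that as an assumption) and uses hypothetical values for the entries `a` and `b` of $A = \begin{bmatrix}0 & 1\\ a & 0\end{bmatrix}$, $B = \begin{bmatrix}0 \\ b\end{bmatrix}$ and for the sampling period.

```python
import math

a, b = 20.0, -0.8  # hypothetical entries of A = [[0, 1], [a, 0]] and B = [0, b]
dt = 0.04          # hypothetical sampling period in seconds
omega = math.sqrt(a)

# Closed-form matrix exponential exp(A * dt) for this particular A,
# and the matching zero-order-hold input matrix.
ch, sh = math.cosh(omega * dt), math.sinh(omega * dt)
A_d = [[ch, sh / omega], [omega * sh, ch]]
B_d = [b * (ch - 1.0) / a, b * sh / omega]

# The continuous poles +omega and -omega map to exp(+omega * dt) and exp(-omega * dt).
continuous_poles = (omega, -omega)
discrete_poles = tuple(math.exp(p * dt) for p in continuous_poles)
print(discrete_poles)
```

The unstable continuous pole in the right half-plane becomes a discrete pole with magnitude greater than one, while the stable one ends up inside the unit circle.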

# Stability

There are many notions of stability in Control Theory:

- **BIBO Stability**

  A system is bounded-input, bounded-output stable (**BIBO** stable) if every bounded input produces a bounded output. Mathematically, if every input satisfying

    $$
    ||x(t)||_\infty \lt \infty
    $$

    leads to an output satisfying 

    $$
    ||y(t)||_\infty \lt \infty
    $$

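A continuous-time linear time-invariant (LTI) system is asymptotically stable if and only if any one of the following equivalent conditions holds:

- All the poles are in the open left half of the complex plane.
- All the eigenvalues of $A$ have negative real parts.
- For an arbitrary positive definite matrix $N$, the Lyapunov equation

  $$
  MA + A^TM = -N
  $$
  
  has a unique positive definite solution $M$ (both $M$ and $N$ are $n \times n$, where $n$ is the number of states).

For a discrete-time LTI system, such as the sampled pendulum model above, the corresponding condition is that all the eigenvalues of $A$ lie strictly inside the unit circle, i.e. $|z| \lt 1$.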

## Exercise

- Determine whether the linearized inverted pendulum system is stable.

> **Hint** 
>
> - Use the `poles()` method of the `inverted_pendulum` object to determine the system's poles.
> - Use `np.linalg.eig` to determine the eigenvalues of the system matrix.
> - Use `ct.lyap` (or `ct.dlyap` for a discrete-time system) to solve the Lyapunov equation.

## Solution

In [None]:
poles = inverted_pendulum.poles()
poles

In [None]:
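# The model was discretized, so stability requires all poles inside the unit circle
np.all(np.abs(poles) < 1.0)

The check returns `False`: at least one pole lies outside the unit circle, which means that the system is unstable.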

For the sake of demonstration, let's also compute the system matrix's eigenvalues and compare them to the poles:

In [None]:
result = np.linalg.eig(inverted_pendulum.A)
eigenvalues = result.eigenvalues
eigenvalues

In [None]:
# Same discrete-time criterion, applied to the eigenvalues of A
np.all(np.abs(eigenvalues) < 1.0)

In [None]:
# Compare up to ordering and floating-point round-off
np.allclose(np.sort_complex(poles), np.sort_complex(eigenvalues))

We can visualize the system's poles and zeros:

In [None]:
ct.pzmap(inverted_pendulum);

# Observability

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.

For a linear time-invariant system with $n$ states and $r$ outputs, the $nr \times n$ observability matrix is given by:

$O=\begin{bmatrix}C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}\end{bmatrix}$

The system is observable if the observability matrix has full column rank (i.e. $\operatorname{rank}(O)=n$).

In [None]:
O = ct.obsv(inverted_pendulum.A, inverted_pendulum.C)
print(O)

In [None]:
np.linalg.matrix_rank(O)
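
The rank test can also be worked through by hand for a two-state system. The sketch below uses a hypothetical diagonal system (not the pendulum) and two hypothetical helper functions to show how the choice of measurement decides whether the state can be reconstructed.

```python
def observability_matrix_2x2(A, C):
    """Stack C and C A for a two-state, single-output system."""
    CA = [C[0] * A[0][0] + C[1] * A[1][0], C[0] * A[0][1] + C[1] * A[1][1]]
    return [list(C), CA]


def has_full_rank_2x2(matrix, tol=1e-12):
    """A 2x2 matrix has rank 2 exactly when its determinant is non-zero."""
    det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    return abs(det) > tol


# Two decoupled states: the second state never influences the first one.
A = [[0.5, 0.0], [0.0, 0.9]]
measure_first = has_full_rank_2x2(observability_matrix_2x2(A, [1.0, 0.0]))
measure_sum = has_full_rank_2x2(observability_matrix_2x2(A, [1.0, 1.0]))
print(measure_first, measure_sum)
```

Measuring only the first state leaves the second one invisible, so the rank drops to one. Measuring the sum of both states makes the system observable, because the two states decay at different rates and can be told apart over time.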

## Observer Design

### Kalman Filter

$$
\mathbf{x}(k+1) = A(k)\mathbf{x}(k) + B(k)\mathbf{u}(k) + \mathbf{w}(k)\\
\mathbf{y}(k) = C(k)\mathbf{x}(k) + \mathbf{v}(k)
$$

Where:

- $\mathbf{w}(k)$ is the process noise, which is assumed to be drawn from a zero-mean multivariate normal distribution
  with covariance $\mathbf{Q}(k)$: $\mathbf{w}(k) \sim \mathcal{N}\left(0, \mathbf{Q}(k)\right)$.
- $\mathbf{v}(k)$ is the observation noise, which is assumed to be zero-mean Gaussian white noise with covariance
  $\mathbf{R}(k)$: $\mathbf{v}(k) \sim \mathcal{N}\left(0, \mathbf{R}(k)\right)$.
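
The predict and correct steps of the filter are easiest to see in the scalar case. The following is a minimal sketch under stated assumptions: the scalar system, the noise variances, and the initial guess are all hypothetical, and it is not the estimator created by `ct.create_estimator_iosystem`.

```python
import random

rng = random.Random(0)
a, c = 1.0, 1.0        # hypothetical scalar dynamics and measurement gains
q, r = 0.01, 1.0       # process and measurement noise variances

x_true = 5.0
x_hat, p = 0.0, 100.0  # poor initial guess with a large initial variance
p_history = [p]
for k in range(50):
    # Simulate the true system and a noisy measurement y(k) = c x(k) + v(k).
    x_true = a * x_true + rng.gauss(0.0, q**0.5)
    y = c * x_true + rng.gauss(0.0, r**0.5)
    # Predict: propagate the estimate and its variance through the model.
    x_hat = a * x_hat
    p = a * p * a + q
    # Correct: blend prediction and measurement using the Kalman gain.
    gain = p * c / (c * p * c + r)
    x_hat = x_hat + gain * (y - c * x_hat)
    p = (1.0 - gain * c) * p
    p_history.append(p)
print(f"estimate={x_hat:.3f}, truth={x_true:.3f}, variance={p:.4f}")
```

The estimate variance `p` shrinks from its large initial value and settles below the measurement noise variance `r`, which is the benefit of combining the model with repeated measurements.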

In [None]:
# Process noise covariance
Q = np.diag([100])
# Measurement noise covariance
R = np.diag([10.0]) * inverted_pendulum.dt
# Initial state covariance
P0 = np.diag([0.0, 10000])
estimator = ct.create_estimator_iosystem(inverted_pendulum, Q, R, P0=P0)
estimator.name = "estimator"
print(estimator)

We collect some data from the environment:

In [None]:
env = create_inverted_pendulum_environment(max_steps=200)
initial_observation, _ = env.reset()
K = 200.0

observation = initial_observation.copy()
observations = [observation]
actions = []

for _ in range(200):
    theta = observation[[1]]
    action = K * theta
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        env.reset()
        break

observations = np.stack(observations)
actions = np.stack(actions)

Then we run the estimator on the trajectory:

In [None]:
dt = inverted_pendulum.dt
# One time stamp per recorded step; integer arange avoids float round-off in the length
T = np.arange(len(observations) - 1) * dt
U = np.concatenate([observations[1:, [1]], actions], axis=1).transpose()
X0 = np.zeros(2)
estimator_response = ct.input_output_response(estimator, T, U, X0)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.errorbar(
    estimator_response.time,
    estimator_response.outputs[0],
    estimator_response.states[estimator.find_state("P[0,0]")],
    fmt="b-",
    label="Estimated"
)
ax1.plot(estimator_response.time, observations[1:, 1], label="Ground Truth")
ax1.set_xlabel("Time")
ax1.set_ylabel("Angle (rad)")
ax1.legend()
ax2.errorbar(
    estimator_response.time,
    estimator_response.outputs[1],
    estimator_response.states[estimator.find_state("P[1,1]")],
    fmt="b-",
    label="Estimated"
)
ax2.plot(estimator_response.time, observations[1:, 3], label="Ground Truth")
ax2.set_xlabel("Time")
ax2.set_ylabel("Angular Velocity (rad/s)");
ax2.legend()
fig.tight_layout()

We also let the estimator predict a short horizon into the future, without measurement corrections, to see what happens next:

In [None]:
T_predict = np.arange(T[-1], T[-1] + 0.2 + dt, dt)
U_predict = np.outer(U[:, -1], np.ones_like(T_predict))
predicted_response = ct.input_output_response(
    estimator, T_predict, U_predict, estimator_response.states[:, -1],
    params={'correct': False}
)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.errorbar(
    estimator_response.time,
    estimator_response.outputs[0],
    estimator_response.states[estimator.find_state("P[0,0]")],
    fmt="b-",
    label="Estimated"
)
ax1.plot(estimator_response.time, observations[1:, 1], label="Ground Truth")
ax1.errorbar(
    predicted_response.time,
    predicted_response.outputs[0],
    predicted_response.states[estimator.find_state("P[0,0]")],
    fmt="r-",
    label="Predicted",
)
ax1.legend()
ax1.set_xlabel("Time")
ax1.set_ylabel("Angle (rad)")
ax2.errorbar(
    estimator_response.time,
    estimator_response.outputs[1],
    estimator_response.states[estimator.find_state("P[1,1]")],
    fmt="b-",
    label="Estimated"
)
ax2.plot(estimator_response.time, observations[1:, 3], label="Ground Truth")
ax2.errorbar(
    predicted_response.time,
    predicted_response.outputs[1],
    predicted_response.states[estimator.find_state("P[1,1]")],
    fmt="r-",
    label="Predicted",
)
ax2.legend()
ax2.set_xlabel("Time")
ax2.set_ylabel("Angular Velocity (rad/s)")
fig.tight_layout()

# Controllability

The state controllability condition implies that it is possible - by admissible inputs - to steer the states from any initial value to any final value within some finite time window.

For a linear time-invariant system with $n$ states and $r$ inputs, the $n \times nr$ controllability matrix is given by:

$R=\begin{bmatrix}B & AB & A^{2}B & \dots & A^{n-1}B\end{bmatrix}$

The system is controllable if the controllability matrix has full row rank (i.e. $\operatorname{rank}(R)=n$). 

In [None]:
R = ct.ctrb(inverted_pendulum.A, inverted_pendulum.B)
print(R)

In [None]:
np.linalg.matrix_rank(R)

# Controller Design

## Full State Feedback

Full state feedback (FSF), or pole placement, is a method employed in feedback control system theory to place the closed-loop poles of a system at pre-determined locations in the s-plane (or in the z-plane for a discrete-time system). Placing poles is desirable because the poles are the eigenvalues of the closed-loop system matrix, which determine the characteristics of the system's response. The system must be controllable for this method to work.

We want to design a controller such that we can place the poles (eigenvalues) of our system at desired locations.

We choose the following control law:

$$
\mathbf{u}(t) = -K \mathbf{x}(t)
$$

Substituting this into the system's state-space representation, we obtain the closed-loop dynamics:

$$
\dot{\mathbf{x}}(t) =  (A - BK) \mathbf{x}(t)\\
\mathbf{y}(t) = (C - DK) \mathbf{x}(t)
$$

The poles of this new system are the eigenvalues of $A - BK$. 
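
For a two-state system of the pendulum's form, the gains can be found by matching characteristic polynomials by hand. The sketch below works in continuous time and assumes hypothetical values for the entries `a` and `b` of $A = \begin{bmatrix}0 & 1\\ a & 0\end{bmatrix}$, $B = \begin{bmatrix}0 \\ b\end{bmatrix}$ and for the desired poles; it illustrates the idea and is not the algorithm used by `ct.place`.

```python
import cmath

a, b = 20.0, -0.8                     # hypothetical A = [[0, 1], [a, 0]], B = [0, b]
desired = (-2.0 + 1.0j, -2.0 - 1.0j)  # desired continuous-time closed-loop poles

# A - B K = [[0, 1], [a - b k1, -b k2]] has the characteristic polynomial
# s^2 + (b k2) s - (a - b k1); match it to s^2 - (p1 + p2) s + p1 p2.
pole_sum = (desired[0] + desired[1]).real
pole_product = (desired[0] * desired[1]).real
k1 = (a + pole_product) / b
k2 = -pole_sum / b

# Verify: eigenvalues of the closed-loop matrix via the quadratic formula.
trace = -b * k2
det = -(a - b * k1)
disc = cmath.sqrt(trace**2 - 4 * det)
closed_loop_poles = ((trace + disc) / 2, (trace - disc) / 2)
print(k1, k2, closed_loop_poles)
```

With these numbers both gains come out negative, so $u = -K\mathbf{x}$ pushes in the direction in which the pendulum is leaning and moving, in line with the intuition behind the naive controller.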

In [None]:
ct.pzmap(inverted_pendulum);

In [None]:
K = ct.place(inverted_pendulum.A, inverted_pendulum.B, np.array([-0.2 + 0.5j, -0.2 - 0.5j]))
K

In [None]:
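# Closed-loop matrices for u = -K x; these are matrix products, so use @
A_fsfbk = inverted_pendulum.A - inverted_pendulum.B @ K
# The closed loop has no external input, so its input matrix is zero
B_fsfbk = np.zeros((2, 1))
C_fsfbk = inverted_pendulum.C - inverted_pendulum.D @ K
D_fsfbk = np.zeros((1, 1))
closed_loop = ct.ss(A_fsfbk, B_fsfbk, C_fsfbk, D_fsfbk, dt=inverted_pendulum.dt)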
closed_loop

In [None]:
ct.pzmap(closed_loop);

In [None]:
n_steps = 1000
x0 = np.zeros(closed_loop.nstates)
x0[0] = 0.01
u0 = 0.0
T = np.arange(0, n_steps) * closed_loop.dt
response = ct.input_output_response(
    closed_loop, T, u0, x0
)

In [None]:
fig, ax = plt.subplots(1, 1, sharex=True)
ax.plot(response.time, response.outputs)
ax.set_xlabel("Time")
ax.set_ylabel("Angle (rad)")
fig.tight_layout()

### Evaluation

In [None]:
env = create_inverted_pendulum_environment(max_steps=1000)
initial_observation, _ = env.reset()

observation = initial_observation.copy()
observations = [observation]
actions = []

for _ in range(1000):
    # Full state feedback on the measured angle and angular velocity
    action = -K @ observation[[1, 3]]
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        frames = env.render()
        break

observations = np.stack(observations)
env.close()

In [None]:
media.show_video(frames, fps=1/env.dt)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True)
T = np.arange(observations.shape[0]) * env.model.opt.timestep
ax1.plot(T, observations[:, 0])
ax1.set_xlabel("Time")
ax1.set_ylabel("Position (m)")
ax2.plot(T, observations[:, 1])
ax2.set_xlabel("Time")
ax2.set_ylabel("Angle (rad)")
ax3.plot(T[1:], actions)
ax3.set_xlabel("Time")
ax3.set_ylabel("Force")
fig.tight_layout()

## PID Controller

A proportional–integral–derivative (PID) controller is a control loop mechanism employing feedback that is widely used in industrial control systems and a variety of other applications requiring continuously modulated control.

The overall control function: 

$$
u(t) = K_{\text{p}}e(t) + K_{\text{i}}\int _{0}^{t}e(\tau )\,\mathrm {d} \tau +K_{\text{d}}{\frac {\mathrm {d} e(t)}{\mathrm {d} t}},
$$

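where $e(t) = y(t) - r(t)$ is the error between the measured output $y(t)$ and the reference $r(t)$, and $K_{\text{p}}$, $K_{\text{i}}$, and $K_{\text{d}}$, all non-negative, denote the coefficients for the proportional, integral, and derivative terms respectively. Note that many references define the error with the opposite sign, $e(t) = r(t) - y(t)$; the convention used here matches the positive feedback interconnection (`sign=1`) used in the code below.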

The use of the PID algorithm does not guarantee optimal control of the system or its control stability but in practice it works really well for simple systems. It is broadly applicable since it relies only on the response of the measured process variable, not on knowledge or a model of the underlying process.
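
The control function can be sketched as a discrete-time loop in plain Python. Everything numeric below is an assumption made for illustration: the plant is a hypothetical linearized pendulum $\ddot{\theta} = a\theta + bu$ with $b \lt 0$, integrated with explicit Euler steps, and the gains are not tuned for the simulation environment. The error uses the convention $e(t) = y(t) - r(t)$ with $r(t) = 0$.

```python
a, b = 20.0, -2.0             # hypothetical linearized pendulum: theta'' = a*theta + b*u
dt = 0.01                     # hypothetical time step in seconds
Kp, Ki, Kd = 50.0, 5.0, 10.0  # hypothetical gains, not tuned for the real environment

theta, theta_dot = 0.01, 0.0  # start slightly away from the upright position
integral, previous_error = 0.0, None
reference = 0.0
for _ in range(500):
    error = theta - reference  # e = y - r
    integral += error * dt     # rectangle-rule approximation of the integral term
    # Skip the derivative on the first step to avoid a spurious kick.
    derivative = 0.0 if previous_error is None else (error - previous_error) / dt
    previous_error = error
    u = Kp * error + Ki * integral + Kd * derivative
    # Explicit Euler integration of the plant over one step.
    theta, theta_dot = theta + dt * theta_dot, theta_dot + dt * (a * theta + b * u)
final_angle = theta
print(f"{final_angle=:.2e}")
```

Because the input gain `b` is negative, positive PID gains acting on $e = \theta$ push the cart towards the side the pendulum is falling to, which stabilizes this toy model.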

In [None]:
def pid_controler_update(t: float, x: NDArray, u: NDArray, params: dict):        
    e = u[0]
    e_i = x[1] + e * inverted_pendulum.dt
    e_d = (e - x[0]) /  inverted_pendulum.dt
    return np.array([e, e_i, e_d])

def pid_controler_output(t: float, x: NDArray, u: NDArray, params: dict):
    Kp = params.get('Kp', 1.0)
    Ki = params.get('Ki', 0)
    Kd = params.get('Kd', 0)
    return np.array([Kp * x[0] + Ki * x[1] + Kd * x[2]])

pid_controller = ct.NonlinearIOSystem(
    pid_controler_update,
    pid_controler_output,
    name="pid",
    inputs=('y[0]'),
    states=("e[0]", "e[1]", "e[2]"),
    outputs=('u[0]'),
    dt=inverted_pendulum.dt,
)
print(pid_controller)

In [None]:
closed_loop = inverted_pendulum.feedback(pid_controller, sign=1)
print(closed_loop)

In [None]:
n_steps = 500
T = np.arange(0, n_steps) * dt
x0 = np.array([0.01, 0.0])
u0 = 0.0
params = {"Kp": 300.0, "Ki": 50, "Kd": 30}
response = ct.input_output_response(
    closed_loop, T, u0, x0, params=params
)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.plot(response.time, response.outputs)
ax1.set_xlabel("Time")
ax1.set_ylabel("Angle (rad)")
ax2.plot(response.time, response.inputs)
ax2.set_xlabel("Time")
ax2.set_ylabel("Input")
fig.tight_layout()

### Evaluation

In [None]:
env = create_inverted_pendulum_environment(max_steps=1000)
initial_observation, _ = env.reset()

observation = initial_observation.copy()
observations = [observation]
actions = []

errors = np.zeros(3)
action = np.zeros(1)

for _ in range(1000):
    errors = pid_controller.updfcn(0.0, errors, observation[[1, 3]], params)
    action = pid_controller.outfcn(0.0, errors, action, params)
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        frames = env.render()
        env.reset()
        break

observations = np.stack(observations)
env.close()

In [None]:
media.show_video(frames, fps=1/env.dt)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True)
T = np.arange(observations.shape[0]) * env.model.opt.timestep
ax1.plot(T, observations[:, 0])
ax1.set_xlabel("Time")
ax1.set_ylabel("Position (m)")
ax2.plot(T, observations[:, 1])
ax2.set_xlabel("Time")
ax2.set_ylabel("Angle (rad)")
ax3.plot(T[1:], actions)
ax3.set_xlabel("Time")
ax3.set_ylabel("Force")
fig.tight_layout()

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Thank you for your attention!</div>

## References

- [<b id="duun_henriksen_2013">[Duun-Henriksen et al., 2013]</b>](#duun_henriksen_2013-back) [Model identification using stochastic differential equation grey-box models in diabetes](https://journals.sagepub.com/doi/abs/10.1177/193229681300700220) - Duun-Henriksen, Anne Katrine, et al. Journal of diabetes science and technology 7, no. 2 (2013): 431-440.

- [<b id="goodwin_control_2000">[Goodwin et al., 2000]</b>](#goodwin_control_2000-back) [Control System Design](https://ds.amu.edu.et/xmlui/bitstream/handle/123456789/17461/Graham%20C.%20Goodwin%2C%20Stefan%20F.%20Graebe%2C%20Mario%20E.%20Salgado-Control%20System%20Design%20-Prentice%20Hall%20%282000%29.pdf?sequence=1&isAllowed=y) - Goodwin, Graham C., Stefan F. Graebe, and Mario E. Salgado. (2000)

- [<b id="nijmeijer_nonlinear_1990">[Nijmeijer et al., 1990]</b>](#nijmeijer_nonlinear_1990-back) [Nonlinear dynamical control systems](https://link.springer.com/book/10.1007/978-1-4757-2101-0) - Nijmeijer, Henk, and Arjan Van der Schaft. Vol. 464, no. 2. New York: Springer-Verlag, 1990.

- [<b id="wiki_state_space">[Wiki State-Space, 2023]</b>](#wiki_state_space-back) [State-space representation](https://en.wikipedia.org/w/index.php?title=State-space_representation&oldid=1175410959) Wikipedia, The Free Encyclopedia, (accessed September 26, 2023). 

- [<b id="wiki_inverted_pendulum">[Wiki Inverted Pendulum, 2023]</b>](#wiki_inverted_pendulum-back) [Inverted pendulum](https://en.wikipedia.org/w/index.php?title=Inverted_pendulum&oldid=1152479964), Wikipedia, The Free Encyclopedia, (accessed September 24, 2023).