In [1]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext training_rl
%set_random_seed 12

In [2]:
%presentation_style

In [3]:
%load_latex_macros


$\newcommand{\vect}[1]{{\mathbf{\boldsymbol{#1}} }}$
$\newcommand{\amax}{{\text{argmax}}}$
$\newcommand{\P}{{\mathbb{P}}}$
$\newcommand{\E}{{\mathbb{E}}}$
$\newcommand{\R}{{\mathbb{R}}}$
$\newcommand{\Z}{{\mathbb{Z}}}$
$\newcommand{\N}{{\mathbb{N}}}$
$\newcommand{\C}{{\mathbb{C}}}$
$\newcommand{\abs}[1]{{ \left| #1 \right| }}$
$\newcommand{\simpl}[1]{{\Delta^{#1} }}$


In [4]:
%autoreload
import warnings

import control as ct
import gymnasium as gym
import matplotlib.pyplot as plt
import matplotx
import mediapy as media
import mujoco
import numpy as np
import seaborn as sns
from ipywidgets import interact, widgets
from numpy.typing import NDArray
from scipy.signal import find_peaks

from training_rl.environment import create_inverted_pendulum_environment, create_mass_spring_damper_environment

warnings.simplefilter("ignore", UserWarning)
sns.set_theme()
plt.rcParams["figure.figsize"] = [9, 5]


<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Introduction to Control Theory</div>

# Introduction

Control theory is a field of control engineering and applied mathematics that deals with the control of dynamical systems in engineered processes and machines. The objective is to develop a model or algorithm that governs the system inputs so as to drive the system to a desired state while minimizing delay, overshoot, and steady-state error and ensuring a level of control stability, often with the aim of achieving a degree of optimality.

There are a few different branches of control theory:

- **Optimal Control**: deals with finding a control for a dynamical system over a period of time such that an objective function is optimized.
- **Adaptive Control**: adapts the controller to a system whose parameters vary or are initially uncertain.
- **Robust Control**: an approach to controller design that explicitly deals with uncertainty.

The central problem in control is to find a technically feasible way to act on a
given process so that the process follows some desired behavior as closely as
possible. Furthermore, this approximate behavior should be achieved despite
uncertainty about the process and in the presence of uncontrollable external disturbances
acting on the process.

- **Desired behavior** This needs to be specified as part of the design problem.
- **Feasibility** This means that the solution must satisfy various constraints,
  which can be of technical, environmental, economic or other nature.
- **Uncertainty** The available knowledge about a system will usually be limited
  and of limited accuracy.
- **Action** The solution requires that action be somehow applied to the process
  typically via one or more manipulated variables which command the actuators.

- **Disturbances** The process to be controlled will typically have inputs other
  than those that are manipulated by the controller. These other inputs are
  called disturbances.
- **Approximate behavior** A feasible solution will rarely be perfect. There
  will invariably be a degree of approximation in achieving the specified goal.
- **Measurements** These are crucial to let the controller know what the system
  is actually doing and how the unavoidable disturbances are affecting it.


<div>
<div style="float: left; width: 50%;">
    <figure>
        <img src="_static/images/20_feedback_block_diagram.svg" width="100%"/>
        <figcaption>
            Feedback Control in Control Engineering
        </figcaption>
    </figure>
</div>
<div style="float: right; width: 45%;">
    <figure>
        <img src="_static/images/20_reinforcement_learning_block_diagram.svg" width="100%"/>
        <figcaption>
            Feedback Control in Reinforcement Learning
        </figcaption>
    </figure>
</div>
</div>

## Control Theory and Reinforcement Learning

### Terminology

Here is a list of terms commonly used in Reinforcement Learning, and their control counterparts:

<ol type="a">
    <li><b>Environment</b> = System</li>
    <li><b>Agent (Policy)</b> = Controller or Regulator</li>
    <li><b>Action</b> = Decision or Control</li>
    <li><b>Observation</b> = Measurement</li>
    <li><b>Reward</b> = (Opposite of) Cost</li>
</ol>

# Example Systems

## Mass-Spring Damper

The mass-spring-damper system consists of discrete mass nodes distributed throughout an object and interconnected via a network of springs and dampers. This model is well-suited for modelling objects with complex material properties such as nonlinearity and viscoelasticity.

For the simulation we will use a custom MassSpringDamper environment created using [mujoco](https://mujoco.org/) and [gymnasium](https://gymnasium.farama.org/).

It has the following possible actions and observations:

<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Num</p></th>
<th class="head"><p>Action</p></th>
<th class="head"><p>Control Min</p></th>
<th class="head"><p>Control Max</p></th>
<th class="head"><p>Name (in corresponding XML file)</p></th>
<th class="head"><p>Joint</p></th>
<th class="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>Force applied on the mass</p></td>
<td><p>-0.5</p></td>
<td><p>0.5</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>Force (N)</p></td>
</tr>
</tbody>
</table>

<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Num</p></th>
<th class="head"><p>Observation</p></th>
<th class="head"><p>Min</p></th>
<th class="head"><p>Max</p></th>
<th class="head"><p>Name (in corresponding XML file)</p></th>
<th class="head"><p>Joint</p></th>
<th class="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>position of the mass along the z-axis</p></td>
<td><p>-2.0</p></td>
<td><p>2.0</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<tr class="row-even"><td><p>1</p></td>
<td><p>linear velocity of the mass</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>mass</p></td>
<td><p>mass</p></td>
<td><p>velocity (m/s)</p></td>
</tr>
</tbody>
</table>

In [None]:
env = create_mass_spring_damper_environment()
env.reset()
all_frames = []
# Run two episodes: the first with random actions, the second with zero actions.
for i in range(2):
    for _ in range(100):
        if i == 1:
            action = np.zeros_like(env.action_space.sample())
        else:
            action = env.action_space.sample()
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            # Collect the rendered frames of the finished episode.
            all_frames.append(env.render())
            env.reset()
            break
env.close()

In [None]:
media.show_videos(all_frames, fps=1/env.dt)

## Questions

- What can we say about the system?
- How can we control the mass?

## Inverted Pendulum

<div>
<div style="float: left; width: 60%; margin-right: 5%;">
    
An inverted pendulum is a pendulum that has its center of mass above its pivot point. It is unstable and without additional help will fall over.

The inverted pendulum is a classic problem in dynamics and control theory and is used as a benchmark for testing control strategies. It is often implemented with the pivot point mounted on a cart that can move horizontally under control of an electronic servo system as shown in the photo.
</div>
<div style="float: left; width: 30%;">
    <figure>
        <img src="_static/images/20_inverted_pendulum_photo.png" width="50%"/>
        <figcaption>
            Balancing cart, a simple robotics system circa 1976. <a href="#wiki_inverted_pendulum"><b id="wiki_inverted_pendulum-back">[Wiki Inverted Pendulum]</b></a>
        </figcaption>
    </figure>
</div>
</div>


In [None]:
%%html
<iframe width="800" height="600" src="https://www.youtube-nocookie.com/embed/AuAZ5zOP0yQ?si=1Lnyg2ghX6BJEEVX&amp;start=55" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

For the simulation we will use the [Inverted Pendulum](https://gymnasium.farama.org/environments/mujoco/inverted_pendulum/) environment from [gymnasium](https://gymnasium.farama.org/).

It has the following possible actions and observations:

<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Num</p></th>
<th class="head"><p>Action</p></th>
<th class="head"><p>Control Min</p></th>
<th class="head"><p>Control Max</p></th>
<th class="head"><p>Name (in corresponding XML file)</p></th>
<th class="head"><p>Joint</p></th>
<th class="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>Force applied on the cart</p></td>
<td><p>-3</p></td>
<td><p>3</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>Force (N)</p></td>
</tr>
</tbody>
</table>

<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Num</p></th>
<th class="head"><p>Observation</p></th>
<th class="head"><p>Min</p></th>
<th class="head"><p>Max</p></th>
<th class="head"><p>Name (in corresponding XML file)</p></th>
<th class="head"><p>Joint</p></th>
<th class="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>position of the cart along the linear surface</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<tr class="row-odd"><td><p>1</p></td>
<td><p>vertical angle of the pole on the cart</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>hinge</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<tr class="row-even"><td><p>2</p></td>
<td><p>linear velocity of the cart</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>slider</p></td>
<td><p>slide</p></td>
<td><p>velocity (m/s)</p></td>
</tr>
<tr class="row-odd"><td><p>3</p></td>
<td><p>angular velocity of the pole on the cart</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>hinge</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
</tbody>
</table>

In [None]:
env = create_inverted_pendulum_environment()
env.reset()
all_frames = []
# Run two episodes: the first with random actions, the second with zero actions.
for i in range(2):
    for _ in range(100):
        if i == 1:
            action = np.zeros_like(env.action_space.sample())
        else:
            action = env.action_space.sample()
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            # Collect the rendered frames of the finished episode.
            all_frames.append(env.render())
            env.reset()
            break
env.close()

In [None]:
media.show_videos(all_frames, fps=1/env.dt)

## Questions

- What can we say about the system?
- How can we balance the pendulum?

## Naive Control

If we try to balance an elongated object on our hand, we intuitively try to move our hand in the same direction in which the object is falling. We can use this intuition to make a very simple and straightforward controller.

$$
u(t) = K \theta(t)
$$

with $K \gt 0$

## Exercise

Implement the naive controller described above and visualize the result.

What is the best value for the coefficient?

> **Hint** the second value in the `observation` vector represents the angle of the pendulum.
> You can use `theta = observation[[1]]` to get its value. 

##  Solution

In [None]:
def control_inverted_pendulum(K = widgets.FloatSlider(min=0.0, max=1000.0, step=10, value=10.0)):
    env = create_inverted_pendulum_environment()
    observation, _ = env.reset()
    frames = None
    for _ in range(100):
        # Proportional control on the pole angle (second entry of the observation).
        theta = observation[[1]]
        action = K * theta
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            frames = env.render()
            env.reset()
            break
    env.close()
    # Frames are only collected once the episode has ended.
    if frames is not None:
        media.show_video(frames, fps=1/env.dt)
interact(control_inverted_pendulum);

Let's try different values of $K$ and plot the different trajectories.

In [None]:
env = create_inverted_pendulum_environment()
initial_observation, _ = env.reset()
K_values = [0.1, 1.0, 2.0, 10.0]

all_observations = []

for K in K_values:
    env.set_state(initial_observation[:2], initial_observation[2:])
    observation = initial_observation.copy()
    observations = [observation]

    for _ in range(100):
        theta = observation[[1]]
        action = K * theta
        observation, _, terminated, truncated, _ = env.step(action)
        observations.append(observation)
        if terminated or truncated:
            env.reset()
            break
            
    observations = np.stack(observations)
    all_observations.append(observations)

In [None]:
for i, K in enumerate(K_values):
    plt.plot(
        np.arange(all_observations[i].shape[0]) * env.dt,
        all_observations[i][:, 1],
        label=f"{K=}"
    )
plt.legend()
plt.xlabel("Time")
plt.ylabel("Angle (rad)");

# Control Systems

There are two types of control loop:

- **Open-loop control (feedforward)**

  An open-loop control system operates without feedback, which means that the output is not measured or compared to the desired input. Such systems are simple and inexpensive to implement, and are often used where the output does not need to be precisely controlled. For example, a washing machine may use an open-loop control system to regulate the water level.

- **Closed-loop control (feedback)**

  A closed-loop control system, on the other hand, operates with feedback: the output is measured, and corrective action is taken to keep it as close as possible to the desired input. Such systems are more complex and expensive to implement, but they offer greater precision and accuracy in controlling the system's output. Closed-loop control systems are often used in critical applications, such as aerospace engineering or medical devices.
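The difference between the two loop types can be illustrated with a small sketch. The plant `x[k+1] = 0.9 x[k] + 0.1 (u[k] + disturbance)`, the target value, and the feedback gain are all hypothetical choices made only for this illustration; they are not taken from the environments used in this notebook.

```python
def simulate(closed_loop: bool, disturbance: float, steps: int = 200) -> float:
    """Drive x[k+1] = 0.9 x[k] + 0.1 (u[k] + disturbance) towards a target of 1.0.

    Returns the final value of x.
    """
    target = 1.0
    gain = 5.0  # hypothetical proportional feedback gain
    x = 0.0
    for _ in range(steps):
        if closed_loop:
            # Feedback: the input reacts to the measured error.
            u = target + gain * (target - x)
        else:
            # Feedforward: the input is fixed in advance from the model alone.
            u = target
        x = 0.9 * x + 0.1 * (u + disturbance)
    return x


open_loop_error = abs(1.0 - simulate(closed_loop=False, disturbance=0.5))
closed_loop_error = abs(1.0 - simulate(closed_loop=True, disturbance=0.5))
print(f"{open_loop_error=:.3f}, {closed_loop_error=:.3f}")
```

In this toy setup the unmeasured disturbance passes straight through to the open-loop output, while the feedback loop attenuates it. A purely proportional gain reduces the error but does not remove it entirely.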

## Types of Systems

- Time-Invariant (TI) or Time-Variant (TV).
- Linear or Non-Linear.
- Continuous or Discrete.
- Deterministic or Stochastic.

# Controller Design

1. Define a mathematical model that represents the system.
2. Determine properties of this system:

   - Identifiability.
   - Stability.
   - Observability.
   - Controllability.

3. Determine the model's parameters, if they're not known already.
4. If it's a continuous-time system, discretize it to obtain a discrete-time system.
5. (Optional) Linearize the model around an operating point.
6. Design a controller to stabilize the system.
7. Simulate the closed-loop system in order to validate the controller design.
8. Use the controller with the actual system.

# Modeling

To design a control system, it is first necessary to gain an understanding of how the system operates.
This understanding is typically expressed in the form of a mathematical model which describes the
steady state and dynamic behavior of the system.

This aspect of Control Engineering is closer to model-based Reinforcement Learning.

## Modeling Approaches

<div>
<div style="float: left; width: 50%; margin-right: 5%;">
    
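- **White box**: The model is based mainly on prior knowledge about the system, such as its underlying physics.
- **Black box**: The model is built from statistical information extracted from data.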
- **Grey box**: Combines both approaches to relax the need to exactly model the underlying physics, while requiring considerably less data than a pure black box approach.

</div>
<div style="float: left; width: 40%;">
    <figure>
        <img src="_static/images/20_modeling_approaches.svg" width="100%"/>
        <figcaption>
            White box models are based mainly on knowledge about the system.
            Black box models are built on statistical information from the data.
            Grey box modeling combines the two approaches. <a href="#duun_henriksen_2013"><b id="duun_henriksen_2013-back">[Duun-Henriksen et al., 2013]</b></a>
        </figcaption>
    </figure>
</div>
</div>

## Notation

- Time:
  - Continuous-time: $t \in \mathbb{R}$
  - Discrete-time: $t \in \mathbb{N}$
- State: $x(t) \in \mathbb{R}^n$
- Control input: $u(t) \in \mathbb{R}^m$
- Dynamics
  - Continuous-time: $\dot{x}(t) = f(x(t), u(t), t)$
  - Discrete-time: $x_{t+1} = f(x_t, u_t, t)$
- Trajectories:
  - $x: t \mapsto x(t)$
  - $u: t \mapsto u(t)$
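As a minimal sketch of this notation, the discrete-time dynamics $x_{t+1} = f(x_t, u_t, t)$ can be iterated to produce a state trajectory. The helper `rollout` and the double-integrator example are hypothetical and only serve to make the symbols concrete.

```python
import numpy as np


def rollout(f, x0, inputs):
    """Iterate x_{t+1} = f(x_t, u_t, t) and return the state trajectory."""
    states = [np.asarray(x0, dtype=float)]
    for t, u in enumerate(inputs):
        states.append(f(states[-1], u, t))
    return np.stack(states)


def double_integrator(x, u, t):
    """Hypothetical example system with state (position, velocity)."""
    dt = 0.1
    position, velocity = x
    return np.array([position + dt * velocity, velocity + dt * u[0]])


inputs = np.ones((10, 1))  # constant unit input u_t for 10 steps
trajectory = rollout(double_integrator, x0=[0.0, 0.0], inputs=inputs)
print(trajectory.shape)  # one row per time step, one column per state variable
```

Here `trajectory` plays the role of $x: t \mapsto x(t)$ and `inputs` that of $u: t \mapsto u(t)$.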

## System Representation

- Transfer Function
- State-Space
- Input-Output

### Transfer Function Representation

A transfer function of a system, sub-system, or component is a mathematical function that models the system's output for each possible input.

Transfer functions are commonly used in the analysis of systems such as single-input single-output filters in the fields of signal processing, communication theory, and control theory. The term is often used exclusively to refer to linear time-invariant (LTI) systems. Most real systems have non-linear input/output characteristics, but many systems, when operated within nominal parameters (not "over-driven") have behavior close enough to linear that LTI system theory is an acceptable representation of the input/output behavior.

For continuous-time input signal $x(t)$ and output $y(t)$, dividing the Laplace transform of the output, $\mathbf{Y}(s) = \mathcal{L}\left\{y(t)\right\}$, by the Laplace transform of the input, $\mathbf{X}(s) = \mathcal{L}\left\{x(t)\right\}$, yields the system's transfer function $\mathbf{G}(s)$:

$$
\mathbf{G}(s) = \frac{\mathbf{Y}(s)}{\mathbf{X}(s)} = \frac{ \mathcal{L}\left\{y(t)\right\} }{ \mathcal{L}\left\{x(t)\right\}}
$$

which can be rearranged as:

$$
\mathbf{Y}(s) = \mathbf{G}(s)\mathbf{X}(s)
$$

Where $s = \sigma + j \cdot \omega$ is a complex variable. When we're only interested in the steady-state response of the system then it is sufficient to set $\sigma = 0$ (thus $s = j \cdot \omega$), which reduces the Laplace transforms with complex arguments to Fourier transforms with real argument $\omega$.

The transfer function was the primary tool used in classical control engineering. However, it has proven to be unwieldy for the analysis of multiple-input multiple-output (MIMO) systems, and has been largely supplanted by state space representations for such systems. In spite of this, a transfer matrix can always be obtained for any linear system, in order to analyze its dynamics and other properties: each element of a transfer matrix is a transfer function relating a particular input variable to an output variable.
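To make the substitution $s = j\omega$ concrete, the sketch below evaluates a transfer function on the imaginary axis. The first-order system $G(s) = \frac{\text{gain}}{\tau s + 1}$ and its parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def first_order_lowpass(s: complex, gain: float = 2.0, tau: float = 0.5) -> complex:
    """Hypothetical transfer function G(s) = gain / (tau * s + 1)."""
    return gain / (tau * s + 1)


omega = 2.0  # angular frequency in rad/s
G = first_order_lowpass(1j * omega)
amplitude_ratio = abs(G)    # factor by which a sinusoid's amplitude is scaled
phase_shift = np.angle(G)   # phase of the output relative to the input (rad)
dc_gain = first_order_lowpass(0).real  # steady-state gain, G(0)
print(f"{amplitude_ratio=:.3f}, {phase_shift=:.3f}, {dc_gain=:.3f}")
```

The magnitude and angle of $G(j\omega)$ describe how a sinusoidal input of frequency $\omega$ is scaled and shifted once transients have died out.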

### State-Space Representation

A state-space representation is a mathematical model of a physical system specified as a set of input, output, and state variables related by first-order (not involving second derivatives) differential equations or difference equations.

Such variables, called state variables, evolve over time in a way that depends on the values they have at any given instant and on the externally imposed values of input variables. Output variables’ values depend on the values of the state variables.

In the general case, for continuous-time systems:

$$
\dot{\mathbf{x}}(t) = f(\mathbf{x}(t), \mathbf{u}(t)) \\
\mathbf{y}(t) = g(\mathbf{x}(t), \mathbf{u}(t))
$$

for discrete-time systems:

$$
\mathbf{x}(k+1) = f(\mathbf{x}(k), \mathbf{u}(k)) \\
\mathbf{y}(k) = g(\mathbf{x}(k), \mathbf{u}(k))
$$

| <div style="width:290px">System type</div> | State-space model |
|:-------------|:-----------------:|
| Continuous time-invariant | $\begin{aligned}\dot{\mathbf{x}}(t) &= A \mathbf{x}(t) + B \mathbf{u}(t) \\ \mathbf{y}(t) &= C \mathbf{x}(t) + D \mathbf{u}(t)\end{aligned}$ |
| Continuous time-variant | $\begin{aligned}\dot{\mathbf{x}}(t) &= A(t) \mathbf{x}(t) + B(t) \mathbf{u}(t) \\ \mathbf{y}(t) &= C(t) \mathbf{x}(t) + D(t) \mathbf{u}(t)\end{aligned}$ |
| Discrete time-invariant | $\begin{aligned}\mathbf{x}(k+1) &= A \mathbf{x}(k) + B \mathbf{u}(k) \\ \mathbf{y}(k) &= C \mathbf{x}(k) + D \mathbf{u}(k)\end{aligned}$ |
| Discrete time-variant | $\begin{aligned}\mathbf{x}(k+1) &= A(k) \mathbf{x}(k) + B(k) \mathbf{u}(k) \\ \mathbf{y}(k) &= C(k) \mathbf{x}(k) + D(k) \mathbf{u}(k)\end{aligned}$ |
| Laplace domain of continuous time-invariant | $\begin{aligned}s\mathbf{X}(s) - \mathbf{x}(0) &= A \mathbf{X}(s) + B \mathbf{U}(s) \\ \mathbf{Y}(s) &= C \mathbf{X}(s) + D \mathbf{U}(s)\end{aligned}$ |
| Z-domain of discrete time-invariant | $\begin{aligned}z\mathbf{X}(z) - z\mathbf{x}(0) &= A \mathbf{X}(z) + B \mathbf{U}(z) \\ \mathbf{Y}(z) &= C \mathbf{X}(z) + D \mathbf{U}(z)\end{aligned}$ |

[<b id="wiki_state_space-back">[Wiki State-Space, 2023]</b>](#wiki_state_space)

### From State-Space to Transfer Function

$$
\dot{\mathbf{x}}(t)=  A \mathbf{x}(t) + B \mathbf{u}(t)\\
\mathbf{y}(t) = C \mathbf{x}(t) + D \mathbf{u}(t)
$$

We take the Laplace transform, assuming zero initial conditions $\mathbf{x}(0) = 0$:

$$
s{\mathbf{X}}(s)=  A \mathbf{X}(s) + B \mathbf{U}(s)\\
\mathbf{Y}(s) = C \mathbf{X}(s) + D \mathbf{U}(s)
$$

We want to solve for $\mathbf{G}(s) = \frac{{\mathbf{Y}}(s)}{\mathbf{U}(s)}$, so we start by solving for ${\mathbf{X}}(s)$:

$$
{\mathbf{X}}(s) =  (sI - A)^{-1}B \mathbf{U}(s)
$$

Now we put this into the output equation:

$$
{\mathbf{Y}}(s) =  \left(C(sI - A)^{-1}B + D\right) \mathbf {U}(s)
$$

$$
\mathbf{G}(s) = \frac{{\mathbf{Y}}(s)}{\mathbf{U}(s)} = C(sI - A)^{-1}B + D
$$
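The formula $\mathbf{G}(s) = C(sI - A)^{-1}B + D$ can be checked numerically at a single complex frequency. The helper name `transfer_function_at` and the example matrices are hypothetical; the expected value was derived by hand for this particular example.

```python
import numpy as np


def transfer_function_at(A, B, C, D, s):
    """Evaluate G(s) = C (sI - A)^{-1} B + D at one complex frequency s."""
    n = A.shape[0]
    # Solve (sI - A) X = B instead of forming the inverse explicitly.
    X = np.linalg.solve(s * np.eye(n) - A, B)
    return C @ X + D


# Hypothetical second-order system: x1' = x2, x2' = -2 x1 - 3 x2 + u, y = x1
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

s = 1.0 + 2.0j
G_numeric = transfer_function_at(A, B, C, D, s)[0, 0]
G_expected = 1.0 / (s**2 + 3.0 * s + 2.0)  # hand-derived for this example
print(abs(G_numeric - G_expected))
```

Solving the linear system rather than inverting $sI - A$ is a common numerical habit; for small matrices either approach gives the same result.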

### Input-Output Representation

$$
\mathbf{y}(k) = \mathbf{h}(\mathbf{y}(k-1), \mathbf{y}(k-2), \dots, \mathbf{u}(k), \mathbf{u}(k-1), \dots)
$$
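A linear special case of this representation expresses the current output as a weighted sum of past outputs and of current and past inputs. The sketch below assumes scalar signals and zero initial conditions; the function name `simulate_arx` and the coefficients are hypothetical.

```python
import numpy as np


def simulate_arx(a, b, u):
    """Simulate y(k) = sum_i a[i] * y(k-1-i) + sum_j b[j] * u(k-j).

    Samples before k = 0 are treated as zero.
    """
    y = np.zeros(len(u))
    for k in range(len(u)):
        past_outputs = sum(a[i] * y[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        inputs = sum(b[j] * u[k - j] for j in range(len(b)) if k - j >= 0)
        y[k] = past_outputs + inputs
    return y


# Hypothetical first-order example: y(k) = 0.5 y(k-1) + u(k), driven by a unit step
y = simulate_arx(a=[0.5], b=[1.0], u=np.ones(5))
print(y)
```

No internal state vector appears here: the model is written purely in terms of measured inputs and outputs.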

## Mass-Spring-Damper Model


<div>
<figure style="float: left; width: 50%;">
    <img src="_static/images/20_mass_spring_damper.svg" width="50%"/>
    <figcaption>
        Classic model used for deriving the equations of a mass spring damper model <a href="#wiki_mass_spring_damper"><b id="wiki_mass_spring_damper-back">[Wiki Mass-Spring-Damper, 2023]</b></a>
    </figcaption>
</figure>
<div style="float: left; width: 40%;">
    
- $z(t)$: Distance along the vertical axis from some reference point.
- $m$: Mass of the object.
- $\lambda$: Coefficient of elasticity.
- $l$: Length of spring.
- $k = \frac{\lambda}{l}$
- $c$: Damping coefficient.
- $f(t)$: Force applied on the object.
- $g$: Gravity.
</div>
</div>

Application of Newtonian physics to this system leads to the following model:

$$
m\ddot{z}(t) + c \dot{z}(t) + kz(t) = f(t)
\\
$$

We can convert this to state-space form with input $u(t) = f(t)$ and output
$y(t) = z(t)$ by introducing:

$$
X(t) = \begin{bmatrix}
x_1(t) \\ x_2(t)
\end{bmatrix}
= \begin{bmatrix}
z(t) \\ \dot{z}(t) 
\end{bmatrix}
$$

$$
\dot{X}(t) = \begin{bmatrix}
\dot{x_1}(t) \\ \dot{x_2}(t)
\end{bmatrix} =
\begin{bmatrix}
x_2(t) \\
\frac{u(t)}{m} - \frac{c}{m}x_2(t) - \frac{k}{m}x_1(t)\\
\end{bmatrix}
=
\begin{bmatrix}
0 & 1 \\
-\frac{k}{m} & -\frac{c}{m}\\
\end{bmatrix}
\begin{bmatrix}
x_1(t) \\ x_2(t)
\end{bmatrix}
+
\begin{bmatrix}
0 \\
\frac{1}{m}\\
\end{bmatrix}
\begin{bmatrix}
u(t) \\
\end{bmatrix}
$$

This is a linear state-space model with matrices:

$$
A
=
\begin{bmatrix}
0 & 1 \\
-\frac{k}{m} & -\frac{c}{m}\\
\end{bmatrix};
B
=
\begin{bmatrix}
0 \\
\frac{1}{m}\\
\end{bmatrix};
C = \begin{bmatrix}
1 & 0 \\
\end{bmatrix};
D = \begin{bmatrix}
0
\end{bmatrix}
$$

And this is the corresponding transfer function:

$$
(sI - A)^{-1}
=
\begin{bmatrix}
s & - 1 \\
\frac{k}{m} & s + \frac{c}{m}\\
\end{bmatrix}^{-1}
= 
\frac{1}{m s^2 + c s + k}
\begin{bmatrix}
ms + c & m \\
-k & ms\\
\end{bmatrix}
$$

$$
\mathbf{G}(s) = C(sI - A)^{-1}B + D = \frac{1}{m s^2 + c s + k}
$$
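One quick sanity check of this model is the steady state under a constant force: from $\mathbf{G}(0) = \frac{1}{k}$, the position should settle at $f / k$. The sketch below integrates the state-space model with a simple forward Euler scheme; the parameter values, step size, and simulation length are assumptions chosen for illustration, not values from the simulation environment.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
m, c, k = 1.0, 0.8, 4.0
A = np.array([[0.0, 1.0], [-k / m, -c / m]])
B = np.array([0.0, 1.0 / m])

dt = 1e-3         # integration step in seconds
force = 2.0       # constant input u(t) = f(t)
x = np.zeros(2)   # state [z, z_dot], starting at rest
for _ in range(20_000):  # 20 seconds of simulated time
    x = x + dt * (A @ x + B * force)  # forward Euler step

steady_state_position = x[0]
dc_gain = 1.0 / k  # G(0) = 1 / (m * 0 + c * 0 + k)
print(f"{steady_state_position=:.4f}, expected={force * dc_gain:.4f}")
```

Forward Euler is used only because it is short; a proper ODE solver would be the safer choice for stiff or lightly damped systems.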

## Inverted Pendulum Model


<div>
<figure style="float: left; width: 40%;">
    <img src="_static/images/20_inverted_pendulum.svg" width="50%"/>
    <figcaption>
        Inverted pendulum model <a href="#goodwin_control_2000"><b id="goodwin_control_2000-back">[Goodwin et al., 2000]</b></a>
    </figcaption>
</figure>
<div style="float: left; width: 50%;">
    
- $y(t)$: distance along the horizontal axis from some reference point.
- $\theta(t)$: angle of the pendulum.
- $M$: mass of the cart.
- $m$: mass of the pendulum (assumed to be concentrated at the tip).
- $l$: length of the pendulum.
- $f(t)$: force applied on the cart.
</div>
</div>

Application of Newtonian physics to this system leads to the following model:

$$
\ddot{y} = \frac{1}{\lambda_m + \sin^2\theta(t)} \left[
\frac{f(t)}{m} + \dot{\theta}(t)^2 l \sin\theta(t) - g \cos\theta(t) \sin\theta(t)
\right]
\\
\ddot{\theta} = \frac{1}{l\left(\lambda_m + \sin^2\theta(t)\right)} \left[
-\frac{f(t)}{m}\cos\theta(t) - \dot{\theta}(t)^2 l \sin\theta(t) \cos\theta(t) + (1 + \lambda_m) g \sin\theta(t)
\right]
$$

where $\lambda_m = \frac{M}{m}$ and $g$ is the gravitational acceleration.

We're only interested in controlling the pendulum's angle, so we can ignore the first equation.

We can convert this to state-space form with input $u(t) = f(t)$ and output
$\theta(t)$ by introducing:

$$
X(t) = \begin{bmatrix}
x_1(t) \\ x_2(t)
\end{bmatrix}
= \begin{bmatrix}
\theta(t) \\ \dot{\theta}(t) 
\end{bmatrix}
$$

$$
\dot{X}(t) = \begin{bmatrix}
\dot{x_1}(t) \\ \dot{x_2}(t)
\end{bmatrix} =
\begin{bmatrix}
x_2(t) \\
\frac{1}{l\left(\lambda_m + \sin^2 x_1(t)\right)} \left[
-\frac{u(t)}{m}\cos x_1(t) - x_2(t)^2 l \sin x_1(t) \cos x_1(t) + (1 + \lambda_m) g \sin x_1(t)
\right] \\
\end{bmatrix}
$$

These equations are nonlinear. However, for small departures of $x_1$ (i.e. $\theta$) from the
vertical position we can linearize about $x_1 = 0$, $x_2 = 0$. This is called the [small-angle approximation](https://en.wikipedia.org/wiki/Small-angle_approximation).

How good is this approximation?

In [None]:
x = np.arange(-np.pi/4, np.pi/4, 0.01)
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
# Raw strings keep the LaTeX backslashes intact
ax1.plot(x, x, label=r"$x$")
ax1.plot(x, np.sin(x), color="r", label=r"$\sin(x)$")
ax1.legend();
ax2.hlines(1, x[0], x[-1], label=r"$1$")
ax2.plot(x, np.cos(x), color="r", label=r"$\cos(x)$")
ax2.plot(x, 1 - x**2/2, color="orange", label=r"$1 - \frac{x^2}{2}$")
ax2.legend();
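To put rough numbers on the plot, the sketch below computes the worst-case error of each approximation over the same interval $[-\pi/4, \pi/4]$. The grid resolution is an arbitrary choice.

```python
import numpy as np

x = np.linspace(-np.pi / 4, np.pi / 4, 1001)
# Largest absolute deviation of each approximation on the interval
max_sin_error = np.max(np.abs(np.sin(x) - x))                      # sin(x) ~ x
max_cos_error_const = np.max(np.abs(np.cos(x) - 1.0))              # cos(x) ~ 1
max_cos_error_quad = np.max(np.abs(np.cos(x) - (1.0 - x**2 / 2)))  # cos(x) ~ 1 - x^2/2
print(f"{max_sin_error=:.4f}, {max_cos_error_const=:.4f}, {max_cos_error_quad=:.4f}")
```

All three errors are largest at the ends of the interval (roughly 0.08, 0.29, and 0.016 respectively) and shrink quickly as the angle approaches zero, which is why the approximation is reasonable only near the upright position.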

Applying the approximation gives us:

$$
\dot{X}(t) = \begin{bmatrix}
\dot{x_1}(t) \\ \dot{x_2}(t)
\end{bmatrix} =
\begin{bmatrix}
x_2(t) \\
\frac{1}{l\lambda_m} \left[
-\frac{u(t)}{m} + (1 + \lambda_m) g x_1(t)
\right] \\
\end{bmatrix} =
\begin{bmatrix}
0 & 1\\
\frac{(M + m)g}{Ml} & 0\\
\end{bmatrix}
\begin{bmatrix}
x_1(t) \\ x_2(t)
\end{bmatrix}
+
\begin{bmatrix}
0 \\ -\frac{1}{Ml}
\end{bmatrix}
\begin{bmatrix}
u(t)\\
\end{bmatrix}
$$

This leads to a linear state-space model with matrices:

$$
A = \begin{bmatrix}
0 & 1 \\
\frac{(M + m)g}{Ml} & 0\\
\end{bmatrix};
B = \begin{bmatrix}
0 \\ -\frac{1}{Ml}
\end{bmatrix};
C = \begin{bmatrix}
1 & 0 \\
\end{bmatrix};
D = \begin{bmatrix}
0
\end{bmatrix}
$$

And this is the corresponding transfer function:

$$
(sI - A)^{-1} =
\begin{bmatrix}
s & - 1 \\
- \frac{(M + m)g}{Ml} & s\\
\end{bmatrix}^{-1} = 
\frac{1}{Mls^2 - g(M+m)}
\begin{bmatrix}
Mls & Ml \\
g(M+m) & Mls\\
\end{bmatrix}
$$

$$
\mathbf{G}(s) = C(sI - A)^{-1}B + D = \frac{-1}{Mls^2 - g(M+m)}
$$



# System Identification

System identification is the process of constructing a mathematical model of a
(dynamical) system from observations (measurements) of its inputs and outputs.

## Structural Identifiability

Given a state-space model for a given system, we would like to know whether the unknown parameters are uniquely determined by the input-output behaviour of the system.

## Parameter Identification



### Second Order Resonant System

In [None]:
second_order_system_plot()

- $U_0$: Initial input level.
- $U_f$: Final input level.
- $Y_0$: Initial output level.
- $Y_f$ : Final output level.
- Peak 1: First output peak.
- Peak n: The $n$-th peak, counting from Peak 1.
- $A_1$: Amplitude from $Y_f$ to Peak 1.
- $A_n$: Amplitude from $Y_f$ to Peak n.
- $T_w$ : Time between two successive peaks.

Given those measurements we can calculate the system's parameters using:

$$
\hat{K} = \frac{Y_f - Y_0}{U_f - U_0}\\
d_r =\left(\frac{A_n}{A_1}\right)^{\frac{1}{n-1}}\\
\hat{\zeta} = -\frac{\ln(d_r)}{\sqrt{4\pi^2 + \ln(d_r)^2}}\\
\hat{T}_n = \frac{T_w \sqrt{1 - \hat{\zeta}^2}}{2\pi}
$$

And then plug their values into the following transfer function:

$$
\hat{G}(s) = \frac{\hat{K}}{\hat{T_n}^2 s^2 + 2\hat{\zeta}\hat{T}_n s + 1}
$$
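These estimators can be sanity-checked without any simulation. For an ideal underdamped second-order system, the peak spacing and the decay of the peak amplitudes are known in closed form, so we can generate the "measurements" analytically and check that the damping ratio and time constant are recovered. The ground-truth values below are hypothetical, and the estimation lines use the same expressions as the solution code later in this notebook.

```python
import numpy as np

# Hypothetical ground truth for an underdamped second-order system
zeta_true, Tn_true = 0.2, 0.5
omega_d = np.sqrt(1 - zeta_true**2) / Tn_true  # damped natural frequency (rad/s)

# Quantities that would normally be read off the measured step response
Tw = 2 * np.pi / omega_d  # time between two successive peaks
n = 4
A1 = 1.0                  # amplitude of Peak 1 (arbitrary scale)
An = A1 * np.exp(-zeta_true / Tn_true * Tw * (n - 1))  # amplitude of Peak n

# Estimation of the decay ratio, damping ratio, and time constant
dr = (An / A1) ** (1 / (n - 1))
zeta_hat = -np.log(dr) / np.sqrt(4 * np.pi**2 + np.log(dr) ** 2)
Tn_hat = Tw * np.sqrt(1 - zeta_hat**2) / (2 * np.pi)
print(f"{zeta_hat=:.3f}, {Tn_hat=:.3f}")
```

With real data, noise and imperfect peak detection mean the recovered values are only approximate.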

## Mass-Spring-Damper

We start by collecting data: the input is held at zero for the first 50 steps and then stepped to a constant force.

In [None]:
env = create_mass_spring_damper_environment(max_steps=400)
initial_observation, _ = env.reset()

observation = initial_observation.copy()
observations = [observation]
actions = []

for i in range(400):
    if i < 50:
        action = np.zeros(1)
    else:
        action = np.array([10])
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        break

observations = np.stack(observations)
actions = np.stack(actions)
env.close()

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
peak_indices, _ = find_peaks(observations[:, 0])
peak_indices = peak_indices[observations[peak_indices, 0] > 1.05 * observations[-1, 0]]
T = np.arange(observations.shape[0]) * env.dt
ax1.plot(T, observations[:, 0])
ax1.plot(T[peak_indices], observations[peak_indices, 0], "x")
ax1.set_xlabel("Time")
ax1.set_ylabel("Position");
ax2.plot(T[1:], actions)
ax2.set_xlabel("Time")
ax2.set_ylabel("Force");
fig.tight_layout()

## Exercise

- Using the collected data from the environment, identify the system parameters.

  > **Hint** You can use the `find_peaks` function from scipy to identify the peaks.

## Solution

In [None]:
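# Final and initial levels of the output and the input
yf = observations[-1, 0]
y0 = observations[0, 0]
uf = actions[-1]
u0 = actions[0]
# Amplitude of the first (largest) peak relative to the final value
ymax = np.max(observations[:, 0])
A1 = ymax - yf
# Keep only the peaks that clearly exceed the final value
indices, _ = find_peaks(observations[:, 0])
indices = indices[observations[indices, 0] > 1.05*yf]
peaks = observations[indices, 0]
n = len(peaks)
An = peaks[-1] - yf
# Time between the last two retained peaks
Tw = np.diff(indices)[-1].item() * env.dt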

In [None]:
K = (yf - y0)/(uf - u0)
K = K.item()
dr = (An / A1)**(1/(n-1))
zeta = -np.log(dr) / np.sqrt(4 * np.pi**2 + np.log(dr) ** 2)
Tn = Tw * np.sqrt(1 - zeta**2) / (2 * np.pi)
print(f"{K=:.3f}, {dr=:.3f}, {zeta=:.3f}, {Tn=:.3f}")

In [None]:
tf = ct.tf(1, [Tn**2/K, 2*zeta*Tn/K, 1/K])
tf

We then match each coefficient of the identified transfer function against the corresponding term of the system's transfer function.

$$
\mathbf{G}(s) = C(sI - A)^{-1}B + D = \frac{1}{m s^2 + c s + k}
$$

In [None]:
m = Tn**2 / K
c = 2*zeta*Tn / K
k = 1 / K
print(f"{m=}, {c=}, {k=}")

In [None]:
# Dynamics matrix
A = np.array([
    [0, 1],
    [-k/m, -c/m],
])
# Input matrix
B = np.array([[
    0,
    1/m
]]).transpose()
# Output matrices
C = np.array([
    [1, 0],
])
D = np.zeros(1)
mass_spring_damper = ct.ss(A, B, C, D)

In [None]:
dt = env.dt
# One time stamp per applied action
T = np.arange(len(actions)) * dt
U = actions.transpose()
X0 = np.zeros(2)
response = ct.input_output_response(mass_spring_damper, T, U, X0)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.plot(response.time, response.outputs, label="Model")
ax1.plot(response.time, observations[1:, 0], label="Environment")
ax1.set_xlabel("Time")
ax1.set_ylabel("Position")
ax1.legend()
ax2.plot(response.time, response.inputs)
ax2.set_xlabel("Time")
ax2.set_ylabel("Input")
fig.tight_layout()

We now discretize the system using the simulation environment's timestep.

In [None]:
mass_spring_damper = mass_spring_damper.sample(env.dt)
mass_spring_damper.name = "mass_spring_damper"
mass_spring_damper

In [None]:
tf = ct.ss2tf(mass_spring_damper)
tf
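As far as we know, `sample` uses a zero-order hold by default, meaning the input is assumed constant between sampling instants. The sketch below shows one common way to compute such a discretization by hand, through the matrix exponential of an augmented matrix. The helper `discretize_zoh` and the scalar example are hypothetical, and this is not necessarily how the `control` library implements it internally.

```python
import numpy as np
from scipy.linalg import expm


def discretize_zoh(A, B, dt):
    """Zero-order-hold discretization of x' = A x + B u with sampling time dt."""
    n, m = B.shape
    # Exponentiate the augmented matrix [[A, B], [0, 0]] * dt;
    # its top blocks are the discrete-time matrices A_d and B_d.
    augmented = np.zeros((n + m, n + m))
    augmented[:n, :n] = A
    augmented[:n, n:] = B
    Phi = expm(augmented * dt)
    return Phi[:n, :n], Phi[:n, n:]


# Hypothetical scalar example: x' = -2 x + 3 u, sampled every 0.1 s
Ad, Bd = discretize_zoh(np.array([[-2.0]]), np.array([[3.0]]), dt=0.1)
print(Ad, Bd)
```

For the scalar case the result can be verified by hand: $A_d = e^{-2 \cdot 0.1}$ and $B_d = \frac{3}{2}\left(1 - e^{-2 \cdot 0.1}\right)$.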

## Inverted Pendulum

### Question

- Can we use the same approach with the inverted pendulum system?
- If yes, go ahead and determine the system's parameters.
- If no, then explain your reasoning.

### Solution

Unfortunately, no: step-response identification cannot be used for inherently unstable systems, because the open-loop output diverges instead of settling to a final value $Y_f$.

Luckily for us, the system parameters are easy to extract from the simulation environment.

In [None]:
env = create_inverted_pendulum_environment()
g = 9.81
l = env.model.geom_pos[2, 2]
m = env.model.body_mass[2]
M = env.model.body_mass[1]
lambda_m = M / m
print(f"{l=}, {m=}, {M=}")

In [None]:
# Dynamics matrix
A = np.array([
    [ 0, 1],
    [(1+lambda_m)*g/(lambda_m*l), 0],
])
# Input matrix
B = np.array([[
    0,
    -1/(M*l)
]]).transpose()
# Output matrices
C = np.array([
    [1, 0],
])
D = np.zeros(1)

In [None]:
inverted_pendulum = ct.ss(A, B, C, D)
inverted_pendulum

In [None]:
tf = ct.ss2tf(inverted_pendulum)
tf

We now discretize the system using the simulation environment's timestep.

In [None]:
inverted_pendulum = inverted_pendulum.sample(env.dt)
inverted_pendulum.name = "inverted_pendulum"
inverted_pendulum

In [None]:
tf = ct.ss2tf(inverted_pendulum)
tf

# Stability

There are many notions of stability in Control Theory:

- **BIBO Stability**

  A system is bounded-input, bounded-output stable (**BIBO** stable) if every bounded input produces a bounded output. Mathematically, every input satisfying

  $$
  ||u(t)||_\infty \lt \infty
  $$

  leads to an output satisfying

  $$
  ||y(t)||_\infty \lt \infty
  $$

- **Marginal/Lyapunov** $\forall \epsilon > 0, \exists \delta > 0 : ||x(0) - x_{\text{eq}}|| < \delta \implies ||x(t) - x_{\text{eq}}|| < \epsilon, \forall t \gt 0$
  
  Trajectories that start close to the equilibrium remain close to the equilibrium.
  
- **Asymptotic (local)** $\exists \delta > 0 : ||x(0) - x_{\text{eq}}|| < \delta \implies \lim_{t \rightarrow \infty} ||x(t) - x_{\text{eq}}|| = 0$

  Trajectories that start near the equilibrium converge to it.

- **Exponential (local)** $\exists \delta, c, \alpha > 0 : ||x(0) - x_{\text{eq}}|| < \delta \implies ||x(t) - x_{\text{eq}}|| \lt c e^{-\alpha t} ||x(0) - x_{\text{eq}}||, \forall t \gt 0$

  Trajectories that start near the equilibrium converge to it exponentially fast.
  
> *Note 1*: The global definitions can be obtained by taking $\delta \rightarrow \infty$.

> *Note 2*: For linear time-invariant (LTI) systems, asymptotic and exponential stability coincide, and so do local and global stability.

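A continuous-time linear time-invariant (LTI) system $\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$ is asymptotically stable if and only if any one of the following equivalent conditions holds:

- All the poles are in the open left half of the complex plane.
- All the eigenvalues of $A$ have negative real parts.
- For any symmetric positive definite matrix $N$, the Lyapunov equation

  $$
  MA + A^TM = -N
  $$

  has a unique symmetric positive definite solution $M$ (both matrices have the same dimensions as $A$).

For a discrete-time LTI system, the corresponding condition is that all the poles (the eigenvalues of $A$) lie strictly inside the unit circle.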
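
As a small illustration of these criteria, the sketch below applies them to a hypothetical continuous-time system (the matrix `A` and the choice $N = I$ are assumptions made for this example). The Lyapunov equation is solved with plain NumPy by rewriting it as a linear system in the entries of $M$; a dedicated solver would normally be used instead.

```python
import numpy as np

# Hypothetical continuous-time dynamics matrix with eigenvalues -1 and -2.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
n = A.shape[0]

# Eigenvalue criterion: all eigenvalues must have negative real parts.
eigenvalues = np.linalg.eigvals(A)
eig_stable = bool(np.all(eigenvalues.real < 0.0))

# Lyapunov criterion: solve M A + A^T M = -N for a chosen positive definite N.
# With row-major flattening, vec(M A) = kron(I, A^T) vec(M) and
# vec(A^T M) = kron(A^T, I) vec(M).
N = np.eye(n)
identity = np.eye(n)
lhs = np.kron(identity, A.T) + np.kron(A.T, identity)
M = np.linalg.solve(lhs, -N.flatten()).reshape(n, n)

residual = M @ A + A.T @ M + N
lyap_stable = bool(np.all(np.linalg.eigvalsh((M + M.T) / 2) > 0.0))

print("eigenvalues:", eigenvalues, "-> stable:", eig_stable)
print("M =", M, "-> positive definite:", lyap_stable)
```

Both tests agree here, as they must for an LTI system.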

## Exercise

- Determine whether the mass-spring-damper system is stable.
- Determine whether the linearized inverted pendulum system is stable.

> **Hint** 
>
> - Use the `poles()` method of the `inverted_pendulum` object to determine the system's poles.
> - Use `np.linalg.eig` to determine the eigenvalues of the system matrix.
> - Use `ct.lyap` to solve the Lyapunov equation.

## Solution

### Mass-Spring-Damper

In [None]:
poles = mass_spring_damper.poles()
poles

In [None]:
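# The model was sampled with env.dt, so it is a discrete-time system:
# it is stable if all poles lie strictly inside the unit circle.
np.all(np.abs(poles) < 1.0)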

This means that the system is stable.
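
Note that the model was sampled with `.sample(env.dt)`, so its poles are those of a discrete-time system. Assuming exact (zero-order-hold) sampling, a continuous-time pole $\lambda$ is mapped to $z = e^{\lambda \, dt}$, which sends the left half-plane to the inside of the unit circle. The following sketch illustrates this with a hypothetical dynamics matrix and timestep (both are assumptions, not the notebook's values):

```python
import numpy as np

# Hypothetical continuous-time dynamics matrix and sampling time.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
dt = 0.05

continuous_poles = np.linalg.eigvals(A)
# Exact sampling maps every continuous pole lambda to z = exp(lambda * dt).
discrete_poles = np.exp(continuous_poles * dt)

print("continuous:", continuous_poles, "stable:", np.all(continuous_poles.real < 0))
print("discrete:  ", discrete_poles, "stable:", np.all(np.abs(discrete_poles) < 1))
```

The discrete poles have positive real parts even though the system is stable, which is why the test for sampled systems is $|z| < 1$ rather than $\operatorname{Re}(z) < 0$.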

For the sake of demonstration, let's also compute the system matrix's eigenvalues and compare them to the poles:

In [None]:
result = np.linalg.eig(mass_spring_damper.A)
eigenvalues = result.eigenvalues
eigenvalues

In [None]:
# Discrete-time criterion: eigenvalues strictly inside the unit circle
np.all(np.abs(eigenvalues) < 1.0)

In [None]:
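# Order-independent comparison with a tolerance, via the characteristic polynomials
np.allclose(np.poly(poles), np.poly(eigenvalues))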

We can visualize the system's poles and zeros together with the unit circle, which bounds the stable region of a discrete-time system:

In [None]:
ct.pzmap(mass_spring_damper)
unit_circle = plt.Circle((0, 0), 1.0, color="r", alpha=0.1)
plt.gca().add_patch(unit_circle);

### Inverted Pendulum

In [None]:
poles = inverted_pendulum.poles()
poles

In [None]:
# Discrete-time criterion: all poles strictly inside the unit circle
np.all(np.abs(poles) < 1.0)

This means that the system is unstable.

For the sake of demonstration, let's also compute the system matrix's eigenvalues and compare them to the poles:

In [None]:
result = np.linalg.eig(inverted_pendulum.A)
eigenvalues = result.eigenvalues
eigenvalues

In [None]:
# Discrete-time criterion: eigenvalues strictly inside the unit circle
np.all(np.abs(eigenvalues) < 1.0)

In [None]:
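# Order-independent comparison with a tolerance, via the characteristic polynomials
np.allclose(np.poly(poles), np.poly(eigenvalues))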

We can visualize the system's poles and zeros together with the unit circle, which bounds the stable region of a discrete-time system:

In [None]:
ct.pzmap(inverted_pendulum)
unit_circle = plt.Circle((0, 0), 1.0, color="r", alpha=0.1)
plt.gca().add_patch(unit_circle);

# Observability

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.

For a linear time-invariant system with $n$ states and $r$ outputs, the $nr \times n$ observability matrix is given by:

$O=\begin{bmatrix}C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}\end{bmatrix}$

The system is observable if the observability matrix has full column rank (i.e. $\operatorname{rank}(O)=n$).
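
The rank test can be sketched by building the matrix by hand. The example below uses a hypothetical double integrator (position and velocity states) and a helper named `observability_matrix`; both are assumptions made for illustration. It compares measuring the position with measuring only the velocity.

```python
import numpy as np


def observability_matrix(A: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Stack C, CA, ..., CA^(n-1) row-wise."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)


# Hypothetical double integrator: the states are position and velocity.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
C_position = np.array([[1.0, 0.0]])  # measure the position only
C_velocity = np.array([[0.0, 1.0]])  # measure the velocity only

ranks = {}
for name, C in [("position", C_position), ("velocity", C_velocity)]:
    O = observability_matrix(A, C)
    ranks[name] = int(np.linalg.matrix_rank(O))
    print(f"measuring {name}: rank = {ranks[name]}, observable = {ranks[name] == A.shape[0]}")
```

Measuring the position makes the velocity recoverable (it is the position's derivative), whereas a velocity measurement alone cannot reveal the absolute position.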

### Mass-Spring-Damper

In [None]:
O = ct.obsv(mass_spring_damper.A, mass_spring_damper.C)
print(O)

In [None]:
np.linalg.matrix_rank(O)

### Inverted Pendulum

In [None]:
O = ct.obsv(inverted_pendulum.A, inverted_pendulum.C)
print(O)

In [None]:
np.linalg.matrix_rank(O)

## Parameter Estimation

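Unknown constant parameters can be estimated together with the state by treating them as additional states. Consider the augmented state vector $\mathbf{z}(t)$: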

$$
\mathbf{z}(t) = \begin{bmatrix}
\mathbf{x}(t) \\ \mathbf{p}
\end{bmatrix}
$$

where $\mathbf{p}$ represents the system's parameters.

Since the parameters are constant, this new system has the following dynamics:

$$
\dot{\mathbf{z}}(t)
=
\begin{bmatrix}
\dot{\mathbf{x}}(t) \\ \dot{\mathbf{p}}
\end{bmatrix}
=
\begin{bmatrix}
f(\mathbf{x}(t), \mathbf{u}(t)) \\ 0
\end{bmatrix}
$$
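
As a rough sketch of this construction, the function below writes the augmented dynamics for a hypothetical mass-spring-damper whose stiffness and damping are unknown. The mass value, the initial guess, and the name `augmented_dynamics` are assumptions made for this example.

```python
import numpy as np

MASS = 1.0  # assumed to be known for this illustration


def augmented_dynamics(z: np.ndarray, u: float) -> np.ndarray:
    """Time derivative of z = [position, velocity, stiffness, damping]."""
    position, velocity, stiffness, damping = z
    acceleration = (u - stiffness * position - damping * velocity) / MASS
    # The parameters are constant, so their derivatives are zero.
    return np.array([velocity, acceleration, 0.0, 0.0])


# Forward-Euler rollout from a hypothetical initial state and parameter guess.
z = np.array([0.1, 0.0, 2.0, 0.5])
dt = 0.01
for _ in range(100):
    z = z + dt * augmented_dynamics(z, u=0.0)

print(z)
```

The augmented system contains products of states (stiffness times position), so it is nonlinear even when the original system is linear, and estimating $\mathbf{z}$ generally calls for a nonlinear observer.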

## Observer Design

### Kalman Filter

The Kalman filter estimates the state of a discrete-time linear system subject to noise:

$$
\begin{aligned}
\mathbf{x}(k+1) &= A(k)\mathbf{x}(k) + B(k)\mathbf{u}(k) + \mathbf{w}(k) \\
\mathbf{y}(k) &= C(k)\mathbf{x}(k) + \mathbf{v}(k)
\end{aligned}
$$

where:

- $\mathbf{w}(k)$ is the process noise, which is assumed to be drawn from a zero-mean multivariate normal distribution with covariance $Q(k)$: $\mathbf{w}(k) \sim \mathcal{N}\left(0, Q(k)\right)$.
- $\mathbf{v}(k)$ is the observation noise, which is assumed to be zero-mean Gaussian white noise with covariance $R(k)$: $\mathbf{v}(k) \sim \mathcal{N}\left(0, R(k)\right)$.
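
The filter alternates between a prediction through the model and a correction with the latest measurement. The sketch below shows one such cycle for a time-invariant system. It is an illustration under stated assumptions, not the internals of `ct.create_estimator_iosystem`: the helper `kalman_step`, the double-integrator model, and all noise levels are hypothetical, and $Q$ is expressed directly in state coordinates.

```python
import numpy as np


def kalman_step(x_hat, P, u, y, A, B, C, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict: propagate the estimate and its covariance through the model.
    x_pred = A @ x_hat + B @ u
    P_pred = A @ P @ A.T + Q
    # Update: correct the prediction with the new measurement y.
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ C) @ P_pred
    return x_new, P_new


# Hypothetical discrete-time double integrator; only the position is measured.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)    # process noise covariance (in state coordinates)
R = np.array([[1e-2]])  # measurement noise covariance

rng = np.random.default_rng(0)
x_true = np.array([0.0, 1.0])
x_hat, P = np.zeros(2), 10.0 * np.eye(2)
u = np.array([0.0])

for _ in range(200):
    x_true = A @ x_true + B @ u + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    x_hat, P = kalman_step(x_hat, P, u, y, A, B, C, Q, R)

print("true state:     ", x_true)
print("estimated state:", x_hat)
```

Although only the position is measured, the velocity estimate converges as well, because the model couples the two states.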

### Mass-Spring-Damper

In [None]:
env = create_mass_spring_damper_environment(max_steps=50)
initial_observation, _ = env.reset()
observation = initial_observation.copy()
observations = [observation]
actions = []

for _ in range(50):
    action = np.array([5])
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        env.reset()
        break

observations = np.stack(observations)
actions = np.stack(actions)

In [None]:
Q = np.diag([1])
R = np.diag([1]) * env.dt
# Initial State Covariance
P0 = np.diag([100, 1000])
estimator = ct.create_estimator_iosystem(mass_spring_damper, Q, R, P0=P0)
estimator.name = "estimator"
print(estimator)

In [None]:
dt = env.dt
T = np.arange(0, len(observations)*dt - dt, dt)
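# Estimator inputs: the measured output (position, matching C = [1, 0]) followed by the control input
U = np.concatenate([observations[1:, [0]], actions], axis=1).transpose()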
X0 = observations[1, :]
estimator_response = ct.input_output_response(estimator, T, U, X0)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.errorbar(
    estimator_response.time,
    estimator_response.outputs[0],
    estimator_response.states[estimator.find_state("P[0,0]")],
    fmt="r-",
    label="Estimated"
)
ax1.plot(estimator_response.time, observations[1:, 0], label="Ground Truth")
ax1.set_xlabel("Time")
ax1.set_ylabel("Position")
ax1.legend()
ax2.errorbar(
    estimator_response.time,
    estimator_response.outputs[1],
    estimator_response.states[estimator.find_state("P[1,1]")],
    fmt="r-",
    label="Estimated"
)
ax2.plot(estimator_response.time, observations[1:, 1], label="Ground Truth")
ax2.set_xlabel("Time")
ax2.set_ylabel("Velocity");
ax2.legend()
fig.tight_layout()

In [None]:
T_predict = np.arange(T[-1], T[-1] + 1.0 + dt, dt)
U_predict = np.outer(U[:, -1], np.ones_like(T_predict))
predicted_response = ct.input_output_response(
    estimator, T_predict, U_predict, estimator_response.states[:, -1],
    params={'correct': False}
)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.errorbar(
    estimator_response.time,
    estimator_response.outputs[0],
    estimator_response.states[estimator.find_state("P[0,0]")],
    fmt="r-",
    label="Estimated"
)
ax1.plot(estimator_response.time, observations[1:, 0], label="Ground Truth")
ax1.errorbar(
    predicted_response.time,
    predicted_response.outputs[0],
    predicted_response.states[estimator.find_state("P[0,0]")],
    fmt="o-",
    label="Predicted"
)
ax1.set_xlabel("Time")
ax1.set_ylabel("Position")
ax1.legend()
ax2.errorbar(
    estimator_response.time,
    estimator_response.outputs[1],
    estimator_response.states[estimator.find_state("P[1,1]")],
    fmt="r-",
    label="Estimated"
)
ax2.plot(estimator_response.time, observations[1:, 1], label="Ground Truth")
ax2.errorbar(
    predicted_response.time,
    predicted_response.outputs[1],
    predicted_response.states[estimator.find_state("P[1,1]")],
    fmt="o-",
    label="Predicted"
)
ax2.set_xlabel("Time")
ax2.set_ylabel("Velocity");
ax2.legend()
fig.tight_layout()

### Inverted Pendulum

We collect some data from the environment by running a simple proportional controller on the pole angle:

In [None]:
env = create_inverted_pendulum_environment(max_steps=200)
initial_observation, _ = env.reset()
K = 300.0

observation = initial_observation.copy()
observations = [observation]
actions = []

for _ in range(200):
    theta = observation[[1]]
    action = K * theta
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        env.reset()
        break

observations = np.stack(observations)
actions = np.stack(actions)

In [None]:
Q = np.diag([10])
R = np.diag([1]) * env.dt
# Initial State Covariance
P0 = np.diag([0.0, 10000])
estimator = ct.create_estimator_iosystem(inverted_pendulum, Q, R, P0=P0)
estimator.name = "estimator"
print(estimator)

We then run the estimator on the recorded trajectory:

In [None]:
dt = env.dt
T = np.arange(0, len(observations)*dt - dt, dt)
U = np.concatenate([observations[1:, [1]], actions], axis=1).transpose()
X0 = np.zeros(2)
estimator_response = ct.input_output_response(estimator, T, U, X0)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.errorbar(
    estimator_response.time,
    estimator_response.outputs[0],
    estimator_response.states[estimator.find_state("P[0,0]")],
    fmt="r-",
    label="Estimated"
)
ax1.plot(estimator_response.time, observations[1:, 1], label="Ground Truth")
ax1.set_xlabel("Time")
ax1.set_ylabel("Angle (rad)")
ax1.legend()
ax2.errorbar(
    estimator_response.time,
    estimator_response.outputs[1],
    estimator_response.states[estimator.find_state("P[1,1]")],
    fmt="r-",
    label="Estimated"
)
ax2.plot(estimator_response.time, observations[1:, 3], label="Ground Truth")
ax2.set_xlabel("Time")
ax2.set_ylabel("Angular Velocity (rad/s)");
ax2.legend()
fig.tight_layout()

We also let the estimator predict a short horizon into the future, without measurement corrections:

In [None]:
T_predict = np.arange(T[-1], T[-1] + 0.2 + dt, dt)
U_predict = np.outer(U[:, -1], np.ones_like(T_predict))
predicted_response = ct.input_output_response(
    estimator, T_predict, U_predict, estimator_response.states[:, -1],
    params={'correct': False}
)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.errorbar(
    estimator_response.time,
    estimator_response.outputs[0],
    estimator_response.states[estimator.find_state("P[0,0]")],
    fmt="r-",
    label="Estimated"
)
ax1.plot(estimator_response.time, observations[1:, 1], label="Ground Truth")
ax1.errorbar(
    predicted_response.time,
    predicted_response.outputs[0],
    predicted_response.states[estimator.find_state("P[0,0]")],
    fmt="o-",
    label="Predicted",
)
ax1.legend()
ax1.set_xlabel("Time")
ax1.set_ylabel("Angle (rad)")
ax2.errorbar(
    estimator_response.time,
    estimator_response.outputs[1],
    estimator_response.states[estimator.find_state("P[1,1]")],
    fmt="r-",
    label="Estimated"
)
ax2.plot(estimator_response.time, observations[1:, 3], label="Ground Truth")
ax2.errorbar(
    predicted_response.time,
    predicted_response.outputs[1],
    predicted_response.states[estimator.find_state("P[1,1]")],
    fmt="o-",
    label="Predicted",
)
ax2.legend()
ax2.set_xlabel("Time")
ax2.set_ylabel("Angular Velocity (rad/s)")
fig.tight_layout()

# Controllability

The state controllability condition implies that it is possible, using admissible inputs, to steer the state from any initial value to any final value within a finite time window.

For a linear time-invariant system with $n$ states and $r$ inputs, the $n \times nr$ controllability matrix is given by:

$R=\begin{bmatrix}B & AB & A^{2}B & \dots & A^{n-1}B\end{bmatrix}$

The system is controllable if the controllability matrix has full row rank (i.e. $\operatorname{rank}(R)=n$). 
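
The same rank test can be sketched by hand. The two systems below are hypothetical examples (as is the helper name `controllability_matrix`): a force-driven double integrator, and a pair of decoupled modes in which the input reaches only the first one.

```python
import numpy as np


def controllability_matrix(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Stack B, AB, ..., A^(n-1)B column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)


# Hypothetical double integrator driven by a force.
A_cart = np.array([[0.0, 1.0], [0.0, 0.0]])
B_cart = np.array([[0.0], [1.0]])

# Hypothetical decoupled modes; the input only acts on the first one.
A_decoupled = np.diag([-1.0, -2.0])
B_decoupled = np.array([[1.0], [0.0]])

rank_cart = int(np.linalg.matrix_rank(controllability_matrix(A_cart, B_cart)))
rank_decoupled = int(np.linalg.matrix_rank(controllability_matrix(A_decoupled, B_decoupled)))
print(f"double integrator: rank = {rank_cart}")
print(f"decoupled modes:   rank = {rank_decoupled}")
```

The second system loses rank because no input can influence its second state.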

### Mass-Spring-Damper

In [None]:
R = ct.ctrb(mass_spring_damper.A, mass_spring_damper.B)
print(R)

In [None]:
np.linalg.matrix_rank(R)

### Inverted Pendulum

In [None]:
R = ct.ctrb(inverted_pendulum.A, inverted_pendulum.B)
print(R)

In [None]:
np.linalg.matrix_rank(R)

# Controller Design

## Full State Feedback

Full state feedback (FSF), or pole placement, is a method employed in feedback control system theory to place the closed-loop poles of a system at pre-determined locations in the s-plane (or in the z-plane for discrete-time systems). Placing poles is desirable because the poles are the eigenvalues of the closed-loop system, which determine the characteristics of its response. The system must be controllable in order to apply this method.

We want to design a controller such that we can place the poles (eigenvalues) of our system at desired locations.

We choose the following control law:

$$
\mathbf{u}(t) = k_r \mathbf{r}(t) - K \mathbf{x}(t)
$$

By replacing this into the system's state-space representation we obtain the closed-loop dynamics:

$$
\begin{aligned}
\dot{\mathbf{x}}(t) &= (A - BK) \mathbf{x}(t) + k_r B \mathbf{r}(t) \\
\mathbf{y}(t) &= (C - DK) \mathbf{x}(t) + k_r D \mathbf{r}(t)
\end{aligned}
$$

The poles of this new system are the eigenvalues of $A - BK$. 

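The constant $k_r$ is chosen so that the closed-loop system has unit steady-state gain from $\mathbf{r}$ to $\mathbf{y}$. For a single-input, single-output system with $D = 0$, it is given by:

$$
k_r = \frac{-1}{C(A - BK)^{-1}B}
$$

This expression holds for continuous-time systems. For a discrete-time system, the steady state satisfies $\mathbf{x} = (A - BK)\mathbf{x} + k_r B \mathbf{r}$, which gives $k_r = \frac{1}{C(I - A + BK)^{-1}B}$.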

The location of the eigenvalues determines the behavior of the closed-loop dynamics, and hence where we place the eigenvalues is the main design decision to be made. As with all other feedback design problems, there are tradeoffs between the magnitude of the control inputs, the robustness of the system to perturbations, and the closed-loop performance of the system, including step response, disturbance attenuation, and noise injection.
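
To make the design procedure concrete, here is a minimal sketch for a hypothetical continuous-time double integrator. It computes $K$ with Ackermann's formula, which is a textbook method for single-input systems and not necessarily the algorithm behind `ct.place`, and then evaluates $k_r$ from the expression above. The system matrices and desired poles are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical continuous-time double integrator (unit mass, force input).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
desired_poles = np.array([-2.0, -3.0])

# Ackermann's formula: K = [0 ... 0 1] R^{-1} p(A), where R is the
# controllability matrix and p the desired characteristic polynomial.
n = A.shape[0]
R = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
coefficients = np.poly(desired_poles)  # highest power first
p_of_A = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coefficients))
last_row = np.zeros((1, n))
last_row[0, -1] = 1.0
K = last_row @ np.linalg.inv(R) @ p_of_A

# Reference gain for unit steady-state gain (continuous time, D = 0).
A_closed = A - B @ K
k_r = -1.0 / (C @ np.linalg.inv(A_closed) @ B).item()

print("K =", K)
print("closed-loop poles =", np.linalg.eigvals(A_closed))
print("k_r =", k_r)
```

Moving the desired poles further to the left speeds up the response but increases the entries of $K$, and with them the control effort, which is the tradeoff described above.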

### Mass-Spring-Damper

In [None]:
K = ct.place(mass_spring_damper.A, mass_spring_damper.B, np.array([-0.5, -0.9]))
K

In [None]:
kr = -mass_spring_damper.C @ np.linalg.inv(mass_spring_damper.A - mass_spring_damper.B * K) @ mass_spring_damper.B
kr = kr.item()
kr

In [None]:
A_fsfbk = mass_spring_damper.A - mass_spring_damper.B * K
B_fsfbk = kr * mass_spring_damper.B
C_fsfbk = mass_spring_damper.C - mass_spring_damper.D * K
D_fsfbk = kr * mass_spring_damper.D
closed_loop = ct.ss(A_fsfbk, B_fsfbk, C_fsfbk, D_fsfbk, dt=mass_spring_damper.dt)
closed_loop

In [None]:
n_steps = 30
x0 = np.zeros(closed_loop.nstates)
x0[0] = 0.0
# Define the time vector first so that the step input has a matching length
T = np.arange(0, n_steps) * closed_loop.dt
u0 = np.ones_like(T) * 100
response = ct.input_output_response(
    closed_loop, T, u0, x0
)

In [None]:
fig, ax = plt.subplots(1, 1, sharex=True)
ax.plot(response.time, response.outputs)
ax.set_xlabel("Time")
ax.set_ylabel("Position")
fig.tight_layout()

### Evaluation

In [None]:
env = create_mass_spring_damper_environment(max_steps=50)
initial_observation, _ = env.reset()

observation = initial_observation.copy()
observations = [observation]
actions = []

r = 500.0

for _ in range(50):
    action = kr * r - K @ observation
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        frames = env.render()
        break

observations = np.stack(observations)
env.close()

In [None]:
media.show_video(frames, fps=1/env.dt)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True)
T = np.arange(observations.shape[0]) * env.dt
ax1.plot(T, observations[:, 0])
ax1.set_xlabel("Time")
ax1.set_ylabel("Position")
ax2.plot(T, observations[:, 1])
ax2.set_xlabel("Time")
ax2.set_ylabel("Velocity")
ax3.plot(T[1:], actions)
ax3.set_xlabel("Time")
ax3.set_ylabel("Force")
fig.tight_layout()

### Inverted Pendulum

In [None]:
ct.pzmap(inverted_pendulum);

In [None]:
K = ct.place(inverted_pendulum.A, inverted_pendulum.B, np.array([-0.9, -0.5]))
K

In [None]:
A_fsfbk = inverted_pendulum.A - inverted_pendulum.B * K
B_fsfbk = np.zeros(2)
C_fsfbk = inverted_pendulum.C - inverted_pendulum.D * K
D_fsfbk = np.zeros(1)
closed_loop = ct.ss(A_fsfbk, B_fsfbk, C_fsfbk, D_fsfbk, dt=inverted_pendulum.dt)
closed_loop

In [None]:
ct.pzmap(closed_loop);

In [None]:
n_steps = 1000
x0 = np.zeros(closed_loop.nstates)
x0[0] = 0.01
u0 = 0.0
T = np.arange(0, n_steps) * closed_loop.dt
response = ct.input_output_response(
    closed_loop, T, u0, x0
)

In [None]:
fig, ax = plt.subplots(1, 1, sharex=True)
ax.plot(response.time, response.outputs)
ax.set_xlabel("Time")
ax.set_ylabel("Angle (rad)")
fig.tight_layout()

### Evaluation

In [None]:
env = create_inverted_pendulum_environment(max_steps=1000)
initial_observation, _ = env.reset()

observation = initial_observation.copy()
observations = [observation]
actions = []

for _ in range(1000):
    # Full state feedback on the pole angle and its angular velocity
    action = -K @ observation[[1, 3]]
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        frames = env.render()
        break

observations = np.stack(observations)
env.close()

In [None]:
media.show_video(frames, fps=1/env.dt)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True)
T = np.arange(observations.shape[0]) * env.dt
ax1.plot(T, observations[:, 0])
ax1.set_xlabel("Time")
ax1.set_ylabel("Position (m)")
ax2.plot(T, observations[:, 1])
ax2.set_xlabel("Time")
ax2.set_ylabel("Angle (rad)")
ax3.plot(T[1:], actions)
ax3.set_xlabel("Time")
ax3.set_ylabel("Force")
fig.tight_layout()

## PID Controller

A proportional–integral–derivative (PID) controller is a feedback control loop mechanism that is widely used in industrial control systems and a variety of other applications requiring continuously modulated control.

The overall control function is:

$$
u(t) = K_{\text{p}}e(t) + K_{\text{i}}\int _{0}^{t}e(\tau )\,\mathrm {d} \tau +K_{\text{d}}{\frac {\mathrm {d} e(t)}{\mathrm {d} t}},
$$

where $e(t) = r(t) - y(t)$ is the error between the reference $r(t)$ and the measured output $y(t)$, and $K_{\text{p}}$, $K_{\text{i}}$, and $K_{\text{d}}$, all non-negative, denote the coefficients of the proportional, integral, and derivative terms respectively.

The PID algorithm guarantees neither optimal control nor closed-loop stability, but in practice it works well for simple systems. It is broadly applicable since it relies only on the measured process variable, not on knowledge or a model of the underlying process.
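
The control law can be sketched in discrete time with a running sum for the integral and a backward difference for the derivative. The example below closes the loop around a hypothetical first-order plant $\dot{y} = -y + u$, with the error taken as reference minus measurement. The plant, the gains, and the function name `simulate_pid` are assumptions made for illustration.

```python
import numpy as np


def simulate_pid(Kp: float, Ki: float, Kd: float, reference: float = 1.0,
                 dt: float = 0.01, n_steps: int = 2000) -> np.ndarray:
    """Closed-loop simulation of a PID controller on the plant y' = -y + u."""
    y = 0.0
    integral = 0.0
    previous_error = reference - y
    outputs = np.zeros(n_steps)
    for k in range(n_steps):
        error = reference - y
        integral += error * dt                      # rectangle-rule integral
        derivative = (error - previous_error) / dt  # backward difference
        u = Kp * error + Ki * integral + Kd * derivative
        y += dt * (-y + u)                          # forward-Euler plant update
        previous_error = error
        outputs[k] = y
    return outputs


p_only = simulate_pid(Kp=4.0, Ki=0.0, Kd=0.0)
pi = simulate_pid(Kp=4.0, Ki=2.0, Kd=0.0)
print(f"P  controller: final output = {p_only[-1]:.3f}")
print(f"PI controller: final output = {pi[-1]:.3f}")
```

With the proportional term alone the output settles below the reference. Adding the integral term removes this steady-state error, at the cost of a slower mode in the response.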

### Mass-Spring-Damper

In [None]:
dt = mass_spring_damper.dt

def pid_controller_update(t: float, x: NDArray, u: NDArray, params: dict):
    """Update the controller state [last error, integrated error, error derivative]."""
    e = u[0]
    e_i = x[1] + e * dt
    e_d = (e - x[0]) / dt
    return np.array([e, e_i, e_d])

def pid_controller_output(t: float, x: NDArray, u: NDArray, params: dict):
    """Return the desired control signal and its clamped (saturated) version."""
    Kp = params.get('Kp', 1.0)
    Ki = params.get('Ki', 0)
    Kd = params.get('Kd', 0)
    out = np.array([Kp * x[0] + Ki * x[1] + Kd * x[2]])
    clamped_out = np.clip(out, -5, 5)
    return out, clamped_out


pid_controller = ct.NonlinearIOSystem(
    pid_controller_update,
    pid_controller_output,
    name="pid",
    inputs=('e[0]'),
    states=("x[0]", "x[1]", "x[2]"),
    outputs=('u[0]', 'u[1]'),
    dt=dt,
)
print(pid_controller)

In [None]:
error_junction = ct.summing_junction(inputs=["r[0]", "-y[0]"], output="e[0]")
closed_loop = ct.interconnect(
    [mass_spring_damper, pid_controller, error_junction],
    name="closed_loop",
    inputs=["r[0]"],
    outputs=["y[0]", "u[0]", "u[1]"],
)
print(closed_loop)

In [None]:
def tune_pid_mass_spring_damper(
    Kp = widgets.FloatSlider(min=0.0, max=1000.0, step=10, value=10.0),
    Ki = widgets.FloatSlider(min=0.0, max=100.0, step=10, value=0.0),
    Kd = widgets.FloatSlider(min=0.0, max=100.0, step=10, value=0.0),
    reference = widgets.FloatSlider(min=-0.5, max=0.5, step=.05, value=0.1),
):
    n_steps = 500
    T = np.arange(0, n_steps) * dt
    x0 = np.array([0.01, 0.0])
    u0 = reference
    params = {"Kp": Kp, "Ki": Ki, "Kd": Kd}
    response = ct.input_output_response(
        closed_loop, T, u0, x0, params=params
    )

    fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
    ax1.plot(response.time, response.outputs[0])
    ax1.hlines(reference, response.time[0], response.time[-1], "r")
    ax1.set_xlabel("Time")
    ax1.set_ylabel("Position")
    ax2.plot(response.time, response.outputs[1], label="Desired Force")
    ax2.plot(response.time, response.outputs[2], label="Actual Force")
    ax2.set_xlabel("Time")
    ax2.set_ylabel("Input")
    ax2.legend()
    fig.tight_layout()

interact(tune_pid_mass_spring_damper);

In [None]:
n_steps = 500
T = np.arange(0, n_steps) * dt
x0 = np.array([0.01, 0.0])
u0 = 0.1
params = {"Kp": 25.0, "Ki": 50, "Kd": 1}
response = ct.input_output_response(
    closed_loop, T, u0, x0, params=params
)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.plot(response.time, response.outputs[0])
ax1.set_xlabel("Time")
ax1.set_ylabel("Position")
ax2.plot(response.time, response.outputs[1])
ax2.set_xlabel("Time")
ax2.set_ylabel("Input")
fig.tight_layout()

### Evaluation

In [None]:
env = create_mass_spring_damper_environment(max_steps=200)
initial_observation, _ = env.reset()

observation = initial_observation.copy()
observations = [observation]
actions = []

errors = np.zeros(3)
action = np.zeros(1)

r = 0.1

for _ in range(200):
    error = r - observation
    errors = pid_controller.updfcn(0.0, errors, error, params)
    action = pid_controller.outfcn(0.0, errors, action, params)
    action = action[0]
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        frames = env.render()
        env.reset()
        break

observations = np.stack(observations)
env.close()

In [None]:
media.show_video(frames, fps=1/env.dt)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True)
T = np.arange(observations.shape[0]) * env.dt
ax1.plot(T, observations[:, 0], label="Measured")
ax1.hlines(r, T[0], T[-1], "r", label="Reference")
ax1.legend()
ax1.set_xlabel("Time")
ax1.set_ylabel("Position")
ax2.plot(T, observations[:, 1])
ax2.set_xlabel("Time")
ax2.set_ylabel("Velocity")
ax3.plot(T[1:], actions, label="Desired")
ax3.plot(T[1:], np.clip(actions, -5, 5), label="Actual")
ax3.legend()
ax3.set_xlabel("Time")
ax3.set_ylabel("Force")
fig.tight_layout()

## Exercise

- Design a PID controller for the inverted pendulum system.

## Solution

### Inverted Pendulum

In [None]:
def pid_controller_update(t: float, x: NDArray, u: NDArray, params: dict):
    """Update the controller state [last error, integrated error, error derivative]."""
    e = u[0]
    e_i = x[1] + e * inverted_pendulum.dt
    e_d = (e - x[0]) / inverted_pendulum.dt
    return np.array([e, e_i, e_d])

def pid_controller_output(t: float, x: NDArray, u: NDArray, params: dict):
    """Return the control signal computed from the controller state."""
    Kp = params.get('Kp', 1.0)
    Ki = params.get('Ki', 0)
    Kd = params.get('Kd', 0)
    return np.array([Kp * x[0] + Ki * x[1] + Kd * x[2]])

pid_controller = ct.NonlinearIOSystem(
    pid_controller_update,
    pid_controller_output,
    name="pid",
    inputs=('y[0]'),
    states=("e[0]", "e[1]", "e[2]"),
    outputs=('u[0]'),
    dt=inverted_pendulum.dt,
)
print(pid_controller)

In [None]:
closed_loop = inverted_pendulum.feedback(pid_controller, sign=1)
print(closed_loop)

In [None]:
def tune_pid_inverted_pendulum(
    Kp = widgets.FloatSlider(min=0.0, max=1000.0, step=10, value=10.0),
    Ki = widgets.FloatSlider(min=0.0, max=100.0, step=10, value=0.0),
    Kd = widgets.FloatSlider(min=0.0, max=100.0, step=10, value=0.0),
):
    n_steps = 500
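    # Use the pendulum model's own timestep rather than the global dt of the previous system
    T = np.arange(0, n_steps) * inverted_pendulum.dt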
    x0 = np.array([0.01, 0.0])
    u0 = 0.0
    params = {"Kp": Kp, "Ki": Ki, "Kd": Kd}
    response = ct.input_output_response(
        closed_loop, T, u0, x0, params=params
    )

    fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
    ax1.plot(response.time, response.outputs)
    ax1.set_xlabel("Time")
    ax1.set_ylabel("Angle (rad)")
    ax2.plot(response.time, response.inputs)
    ax2.set_xlabel("Time")
    ax2.set_ylabel("Input")
    fig.tight_layout()

interact(tune_pid_inverted_pendulum);

In [None]:
n_steps = 500
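# Use the pendulum model's own timestep rather than the global dt of the previous system
T = np.arange(0, n_steps) * inverted_pendulum.dt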
x0 = np.array([0.01, 0.0])
u0 = 0.0
params = {"Kp": 300, "Ki": 10, "Kd": 40}
response = ct.input_output_response(
    closed_loop, T, u0, x0, params=params
)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
ax1.plot(response.time, response.outputs)
ax1.set_xlabel("Time")
ax1.set_ylabel("Angle (rad)")
ax2.plot(response.time, response.inputs)
ax2.set_xlabel("Time")
ax2.set_ylabel("Input")
fig.tight_layout()

### Evaluation

In [None]:
env = create_inverted_pendulum_environment(max_steps=1000)
initial_observation, _ = env.reset()

observation = initial_observation.copy()
observations = [observation]
actions = []

errors = np.zeros(3)
action = np.zeros(1)

for _ in range(1000):
    errors = pid_controller.updfcn(0.0, errors, observation[[1, 3]], params)
    action = pid_controller.outfcn(0.0, errors, action, params)
    actions.append(action)
    observation, _, terminated, truncated, _ = env.step(action)
    observations.append(observation)
    if terminated or truncated:
        frames = env.render()
        env.reset()
        break

observations = np.stack(observations)
env.close()

In [None]:
media.show_video(frames, fps=1/env.dt)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True)
T = np.arange(observations.shape[0]) * env.dt
ax1.plot(T, observations[:, 0])
ax1.set_xlabel("Time")
ax1.set_ylabel("Position (m)")
ax2.plot(T, observations[:, 1])
ax2.set_xlabel("Time")
ax2.set_ylabel("Angle (rad)")
ax3.plot(T[1:], actions)
ax3.set_xlabel("Time")
ax3.set_ylabel("Force")
fig.tight_layout()

# Limitations of Linear Control Systems

There are inherent tradeoffs with the use of linear time-invariant control systems.

The following closed-loop properties cannot be addressed independently by a linear time-invariant controller:

- Speed of disturbance rejection
- Sensitivity to measurement noise
- Accumulated control error
- Required control amplitude
- Required control rate changes
- Overshoot, if the system is open-loop unstable
- Undershoot, if the system is non-minimum phase
- Sensitivity to parametric modeling errors
- Sensitivity to structural modeling errors

# Summary

- We have introduced many concepts in Control Theory and drawn analogies with Reinforcement Learning.
- We have studied a mass-spring-damper system and an inverted pendulum system.
- We have designed and implemented an observer for the angular velocity of the pendulum.
- We have designed and implemented two different controllers for the system.
- We have seen the inherent limitations of linear time-invariant controllers.

<img src="_static/images/aai-institute-cover.svg" alt="Snow" style="width:100%;">
<div class="md-slide title">Thank you for your attention!</div>

## References

- [<b id="duun_henriksen_2013">[Duun-Henriksen et al., 2013]</b>](#duun_henriksen_2013-back) [Model identification using stochastic differential equation grey-box models in diabetes](https://journals.sagepub.com/doi/abs/10.1177/193229681300700220) - Duun-Henriksen, Anne Katrine, et al. Journal of diabetes science and technology 7, no. 2 (2013): 431-440.

- [<b id="goodwin_control_2000">[Goodwin et al., 2000]</b>](#goodwin_control_2000-back) [Control System Design](https://ds.amu.edu.et/xmlui/bitstream/handle/123456789/17461/Graham%20C.%20Goodwin%2C%20Stefan%20F.%20Graebe%2C%20Mario%20E.%20Salgado-Control%20System%20Design%20-Prentice%20Hall%20%282000%29.pdf?sequence=1&isAllowed=y) - Goodwin, Graham C., Stefan F. Graebe, and Mario E. Salgado. (2000)

- [<b id="nijmeijer_nonlinear_1990">[Nijmeijer et al., 1990]</b>](#nijmeijer_nonlinear_1990-back) [Nonlinear dynamical control systems.](https://link.springer.com/book/10.1007/978-1-4757-2101-0) Nijmeijer, Henk, and Arjan Van der Schaft. Vol. 464, no. 2. New York: Springer-Verlag, 1990.

- [<b id="wiki_inverted_pendulum">[Wiki Inverted Pendulum, 2023]</b>](#wiki_inverted_pendulum-back) [Inverted pendulum](https://en.wikipedia.org/w/index.php?title=Inverted_pendulum&oldid=1152479964) Wikipedia, The Free Encyclopedia (accessed September 24, 2023).

- [<b id="wiki_mass_spring_damper">[Wiki Mass-Spring-Damper, 2023]</b>](#wiki_mass_spring_damper-back) [Mass-spring-damper model](https://en.wikipedia.org/w/index.php?title=Mass-spring-damper_model&oldid=1177169218) Wikipedia, The Free Encyclopedia (accessed September 28, 2023). 

- [<b id="wiki_state_space">[Wiki State-Space, 2023]</b>](#wiki_state_space-back) [State-space representation](https://en.wikipedia.org/w/index.php?title=State-space_representation&oldid=1175410959) Wikipedia, The Free Encyclopedia (accessed September 26, 2023). 