# Hamilton-Jacobi-Bellman Equation

## Dynamic Programming in Continuous Time

Consider a continuous optimal control problem described by the state equation:
\begin{align}
\dot{x}=f\big(x(t), u(t), t\big) \qquad (1.1)
\end{align}

The system is to be controlled so as to minimize the performance measure
\begin{align}
J &=h\big(x(t_f), t_f\big) + \int_{t_0}^{t_f}g\big(x(\tau), u(\tau), \tau\big) d\tau \qquad (1.2)\\
\textit{where:}&\\
g,h &\text{ are scalar functions}\\
t_0, t_f &\text{ are the fixed initial and final times}\\
\tau &\text{ is the integration variable}\\
u(t) \in U &\text{ with } U \text{ the set of admissible controls}
\end{align}

The goal is to pick $u(\tau)$, $t_0 \leq \tau \leq t_f$, to minimize (1.2).

### General Solution Steps
- Split the time interval $[t,t_f]$ into $[t, (t+\triangledown t)]$ and $[(t+\triangledown t), t_f]$ where ideally $\triangledown t \rightarrow 0$.
- Identify the optimal cost-to-go $J^*\big(x(t+\triangledown t),(t+\triangledown t)\big)$.
- Determine the "stage cost" in time $[t, (t+\triangledown t)]$.
- Find the best strategy from time $t \rightarrow (t+\triangledown t)$.
- Manipulate the result into the HJB equation.

## Derivation of HJB Equation
The continuous optimal control problem will be included in a larger class of problems by considering the performance measure
\begin{align}
J\big(x(t), \{u(\tau)\}_{t\leq\tau\leq t_f}, t\big) = h\big(x(t_f), t_f\big) + \int_{t}^{t_f} g\big(x(\tau), u(\tau), \tau\big)d\tau \qquad (1.3)
\end{align}

where $t$ can be any value less than or equal to $t_f$ and $x(t)$ can be any admissible state value. Notice that the performance measure depends on the numerical values of $x(t)$ and $t$, and on the optimal control history in the interval $[t, t_f]$ (Kirk, 2004).

Let us now attempt to determine the controls that minimize (1.3) for all admissible $x(t)$ and all $t \leq t_f$. The minimum cost function is then:

\begin{align}
J^*\big(x(t), t\big) = \min_{u(\tau)} \left\{ h\big(x(t_f), t_f\big) + \int_{t}^{t_f}g\big(x(\tau), u(\tau), \tau\big) d\tau \right\} \qquad (1.4)
\end{align}

We split the time interval $[t, t_f]$ into $[t, (t+\triangledown t)]$ and $[(t+\triangledown t), t_f]$, where we are specifically interested in the limit $\triangledown t \rightarrow 0$. Subdividing the interval in (1.4) gives:

\begin{align}
J^*(x(t),t)=\min_{u(\tau)} \left\{ h(x(t_f), t_f) + \int_{t}^{t+\triangledown t}g(x,u,\tau)d\tau + \int_{t+\triangledown t}^{t_f} g(x,u,\tau)d\tau\right\} \qquad (1.5)
\end{align}

The principle of optimality requires that
\begin{align}
J^*(x(t),t)=\min_u \left\{ \int_{t}^{t+\triangledown t} g(x,u,\tau)d\tau + J^*(x(t+\triangledown t), t+\triangledown t) \right\} \qquad (1.6)
\end{align}

where $J^*\big(x(t+\triangledown t), t+\triangledown t\big)$ is the minimum cost of the process for the time interval $(t+\triangledown t) \leq \tau \leq t_f$ with initial state $x(t+\triangledown t)$.

Assuming $J^*$ has bounded second derivatives in both arguments, we can expand $J^*(x(t+\triangledown t), t+\triangledown t)$ in a Taylor series about the point $(x(t), t)$ to obtain:

\begin{align}
J^*\big(x(t),t\big) &= \min_u \left\{ \int_{t}^{t+\triangledown t} g(x,u,\tau)d\tau + J^*(x(t), t) + \left[\frac{\partial J^*}{\partial t}(x(t),t)\right] \triangledown t \right. \\
&\qquad \left. + \left[\frac{\partial J^*}{\partial x}(x(t),t)\right]^T [x(t+\triangledown t) - x(t)] + \text{higher-order terms} \right\} \qquad (1.7)
\end{align}

For small $\triangledown t$:
\begin{align}
x(t+\triangledown t) - x(t) &\approx \dot{x}(t) \cdot \triangledown t\\
\\
\int_{t}^{t+\triangledown t} g(x, u, \tau)d\tau &\approx g(x,u,t)\triangledown t
\end{align}


Since $J^*(x(t),t)$ is independent of $u$, it can be taken outside the minimization and cancelled from the right- and left-hand sides of (1.7). Using the small-$\triangledown t$ approximations and replacing $\dot{x}$ with the state equation (1.1) gives:

\begin{align}
0=\min_{u(t)} \left\{ g(x,u,t)\triangledown t + \frac{\partial J^*(x(t),t)}{\partial t} \triangledown t + \left[\frac{\partial J^*(x(t),t) }{\partial x} \right]^Tf(x,u,t)\triangledown t \right\} \qquad (1.8)
\end{align}

Dividing the above equation by $\triangledown t$ yields:

\begin{align}
0=\frac{\partial J^*(x(t),t)}{\partial t} + \min_{u(t)} \left\{ g(x,u,t) + \left[\frac{\partial J^*(x(t),t) }{\partial x} \right]^Tf(x,u,t) \right\} \qquad (1.9)
\end{align}

Define the Hamiltonian as

\begin{align}
H=g(x,u,t) + \left[\frac{\partial J^*(x(t),t)}{\partial x}\right]^T f(x, u, t) \qquad (1.10)
\end{align}

Then we write the partial differential equation of (1.9) as

\begin{align}
0=\frac{\partial J^*(x(t),t)}{\partial t} + \min_u H \qquad (1.11)
\end{align}

To find the boundary condition for this partial differential equation, set $t=t_f$ in (1.4):

\begin{align}
J^*(x(t_f),t_f)=h(x(t_f), t_f) \qquad (1.12)
\end{align}

We have obtained the HJB equation (1.11) subject to the boundary condition (1.12). It characterizes the solution of optimal control problems for general nonlinear dynamical systems. However, an analytical solution of the HJB equation is difficult to obtain in most cases. In some problems a solution can be obtained analytically by guessing a form of the minimum cost function. In general, the HJB equation must be solved by numerical techniques, which involve some form of discrete approximation to the exact optimization relationship (1.11) (Kirk, 2004).

**Example 1**. A first-order system is described by the differential equation

\begin{align}
\dot{x}(t) = x(t) + u(t)
\end{align}

It is desired to find the control law that minimizes the cost function

\begin{align}
J=\frac{1}{4}x^2(T) + \int_{0}^{T}\frac{1}{4}u^2(t)dt
\end{align}

The final time $T$ is specified, and the admissible state and control values are not constrained by any boundaries. For this problem:

\begin{align}
g\big(x(t), u(t), t\big) = \frac{1}{4}u^2(t) \\
f\big(x(t), u(t), t\big) = x(t) + u(t)
\end{align}

The Hamiltonian

\begin{align}
H\left(x(t), u(t), \frac{\partial J^*}{\partial x}\right)=\frac{1}{4}u^2(t)+\frac{\partial J^*(x(t), t)}{\partial x} (x(t) + u(t))
\end{align}

Since the control is unconstrained, the necessary condition that the optimal control must satisfy is

\begin{align}
\frac{\partial H}{\partial u} = \frac{1}{2}u(t) + \frac{\partial J^*}{\partial x} = 0
\end{align}

The control indeed minimizes the Hamiltonian function because

\begin{align}
\frac{\partial^2 H}{\partial u^2} =  \frac{1}{2} \gt 0
\end{align}

The optimal control
\begin{align}
u^*(t) = -2\frac{\partial J^*}{\partial x}
\end{align}

When substituted into the HJB equation

\begin{align}
0=\frac{\partial J^*}{\partial t} + \min_u H
\end{align}

Gives

\begin{align}
0&=\frac{\partial J^*}{\partial t} + \frac{1}{4}\left(-2\frac{\partial J^*}{\partial x}\right)^2 + \frac{\partial J^*}{\partial x}\left(x(t) - 2\frac{\partial J^*}{\partial x}\right)\\
\\
0&=\frac{\partial J^*}{\partial t} - \left[\frac{\partial J^*}{\partial x}\right]^2 + \frac{\partial J^*}{\partial x}x(t) \qquad (1.13)
\end{align}

The boundary value of $J^*$ is

\begin{align}
J^*(x(T), T) = \frac{1}{4}x^2(T) \qquad (1.14)
\end{align}

One way to solve the HJB equation is to guess a form of the solution and see if it can be made to satisfy the differential equation and the boundary conditions. Since $J^*(x(T), T)$ is quadratic in $x(T)$, guess:
\begin{align}
J^*(x(t), t) = \frac{1}{2}p(t)x^2(t) \qquad (1.15)
\end{align}

where $p(t)$ is an unknown scalar function of $t$ that is to be determined. Notice that
\begin{align}
\frac{\partial J^*}{\partial x}=p(t)x(t)
\end{align}

which together with the expression determined for $u^*(t)$ implies that
\begin{align}
u^*(t) = -2p(t)x(t)
\end{align}

Thus, if $p(t)$ can be found such that (1.13) and (1.14) are satisfied, the optimal control is a linear feedback of the state; indeed, this was the motivation for selecting the form (1.15).

Substituting (1.15) and 
\begin{align}
\frac{\partial J^*}{\partial t} = \frac{1}{2}\dot{p}(t)x^2(t)
\end{align}

into (1.13) gives
\begin{align}
0=\frac{1}{2}\dot{p}(t)x^2(t) - p^2(t)x^2(t) + p(t)x^2(t)
\end{align}

Since this equation must be satisfied for all $x(t)$, $p(t)$ must be calculated from the differential equation

\begin{align}
\frac{1}{2}\dot{p}(t)-p^2(t) + p(t) = 0 \qquad (1.18)
\end{align}

with the final condition $p(T)=\frac{1}{2}$, obtained by comparing (1.14) with (1.15). Since $p(t)$ is a scalar function of $t$, the solution can be obtained using the transformation $z(t)=1/p(t)$, with the result:

\begin{align}
p(t)=\frac{1}{1+e^{2(t-T)}}
\end{align}

The solution of (1.18) is obtained as follows:

\begin{align}
z(t)=\frac{1}{p(t)}; z(T)=\frac{1}{p(T)}=2\\
(1.18) \Rightarrow -\frac{\dot{z}}{z^2}-\frac{2}{z^2} + \frac{2}{z}=0\\
\dot{z} - 2z + 2 = 0
\end{align}

The solution of the homogeneous part of the above equation, $\dot{z} - 2z = 0$, is:

\begin{align}
z(t)=C_1e^{2t}
\end{align}

and, adding a constant particular solution $C_2$, the general solution is

\begin{align}
z(t) = C_1e^{2t} + C_2
\end{align}

The constants are calculated by substituting the general solution into the differential equation and using the final condition

\begin{align}
2C_1e^{2t} - 2(C_1e^{2t} + C_2) + 2 &= 0 \\
-2C_2+2 &= 0 \\
C_2 &= 1\\
\\
z(t) &= C_1e^{2t} + 1\\
z(T) &= C_1e^{2T}+1 = 2\\
C_1 &= e^{-2T}
\end{align}

Substituting the constants into the general solution yields

\begin{align}
z(t)=e^{2(t-T)}+1 \Rightarrow p(t)=\frac{1}{e^{2(t-T)}+1}
\end{align}
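As a sanity check on this result, the differential equation $\frac{1}{2}\dot{p}(t) - p^2(t) + p(t) = 0$ can be integrated numerically backward from $p(T)=\frac{1}{2}$ and compared with the closed form. The sketch below uses a hand-written classical fourth-order Runge-Kutta step; the function names and the step count are assumptions made for illustration.

```python
import math


def p_closed_form(t, T):
    """Closed-form solution p(t) = 1 / (1 + exp(2 (t - T)))."""
    return 1.0 / (1.0 + math.exp(2.0 * (t - T)))


def p_numeric(t_target, T, steps=200):
    """Integrate dp/dt = 2 p^2 - 2 p backward from p(T) = 1/2 with classical RK4."""
    def rhs(p):
        return 2.0 * p * p - 2.0 * p

    step = (t_target - T) / steps  # negative step: integrate backward in time
    p = 0.5                        # final condition p(T) = 1/2
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * step * k1)
        k3 = rhs(p + 0.5 * step * k2)
        k4 = rhs(p + step * k3)
        p += step / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p


T = 1.0
for t in (0.0, 0.5, 1.0):
    print(f"t = {t:.1f}: numeric {p_numeric(t, T):.8f}, closed form {p_closed_form(t, T):.8f}")
```

The two columns should agree to within the integration error, which supports both the transformation $z(t)=1/p(t)$ and the value of $C_1$.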

The optimal control law is then

\begin{align}
u^*(t)=-2\frac{\partial J^*(x,t)}{\partial x}=-2p(t)x(t)\\
u^*(t)=- \frac{2}{1+e^{2(t-T)}}x(t)
\end{align}

![Optimal Control Diagram](images/optimal_control_sample_diagram.jpg)
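The control law can also be checked by simulation. The sketch below integrates the closed-loop system together with the running cost and compares the total cost of the optimal feedback with the predicted minimum $J^*(x(0),0)=\frac{1}{2}p(0)x^2(0)$, and with two suboptimal alternatives (a constant gain and no control). The helper `total_cost`, the initial state $x(0)=1$, the horizon $T=1$, and the comparison policies are assumptions chosen for illustration.

```python
import math


def p(t, T):
    """Closed-form p(t) from the example."""
    return 1.0 / (1.0 + math.exp(2.0 * (t - T)))


def total_cost(policy, x0, T, steps=1000):
    """Simulate x' = x + u with u = policy(t, x) using RK4 and return
    J = x(T)^2 / 4 + integral of u^2 / 4."""
    dt = T / steps

    def deriv(t, x):
        u = policy(t, x)
        return x + u, 0.25 * u * u  # state derivative, running-cost rate

    x, cost, t = x0, 0.0, 0.0
    for _ in range(steps):
        k1x, k1c = deriv(t, x)
        k2x, k2c = deriv(t + 0.5 * dt, x + 0.5 * dt * k1x)
        k3x, k3c = deriv(t + 0.5 * dt, x + 0.5 * dt * k2x)
        k4x, k4c = deriv(t + dt, x + dt * k3x)
        x += dt / 6.0 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x)
        cost += dt / 6.0 * (k1c + 2.0 * k2c + 2.0 * k3c + k4c)
        t += dt
    return cost + 0.25 * x * x


T, x0 = 1.0, 1.0
cost_opt = total_cost(lambda t, x: -2.0 * p(t, T) * x, x0, T)
cost_const = total_cost(lambda t, x: -2.0 * x, x0, T)  # constant gain, suboptimal
cost_zero = total_cost(lambda t, x: 0.0, x0, T)        # no control
predicted = 0.5 * p(0.0, T) * x0 ** 2

print(f"optimal feedback : {cost_opt:.6f} (predicted {predicted:.6f})")
print(f"constant gain -2x: {cost_const:.6f}")
print(f"no control       : {cost_zero:.6f}")
```

The simulated cost of the optimal feedback should match the predicted value, and both alternatives should cost more.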

Notice that as $T\rightarrow \infty$, $p(t) \rightarrow 1$ for any fixed $t$, so the linear time-varying feedback approaches the constant feedback $u^*(t) = -2x(t)$ and the controlled system

\begin{align}
\dot{x}(t) &= x(t)-2x(t)\\
&= -x(t)
\end{align}
is stable.
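The limit can be illustrated numerically. The short sketch below evaluates the feedback gain $2p(0)$ and the closed-loop coefficient $1-2p(0)$ for a few horizons; the function name `feedback_gain` and the horizon values are assumptions chosen for illustration.

```python
import math


def feedback_gain(t, T):
    """Gain k(t) = 2 p(t) in the control law u*(t) = -k(t) x(t)."""
    return 2.0 / (1.0 + math.exp(2.0 * (t - T)))


horizons = (1.0, 2.0, 5.0, 10.0)
gains = [feedback_gain(0.0, T) for T in horizons]
for T, k in zip(horizons, gains):
    # The closed-loop dynamics at t = 0 are x' = (1 - k) x.
    print(f"T = {T:4.1f}: gain {k:.6f}, closed-loop coefficient {1.0 - k:.6f}")
```

As the horizon grows, the gain approaches 2 and the closed-loop coefficient approaches $-1$, consistent with $\dot{x}(t) = -x(t)$.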















### References
[1] [Hamilton-Jacobi-Bellman Equation, wiki](https://en.wikipedia.org/wiki/Hamilton%E2%80%93Jacobi%E2%80%93Bellman_equation)

[2] [Optimal Control Theory. D. Kirk](https://www.amazon.com/Optimal-Control-Theory-Introduction-Engineering/dp/0486434842)

[3] [Dynamic Programming, Richard Bellman, Dover](https://books.google.com.ph/books/about/Dynamic_Programming.html?id=CG7CAgAAQBAJ&redir_esc=y)

[4] [Dynamic Programming and Optimal Control, Vol 1 & 2, D.P. Bertsekas](https://www.mit.edu/~dimitrib/dpbook.html)

[5] [Applied Optimal Control. Bryson & Ho](https://books.google.com.ph/books/about/Applied_Optimal_Control.html?id=P4TKxn7qW5kC&redir_esc=y)