# Differentiable Physics for Initial Value Problems for ODEs

We are given

- $\mathbf{u}: \mathbb{R} \rightarrow \mathbb{R}^n$, the state of the system as a function of time $t$.
- $\theta \in \mathbb{R}^p$, the parameters of the system.
- $f: \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^p \rightarrow \mathbb{R}^n$, the right-hand side of the ODE.

The first-order ODE is then

$$
\frac{d \mathbf{u}}{dt} = f(\mathbf{u}, t, \theta)
$$

with the initial condition

$$\mathbf{u}(0) = \mathbf{u}_0,$$

where $\mathbf{u}_0$ may itself depend on $\theta$.
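To make the setup concrete, here is a minimal sketch of a solver for this initial value problem, assuming fixed-step classic fourth-order Runge-Kutta (RK4) as the integrator. The names `rk4_solve` and `f_decay` and the exponential-decay example $f(\mathbf{u}, t, \theta) = -\theta_0 \mathbf{u}$ are hypothetical illustrations, not part of the original text.

```python
import numpy as np


def rk4_solve(f, u0, theta, T, n_steps):
    """Integrate du/dt = f(u, t, theta), u(0) = u0, on [0, T] with fixed-step RK4.

    Returns the time grid, shape (n_steps + 1,), and the trajectory,
    shape (n_steps + 1, n).
    """
    u0 = np.atleast_1d(np.asarray(u0, dtype=float))
    h = T / n_steps
    ts = np.linspace(0.0, T, n_steps + 1)
    us = np.empty((n_steps + 1, u0.size))
    us[0] = u0
    for k in range(n_steps):
        t, u = ts[k], us[k]
        k1 = f(u, t, theta)
        k2 = f(u + 0.5 * h * k1, t + 0.5 * h, theta)
        k3 = f(u + 0.5 * h * k2, t + 0.5 * h, theta)
        k4 = f(u + h * k3, t + h, theta)
        us[k + 1] = u + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return ts, us


# Hypothetical example: exponential decay du/dt = -theta_0 * u (n = 1, p = 1).
def f_decay(u, t, theta):
    return -theta[0] * u


theta = np.array([0.5])
ts, us = rk4_solve(f_decay, [2.0], theta, T=4.0, n_steps=200)
# The exact solution is u(t) = 2 exp(-0.5 t); RK4 should match it closely at t = T.
```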

Furthermore, the objective is formulated as a loss functional

$$ J(\mathbf{u}, \theta)  = \int_0^T \mathbf{g}(\mathbf{u}, \theta) dt $$

where $\mathbf{g}$ is a scalar function of $\mathbf{u}$ and $\theta$, for example the quadratic loss

$$ \mathbf{g}(\mathbf{u}, \theta) = \frac{1}{2} \left\lVert \mathbf{u}(t) - \hat{\mathbf{u}}(t) \right\rVert^2 $$

with a given target trajectory $\hat{\mathbf{u}}(t)$.
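One way to evaluate $J$ numerically, sketched under the assumption that the trajectory is available at discrete time samples and that the time integral is approximated with the composite trapezoidal rule. The helpers `quadratic_loss` and `objective`, and the closed-form decay trajectory $u(t) = u_0 e^{-\theta t}$ with target $\hat{u} = 0$ used as a check, are hypothetical.

```python
import numpy as np


def quadratic_loss(us, us_target):
    """Pointwise g = 1/2 * ||u(t) - u_target(t)||^2 for every time sample (row)."""
    return 0.5 * np.sum((us - us_target) ** 2, axis=-1)


def objective(ts, us, us_target):
    """J = int_0^T g dt, approximated with the composite trapezoidal rule."""
    g = quadratic_loss(us, us_target)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(ts)))


# Hypothetical check: u(t) = u0 exp(-theta t) with target 0, where J has a closed form.
u0, theta, T = 2.0, 0.5, 4.0
ts = np.linspace(0.0, T, 2001)
us = (u0 * np.exp(-theta * ts))[:, None]  # shape (n_t, n) with n = 1
J = objective(ts, us, np.zeros_like(us))
J_exact = u0**2 * (1.0 - np.exp(-2.0 * theta * T)) / (4.0 * theta)
```

The manual trapezoidal sum avoids depending on a specific NumPy or SciPy quadrature function name.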




We are interested in the total derivative

$$
\frac{d J}{d \theta} = \int_0^T \left( \frac{\partial \mathbf{g}}{\partial \theta} + \frac{\partial \mathbf{g}}{\partial \mathbf{u}} \frac{d \mathbf{u}}{d \theta} \right) dt
$$

Because the integration limits do not depend on the parameter vector $\theta$, the total derivative can be moved inside the integral. Applying the chain rule to $\mathbf{g}$, the first term is due to the explicit dependence on $\theta$, and the second term comes from the implicit dependence through the state $\mathbf{u}(t; \theta)$.
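As a worked check of this formula, consider a hypothetical scalar example (not from the original text): $f(u, t, \theta) = -\theta u$ with $u(0) = u_0$ independent of $\theta$, so that $u(t) = u_0 e^{-\theta t}$, and $\mathbf{g} = \frac{1}{2} u^2$ (quadratic loss with target zero). Then $\frac{\partial \mathbf{g}}{\partial \theta} = 0$, $\frac{\partial \mathbf{g}}{\partial u} = u$ and $\frac{du}{d\theta} = -t \, u_0 e^{-\theta t}$, which gives

$$
\frac{dJ}{d\theta} = \int_0^T u \, \frac{du}{d\theta} \, dt = -u_0^2 \int_0^T t \, e^{-2\theta t} \, dt = -\frac{u_0^2}{4\theta^2} \left[ 1 - e^{-2\theta T} \left( 1 + 2\theta T \right) \right].
$$

This agrees with differentiating the closed form $J(\theta) = \frac{u_0^2}{4\theta} \left( 1 - e^{-2\theta T} \right)$ with respect to $\theta$ directly.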

We use the forward method to compute $\frac{d \mathbf{u}}{d \theta}$. First, let us write the Jacobian $\frac{d \mathbf{u}}{d \theta} \in \mathbb{R}^{n \times p}$ in terms of its columns

$$
\frac{d \mathbf{u}}{d \theta} = \begin{bmatrix}
\frac{\partial \mathbf{u}}{\partial \theta_0} &
\frac{\partial \mathbf{u}}{\partial \theta_1} &
\frac{\partial \mathbf{u}}{\partial \theta_2} &
\dots &
\frac{\partial \mathbf{u}}{\partial \theta_{p-1}}
\end{bmatrix}.
$$
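When testing an implementation, a column-by-column finite-difference approximation of this Jacobian is a useful reference. Note that this is a swapped-in technique (central differences), not the forward method described here. The function `jacobian_fd`, the callable `solve`, and its closed-form example $u(T) = \theta_0 e^{-\theta_1 T}$ (the solution of $du/dt = -\theta_1 u$ with $u(0) = \theta_0$) are hypothetical.

```python
import numpy as np


def jacobian_fd(solve, theta, eps=1e-6):
    """Approximate du/dtheta at t = T column by column with central differences.

    `solve(theta)` must return the state u(T; theta) as an array of shape (n,).
    Column i approximates du/dtheta_i and costs two extra calls to `solve`.
    """
    theta = np.asarray(theta, dtype=float)
    columns = []
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = eps
        columns.append((solve(theta + e_i) - solve(theta - e_i)) / (2.0 * eps))
    return np.stack(columns, axis=1)  # shape (n, p)


# Hypothetical closed-form solution of du/dt = -theta_1 u, u(0) = theta_0, at T = 1.
def solve(theta, T=1.0):
    return np.array([theta[0] * np.exp(-theta[1] * T)])


jac = jacobian_fd(solve, [2.0, 0.5])
```

Each column requires its own pair of solves, so the cost grows linearly with $p$; in this example the first parameter enters only through the initial condition, which corresponds to a nonzero $\frac{d \mathbf{u}_0}{d\theta_0}$.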

We now differentiate the original ODE with respect to $\theta_i$


$$
\frac{d}{d\theta_i} \frac{d \mathbf{u}}{dt} = \frac{d}{d\theta_i} f(\mathbf{u}, t, \theta)
$$

with the initial condition

$$\frac{d \mathbf{u}}{d\theta_i}(0) = \frac{d \mathbf{u}_0}{d\theta_i}.$$



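Assuming $\mathbf{u}$ and $f$ are smooth enough to swap the order of differentiation, and applying the chain rule to the right-hand side, we obtain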

$$
\frac{d}{d t} \frac{d \mathbf{u}}{d \theta_i} = 
\frac{\partial f}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \theta_i} + 
\frac{\partial f}{\partial \theta_i}
$$

with the initial condition unchanged. This is a system of ODEs, linear in the unknown, for each column of the Jacobian, which becomes clear if we write $\mathbf{s}_i = \frac{d \mathbf{u}}{d \theta_i}$:

$$
\frac{d}{d t} \mathbf{s}_i = 
\frac{\partial f}{\partial \mathbf{u}} \mathbf{s}_i + 
\frac{\partial f}{\partial \theta_i}
$$
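As a hypothetical scalar example (not from the original text), take $f(u, t, \theta) = -\theta u$ with $u(0) = u_0$ independent of $\theta$, so that $u(t) = u_0 e^{-\theta t}$. Then $\frac{\partial f}{\partial u} = -\theta$ and $\frac{\partial f}{\partial \theta} = -u$, and the sensitivity $s = \frac{du}{d\theta}$ satisfies

$$
\frac{ds}{dt} = -\theta s - u_0 e^{-\theta t}, \qquad s(0) = 0,
$$

whose solution $s(t) = -t \, u_0 e^{-\theta t}$ coincides with differentiating $u(t) = u_0 e^{-\theta t}$ with respect to $\theta$ directly.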

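Because $\frac{\partial f}{\partial \mathbf{u}}$ and $\frac{\partial f}{\partial \theta_i}$ are evaluated along the trajectory $\mathbf{u}(t)$, the sensitivity equations are solved together with the original ODE. Their solutions $\mathbf{s}_i$ form the columns of the Jacobian, which is then inserted into the expression for $\frac{dJ}{d\theta}$. However, we have to solve $p$ such systems of size $n$, which becomes prohibitively expensive for a large number of parameters.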
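A minimal NumPy sketch of the forward method, under these assumptions: the Jacobians $\frac{\partial f}{\partial \mathbf{u}}$, $\frac{\partial f}{\partial \theta}$, $\frac{\partial \mathbf{g}}{\partial \mathbf{u}}$ and $\frac{\partial \mathbf{g}}{\partial \theta}$ are supplied by hand, and the state, all $p$ sensitivity columns, and the two integrals $J$ and $\frac{dJ}{d\theta}$ are stacked into one augmented state integrated with fixed-step RK4. The function `forward_sensitivity` and the damped-oscillator example with $\theta = (k, c)$ are hypothetical.

```python
import numpy as np


def forward_sensitivity(model, u0, du0_dtheta, theta, T, n_steps=1000):
    """Return J and dJ/dtheta by integrating the state, the sensitivities
    S = du/dtheta (n x p) and the loss integrals together with RK4."""
    f, df_du, df_dtheta, g, dg_du, dg_dtheta = model
    n, p = u0.size, theta.size

    def rhs(t, y):
        u = y[:n]
        S = y[n:n + n * p].reshape(n, p)                      # column i is s_i
        dS = df_du(u, t, theta) @ S + df_dtheta(u, t, theta)  # ds_i/dt for all i
        dJ = g(u, theta)                                      # integrand of J
        dG = dg_du(u, theta) @ S + dg_dtheta(u, theta)        # integrand of dJ/dtheta
        return np.concatenate([f(u, t, theta), dS.ravel(), [dJ], dG])

    y = np.concatenate([u0, du0_dtheta.ravel(), [0.0], np.zeros(p)])
    h = T / n_steps
    for k in range(n_steps):
        t = k * h
        k1 = rhs(t, y)
        k2 = rhs(t + 0.5 * h, y + 0.5 * h * k1)
        k3 = rhs(t + 0.5 * h, y + 0.5 * h * k2)
        k4 = rhs(t + h, y + h * k3)
        y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y[n + n * p], y[n + n * p + 1:]


# Hypothetical example: damped oscillator u = (x, v), x' = v, v' = -k x - c v,
# with theta = (k, c) and loss g = x^2 / 2.
model = (
    lambda u, t, th: np.array([u[1], -th[0] * u[0] - th[1] * u[1]]),  # f
    lambda u, t, th: np.array([[0.0, 1.0], [-th[0], -th[1]]]),        # df/du
    lambda u, t, th: np.array([[0.0, 0.0], [-u[0], -u[1]]]),          # df/dtheta
    lambda u, th: 0.5 * u[0] ** 2,                                    # g
    lambda u, th: np.array([u[0], 0.0]),                              # dg/du
    lambda u, th: np.zeros(2),                                        # dg/dtheta
)
theta = np.array([2.0, 0.3])
u0 = np.array([1.0, 0.0])
du0_dtheta = np.zeros((2, 2))  # u0 does not depend on theta here
J, dJ = forward_sensitivity(model, u0, du0_dtheta, theta, T=5.0)
```

Each additional parameter adds $n$ equations to the augmented state, which is where the $p$-fold cost comes from. Comparing `dJ` against central differences of `J` is a cheap way to validate hand-written Jacobians.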

Let us frame the task as an optimization problem:

$$ \min_{\theta} J(\mathbf{u}, \theta) $$

$$ \text{s.t.} \quad \frac{d \mathbf{u}}{dt} = f(\mathbf{u}, t, \theta) $$

This is an equality-constrained optimization problem, for which we form the Lagrangian

$$ \mathcal{L}(\mathbf{u}, \theta) = J(\mathbf{u}, \theta) + \int_0^T \lambda^T(t) \left( \frac{d \mathbf{u}}{dt} - f(\mathbf{u}, t, \theta) \right) dt $$

where $\lambda(t) \in \mathbb{R}^n$ is the time-dependent Lagrange multiplier, also called the adjoint variable. We can then take the total derivative of the Lagrangian with respect to $\theta$