<a href="https://colab.research.google.com/github/aidancrilly/AIMSLecture/blob/main/03_Assessment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assessment

## Quiz questions

1) Consider the following system of equations:

$$
  \frac{d^2}{dt^2}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -\lambda & \omega \\ \omega & -\lambda \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \ , \ x(0) = x_0 \ , \ y(0) = y_0
$$

(a) How many parameters would be solved for in the forward pass?

(b) How many parameters would be solved for in the reverse/adjoint pass?

***

Answer:

***

2) For the transport equation in exercise 2, consider the situation where the dimensionality is reduced further and one is only interested in the differential equation for $N$, defined as:

$$
 N(t) = \int_{-\infty}^{+\infty} \phi(x,t) dx
$$

Does this reduced model contain any unknowns which would need to be learnt? If so, what are they?

***



Answer:

***

3) (Multiple Choice, choose all that apply) Which of the following are valid finite-difference approximations of the first derivative, $df/dx$?

(a) $\frac{f_{i+1}-f_{i-1}}{\Delta x}$

(b) $\frac{f_{i+1}-2f_{i}+f_{i-1}}{\Delta x^2}$

(c) $\frac{f_{i+1}-f_{i}}{\Delta x}$

(d) $\frac{f_{i}-f_{i-1}}{\Delta x}$

***



Answer:

***

4) In the first exercise, we solved a simple decay problem using the forward Euler method:

$$ y^{i+1} = \left( 1 - \frac{dt}{\tau} \right) y^{i} $$

This is known as a 'time-explicit' method.

We could also take the backward finite difference of the time derivative, arriving at backward Euler (a 'time-implicit' method):

$$ y^{i+1} = \frac{y^{i}}{1 + \frac{dt}{\tau}} $$

(a) Which method gives a better answer for a single long time step ($dt \gg \tau$)?

(b) As in the lecture, we can compute the gradient w.r.t. $\tau$ across a single time step. For the forward Euler step, this was:

$$ \frac{d y^{i+1}}{d \tau} = \frac{dt}{\tau^2}y^{i} + \left( 1 - \frac{dt}{\tau} \right) \frac{d y^{i}}{d \tau} $$

Derive the equivalent expression for the backward Euler step.
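
As a sanity check on the forward Euler gradient above, the following minimal sketch propagates $y$ and $dy/d\tau$ together over several steps and compares the result with the derivative of the closed form $y^{n} = (1 - dt/\tau)^{n} y^{0}$. The function name and the numerical values are assumptions chosen for illustration, and $y^{0}$ is assumed to be independent of $\tau$. The same kind of check could be applied to your own backward Euler expression.

```python
def forward_euler_with_sensitivity(y0, tau, dt, n_steps):
    """Propagate y and dy/dtau together using the forward Euler step."""
    y, dy_dtau = y0, 0.0  # assumption: y0 does not depend on tau
    for _ in range(n_steps):
        # Gradient recurrence from the text; it needs y^i, so update it before y
        dy_dtau = (dt / tau**2) * y + (1.0 - dt / tau) * dy_dtau
        y = (1.0 - dt / tau) * y
    return y, dy_dtau

# Illustrative values only
y0, tau, dt, n_steps = 1.0, 2.0, 0.1, 10
y_n, dy_n = forward_euler_with_sensitivity(y0, tau, dt, n_steps)

# Closed form of the same scheme and its derivative with respect to tau
y_closed = (1.0 - dt / tau) ** n_steps * y0
dy_closed = n_steps * (1.0 - dt / tau) ** (n_steps - 1) * (dt / tau**2) * y0

print(y_n, y_closed)
print(dy_n, dy_closed)
```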

***



Answer:

***

5) Displayed are two computational graphs, one of which is the automatic differentiation (AD) of the other:

Graph 1:

![](https://drive.google.com/uc?export=view&id=1nA-WEbd08HmctWCY0I6Buv4mOhvSo80b)

Graph 2:
![](https://drive.google.com/uc?export=view&id=1WQQ4SpPpMXC3mUTnuaRtivV54Q-exqwH)


(a) What are the shapes of the inputs and outputs of these graphs?

(b) (Multiple Choice, choose one answer only) Which of the following is the base function:

(i) $f(\underline{x},\underline{y}) = \underline{y}^{T} \cdot \underline{x}$

(ii) $f(\underline{x},\underline{\underline{A}}) = \underline{\underline{A}}\cdot \underline{x}$

(iii) $f(\underline{x},\underline{\underline{A}}) = \underline{x}^{T} \cdot \underline{\underline{A}}\cdot \underline{x}$

(c) Which graph displays the AD of this function?

***

Answer:

***

6) Gradients are also useful for solving other problems in scientific computing beyond optimisation. One example is root-finding.

In root-finding, we want to find a zero of a function. For this question, let's consider just a simple 1D function $f(x)$:

$$ f(x) = 0 $$

We note that the optimisation problem can be recast as a root-finding problem, because an optimum of $g(x)$ satisfies:

$$ g'(x) = 0 $$

Therefore, if we take $f(x) \equiv g'(x)$ and root-find $f(x)$, this is equivalent to optimising $g(x)$.

The Newton-Raphson method is an iterative method that uses the local tangent to take a step towards the root:

$$ x_{n+1} = x_{n} - \frac{f(x_n)}{f'(x_{n})}$$
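
The Newton-Raphson update above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the test function $x^2 - 2$, its hand-written derivative, the starting point and the fixed iteration count are all hypothetical choices and are not part of the exercise below.

```python
def newton_raphson(f, fprime, x0, n_iter=6):
    """Repeat x <- x - f(x)/f'(x) for a fixed number of iterations."""
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / fprime(x)
    return x

# Hypothetical test function (not from the exercise): root at sqrt(2)
def f_example(x):
    return x**2 - 2.0

# Its derivative, written by hand for this illustration
def fprime_example(x):
    return 2.0 * x

root = newton_raphson(f_example, fprime_example, x0=1.0)
print(root)
```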

(a) (Multiple Choice, choose one answer only) This differs from gradient descent. Which of the following denotes the gradient descent update for the analogous optimisation problem of $g(x)$?

(i) $ x_{n+1} = x_{n} + \alpha f(x_{n+1})$

(ii) $ x_{n+1} = x_{n} - \alpha f(x_{n})$

(iii) $ x_{n+1} = x_{n} - \alpha f'(x_n)$

(b) Consider the following function (known as the Forrester function), where $x \in [0,1]$:

$$ g(x) = (6x - 2)^2 \sin(12x-4) $$

Below is some code written to optimise it via gradient descent and Newton-Raphson.

```python3
import jax
import jax.numpy as jnp

def g(x):
  # Forrester function
  return (6*x-2)**2*jnp.sin(12*x-4)

# Starting point
x = 0.0

def f(x):
  return ???

def fprime(x):
  return ???

# Gradient Descent (GD) step
x_GD = ???

# Newton-Raphson step
x_NR = x - f(x)/fprime(x)
```

(Multiple Choice, choose one answer only) What are the missing lines:

(i)
```python3
def f(x):
  return jax.grad(g)

def fprime(x):
  return jax.grad(f)

# Gradient Descent (GD) step
x_GD = x + 0.001*f(x)

# Newton-Raphson step
x_NR = x - f(x)/fprime(x)
```

(ii)
```python3
def f(x):
  return jax.grad(g)

def fprime(x):
  return jax.grad(f)

# Gradient Descent (GD) step
x_GD = x - 0.001*f(x)

# Newton-Raphson step
x_NR = x - f(x)/fprime(x)
```

(iii)
```python3
def f(x):
  return jax.grad(g)(x)

def fprime(x):
  return jax.grad(f)(x)

# Gradient Descent (GD) step
x_GD = x - 0.001*f(x)

# Newton-Raphson step
x_NR = x - f(x)/fprime(x)
```

(c) Can you identify a starting point, $x^*$, which will cause issues for both gradient descent and Newton-Raphson for the Forrester function? What is the origin of the issue?

***


Answer:

***

7) (Advanced) Consider the following differential equation (known as a neural differential equation):

$$
\frac{d y}{d t} = \mathcal{N}_\theta(y)
$$

where $\mathcal{N}_\theta(y)$ is a neural network and $y$ is a 1D vector.

(a) (Multiple Choice, choose one answer only) If we use constant Euler time stepping ($\Delta t$ = const.) to solve this equation, what class of neural network does this resemble:

$$
  y_{t+1} = y_{t} + \Delta t \mathcal{N}_\theta(y_{t})
$$

(i) Convolutional Neural Network

(ii) Recurrent Neural Network

(iii) Feedforward Neural Network

(b) (True/False) If the neural network $\mathcal{N}_\theta(y)$ contains a single convolutional layer, kernel size = 3 with 1 input channel ($y$) and 1 output channel ($dy/dt$), one can learn a heat equation for y:

$$
\frac{d y}{d t} = D \frac{d^2 y}{d x^2}
$$

***

Answer:

## Coding exercise
Scenario: we are presented with data giving the abundance of 30 different materials as a function of time. These materials are part of a reaction network which involves only simple decays. There are 30 datasets, each starting at time = 0 with only one of the materials present. A mathematical model of this system is therefore:

$$
\frac{d \underline{f}}{dt} = \underline{\underline{D}} \cdot \underline{f} \ , \\\underline{f}_0(t=0) = [1,0,...,0]\ , \ \underline{f}_1(t=0) = [0,1,...,0] \ \mathrm{etc.}
$$

where the matrix $\underline{\underline{D}}$ gives the various rates of decay between the materials. To properly represent the physics, $\underline{\underline{D}}$ must have the following properties:

1. $\underline{\underline{D}}$ has semi-positive ($\geq$ 0) off-diagonal elements.
2. The diagonal elements obey the relation $D_{ii} = -\sum_{j \ne i} D_{ji}$.
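
As a small illustration of the two properties above, the sketch below builds a $3 \times 3$ example in plain Python. The variable names and rate values are hypothetical and are not taken from the data. Filling the diagonal from the second property makes every column sum to zero, so the components of $d\underline{f}/dt$ sum to zero and the total abundance is conserved by the model.

```python
# Hypothetical off-diagonal rates for N = 3 materials (illustrative values only)
N = 3
D_example = [
    [0.0, 0.25, 0.0],
    [0.5, 0.0, 0.125],
    [0.25, 0.0, 0.0],
]

# Property 2: D_ii = -sum_{j != i} D_ji, i.e. minus the sum of column i
for i in range(N):
    D_example[i][i] = -sum(D_example[j][i] for j in range(N) if j != i)

# Property 1: every off-diagonal element is >= 0
off_diagonal_ok = all(
    D_example[i][j] >= 0.0 for i in range(N) for j in range(N) if i != j
)

# Each column should now sum to zero
column_sums = [sum(D_example[j][i] for j in range(N)) for i in range(N)]

print(off_diagonal_ok)
print(column_sums)
```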


Unfortunately, we do not know any of the rate coefficients a priori. Your task is to complete the code below in order to learn the elements of $\underline{\underline{D}}$ from the data.

Hints:

1. The off-diagonal elements in $\underline{\underline{D}}$ are ≲ 1
2. The fraction of different materials can vary by orders of magnitude, but we aim to minimise relative error. Choose a loss function which can accomplish this; a standard mean squared error (MSE) might not be the best choice.
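
To see why the second hint matters, the short sketch below uses hypothetical abundances spanning several orders of magnitude (illustrative numbers, not the training data) and a prediction that is 10% too high for every material. It only compares how much each material contributes to a plain squared error; it is not a suggested loss function.

```python
# Hypothetical abundances spanning orders of magnitude (illustrative values only)
truth = [1.0, 1e-2, 1e-4]

# A prediction that is 10% too high for every material
prediction = [1.1 * value for value in truth]

# Contribution of each material to a plain squared error
squared_errors = [(p - t) ** 2 for p, t in zip(prediction, truth)]

# Relative error of each material, for comparison
relative_errors = [abs(p - t) / t for p, t in zip(prediction, truth)]

print(squared_errors)   # dominated by the most abundant material
print(relative_errors)  # the same size for every material
```

Although every material has the same relative error, the squared error of the most abundant material is many orders of magnitude larger than that of the least abundant one.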

```python3
# Import necessary libraries
import ***

def reaction_equation_dydt(t,y,args):
  """
  To be completed, N.B. reaction matrix D is in args
  """
  return res

def construct_D_matrix(D_coefs,N):
  """
  Don't alter this code!

  D_coefs : 1D JAX array of length N**2-N
  N : number of materials

  This code creates the D matrix based on the coefficients

  Semi-positivity is enforced by squaring the coefficients
  The diagonal is constructed from the off-diagonal elements
  """
  # Make semi-positive using squaring operation
  D_semipos = D_coefs**2

  # Create indices for the upper triangular part (excluding diagonal)
  rows, cols = jnp.triu_indices(N, k=1)

  matrix = jnp.zeros((N,N))

  # Fill the upper triangle
  matrix = matrix.at[rows, cols].set(D_semipos[:len(rows)])

  # Fill the lower triangle
  matrix = matrix.at[cols, rows].set(D_semipos[len(rows):])

  diags = jnp.arange(N)
  matrix = matrix.at[diags,diags].set(-jnp.sum(matrix,axis=0))

  return matrix
```

```python3
# Load training data
training_data = jnp.load('fs_truth.npy')
_,Nt,Nreac = training_data.shape

t0 = 0.0
t1 = 5.0
dt0 = 0.01

ireacs = jnp.arange(Nreac)

# At what time points you want to save the solution
saveat = diffrax.SaveAt(ts=jnp.linspace(t0,t1,Nt))

def solve_reaction_equation(D_coeffs,Nreac,ireac0):
  """
  To be completed:

   - Set the correct initial conditions
   - Complete the diffrax diffeqsolve

  D_coeffs : 1D JAX array of length N**2-N, D matrix coefficients
  Nreac : number of materials
  ireac0 : index of reaction which has a non-zero initial condition

  """
  y0 =
  args = {'D' : construct_D_matrix(D_coeffs,Nreac)}
  sol = diffrax.diffeqsolve(...)

  return sol.ys

# Maps solve_reaction_equation over the different initial conditions
# Note that D_coeffs,Nreac are not mapped over as these are the same for each set of initial conditions
# The output of this vmapped_solve will be shape (Nreac,Nt,Nreac), the same as the training data
vmapped_solve = lambda D,N : jax.vmap(solve_reaction_equation,
                                      in_axes=(None,None,0))(D,N,ireacs)
```

```python3
def reaction_loss(D_coeffs,Nreac,data):
  fs = vmapped_solve(D_coeffs,Nreac)
  loss = # Error between fs and data
  return loss
```
