<a href="https://colab.research.google.com/github/RCortez25/Scientific-Machine-Learning/blob/main/Differential_equations/1DWaveEquation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

# Code walkthrough

In [None]:
# A
using NeuralPDE, Lux, Optimization, OptimizationOptimJL, Plots

# B
import ModelingToolkit: Interval



*   **A**: packages used in this problem
    * `NeuralPDE` for defining PDEs symbolically and training PINNs
    * `Lux`, a lightweight NN library, for defining the PINN architecture
    * `Optimization`, the common interface for setting up and minimizing the loss that `NeuralPDE` builds from the `Lux` network
    * `OptimizationOptimJL` for plugging `Optim.jl`'s algorithms into the `Optimization` interface
    * `Plots` for visualizing the results
*   **B**: `import A: B` brings only `B` into scope. Here only `Interval` is imported, which is used to specify the domains of the independent variables.
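The selective-import form described in **B** can be tried on a standard-library module. This is only an illustrative sketch: `LinearAlgebra` and `norm` are stand-ins chosen because they ship with Julia, and they are not part of the original cell.

```julia
# Selective import: only `norm` is brought into scope,
# not the rest of the names that LinearAlgebra exports.
import LinearAlgebra: norm

v = [3.0, 4.0]
len = norm(v)   # Euclidean length of v

println("Length of v: ", len)
```

By contrast, `using LinearAlgebra` would bring every exported name of the module into scope at once.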



In [None]:
@parameters x t
@variables u(..)
Dt = Differential(t)
Dtt = Differential(t)^2
Dxx = Differential(x)^2

We start setting up the problem.


*   `@parameters` specifies the independent variables, in this case $x$ and $t$
*   `@variables` specifies the state variable `u(..)`, where `(..)` means that the arguments will be supplied later, as in `u(x, t)`
*   `Differential` builds the differential operators used in the problem: `Dt` is $\partial_t$, `Dtt` is $\partial_{tt}$ and `Dxx` is $\partial_{xx}$. Older ModelingToolkit releases offered a macro shorthand for the same definitions,

`@derivatives Dt'~t`
`@derivatives Dtt''~t`
`@derivatives Dxx''~x`

but that macro is no longer available in recent releases, so the explicit form is used here.

In [None]:
# A
boundary_conditions = [
    u(0, t) ~ 0,
    u(1, t) ~ 0,
    u(x, 0) ~ x * (1 - x),
    Dt(u(x, 0)) ~ 0
]

# B
domains = [
    x ∈ Interval(0.0, 1.0),
    t ∈ Interval(0.0, 1.0)
]



**A**: Definition of the initial and boundary conditions. We have
* $u(0,t)=0$ at the left boundary $x=0$. This is a Dirichlet BC and, together with the next one at $x=1$, it fixes the wave at both ends.
* $u(1,t)=0$ at the right boundary $x=1$, also a Dirichlet BC.
* $u(x,0)=x(1-x)$, the initial displacement at $t=0$. The initial shape of the wave is a parabola.
* $\partial_t u(x,0)=0$, the initial velocity at $t=0$: the wave is released from rest.


**B**: Definition of the rectangular domain $(x,t)\in[0,1]\times[0,1]$.
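A small plain-Julia check, not part of the original notebook, that the conditions in **A** are compatible with each other: the initial parabola has to vanish at both ends, otherwise it would contradict the Dirichlet conditions at $t=0$. The helper name `u0` is hypothetical.

```julia
# Initial displacement from the conditions above: u(x, 0) = x(1 - x)
u0(x) = x * (1 - x)

# The Dirichlet conditions u(0, t) = u(1, t) = 0 must also hold at t = 0,
# so the initial profile has to be zero at both ends of the domain.
left_end  = u0(0.0)
right_end = u0(1.0)

# The parabola reaches its maximum at the midpoint.
peak = u0(0.5)

println("u0(0) = $left_end, u0(1) = $right_end, u0(0.5) = $peak")
```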

In [None]:
# A
const c = 1.0

# B
equation = [
    Dtt(u(x, t)) ~ (c^2) * Dxx(u(x, t))
]

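**A**: Wave speed, in this case $1$ m/s. It is declared with `const` because it lives at global scope: a constant global has a fixed type, which lets Julia compile fast, type-stable code for any function that uses it.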

**B**: Definition of the wave equation in symbolic form

$$
\partial_{tt}u=c^2\partial_{xx}u
$$
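As a quick numerical sanity check of the equation, one can verify that a standing wave $\sin(n\pi x)\cos(n\pi c t)$ satisfies it. The sketch below uses central finite differences in plain Julia; the names `wave_speed`, `standing_wave`, `second_diff`, and the evaluation point are all hypothetical choices for illustration and do not come from the notebook.

```julia
wave_speed = 1.0   # same value as `c` in the cell above
n = 3              # mode number, chosen arbitrarily for this check

standing_wave(x, t) = sin(n * pi * x) * cos(n * pi * wave_speed * t)

# Central second difference of f at s with step h: approximates f''(s)
second_diff(f, s, h) = (f(s + h) - 2 * f(s) + f(s - h)) / h^2

x0, t0, h = 0.3, 0.4, 1e-4   # arbitrary interior point and step size

u_tt = second_diff(t -> standing_wave(x0, t), t0, h)
u_xx = second_diff(x -> standing_wave(x, t0), x0, h)

# Residual of the wave equation; it should be close to zero
residual = u_tt - wave_speed^2 * u_xx
println("PDE residual at (x0, t0): ", residual)
```

The PINN is trained to drive the same kind of residual towards zero, with the derivatives taken of the network output instead of a known function.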

In [None]:
# A
input_dimension = 2 # x and t
output_dimension = 1

# B
layer_0 = Lux.Dense(input_dimension, 16, Lux.tanh)
layer_1 = Lux.Dense(16, 16, Lux.tanh)
layer_2 = Lux.Dense(16, output_dimension)

# C
NN = Lux.Chain(
    layer_0,
    layer_1,
    layer_2
)

**A** - The input dimension for the neural network will be 2, which is the number of independent variables $x$ and $t$. The output will be the value of the field $u(x,t)$, that is, the network maps

$$\mathbb{R}^2\to\mathbb{R},\quad (x,t)\mapsto u(x,t)$$

because $u$ is a scalar field.

**B** - Creating the fully connected layers of the NN, in the form `Lux.Dense(in, out, activation)`.
*   `layer_0` accepts 2 inputs $(x,t)$, has 16 outputs (an arbitrary width that can be changed), and uses `Lux.tanh`, the tanh activation function.
*   `layer_1` accepts 16 inputs from the previous layer, has 16 outputs (again an arbitrary width), and also uses `Lux.tanh`.
*   `layer_2` accepts 16 inputs from the previous layer and has 1 output, the value of the field $u(x,t)$. This layer is linear (no activation function) because the prediction can be any real number: there is no need for an activation such as tanh or ReLU, which would constrain the output to a restricted range.

**C** - Assembling the NN with the defined layers. The weights and biases are not created yet, this is just the architecture.
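To get a feel for the size of this architecture, the sketch below counts its trainable parameters and runs one forward pass in plain Julia. It deliberately does not use Lux: the weights are random placeholders, and the names `widths`, `dense_param_count`, and `forward` are hypothetical. Lux's own initialization and parameter layout may differ.

```julia
# Layer widths of the network described above: 2 inputs -> 16 -> 16 -> 1 output
widths = [2, 16, 16, 1]
n_layers = length(widths) - 1

# A dense layer with n_in inputs and n_out outputs has an n_out×n_in weight
# matrix plus n_out biases.
dense_param_count(n_in, n_out) = n_out * n_in + n_out

per_layer = [dense_param_count(widths[i], widths[i + 1]) for i in 1:n_layers]
total_params = sum(per_layer)

# Random placeholder weights and zero biases (not trained values)
weights = [randn(widths[i + 1], widths[i]) for i in 1:n_layers]
biases  = [zeros(widths[i + 1]) for i in 1:n_layers]

# Forward pass: tanh on the hidden layers, linear output layer
function forward(point)
    h = point
    for i in 1:(n_layers - 1)
        h = tanh.(weights[i] * h .+ biases[i])
    end
    return weights[end] * h .+ biases[end]
end

output = forward([0.5, 0.2])   # one (x, t) point in, a 1-element vector out

println("Parameters per layer: ", per_layer)
println("Total trainable parameters: ", total_params)
```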

In [None]:
#A
dx = 0.1

# B
discretization = PhysicsInformedNN(NN, GridTraining(dx))

**A** - Choosing the grid spacing for the domains. With `dx = 0.1` the grid values are $0.0, 0.1, 0.2, ..., 1.0$ along each axis $x$ and $t$. That is 11 values per axis, so there are $11\times11=121$ grid points in total (interior plus boundary). A smaller `dx` gives better resolution (more grid points) at a higher computational cost.

**B** - `PhysicsInformedNN` takes the architecture and a training strategy (here `GridTraining`; other strategies exist) and creates a discretization object that is used to build the loss (PDE residuals + BC/IC residuals).
*   `GridTraining(dx)` places the training points on a Cartesian grid with spacing `dx`. Points inside the domain enforce the PDE residual, while points on the boundary and on the initial line $t=0$ enforce the BCs and ICs defined above. Other strategies, such as `QuasiRandomTraining`, sample the domain with space-filling quasi-random points instead of a fixed grid.
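The point count quoted in **A** can be reproduced with a few lines of plain Julia. This sketch only counts the geometry of the grid; how NeuralPDE internally assigns points to the PDE and BC/IC terms may differ. The names `x_grid`, `t_grid`, and `on_edge` are hypothetical.

```julia
dx = 0.1
x_grid = 0.0:dx:1.0
t_grid = 0.0:dx:1.0

# All (x, t) pairs on the Cartesian grid
grid_points = [(x, t) for x in x_grid, t in t_grid]

# A point lies on the edge of the rectangle if either coordinate
# sits at an end of its range.
on_edge((x, t)) = x == first(x_grid) || x == last(x_grid) ||
                  t == first(t_grid) || t == last(t_grid)

n_total    = length(grid_points)
n_edge     = count(on_edge, grid_points)
n_interior = n_total - n_edge

println("total = $n_total, edge = $n_edge, interior = $n_interior")
```

Halving `dx` roughly quadruples the number of points in this two-dimensional domain, which is the compute trade-off mentioned above.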

In [None]:
# A
@named pde_system = PDESystem(
    equation,
    boundary_conditions,
    domains,
    [x, t],
    [u(x, t)]
)

# B
problem = discretize(pde_system, discretization)

**A** - Assembling the PDE system with the `PDESystem` constructor, passing it the equation (or system of equations) to be solved, the BCs/ICs, the domains of the independent variables, the independent variables, and the dependent variables. `@named` gives the system an internal name for bookkeeping.

**B** - Producing the problem to be solved by applying the "instructions" of the `discretization` object to the PDE system just created. The returned object is compatible with `Optimization.solve` and contains:
*   The NN forward pass
*   The residuals at all sampled points
*   The summed loss ready for an optimizer
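The structure of such a loss can be sketched in plain Julia. This is an assumption-laden illustration, not NeuralPDE's implementation: derivatives are taken with finite differences rather than automatic differentiation, every term has equal weight, and the names `sketch_loss`, `mean_sq`, `series`, and `zero_field` are hypothetical.

```julia
wave_speed = 1.0
h = 1e-3                 # finite-difference step (illustration only)
xs = 0.0:0.1:1.0
ts = 0.0:0.1:1.0

mean_sq(values) = sum(abs2, values) / length(values)

# Sum of mean squared residuals for a candidate field u_hat(x, t)
function sketch_loss(u_hat)
    u_tt(x, t) = (u_hat(x, t + h) - 2 * u_hat(x, t) + u_hat(x, t - h)) / h^2
    u_xx(x, t) = (u_hat(x + h, t) - 2 * u_hat(x, t) + u_hat(x - h, t)) / h^2
    u_t(x, t)  = (u_hat(x, t + h) - u_hat(x, t - h)) / (2 * h)

    pde   = mean_sq([u_tt(x, t) - wave_speed^2 * u_xx(x, t) for x in xs, t in ts])
    left  = mean_sq([u_hat(0.0, t) for t in ts])                # u(0, t) = 0
    right = mean_sq([u_hat(1.0, t) for t in ts])                # u(1, t) = 0
    shape = mean_sq([u_hat(x, 0.0) - x * (1 - x) for x in xs])  # u(x, 0) = x(1 - x)
    rest  = mean_sq([u_t(x, 0.0) for x in xs])                  # ∂ₜu(x, 0) = 0
    return pde + left + right + shape + rest
end

# Two candidates: a truncated Fourier series of the exact solution, and the zero field
series(x, t) = sum(8 / (n * pi)^3 * cos(n * pi * wave_speed * t) * sin(n * pi * x)
                   for n in 1:2:99)
zero_field(x, t) = 0.0

loss_series = sketch_loss(series)
loss_zero   = sketch_loss(zero_field)

println("loss of truncated series: ", loss_series)
println("loss of zero field:       ", loss_zero)
```

The zero field satisfies the PDE and the boundary conditions but not the initial displacement, so only the combined loss separates it from a good candidate.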

In [None]:
# A
optimizer = OptimizationOptimJL.BFGS()

# B
callback = function (p, loss)
    println("Current loss: $loss")
    return false
end

# C
solution = Optimization.solve(problem, optimizer, callback = callback, maxiters = 1000)

# D
phi = discretization.phi

**A** - Choosing the quasi-Newton BFGS optimizer, which uses the gradient plus an internal approximation of the Hessian. Other common choices are `LBFGS` from the same package or Adam (available, for example, through `OptimizationOptimisers`).

**B** - The *callback* is a function that the optimizer calls after each step. The arguments are:
*   `p`: the current parameter vector, that is, all the weights and biases flattened. In newer `Optimization.jl` releases this first argument is a state object that carries the vector in its `u` field.
*   `loss`: the current loss value (PDE + IC/BC residuals)

In this case one just prints the loss value for tracking. Returning `false` tells the optimizer to keep running; returning `true` stops it early.

**C** - This kicks off training and returns the solution object, which contains
*   `solution.minimizer`, which stores the trained parameter vector, that is, the trained weights and biases
*   `solution.minimum`, the final loss value
*   `solution.retcode`, the termination reason, e.g., `Success`

**D** - This is the model evaluator of the discretization. It is used for evaluating the field on the grid points or on any other points for plotting and validation, that is, `phi` is $\hat{u}(x,t)$ for any given point $(x,t)$. In the case of PINNs it needs both the input point and the trainable parameters, so it is more like $\hat{u}(x,t,\theta)$ for the vector of parameters $\theta$.
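The callback from **B** can do more than print. The sketch below records the loss history and requests an early stop once a threshold is reached. It runs without any optimizer: the loop feeds the callback a made-up loss sequence, and `loss_history`, `tolerance`, and `recording_callback` are hypothetical names.

```julia
loss_history = Float64[]   # filled in by the callback
tolerance = 1e-6           # hypothetical stopping threshold

recording_callback = function (state, loss)
    push!(loss_history, loss)
    return loss < tolerance   # `true` asks the optimizer to stop
end

# Stand-in for the optimizer loop: a made-up, decreasing loss sequence.
# The first argument is unused here, so `nothing` is passed.
for fake_loss in (1e-1, 1e-3, 1e-5, 1e-7, 1e-9)
    stop = recording_callback(nothing, fake_loss)
    stop && break
end

println("Recorded losses: ", loss_history)
```

In the notebook, such a function would be passed as `callback = recording_callback` in the `Optimization.solve` call, and `loss_history` could then be plotted to inspect convergence.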

In [None]:
u_exact(x, t; K=100) = begin
    s = 0.0
    for k in 0:K-1
        n = 2k + 1
        coefficient = 8.0 / ((pi*n)^3)
        term = coefficient * cos(n*pi*c*t) * sin(n*pi*x)
        s += term
    end
    s
end

This function evaluates the analytic solution at a given point. The solution is given by

$$
u(x,t)=\sum_{n=1,3,5,\dots}^{\infty}{\frac{8}{(n\pi)^3}\cos(n\pi ct)\sin(n\pi x)}
$$

The loop takes the values of $x$ and $t$ and adds up the first `K` odd modes ($n=2k+1$ for $k=0,\dots,K-1$, with `K=100` by default) to approximate the value of the field $u$ at that particular point. This is necessary for plotting against the predicted values.
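How many terms are enough can be checked at $t=0$, where the series has to reproduce the initial displacement $x(1-x)$. The sketch below is a standalone variant of the function above; `partial_sum` and `wave_speed` are hypothetical names used so that it runs on its own.

```julia
wave_speed = 1.0

# Partial sum with K odd modes (n = 1, 3, ..., 2K - 1)
partial_sum(x, t, K) = sum(8 / (n * pi)^3 * cos(n * pi * wave_speed * t) * sin(n * pi * x)
                           for n in 1:2:(2 * K - 1))

# At t = 0 the series must reproduce the initial displacement x(1 - x)
x_mid = 0.5
target = x_mid * (1 - x_mid)

for K in (1, 5, 25, 100)
    err = abs(partial_sum(x_mid, 0.0, K) - target)
    println("K = $K: |partial sum - x(1 - x)| = $err")
end
```

Because the coefficients decay like $1/n^3$, the error drops quickly as `K` grows.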

In [None]:
#A
θ = solution.minimizer

# B
x_dom, t_dom = domains
x_points = infimum(x_dom.domain):(dx/10):supremum(x_dom.domain)
t_points = infimum(t_dom.domain):(dx/10):supremum(t_dom.domain)

# C
u_predicted = [first(phi([x, t], θ)) for x in x_points, t in t_points]

# D
u_real = [u_exact(x, t; K=200) for x in x_points, t in t_points]

# E
difference_of_u = @. abs(u_predicted - u_real)

**A** - Retrieving the learned weights to use with the evaluator `phi`

**B** - Generating the range of values for the two domains, for plotting.
*   `x_dom, t_dom = domains` unpacks the domains as defined above
*   `infimum(x_dom.domain)` retrieves the lower bound of the $x$ domain, which is stored in the `domain` field
*   `dx/10` defines the step size, in this case $0.1/10=0.01$
*   `supremum(x_dom.domain)` retrieves the upper bound of the domain

The syntax is then `initial_point:step_size:final_point`. It is worth noting that in this case we already know the min and max values, so we could also write

```
x_points = 0.0:(dx/10):1.0
t_points = 0.0:(dx/10):1.0
```

but the form used in the code helps when one doesn't know those values.

**C** - A comprehension over every point $(x,t)$ of the evaluation grid `x_points, t_points`, creating a matrix whose rows follow `x` and whose columns follow `t`. At every grid point the evaluator is called with the point and the parameters, `phi([x, t], θ)`. Then `first(...)` extracts the first element of whatever `phi` returns: if it returns a one-element container such as `SVector(0.123)` or `[0.123]`, we want the value $0.123$ and not the container itself.

**D** - Evaluation of the analytical solution on the same grid as the predicted field. Here `K=200` terms are used; increase `K` until the series visually converges.

**E** - Elementwise operation for taking the absolute value of the difference between real and predicted values. This will also help for visualizing the differences between the two. The macro `@.` makes the operation elementwise.
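The same comparison can be summarized with a couple of scalar error measures. The sketch below uses a made-up reference field and a perturbed copy on a deliberately coarse grid, so it runs without a trained network; `reference`, `approx`, and the other names are hypothetical.

```julia
using LinearAlgebra: norm

xs = 0.0:0.25:1.0   # 5 points, coarse on purpose
ts = 0.0:0.5:1.0    # 3 points

reference(x, t) = x * (1 - x) * cos(pi * t)   # made-up smooth field
approx(x, t)    = reference(x, t) + 0.01 * x  # reference plus a known perturbation

# 5×3 matrices: rows follow x, columns follow t
u_ref = [reference(x, t) for x in xs, t in ts]
u_app = [approx(x, t) for x in xs, t in ts]

abs_error = @. abs(u_app - u_ref)                # elementwise absolute error
max_error = maximum(abs_error)                   # worst pointwise error
rel_l2    = norm(u_app - u_ref) / norm(u_ref)    # relative error in the Frobenius norm

println("matrix size: ", size(u_ref))
println("max abs error: ", max_error)
println("relative L2 error: ", rel_l2)
```

In the notebook, `u_real` and `u_predicted` would take the places of `u_ref` and `u_app`.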

## Plots

In [None]:
# A
cl = (minimum(u_real), maximum(u_real))

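# B
# Plots expects the z-matrix with size (length(y), length(x)), i.e. rows follow
# the vertical axis. Our matrices have rows = x and columns = t, so they are
# transposed to put x on the horizontal axis and t on the vertical one.
p1 = plot(x_points, t_points, u_real';
          linetype=:contourf, title="Analytic",
          xlabel="x", ylabel="t", colorbar=true, clims=cl)

# C
p2 = plot(x_points, t_points, u_predicted';
          linetype=:contourf, title="Predicted",
          xlabel="x", ylabel="t", colorbar=true, clims=cl)

# D
p3 = plot(x_points, t_points, difference_of_u';
          linetype=:contourf, title="Error",
          xlabel="x", ylabel="t", colorbar=true,
          clims=(0, maximum(difference_of_u)))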

# E
plot(p1, p2, p3; layout=(1,3), link=:both, size=(1200, 380))

**A** - Retrieving the min and max values of the analytic solution to force both solution plots onto that scale, so that they can be compared directly.

**B** - Plotting the analytic solution. `clims` stands for _color limits_, and in this case we use the scale obtained with the min and max values of the analytic solution.

**C** - Same for the PINN predicted solution. Here we use the same color limits as the analytic solution so that the two can be compared.

**D** - Plot of the difference between analytic and the predicted solution, that is, the error. In this case, we force the scale to start at 0 up to the max value of the error (there are no values $<0$ because those were calculated using the absolute value).

**E** - Plotting the 3 plots in 1 row and 3 columns. `link=:both` keeps $x$ and $t$ axes synced for the 3 plots when zooming.

![Model Diagram](Images/1DWave.png)