<h1 style="text-align: center; font-size: 2.5em;"> Operator Learning - DeepONet </h1>
<h1 style="text-align: center; font-size: 2em;"> Problem 1.A (the antiderivative operator) </h1>

### 1. Operator Learning Formulation

- **differential equation**:

    $$
    \frac{ds(x)}{dx} = g(s(x), u(x), x), \quad x \in (0, 1]
    $$

- **initial condition**:

    $$
    s(0) = 0
    $$

- **target mapping**:

    $$
    u(x) \mapsto s(x), \quad \text{for all } x \in [0, 1]
    $$

- **simplification**:
    1. choosing:
        $$
        g(s(x), u(x), x) = u(x)
        $$

    2. the equation becomes:
        $$
        \frac{ds(x)}{dx} = u(x), \quad s(0) = 0
        $$

        whose solution is the antiderivative of $u$:

        $$
        s(x) = \int_0^x u(\tau)\, d\tau
        $$

    3. the **operator** $G$ to learn is defined as:

        $$
        G : u(x) \mapsto s(x) = \int_0^x u(\tau)\, d\tau
        $$

- **why it is simple and pedagogical**:
    1. **Explicit solution**:

        the ODE $\frac{ds(x)}{dx} = u(x)$ with $s(0) = 0$ has the closed-form solution $s(x) = \int_0^x u(\tau)\, d\tau$.

    2. **Linear operator**:  

        the operator $G : u \mapsto s$ is linear here (the authors go on to consider nonlinear and stochastic operators as well).

    3. **No coupling between $s$ and $u$**:  

        In more complex examples (e.g., Problem 1.B), $g$ depends on both $s(x)$ and $u(x)$, introducing feedback and nonlinearity.

    4. **One-dimensional domain**:  

        the domain $x \in [0, 1]$ is one-dimensional, so both $u$ and $s$ are scalar functions of a single variable, which is far simpler than PDEs over 2D or 3D spatial domains.

    5. **Pedagogical value**:

        the example shows the core ideas of DeepONet in isolation: it learns an operator that maps functions to functions, it separates the branch net from the trunk net, and it splits the work into an offline training stage and an online inference stage.
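
The closed-form solution and the linearity claim above can be checked numerically. The following is a minimal sketch and not part of the original notebook: the helper name `antiderivative`, the grid size, and the two test functions are assumptions chosen for illustration, and the integral is approximated with a cumulative trapezoidal rule.

```python
import numpy as np

def antiderivative(u_vals, x):
    """Approximate s(x) = int_0^x u(tau) dtau on the grid x (trapezoidal rule)."""
    dx = np.diff(x)
    # Area of each trapezoid between consecutive grid points
    areas = 0.5 * (u_vals[1:] + u_vals[:-1]) * dx
    # Prepend 0 so that the initial condition s(0) = 0 holds exactly
    return np.concatenate([[0.0], np.cumsum(areas)])

x = np.linspace(0.0, 1.0, 201)
u1 = np.cos(np.pi * x)   # illustrative input; exact antiderivative is sin(pi x) / pi
u2 = x ** 2              # second illustrative input for the linearity check

# Compare the numerical antiderivative with the closed form
s1 = antiderivative(u1, x)
max_err = np.max(np.abs(s1 - np.sin(np.pi * x) / np.pi))

# Linearity: G(a*u1 + b*u2) should equal a*G(u1) + b*G(u2)
a, b = 2.0, -3.0
lhs = antiderivative(a * u1 + b * u2, x)
rhs = a * antiderivative(u1, x) + b * antiderivative(u2, x)
linearity_gap = np.max(np.abs(lhs - rhs))

print("max error vs closed form:", max_err)
print("linearity gap:", linearity_gap)
```

The discretization error shrinks as the grid is refined, while the linearity gap stays at floating-point round-off because the trapezoidal rule is itself a linear map.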

### 2. DeepONet Architecture

DeepONet is trained to approximate the ground-truth solution:

$$
s(x) = \int_0^x u(\tau)\, d\tau
$$

- **Branch Net**: takes the values of the input function $u$ at fixed sensor points $x_1, \ldots, x_m$
- **Trunk Net**: takes an evaluation point $x$
- **Dot Product**: combines the branch and trunk outputs into the predicted value $s(x)$
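
In symbols, the three pieces combine as follows. The names $b_k$ and $t_k$ (the $k$-th outputs of the branch and trunk nets) and $p$ (their shared output dimension) are notation assumed here for illustration:

$$
G(u)(x) \approx \sum_{k=1}^{p} b_k\big(u(x_1), \ldots, u(x_m)\big)\, t_k(x)
$$

The branch net sees only the sampled input function and the trunk net sees only the evaluation point, so the two interact solely through this sum.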


### 3. Implementation
- their implementation: uses the DeepXDE library
- implementation here: uses PyTorch
    1. data generator for function pairs
    2. DeepONet model
    3. loss function (MSE)
    4. training
    5. testing
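
Step 1 could look roughly like the sketch below. The function space for $u$ is not specified in this section, so the sketch assumes a random sine series, for which the antiderivative is known exactly. The name `generate_pairs` and all of its parameters are hypothetical and chosen for illustration only.

```python
import math
import torch

def generate_pairs(n_funcs, m_sensors=100, n_modes=5, seed=0):
    """Return (u_sensors, y, s_y) for randomly drawn input functions.

    Assumed function family (not from the original notebook):
        u(x) = sum_k a_k * sin(k*pi*x)
    Its exact antiderivative with s(0) = 0 is:
        s(y) = sum_k a_k * (1 - cos(k*pi*y)) / (k*pi)
    """
    gen = torch.Generator().manual_seed(seed)
    k = torch.arange(1, n_modes + 1, dtype=torch.float32)   # [n_modes]
    a = torch.randn(n_funcs, n_modes, generator=gen)        # random coefficients
    x_sensors = torch.linspace(0.0, 1.0, m_sensors)         # fixed sensor locations

    # Branch input: u evaluated at the sensors, shape [n_funcs, m_sensors]
    u_sensors = a @ torch.sin(math.pi * k[:, None] * x_sensors[None, :])

    # Trunk input: one random evaluation point per function, shape [n_funcs, 1]
    y = torch.rand(n_funcs, 1, generator=gen)

    # Target: exact antiderivative at y, shape [n_funcs, 1]
    s_y = (a * (1.0 - torch.cos(math.pi * k * y)) / (math.pi * k)).sum(dim=1, keepdim=True)
    return u_sensors, y, s_y

u_sensors, y, s_y = generate_pairs(n_funcs=8)
print(u_sensors.shape, y.shape, s_y.shape)
```

Each row is one training triple: the sampled input function, an evaluation point, and the target value at that point. Using a closed-form target avoids numerical integration error in the labels.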

In [1]:
import torch
import torch.nn as nn

# BranchNet: encodes the input function u sampled at m sensor points
class BranchNet(nn.Module):
    def __init__(self, input_dim, latent_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, latent_dim)
        )

    def forward(self, u):
        return self.net(u)

# TrunkNet: encodes the evaluation point y (written x in the notes above)
class TrunkNet(nn.Module):
    def __init__(self, input_dim=1, latent_dim=30, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, latent_dim)
        )

    def forward(self, y):
        return self.net(y)

# DeepONet: combines branch and trunk outputs via dot product
class DeepONet(nn.Module):
    def __init__(self, branch_net, trunk_net):
        super().__init__()
        self.branch = branch_net
        self.trunk = trunk_net

    def forward(self, u, y):
        b = self.branch(u)  # [batch, p]
        t = self.trunk(y)   # [batch, p]
        return torch.sum(b * t, dim=-1, keepdim=True)  # [batch, 1]: one scalar per sample

    def encode_branch(self, u):
        return self.branch(u)

    def encode_trunk(self, y):
        return self.trunk(y)

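    def evaluate_with_gradients(self, u, y):
        # Work on detached copies so the caller's tensors are left untouched
        u = u.clone().detach().requires_grad_(True)
        y = y.clone().detach().requires_grad_(True)
        s = self.forward(u, y)
        # torch.autograd.grad returns the input gradients without accumulating
        # into the .grad fields of the model parameters (s.backward would)
        grad_u, grad_y = torch.autograd.grad(s, (u, y), grad_outputs=torch.ones_like(s))
        return {
            "value": s.detach(),
            "∂s/∂u": grad_u,
            "∂s/∂y": grad_y
        }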

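Steps 3 to 5 of the plan (MSE loss, training, testing) could then be sketched as follows. This is an illustrative sketch under stated assumptions, not the notebook's actual training code: the helpers `mlp`, `sample_batch`, and `predict` are hypothetical compact stand-ins for the branch net, trunk net, and dot product so that the snippet runs on its own, the input functions are assumed to be random sine series, and the optimizer, learning rate, batch size, and step count are arbitrary choices.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

def mlp(in_dim, out_dim, hidden=64):
    # Same layout as the branch/trunk nets: Linear -> Tanh -> Linear
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, out_dim))

def sample_batch(n, m=50, n_modes=3):
    """Assumed data: u(x) = sum_k a_k sin(k*pi*x), with its exact antiderivative."""
    k = torch.arange(1, n_modes + 1, dtype=torch.float32)
    a = torch.randn(n, n_modes)
    x = torch.linspace(0.0, 1.0, m)
    u = a @ torch.sin(math.pi * k[:, None] * x[None, :])        # [n, m] sensor values
    y = torch.rand(n, 1)                                        # [n, 1] evaluation points
    s = (a * (1 - torch.cos(math.pi * k * y)) / (math.pi * k)).sum(1, keepdim=True)
    return u, y, s

m, p = 50, 30                       # sensor count and latent dimension (illustrative)
branch, trunk = mlp(m, p), mlp(1, p)
params = list(branch.parameters()) + list(trunk.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

def predict(u, y):
    # Dot product of branch and trunk embeddings, one scalar per sample
    return (branch(u) * trunk(y)).sum(dim=-1, keepdim=True)

# Held-out functions for testing
u_test, y_test, s_test = sample_batch(512)
with torch.no_grad():
    initial_test_loss = loss_fn(predict(u_test, y_test), s_test).item()

# Training: a fresh batch of random functions at every step
for step in range(500):
    u, y, s = sample_batch(256)
    loss = loss_fn(predict(u, y), s)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    final_test_loss = loss_fn(predict(u_test, y_test), s_test).item()

print(f"test MSE before training: {initial_test_loss:.4e}")
print(f"test MSE after training:  {final_test_loss:.4e}")
```

Drawing new functions at every step means the test set measures generalization to unseen inputs from the same family, not to a different function space.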