# DeepONet for Acoustic Wave Propagation - Hands-On Tutorial

**PhD Autumn School - Scientific Machine Learning**

## Learning Objectives

In this 1-hour hands-on exercise, you will:
1. Understand the DeepONet architecture for operator learning
2. Implement branch and trunk networks in PyTorch
3. Implement the inner product operation that combines branch and trunk outputs
4. Compare different activation functions (ReLU vs Sine)
5. Investigate the impact of Fourier feature expansions

## Introduction to DeepONet

**Deep Operator Networks (DeepONet)** learn operators that map functions to functions, rather than mapping fixed-size vectors to vectors as standard neural networks do.

### The Problem

In acoustic wave propagation, we want to learn the operator $\mathcal{G}$ that maps:
- **Input function** $u(x, y)$: Initial pressure distribution over the 2D domain (source configuration)
- **Output function** $s(x, y, t)$: Pressure field at any location $(x, y)$ and time $t$

Mathematically: $s = \mathcal{G}(u)$

### DeepONet Architecture

```
                    ┌─────────────┐
u(x,y) ───────────> │ Branch Net  │ ───> [b₁, b₂, ..., bₚ]
  (function)        └─────────────┘            │
                                               │
                                               ├──> Inner Product ──> s(y)
                                               │
y = (x,y,t) ────> ┌─────────────┐            │
  (coordinates)   │  Trunk Net  │ ───> [t₁, t₂, ..., tₚ]
                  └─────────────┘
```

- **Branch network**: Encodes the input function $u$ into a latent representation
- **Trunk network**: Encodes the query coordinates $(x, y, t)$
- **Inner product**: Combines both representations: $s(y) = \sum_{i=1}^{p} b_i(u) \cdot t_i(y) + b_0$, where $y$ here denotes the full query point $(x, y, t)$ and $b_0$ is a learnable scalar bias
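
For a single source and a handful of query points, the sum over the latent index reduces to a matrix-vector product. The following is a minimal sketch with made-up numbers; the values of `b`, `t`, and `b0` are illustrative and not outputs of a trained network:

```python
import torch

b = torch.tensor([1.0, 2.0, 3.0])      # branch output for one source u, shape [p] with p = 3
t = torch.tensor([[1.0, 0.0, 0.0],     # trunk outputs for 2 query points, shape [2, p]
                  [0.5, 0.5, 0.5]])
b0 = torch.tensor(0.1)                 # scalar bias

# s(y_j) = sum_i b_i * t_i(y_j) + b0, one value per query point
s = t @ b + b0
print(s)  # tensor([1.1000, 3.1000])
```

The batched version used later in this tutorial carries an additional leading dimension for the different sources.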

### Key Insight

By learning this decomposition, the network can:
- Generalize to new source configurations $u$ (never seen during training)
- Evaluate at arbitrary query locations $(x, y, t)$
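- Once trained, predict the pressure field far more cheaply than re-running a numerical PDE solver for each new source (generating the training simulations is still a one-off cost)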

## Setup and Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import sys

# Add project root to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

# Import project data handlers
from deeponet_acoustics.datahandlers.datagenerators import (
    DataH5Compact,
    DatasetStreamer,
    pytorch_collate  # PyTorch collator for data loading
)

print(f"PyTorch version: {torch.__version__}")
print(f"Device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

## 1. Data Loading (Pre-filled)

We'll use the existing data loading infrastructure to load 2D acoustic wave propagation data.
The data consists of:
- Training simulations with different source configurations
- Pressure fields computed at various spatial locations and time steps

In [None]:
# NOTE: Update these paths to point to your data
train_data_path = "path/to/training/data"  # TODO: Update this path
test_data_path = "path/to/test/data"       # TODO: Update this path

# Load training data
data_train = DataH5Compact(
    train_data_path,
    tmax=0.015,      # Maximum time to consider
    t_norm=343.0,    # Speed of sound for normalization
    flatten_ic=True, # Flatten initial conditions
    norm_data=True,  # Normalize spatial coordinates
)

# Load test data
data_test = DataH5Compact(
    test_data_path,
    tmax=0.015,
    t_norm=343.0,
    flatten_ic=True,
    norm_data=True,
)

print(f"Training sources: {data_train.N}")
print(f"Test sources: {data_test.N}")
print(f"Mesh points: {data_train.P_mesh}")
print(f"Time steps: {len(data_train.tsteps)}")
print(f"Input (u) shape: {data_train.u_shape}")

In [None]:
# Create datasets and dataloaders
batch_size_branch = 4    # Number of different sources per batch
batch_size_coord = 100   # Number of coordinate points per batch

dataset_train = DatasetStreamer(data_train, batch_size_coord=batch_size_coord)
dataset_test = DatasetStreamer(data_test, batch_size_coord=-1)  # Full dataset for testing

# Use pytorch_collate to convert to PyTorch tensors
dataloader_train = DataLoader(
    dataset_train,
    batch_size=batch_size_branch,
    shuffle=True,
    collate_fn=pytorch_collate,
    drop_last=True,
)

dataloader_test = DataLoader(
    dataset_test,
    batch_size=1,
    shuffle=False,
    collate_fn=pytorch_collate,
)

# Get a sample batch to understand dimensions
sample_batch = next(iter(dataloader_train))
(u_sample, y_sample), s_sample, _, _ = sample_batch

print("\nBatch shapes:")
print(f"  u (branch input): {u_sample.shape}  # [batch_branch, u_dim]")
print(f"  y (trunk input):  {y_sample.shape}  # [batch_branch, batch_coord, coord_dim]")
print(f"  s (output):       {s_sample.shape}  # [batch_branch, batch_coord]")

## 2. Fourier Feature Expansion (Pre-filled)

Fourier features help neural networks learn high-frequency functions by mapping inputs through:
$$\text{FourierFeatures}(y) = [y, \cos(2\pi f_1 y), \sin(2\pi f_1 y), \cos(2\pi f_2 y), \sin(2\pi f_2 y), \ldots]$$

This is particularly useful for learning periodic or oscillatory patterns like acoustic waves.

In [None]:
def fourier_feature_expansion(freqs=()):
    """
    Create a Fourier feature expansion function.
    
    Args:
        freqs: List of frequencies for Fourier features
               Empty list means no Fourier features (just return input)
    
    Returns:
        Function that applies Fourier feature expansion
    """
    if len(freqs) == 0:
        return lambda y: y
    
    def expand(y):
        # y shape: [batch, coord_dim] or [batch, n_points, coord_dim]
        features = [y]
        for f in freqs:
            features.append(torch.cos(2 * np.pi * f * y))
            features.append(torch.sin(2 * np.pi * f * y))
        return torch.cat(features, dim=-1)
    
    return expand

# Example: no Fourier features
feat_fn_none = fourier_feature_expansion(freqs=[])

# Example: with Fourier features at specific frequencies
feat_fn_fourier = fourier_feature_expansion(freqs=[1.0, 2.0])

# Test
test_input = torch.randn(4, 3)  # [batch=4, coord_dim=3]
print(f"Input shape: {test_input.shape}")
print(f"Without Fourier features: {feat_fn_none(test_input).shape}")
print(f"With Fourier features: {feat_fn_fourier(test_input).shape}  # 3 + 2*2*3 = 15")

## 3. Network Components - YOUR TASK

Now it's your turn! Implement the branch and trunk networks.

### Task 3.1: Branch Network

The branch network takes the initial condition $u$ (a flattened vector) and produces a latent representation.

**TODO**: Complete the `BranchNet` class below.

In [None]:
class BranchNet(nn.Module):
    """
    Branch network for DeepONet.
    
    Architecture: input -> hidden -> hidden -> ... -> output
    """
    def __init__(self, input_dim, hidden_dim, output_dim, num_hidden_layers, activation='relu'):
        """
        Args:
            input_dim: Dimension of input function u
            hidden_dim: Number of neurons in hidden layers
            output_dim: Dimension of latent representation (p)
            num_hidden_layers: Number of hidden layers
            activation: 'relu' or 'sine'
        """
        super().__init__()
        
        # TODO: Create a list of layers
        # Structure: input_dim -> hidden_dim -> hidden_dim -> ... -> output_dim
        # Hint: Use nn.Linear for each layer
        # Hint: You'll need num_hidden_layers + 1 total layers
        
        layers = []
        
        # === YOUR CODE HERE ===
        # First layer: input_dim -> hidden_dim
        
        
        # Hidden layers: hidden_dim -> hidden_dim (repeat num_hidden_layers-1 times)
        
        
        # Output layer: hidden_dim -> output_dim
        
        # === END YOUR CODE ===
        
        self.layers = nn.ModuleList(layers)
        
        # Select activation function
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'sine':
            self.activation = torch.sin
        else:
            raise ValueError(f"Unknown activation: {activation}")
    
    def forward(self, u):
        """
        Forward pass through branch network.
        
        Args:
            u: Input function [batch_size, input_dim]
        
        Returns:
            Branch latent representation [batch_size, output_dim]
        """
        # TODO: Implement forward pass
        # Apply activation after each layer EXCEPT the last one
        
        x = u
        
        # === YOUR CODE HERE ===
        
        
        # === END YOUR CODE ===
        
        return x

### Task 3.2: Trunk Network

The trunk network takes coordinate points $(x, y, t)$ and produces a latent representation.

**TODO**: Complete the `TrunkNet` class below. It's very similar to BranchNet!
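
One PyTorch detail is useful here: `nn.Linear` acts on the last dimension of its input and treats all leading dimensions as batch dimensions, so a tensor of shape `[batch_size, n_points, input_dim]` can be passed through without reshaping. A quick check with arbitrary sizes:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 8)            # maps the last dimension: 3 -> 8
y = torch.randn(4, 50, 3)          # [batch_size=4, n_points=50, coord_dim=3]

out = layer(y)
print(out.shape)                   # torch.Size([4, 50, 8])

# Same result as flattening the leading dimensions first
out_flat = layer(y.reshape(-1, 3)).reshape(4, 50, 8)
print(torch.allclose(out, out_flat, atol=1e-6))
```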

In [None]:
class TrunkNet(nn.Module):
    """
    Trunk network for DeepONet.
    
    Architecture: input -> hidden -> hidden -> ... -> output
    """
    def __init__(self, input_dim, hidden_dim, output_dim, num_hidden_layers, activation='relu'):
        """
        Args:
            input_dim: Dimension of coordinate input y (e.g., 3 for x,y,t)
            hidden_dim: Number of neurons in hidden layers
            output_dim: Dimension of latent representation (p) - must match branch output!
            num_hidden_layers: Number of hidden layers
            activation: 'relu' or 'sine'
        """
        super().__init__()
        
        # TODO: Implement trunk network layers (same structure as branch network)
        
        layers = []
        
        # === YOUR CODE HERE ===
        
        
        # === END YOUR CODE ===
        
        self.layers = nn.ModuleList(layers)
        
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'sine':
            self.activation = torch.sin
        else:
            raise ValueError(f"Unknown activation: {activation}")
    
    def forward(self, y):
        """
        Forward pass through trunk network.
        
        Args:
            y: Coordinate points [batch_size, n_points, input_dim]
        
        Returns:
            Trunk latent representation [batch_size, n_points, output_dim]
        """
        # TODO: Implement forward pass
        # Note: y has an extra dimension (n_points) compared to branch input
        
        x = y
        
        # === YOUR CODE HERE ===
        
        
        # === END YOUR CODE ===
        
        return x

## 4. DeepONet Model - YOUR TASK

### Task 4.1: Inner Product Operation

The key innovation of DeepONet is combining branch and trunk outputs via an inner product:

$$s(y) = \sum_{i=1}^{p} b_i(u) \cdot t_i(y) + b_0$$

where:
- $b_i(u)$ is the $i$-th branch network output (each source $u$ yields one vector of $p$ values)
- $t_i(y)$ is the $i$-th trunk network output (each query coordinate $y$ yields one vector of $p$ values)
- $b_0$ is a learnable bias term

**TODO**: Complete the inner product operation in the `DeepONet` class below.

In [None]:
class DeepONet(nn.Module):
    """
    Deep Operator Network combining branch and trunk networks.
    """
    def __init__(self, branch_net, trunk_net, use_bias=True):
        """
        Args:
            branch_net: Branch network (BranchNet instance)
            trunk_net: Trunk network (TrunkNet instance)
            use_bias: Whether to use bias term b0
        """
        super().__init__()
        self.branch_net = branch_net
        self.trunk_net = trunk_net
        
        # Learnable bias term
        self.b0 = nn.Parameter(torch.zeros(1)) if use_bias else 0.0
    
    def forward(self, u, y):
        """
        Forward pass through DeepONet.
        
        Args:
            u: Branch input [batch_branch, u_dim]
            y: Trunk input [batch_branch, n_points, coord_dim]
        
        Returns:
            Predictions [batch_branch, n_points]
        """
        # Get branch and trunk latent representations
        branch_output = self.branch_net(u)  # [batch_branch, p]
        trunk_output = self.trunk_net(y)    # [batch_branch, n_points, p]
        
        # TODO: Implement inner product operation
        # Goal: Compute sum_i( branch_output[:,i] * trunk_output[:,:,i] )
        # Result should have shape [batch_branch, n_points]
        # 
        # Hints:
        #   - branch_output has shape [batch_branch, p]
        #   - trunk_output has shape [batch_branch, n_points, p]
        #   - You need to align dimensions for element-wise multiplication
        #   - Use .unsqueeze() to add dimensions
        #   - Use .sum(dim=...) to sum over the p dimension
        
        # === YOUR CODE HERE ===
        
        s_pred = None  # Replace with your implementation
        
        # === END YOUR CODE ===
        
        # Add bias
        if isinstance(self.b0, nn.Parameter):
            s_pred = s_pred + self.b0
        
        return s_pred

### Test Your Implementation

Let's test your implementation with random inputs:

In [None]:
# Create test networks
test_branch = BranchNet(input_dim=100, hidden_dim=64, output_dim=40, num_hidden_layers=2, activation='relu')
test_trunk = TrunkNet(input_dim=3, hidden_dim=64, output_dim=40, num_hidden_layers=2, activation='relu')
test_model = DeepONet(test_branch, test_trunk)

# Create random test data
test_u = torch.randn(4, 100)   # [batch_branch=4, u_dim=100]
test_y = torch.randn(4, 50, 3) # [batch_branch=4, n_points=50, coord_dim=3]

# Forward pass
test_output = test_model(test_u, test_y)

print(f"Branch output shape: {test_branch(test_u).shape}")
print(f"Trunk output shape: {test_trunk(test_y).shape}")
print(f"DeepONet output shape: {test_output.shape}")
print("Expected output shape: [4, 50]")

assert test_output.shape == (4, 50), "Output shape is incorrect!"
print("\n✓ Shape test passed!")
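
The assertion above only verifies the output shape. To also check values, one option is to compare against a slow, loop-based reference. The helper `inner_product_reference` below is introduced for illustration and is not part of the project code:

```python
import torch

def inner_product_reference(branch_out, trunk_out):
    """Slow reference: s[b, n] = sum_i branch_out[b, i] * trunk_out[b, n, i]."""
    n_batch, n_points, p = trunk_out.shape
    s = torch.zeros(n_batch, n_points)
    for b in range(n_batch):
        for n in range(n_points):
            for i in range(p):
                s[b, n] += branch_out[b, i] * trunk_out[b, n, i]
    return s

# Hand-checkable example: one source, two query points, p = 2
branch_out = torch.tensor([[1.0, 2.0]])                 # [1, 2]
trunk_out = torch.tensor([[[3.0, 4.0], [5.0, 6.0]]])    # [1, 2, 2]
result = inner_product_reference(branch_out, trunk_out)
print(result)  # tensor([[11., 17.]])
```

Since `b0` is initialized to zero, a correct `DeepONet.forward` should agree (up to floating-point tolerance, e.g. via `torch.allclose`) with `inner_product_reference(test_branch(test_u), test_trunk(test_y))` when both are evaluated under `torch.no_grad()`.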

## 5. Training Loop (Pre-filled)

Now we'll train the model. The training loop is provided for you.

In [None]:
def train_deeponet(model, dataloader_train, dataloader_test, num_epochs, learning_rate, device='cpu'):
    """
    Train DeepONet model.
    
    Args:
        model: DeepONet instance
        dataloader_train: Training data loader
        dataloader_test: Test data loader
        num_epochs: Number of training epochs
        learning_rate: Learning rate for optimizer
        device: 'cpu' or 'cuda'
    
    Returns:
        Dictionary with training history
    """
    model = model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.MSELoss()
    
    history = {'train_loss': [], 'test_loss': []}
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        train_loss_epoch = 0
        num_batches = 0
        
        for batch in dataloader_train:
            (u, y), s_true, _, _ = batch
            u, y, s_true = u.to(device), y.to(device), s_true.to(device)
            
            # Forward pass
            s_pred = model(u, y)
            loss = criterion(s_pred, s_true)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            train_loss_epoch += loss.item()
            num_batches += 1
        
        train_loss_epoch /= num_batches
        history['train_loss'].append(train_loss_epoch)
        
        # Testing
        model.eval()
        test_loss_epoch = 0
        num_test_batches = 0
        
        with torch.no_grad():
            for batch in dataloader_test:
                (u, y), s_true, _, _ = batch
                u, y, s_true = u.to(device), y.to(device), s_true.to(device)
                
                s_pred = model(u, y)
                loss = criterion(s_pred, s_true)
                
                test_loss_epoch += loss.item()
                num_test_batches += 1
        
        test_loss_epoch /= num_test_batches
        history['test_loss'].append(test_loss_epoch)
        
        # Print progress
        if (epoch + 1) % 10 == 0 or epoch == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}] - Train Loss: {train_loss_epoch:.6f}, Test Loss: {test_loss_epoch:.6f}")
    
    return history

In [None]:
def plot_training_history(history):
    """Plot training and test loss over epochs."""
    plt.figure(figsize=(10, 5))
    plt.semilogy(history['train_loss'], label='Train Loss', linewidth=2)
    plt.semilogy(history['test_loss'], label='Test Loss', linewidth=2)
    plt.xlabel('Epoch', fontsize=12)
    plt.ylabel('MSE Loss', fontsize=12)
    plt.title('Training History', fontsize=14)
    plt.legend(fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

## 6. Experiments - YOUR TASK

Now that you have a working DeepONet implementation, let's investigate how design choices affect performance!

### Experiment 6.1: ReLU vs Sine Activation

**Research Question**: Do sine activations help for learning smooth, periodic acoustic wave functions?

**TODO**: Train two models and compare results.

In [None]:
# Hyperparameters
input_dim_branch = data_train.u_shape[0]  # From data
input_dim_trunk = 3  # x, y, t coordinates
hidden_dim = 64
output_dim = 64  # latent dimension p
num_hidden_layers = 3
learning_rate = 1e-3
num_epochs = 50

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# === YOUR CODE HERE ===
# Create two models:
# 1. model_relu: using activation='relu'
# 2. model_sine: using activation='sine'

# Model 1: ReLU activation


# Model 2: Sine activation


# Train both models
print("Training model with ReLU activation...")
# history_relu = ...

print("\nTraining model with Sine activation...")
# history_sine = ...

# === END YOUR CODE ===

# Plot comparison
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.semilogy(history_relu['train_loss'], label='ReLU - Train', linewidth=2)
plt.semilogy(history_sine['train_loss'], label='Sine - Train', linewidth=2, linestyle='--')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss Comparison')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.semilogy(history_relu['test_loss'], label='ReLU - Test', linewidth=2)
plt.semilogy(history_sine['test_loss'], label='Sine - Test', linewidth=2, linestyle='--')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Test Loss Comparison')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print final results
print("\nFinal Test Loss:")
print(f"  ReLU: {history_relu['test_loss'][-1]:.6f}")
print(f"  Sine: {history_sine['test_loss'][-1]:.6f}")

### Experiment 6.2: Impact of Fourier Features

**Research Question**: Do Fourier features improve the network's ability to learn high-frequency wave patterns?

**TODO**: Train models with and without Fourier features.
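
Because the expansion concatenates the raw coordinates with one cosine and one sine per frequency, the trunk input dimension grows to `coord_dim * (1 + 2 * len(freqs))`. The sketch below checks this with a standalone copy of the expansion from Section 2; the helper name `expanded_dim` is introduced here for illustration only:

```python
import math
import torch

def expanded_dim(coord_dim, freqs):
    """Trunk input size after Fourier expansion (hypothetical helper)."""
    return coord_dim * (1 + 2 * len(freqs))

def expand(y, freqs):
    # Same construction as fourier_feature_expansion in Section 2
    features = [y]
    for f in freqs:
        features.append(torch.cos(2 * math.pi * f * y))
        features.append(torch.sin(2 * math.pi * f * y))
    return torch.cat(features, dim=-1)

freqs = [1.0, 2.0, 4.0]
y = torch.randn(4, 100, 3)                 # [batch, n_points, coord_dim]
print(expand(y, freqs).shape[-1])          # 21
print(expanded_dim(3, freqs))              # 21
```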

In [None]:
# === YOUR CODE HERE ===
# Create datasets with different feature expansions:
# 1. No Fourier features: feat_fn_none = fourier_feature_expansion(freqs=[])
# 2. With Fourier features: feat_fn_fourier = fourier_feature_expansion(freqs=[1.0, 2.0, 4.0])

# Create DatasetStreamer with y_feat_extract_fn parameter
# Hint: dataset = DatasetStreamer(data_train, batch_size_coord=100, y_feat_extract_fn=feat_fn)

# Create DataLoaders

# Create models (remember to adjust input_dim_trunk based on Fourier features!)
# Without Fourier: input_dim_trunk = 3
# With Fourier [1.0, 2.0, 4.0]: input_dim_trunk = 3 + 2*3*3 = 21

# Train and compare

# === END YOUR CODE ===

### Discussion Questions

After running the experiments, discuss:

1. **Activation Functions**:
   - Which activation (ReLU vs Sine) performed better? Why might this be?
   - How did convergence speed differ between the two?
   - What physical properties of acoustic waves might favor one activation over another?

2. **Fourier Features**:
   - Did Fourier features improve performance? By how much?
   - What is the computational cost of Fourier features (hint: look at input dimensions)?
   - When would you recommend using Fourier features?

3. **DeepONet Architecture**:
   - Why is the operator learning approach useful for PDEs?
   - What are the advantages of DeepONet vs traditional PDE solvers?
   - What are the limitations?

## 7. Bonus Challenges (If Time Permits)

If you finish early, try these extensions:

### Challenge 1: Visualization
Visualize the predicted vs true pressure field for a test case. Create an animation over time.

### Challenge 2: Architecture Search
Experiment with:
- Different latent dimensions (p)
- Different numbers of hidden layers
- Different hidden layer widths

### Challenge 3: Advanced Features
- Implement adaptive learning rate scheduling
- Try different optimizers (SGD, AdamW)
- Add L2 regularization
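
As a starting point, the sketch below pairs `AdamW` (whose `weight_decay` argument applies decoupled weight decay, a close relative of L2 regularization) with a step-wise learning rate schedule. The tiny linear model and all hyperparameter values are placeholders, not tuned recommendations; in `train_deeponet`, the `scheduler.step()` call would go at the end of each epoch.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)  # stand-in for a DeepONet instance
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Halve the learning rate every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

x, target = torch.randn(16, 3), torch.randn(16, 1)
for epoch in range(20):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
    scheduler.step()       # once per epoch, after optimizer.step()

final_lr = scheduler.get_last_lr()[0]
print(final_lr)            # 1e-3 halved twice
```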

### Challenge 4: Physical Constraints
- How could you incorporate physical constraints (e.g., energy conservation) into the loss function?
- Implement a physics-informed loss term
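
One possible ingredient is the residual of the 2D wave equation, $p_{tt} - c^2 (p_{xx} + p_{yy})$, evaluated with automatic differentiation. The sketch below is an illustration under stated assumptions: the function `wave_residual` is hypothetical, it accepts any callable that is differentiable in its coordinates, and it assumes the coordinates and `c` are in consistent units. It is checked on an analytic plane wave rather than on a trained DeepONet.

```python
import math
import torch

def wave_residual(p_fn, coords, c=1.0):
    """Residual p_tt - c^2 (p_xx + p_yy) at coords of shape [n, 3] = (x, y, t)."""
    coords = coords.clone().requires_grad_(True)
    p = p_fn(coords)                                                   # [n]
    grad = torch.autograd.grad(p.sum(), coords, create_graph=True)[0]  # [n, 3]
    second = []
    for d in range(3):
        # Differentiate the d-th first derivative again; keep its d-th column
        g2 = torch.autograd.grad(grad[:, d].sum(), coords, create_graph=True)[0]
        second.append(g2[:, d])
    p_xx, p_yy, p_tt = second
    return p_tt - c**2 * (p_xx + p_yy)

# Plane wave sin(kx*x + ky*y - omega*t) with omega = c * sqrt(kx^2 + ky^2)
kx, ky, c = 1.0, 2.0, 1.0
omega = c * math.sqrt(kx**2 + ky**2)
plane_wave = lambda z: torch.sin(kx * z[:, 0] + ky * z[:, 1] - omega * z[:, 2])

coords = torch.rand(8, 3, dtype=torch.float64)
residual = wave_residual(plane_wave, coords, c)
print(residual.abs().max())   # close to zero for an exact solution
```

A physics-informed term could then be the mean squared residual, added to the data loss with a weighting factor. Note that the data loader in this tutorial normalizes coordinates (`t_norm`, `norm_data`), so the effective wave speed in network coordinates would need to be worked out before applying this to the model.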

## Summary

In this tutorial, you:
- ✓ Learned about DeepONet architecture for operator learning
- ✓ Implemented branch and trunk networks in PyTorch
- ✓ Implemented the crucial inner product operation
- ✓ Compared ReLU vs Sine activations for smooth functions
- ✓ Investigated the impact of Fourier feature expansions

**Key Takeaways**:
1. DeepONet learns operators (function-to-function mappings) rather than vector-to-vector mappings
2. The architecture separates encoding of inputs (branch) and query locations (trunk)
3. Activation functions and input feature engineering can significantly impact performance
4. Operator learning is a powerful tool for surrogate modeling of PDEs

**Further Reading**:
- Original DeepONet paper: Lu et al. (2021), "Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators"
- Physics-Informed DeepONet: Wang et al. (2021)
- Fourier Features: Tancik et al. (2020), "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains"