# Physics-Informed Neural Networks (PINNs) - Comprehensive Theory

## 1. Introduction

Physics-Informed Neural Networks (PINNs) are a class of neural networks that incorporate physical laws, described by partial differential equations (PDEs), directly into the training process. This approach bridges machine learning with scientific computing.

### Key Innovation
Instead of purely data-driven learning, PINNs embed the governing physics equations as soft constraints in the loss function, allowing the network to learn solutions that satisfy both data and physical laws.

## 2. Mathematical Foundation

### 2.1 General PDE Form

Consider a general time-dependent PDE on a space-time domain:

```
∂u/∂t + N[u; λ] = 0,  x ∈ Ω, t ∈ [0, T]
```

Where:
- `u(x, t)` is the solution we seek
- `N[·]` is a nonlinear differential operator
- `λ` are parameters of the PDE
- `Ω` is the spatial domain and `[0, T]` the time interval
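
For concreteness, the viscous Burgers' equation ∂u/∂t + u ∂u/∂x = ν ∂²u/∂x² fits this form with N[u; λ] = u ∂u/∂x − ν ∂²u/∂x² and λ = ν. The plain-Python sketch below evaluates that residual for the known traveling-wave solution u = c − A·tanh(A(x − ct)/(2ν)), using hand-derived derivatives, so the residual should vanish up to round-off. The values of `c`, `A`, `nu` and the helper names are illustrative assumptions.

```python
import math

def burgers_residual(u, u_t, u_x, u_xx, nu):
    """Residual u_t + N[u; nu] with N[u; nu] = u*u_x - nu*u_xx (viscous Burgers)."""
    return u_t + u * u_x - nu * u_xx

def traveling_wave(x, t, c=0.5, A=1.0, nu=0.1):
    """Exact solution u = c - A*tanh(k*(x - c*t)) with k = A/(2*nu),
    returned with its analytic derivatives as (u, u_t, u_x, u_xx)."""
    k = A / (2.0 * nu)
    th = math.tanh(k * (x - c * t))
    sech2 = 1.0 - th * th                 # sech^2 = 1 - tanh^2
    u = c - A * th
    u_x = -A * k * sech2
    u_t = -c * u_x                        # u depends on x - c*t only
    u_xx = 2.0 * A * k * k * sech2 * th
    return u, u_t, u_x, u_xx

nu = 0.1
for x, t in [(-0.5, 0.0), (0.1, 0.3), (0.7, 0.9)]:
    r = burgers_residual(*traveling_wave(x, t, nu=nu), nu=nu)
    print(f"x={x:+.1f}, t={t:.1f}: residual = {r:.2e}")
```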

### 2.2 Boundary and Initial Conditions

**Initial Condition (IC):**
```
u(x, 0) = u₀(x)
```

**Boundary Conditions (BC):**
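```
B[u](x, t) = g(x, t),  x ∈ ∂Ω, t ∈ [0, T]
```

Here `B[·]` is a boundary operator, e.g. the identity for Dirichlet conditions (u = g) or the outward normal derivative for Neumann conditions (∂u/∂n = g).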

## 3. PINN Architecture

### 3.1 Neural Network Approximation

The solution u(x, t) is approximated by a deep neural network:

```
u(x, t) ≈ û(x, t; θ)
```

Where θ represents all trainable parameters (weights and biases).

### 3.2 Network Structure

```
Input Layer: [x, t]
    ↓
Hidden Layer 1: Linear → Activation
    ↓
Hidden Layer 2: Linear → Activation
    ↓
    ...
    ↓
Hidden Layer N: Linear → Activation
    ↓
Output Layer: Linear (no activation) → û(x, t)
```

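**Common Activation Functions** (smooth, because the PDE residual needs higher-order derivatives of û; ReLU, for example, has a zero second derivative almost everywhere):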
- Hyperbolic tangent: tanh(x)
- Sine: sin(x)
- Swish: x · sigmoid(x)
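
The layer stack above can be sketched in plain Python to make the parameter set θ concrete. This is a minimal, dependency-free illustration assuming tanh hidden layers, a linear output layer, and Xavier (Glorot) uniform initialization; `init_mlp` and `mlp_forward` are hypothetical helper names, and a real implementation would use a framework such as PyTorch.

```python
import math
import random

def init_mlp(layer_sizes, seed=0):
    """Xavier/Glorot-uniform weights and zero biases for each Linear layer.
    Returns a list of (W, b) pairs; W has shape [fan_out][fan_in]."""
    rng = random.Random(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        limit = math.sqrt(6.0 / (fan_in + fan_out))
        W = [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        params.append((W, b))
    return params

def mlp_forward(params, x, t, activation=math.tanh):
    """u_hat(x, t; theta): Linear -> activation on hidden layers, plain Linear output."""
    h = [x, t]
    for i, (W, b) in enumerate(params):
        z = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i for row, b_i in zip(W, b)]
        is_output = i == len(params) - 1
        h = z if is_output else [activation(z_k) for z_k in z]
    return h[0]

# Input [x, t] -> 3 hidden layers of 20 neurons -> scalar output u_hat
theta = init_mlp([2, 20, 20, 20, 1])
n_params = sum(len(W) * len(W[0]) + len(b) for W, b in theta)
print(n_params)  # (2*20+20) + 2*(20*20+20) + (20*1+1) = 921
print(mlp_forward(theta, 0.5, 0.1))
```

Passing `activation=math.sin` gives the sine variant; everything else is unchanged.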

## 4. Automatic Differentiation

PINNs leverage automatic differentiation to compute derivatives of the network output with respect to inputs:

```
∂û/∂t, ∂û/∂x, ∂²û/∂x², etc.
```

This lets the PDE residual be evaluated at any point exactly (up to floating-point precision), without finite-difference approximations or a computational mesh.

### Example in PyTorch:
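```python
import torch

x = torch.rand(100, 1, requires_grad=True)  # inputs must track gradients
t = torch.rand(100, 1, requires_grad=True)
u = network(x, t)  # network: any nn.Module mapping (x, t) -> u, defined elsewhere
ones = torch.ones_like(u)
# create_graph=True keeps derivatives differentiable for u_xx and for training
u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True)[0]
```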
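
What `torch.autograd.grad` returns can be mimicked without any framework. The sketch below uses forward-mode automatic differentiation with dual numbers (a different mode from PyTorch's reverse mode, chosen only because it fits in a few lines) to obtain ∂û/∂x of a small tanh expression, then compares it with a central finite difference. The `Dual` class and the fixed weights of the toy û are illustrative assumptions.

```python
import math

class Dual:
    """Number a + b*eps with eps^2 = 0; the eps part carries the derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)  # product rule
    __rmul__ = __mul__

def dtanh(d):
    """tanh lifted to dual numbers: d/dz tanh(z) = 1 - tanh(z)^2 (chain rule)."""
    th = math.tanh(d.val)
    return Dual(th, (1.0 - th * th) * d.der)

def u_hat(x, t):
    """Toy one-hidden-layer 'network' with fixed, hypothetical weights."""
    return 0.7 * dtanh(1.3 * x + (-0.4) * t + 0.2) + (-1.1) * dtanh(0.5 * x + 0.9 * t)

def u_plain(x, t):
    """Same function on ordinary floats, for the finite-difference check."""
    return 0.7 * math.tanh(1.3 * x - 0.4 * t + 0.2) - 1.1 * math.tanh(0.5 * x + 0.9 * t)

# Seed dx = 1, dt = 0 to get du/dx at (x, t) = (0.3, 0.8)
out = u_hat(Dual(0.3, 1.0), Dual(0.8, 0.0))
h = 1e-5
fd = (u_plain(0.3 + h, 0.8) - u_plain(0.3 - h, 0.8)) / (2 * h)
print(out.der, fd)  # close; only the finite difference has O(h^2) truncation error
```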

## 5. Loss Function Formulation

The total loss is a weighted sum of multiple components:

```
L_total = w_data · L_data + w_pde · L_pde + w_ic · L_ic + w_bc · L_bc
```

### 5.1 Data Loss
```
L_data = (1/N_data) Σ |û(x_i, t_i) - u_i|²
```

### 5.2 PDE Residual Loss
```
L_pde = (1/N_pde) Σ |∂û/∂t(x_i, t_i) + N[û; λ](x_i, t_i)|²
```

### 5.3 Initial Condition Loss
```
L_ic = (1/N_ic) Σ |û(x_i, 0) - u₀(x_i)|²
```

### 5.4 Boundary Condition Loss
```
L_bc = (1/N_bc) Σ |B[û](x_i, t_i) - g(x_i, t_i)|²
```
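
As a sanity check of these terms, the sketch below assembles L_pde, L_ic and L_bc for the 1D heat equation ∂u/∂t = ν ∂²u/∂x² on x ∈ [0, 1] (so N[u; ν] = −ν ∂²u/∂x²), with u₀(x) = sin(πx) and zero Dirichlet boundary values. The candidate û = a·exp(−π²νt)·sin(πx) is a hypothetical stand-in for a trained network, with derivatives written by hand. It has zero PDE and BC residual for every amplitude a, so only the initial-condition term singles out the correct solution a = 1. The point sets, unit weights, and names are illustrative, and no data term is used (w_data = 0).

```python
import math

NU = 0.1

def candidate(x, t, a):
    """Hypothetical u_hat and its derivatives (u, u_t, u_xx) for amplitude a."""
    u = a * math.exp(-math.pi ** 2 * NU * t) * math.sin(math.pi * x)
    u_t = -math.pi ** 2 * NU * u
    u_xx = -math.pi ** 2 * u
    return u, u_t, u_xx

def mse(values):
    return sum(v * v for v in values) / len(values)

def pinn_losses(a, interior, ic_x, bc_t):
    # PDE residual u_t + N[u] with N[u] = -NU * u_xx, at interior collocation points
    res = []
    for x, t in interior:
        u, u_t, u_xx = candidate(x, t, a)
        res.append(u_t - NU * u_xx)
    l_ic = mse([candidate(x, 0.0, a)[0] - math.sin(math.pi * x) for x in ic_x])
    l_bc = mse([candidate(xb, t, a)[0] - 0.0 for t in bc_t for xb in (0.0, 1.0)])
    return {"pde": mse(res), "ic": l_ic, "bc": l_bc}

def total_loss(losses, weights):
    return sum(weights[k] * losses[k] for k in losses)

interior = [(i / 10, j / 10) for i in range(1, 10) for j in range(1, 10)]
ic_x = [i / 20 for i in range(21)]
bc_t = [j / 10 for j in range(11)]
weights = {"pde": 1.0, "ic": 1.0, "bc": 1.0}

for a in (1.0, 2.0):
    losses = pinn_losses(a, interior, ic_x, bc_t)
    print(a, losses, total_loss(losses, weights))
```

For a = 2 the total loss is dominated by L_ic, which shows why the initial and boundary terms are needed alongside the residual.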

## 6. Training Process

### 6.1 Collocation Points

PINNs evaluate the loss terms on several point sets sampled from the domain:

- **Data points**: Where measurements are available
- **PDE collocation points**: Interior points where PDE must be satisfied
- **Boundary points**: Points on domain boundaries
- **Initial points**: Points at t = 0
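
A minimal sampling sketch for a rectangular domain x ∈ [−1, 1], t ∈ [0, 1] using uniform random draws. The domain bounds, point counts, and function name are assumptions; practical codes often use Latin hypercube or quasi-random sequences instead.

```python
import random

def sample_points(n_pde, n_bc, n_ic, x_min=-1.0, x_max=1.0, t_max=1.0, seed=0):
    """Uniformly sample interior, boundary, and initial points as (x, t) tuples."""
    rng = random.Random(seed)
    # Interior collocation points: the PDE residual is evaluated here
    pde = [(rng.uniform(x_min, x_max), rng.uniform(0.0, t_max)) for _ in range(n_pde)]
    # Boundary points: alternate between the left and right walls, random times
    bc = [((x_min, x_max)[i % 2], rng.uniform(0.0, t_max)) for i in range(n_bc)]
    # Initial points: random x on the t = 0 slice
    ic = [(rng.uniform(x_min, x_max), 0.0) for _ in range(n_ic)]
    return pde, bc, ic

pde, bc, ic = sample_points(n_pde=10_000, n_bc=200, n_ic=200)
print(len(pde), len(bc), len(ic))
```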

### 6.2 Optimization

**Common Optimizers:**
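1. **Adam**: Robust first-order method that makes fast progress in the early stages of training
2. **L-BFGS**: Quasi-Newton method that typically refines the solution further once training is near a minimum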

**Two-Stage Training:**
```
Stage 1: Adam optimizer (epochs: 10,000-50,000)
Stage 2: L-BFGS optimizer (iterations: 1,000-5,000)
```

### 6.3 Training Algorithm

```
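1. Initialize network parameters θ
2. Repeat until convergence (or a maximum number of epochs):
   a. Sample collocation points (fixed once, or resampled each epoch)
   b. Forward pass: compute û and its derivatives via automatic differentiation
   c. Compute the loss components and L_total
   d. Backward pass: compute ∇θ L_total
   e. Update parameters: θ ← θ - α∇θ L_total (or take an Adam / L-BFGS step)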
```
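
The loop above can be made concrete on a toy problem small enough for plain Python. To stay dependency-free, this sketch swaps the neural network for a cubic polynomial ansatz û(t) = θ₀ + θ₁t + θ₂t² + θ₃t³ whose derivatives are written by hand rather than obtained by automatic differentiation, and solves du/dt + u = 0 with u(0) = 1 (exact solution e^(−t)) by full-batch gradient descent on L_pde + w_ic·L_ic. The ODE, ansatz, fixed collocation grid, and step size are illustrative assumptions; a real PINN would use a network, autograd, and Adam or L-BFGS.

```python
import math

K = 4                                   # number of coefficients (cubic polynomial)
t_col = [i / 20 for i in range(21)]     # fixed collocation points in [0, 1]
w_ic, lr, epochs = 1.0, 0.01, 5000

def features(t):
    """phi_k(t) = d/dt t^k + t^k, so the residual du/dt + u = sum_k theta_k * phi_k(t)."""
    return [k * t ** (k - 1) + t ** k if k > 0 else 1.0 for k in range(K)]

def u_hat(theta, t):
    return sum(th * t ** k for k, th in enumerate(theta))

def loss_and_grad(theta):
    grad = [0.0] * K
    l_pde = 0.0
    for t in t_col:
        phi = features(t)
        r = sum(th * p for th, p in zip(theta, phi))     # PDE residual at t
        l_pde += r * r / len(t_col)
        for k in range(K):
            grad[k] += 2.0 * r * phi[k] / len(t_col)
    r_ic = u_hat(theta, 0.0) - 1.0                       # initial-condition residual
    grad[0] += 2.0 * w_ic * r_ic                         # only theta_0 affects u_hat(0)
    return l_pde + w_ic * r_ic ** 2, grad

theta = [0.0] * K                                        # 1. initialize parameters
history = []
for epoch in range(epochs):                              # 2. loop (a. points fixed here)
    loss, grad = loss_and_grad(theta)                    # b-d. forward, loss, gradient
    theta = [th - lr * g for th, g in zip(theta, grad)]  # e. gradient-descent update
    history.append(loss)

print(history[0], history[-1])
print(u_hat(theta, 1.0), math.exp(-1.0))                 # compare with the exact value
```

Plain gradient descent on these ill-conditioned monomial features can converge slowly, which is one motivation for finishing with a second-order method such as L-BFGS.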

## 7. Advantages of PINNs

1. **Data Efficiency**: Can learn from sparse data by incorporating physics
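2. **Mesh-Free**: No mesh or grid is required; the loss is evaluated at scattered collocation points
3. **Inverse Problems**: Can infer unknown PDE parameters λ from data
4. **Continuous Solution**: The trained network can be evaluated (and differentiated) at any point in the domain
5. **Physical Consistency**: Training drives solutions toward satisfying the governing equations, although as soft constraints they hold only approximately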

## 8. Challenges and Solutions

### 8.1 Training Difficulties

**Problem**: Imbalanced gradients between loss terms

**Solution**: 
- Adaptive loss weighting
- Gradient normalization
- Learning rate scheduling
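
One adaptive weighting rule, modeled on the learning-rate annealing heuristic of Wang, Teng and Perdikaris (2021), rescales each non-residual term so that its gradient magnitudes are comparable to those of the PDE residual: ŵ = max|∇θ L_pde| / mean|∇θ L_i|, smoothed with a moving average. The sketch operates on flattened gradient lists; the function name and the smoothing factor α = 0.9 are assumptions here.

```python
def update_weight(weight, grad_pde, grad_term, alpha=0.9, eps=1e-12):
    """Return the moving-average update of one loss weight.

    grad_pde, grad_term: gradients of L_pde and of one other loss term
    (e.g. L_bc) with respect to all parameters, flattened to lists.
    """
    max_pde = max(abs(g) for g in grad_pde)
    mean_term = sum(abs(g) for g in grad_term) / len(grad_term)
    weight_hat = max_pde / (mean_term + eps)        # target balance ratio
    return (1.0 - alpha) * weight + alpha * weight_hat

# The largest PDE gradient is 100x the mean BC gradient, so w_bc grows
w_bc = 1.0
w_bc = update_weight(w_bc, grad_pde=[0.5, -2.0, 1.0], grad_term=[0.01, -0.03, 0.02])
print(w_bc)  # about 90.1
```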

### 8.2 Spectral Bias

**Problem**: Neural networks tend to learn low-frequency components first and struggle to fit high-frequency or multi-scale solutions

**Solution**:
- Fourier feature mapping
- Periodic activation functions
- Multi-scale architectures
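
A random Fourier feature mapping, in the style of Tancik et al. (2020), replaces the raw input v = (x, t) with γ(v) = [cos(2πBv), sin(2πBv)], where the entries of B are drawn from N(0, σ²) and kept fixed; a larger σ lets the network represent higher frequencies. The plain-Python sketch below builds that mapping. `sigma`, the number of features, and the helper name are assumptions, and the encoded vector would be fed into the first Linear layer in place of [x, t].

```python
import math
import random

def make_fourier_features(in_dim, n_features, sigma, seed=0):
    """Return gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)] with fixed B ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    B = [[rng.gauss(0.0, sigma) for _ in range(in_dim)] for _ in range(n_features)]

    def gamma(v):
        proj = [2.0 * math.pi * sum(b_j * v_j for b_j, v_j in zip(row, v)) for row in B]
        return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

    return gamma

gamma = make_fourier_features(in_dim=2, n_features=64, sigma=5.0)
encoded = gamma([0.25, 0.5])   # (x, t) -> 128 features for the first Linear layer
print(len(encoded))
```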

### 8.3 Hard Constraints

**Problem**: Soft constraints (penalty terms in the loss) are only approximately satisfied

**Solution**:
- Penalty methods with high weights
- Transform network output to enforce constraints
- Sequential training strategies
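
The output-transform approach can be sketched as follows, assuming the benchmark setup commonly used for Burgers' equation: x ∈ [−1, 1], u(x, 0) = −sin(πx), u(±1, t) = 0 (an illustrative choice, not specified above). Writing û(x, t) = u₀(x) + t·(1 − x²)·N(x, t) makes the initial and boundary conditions hold by construction for any raw network output N, so L_ic and L_bc can be dropped. `raw_net` is a hypothetical stand-in for a network.

```python
import math

def u0(x):
    return -math.sin(math.pi * x)            # initial condition u(x, 0)

def raw_net(x, t):
    """Hypothetical stand-in for an unconstrained network output N(x, t)."""
    return 3.0 * math.tanh(2.0 * x - t) + 0.5

def u_constrained(x, t):
    """u_hat = u0(x) + t * (1 - x^2) * N(x, t):
    at t = 0 the second term vanishes, so u_hat = u0(x);
    at x = +/-1 the factor (1 - x^2) vanishes, so u_hat = u0(+/-1) = 0 up to round-off."""
    return u0(x) + t * (1.0 - x * x) * raw_net(x, t)

print(u_constrained(0.3, 0.0), u0(0.3))                    # equal: IC holds exactly
print(u_constrained(1.0, 0.7), u_constrained(-1.0, 0.7))   # about 1e-16 in magnitude
```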

## 9. Extensions and Variants

### 9.1 Conservative PINNs (cPINNs)
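Decompose the domain into non-overlapping subdomains, each with its own network, and enforce flux continuity across subdomain interfaces so that conservation laws are respected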

### 9.2 Variational PINNs (VPINNs)
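Minimize the residual of the weak (variational) form against a set of test functions, which lowers the order of derivatives the network must provide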

### 9.3 Extended PINNs (XPINNs)
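Generalize domain decomposition to arbitrary space-time subdomains, with one network per subdomain, enabling parallel training on large-scale problems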

### 9.4 Bayesian PINNs
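Place a prior on the network parameters and sample the posterior (e.g., with Hamiltonian Monte Carlo) to quantify uncertainty in predictions, particularly when training data are noisy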

## 10. Key Hyperparameters

| Parameter | Typical Range | Description |
|-----------|---------------|-------------|
| Hidden Layers | 4-8 | Depth of network |
| Neurons/Layer | 20-50 | Width of network |
| Learning Rate (Adam) | 1e-4 to 1e-3 | Step size for the optimizer |
| Collocation Points | 10,000-50,000 | Points for PDE evaluation |
| Loss Weights | 0.1-10 | Balance between loss terms |
| Activation | tanh, sin | Nonlinear function |

## 11. Practical Implementation Tips

1. **Normalization**: Scale inputs and outputs to [-1, 1] or [0, 1]
2. **Initialization**: Use Xavier (Glorot) initialization for tanh or sine networks; He initialization is designed for ReLU-type activations
3. **Monitoring**: Track individual loss components
4. **Visualization**: Plot predictions regularly during training
5. **Residual Plotting**: Visualize where PDE is violated
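6. **Evaluation**: When a reference solution is available, assess accuracy with the relative L2 error, since a low training loss does not guarantee an accurate solution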
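
Tip 1 can be sketched as an affine map to [−1, 1]. One consequence is worth making explicit: if the network sees the scaled coordinate ṽ = 2(v − v_min)/(v_max − v_min) − 1, then by the chain rule ∂/∂v = (2/(v_max − v_min))·∂/∂ṽ, so derivatives taken by autograd with respect to ṽ must be rescaled before they enter the PDE residual (second derivatives pick up the factor squared). The function names and the example time interval are illustrative.

```python
def to_unit(v, lo, hi):
    """Affine map from [lo, hi] to [-1, 1]."""
    return 2.0 * (v - lo) / (hi - lo) - 1.0

def from_unit(s, lo, hi):
    """Inverse map from [-1, 1] back to [lo, hi]."""
    return lo + (s + 1.0) * (hi - lo) / 2.0

def d_dv_from_scaled(du_ds, lo, hi):
    """Chain rule: du/dv = du/ds * ds/dv, with ds/dv = 2 / (hi - lo)."""
    return du_ds * 2.0 / (hi - lo)

# Physical time t in [0, 10] s
lo, hi = 0.0, 10.0
print(to_unit(2.5, lo, hi))             # -0.5
print(from_unit(-0.5, lo, hi))          # 2.5
print(d_dv_from_scaled(1.0, lo, hi))    # 0.2
```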

## 12. Applications

- **Fluid Dynamics**: Navier-Stokes, Euler equations
- **Heat Transfer**: Diffusion, convection problems
- **Solid Mechanics**: Elasticity, wave propagation
- **Quantum Mechanics**: Schrödinger equation
- **Finance**: Black-Scholes equation
- **Biology**: Reaction-diffusion systems

## 13. Performance Metrics

### L2 Relative Error
```
Error = ||u_true - u_pred||₂ / ||u_true||₂
```

### Mean Squared Error (MSE)
```
MSE = (1/N) Σ (u_true - u_pred)²
```

### Maximum Absolute Error
```
Max Error = max|u_true - u_pred|
```
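
The three metrics take only a few lines to compute. The sketch below uses plain Python lists for reference and predicted values on a common set of evaluation points; the function name and toy numbers are illustrative.

```python
import math

def error_metrics(u_true, u_pred):
    """Relative L2 error, mean squared error, and maximum absolute error."""
    diff = [a - b for a, b in zip(u_true, u_pred)]
    rel_l2 = math.sqrt(sum(d * d for d in diff)) / math.sqrt(sum(a * a for a in u_true))
    mse = sum(d * d for d in diff) / len(diff)
    max_err = max(abs(d) for d in diff)
    return rel_l2, mse, max_err

print(error_metrics([1.0, 2.0, 2.0], [1.0, 2.0, 3.0]))  # (0.333..., 0.333..., 1.0)
```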

## 14. References and Further Reading

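1. Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686-707.
2. Karniadakis, G. E., et al. (2021). Physics-informed machine learning. Nature Reviews Physics, 3(6), 422-440.
3. Cuomo, S., et al. (2022). Scientific machine learning through physics-informed neural networks: Where we are and what's next. Journal of Scientific Computing, 92(3), 88.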

---

**Next Steps**: Apply this theory to implement PINNs for Burgers' equation, which combines nonlinear convection with diffusion.