A drop-in PINN architecture that mitigates spectral bias and gradient pathologies by adaptively partitioning the input domain through soft fuzzy rules and softmax attention.
- Why ePINN-AF?
- Method at a glance
- Repository structure
- Installation
- Quick start
- Benchmarks
- How it works (deeper dive)
- Reproducing the paper
- Diagnostics
- Citation
- Contact
- Acknowledgments
- License
Physics-Informed Neural Networks (PINNs, Raissi et al. 2019) are an elegant framework for solving PDEs: parameterize the solution by an MLP, compute exact derivatives with automatic differentiation (autograd), and minimize the PDE residual at collocation points. In practice, three well-documented pathologies make plain PINNs fragile:
| Pathology | Symptom | Fix in ePINN-AF |
|---|---|---|
| Spectral bias | The network learns low-frequency components first and neglects high-frequency residuals (large NTK eigenvalue gap). | Per-rule heads h_j(z) over a shared backbone give each rule its own spectrum. |
| Gradient pathologies (Wang et al. 2021) | ‖∇θ L_pde‖ ≫ ‖∇θ L_data‖: the residual term dominates the updates and the data/boundary fit stalls. | Soft fuzzy gating γ_j = α_j·μ_j localizes residual gradients per region. |
| Multi-regime / stiff problems | One MLP must fit pre-shock and post-shock states, or several disparate length scales, simultaneously. | Fuzzy rules partition the input space; attention selects which rules fire where. |
ePINN-AF keeps the same loss form as a standard PINN
(L = MSE_data + MSE_pde, with no adaptive weights and no curriculum) and
addresses the pathologies above purely through the architecture. The
architectural overhead is modest (a few hundred extra parameters for the
attention sub-net and the M fuzzy heads), and the training pipeline
(Adam → L-BFGS) is identical.
For an input z ∈ ℝ^d (e.g. (x, t) or (x, y, t)), the network outputs
$$
\hat{u}(z) = \sum_{j=1}^{M} \underbrace{\alpha_j(z)\,\mu_j(z)}_{\text{adaptive gate}} \; \underbrace{h_j(z;\theta_h)}_{\text{per-rule head}} + b
$$
where

| Symbol | Definition | Role |
|---|---|---|
| μ_j(z) | exp(-½ Σᵢ ((zᵢ - c_{ji}) / σ_{ji})²) | Soft Gaussian rule over the input domain |
| α_j(z) | softmax_j(W₂ · tanh(W₁ z + b₁) + b₂) | Attention weights: input-dependent importance of each rule at z |
| γ_j(z) | μ_j(z) · α_j(z) | Combined gating per rule |
| h(z; θ_h) | MLP backbone, tanh on every hidden layer | Shared latent features |
| h_j(z) | W_j · h(z) | Per-rule projection of the shared features |
| b | Trainable output bias | Global output offset |
The fuzzy centers c_j and widths σ_j are learnable: the network
discovers where to place its rules. With only M = 4–16 rules, ePINN-AF
matches or beats much larger plain PINNs on every benchmark in this repo.
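To make the formula and the table concrete, here is a minimal, self-contained PyTorch sketch of the gated forward pass. It is not the repository's AFPINN class. The class name, the log-parametrized widths (σ_j = exp(log σ_j), which keeps widths positive), the stacking of all W_j into one bias-free linear layer, and the initialization of the centers in [-1, 1] (assuming inputs normalized to that range) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GatedFuzzyPINNSketch(nn.Module):
    """u(z) = sum_j alpha_j(z) * mu_j(z) * h_j(z) + b  (illustrative, not the repo's AFPINN)."""

    def __init__(self, input_dim=2, hidden=(64, 64), n_rules=8, attn_hidden=32, output_dim=1):
        super().__init__()
        layers, width = [], input_dim
        for w in hidden:                                   # shared backbone h(z; theta_h)
            layers += [nn.Linear(width, w), nn.Tanh()]     # tanh on every hidden layer
            width = w
        self.backbone = nn.Sequential(*layers)
        # All per-rule projections W_j stacked into one bias-free layer: h_j(z) = W_j h(z)
        self.heads = nn.Linear(width, n_rules * output_dim, bias=False)
        # Attention sub-net: alpha(z) = softmax(W2 tanh(W1 z + b1) + b2)
        self.attn = nn.Sequential(
            nn.Linear(input_dim, attn_hidden), nn.Tanh(), nn.Linear(attn_hidden, n_rules)
        )
        # Learnable fuzzy centers c_j and widths sigma_j = exp(log_sigma_j) > 0
        self.centers = nn.Parameter(torch.rand(n_rules, input_dim) * 2 - 1)
        self.log_sigma = nn.Parameter(torch.zeros(n_rules, input_dim))
        self.bias = nn.Parameter(torch.zeros(output_dim))  # trainable output bias b
        self.n_rules, self.output_dim = n_rules, output_dim

    def gates(self, z):
        """Return mu [N, M] (Gaussian rules) and alpha [N, M] (attention)."""
        scaled = (z.unsqueeze(1) - self.centers) / self.log_sigma.exp()  # [N, M, d]
        mu = torch.exp(-0.5 * scaled.pow(2).sum(dim=-1))
        alpha = torch.softmax(self.attn(z), dim=-1)
        return mu, alpha

    def forward(self, z):
        mu, alpha = self.gates(z)
        gamma = alpha * mu                                              # combined gate [N, M]
        feats = self.heads(self.backbone(z))                            # [N, M * out]
        h_rules = feats.view(-1, self.n_rules, self.output_dim)         # [N, M, out]
        return (gamma.unsqueeze(-1) * h_rules).sum(dim=1) + self.bias   # [N, out]


if __name__ == "__main__":
    model = GatedFuzzyPINNSketch(input_dim=2, n_rules=8)
    print(model(torch.rand(128, 2) * 2 - 1).shape)  # torch.Size([128, 1])
```

Because every factor is differentiable, a single backward pass reaches the centers, widths, attention weights, backbone, heads and bias together.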
Two optional switches (off by default for 1-D PDEs, on for Navier-Stokes):

- `partition_dims`: restrict μ_j to a subset of input axes. Example: `partition_dims=[2]` on an `(x, y, t)` problem makes rules localize along time only, which suits periodically shedding flows.
- `use_direct_head`: adds a parallel `W_d · h(z)` path that bypasses the fuzzy gate. This provides a clean gradient highway and helps on multi-output problems.
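One plausible reading of `partition_dims`, sketched below, is that the Gaussian product in μ_j runs only over the selected axes. This assumes centers and widths are stored for every axis and simply sliced; the repository may store them differently, and the function name is hypothetical.

```python
import torch


def gaussian_membership(z, centers, sigma, dims=None):
    """mu_j(z) = exp(-1/2 * sum_{i in dims} ((z_i - c_ji) / sigma_ji)^2).

    z: [N, d], centers and sigma: [M, d]. dims=None uses every input axis;
    dims=[2] on (x, y, t) inputs makes the rules depend on time only.
    """
    if dims is not None:
        z, centers, sigma = z[:, dims], centers[:, dims], sigma[:, dims]
    scaled = (z.unsqueeze(1) - centers) / sigma       # [N, M, len(dims)]
    return torch.exp(-0.5 * scaled.pow(2).sum(dim=-1))  # [N, M]
```

With `dims=[2]`, two points that share the same t but differ in (x, y) receive identical memberships, so any spatial localization has to come from the attention term and the heads.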
ePINN-AF/
```bash
git clone https://github.com/amiin10/ePINN-AF.git
cd ePINN-AF
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

Tested with Python 3.9–3.12 and PyTorch ≥ 2.0, on both CPU and CUDA. A single mid-range GPU (e.g. T4 or RTX 3060) is enough for every experiment; runtime per script ranges from ~3 minutes (Burgers) to ~40 minutes (Navier-Stokes with 20 000 Adam iterations).
Four .mat files are needed (Burgers, Allen-Cahn, KdV, Navier-Stokes).
The Poisson experiments use an analytical solution and need no data. See
datasets/README.md for the download links.
```text
datasets/
├── README.md
├── burgers_shock.mat
├── AC.mat
├── KdV.mat
└── cylinder_nektar_wake.mat
```
Each PDE folder contains one or more train_*.py scripts. They are
self-contained: pick one and run it with Python.
The training script saves a pickle (e.g. burgers_base_results.pkl) with
the prediction, reference, errors, loss history, and configuration. The
matching plot_*.py script consumes that pickle and renders a 4-panel
figure: reference, prediction, absolute error, and loss curve.
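The exact key names inside the results pickle are not listed here, so this small sketch does not assume any. It loads the file and reports each top-level entry and its type. It assumes the pickle holds a dict (falling back gracefully otherwise), and both function names are hypothetical. As with any pickle, only load files you trust.

```python
import pickle


def load_results(path):
    """Unpickle a results file (unpickling can execute code, so only load trusted files)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(results) -> dict:
    """Map each top-level key to the type name of its value, without assuming key names."""
    if isinstance(results, dict):
        return {str(k): type(v).__name__ for k, v in results.items()}
    return {"<root>": type(results).__name__}
```

For example, `summarize(load_results("burgers_base_results.pkl"))` lists what the Burgers run stored before you write any custom plotting code.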
The package is small and unopinionated:
```python
import torch
from ePINN_AF import AFPINN, seed_torch, get_device

seed_torch(0)
device = get_device()

model = AFPINN(
    input_dim       = 2,                      # e.g. (x, t)
    backbone_layers = [200, 200, 200, 200],   # tanh on every hidden layer
    n_rules         = 8,                      # M fuzzy rules
    attn_hidden     = 64,                     # width of attention sub-net
    output_dim      = 1,                      # scalar PDE
    lb              = [-1.0, 0.0],            # domain bounds (for normalization)
    ub              = [ 1.0, 1.0],
).to(device)

u_hat = model(torch.randn(128, 2, device=device))   # [128, 1]
```

You only need to pair it with a PDE-specific residual (typically a handful
of torch.autograd.grad calls) and an optimizer. See any of the
train_*.py scripts for a complete, minimal template.
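As an example of that pairing, here is a hedged sketch of a Burgers residual f = u_t + u u_x - ν u_xx with ν = 0.01/π (the viscosity of the standard burgers_shock.mat benchmark) and the plain loss L = MSE_data + MSE_pde. Any callable mapping [N, 2] inputs to [N, 1] outputs works as `model`; the small `nn.Sequential` in the demo is a hypothetical stand-in for AFPINN, the helper names are invented for illustration, and boundary conditions are omitted for brevity.

```python
import math

import torch
import torch.nn as nn


def d_dz(outputs: torch.Tensor, inputs: torch.Tensor) -> torch.Tensor:
    """Derivative of `outputs` w.r.t. `inputs`, keeping the graph for higher orders and backprop."""
    return torch.autograd.grad(
        outputs, inputs, grad_outputs=torch.ones_like(outputs), create_graph=True
    )[0]


def burgers_residual(model, x, t, nu=0.01 / math.pi):
    """f = u_t + u * u_x - nu * u_xx at collocation points x, t of shape [N, 1]."""
    x = x.detach().clone().requires_grad_(True)
    t = t.detach().clone().requires_grad_(True)
    u = model(torch.cat([x, t], dim=1))
    u_t = d_dz(u, t)
    u_x = d_dz(u, x)
    u_xx = d_dz(u_x, x)
    return u_t + u * u_x - nu * u_xx


def pinn_loss(model, x_u, t_u, u_obs, x_f, t_f):
    """Plain PINN loss L = MSE_data + MSE_pde (no adaptive weights)."""
    u_pred = model(torch.cat([x_u, t_u], dim=1))
    mse_data = torch.mean((u_pred - u_obs) ** 2)
    mse_pde = torch.mean(burgers_residual(model, x_f, t_f) ** 2)
    return mse_data + mse_pde


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical stand-in network; in practice this would be the AFPINN model.
    net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    x_u, t_u = torch.rand(64, 1) * 2 - 1, torch.zeros(64, 1)   # initial-condition points
    u_obs = -torch.sin(math.pi * x_u)                           # u(x, 0) = -sin(pi x)
    x_f, t_f = torch.rand(256, 1) * 2 - 1, torch.rand(256, 1)   # collocation points
    for step in range(200):
        opt.zero_grad()
        loss = pinn_loss(net, x_u, t_u, u_obs, x_f, t_f)
        loss.backward()
        opt.step()
    print(f"loss after 200 Adam steps: {loss.item():.3e}")
```

The demo runs Adam only; the repository's scripts follow it with an L-BFGS phase.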
The numbers below are representative values from the configurations in this repo; exact figures depend on hardware, PyTorch version, and seed. All errors are relative L² unless otherwise noted.
| PDE | Script (this repo) |
|---|---|
| Burgers | Burgers/ |
| Allen-Cahn | AllenCahn/ |
| KdV | KdV/ |
| Poisson 2-D | Poisson/ |
| Poisson 3-D | Poisson/ |
| NS cylinder wake (u) | NavierStokes/ |
| NS cylinder wake (v) | (same) |
| NS cylinder wake (p) | (same) |
Refer to the manuscript for head-to-head comparisons against PINN, APINN, FPINN, SA-PINN and CausalPINN under identical settings.
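For reference, the relative L² error used throughout is ‖û - u‖₂ / ‖u‖₂ over the evaluation grid. A minimal NumPy sketch follows (the function name is hypothetical); for Navier-Stokes, u, v and p would each be scored separately, as in the table rows.

```python
import numpy as np


def relative_l2(u_pred, u_ref) -> float:
    """||u_pred - u_ref||_2 / ||u_ref||_2, flattened over all grid points."""
    u_pred = np.asarray(u_pred, dtype=float)
    u_ref = np.asarray(u_ref, dtype=float)
    # np.linalg.norm with no axis/ord returns the 2-norm of the flattened array
    return float(np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref))
```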
The trick is in γ_j(z) = α_j(z) · μ_j(z).
The fuzzy term μ_j(z) is spatially local — it lights up only where the
input is close to the rule's learned center c_j. The attention term
α_j(z) is content-driven — it can suppress or amplify a rule based on
the full input. Their product gives a soft, learnable partition of the
input space:
- In smooth regions where one rule's center is dominant, that rule's head h_j(z) carries the prediction.
- Near sharp features (shocks, interfaces, vortices), several rules can overlap, and their weighted sum captures the local behaviour at higher effective resolution than a single MLP.
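A tiny numeric illustration of this soft partition uses two hand-picked 1-D rules (centers ±0.5, width 0.3) and uniform attention. All values are illustrative, not learned, and the gate shares are normalized here only for readability; the model's γ_j themselves are not normalized.

```python
import torch

centers = torch.tensor([-0.5, 0.5])   # two rules on a 1-D input (illustrative values)
sigma = torch.tensor([0.3, 0.3])
alpha = torch.tensor([0.5, 0.5])      # uniform attention, for illustration only


def rule_shares(z):
    """Fraction of the total gate gamma_j = alpha_j * mu_j held by each rule at each z."""
    mu = torch.exp(-0.5 * ((z.unsqueeze(-1) - centers) / sigma) ** 2)  # [N, 2]
    gamma = alpha * mu
    return gamma / gamma.sum(dim=-1, keepdim=True)


print(rule_shares(torch.tensor([-0.5, 0.0, 0.5])))
# approx. [[0.996, 0.004], [0.5, 0.5], [0.004, 0.996]]
```

At each rule's center that rule holds almost the whole gate, while midway between the centers both rules contribute equally, which is the overlap regime described in the second bullet.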
Because the gating is fully differentiable, gradients flow back into the
centers c_j, widths σ_j, attention weights, backbone parameters, and
per-rule heads all at once. There are no auxiliary losses, no manual
weight tuning, no curriculum.
Spectral bias. The per-rule heads h_j(z) = W_j · h(z) give each rule
its own linear projection of the backbone features. In the NTK regime this
amounts to widening the effective spectral support of the network: rather
than K_uu being a single rank-D kernel, it becomes a mixture of M
rank-D kernels selected per location by γ_j. The result is a flatter NTK
eigenvalue decay and a lower condition number, both of which correlate
with the residual term being well-conditioned (Wang, Wang & Perdikaris,
2022).
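The kernel K_uu in this argument is the empirical NTK, K_uu[a, b] = ⟨∂u(z_a)/∂θ, ∂u(z_b)/∂θ⟩. The sketch below builds it by brute force (one autograd call per point, so it is only practical for small N) and reports the eigenvalue decay and condition number mentioned above. It is a generic illustration on a hypothetical small MLP, not the repository's compute_ntk_kernels; for the residual block K_rr one would differentiate the PDE residual instead of u.

```python
import torch
import torch.nn as nn


def empirical_ntk(model: nn.Module, z: torch.Tensor) -> torch.Tensor:
    """K_uu[a, b] = <du(z_a)/dtheta, du(z_b)/dtheta> for a scalar-output model."""
    params = [p for p in model.parameters() if p.requires_grad]
    u = model(z).reshape(-1)                                  # [N]
    rows = []
    for a in range(u.shape[0]):
        grads = torch.autograd.grad(u[a], params, retain_graph=True, allow_unused=True)
        rows.append(torch.cat([
            (g if g is not None else torch.zeros_like(p)).reshape(-1)
            for g, p in zip(grads, params)
        ]))
    J = torch.stack(rows)                                     # [N, P] Jacobian
    return J @ J.T                                            # [N, N] Gram matrix


def ntk_spectrum(K: torch.Tensor):
    """Eigenvalues in descending order and the condition number lambda_max / lambda_min."""
    eig = torch.linalg.eigvalsh(K).flip(0)
    cond = (eig[0] / eig[-1].clamp_min(1e-12)).item()
    return eig, cond


if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1)).double()
    z = torch.rand(20, 2, dtype=torch.float64)
    eig, cond = ntk_spectrum(empirical_ntk(net, z))
    print(f"lambda_max={eig[0].item():.3e}  lambda_min={eig[-1].item():.3e}  cond={cond:.3e}")
```

Running the same check on a plain MLP and on a gated model at matched parameter counts is one way to compare eigenvalue decay, as the paragraph above describes.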
The training scripts in this repo are simplified to make ePINN-AF easy to read and re-use. The original full-experiment scripts (including the PINN/APINN/FPINN/SA-PINN/CausalPINN baselines and the editor-requested diagnostics) were used to produce the manuscript's tables and figures. Those baseline implementations are not included here; this repository focuses on the proposed method only.
If you would like the comparison scripts, please open an issue or contact the author (see Contact).
ePINN_AF.utils also exposes the three diagnostic toolkits used in the
paper's rebuttal section. They depend only on a wrapper exposing
model, loss_fn, net_u, net_f (and optionally checkpoints):
```python
from ePINN_AF import (
    split_grad_norms,            # Wang-Teng-Perdikaris (2021) gradient flow
    per_layer_grad_stats,
    loss_landscape_1d,           # Li et al. (2018) filter-normalised
    loss_landscape_2d,
    compute_ntk_kernels,         # PINN-NTK (Wang, Wang & Perdikaris, 2022)
    evaluate_ntk_at_checkpoints,
)
```

These are not enabled by default in the streamlined per-PDE scripts, to
keep the runtime short. See the docstrings in ePINN_AF/utils.py for
how to wire them in.
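For a sense of what the gradient-flow diagnostic measures, here is a standalone sketch that compares ‖∇θ L_data‖ with ‖∇θ L_pde‖, the imbalance listed in the pathology table. The function names are hypothetical and this is not the signature of the package's split_grad_norms. It uses a single global norm, whereas Wang, Teng & Perdikaris (2021) also inspect per-layer gradient statistics.

```python
import torch
import torch.nn as nn


def grad_norm(loss: torch.Tensor, params) -> float:
    """Global L2 norm of d(loss)/d(params); parameters the loss does not touch count as zero."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return sum(float(g.pow(2).sum()) for g in grads if g is not None) ** 0.5


def split_grad_norms_sketch(model: nn.Module, loss_data: torch.Tensor, loss_pde: torch.Tensor):
    """Return (||grad L_data||, ||grad L_pde||, ratio pde/data) over all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_data = grad_norm(loss_data, params)
    g_pde = grad_norm(loss_pde, params)
    return g_data, g_pde, g_pde / max(g_data, 1e-12)
```

Logging the ratio every few hundred iterations shows whether the residual term dominates (ratio ≫ 1); `retain_graph=True` leaves the graph intact so the usual `(loss_data + loss_pde).backward()` can still follow.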
If you use this code in your research, please cite both the software and
the paper. The GitHub "Cite this repository" button reads from
CITATION.cff.
Aminhosseini ✉ [amin.hosseini1@aut.ac.ir]
🔬 Department of Mechanical Engineering, Amirkabir University of Technology (Tehran Polytechnic)
Bug reports, feature requests, and questions are very welcome via GitHub Issues.
- The benchmark datasets (burgers_shock.mat, AC.mat, KdV.mat, cylinder_nektar_wake.mat) come from the original PINN and HFM repositories by Maziar Raissi and collaborators.
- The gradient-flow diagnostic follows Wang, Teng & Perdikaris (2021, Understanding and mitigating gradient flow pathologies in physics-informed neural networks).
- The filter-normalised loss-landscape visualization follows Li, Xu, Taylor, Studer & Goldstein (2018, Visualizing the Loss Landscape of Neural Nets).
- The PINN-NTK decomposition follows Wang, Wang & Perdikaris (2022, When and why PINNs fail to train: a Neural Tangent Kernel perspective).
This project is released under the MIT License.
The benchmark datasets remain under their original licenses; please
refer to the upstream repositories listed in
datasets/README.md.
Built with ❤ for the scientific-machine-learning community.