Proof of Concept: This repository demonstrates how to use GPU acceleration to speed up single-agent pathfinding using PyTorch. This is a research prototype and I'm open to discussions about extensions, optimizations, and applications.
A novel approach to pathfinding using GPU-accelerated neural networks with convolutional activation propagation and gradient-based path tracing.
This implements maximally parallelized dynamic programming. Instead of sequentially exploring individual states (as A* or Dijkstra do), the entire state space is updated simultaneously on the GPU. Each iteration propagates activation to all neighbors in parallel, so the number of sequential steps scales with the solution length rather than with the number of expanded states.
The core is surprisingly simple - a single-layer neural network with a custom propagation kernel:
```python
import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, in_channels, out_channels, map_mask, start_xy, goal_xy, map_name):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.map_mask = map_mask
        self.map_mask_4d = map_mask.unsqueeze(0).unsqueeze(0)  # (1, 1, H, W) for broadcasting
        self.start_xy = start_xy
        self.goal_xy = goal_xy
        self.map_name = map_name

    def forward(self, x):
        saved_tensors = []
        goal_y, goal_x = self.goal_xy
        for i in range(int(1e6)):
            # propagate activation to neighbors and zero out obstacles
            x = self.relu(self.conv(x) * self.map_mask_4d)
            # clip to [0, 1] while letting gradients pass through unchanged
            x = clip_preserve_grad(x, min_val=0.0, max_val=1.0)
            x.retain_grad()
            saved_tensors.append(x)
            if x[0, 0, goal_y, goal_x] > 0:  # activation reached the goal
                return x, True, saved_tensors
        return x, False, saved_tensors
```

How it works:
- Conv layer: Propagates activation to neighbors (4-connected kernel: up/down/left/right; a kernel sketch follows this list)
- ReLU: Only positive values propagate forward
- Mask multiplication: Zeros out obstacles
- Custom gradient clipping: Preserves gradient flow for path tracing while preventing numerical explosion
- Iteration: Continues until goal receives positive activation
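The snippet above does not show how the convolution weights are set. As an illustration only, a 4-connected propagation kernel like the one described could be initialized along these lines (a sketch, not necessarily the repository's initialization):

```python
import torch
import torch.nn as nn

# Hypothetical initialization of a 4-connected propagation kernel:
# a cell keeps its own activation and receives from up/down/left/right.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    kernel = torch.tensor([[0.0, 1.0, 0.0],
                           [1.0, 1.0, 1.0],
                           [0.0, 1.0, 0.0]])
    conv.weight.copy_(kernel.view(1, 1, 3, 3))
```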
- ⏱️ Time Complexity: O(length of optimal path) - search time scales with solution length, not map size
- 🎯 All Optimal Paths: Exposes all equally optimal paths simultaneously
- 🎨 Simple Implementation: ~50 lines of core logic, no complex data structures
- 🧠 No Heuristics Required: Works out-of-the-box on unseen grids of any type
- 🗺️ Universal: Works with any map representable as a tensor
- ⚡ Blazingly Fast: Massive parallelization on GPU delivers exceptional performance
Visual comparison of expanded nodes (exploration) vs. optimal paths (solution) on different map types:
- Expanded Nodes: Shows all cells explored during forward pass when goal is reached
- Optimal Paths: Shows the traced path(s) extracted via gradient accumulation
- Avg Runtime: Average execution time over 100 runs (forward + backward pass only, excludes initialization and I/O)
Benchmark Details: Measured on MPS (Apple Silicon) with 100 runs per map. Times include forward pass (activation propagation) and backward pass (gradient-based path extraction), excluding map loading and visualization. Note: These benchmarks were run on Mac with MPS backend. CUDA-enabled GPUs typically provide significantly better performance. See
`benchmarks.txt` for detailed statistics.
This project implements pathfinding as a neural network forward-backward pass:
1. Forward Pass (Activation Propagation):
- Place activation at start position
- Iteratively apply convolution (spreads to neighbors) + ReLU
- Multiply by map mask (blocks obstacles)
- Custom gradient-preserving clipping prevents numerical explosion
- Continue until activation reaches goal
2. Backward Pass (Path Extraction):
- Compute gradients from goal back to start
- Gradient flow reveals the path taken by activation
- Accumulate positive gradients to extract the final path (see the sketch after this list)
3. GPU Acceleration:
- All operations run on GPU (CUDA/MPS/CPU fallback)
- Handles large 3D maps efficiently (tested on 896×390×255 voxels)
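To make step 2 concrete, here is a minimal sketch of gradient-based extraction, assuming `out` and `saved_tensors` come from the forward pass shown earlier and that `goal_y, goal_x` are known (the repository's exact extraction code may differ):

```python
import torch

# Backward from the goal cell; gradients flow back along the activation chain.
out[0, 0, goal_y, goal_x].backward()

# Accumulate positive gradients across the saved checkpoints: cells whose
# activation contributed to reaching the goal light up along the path.
path_map = torch.zeros_like(out[0, 0])
for t in saved_tensors:
    if t.grad is not None:
        path_map += (t.grad[0, 0] > 0).float()
```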
- 🗺️ 2D pathfinding on grid maps (.map format)
- 🧊 3D pathfinding on voxel maps (.3dmap format)
- 📊 Interactive 3D visualizations using Plotly
- ⚡ Automatic device selection (CUDA > MPS > CPU; see the sketch after this list)
- 🎲 Random map selection for testing
- ⚙️ Configurable memory/quality tradeoffs
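A minimal sketch of the CUDA > MPS > CPU fallback (the repository's own helper may be structured or named differently):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon's MPS backend, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```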
- Python 3.12+
- uv package manager
- GPU with CUDA or Apple Silicon (MPS) for best performance
```bash
# Clone the repository
git clone <repository-url>
cd GPUPathfindingProject

# Install dependencies using uv
uv sync

# Activate the environment (optional, uv run handles this automatically)
source .venv/bin/activate
```

Alternatively, install with pip:

```bash
pip install -r requirements.txt
```

Main dependencies:
- torch >= 2.11.0 (with CUDA/MPS support)
- matplotlib >= 3.10.8
- plotly >= 6.6.0
- psutil >= 7.2.2
2D pathfinding (`search2D_nn.py`):

Run on a specific map:

```bash
uv run search2D_nn.py --map maze-128-128-1.map
```

Random map selection:

```bash
uv run search2D_nn.py
```

With custom seed and device:

```bash
uv run search2D_nn.py --map warehouse-20-40-10-2-2.map --seed 8535 --device cuda
```

3D pathfinding (`search3D_nn.py`):

Run on a specific map:

```bash
uv run search3D_nn.py --map A1.3dmap
```

With custom settings:

```bash
uv run search3D_nn.py --map A1.3dmap --seed 8535 --save-freq 50 --device cuda
```

Note: If you've activated the virtual environment with `source .venv/bin/activate`, you can use `python` instead of `uv run`.
Common Parameters:
- `--map`: Map file name (from `maps/2d/` or `maps/3d/`)
- `--seed`: Random seed for reproducibility
- `--device`: Force device (auto/cuda/mps/cpu) - available for both 2D and 3D
3D-Specific Parameters:
- `--save-freq`: Save every Nth iteration (default: 100; lower values use more memory but give better path visualization)
```
GPUPathfindingProject/
├── maps/
│   ├── 2d/              # 2D grid maps (.map format)
│   └── 3d/              # 3D voxel maps (.3dmap format)
├── outputs/             # Generated visualizations (HTML/PNG)
├── search2D_nn.py       # 2D pathfinding script
├── search3D_nn.py       # 3D pathfinding script
├── utils.py             # 2D utilities (map loading, plotting)
├── utils3D.py           # 3D utilities (map loading, 3D visualization)
├── pyproject.toml       # uv project configuration
├── uv.lock              # uv lock file
├── requirements.txt     # Pip-compatible requirements
└── README.md            # This file
```
2D grid maps use the Moving AI `.map` format:

```
type octile
height 32
width 32
map
@@@@@@@@...
...
```

- `.` = walkable; `@`, `T`, etc. = obstacles
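As an illustration, a loader that turns this format into the walkability mask used by the network might look like the sketch below (the actual loader lives in `utils.py` and may differ; `load_map_mask` is a hypothetical name):

```python
import torch

def load_map_mask(path: str) -> torch.Tensor:
    """Parse a .map file into a float mask: 1.0 = walkable, 0.0 = obstacle."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
    height = int(lines[1].split()[1])   # "height 32"
    width = int(lines[2].split()[1])    # "width 32"
    grid = lines[4:4 + height]          # grid rows start after the "map" line
    mask = torch.zeros(height, width)
    for y, row in enumerate(grid):
        for x, ch in enumerate(row[:width]):
            mask[y, x] = 1.0 if ch == "." else 0.0
    return mask
```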
3D voxel maps use the `.3dmap` format:

```
voxel <width> <height> <depth>
<x> <y> <z>
...
```

- Lists occupied (obstacle) voxels
- All other voxels are walkable
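Similarly, a sketch of a `.3dmap` loader (the `load_3d_mask` name and the axis ordering of the returned tensor are assumptions; the repository's `utils3D.py` may differ):

```python
import torch

def load_3d_mask(path: str) -> torch.Tensor:
    """Parse a .3dmap file into a float mask: 1.0 = walkable, 0.0 = obstacle voxel."""
    with open(path) as f:
        header = f.readline().split()            # "voxel <width> <height> <depth>"
        width, height, depth = int(header[1]), int(header[2]), int(header[3])
        mask = torch.ones(depth, height, width)  # everything walkable by default
        for line in f:
            if line.strip():
                x, y, z = map(int, line.split())
                mask[z, y, x] = 0.0              # listed voxels are obstacles
    return mask
```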
Standard `torch.clamp()` zeroes gradients for values outside the clamp range. We use a custom autograd function that clips forward values while passing gradients through unchanged:

```python
class ClipWithFullGradients(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, min_val, max_val):
        return torch.clamp(input, min=min_val, max=max_val)

    @staticmethod
    def backward(ctx, grad_output):
        # Full gradient passthrough (None for the min_val/max_val arguments)
        return grad_output, None, None
```

Connectivity patterns:
- 2D: 4-connected (up, down, left, right)
- 3D: 18-connected (6 faces + 12 edges, excludes 8 corners)
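The 18-connected pattern amounts to "neighbors differing in at most two coordinates". A sketch of building such a kernel (not necessarily the repository's construction):

```python
import torch

# 18-connected 3D kernel: the center voxel plus neighbors that differ in at
# most two coordinates (6 faces + 12 edges); the 8 corners are excluded.
kernel = torch.zeros(3, 3, 3)
for dz in (-1, 0, 1):
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if abs(dz) + abs(dy) + abs(dx) <= 2:
                kernel[dz + 1, dy + 1, dx + 1] = 1.0
# kernel.view(1, 1, 3, 3, 3) could then initialize an nn.Conv3d weight.
```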
For large 3D maps, saving every iteration is memory-prohibitive. We use checkpoint-based gradient accumulation:
- Save every Nth iteration (configurable via `--save-freq`); a sketch follows this list
- Detach intermediate tensors to break the gradient chain
- Trade path visualization quality for memory efficiency
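A minimal sketch of the save-every-Nth idea, reusing names from the 2D forward loop shown earlier (`save_freq` corresponds to `--save-freq`; the repository's handling of detached intermediates in `search3D_nn.py` is not reproduced here and may differ):

```python
# Sketch: retain gradients only for every `save_freq`-th iteration, so far
# fewer full-volume .grad tensors have to be kept alive for path extraction.
# Assumes x, conv, clip_preserve_grad, map_mask and the goal indices exist
# as in the forward code shown earlier.
saved_tensors = []
for i in range(max_iters):
    x = clip_preserve_grad(torch.relu(conv(x) * map_mask), min_val=0.0, max_val=1.0)
    if i % save_freq == 0:
        x.retain_grad()           # keep this checkpoint's gradient
        saved_tensors.append(x)
    if x[0, 0, goal_z, goal_y, goal_x] > 0:
        break
```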
- 2D: PNG images with matplotlib visualization
  - `{map_name}_expanded_nodes.png` - All explored cells
  - `{map_name}_optimal_paths.png` - Traced optimal path(s)
- 3D: Interactive HTML with Plotly (rotate, zoom, inspect voxels)
  - `{map_name}_expanded_nodes.html` - 3D exploration visualization
  - `{map_name}_optimal_paths.html` - 3D path visualization
  - `{map_name}_start_goal_preview.html` - Start/goal preview before pathfinding
- All outputs are saved to the `outputs/` directory
- Use GPU: CUDA or MPS dramatically speeds up computation
- Adjust save frequency: higher `--save-freq` = less memory (100-1000 for 3D)
- Map size: larger maps take longer but are fully supported
- Device selection: let `auto` choose, or force with `--device`
- Initialize activation tensor with 1.0 at start position
- Apply Conv2d/Conv3d with neighbor connectivity kernel
- Apply ReLU activation
- Multiply by map mask to zero out obstacles
- Apply custom gradient-preserving clip to [0, 1]
- Check if goal position has positive activation
- If goal reached: backward pass from goal
- Accumulate positive gradients across saved checkpoints
- Visualize accumulated gradients as the discovered path
- Differentiable: The entire pathfinding process is differentiable
- Parallel: GPU acceleration for massive parallelism
- Flexible: Easy to modify connectivity patterns or cost functions
- Visualizable: Gradient flow shows exactly how activation propagated
- ✅ Trivially extends to Prioritized Planning (PrP): Each agent plans sequentially in a predefined order. The algorithm naturally handles this by planning one agent at a time (see the sketch below).
- ❌ Non-trivial for other MAPF algorithms: Adapting to sophisticated algorithms like CBS (Conflict-Based Search) or ECBS requires significant modifications.
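As a rough illustration of that sequential scheme (the `find_path` helper below is hypothetical and not part of this repository; a real PrP extension would also have to handle conflicts in time, which this sketch ignores):

```python
def prioritized_planning(map_mask, agents):
    """agents: list of (start_xy, goal_xy) tuples in priority order."""
    paths = []
    blocked = map_mask.clone()
    for start_xy, goal_xy in agents:
        # hypothetical single-agent call wrapping the forward/backward pass
        path_cells = find_path(blocked, start_xy, goal_xy)
        paths.append(path_cells)
        for (y, x) in path_cells:
            blocked[y, x] = 0.0  # naive: treat earlier agents' paths as static obstacles
    return paths
```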
The repository includes a 3D implementation (search3D_nn.py), but faces memory constraints:
- Memory blow-up issue: the width × height × depth × time history of activations exceeds GPU memory during the backward pass
- This is a scale/optimization problem, not a theoretical limitation
- ✅ Forward pass works: Can find the optimal solution length
- ❌ Backward pass (path extraction): Requires memory optimization techniques (checkpointing, gradient accumulation)
This is a proof-of-concept repository. I'm open to discussions about:
- Performance optimizations
- Memory-efficient 3D path extraction
- Extensions to MAPF algorithms
- Applications in robotics, game AI, or other domains
Feel free to open issues or reach out!
Contributions welcome! Areas for improvement:
- More connectivity patterns (8-connected 2D, 26-connected 3D)
- Memory-efficient backward pass for 3D
- Optimal subpath guarantees
- Benchmarking against A*/Dijkstra
The 2D and 3D map files used in this repository are taken from the Moving AI Lab Benchmarks, a comprehensive collection of pathfinding benchmark scenarios used widely in MAPF (Multi-Agent Pathfinding) research.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in research, please cite:
```bibtex
@software{pertzovsky2026gpu,
  author    = {Pertzovsky, Arseniy},
  title     = {GPU-Accelerated Neural Network Pathfinding},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/Arseni1919/GPUPathfindingProject}
}
```

Or in plain text:
Pertzovsky, A. (2026). GPU-Accelerated Neural Network Pathfinding. GitHub. https://github.com/Arseni1919/GPUPathfindingProject
Built with PyTorch 🔥 | Visualized with Plotly 📊 | Accelerated by GPUs 🚀





