A TypeScript-based maze generation system that creates diverse, solvable mazes with optimal A* pathfinding solutions, designed specifically for LLM fine-tuning and benchmarking.
This project generates mazes of varying sizes (5x5, 10x10, 20x20) using three distinct algorithms, ensuring structural diversity and algorithmic variety. Each generated maze is guaranteed to be solvable and includes the optimal path from start (P) to destination (D).
- Three Distinct Generation Algorithms:
  - Recursive Backtracker (DFS-based): long, winding corridors with many dead ends
  - Prim's Algorithm (MST-based): blockier, denser feel with room-like areas
  - Cellular Automata: organic, cave-like structures with irregular walls
- Robust A* Pathfinding: finds optimal solutions using a Manhattan distance heuristic
- Quality Guarantees:
  - All generated mazes are solvable
  - Start (P) and destination (D) are well-separated
  - Single connected component (no isolated areas)
- Dual Output Formats:
  - Fine-tuning dataset (JSONL)
  - Benchmark dataset (JSON with metadata)
```bash
bun install
bun run generate
# or
bun start
```

This will generate:
- `data/finetuning_dataset.jsonl`: 900 training samples (100 per algorithm per size)
- `data/benchmark_maps.json`: 90 benchmark samples (10 per algorithm per size)
```bash
bun run visualize
# or open visualizer.html in your browser
```

Opens an interactive web-based visualizer to inspect generated mazes with:
- Visual grid display with color-coded cells (walls, empty, start, destination, path)
- Path visualization - show/hide optimal path, step through it move-by-move
- Filtering - view specific algorithms or sizes
- Navigation - Previous/Next buttons or keyboard arrows (←/→)
- Random selection - Jump to a random maze for spot-checking
- Keyboard shortcuts - Arrow keys to navigate, Space to toggle path
Visualizer Features:
- Beautiful, responsive UI with color-coded maze elements
- Displays map metadata (ID, algorithm, size, path length)
- Easy navigation between all 90 benchmark maps
- Random button for quick spot-checking
- Step-by-step path visualization to trace the solution
- Mobile-friendly responsive design
Test how well different LLMs can solve the generated mazes:
```bash
# 1. Set up your OpenRouter API key
echo "OPENROUTER_API_KEY=your_key_here" > .env.local

# 2. Run the benchmark
bun run benchmark
```

Features:
- Test multiple models in parallel (Claude, GPT-4, etc.)
- Automatic tracking of accuracy, tokens, cost, and latency
- Results saved as JSON and CSV for analysis
- Configurable parallelism for fast execution
- Real-time progress reporting
Results include:
- Accuracy percentage per model
- Average response time
- Total tokens used and estimated cost
- Detailed per-test breakdown in CSV format
```
mazesolver/
├── lib/                            # Reusable libraries
│   └── llm-benchmark-runner/       # LLM benchmark library (can be extracted)
│       ├── src/
│       │   ├── index.ts            # Public API exports
│       │   ├── types.ts            # Core types and interfaces
│       │   ├── openrouter.ts       # OpenRouter API client
│       │   └── runner.ts           # Parallel benchmark runner
│       ├── package.json            # Library package config
│       └── README.md               # Library documentation
├── src/
│   ├── types.ts                    # Type definitions
│   ├── index.ts                    # Main generation script
│   ├── utils/
│   │   └── gridOperations.ts       # Grid utilities (walls, P/D placement)
│   ├── pathfinding/
│   │   └── astar.ts                # A* pathfinding algorithm
│   ├── mapGenerator/
│   │   ├── recursiveBacktracker.ts # DFS-based maze generator
│   │   ├── prim.ts                 # Prim's algorithm generator
│   │   └── cellularAutomata.ts     # Cellular automata generator
│   └── benchmark/                  # Maze benchmarking
│       ├── mazeSolver.ts           # Maze-specific logic
│       └── run.ts                  # Main benchmark script
├── data/                           # Generated datasets
├── results/                        # Benchmark results
├── visualizer.html                 # Interactive maze viewer
├── serve-visualizer.ts             # Local web server for visualizer
└── package.json
```
- `#`: Wall (impassable)
- `P`: Player start position
- `D`: Destination (goal)
- `.`: Walkable tile (empty space)
```
#######
#P....#
#.###.#
#.#...#
#.#.#.#
#...#D#
#######
```
Note: Map dimensions (5x5, 10x10, 20x20) refer to the inner walkable grid, excluding the outer perimeter walls. A 5x5 inner grid becomes a 7x7 grid with walls.
Description: Creates "perfect" mazes with no loops or isolated sections. Each pair of cells has exactly one path between them.
Characteristics:
- Long, winding corridors
- Many dead ends
- Spindly, organic branching
- High path complexity
Use Case: Testing an LLM's ability to handle complex, maze-like navigation with many decision points.
Description: Also creates perfect mazes but with a different structural feel.
Characteristics:
- Blockier, denser layout
- More interconnected "room-like" areas
- Wider passages
- Bulbous cavern structures
Use Case: Testing an LLM's spatial reasoning with more open, room-based layouts.
Description: Generates organic, natural-looking cave systems using Game of Life-like rules.
Algorithm:
- Initialize with random wall distribution (~45%)
- Apply cellular automata rules for 5-8 iterations
- Ensure single connected component via flood fill
- Smooth edges with additional iterations
Characteristics:
- Organic, irregular shapes
- Natural cavern feel
- Open areas with scattered obstacles
- Less predictable structure
Use Case: Testing an LLM's adaptability to non-traditional, organic layouts.
Each line is a JSON object:

```json
{"input": "Map:\n#######\n#P....#\n#.....#\n#.....#\n#....D#\n#######", "output": "DOWN,DOWN,DOWN,RIGHT,RIGHT,RIGHT,RIGHT"}
```

JSON array of objects with metadata:
```json
[
  {
    "id": "rb_5x5_bm_001",
    "algorithm": "recursiveBacktracker",
    "size": "5x5",
    "map": "#######\n#P....#\n#.....#\n#.....#\n#....D#\n#######",
    "optimalPath": ["DOWN", "DOWN", "DOWN", "RIGHT", "RIGHT", "RIGHT", "RIGHT"]
  }
]
```

Edit `src/index.ts` to customize generation:
```typescript
const CONFIG = {
  numMapsPerAlgorithmPerSize: {
    finetuning: 100, // Maps per algorithm per size for training
    benchmark: 10    // Maps per algorithm per size for testing
  },
  mapSizes: [5, 10, 20], // Inner dimensions
  algorithms: ['recursiveBacktracker', 'prim', 'cellularAutomata']
};
```

The benchmark system uses `@fallom/llm-benchmark-runner`, a reusable library in `lib/llm-benchmark-runner/`.
Library Features (in lib/llm-benchmark-runner/):
- OpenRouter API integration with automatic cost/token tracking
- Parallel execution with configurable concurrency
- Progress reporting and error handling
- Flexible output parsing and evaluation
- Results export to JSON and CSV
- Ready to extract as a standalone npm package
Maze-Specific Logic (in src/benchmark/):
- Output parsing for direction sequences
- Path evaluation (exact match, length comparison)
- Prompt formatting for maze problems
The library is structured to be easily extracted and published. See lib/llm-benchmark-runner/README.md for library documentation.
Example: Adding a new model
Edit src/benchmark/run.ts:
```typescript
const models: ModelConfig[] = [
  {
    id: 'anthropic/claude-3.5-sonnet',
    name: 'Claude 3.5 Sonnet',
    settings: { temperature: 0, max_tokens: 1000 }
  },
  {
    id: 'openai/gpt-4o', // Add your model
    name: 'GPT-4o',
    settings: { temperature: 0, max_tokens: 1000 }
  }
];
```

- Only placed on walkable tiles (`.`)
- Minimum separation distance ensures meaningful paths:
  - Small maps (5x5): minimum 2 cells apart
  - Larger maps: minimum ~1/3 of grid size apart
- Up to 1000 attempts to find well-separated positions
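The placement rules above can be sketched as a rejection-sampling loop. This is an illustrative sketch, not the project's actual API: the function names, the `Point` shape, and the exact separation formula (a floor of 2, scaling with roughly a third of the inner grid size) are assumptions.

```typescript
type Point = { x: number; y: number };

// Assumed separation rule: at least 2 cells apart on small maps,
// ~1/3 of the inner grid size on larger ones.
function minSeparation(innerSize: number): number {
  return Math.max(2, Math.floor(innerSize / 3));
}

// Pick start/destination from the walkable tiles, retrying up to
// maxAttempts times until they are far enough apart.
function pickSeparated(
  walkable: Point[],
  innerSize: number,
  maxAttempts = 1000
): { start: Point; end: Point } | null {
  const minDist = minSeparation(innerSize);
  for (let i = 0; i < maxAttempts; i++) {
    const start = walkable[Math.floor(Math.random() * walkable.length)]!;
    const end = walkable[Math.floor(Math.random() * walkable.length)]!;
    const dist = Math.abs(start.x - end.x) + Math.abs(start.y - end.y);
    if (dist >= minDist) return { start, end };
  }
  return null; // caller discards this maze and regenerates
}
```

Returning `null` after the attempt budget (rather than accepting a poor placement) keeps the "well-separated" guarantee unconditional.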
- Automatically added after inner grid generation
- A* pathfinding runs on the full grid (with outer walls)
- Inner dimensions + 2 = final grid dimensions
- Every generated map is validated with A*
- Maps without valid paths are discarded and regenerated
- Ensures 100% solvability of output dataset
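The discard-and-regenerate validation described above amounts to a simple loop: generate, check with A*, and retry on failure. A hedged sketch, where `generate` and `findPath` stand in for the project's real functions and the signatures are assumptions:

```typescript
type Point = { x: number; y: number };

// Keep regenerating until the pathfinder confirms a path exists, so every
// map emitted into the dataset is solvable by construction.
function generateSolvable<G>(
  generate: () => { grid: G; start: Point; end: Point },
  findPath: (grid: G, start: Point, end: Point) => string[] | null
): { grid: G; path: string[] } {
  for (;;) {
    const { grid, start, end } = generate();
    const path = findPath(grid, start, end);
    if (path) return { grid, path }; // valid: keep it
    // invalid: fall through and regenerate
  }
}
```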
- Recursive Backtracker & Prim's: Perfect mazes by design, additional connectivity pass adds strategic loops
- Cellular Automata: Flood fill ensures single connected component, isolated areas converted to walls
- Small maps (5x5): ~1-2ms per map
- Medium maps (10x10): ~5-10ms per map
- Large maps (20x20): ~20-50ms per map
Total generation time for default configuration (990 maps): ~1-5 minutes
The project uses strict TypeScript with:
- Strict null checks
- No unchecked indexed access
- Comprehensive type definitions in `types.ts`
```typescript
import { generateRecursiveBacktrackerMaze } from './src/mapGenerator/recursiveBacktracker';
import { addOuterWalls, placePandD, mapToString } from './src/utils/gridOperations';
import { findPath } from './src/pathfinding/astar';

const innerGrid = generateRecursiveBacktrackerMaze(10, 10);
const withWalls = addOuterWalls(innerGrid);
const result = placePandD(withWalls);

if (result) {
  const path = findPath(result.grid, result.start, result.end);
  console.log(mapToString(result.grid));
  console.log('Path:', path);
}
```

- Heuristic: Manhattan distance (optimal for 4-directional grid movement)
- Movement: 4-directional (UP, DOWN, LEFT, RIGHT), no diagonals
- Cost: uniform cost of 1 per move
- Optimality: guaranteed to find the shortest path
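The heuristic listed above is the key to the optimality guarantee: Manhattan distance never overestimates the true cost when movement is 4-directional with uniform cost 1, so A* is admissible. A minimal sketch (the real implementation lives in `src/pathfinding/astar.ts`; the `Point` shape here is an assumption):

```typescript
type Point = { x: number; y: number };

// |dx| + |dy| counts the minimum number of 4-directional moves between two
// cells, ignoring walls - an admissible lower bound on the true path cost,
// which is what makes A* return a shortest path.
function manhattan(a: Point, b: Point): number {
  return Math.abs(a.x - b.x) + Math.abs(a.y - b.y);
}
```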
- Initialization: 45% wall density
- Rule: a cell becomes a wall if ≥5 neighbors are walls, otherwise floor
- Iterations: Adaptive based on grid size (typically 5-8)
- Post-processing: Flood fill + connectivity check + smoothing
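One smoothing pass of the rule above can be sketched as follows. This is a minimal illustration, not the project's `cellularAutomata.ts`: the grid encoding (`#`/`.` characters) matches the map format, but treating out-of-bounds neighbors as walls is an assumption.

```typescript
const WALL = '#';
const FLOOR = '.';

// Apply one cellular-automata iteration: a cell becomes a wall when 5 or
// more of its 8 Moore neighbors are walls, otherwise it becomes floor.
function caStep(grid: string[][]): string[][] {
  const h = grid.length;
  const w = grid[0]!.length;
  const next = grid.map(row => row.slice());
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      let walls = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          if (dx === 0 && dy === 0) continue;
          const ny = y + dy, nx = x + dx;
          // Out-of-bounds counts as wall (assumed), which keeps borders solid.
          if (ny < 0 || ny >= h || nx < 0 || nx >= w || grid[ny]![nx] === WALL) {
            walls++;
          }
        }
      }
      next[y]![x] = walls >= 5 ? WALL : FLOOR;
    }
  }
  return next;
}
```

Running this for 5-8 iterations after a ~45% random wall fill converges toward the smooth, cave-like regions described above; the flood-fill connectivity pass then removes any isolated pockets.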
This is a benchmark generation tool. Contributions welcome for:
- Additional maze generation algorithms
- Performance optimizations
- Extended validation checks
- Output format variations
MIT License - see LICENSE file for details
Built with:
- Bun - Fast JavaScript runtime
- TypeScript - Type-safe development
- Classic maze generation algorithms
Generated with ❤️ for LLM training and benchmarking