πŸ—ΊοΈ LLM Maze Benchmark Data Generator

A TypeScript-based maze generation system that creates diverse, solvable mazes with optimal A* pathfinding solutions, designed specifically for LLM fine-tuning and benchmarking.

πŸ“‹ Overview

This project generates mazes of varying sizes (5x5, 10x10, 20x20) using three distinct algorithms, ensuring structural diversity and algorithmic variety. Each generated maze is guaranteed to be solvable and includes the optimal path from start (P) to destination (D).

🎯 Features

  • Three Distinct Generation Algorithms:

    • Recursive Backtracker (DFS-based): Long, winding corridors with many dead ends
    • Prim's Algorithm (MST-based): Blockier, denser feel with room-like areas
    • Cellular Automata: Organic, cave-like structures with irregular walls
  • Robust A* Pathfinding: Finds optimal solutions using the Manhattan distance heuristic

  • Quality Guarantees:

    • All generated mazes are solvable
    • Start (P) and destination (D) are well-separated
    • Single connected component (no isolated areas)
  • Dual Output Formats:

    • Fine-tuning dataset (JSONL)
    • Benchmark dataset (JSON with metadata)

πŸš€ Quick Start

Installation

bun install

Generate Mazes

bun run generate
# or
bun start

This will generate:

  • data/finetuning_dataset.jsonl: 900 training samples (100 per algorithm per size)
  • data/benchmark_maps.json: 90 benchmark samples (10 per algorithm per size)

Visualize Mazes

bun run visualize
# or open visualizer.html in your browser

Opens an interactive web-based visualizer to inspect generated mazes with:

  • Visual grid display with color-coded cells (walls, empty, start, destination, path)
  • Path visualization - show/hide optimal path, step through it move-by-move
  • Filtering - view specific algorithms or sizes
  • Navigation - Previous/Next buttons or keyboard arrows (←/β†’)
  • Random selection - Jump to a random maze for spot-checking
  • Keyboard shortcuts - Arrow keys to navigate, Space to toggle path

Visualizer Features:

  • 🎨 Beautiful, responsive UI with color-coded maze elements
  • πŸ“Š Displays map metadata (ID, algorithm, size, path length)
  • πŸ”„ Easy navigation between all 90 benchmark maps
  • 🎲 Random button for quick spot-checking
  • ⏭️ Step-by-step path visualization to trace the solution
  • πŸ“± Mobile-friendly responsive design

Run LLM Benchmark

Test how well different LLMs can solve the generated mazes:

# 1. Set up your OpenRouter API key
echo "OPENROUTER_API_KEY=your_key_here" > .env.local

# 2. Run the benchmark
bun run benchmark

Features:

  • πŸ€– Test multiple models in parallel (Claude, GPT-4, etc.)
  • πŸ“Š Automatic tracking of accuracy, tokens, cost, and latency
  • πŸ’Ύ Results saved as JSON and CSV for analysis
  • ⚑ Configurable parallelism for fast execution
  • πŸ”„ Progress reporting in real-time

Results include:

  • Accuracy percentage per model
  • Average response time
  • Total tokens used and estimated cost
  • Detailed per-test breakdown in CSV format

πŸ“ Project Structure

mazesolver/
β”œβ”€β”€ lib/                                  # πŸ“¦ Reusable libraries
β”‚   └── llm-benchmark-runner/             # LLM benchmark library (can be extracted)
β”‚       β”œβ”€β”€ src/
β”‚       β”‚   β”œβ”€β”€ index.ts                  # Public API exports
β”‚       β”‚   β”œβ”€β”€ types.ts                  # Core types and interfaces
β”‚       β”‚   β”œβ”€β”€ openrouter.ts             # OpenRouter API client
β”‚       β”‚   └── runner.ts                 # Parallel benchmark runner
β”‚       β”œβ”€β”€ package.json                  # Library package config
β”‚       └── README.md                     # Library documentation
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ types.ts                          # Type definitions
β”‚   β”œβ”€β”€ index.ts                          # Main generation script
β”‚   β”œβ”€β”€ utils/
β”‚   β”‚   └── gridOperations.ts             # Grid utilities (walls, P/D placement)
β”‚   β”œβ”€β”€ pathfinding/
β”‚   β”‚   └── astar.ts                      # A* pathfinding algorithm
β”‚   β”œβ”€β”€ mapGenerator/
β”‚   β”‚   β”œβ”€β”€ recursiveBacktracker.ts       # DFS-based maze generator
β”‚   β”‚   β”œβ”€β”€ prim.ts                       # Prim's algorithm generator
β”‚   β”‚   └── cellularAutomata.ts           # Cellular automata generator
β”‚   └── benchmark/                        # Maze benchmarking
β”‚       β”œβ”€β”€ mazeSolver.ts                 # Maze-specific logic
β”‚       └── run.ts                        # Main benchmark script
β”œβ”€β”€ data/                                 # Generated datasets
β”œβ”€β”€ results/                              # Benchmark results
β”œβ”€β”€ visualizer.html                       # Interactive maze viewer
β”œβ”€β”€ serve-visualizer.ts                   # Local web server for visualizer
└── package.json

🧩 Map Representation

Map Elements

  • #: Wall (impassable)
  • P: Player start position
  • D: Destination (goal)
  • .: Walkable tile (empty space)

Example 5x5 Map

#######
#P....#
#.###.#
#.#...#
#.#.#.#
#...#D#
#######

Note: Map dimensions (5x5, 10x10, 20x20) refer to the inner walkable grid, excluding the outer perimeter walls. A 5x5 inner grid becomes a 7x7 grid with walls.
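For illustration, here is a minimal TypeScript sketch of parsing such a map string into a grid. The type and helper names below are hypothetical; the project's own definitions live in src/types.ts and src/utils/gridOperations.ts and may differ.

// Illustrative sketch only; the project's actual types may differ.
type Cell = '#' | '.' | 'P' | 'D';
type Grid = Cell[][];

// Parse a newline-separated map string into a 2D grid of cells.
function parseMap(map: string): Grid {
  return map.trim().split('\n').map((row) => row.split('') as Cell[]);
}

// Find the coordinates of a marker ('P' or 'D') in the grid, if present.
function findCell(grid: Grid, marker: Cell): { row: number; col: number } | null {
  for (let row = 0; row < grid.length; row++) {
    for (let col = 0; col < grid[row].length; col++) {
      if (grid[row][col] === marker) return { row, col };
    }
  }
  return null;
}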

🎲 Generation Algorithms

1. Recursive Backtracker (DFS-based)

Description: Creates "perfect" mazes with no loops or isolated sections. Each pair of cells has exactly one path between them.

Characteristics:

  • Long, winding corridors
  • Many dead ends
  • Spindly, organic branching
  • High path complexity

Use Case: Testing an LLM's ability to handle complex, maze-like navigation with many decision points.
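For intuition, here is a minimal sketch of the carving idea: an iterative DFS over cells at odd coordinates that knocks out the wall between a cell and its chosen neighbour. This is illustrative only; the real generator in src/mapGenerator/recursiveBacktracker.ts may differ in details.

// Sketch of DFS-based maze carving; assumes odd rows/cols, '#' = wall, '.' = floor.
function carveRecursiveBacktracker(rows: number, cols: number): string[][] {
  const grid = Array.from({ length: rows }, () => Array.from({ length: cols }, () => '#'));
  const stack: [number, number][] = [[1, 1]];
  grid[1][1] = '.';

  while (stack.length > 0) {
    const [r, c] = stack[stack.length - 1];
    // Unvisited cells two steps away, in random order (quick shuffle for illustration).
    const next = [[r - 2, c], [r + 2, c], [r, c - 2], [r, c + 2]]
      .filter(([nr, nc]) => nr > 0 && nr < rows - 1 && nc > 0 && nc < cols - 1 && grid[nr][nc] === '#')
      .sort(() => Math.random() - 0.5);

    if (next.length === 0) {
      stack.pop(); // Dead end: backtrack.
    } else {
      const [nr, nc] = next[0];
      grid[(r + nr) / 2][(c + nc) / 2] = '.'; // Remove the wall between the two cells.
      grid[nr][nc] = '.';
      stack.push([nr, nc]);
    }
  }
  return grid;
}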

2. Prim's Algorithm (MST-based)

Description: Also creates perfect mazes but with a different structural feel.

Characteristics:

  • Blockier, denser layout
  • More interconnected "room-like" areas
  • Wider passages
  • Bulbous cavern structures

Use Case: Testing an LLM's spatial reasoning with more open, room-based layouts.
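The corresponding frontier-based carving can be sketched as follows. Again, this is illustrative only; see src/mapGenerator/prim.ts for the actual generator.

// Sketch of randomized Prim's carving on the same odd-coordinate cell scheme.
function carvePrim(rows: number, cols: number): string[][] {
  const grid = Array.from({ length: rows }, () => Array.from({ length: cols }, () => '#'));
  grid[1][1] = '.';
  const frontier: [number, number][] = [];
  if (rows > 4) frontier.push([3, 1]);
  if (cols > 4) frontier.push([1, 3]);

  while (frontier.length > 0) {
    // Pick and remove a random frontier cell.
    const idx = Math.floor(Math.random() * frontier.length);
    const [r, c] = frontier.splice(idx, 1)[0];
    if (grid[r][c] === '.') continue; // Already carved via another route.

    // Connect it to a random already-carved cell two steps away.
    const carved = [[r - 2, c], [r + 2, c], [r, c - 2], [r, c + 2]]
      .filter(([nr, nc]) => nr > 0 && nr < rows - 1 && nc > 0 && nc < cols - 1 && grid[nr][nc] === '.');
    if (carved.length === 0) continue;
    const [nr, nc] = carved[Math.floor(Math.random() * carved.length)];
    grid[(r + nr) / 2][(c + nc) / 2] = '.';
    grid[r][c] = '.';

    // Its uncarved two-step neighbours become new frontier cells.
    for (const [fr, fc] of [[r - 2, c], [r + 2, c], [r, c - 2], [r, c + 2]]) {
      if (fr > 0 && fr < rows - 1 && fc > 0 && fc < cols - 1 && grid[fr][fc] === '#') {
        frontier.push([fr, fc]);
      }
    }
  }
  return grid;
}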

3. Cellular Automata

Description: Generates organic, natural-looking cave systems using Game of Life-like rules.

Algorithm:

  1. Initialize with random wall distribution (~45%)
  2. Apply cellular automata rules for 5-8 iterations
  3. Ensure single connected component via flood fill
  4. Smooth edges with additional iterations

Characteristics:

  • Organic, irregular shapes
  • Natural cavern feel
  • Open areas with scattered obstacles
  • Less predictable structure

Use Case: Testing an LLM's adaptability to non-traditional, organic layouts.

πŸ“Š Output Formats

Fine-tuning Dataset (finetuning_dataset.jsonl)

Each line is a JSON object:

{"input": "Map:\n#######\n#P....#\n#.....#\n#.....#\n#....D#\n#######", "output": "DOWN,DOWN,DOWN,RIGHT,RIGHT,RIGHT,RIGHT"}

Benchmark Dataset (benchmark_maps.json)

JSON array of objects with metadata:

[
  {
    "id": "rb_5x5_bm_001",
    "algorithm": "recursiveBacktracker",
    "size": "5x5",
    "map": "#######\n#P....#\n#.....#\n#.....#\n#....D#\n#######",
    "optimalPath": ["DOWN", "DOWN", "DOWN", "RIGHT", "RIGHT", "RIGHT", "RIGHT"]
  }
]
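For reference, one entry can be modeled with a small TypeScript interface inferred from the example above. The names here are illustrative; the project's own type definitions live in src/types.ts and may differ.

// Shape of one entry in data/benchmark_maps.json, inferred from the example above.
interface BenchmarkMap {
  id: string;                 // e.g. "rb_5x5_bm_001"
  algorithm: 'recursiveBacktracker' | 'prim' | 'cellularAutomata';
  size: '5x5' | '10x10' | '20x20';
  map: string;                // Newline-separated grid, including outer walls
  optimalPath: string[];      // Sequence of "UP" | "DOWN" | "LEFT" | "RIGHT" moves
}

// Hypothetical loading example using Bun's file API.
const benchmarkMaps: BenchmarkMap[] = await Bun.file('data/benchmark_maps.json').json();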

πŸ”§ Configuration

Edit src/index.ts to customize generation:

const CONFIG = {
  numMapsPerAlgorithmPerSize: {
    finetuning: 100,  // Maps per algorithm per size for training
    benchmark: 10     // Maps per algorithm per size for testing
  },
  mapSizes: [5, 10, 20],  // Inner dimensions
  algorithms: ['recursiveBacktracker', 'prim', 'cellularAutomata']
};

πŸ§ͺ Benchmark Framework

The benchmark system uses @fallom/llm-benchmark-runner - a reusable library in lib/llm-benchmark-runner/.

Library Features (in lib/llm-benchmark-runner/):

  • βœ… OpenRouter API integration with automatic cost/token tracking
  • βœ… Parallel execution with configurable concurrency
  • βœ… Progress reporting and error handling
  • βœ… Flexible output parsing and evaluation
  • βœ… Results export to JSON and CSV
  • βœ… Ready to extract as a standalone npm package

Maze-Specific Logic (in src/benchmark/):

  • Output parsing for direction sequences
  • Path evaluation (exact match, length comparison)
  • Prompt formatting for maze problems

The library is structured to be easily extracted and published. See lib/llm-benchmark-runner/README.md for library documentation.
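As an illustration of the maze-specific parsing and evaluation, a model's free-form answer can be reduced to a move sequence and compared against the stored optimal path roughly like this. This is a sketch only; the actual logic in src/benchmark/mazeSolver.ts may be more forgiving.

type Move = 'UP' | 'DOWN' | 'LEFT' | 'RIGHT';

// Extract a direction sequence from free-form model output,
// accepting answers like "DOWN, DOWN, RIGHT" or "down -> right".
function parseMoves(output: string): Move[] {
  return (output.toUpperCase().match(/\b(UP|DOWN|LEFT|RIGHT)\b/g) ?? []) as Move[];
}

// Exact-match evaluation against the stored optimal path.
function isExactMatch(predicted: Move[], optimal: string[]): boolean {
  return predicted.length === optimal.length &&
    predicted.every((move, i) => move === optimal[i]);
}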

Example: Adding a new model

Edit src/benchmark/run.ts:

const models: ModelConfig[] = [
  {
    id: 'anthropic/claude-3.5-sonnet',
    name: 'Claude 3.5 Sonnet',
    settings: { temperature: 0, max_tokens: 1000 }
  },
  {
    id: 'openai/gpt-4o',  // Add your model
    name: 'GPT-4o',
    settings: { temperature: 0, max_tokens: 1000 }
  }
];

🎯 Key Implementation Details

P & D Placement

  • Only placed on walkable tiles (.)
  • Minimum separation distance ensures meaningful paths:
    • Small maps (5x5): minimum 2 cells apart
    • Larger maps: minimum ~1/3 of grid size apart
  • Up to 1000 attempts to find well-separated positions
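A rough sketch of that placement loop, with hypothetical names (the real helper is placePandD in src/utils/gridOperations.ts):

// Sketch: pick two walkable cells at least minDist apart (Manhattan), up to 1000 attempts.
function placeStartAndGoal(
  grid: string[][],
  minDist: number
): { start: [number, number]; end: [number, number] } | null {
  const floor: [number, number][] = [];
  grid.forEach((row, r) => row.forEach((cell, c) => {
    if (cell === '.') floor.push([r, c]);
  }));

  for (let attempt = 0; attempt < 1000; attempt++) {
    const start = floor[Math.floor(Math.random() * floor.length)];
    const end = floor[Math.floor(Math.random() * floor.length)];
    const dist = Math.abs(start[0] - end[0]) + Math.abs(start[1] - end[1]);
    if (dist >= minDist) {
      grid[start[0]][start[1]] = 'P';
      grid[end[0]][end[1]] = 'D';
      return { start, end };
    }
  }
  return null; // No well-separated pair found.
}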

Outer Walls

  • Automatically added after inner grid generation
  • A* pathfinding runs on the full grid (with outer walls)
  • Inner dimensions + 2 = final grid dimensions
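The wrapping step itself is simple; here is a sketch of the idea behind addOuterWalls (illustrative, not the project's exact code):

// Sketch: surround an inner grid with a one-cell border of walls.
function addBorder(inner: string[][]): string[][] {
  const width = inner[0].length + 2;
  const wallRow = () => Array.from({ length: width }, () => '#');
  return [wallRow(), ...inner.map((row) => ['#', ...row, '#']), wallRow()];
}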

Validation

  • Every generated map is validated with A*
  • Maps without valid paths are discarded and regenerated
  • Ensures 100% solvability of output dataset
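Put together, the generate-and-validate loop looks roughly like this. It is a sketch: the helper names and signatures are illustrative stand-ins for the project's addOuterWalls, placePandD, and findPath, and the real flow lives in src/index.ts.

// Sketch of the regenerate-until-solvable loop.
function generateSolvableMap(
  generate: () => string[][],
  minDist: number
): { grid: string[][]; path: string[] } {
  for (;;) {
    const grid = addBorder(generate());          // Wrap the inner maze in outer walls.
    const placed = placeStartAndGoal(grid, minDist);
    if (!placed) continue;                       // Could not separate P and D: retry.
    const path = findPath(grid, placed.start, placed.end);
    if (path) return { grid, path };             // Solvable: keep this map.
    // No valid path: discard and regenerate.
  }
}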

Connectivity

  • Recursive Backtracker & Prim's: Perfect mazes by design, additional connectivity pass adds strategic loops
  • Cellular Automata: Flood fill ensures single connected component, isolated areas converted to walls
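One way to implement that connectivity pass is to keep only the largest floor region found by flood fill. The following is an illustrative sketch, not the project's exact code:

// Sketch: keep only the largest connected floor region; turn everything else into walls.
function keepLargestRegion(grid: string[][]): void {
  const rows = grid.length;
  const cols = grid[0].length;
  const seen = grid.map((row) => row.map(() => false));
  let largest: [number, number][] = [];

  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      if (grid[r][c] !== '.' || seen[r][c]) continue;
      // Breadth-first flood fill from this floor cell.
      const region: [number, number][] = [[r, c]];
      seen[r][c] = true;
      for (let i = 0; i < region.length; i++) {
        const [cr, cc] = region[i];
        for (const [nr, nc] of [[cr - 1, cc], [cr + 1, cc], [cr, cc - 1], [cr, cc + 1]]) {
          if (nr >= 0 && nr < rows && nc >= 0 && nc < cols && grid[nr][nc] === '.' && !seen[nr][nc]) {
            seen[nr][nc] = true;
            region.push([nr, nc]);
          }
        }
      }
      if (region.length > largest.length) largest = region;
    }
  }

  // Wall off every floor cell that is not in the largest region.
  const keep = new Set(largest.map(([kr, kc]) => `${kr},${kc}`));
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      if (grid[r][c] === '.' && !keep.has(`${r},${c}`)) grid[r][c] = '#';
    }
  }
}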

πŸ“ˆ Performance

  • Small maps (5x5): ~1-2ms per map
  • Medium maps (10x10): ~5-10ms per map
  • Large maps (20x20): ~20-50ms per map

Total generation time for default configuration (990 maps): ~1-5 minutes

πŸ› οΈ Development

Type Safety

The project uses strict TypeScript with:

  • Strict null checks
  • No unchecked indexed access
  • Comprehensive type definitions in types.ts

Testing a Single Map

import { generateRecursiveBacktrackerMaze } from './src/mapGenerator/recursiveBacktracker';
import { addOuterWalls, placePandD, mapToString } from './src/utils/gridOperations';
import { findPath } from './src/pathfinding/astar';

const innerGrid = generateRecursiveBacktrackerMaze(10, 10);
const withWalls = addOuterWalls(innerGrid);
const result = placePandD(withWalls);

if (result) {
  const path = findPath(result.grid, result.start, result.end);
  console.log(mapToString(result.grid));
  console.log('Path:', path);
}

πŸ“ Algorithm Details

A* Pathfinding

  • Heuristic: Manhattan distance (optimal for 4-directional grid movement)
  • Movement: 4-directional (UP, DOWN, LEFT, RIGHT) - no diagonals
  • Cost: Uniform cost of 1 per move
  • Optimality: Guaranteed to find shortest path
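A compact sketch of such an A* search over these maps (illustrative only; the project's implementation lives in src/pathfinding/astar.ts and may differ):

type Point = { row: number; col: number };

// Minimal A* sketch on a character grid; returns moves like ["DOWN", "RIGHT", ...] or null.
function aStar(grid: string[][], start: Point, goal: Point): string[] | null {
  const key = (p: Point) => `${p.row},${p.col}`;
  const h = (p: Point) => Math.abs(p.row - goal.row) + Math.abs(p.col - goal.col); // Manhattan
  const moves: [string, number, number][] = [
    ['UP', -1, 0], ['DOWN', 1, 0], ['LEFT', 0, -1], ['RIGHT', 0, 1],
  ];

  const open: { p: Point; g: number; f: number; path: string[] }[] = [
    { p: start, g: 0, f: h(start), path: [] },
  ];
  const bestG = new Map<string, number>([[key(start), 0]]);

  while (open.length > 0) {
    // Naive priority queue: pick the node with the smallest f. Fine for small grids.
    open.sort((a, b) => a.f - b.f);
    const current = open.shift()!;
    if (current.p.row === goal.row && current.p.col === goal.col) return current.path;

    for (const [name, dr, dc] of moves) {
      const next = { row: current.p.row + dr, col: current.p.col + dc };
      if (grid[next.row]?.[next.col] === undefined || grid[next.row][next.col] === '#') continue;
      const g = current.g + 1; // Uniform cost of 1 per move.
      if (g < (bestG.get(key(next)) ?? Infinity)) {
        bestG.set(key(next), g);
        open.push({ p: next, g, f: g + h(next), path: [...current.path, name] });
      }
    }
  }
  return null; // No path exists.
}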

Cellular Automata Rules

  • Initialization: 45% wall density
  • Rule: Cell becomes wall if β‰₯5 neighbors are walls, otherwise floor
  • Iterations: Adaptive based on grid size (typically 5-8)
  • Post-processing: Flood fill + connectivity check + smoothing
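One iteration of that rule can be sketched as follows. This is illustrative; in particular, treating out-of-bounds neighbours as walls is an assumption here, and the real generator is in src/mapGenerator/cellularAutomata.ts.

// Sketch of a single cellular-automata smoothing step: a cell becomes a wall
// when 5 or more of its 8 neighbours are walls (out-of-bounds counted as wall, by assumption).
function caStep(grid: string[][]): string[][] {
  const rows = grid.length;
  const cols = grid[0].length;
  return grid.map((row, r) =>
    row.map((_, c) => {
      let walls = 0;
      for (let dr = -1; dr <= 1; dr++) {
        for (let dc = -1; dc <= 1; dc++) {
          if (dr === 0 && dc === 0) continue;
          const nr = r + dr, nc = c + dc;
          if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || grid[nr][nc] === '#') walls++;
        }
      }
      return walls >= 5 ? '#' : '.';
    })
  );
}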

🀝 Contributing

This is a benchmark generation tool. Contributions welcome for:

  • Additional maze generation algorithms
  • Performance optimizations
  • Extended validation checks
  • Output format variations

πŸ“„ License

MIT License - see LICENSE file for details

πŸ™ Acknowledgments

Built with:

  • Bun - Fast JavaScript runtime
  • TypeScript - Type-safe development
  • Classic maze generation algorithms

Generated with ❀️ for LLM training and benchmarking
