A reinforcement learning racing game where an AI agent learns to drive around tile-based race tracks using PPO (Proximal Policy Optimization). Built with Pygame and Stable-Baselines3. Includes 10 themed maps, parallel training, and manual play mode.
```bash
# Install dependencies
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/Mac
pip install pygame gymnasium stable-baselines3 numpy

# Generate tilesets (required first time)
python generate_tiles.py

# Play manually
python main.py --map 1

# Train a model (runs until Ctrl+C)
python racerlearn.py --map 0 --envs 4

# Watch a trained model
python loadracerai.py --model models/<timestamp>/50000.zip --map 2
```

Drive with WASD keys on any map.
```bash
python main.py --map 3
python main.py --map 0   # random map
```

| Argument | Default | Description |
|---|---|---|
| `--map` | 1 | Map number (1-10), or 0 for random |
Trains a PPO model with checkpoints. Runs indefinitely until Ctrl+C (auto-saves on exit). Supports parallel environments and resuming from checkpoints.
```bash
# Train on random maps with 4 parallel cars
python racerlearn.py --map 0 --envs 4

# Train for exactly 500k steps
python racerlearn.py --map 0 --envs 4 --timesteps 500000

# Resume training from a checkpoint
python racerlearn.py --map 0 --envs 4 --resume models/1743285600/50000.zip
```

| Argument | Default | Description |
|---|---|---|
| `--map` | 1 | Map number (1-10), or 0 for random maps each episode |
| `--envs` | 1 | Number of parallel environments |
| `--timesteps` | infinite | Total timesteps (omit to run until Ctrl+C) |
| `--save-interval` | 10000 | Save a checkpoint every N timesteps |
| `--resume` | — | Path to a model .zip to continue training from |
Models save to `models/<timestamp>/` with checkpoints at each interval. When using `--map 0`, each episode picks a random map with a 50% chance of flipped direction (clockwise/counter-clockwise) for better generalization.
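The per-episode map selection described above can be sketched as follows. Function and constant names here are illustrative, not the project's actual API:

```python
import random

NUM_MAPS = 10  # maps are numbered 1-10; 0 means "random each episode"

def pick_episode_map(map_arg, rng):
    """Sketch of the --map 0 reset behaviour: choose a random map and
    flip the start direction with 50% probability. With a fixed map
    number, that map is always used and never flipped."""
    if map_arg == 0:
        map_id = rng.randint(1, NUM_MAPS)   # uniform over all 10 maps
        flipped = rng.random() < 0.5        # 50% chance of reversed direction
    else:
        map_id = map_arg
        flipped = False
    return map_id, flipped
```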
```bash
python loadracerai.py --model models/1743285600/50000.zip --map 2
python loadracerai.py --model models/1743285600/50000.zip --map 0   # random maps
```

| Argument | Default | Description |
|---|---|---|
| `--model` | required | Path to the model .zip file |
| `--map` | 1 | Map number (1-10), or 0 for random |
| `--episodes` | infinite | Number of episodes to run |
Edit maps interactively and save them back to `maps.py`.
```bash
python mapeditor.py
```

| Key | Action |
|---|---|
| 0-7 | Place basic tiles (grass, straights, 1x1 corners, start lines) |
| D / F / G / H | Place diagonal connectors |
| Q / W / A / S | Place wide 2x2 corners (TL / TR / BL / BR) |
| Arrow keys | Switch between maps |
| Ctrl+S | Save changes to `maps.py` |
Run after modifying tile themes or tile generation code. Outputs 10 PNG tilesets to `tilesets/`.

```bash
python generate_tiles.py
```

10 maps with unique themes and layouts. All maps are 8x14 tiles (1792x1024 pixels).
| # | Name | Style | Layout |
|---|---|---|---|
| 1 | Classic Circuit | Red curbs, green grass | Offset rectangle with chicanes |
| 2 | Desert Sprint | Orange curbs, sandy terrain | S-shaped with elevation changes |
| 3 | Night Circuit | Yellow curbs, dark surroundings | Compact winding path |
| 4 | Snow Drift | Blue curbs, snowy terrain | 3-pass zigzag snake |
| 5 | Sunset Speedway | Orange-red curbs, warm tones | Stepped S-shape |
| 6 | Neon Circuit | Cyan curbs, dark background | Converging funnel |
| 7 | Forest Trail | Brown curbs, deep green | Figure-8 dual loop |
| 8 | Urban Circuit | Yellow curbs, concrete gray | Internal corridor maze |
| 9 | Tropical Lagoon | Coral curbs, bright green | Hook/G-shape with detour |
| 10 | Volcanic Run | Orange curbs, dark terrain | Winding asymmetric path |
The car has 7 distance probes that shoot outward at different angles (0, +/-20, +/-45, +/-90 degrees from forward). Each probe extends until it hits a boundary color, returning the distance. The observation space is:
- Car momentum (1 value)
- 7 probe distances
- Last 100 actions taken
Total: 108-dimensional observation vector.
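The 108-value layout above can be sketched as a simple list assembly. Names, zero-padding behaviour, and any scaling are assumptions; see the environment code for the real layout:

```python
from collections import deque

N_PROBES = 7
ACTION_HISTORY = 100

def build_observation(momentum, probe_distances, action_history):
    """Illustrative assembly of the observation vector:
    [momentum (1), probe distances (7), last 100 actions] = 108 values."""
    assert len(probe_distances) == N_PROBES
    # Keep only the most recent 100 actions, zero-padded at the front
    # until the history fills up (padding scheme is an assumption).
    recent = list(action_history)[-ACTION_HISTORY:]
    padded = [0] * (ACTION_HISTORY - len(recent)) + recent
    return [float(momentum)] + [float(d) for d in probe_distances] + padded
```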
Tracks are built from a 13-tile vocabulary on a grid:
| Tiles | Type |
|---|---|
| 0 | Grass |
| 1-2 | Horizontal/vertical straights |
| 3-6 | Quarter-circle curves (4 directions) |
| 7-8 | Start/finish lines (horizontal/vertical) |
| 9-12 | Diagonal connections (4 directions) |
Each tile is 128x128 pixels with an 80px road band and 8px boundary strips. Probes detect boundary colors to know where the track edges are — each map theme uses a unique boundary color.
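A single probe's boundary-color march can be sketched like this. The `surface_color_at` callable stands in for a pixel lookup on the rendered track (e.g. a pygame `Surface.get_at`); the 400px range and other names are illustrative:

```python
import math

def probe_distance(surface_color_at, x, y, angle_deg, boundary_color,
                   max_dist=400, step=5):
    """Sketch of one distance probe: march outward in coarse 5px steps,
    sampling the pixel color, until the map's boundary color is hit.
    Returns max_dist if no boundary is found within range."""
    dx = math.cos(math.radians(angle_deg))
    dy = -math.sin(math.radians(angle_deg))  # screen y grows downward
    for dist in range(0, max_dist + 1, step):
        px, py = int(x + dx * dist), int(y + dy * dist)
        if surface_color_at(px, py) == boundary_color:
            return dist
    return max_dist
```

With 8px-wide boundary strips and a 5px step, each strip is guaranteed to be sampled at least once, which is why thinner boundaries risk being skipped.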
Per timestep:
- +1 base reward
- -1 for each probe detecting a boundary within 20px
- +2x speed bonus (up to 4) when no probes are close to boundaries
- -10,000 on crash (any probe < 5px from boundary)
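The reward rules above can be expressed as a short function. Thresholds (20px near, 5px crash) and magnitudes follow the text; the function shape itself is a sketch, not the project's exact code:

```python
def step_reward(probe_distances, speed):
    """Per-timestep reward sketch: +1 base, -1 per near-boundary probe,
    a speed bonus (2x speed, capped at 4) only when fully clear,
    and -10,000 plus episode termination on a crash."""
    if any(d < 5 for d in probe_distances):      # crash: any probe < 5px
        return -10_000, True
    reward = 1                                    # base reward
    near = sum(1 for d in probe_distances if d < 20)
    reward -= near                                # -1 per probe within 20px
    if near == 0:
        reward += min(2 * speed, 4)               # speed bonus, capped at 4
    return reward, False
```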
Momentum-based acceleration with rotation speed proportional to the 4th root of velocity. The car decelerates gradually when no input is given.
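A minimal sketch of that movement model, using `P_MAX_ACC` and `P_ROT_ACC` from `config.py`. The `FRICTION` coasting factor is an assumption for illustration, not a value from the config:

```python
P_MAX_ACC = 0.5   # from config.py: maximum acceleration
P_ROT_ACC = 2     # from config.py: rotation acceleration factor
FRICTION = 0.98   # assumption: per-frame coasting decay, not from config.py

def update_car(velocity, accelerating, turning):
    """Momentum-based update: accelerate while input is held, otherwise
    decay gradually; rotation rate scales with the 4th root of velocity,
    so a near-stationary car barely turns."""
    if accelerating:
        velocity = velocity + P_MAX_ACC
    else:
        velocity = velocity * FRICTION            # gradual deceleration
    rotation_speed = P_ROT_ACC * velocity ** 0.25 * turning
    return velocity, rotation_speed
```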
Key constants in `config.py`:

| Constant | Value | Description |
|---|---|---|
| `WIN_WIDTH` | 1792 | Window width (14 tiles x 128px) |
| `WIN_HEIGHT` | 1024 | Window height (8 tiles x 128px) |
| `TILESIZE` | 128 | Tile size in pixels |
| `PLAYERSIZE` | 46 | Car sprite size in pixels |
| `FPS` | 60 | Game framerate |
| `P_MAX_ACC` | 0.5 | Maximum acceleration |
| `P_ROT_ACC` | 2 | Rotation acceleration factor |
- `--map 0` in training randomizes the map each episode AND has a 50% chance to flip the start direction by 180 degrees. This is intentional for generalization.
- Boundary width matters — boundary strips are 8px wide and the probe coarse step is 5px. Making boundaries thinner risks probes skipping over them.
- Old models won't load if the observation space changed. Use `custom_objects={"observation_space": env.observation_space}` in `PPO.load()` (already handled in `loadracerai.py`).
- Parallel training (`--envs > 1`) opens multiple pygame windows. For faster headless training, a render toggle would need to be added.
- `racerlearncont.py` is legacy — use `racerlearn.py --resume` instead.
- Pick a theme — add colors to the `THEMES` dict in `generate_tiles.py`
- Run `python generate_tiles.py` to create the tileset PNG
- Add a map entry to the `MAPS` dict in `maps.py` with grid, boundary colors, start position, start rotation, and tileset path
- Grid must form a closed loop — trace tile connections to verify before testing
- `start_rotation`: 0=up, 90=left, 180=down, 270=right
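As a sketch, a new `MAPS` entry might look like the following. The key names are hypothetical (check the existing entries in `maps.py` for the real schema), and the all-grass grid is a placeholder, not a valid closed loop:

```python
# Hypothetical MAPS entry shape; field list follows the checklist above.
NEW_MAP = {
    # 8 rows x 14 columns of tile indices (0 = grass). A real map must
    # replace these with track tiles forming a closed loop.
    "grid": [[0] * 14 for _ in range(8)],
    "boundary_color": (255, 0, 255),   # unique per theme; probes key off this
    "start_pos": (3, 1),               # start line location (placeholder)
    "start_rotation": 270,             # 0=up, 90=left, 180=down, 270=right
    "tileset": "tilesets/my_theme.png",
}
```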
| Package | Purpose |
|---|---|
| `pygame` | Game rendering and input |
| `gymnasium` | RL environment API |
| `stable-baselines3` | PPO algorithm and training |
| `numpy` | Numerical operations |