Training neural network agents to play Slay the Spire using supervised learning.
This project uses sts-lightspeed, a high-performance, RNG-accurate C++ simulator, to generate training data and evaluate agents. The long-term goal is an agent that can play Ascension 20 Ironclad.
The current pipeline uses supervised learning:
- Generate diverse battle scenarios from real game data
- Run a baseline agent to collect demonstrations
- Extract state-action pairs from battle snapshots
- Train a neural network to imitate the baseline agent's behavior
- Export the model for C++ inference and evaluation
The focus right now is benchmarking different approaches.
Requirements:
- C++17 compiler and CMake 3.10+ (for the sts-lightspeed simulator)
- just (recommended for running commands)
- Python 3.8+ with uv
Build the simulator:

```sh
just build
```

Create randomized battle scenarios from real game data:
```sh
uv run randomize_scenarios.py --count 20
```

This creates variations of base scenarios by randomizing HP, deck composition, and RNG seeds.
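As a rough illustration of what this kind of randomization can look like, here is a minimal Python sketch. The scenario fields (`player_hp`, `deck`, `seed`), the JSON layout, and the file paths are assumptions for illustration, not the actual format used by randomize_scenarios.py:

```python
import json
import random
from pathlib import Path

def randomize_scenario(base: dict, rng: random.Random) -> dict:
    """Produce one variant of a base scenario. Field names are hypothetical."""
    variant = dict(base)
    # Jitter starting HP within +/-20% of the base value.
    hp = base.get("player_hp", 80)
    variant["player_hp"] = max(1, int(hp * rng.uniform(0.8, 1.2)))
    # Shuffle deck order and occasionally drop a card for variety.
    deck = list(base.get("deck", []))
    rng.shuffle(deck)
    if len(deck) > 5 and rng.random() < 0.5:
        deck.pop()
    variant["deck"] = deck
    # Fresh RNG seed so the simulator's battle rolls differ per variant.
    variant["seed"] = rng.randrange(2**32)
    return variant

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical file names; check battle/generated_scenarios/ for the real ones.
    base = json.loads(Path("battle/generated_scenarios/jaw_worm.json").read_text())
    for i in range(20):
        out = Path(f"battle/randomized_scenarios/jaw_worm_{i}.json")
        out.write_text(json.dumps(randomize_scenario(base, rng), indent=2))
```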
Run SimpleAgent on scenarios and capture battle snapshots:
```sh
just run-agent simple --snapshot --scenario=jaw_worm
```

Battle progression data is saved to `data/agent_battles/simpleagent/`.
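If you want to eyeball the captured snapshots before parsing them, something like the following works. It assumes the snapshots are JSON files, which is a guess; check the actual files in `data/agent_battles/simpleagent/` first:

```python
import json
from pathlib import Path

snapshot_dir = Path("data/agent_battles/simpleagent")
for path in sorted(snapshot_dir.glob("*.json")):
    snap = json.loads(path.read_text())
    # Print top-level structure so you can see what each snapshot records;
    # the exact schema depends on how the simulator writes snapshots.
    if isinstance(snap, dict):
        print(path.name, "keys:", sorted(snap))
    else:
        print(path.name, "entries:", len(snap))
```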
Extract state-action pairs from battle snapshots:
```sh
cd AutoClad
uv run data_parser.py
```

This creates `jaw_worm_data.npz` with feature vectors (game state) and action labels (which card was played).
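A quick way to sanity-check the parsed dataset is to load it with NumPy. The array names below (`features`, `labels`) are assumptions; `np.load` will list whatever keys data_parser.py actually wrote:

```python
import numpy as np

data = np.load("AutoClad/jaw_worm_data.npz")
print("arrays:", data.files)  # shows the real key names

# Hypothetical key names; substitute whatever data.files reports.
features = data["features"]
labels = data["labels"]
print("features:", features.shape, features.dtype)
print("labels:  ", labels.shape, "classes:", np.unique(labels))
```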
Train the model:

```sh
cd AutoClad

# Train with plotting and early stopping
uv run main.py --plot --early-stopping

# Or with default settings
uv run main.py
```

The trained model is exported as `jaw_worm_model_traced.pt` (TorchScript format) for C++ inference.
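For reference, exporting a PyTorch model to TorchScript for LibTorch inference looks roughly like this. The network shape here is a made-up stand-in, not the architecture main.py actually trains:

```python
import torch
import torch.nn as nn

# Stand-in architecture; the real model in main.py may differ.
model = nn.Sequential(
    nn.Linear(64, 128),  # 64 input features is a placeholder
    nn.ReLU(),
    nn.Linear(128, 10),  # 10 action classes is a placeholder
)
model.eval()

# Trace with a dummy input and save in TorchScript format, which
# LibTorch can load from C++ without any Python runtime.
example = torch.zeros(1, 64)
traced = torch.jit.trace(model, example)
traced.save("jaw_worm_model_traced.pt")

# Sanity check: reload and run a forward pass.
reloaded = torch.jit.load("jaw_worm_model_traced.pt")
print(reloaded(example).shape)  # torch.Size([1, 10])
```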
Evaluate the trained model in the simulator:

```sh
# Requires LibTorch
LIBTORCH_PATH=~/Downloads/libtorch just run-agent neural --scenario=jaw_worm
```

Working:
- Data generation from baseline agent demonstrations
- Neural network training on Jaw Worm encounters
- C++ inference with trained models
In Progress:
- Expanding to more encounter types
- Improving agent performance metrics
- Recording intermediate combat states for richer training data
The end goal is to train an agent that can play the full game at a high level. The approach:
- Start with battles: If you can predict whether you'll win specific fights, you can make better decisions about everything else (pathing, card choices, shops, etc.).
- Generate quality data: Available human gameplay data is incomplete and outdated. Instead, use tree search and simpler agents to generate training data (see the sketch after this list).
- Scale up with RL: Once we have a decent baseline from supervised learning, use reinforcement learning to reach superhuman play.
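As a sketch of the data-generation idea, a simple rollout search can label each state with the action that wins the most simulated playouts. The `BattleSim` interface below is hypothetical; the real project drives sts-lightspeed from C++, not through this API:

```python
import random
from typing import List, Protocol, Tuple

class BattleSim(Protocol):
    """Hypothetical simulator interface; sts-lightspeed's real API differs."""
    def legal_actions(self) -> List[int]: ...
    def step(self, action: int) -> None: ...
    def is_over(self) -> bool: ...
    def won(self) -> bool: ...
    def clone(self) -> "BattleSim": ...
    def state_vector(self) -> List[float]: ...

def rollout_value(sim: BattleSim, rng: random.Random, max_steps: int = 200) -> float:
    """Play random legal actions to the end; return 1.0 on a win."""
    for _ in range(max_steps):
        if sim.is_over():
            break
        actions = sim.legal_actions()
        if not actions:
            break
        sim.step(rng.choice(actions))
    return 1.0 if sim.is_over() and sim.won() else 0.0

def best_action(sim: BattleSim, rng: random.Random, n_rollouts: int = 32) -> int:
    """Pick the action whose random rollouts win most often."""
    scores = {}
    for action in sim.legal_actions():
        total = 0.0
        for _ in range(n_rollouts):
            trial = sim.clone()
            trial.step(action)
            total += rollout_value(trial, rng)
        scores[action] = total / n_rollouts
    return max(scores, key=scores.get)

def generate_pairs(sim: BattleSim, rng: random.Random) -> List[Tuple[List[float], int]]:
    """Record (state, chosen action) pairs from one searched battle."""
    pairs = []
    while not sim.is_over():
        action = best_action(sim, rng)
        pairs.append((sim.state_vector(), action))
        sim.step(action)
    return pairs
```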
Project layout:

```
├── AutoClad/                  # Neural network training code (Python)
│   ├── main.py                # Training script
│   ├── data_parser.py         # Parse battle snapshots to training data
│   └── *.pth, *.pt            # Trained models
├── randomize_scenarios.py     # Generate randomized battle scenarios
├── battle/
│   ├── generated_scenarios/   # Base scenarios from real games
│   └── randomized_scenarios/  # Training scenario variations
├── data/agent_battles/        # Battle snapshots from agent runs
└── [sts-lightspeed files]     # C++ simulator (apps/, src/, include/, etc.)
```