This benchmark provides implementations of offline reinforcement learning (RL) algorithms trained on Variable Speed Limit (VSL) data. It includes several state-of-the-art offline RL algorithms along with preprocessed datasets and training tools. Evaluation is conducted in TransModeler, a commercial microsimulation software.
```
i24-vsl-orl/
├── code/                       # Source code
│   ├── algorithm/              # Offline RL algorithm implementations
│   │   ├── BC.py               # Behavior Cloning
│   │   ├── BCQ.py              # Batch-Constrained Q-learning
│   │   ├── CQL.py              # Conservative Q-learning
│   │   ├── IQL.py              # Implicit Q-learning
│   │   └── TD3_BC.py           # TD3 + Behavior Cloning
│   ├── training.py             # Main training script
│   └── utils.py                # Utility functions and ReplayBuffer
├── dataset/                    # Preprocessed datasets
│   ├── expert_small.pkl        # Small expert dataset
│   ├── expert_medium.pkl       # Medium expert dataset
│   ├── expert_large.pkl        # Large expert dataset
│   ├── mixed_small.pkl         # Small mixed dataset
│   ├── mixed_medium.pkl        # Medium mixed dataset
│   └── mixed_large.pkl         # Large mixed dataset
├── saved_models/               # Trained model checkpoints
│   └── training_with_*/        # Models organized by dataset size
│       └── run_*/              # Multiple training runs
└── README.md                   # This file
```
The complete VSL operation dataset is available on Zenodo: *Offline Reinforcement Learning Dataset from a Field Deployed Variable Speed Limit Control System* (DOI: 10.5281/zenodo.16376854). The dataset contains ~100 million samples from 18 months of continuous VSL operation on Interstate 24 (I-24), with 67 VSL controllers operating in both directions of the freeway, covering March 9, 2024 through September 9, 2025. The data have been converted to an offline RL format for research use.
The benchmark includes preprocessed datasets in three sizes, each in an expert and a mixed variant:
| Dataset | Size | Description |
|---|---|---|
| `expert_small.pkl` | ~5K samples | Small expert dataset for quick testing |
| `expert_medium.pkl` | ~10K samples | Medium expert dataset for moderate experiments |
| `expert_large.pkl` | ~100K samples | Large expert dataset for full evaluation |
| `mixed_small.pkl` | ~5K samples | Small mixed dataset (expert + suboptimal) |
| `mixed_medium.pkl` | ~10K samples | Medium mixed dataset |
| `mixed_large.pkl` | ~100K samples | Large mixed dataset |
Each dataset contains:
- States: 5-dimensional state vectors
- Actions: Discrete actions (5 possible values)
- Rewards: Scalar reward values
- Next States: 5-dimensional next state vectors
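As a quick sanity check, the following sketch loads one of the pickles and inspects its shapes. The exact container layout (here assumed to be a dict with `states`, `actions`, `rewards`, and `next_states` keys) is an assumption and may need adjusting to the actual pickle structure:

```python
import pickle

import numpy as np

# Load one of the preprocessed datasets (path relative to the project root).
with open("dataset/expert_small.pkl", "rb") as f:
    data = pickle.load(f)

# Assumed layout: a dict of parallel arrays; adjust the keys if the pickle
# stores a different container (e.g., a tuple or a list of transitions).
states = np.asarray(data["states"])            # shape: (N, 5)
actions = np.asarray(data["actions"])          # shape: (N,), values in {0, ..., 4}
rewards = np.asarray(data["rewards"])          # shape: (N,)
next_states = np.asarray(data["next_states"])  # shape: (N, 5)

print(f"{len(states)} transitions, state dim = {states.shape[1]}")
```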
Requirements:

- Python 3.7+
- PyTorch
- NumPy
- Pickle (built-in)
- Navigate to the project root directory:

  ```bash
  cd /path/to/i24-vsl-orl
  ```

- Install the required dependencies:

  ```bash
  pip install torch numpy
  ```
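After installing, a quick check that the dependencies import cleanly:

```python
# Verify the installation (run from any directory).
import numpy as np
import torch

print("PyTorch:", torch.__version__)
print("NumPy:", np.__version__)
print("CUDA available:", torch.cuda.is_available())
```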
The main training script supports command-line arguments for easy configuration:
```bash
python code/training.py [OPTIONS]
```
- `--dataset_size {small,medium,large}`: Choose dataset size (default: `large`)
- `--max_training_iters INT`: Number of training iterations (default: `100000`)
- `--seed INT`: Random seed for reproducibility (default: `123`)
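For reference, these flags correspond to an argument parser along the following lines. This is a hypothetical sketch mirroring the documented options, not the exact code in `code/training.py`:

```python
import argparse

# Hypothetical parser mirroring the documented command-line options;
# the actual definitions live in code/training.py and may differ.
parser = argparse.ArgumentParser(description="Offline RL training on I-24 VSL data")
parser.add_argument("--dataset_size", choices=["small", "medium", "large"],
                    default="large", help="Which preprocessed dataset to use")
parser.add_argument("--max_training_iters", type=int, default=100000,
                    help="Number of training iterations")
parser.add_argument("--seed", type=int, default=123,
                    help="Random seed for reproducibility")
args = parser.parse_args()
```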
Examples:

```bash
# Train with the small dataset for 5000 iterations
python code/training.py --dataset_size small --max_training_iters 5000

# Train with the medium dataset and a custom seed
python code/training.py --dataset_size medium --seed 456

# Train with the large dataset (default settings)
python code/training.py
```
Trained models are automatically saved to:

```
saved_models/training_with_{dataset_size}/run_{i}/
├── BC_expert_{iterations}.pth
├── BC_mixed_{iterations}.pth
├── BCQ_expert_{iterations}.pth
├── BCQ_mixed_{iterations}.pth
├── CQL_expert_{iterations}.pth
├── CQL_mixed_{iterations}.pth
├── IQL_expert_{iterations}.pth
├── IQL_mixed_{iterations}.pth
├── TD3_BC_expert_{iterations}.pth
└── TD3_BC_mixed_{iterations}.pth
```
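A hedged sketch for loading a saved checkpoint. Whether each `.pth` file holds a bare `state_dict` (to be loaded into a network built from the corresponding file in `code/algorithm/`) or a full pickled module depends on how the training script saves it, so inspect the loaded object first:

```python
import torch

# Example checkpoint path; substitute the run directory, algorithm,
# dataset type, and iteration count from your own training run.
ckpt_path = "saved_models/training_with_large/run_0/BC_expert_100000.pth"

# If the file holds a state_dict, load it into a matching network; if it
# holds a whole module, this call returns the module directly.
checkpoint = torch.load(ckpt_path, map_location="cpu")
print(type(checkpoint))
```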
Notes:

- All algorithms use the same neural network architecture for fair comparison
- Training is performed entirely offline, using the pre-collected datasets
- Models are saved automatically when training completes
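To make the offline setup concrete, here is a minimal behavior-cloning loop over a pre-collected dataset. The network, hyperparameters, and dataset keys are illustrative stand-ins, not the shared architecture or training code actually used by the benchmark:

```python
import pickle

import numpy as np
import torch
import torch.nn as nn

# Load transitions (keys assumed; see the dataset section above).
with open("dataset/expert_small.pkl", "rb") as f:
    data = pickle.load(f)
states = torch.as_tensor(np.asarray(data["states"]), dtype=torch.float32)
actions = torch.as_tensor(np.asarray(data["actions"]), dtype=torch.long)

# Illustrative policy network: 5-dim state -> logits over 5 discrete actions.
policy = nn.Sequential(nn.Linear(5, 256), nn.ReLU(),
                       nn.Linear(256, 256), nn.ReLU(),
                       nn.Linear(256, 5))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

for step in range(1000):
    idx = torch.randint(0, len(states), (256,))  # sample a random minibatch
    loss = nn.functional.cross_entropy(policy(states[idx]), actions[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

torch.save(policy.state_dict(), "bc_sketch.pth")  # fully offline: no simulator calls
```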