Official code repository for SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control.
SAVGO studies continuous-control reinforcement learning through the geometry of state-action values. Instead of relying only on scalar value estimates, the method learns a representation space where cosine similarity captures useful structure among state-action pairs and supports policy improvement. This repository contains the implementation used for the paper, including the main SAVGO/DistanceRL agent, ablations, representation models, and Stable-Baselines3 PPO/TD3/SAC/TQC comparison baselines.
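For intuition, here is a minimal, repo-agnostic sketch of the core idea: embed state-action pairs and compare them with cosine similarity. The encoder shape, sizes, and names below are illustrative only and do not reflect the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateActionEncoder(nn.Module):
    """Illustrative encoder mapping (state, action) pairs to an embedding space."""
    def __init__(self, state_dim, action_dim, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Toy dimensions matching LunarLanderContinuous (8-dim states, 2-dim actions).
encoder = StateActionEncoder(state_dim=8, action_dim=2)
z1 = encoder(torch.randn(4, 8), torch.randn(4, 2))
z2 = encoder(torch.randn(4, 8), torch.randn(4, 2))

# Batched cosine similarity between the two sets of embedded state-action pairs.
sim = F.cosine_similarity(z1, z2, dim=-1)
print(sim.shape)  # torch.Size([4])
```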
Create a Python environment, then install the dependencies:

```bash
pip install -r requirements.txt
```

For GPU runs, install a PyTorch build that matches your CUDA version before running experiments.
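To confirm that the installed PyTorch build can see your GPU before launching a long run:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```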
Train DistanceRL on a Box2D task:

```bash
python main.py --algo DistAgent --env-id LunarLanderContinuous-v3 --device cuda
```

Train DistanceRL on a MuJoCo task:

```bash
python main.py --algo DistAgent --env-id HalfCheetah-v5 --device cuda --total-steps 1000000
```

Run a Stable-Baselines3 baseline:

```bash
python main.py --algo sac --env-id HalfCheetah-v5 --device cuda --total-steps 1000000
```

Use CPU instead of CUDA:

```bash
python main.py --algo DistAgent --env-id LunarLanderContinuous-v3 --device cpu
```

Optional Weights & Biases logging:

```bash
python main.py --algo DistAgent --env-id HalfCheetah-v5 --log_to_wandb --project_name DistRL
```

Model checkpoints are written to `saved_models/`, and logs are written to `logs/`.
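A minimal sketch for inspecting a saved checkpoint, assuming checkpoints are standard PyTorch files; the filename below is hypothetical, and the exact contents depend on the agent and run configuration.

```python
import torch

# Hypothetical path; actual filenames depend on the algorithm, env, and run settings.
checkpoint = torch.load("saved_models/DistAgent_HalfCheetah-v5.pt", map_location="cpu")
print(type(checkpoint))  # typically a state dict or a dict of state dicts
```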
If you use this code, please cite the accompanying publication.
```bibtex
@misc{orfanoudakis2026savgolearningstateactionvalue,
  title={SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control},
  author={Stavros Orfanoudakis and Pedro P. Vergara},
  year={2026},
  eprint={2605.00787},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.00787},
}
```