StavrosOrf/DistanceRL

Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

Official code repository for SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control.

arXiv: https://arxiv.org/abs/2605.00787

SAVGO studies continuous-control reinforcement learning through the geometry of state-action values. Instead of relying only on scalar value estimates, the method learns a representation space where cosine similarity captures useful structure among state-action pairs and supports policy improvement. This repository contains the implementation used for the paper, including the main SAVGO/DistanceRL agent, ablations, representation models, and Stable-Baselines3 PPO/TD3/SAC/TQC comparison baselines.
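To illustrate the similarity measure mentioned above, the sketch below computes the cosine similarity between two state-action embedding vectors. It assumes the embeddings already exist as plain lists of floats. It does not show how SAVGO produces them or how the similarity enters its training objective, so treat it as a generic cosine-similarity example rather than the repository's implementation. The name `cosine_similarity` and the example vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two embedding vectors of equal length.

    Generic illustration only: u and v stand in for hypothetical
    state-action embeddings phi(s, a); this is not the SAVGO loss.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # eps guards against division by zero when a vector is all zeros
    return dot / max(norm_u * norm_v, eps)


# Two hypothetical embeddings; the second is the first scaled by 2
phi_a = [1.0, 2.0, 0.5]
phi_b = [2.0, 4.0, 1.0]
print(cosine_similarity(phi_a, phi_b))  # parallel vectors, so approximately 1.0
```

Cosine similarity depends only on the direction of the vectors, not their magnitude, which is why scaling an embedding leaves the score unchanged.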

Installation

Create a Python environment, then install the dependencies:

pip install -r requirements.txt

For GPU runs, install a PyTorch build that matches your CUDA version before running experiments.
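Before launching a long run, you can confirm that the installed PyTorch build actually sees a GPU. The sketch below uses a hypothetical helper, `pick_device`, which is not part of the repository. It returns a value for the `--device` flag and falls back to `cpu` when PyTorch is missing or was built without CUDA support.

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return a value suitable for main.py's --device flag.

    Hypothetical helper: falls back to "cpu" when PyTorch is not
    installed or no CUDA device is visible to it.
    """
    if preferred != "cuda":
        return preferred
    try:
        import torch  # imported lazily, only to probe for a GPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())  # "cuda" on a working GPU install, otherwise "cpu"
```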

Run

Train DistanceRL on a Box2D task:

python main.py --algo DistAgent --env-id LunarLanderContinuous-v3 --device cuda

Train DistanceRL on a MuJoCo task:

python main.py --algo DistAgent --env-id HalfCheetah-v5 --device cuda --total-steps 1000000

Run a Stable-Baselines3 baseline:

python main.py --algo sac --env-id HalfCheetah-v5 --device cuda --total-steps 1000000
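To compare the DistanceRL agent with the baselines on the same tasks, you can generate the commands from a script. The sketch below uses only the flags shown in this README (`--algo`, `--env-id`, `--device`, `--total-steps`). `build_command` and `run_sweep` are hypothetical helpers. The baseline names `td3`, `tqc` and `ppo` are assumed to follow the lowercase form of `sac`, so check them against `main.py` before use.

```python
import shlex
import subprocess
from typing import List, Optional

# "DistAgent" and "sac" appear in this README; "td3", "tqc" and "ppo" are
# ASSUMED names for the other baselines and should be checked in main.py.
ALGOS = ["DistAgent", "sac", "td3", "tqc", "ppo"]
ENVS = ["HalfCheetah-v5"]


def build_command(algo: str, env_id: str, device: str = "cuda",
                  total_steps: Optional[int] = None) -> List[str]:
    """Assemble one main.py invocation as an argument list."""
    cmd = ["python", "main.py", "--algo", algo, "--env-id", env_id,
           "--device", device]
    if total_steps is not None:  # when omitted, main.py keeps its own default
        cmd += ["--total-steps", str(total_steps)]
    return cmd


def run_sweep(dry_run: bool = True) -> List[List[str]]:
    """Build every algo/env command; execute them one by one unless dry_run."""
    commands = [build_command(algo, env, total_steps=1_000_000)
                for env in ENVS for algo in ALGOS]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failed run
    return commands


if __name__ == "__main__":
    run_sweep(dry_run=True)
```

With `dry_run=True` the commands are only printed, so you can review a sweep before committing GPU time. Runs execute sequentially; spreading them across several GPUs is left to an external scheduler.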

Use CPU instead of CUDA:

python main.py --algo DistAgent --env-id LunarLanderContinuous-v3 --device cpu

Optional Weights & Biases logging:

python main.py --algo DistAgent --env-id HalfCheetah-v5 --log_to_wandb --project_name DistRL

Model checkpoints are written to saved_models/, and logs are written to logs/.
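The README does not describe how checkpoint files are named, so the sketch below makes no assumption about naming. It returns the most recently modified file anywhere under `saved_models/`. `latest_checkpoint` is a hypothetical helper, and how the returned file should be loaded depends on the format `main.py` actually writes.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(root: str = "saved_models") -> Optional[Path]:
    """Return the most recently modified file under `root`, or None.

    Searches recursively because the per-run directory layout is not
    documented; only regular files are considered.
    """
    base = Path(root)
    if not base.is_dir():
        return None
    files = [p for p in base.rglob("*") if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


print(latest_checkpoint())  # None until a run has saved something
```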

Citation

If you use this code, please cite the accompanying publication.

@misc{orfanoudakis2026savgolearningstateactionvalue,
      title={SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control}, 
      author={Stavros Orfanoudakis and Pedro P. Vergara},
      year={2026},
      eprint={2605.00787},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.00787}, 
}

About

Experimenting with State-Action Distance RL
