Online Episodic Convex Reinforcement Learning

Code to reproduce the results of the paper "Online Episodic Convex Reinforcement Learning".

To run an experiment, save all files to a folder, install the requirements listed in requirements.txt, and execute:

python main.py --reward_type 'multi_objectives' --n_iterations 50 --samples 1

Variables:

reward_type can be 'multi_objectives' to run the multi-objective problem or 'constrained' to run the constrained problem;
n_iterations indicates the number of iterations;
samples indicates the number of repetitions per experiment.
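
As a rough illustration of how the three flags above could be parsed, here is a minimal sketch using Python's standard argparse module. It is not taken from main.py: the function name build_parser and the default values are assumptions made for this example, and only the flag names and the two reward_type values come from this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the three flags documented above."""
    parser = argparse.ArgumentParser(
        description="Sketch of a command-line interface for the experiments."
    )
    # The two accepted values come from the README; the default is an assumption.
    parser.add_argument(
        "--reward_type",
        choices=["multi_objectives", "constrained"],
        default="multi_objectives",
        help="problem to run",
    )
    # Defaults below are assumptions copied from the example command.
    parser.add_argument("--n_iterations", type=int, default=50,
                        help="number of iterations")
    parser.add_argument("--samples", type=int, default=1,
                        help="number of repetitions per experiment")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--reward_type", "constrained", "--n_iterations", "50", "--samples", "1"]
)
print(args.reward_type, args.n_iterations, args.samples)
```

Restricting reward_type with choices makes a misspelled value fail immediately with a usage message rather than partway through an experiment.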

The code runs the experiments for Greedy MD-CURL and Bonus O-MD-CURL. It creates a results folder containing images of the state-action distributions, along with log-loss and regret plots.

The optimal folder contains the values of the loss function for an approximately optimal policy, which are used as the baseline when calculating regret. These values are computed by running MD-CURL, an offline method for CURL that uses the true transition probabilities, from [Moreno et al. (2024)], for 2000 iterations.
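
To make the role of these baseline values concrete, the sketch below computes cumulative regret under the usual definition: the running sum of the learner's loss minus the baseline loss at each iteration. It is an illustration only. The function name cumulative_regret and the toy numbers are hypothetical, and neither the file format of the optimal folder nor the repository's own regret code is shown here.

```python
from itertools import accumulate


def cumulative_regret(learner_losses, optimal_losses):
    """Running sum of (learner loss - baseline loss), one entry per iteration.

    Hypothetical helper: both arguments are equal-length sequences of floats.
    """
    if len(learner_losses) != len(optimal_losses):
        raise ValueError("loss sequences must have the same length")
    gaps = (l - o for l, o in zip(learner_losses, optimal_losses))
    return list(accumulate(gaps))


# Toy numbers, not results from the paper.
learner = [1.0, 0.8, 0.6, 0.5]
baseline = [0.5, 0.5, 0.5, 0.5]
regret = cumulative_regret(learner, baseline)
print(regret)
```

A regret curve that flattens over time, as in this toy case, indicates that the learner's per-iteration loss is approaching the baseline.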

Repository contents:

optimal
FourRooms.py
MiniGrid.py
README.md
main.py
online_mirror_descent.py
requirements.txt

biancammoreno/Convex_RL

Folders and files

Latest commit

History

Repository files navigation

Online Episodic Convex Reinforcement Learning

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages