This repository contains the solution to the third assignment of the course Reinforcement Learning at Leiden University. It provides a framework for experimenting with different policy-based reinforcement learning techniques on OpenAI's Gym environments. In our specific case, we successfully applied the REINFORCE and Actor-Critic algorithms (in various configurations) to the "CartPole-v1" environment. For a detailed description of the methodologies used and the experiments carried out, please refer to the full report.
Dimitrios Ieronymakis, Jonathan Collu and Riccardo Majellaro
To run the available scripts, a Python 3 environment is required, together with the packages specified in the requirements.txt file in the main directory. To install the requirements, run the following command in that environment:
pip install -r requirements.txt
All the experiments presented in the report are fully reproducible by running the following command from the main folder of the repository:
./experiment/basic_exps.sh
Remember to change the script's permissions to make it executable (e.g. with chmod +x experiment/basic_exps.sh).
Note: Windows users can convert the above file to a .bat by removing the shebang (first line) and the comments, converting "\" to "^", and deleting ";".
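As a purely hypothetical illustration (the command below is an example, not a line taken from the actual script), a shell line pair such as
python experiment.py -run_name example -device cpu \
-alg reinforce;
would become
python experiment.py -run_name example -device cpu ^
-alg reinforce
in the .bat version.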
To run a single experiment with a custom configuration, run
python experiment.py
from the main directory, along with the following possible arguments (an example invocation is given after the list):
-run_name: name of your choice for the configuration.
-device: where to execute the computations (e.g. "cpu" or "cuda").
-optimizer: choose an optimizer between "adam", "sgd" and "rms" for the policy net.
-optim_lr: learning rate of the optimizer.
-optimizer_v: choose an optimizer between "adam", "sgd" and "rms" for the value net.
-optim_lr_v: learning rate of the optimizer_v.
-quantum: use a quantum layer as an output layer for the policy network.
-alg: choose between "reinforce" and "AC_bootstrap".
-epochs: number of epochs (i.e. updates).
-traces: number of traces per epoch (averaged in a single update).
-trace_len: length of a trace.
-n: number of steps for bootstrapping.
-gamma: discount factor.
-baseline: to use baseline subtraction.
-entropy: to use entropy regularization.
-entropy_factor: entropy regularization factor.
-use_es: set to 0 or 1 to use evolutionary strategies as described in the report.
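As an illustration only, a REINFORCE run on the CPU could be launched as follows; the flag values are arbitrary examples rather than settings from the report, and any argument that is omitted presumably falls back to the script's defaults:
python experiment.py -run_name reinforce_example -device cpu -optimizer adam -optim_lr 0.001 -alg reinforce -epochs 1000 -traces 5 -trace_len 500 -gamma 0.99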
To evaluate a trained configuration, run the command below from the main directory
python evaluate.py
along with the following arguments:
-run_name: name of your choice for the configuration.
-render: to visualize the environment.
-device: to indicate where to execute the computations (e.g. "cpu" or "cuda").
-quantum: to use a quantum layer as an output layer for the policy network.
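For example, a previously trained configuration could be evaluated and rendered with a call along the lines of the one below; the run name is a placeholder, and whether -render expects a value or acts as a plain switch depends on how evaluate.py parses its arguments:
python evaluate.py -run_name reinforce_example -device cpu -render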