Deep Reinforcement Learning
The following algorithms are currently implemented:
- Advantage Actor Critic
- Asynchronous Advantage Actor Critic (A3C)
- Deep Deterministic Policy Gradient (DDPG)
- Proximal Policy Optimization (PPO)
- Distributed Proximal Policy Optimization (DPPO)
- Trust Region Policy Optimization (TRPO)
- Distributed Trust Region Policy Optimization (DTRPO)
- REINFORCE (convolutional neural network part has not been tested yet)
- Cross-Entropy Method
- Sarsa with function approximation and eligibility traces
- Karpathy's policy gradient algorithm (version using convolutional neural networks has not been tested yet)
- (Sequential) knowledge transfer
- Asynchronous knowledge transfer
Asynchronous Advantage Actor Critic (A3C)
The code for this algorithm can be found here. Example run after training using 16 threads for a total of 5 million timesteps on the PongDeterministic-v4 environment:
How to run
First, install the requirements using pip (you can first remove OpenCV from the
requirements.txt file if it is already installed):
pip install -r requirements.txt
You can run algorithms by passing the path to an experiment specification (a file in JSON format) to main.py:
python main.py <path_to_experiment_specification>
Examples of experiment specifications can be found in the experiment_specs folder.
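To run several experiment specifications one after another, a small wrapper script can help. The following is only a sketch: it assumes the specifications sit directly in the experiment_specs folder with a .json extension, and the helper names build_commands and run_all are hypothetical and not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(spec_dir="experiment_specs", entry_point="main.py"):
    """Return one `python main.py <spec>` command per JSON file in spec_dir.

    Assumption: specifications are *.json files directly inside spec_dir.
    """
    specs = sorted(Path(spec_dir).glob("*.json"))
    return [[sys.executable, entry_point, str(spec)] for spec in specs]


def run_all(spec_dir="experiment_specs", dry_run=True):
    """Print each command; only execute them when dry_run is False."""
    for command in build_commands(spec_dir):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the batch as soon as one experiment fails.
            subprocess.run(command, check=True)
```

Calling run_all() first with the default dry_run=True prints the commands without starting any training, which makes it easy to check which specifications would be picked up.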
Statistics can be plotted using:
python misc/plot_statistics.py <path_to_stats>
<path_to_stats> can be one of 2 things:
- A JSON file generated using gym.wrappers.Monitor, in which case it plots the episode lengths and total reward per episode.
- A directory containing TensorFlow scalar summaries for different tasks, in which case all of the found scalars are plotted.
Help about other arguments (e.g. for using smoothing) can be found by executing
python misc/plot_statistics.py -h.
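For a quick look at a Monitor statistics file without the plotting script, the values can also be read and smoothed by hand. This is a minimal sketch, not the code used by misc/plot_statistics.py: it assumes the JSON file contains the keys "episode_lengths" and "episode_rewards", and the trailing moving average is a generic stand-in for whatever smoothing the script applies. The function names are hypothetical.

```python
import json


def load_monitor_stats(path):
    """Read episode lengths and rewards from a Monitor stats JSON file.

    Assumption: the file has "episode_lengths" and "episode_rewards" keys.
    """
    with open(path) as f:
        stats = json.load(f)
    return stats["episode_lengths"], stats["episode_rewards"]


def moving_average(values, window):
    """Trailing moving average; early entries average over fewer values."""
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed
```

The smoothed list has the same length as the input, so it can be plotted against the same episode indices as the raw rewards.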
Alternatively, it is also possible to use Tensorboard to show statistics in the browser by passing the directory with the scalar summaries as