This repository contains an introductory course on Reinforcement Learning (RL) with hands-on, classic examples of agents trained on Gym environments. We start with Dynamic Programming algorithms (Value Iteration, Q-Iteration and Policy Iteration), which we use to train an agent on the FrozenLake environment. We then move on to Q-Learning and the CartPole environment. For Deep RL, we implement DQN and train an agent on the LunarLander environment. I will provide notes explaining the motivations, details, advantages and limitations of each method, along with documented Python scripts; the Deep RL code uses PyTorch. This is an ongoing project, and I will include many more algorithms, such as REINFORCE and Actor-Critic variants.

Dynamic Programming

Use the script in the dynamic-programming folder to train a Dynamic Programming agent on the FrozenLake environment. The --algorithm argument selects which algorithm to use: value_iteration, q_iteration or policy_iteration. Example:

cd dynamic-programming
python --map_name 4x4 --algorithm policy_iteration
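
To make the idea behind value_iteration concrete, here is a minimal sketch in pure Python. It does not use FrozenLake or the repository's code: the five-state corridor, the `step` function and all constants are hypothetical and chosen only so the example runs without dependencies.

```python
# Hypothetical 1-D corridor MDP for illustration only (not FrozenLake).
N_STATES = 5        # states 0..4; state 4 is the terminal goal
ACTIONS = (-1, 1)   # move left or move right
GAMMA = 0.9         # discount factor (assumed value)


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_value(values, state, action):
    """One-step lookahead: reward plus discounted value of the next state."""
    next_state, reward, done = step(state, action)
    return reward + (0.0 if done else GAMMA * values[next_state])


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality update until values change by less than tol."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            best = max(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the highest lookahead value."""
    return [max(ACTIONS, key=lambda a: q_value(values, s, a))
            for s in range(N_STATES - 1)]


values = value_iteration()
policy = greedy_policy(values)
```

In this toy problem the greedy policy moves right in every state, and the values decay by a factor of GAMMA per step away from the goal. FrozenLake differs in that transitions can be stochastic, so the lookahead becomes an expectation over next states.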


Q-Learning

Use the script in the q-learning folder to train a Q-Learning agent on the CartPole environment. The n_bins and n_initialise arguments are very important, as they initialise the bins used to discretise the state space. Example:

cd q-learning
python --n_train 20000
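
The following sketch shows one way such bins could be built and used. It is an assumption about the general technique, not the repository's implementation: here n_initialise is taken to be the number of sampled states used to estimate each dimension's range, and n_bins the number of equal-width intervals per dimension. The helper names `make_bins` and `discretise` and the sampled ranges are hypothetical.

```python
import bisect
import random


def make_bins(samples, n_bins):
    """Return, per state dimension, the n_bins - 1 inner edges of equal-width bins."""
    n_dims = len(samples[0])
    bins = []
    for d in range(n_dims):
        low = min(s[d] for s in samples)
        high = max(s[d] for s in samples)
        width = (high - low) / n_bins
        bins.append([low + width * i for i in range(1, n_bins)])
    return bins


def discretise(state, bins):
    """Map a continuous state to a tuple of bin indices, each in 0..n_bins - 1."""
    return tuple(bisect.bisect_right(edges, x) for x, edges in zip(state, bins))


# Stand-in for states collected from random steps in the environment.
# The four ranges below are made up; real CartPole observations differ.
rng = random.Random(0)
n_initialise, n_bins = 1000, 10
ranges = [(-2.4, 2.4), (-3.0, 3.0), (-0.2, 0.2), (-3.0, 3.0)]
samples = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(n_initialise)]

bins = make_bins(samples, n_bins)
index = discretise([0.0, 0.0, 0.0, 0.0], bins)  # usable as a key into a Q-table
```

Values outside the sampled range fall into the first or last bin, so the Q-table stays a fixed size. Too few bins blur distinct states together, while too many make the table sparse and slow to learn, which is why these arguments matter.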


DQN

Use the script in the dqn folder to train a DQN agent on the LunarLander environment. Example:

cd dqn
python --n_train 1000

For Q-Learning and DQN, the log_dir argument specifies the name of a folder where TensorBoard events are stored. For example, if we passed --log_dir runs_agent, we can track the evolution of some variables during training by running:

tensorboard --logdir runs_agent

TensorBoard then prints the local URL to open in a browser to visualise the tracked variables.