Implementation of Deepmind's Neural Episodic Control
__pycache__
roms
DQNAgent.py
NECAgent.py
README.md
Roll-a-ball-RL.7z
environment.py
knn_dictionary.py
main.py
networks.py
ops.py
run_experiments.py
training.py
unity_env.py

Neural Episodic Control

This is my attempt at replicating DeepMind's Neural Episodic Control agent. It is currently set up for running with the ALE, but can easily be adapted for other environments (you may want to use my older implementation here as a reference).

To run the code (after installing all dependencies):

python main.py --rom [path/to/rom/file.bin]

Further options can be found using:

python main.py -h

Only training is currently implemented; there is no separate testing mode, and no saving or loading. Scores are reported per episode, where an episode lasts for one life.

N.B: There are a number of differences between this implementation and the original paper:

  • New elements are checked against previous elements by testing whether they lie closer than a certain threshold. In the NEC paper this is apparently done instead by storing a hash of the game screen and checking for exact matches.
    • UPDATE: Elements are now not checked against previous elements in the dict, i.e. existing elements are not overwritten.
  • Existing elements of the dict are not updated by backpropagation, only the embedding network is.
  • The way the environment handles new starts is slightly different.
  • Various hyperparams may be slightly different.
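
To make the first difference concrete, here is a minimal sketch of the three insertion policies mentioned above: append-only (the current behaviour), the earlier distance-threshold check, and a screen-hash check like the one the paper apparently uses. `TinyMemory` and its methods are hypothetical names for illustration and are not the code in knn_dictionary.py; overwriting the stored value on a match is also an assumption made to keep the example short.

```python
import hashlib
import numpy as np


class TinyMemory:
    """Toy key/value memory (hypothetical; not the repo's knn_dictionary)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed distance threshold
        self.keys = []              # embedding vectors
        self.values = []            # value estimates
        self.hash_index = {}        # screen hash -> position in keys/values

    def add_append_only(self, key, value):
        # Current behaviour described above: never check, never overwrite.
        self.keys.append(np.asarray(key, dtype=np.float32))
        self.values.append(float(value))

    def add_with_threshold(self, key, value):
        # Earlier behaviour: treat a key within `threshold` as a duplicate.
        key = np.asarray(key, dtype=np.float32)
        if self.keys:
            dists = np.linalg.norm(np.stack(self.keys) - key, axis=1)
            nearest = int(np.argmin(dists))
            if dists[nearest] < self.threshold:
                self.values[nearest] = float(value)  # assumed: overwrite
                return
        self.add_append_only(key, value)

    def add_with_hash(self, frame, key, value):
        # Paper-style check: exact match on a hash of the raw game screen.
        digest = hashlib.sha1(np.ascontiguousarray(frame).tobytes()).hexdigest()
        if digest in self.hash_index:
            self.values[self.hash_index[digest]] = float(value)  # assumed
            return
        self.hash_index[digest] = len(self.keys)
        self.add_append_only(key, value)
```

The threshold variant scans every stored key, which is fine for a toy but would need an approximate nearest-neighbour index at realistic dictionary sizes.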

Many thanks to all the authors whose code I've shamelessly ripped off, e.g. the knn-dictionary code and the environment wrapper (even though now they are probably unrecognisable). If you have a separate working implementation of NEC, I'd love to swap notes to see if I've made any errors or there are any good efficiency savings. Also, if you spot any (inevitable) bugs, please let me know.

Dependencies

You'll have to look up how to install these, but this project uses the following libraries:

  • numpy
  • scikit-learn (can be commented out)
  • annoy
  • tensorflow >1.0
  • OpenCV2 (only used in the preprocessors, could be replaced with a different library)
  • OpenAI Gym (if using gym)
  • https://github.com/mgbellemare/Arcade-Learning-Environment (if using ALE, you'll also need to grab any roms you need.)
  • tqdm

Running the Unity demo

This is to train an agent to play the Roll-a-ball game from the Unity tutorial. Video of agent here: https://www.youtube.com/watch?v=6O93BOMFdUI

  • Install the Unity Python socket API library found here: https://github.com/chetan51/unity-api
  • Unzip the Unity project
  • Run the proxy-server/run.py server from the python library
  • Compile and run the main scene from the Unity project (or just run the binary)
  • Run the agent with python main.py --unity_test 1

The Unity engine and the agent communicate by sending information to and from the running server. Observations from the engine are given in the form of a list of relevant objects in the scene. Each object is turned into a feature vector encoding object class, position, and velocity.
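
As a rough illustration of that encoding, the sketch below turns a list of scene objects into one feature vector per object. The class names, the dict layout of each object, and the one-hot encoding of the class are all assumptions for the example; the actual format used by unity_env.py may differ.

```python
import numpy as np

# Hypothetical object classes; the real scene may use different ones.
OBJECT_CLASSES = ["player", "pickup", "wall"]
FEATURE_DIM = len(OBJECT_CLASSES) + 3 + 3  # one-hot class + xyz + velocity


def object_to_features(obj):
    """Encode one object as [one-hot class, position (3), velocity (3)]."""
    one_hot = np.zeros(len(OBJECT_CLASSES), dtype=np.float32)
    one_hot[OBJECT_CLASSES.index(obj["class"])] = 1.0
    position = np.asarray(obj["position"], dtype=np.float32)
    velocity = np.asarray(obj["velocity"], dtype=np.float32)
    return np.concatenate([one_hot, position, velocity])


def observation_to_matrix(objects):
    """Stack per-object features into an array of shape (n_objects, FEATURE_DIM)."""
    if not objects:
        return np.zeros((0, FEATURE_DIM), dtype=np.float32)
    return np.stack([object_to_features(o) for o in objects])


# Example observation with two objects (made-up values).
example = [
    {"class": "player", "position": [0.0, 0.5, 0.0], "velocity": [1.0, 0.0, 0.0]},
    {"class": "pickup", "position": [2.0, 0.5, -1.0], "velocity": [0.0, 0.0, 0.0]},
]
features = observation_to_matrix(example)
```

Because the number of objects can vary between observations, a downstream network would need either padding to a fixed size or a pooling step over the object dimension.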

N.B: There is a bug in the environment code which means that sometimes the environment doesn't reset properly. This shouldn't affect agent performance, but means the number of episodes is incorrectly reported.

TODO list:

Technical improvements:

  • Implement a better approximate KNN algorithm
    • Done!
  • Add support for other environments (and alternative models)
    • In progress, almost done!
    • Done!
  • Merge history handling with saved trajectories and replay memory to save memory
    • Done!
  • Replace saved trajectories (as list) with a trajectories class which also handles computing returns.
  • Add saving and loading capabilities to model+dictionary (this might include partially implementing the DND in tensorflow)
    • Done!

Experiments:

  • Decay old elements in the dictionary to simulate alpha-updates
    • Implemented and tested basic version
  • Devise a way to combine the DND with a DQN where the DQN takes over in the long term
  • Test with optimistically weighted value estimates as in Particle Value Functions
  • Test with a count-based exploration module