maddpg

Note: Not Finished.

Implementation of multi-agent deep deterministic policy gradients (MADDPG).

It has been tested with the simple tag environment in the multiagent-particle-envs repo released by OpenAI. However, that version does not bound the environment and does not implement a done callback, which means each episode runs for the full 1000 steps even after all the agents have gone out of bounds - which happens often and (in my opinion) slows down training. I have added that done callback (in the simple tag environment only, though doing it for the others should be easy); a sketch of the idea follows below. Please install my fork of the multiagent-particle-envs repository to use this repository properly.
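For reference, here is a minimal sketch of the kind of done callback described above (the fork's actual implementation may differ, and the `bound` half-width is an assumed value):

```python
import numpy as np

# Scenario-level done callback for multiagent-particle-envs: the environment
# calls done_callback(agent, world) once per agent per step. Here an episode
# ends for an agent once it leaves an assumed bounding box around the arena.
def done(agent, world, bound=1.5):
    # agent.state.p_pos is the agent's 2-D position in the particle envs
    return bool(np.any(np.abs(agent.state.p_pos) > bound))
```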

Main Requirements:

  1. Tensorflow
  2. Keras
  3. agakshat/multiagent-particle-envs
  4. numpy

How to use:

  1. git clone this repo
  2. Make sure the multiagent-particle-envs repo is installed, which means that `import make_env` should work in Python 3 (see the snippet after this list).
  3. Go into the maddpg directory and run `python3 multiagent.py`. It should run straight out of the box.
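To verify step 2, a quick check along these lines should work (simple_tag is the scenario this repo was tested with):

```python
# Sanity check that multiagent-particle-envs is importable and builds an env.
import make_env                        # provided by agakshat/multiagent-particle-envs

env = make_env.make_env('simple_tag')  # build the simple tag scenario
obs = env.reset()                      # returns one observation per agent
print(len(obs), env.action_space)      # number of agents and their action spaces
```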

Code Breakdown:

  1. training-code.py is the entry point: it takes user arguments for learning rates, episode length, discount factor, etc., creates the actor and critic networks for each agent, and calls the training function.
  2. Train.py implements the actual MADDPG algorithm.
  3. actorcriticv2.py defines the Actor and Critic network classes
  4. ReplayMemory.py defines the Replay Memory class
  5. ExplorationNoise.py defines the Ornstein-Uhlenbeck action noise used for exploration. I'm not sure this is the right noise process to use here; a sketch of the general idea follows this list.
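For context, here is a minimal, self-contained sketch of an Ornstein-Uhlenbeck process of the kind ExplorationNoise.py provides; the parameter names and default values are illustrative assumptions, not this repo's code:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise added to actions."""
    def __init__(self, mu, sigma=0.2, theta=0.15, dt=1e-2):
        self.mu = np.asarray(mu, dtype=float)  # long-run mean of the process
        self.sigma, self.theta, self.dt = sigma, theta, dt
        self.x = np.copy(self.mu)              # current state, starts at the mean

    def __call__(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.x = (self.x
                  + self.theta * (self.mu - self.x) * self.dt
                  + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape))
        return self.x

# Typical usage: keep one OUNoise instance per agent and add its output to the
# actor's deterministic action at every environment step.
```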

To-Do

  1. Instead of having a different policy for each agent, have one policy per team in the simple_tag environment; it might be easier to learn. If anyone does this, please let me know what results you got!
  2. Change the noise process from Ornstein-Uhlenbeck to something like epsilon-greedy, or something else more suitable to this domain (OU noise is well-suited to low-dimensional continuous control problems, and I'm not convinced it fits here). Again, if you do this, please let me know the results! A toy sketch of the epsilon-greedy idea appears after this list.
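As a starting point for to-do item 2, here is an illustrative epsilon-greedy-style wrapper; the function name, action bounds, and mixing scheme are all assumptions, not code from this repo:

```python
import numpy as np

def epsilon_greedy_action(policy_action, epsilon=0.1, low=-1.0, high=1.0):
    """With probability epsilon, replace the policy's action with a uniform
    random action; the action bounds here are assumed, not taken from this repo."""
    if np.random.rand() < epsilon:
        return np.random.uniform(low, high, size=np.shape(policy_action))
    return np.asarray(policy_action)
```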
