# Project: Continuous Control

This project is part of the Udacity Reinforcement Learning Nanodegree. In this project, multiple `DDPG` agents are trained to solve a continuous control task. Specifically, each agent needs to control a tennis racket to pass a ball back and forth, keeping it in play for as long as possible. Each agent receives a reward each time it hits the ball over the net and is penalized when the ball hits the ground or goes out of bounds. Hence, it is in the interest of both agents to keep the ball in play, which makes this a cooperative environment. The environment is considered solved when the per-episode maximum of the two agents' scores, averaged over 100 consecutive episodes, exceeds 0.5 points.
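As a concrete illustration of this solved criterion, here is a minimal sketch (not part of the `udacity-rl` CLI; the helper name is made up for illustration) that tracks the maximum of the two agents' episode scores and averages it over a 100-episode window:

```python
from collections import deque

import numpy as np

# Rolling window over the last 100 episode scores (max over the two agents).
scores_window = deque(maxlen=100)

def record_episode(agent_scores):
    """agent_scores: the two agents' cumulative rewards for one episode."""
    scores_window.append(np.max(agent_scores))
    # Solved once the 100-episode average of the per-episode maximum exceeds 0.5.
    return len(scores_window) == scores_window.maxlen and np.mean(scores_window) > 0.5

# Example: one episode in which agent 0 scored 0.60 and agent 1 scored 0.49.
solved = record_episode([0.60, 0.49])
```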
## Environment Setup
### Reward Signal
Each agent receives a reward of `0.1` when it hits the ball over the net, but gets a penalty of `-0.01` each time the ball hits the ground or goes out of bounds. The goal of both agents is therefore to keep the ball in play for as long as possible.
### Observation
The observation for each individual agent consists of the agent's own position and velocity and the position and velocity of the ball. The combined observation of both agents is encoded as a 2x24 tensor (the observations of the two agents stacked).
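For illustration, the snippet below sketches the expected observation shapes; the variable names are placeholders and only the dimensions come from the description above:

```python
import numpy as np

# Placeholder for the stacked observations returned by the environment:
# one 24-dimensional vector per agent, stacked into a 2x24 array.
states = np.zeros((2, 24))

state_agent_0 = states[0]  # racket position/velocity plus ball position/velocity
state_agent_1 = states[1]

assert states.shape == (2, 24)
assert state_agent_0.shape == (24,)
```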
### Actions
The action each agent can take is a 1x2 tensor corresponding to 2 continuous actions: moving towards or away from the net, and jumping. The action values are normalized to the range between `-1` and `1`.
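A minimal sketch of this action format, assuming both agents' actions are passed to the environment together as a 2x2 array; clipping is shown as one common way to keep exploratory actions inside the valid range (not necessarily how this repository does it):

```python
import numpy as np

# Noisy actor output for both agents: one row per agent,
# columns = (move towards/away from the net, jump).
raw_actions = np.random.randn(2, 2)

# Keep the actions inside the valid range of [-1, 1].
actions = np.clip(raw_actions, -1.0, 1.0)

assert actions.shape == (2, 2)
```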
## Exploring
To explore the `Tennis_Linux` environment for 100 episodes, run the following command from the root of the repository:
```bash
udacity-rl -e resources/environments/Tennis_Linux/Tennis.x86_64 explore -n 100
```
## Training
To train the multiple `DDPG` agents, run the following command from the root of the repository:
```bash
udacity-rl -e resources/environments/Tennis_Linux/Tennis.x86_64 train NDDPG 5000 -c configs/multi_ddpg_ann_a_2x256_c_2x256_1x128-2020-03-07.json
```
## Running
To observe the stored final agent, run the following command from the root of the repository:
```bash
udacity-rl -e resources/environments/Tennis_Linux/Tennis.x86_64 run resources/models/p3_tennis_final/ 1
```