Merged
24 commits
0c12f67
Move documentation to docs
awjuliani Sep 24, 2017
6159f44
Make home readme
awjuliani Sep 24, 2017
e850bdf
Add .md to links
awjuliani Sep 24, 2017
3b77b84
Update Unity-Agents-Overview.md
awjuliani Sep 24, 2017
7f393f2
Update Readme.md
awjuliani Sep 24, 2017
0a10411
Update Using-TensorFlow-Sharp-in-Unity-(Experimental).md
awjuliani Sep 24, 2017
fc0b4a4
Update Limitations-&-Common-Issues.md
awjuliani Sep 24, 2017
b118798
Update Agents-Editor-Interface.md
awjuliani Sep 24, 2017
76ba7f7
Update Agents-Editor-Interface.md
awjuliani Sep 24, 2017
e54be50
Update Agents-Editor-Interface.md
awjuliani Sep 24, 2017
1014bbc
Update Example-Environments.md
awjuliani Sep 24, 2017
e73d47b
Update Getting-Started-with-Balance-Ball.md
awjuliani Sep 24, 2017
c3e7472
Update Making-a-new-Unity-Environment.md
awjuliani Sep 24, 2017
f967699
Update Organizing-the-Scene.md
awjuliani Sep 24, 2017
af07a09
Update Training-on-Amazon-Web-Service.md
awjuliani Sep 24, 2017
32a27d3
Update Unity-Agents---Python-API.md
awjuliani Sep 24, 2017
fdeceba
Update Unity-Agents---Python-API.md
awjuliani Sep 24, 2017
7409643
Update Unity-Agents---Python-API.md
awjuliani Sep 24, 2017
663e2b7
Update Unity-Agents-Overview.md
awjuliani Sep 24, 2017
bc7e24e
Update Using-TensorFlow-Sharp-in-Unity-(Experimental).md
awjuliani Sep 24, 2017
7cff5e7
Update README.md
awjuliani Sep 24, 2017
e642d17
Update README.md
awjuliani Sep 24, 2017
9248cbb
Update README.md
awjuliani Sep 24, 2017
ad45593
Update README.md
awjuliani Sep 24, 2017
10 changes: 5 additions & 5 deletions README.md
@@ -1,17 +1,17 @@
<img src="images/unity-wide.png" align="middle" width="3000"/>

# Unity ML - Agents
# Unity ML - Agents (Beta)

**Unity Machine Learning Agents** allows researchers and developers to
create games and simulations using the Unity Editor which serve as
environments where intelligent agents can be trained using
reinforcement learning, neuroevolution, or other machine learning
methods through a simple-to-use Python API. For more information, see
the [wiki page](../../wiki).
the [documentation page](docs).

For a walkthrough on how to train an agent in one of the provided
example environments, start
[here](../../wiki/Getting-Started-with-Balance-Ball).
[here](docs/Getting-Started-with-Balance-Ball.md).

## Features
* Unity Engine flexibility and simplicity
@@ -27,12 +27,12 @@ example environments, start
The _Agents SDK_, including example environment scenes, is located in the
`unity-environment` folder. For requirements, instructions, and other
information, see the contained Readme and the relevant
[wiki page](../../wiki/Making-a-new-Unity-Environment).
[documentation](docs/Making-a-new-Unity-Environment.md).

## Training your Agents

Once you've built a Unity Environment, example Reinforcement Learning
algorithms and the Python API are available in the `python`
folder. For requirements, instructions, and other information, see the
contained Readme and the relevant
[wiki page](../../wiki/Unity-Agents---Python-API).
[documentation](docs/Unity-Agents---Python-API.md).
71 changes: 71 additions & 0 deletions docs/Agents-Editor-Interface.md
@@ -0,0 +1,71 @@
# ML Agents Editor Interface

This page contains an explanation of the use of each of the inspector panels relating to the `Academy`, `Brain`, and `Agent` objects.

## Academy

![Academy Inspector](../images/academy.png)

* `Max Steps` - Total number of steps per episode. `0` corresponds to episodes without a maximum number of steps. Once the step counter reaches this maximum, the environment will reset.
* `Frames To Skip` - How many steps of the environment to skip before asking Brains for decisions.
* `Wait Time` - How many seconds to wait between steps when running in `Inference`.
* `Configuration` - The engine-level settings which correspond to rendering quality and engine speed.
* `Width` - Width of the environment window in pixels.
* `Height` - Height of the environment window in pixels.
* `Quality Level` - Rendering quality of environment. (Higher is better)
* `Time Scale` - Speed at which environment is run. (Higher is faster)
* `Target Frame Rate` - FPS engine attempts to maintain.
* `Default Reset Parameters` - List of custom parameters that can be changed in the environment on reset.
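
As a rough illustration of how these reset parameters surface on the Python side, the sketch below overrides one at reset time using the bundled `unityagents` package; the binary name `GridWorld` and the key `numObstacles` are assumptions for the example, and the key must match an entry actually listed under `Default Reset Parameters`.

```python
# Hedged sketch: overriding a reset parameter from the Python API.
# The binary name and the key "numObstacles" are illustrative assumptions;
# the key must match a parameter defined under Default Reset Parameters.
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="GridWorld")
info = env.reset(train_mode=True, config={"numObstacles": 1})
env.close()
```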

## Brain

![Brain Inspector](../images/brain.png)

* `Brain Parameters` - Define state, observation, and action spaces for the Brain (a Python-side sketch of reading these follows this list).
* `State Size` - Length of the state vector (in _Continuous_ state space), or the number of possible values (in _Discrete_ state space).
* `Action Size` - Length of the action vector (in _Continuous_ action space), or the number of possible values (in _Discrete_ action space).
* `Memory Size` - Length of the memory vector for the Brain. Used with recurrent networks and frame-stacking CNNs.
* `Camera Resolution` - Describes height, width, and whether to greyscale visual observations for the Brain.
* `Action Descriptions` - A list of strings used to name the available actions for the Brain.
* `State Space Type` - Corresponds to whether the state vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
* `Action Space Type` - Corresponds to whether the action vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
* `Type of Brain` - Describes how the Brain will decide actions.
* `External` - Actions are decided using the Python API.
* `Internal` - Actions are decided using an internal TensorFlowSharp model.
* `Player` - Actions are decided using Player input mappings.
* `Heuristic` - Actions are decided using a custom `Decision` script, which should be attached to the Brain game object.
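
For reference, once a Brain of type `External` is connected, these parameters can also be read from the Python side. A minimal sketch, assuming the bundled `unityagents` package and a built binary named `3DBall`; the attribute names should be checked against `python/unityagents` if they differ.

```python
# Hedged sketch: reading Brain Parameters through the Python API
# (attribute names follow the bundled unityagents package; verify before relying on them).
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")
brain = env.brains[env.brain_names[0]]

print(brain.state_space_size)     # State Size
print(brain.action_space_size)    # Action Size
print(brain.state_space_type)     # State Space Type ("continuous" or "discrete")
print(brain.action_space_type)    # Action Space Type
print(brain.memory_space_size)    # Memory Size
print(brain.action_descriptions)  # Action Descriptions

env.close()
```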

### Internal Brain

![Internal Brain Inspector](../images/internal_brain.png)

* `Graph Model` : This must be the `bytes` file corresponding to the pretrained TensorFlow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector.)
* `Graph Scope` : If you set a scope while training your TensorFlow model, all of your placeholder names will have a prefix. You must specify that prefix here (see the sketch after this list).
* `Batch Size Node Name` : If the batch size is one of the inputs of your graph, you must specify the name of the placeholder here. The Brain will automatically set the batch size equal to the number of agents connected to it.
* `State Node Name` : If your graph uses the state as an input, you must specify the name of the placeholder here.
* `Recurrent Input Node Name` : If your graph takes a recurrent input / memory as input and outputs a new recurrent input / memory, you must specify the name of the input placeholder here.
* `Recurrent Output Node Name` : If your graph takes a recurrent input / memory as input and outputs a new recurrent input / memory, you must specify the name of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses observations as input, you must specify their names here. Note that the number of observations is equal to the length of `Camera Resolutions` in the Brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to the actions of the Brain in your graph. If the action space type is continuous, the output must be a one-dimensional tensor of floats of length `Action Space Size`; if the action space type is discrete, the output must be a one-dimensional tensor of ints of length 1.
* `Graph Placeholder` : If your graph takes additional inputs that are fixed (for example, noise level), you can specify them here. Note that in your graph, these must correspond to one-dimensional tensors of int or float of size 1.
* `Name` : Corresponds to the name of the placeholder.
* `Value Type` : Either Integer or Floating Point.
* `Min Value` and `Max Value` : Specify the range of the value here. The value will be sampled from the uniform distribution ranging from `Min Value` to `Max Value` inclusive.
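
To make these fields concrete, below is a minimal TensorFlow sketch (not the graph produced by the included training code) showing how names in a graph line up with what you enter in the inspector; the scope `scope_1`, the state size of 8, the action size of 2, and the layer choices are assumptions for illustration only.

```python
# Hedged sketch of a TensorFlow 1.x graph whose node names line up with the
# Internal Brain fields above (scope name, sizes, and layers are illustrative).
import tensorflow as tf

with tf.variable_scope("scope_1"):                                         # -> Graph Scope
    batch_size = tf.placeholder(tf.int32, shape=None, name="batch_size")   # -> Batch Size Node Name
    state = tf.placeholder(tf.float32, shape=[None, 8], name="state")      # -> State Node Name
    epsilon = tf.placeholder(tf.float32, shape=[1], name="epsilon")        # -> a Graph Placeholder of size 1

    hidden = tf.layers.dense(state, 64, activation=tf.nn.tanh)
    mu = tf.layers.dense(hidden, 2)                       # 2 = continuous Action Space Size
    action = tf.identity(mu + epsilon, name="action")     # -> Action Node Name
```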


### Player Brain

![Player Brain Inspector](../images/player_brain.png)

If the action space is discrete, you must map input keys to their corresponding integer values. If the action space is continuous, you must map input keys to their corresponding indices and float values.

## Agent

![Agent Inspector](../images/agent.png)

* `Brain` - The brain to register this agent to. Can be dragged into the inspector using the Editor.
* `Observations` - A list of `Cameras` which will be used to generate observations.
* `Max Step` - The per-agent maximum number of steps. Once this number is reached, the agent will be reset if `Reset On Done` is checked.
58 changes: 58 additions & 0 deletions docs/Example-Environments.md
@@ -0,0 +1,58 @@
# Example Learning Environments

### About Example Environments
Unity ML Agents currently contains three example environments which demonstrate various features of the platform. More will be added in the coming months. We are also actively open to adding community-contributed environments as examples, as long as they are small, simple, demonstrate a unique feature of the platform, and provide a unique, non-trivial challenge to modern RL algorithms. Feel free to submit such environments with a Pull Request explaining the nature of the environment and task.

Environments are located in `unity-environment/ML-Agents/Examples`.

## 3DBall

![Balance Ball](../images/balance.png)

* Set-up: A balance-ball task, where the agent controls the platform.
* Goal: The agent must balance the platform in order to keep the ball on it for as long as possible.
* Agents: The environment contains 12 agents of the same kind, all linked to a single brain.
* Agent Reward Function:
* +0.1 for every step the ball remains on the platform.
* -1.0 if the ball falls from the platform.
* Brains: One brain with the following state/action space.
* State space: (Continuous) 8 variables corresponding to the rotation of the platform, and the position, rotation, and velocity of the ball.
* Action space: (Continuous) Size of 2, with one value corresponding to X-rotation and the other to Z-rotation (see the Python sketch after this list).
* Observations: 0
* Reset Parameters: None
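
As a rough illustration of how this state/action space appears from the Python API, the sketch below sends random 2-dimensional continuous actions to all 12 agents; the binary name `3DBall` and the attribute names are assumptions to check against the bundled `unityagents` package.

```python
# Hedged sketch: random continuous actions for the twelve 3DBall agents.
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")   # assumed name of the built binary
brain_name = env.brain_names[0]

info = env.reset(train_mode=True)[brain_name]
for _ in range(100):
    # One action of size 2 (X-rotation, Z-rotation) per agent linked to this brain.
    actions = np.random.uniform(-1.0, 1.0, size=(len(info.agents), 2))
    info = env.step(actions)[brain_name]

env.close()
```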

## GridWorld

![GridWorld](../images/gridworld.png)

* Set-up: A version of the classic grid-world task. Scene contains agent, goal, and obstacles.
* Goal: The agent must navigate the grid to the goal while avoiding the obstacles.
* Agents: The environment contains one agent linked to a single brain.
* Agent Reward Function:
* -0.01 for every step.
* +1.0 if the agent navigates to the goal position of the grid (episode ends).
* -1.0 if the agent navigates to an obstacle (episode ends).
* Brains: One brain with the following state/action space.
* State space: (Continuous) 6 variables corresponding to position of agent and nearest goal and obstacle.
* Action space: (Discrete) Size of 4, corresponding to movement in cardinal directions.
* Observations: One corresponding to top-down view of GridWorld.
* Reset Parameters: Three, corresponding to grid size, number of obstacles, and number of goals.


## Tennis

![Tennis](../images/tennis.png)

* Set-up: Two-player game where agents control rackets to bounce the ball over a net.
* Goal: The agents must bounce the ball between one another while not dropping or sending the ball out of bounds.
* Agents: The environment contains two agents linked to a single brain.
* Agent Reward Function (independent):
* -0.1 to the last agent to hit the ball before it goes out of bounds or hits the ground/net (episode ends).
* +0.1 to an agent when it hits the ball after the other agent has hit it.
* +0.1 to the agent who didn't hit the ball last when the ball hits the ground.
* Brains: One brain with the following state/action space.
* State space: (Continuous) 6 variables corresponding to position of agent and nearest goal and obstacle.
* Action space: (Discrete) Size of 4, corresponding to movement toward net, away from net, jumping, and no-movement.
* Observations: None
* Reset Parameters: One, corresponding to size of ball.

135 changes: 135 additions & 0 deletions docs/Getting-Started-with-Balance-Ball.md
@@ -0,0 +1,135 @@
# Getting Started with the Balance Ball Example

![Balance Ball](../images/balance.png)

This tutorial will walk through the end-to-end process of installing Unity Agents, building an example environment, training an agent in it, and finally embedding the trained model into the Unity environment.

Unity ML Agents contains a number of example environments which can be used as templates for new environments, or as ways to test a new ML algorithm to ensure it is functioning correctly.

In this walkthrough we will be using the **3D Balance Ball** environment. The environment contains a number of platforms and balls. Platforms can act to keep the ball up by rotating either horizontally or vertically. Each platform is an agent which is rewarded the longer it can keep a ball balanced on it, and provided a negative reward for dropping the ball. The goal of the training process is to have the platforms learn to never drop the ball.

Let's get started!

## Getting Unity ML Agents
### Start by installing **Unity 2017.1** or later (required)

Download link available [here](https://store.unity.com/download?ref=update).

If you are new to using the Unity Editor, you can find the general documentation [here](https://docs.unity3d.com/Manual/index.html).

### Clone the repository
Once installed, you will want to clone the Agents GitHub repository. References will be made throughout to `unity-environment` and `python` directories. Both are located at the root of the repository.

## Building Unity Environment
Launch the Unity Editor, and log in, if necessary.

1. Open the `unity-environment` folder using the Unity editor. *(If this is not your first time running Unity, you can skip most of these immediate steps and choose the project directly from the list of recently opened projects.)*
- On the initial dialog, choose `Open` on the top options
- On the file dialog, choose `unity-environment` and click `Open` *(It is safe to ignore any warning message about non-matching editor installation)*
- Once the project is open, on the `Project` panel (bottom of the tool), navigate to the folder `Assets/ML-Agents/Examples/3DBall/`
- Double-click the `Scene` icon (Unity logo) to load all environment assets
2. Go to `Edit -> Project Settings -> Player`
- Ensure that `Resolution and Presentation -> Run in Background` is Checked.
- Ensure that `Resolution and Presentation -> Display Resolution Dialog` is set to Disabled.
3. Expand the `Ball3DAcademy` GameObject and locate its child object `Ball3DBrain` within the Scene hierarchy in the editor. Ensure Type of Brain for this object is set to `External`.
4. *File -> Build Settings*
5. Choose your target platform:
- (optional) Select “Developer Build” to log debug messages.
6. Click *Build*:
- Save environment binary to the `python` sub-directory of the cloned repository *(you may need to click on the down arrow on the file chooser to be able to select that folder)*

## Installing Python API
In order to train an agent within the framework, you will need to install Python 2 or 3, and the dependencies described below.

### Windows Users

If you are a Windows user who is new to Python/TensorFlow, follow [this guide](https://nitishmutha.github.io/tensorflow/2017/01/22/TensorFlow-with-gpu-for-windows.html) to set up your Python environment.

### Requirements
* Jupyter
* Matplotlib
* numpy
* Pillow
* Python (2 or 3)
* scipy
* TensorFlow (1.0+)

### Installing Dependencies
To install dependencies, go into the `python` directory and run:

`pip install .`

or

`pip3 install .`

If your Python environment doesn't include `pip`, see these [instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers) on installing it.

Once dependencies are installed, you are ready to test the Ball balance environment from Python.

### Testing Python API

To launch Jupyter, run the following in the command line:

`jupyter notebook`

Then navigate to `localhost:8888` to access the notebooks. If you're new to Jupyter, check out the [quick start guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html) before you continue.

To ensure that your environment and the Python API work as expected, you can use the `python/Basics` Jupyter notebook. This notebook contains a simple walkthrough of the functionality of the API. Within `Basics`, be sure to set `env_name` to the name of the environment file you built earlier.
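
If you prefer to script the same check outside the notebook, the core of what `Basics` does looks roughly like the sketch below; treat it as a hedged outline and check the attribute names against the notebook and `python/unityagents` rather than relying on it verbatim.

```python
# Hedged sketch of the Basics workflow: load the environment, reset it, step a
# few times with zero actions, and read back states and rewards.
import numpy as np
from unityagents import UnityEnvironment

env_name = "3DBall"                        # set to the name of the binary you built
env = UnityEnvironment(file_name=env_name)
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

info = env.reset(train_mode=False)[brain_name]
for _ in range(10):
    actions = np.zeros((len(info.agents), brain.action_space_size))
    info = env.step(actions)[brain_name]
    print(info.states, info.rewards)

env.close()
```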

## Training the Brain with Reinforcement Learning

### Training with PPO
In order to train an agent to correctly balance the ball, we will use a Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This is a method that has been shown to be safe, efficient, and more general purpose than many other RL algorithms; as such, we have chosen it as the example algorithm for use with ML Agents. For more information on PPO, OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/) explaining it.
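
For background (this comes from the PPO paper rather than anything specific to this notebook), the clipped surrogate objective that PPO maximizes is:

```latex
L^{CLIP}(\theta)
  = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}
```

Here `r_t` is the probability ratio between the new and old policies, `A_t` is the advantage estimate, and `epsilon` is the clipping range; the clipping keeps each policy update small, which is what makes the method comparatively stable.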

In order to train the agents within the Ball Balance environment:

1. Open `python/PPO.ipynb` notebook from Jupyter.
2. Set `env_name` to whatever you named your environment file.
3. (optional) Set `run_path` directory to your choice.
4. Run all cells of the notebook except for the final one.

### Observing Training Progress
In order to observe the training process in more detail, you can use Tensorboard.
In your command line, run:

`tensorboard --logdir=summaries`

Then navigate to `localhost:6006`.

From Tensorboard, you will see the summary statistics of six variables:
* Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
* Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a successful training session.
* Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a successful training session.
* Episode Length - The mean length of each episode in the environment for all agents.
* Value Estimates - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
* Policy Entropy - How random the decisions of the model are. Should slowly decrease during a successful training process. If it decreases too quickly, the `beta` hyperparameter should be increased.

## Embedding Trained Brain into Unity Environment _[Experimental]_
Once the training process displays an average reward of ~75 or greater, and there has been a recently saved model (denoted by the `Saved Model` message), you can choose to stop the training process by stopping the cell execution. Once this is done, you will have a trained TensorFlow model. You must now convert the saved model to a Unity-ready format which can be embedded directly into the Unity project by following the steps below.

### Setting up TensorFlowSharp Support
Because TensorFlowSharp support is still experimental, it is disabled by default. In order to enable it, you must follow these steps. Please note that the `Internal` Brain mode will only be available once these steps are completed.

1. Make sure you are using Unity 2017.1 or newer.
2. Make sure the TensorFlowSharp plugin is in your Asset folder. A Plugins folder which includes TF# can be downloaded [here](https://s3.amazonaws.com/unity-agents/TFSharpPlugin.unitypackage).
3. Go to `Edit` -> `Project Settings` -> `Player`
4. For each of the platforms you target (**`PC, Mac and Linux Standalone`**, **`iOS`** or **`Android`**):
1. Go into `Other Settings`.
2. Set `Scripting Runtime Version` to `Experimental (.NET 4.6 Equivalent)`
3. In `Scripting Defined Symbols`, add the flag `ENABLE_TENSORFLOW`
5. Restart the Unity Editor.

### Embedding the trained model into Unity

1. Run the final cell of the notebook under "Export the trained TensorFlow graph" to produce an `<env_name>.bytes` file (a sketch of this export step follows this list).
2. Move `<env_name>.bytes` from `python/models/...` into `unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
3. Open the Unity Editor, and select the `3DBall` scene as described above.
4. Select the `3DBallBrain` object from the Scene hierarchy.
5. Change the `Type of Brain` to `Internal`.
6. Drag the `<env_name>.bytes` file from the Project window of the Editor to the `Graph Model` placeholder in the `3DBallBrain` inspector window.
7. Set the `Graph Placeholder` size to 1.
8. Add a placeholder called `epsilon` with a type of `floating point` and a range of values from 0 to 0.
9. Press the Play button at the top of the editor.
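
For reference, here is a minimal sketch of what the export in step 1 does under the hood, assuming the training run saved its checkpoints and a `raw_graph_def.pb` under a model directory; the paths and the `action` output node name are assumptions to check against your own run rather than guaranteed values.

```python
# Hedged sketch of freezing a trained TensorFlow 1.x graph into a .bytes file
# that the Internal brain can load (paths and node names are illustrative).
import tensorflow as tf
from tensorflow.python.tools import freeze_graph

model_path = "./models/ppo"   # assumed run_path used during training
env_name = "3DBall"           # whatever you named your environment

ckpt = tf.train.get_checkpoint_state(model_path)
freeze_graph.freeze_graph(
    input_graph=model_path + "/raw_graph_def.pb",
    input_binary=True,
    input_checkpoint=ckpt.model_checkpoint_path,
    output_node_names="action",                        # node the brain reads actions from
    output_graph=model_path + "/" + env_name + ".bytes",
    clear_devices=True, initializer_nodes="", input_saver="",
    restore_op_name="save/restore_all",
    filename_tensor_name="save/Const:0")
```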

If you followed these steps correctly, you should now see the trained model being used to control the behavior of the balance ball within the Editor itself. From here you can re-build the Unity binary, and run it standalone with your agent's new learned behavior built right in.