diff --git a/README.md b/README.md
index 0b4e3efcc6..46004f1f8f 100755
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ to the wider research and game developer communities.
 * Built-in support for Imitation Learning
 * Flexible Agent control with On Demand Decision Making
 * Visualizing network outputs within the environment
-* Simplified set-up with Docker _(Experimental)_
+* Simplified set-up with Docker (Experimental)
 
 ## Documentation and References
 
diff --git a/docs/Background-Machine-Learning.md b/docs/Background-Machine-Learning.md
index 0bf6637eae..427391f3f5 100644
--- a/docs/Background-Machine-Learning.md
+++ b/docs/Background-Machine-Learning.md
@@ -194,7 +194,7 @@ natural choice for reinforcement learning tasks when a large amount of data can
 be generated, say through the use of a simulator or engine such as Unity. By
 generating hundreds of thousands of simulations of the environment within
 Unity, we can learn policies for very complex environments
-(a complex environment is one where the number of observations an agent percieves
+(a complex environment is one where the number of observations an agent perceives
 and the number of actions they can take are large). Many of the algorithms we
 provide in ML-Agents use some form of deep learning, built on top of the
 open-source library, [TensorFlow](Background-TensorFlow.md).
diff --git a/docs/Getting-Started-with-Balance-Ball.md b/docs/Getting-Started-with-Balance-Ball.md
index 78b0c780d7..927b5424d3 100644
--- a/docs/Getting-Started-with-Balance-Ball.md
+++ b/docs/Getting-Started-with-Balance-Ball.md
@@ -313,7 +313,7 @@ during a successful training session.
 
 ![Example TensorBoard Run](images/mlagents-TensorBoard.png)
 
-## Embedding the Trained Brain into the Unity Environment _[Experimental]_
+## Embedding the Trained Brain into the Unity Environment (Experimental)
 
 Once the training process completes, and the training process saves the model
 (denoted by the `Saved Model` message) you can add it to the Unity project and
diff --git a/docs/Learning-Environment-Best-Practices.md b/docs/Learning-Environment-Best-Practices.md
index ac2e6e9328..dc444c2668 100644
--- a/docs/Learning-Environment-Best-Practices.md
+++ b/docs/Learning-Environment-Best-Practices.md
@@ -15,7 +15,7 @@ complexity over time. This can either be done manually, or via Curriculum Learni
 
 ## Vector Observations
 * Vector Observations should include all variables relevant to allowing the agent to take the optimally informed decision.
-* Categorical variables such as type of object (Sword, Shield, Bow) should be encoded in one-hot fashion (ie `3` -> `0, 0, 1`).
+* Categorical variables such as type of object (Sword, Shield, Bow) should be encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`).
 * Besides encoding non-numeric values, all inputs should be normalized to be in the range 0 to +1 (or -1 to 1). For example, the `x` position information of an agent where the maximum possible value is `maxValue` should be recorded as `AddVectorObs(transform.position.x / maxValue);` rather than `AddVectorObs(transform.position.x);`. See the equation below for one approach of normalization.
 * Positional information of relevant GameObjects should be encoded in relative coordinates wherever possible. This is often relative to the agent position.
 
@@ -23,4 +23,4 @@ complexity over time. This can either be done manually, or via Curriculum Learni
 
 ## Vector Actions
 * When using continuous control, action values should be clipped to an appropriate range.
-* Be sure to set the Vector Action's Space Size to the number of used Vector Actions, and not greater, as doing the latter can interfere with the efficency of the training process.
+* Be sure to set the Vector Action's Space Size to the number of used Vector Actions, and not greater, as doing the latter can interfere with the efficiency of the training process.
diff --git a/docs/Learning-Environment-Create-New.md b/docs/Learning-Environment-Create-New.md
index 102567687e..83097ed563 100644
--- a/docs/Learning-Environment-Create-New.md
+++ b/docs/Learning-Environment-Create-New.md
@@ -161,7 +161,6 @@ So far, our RollerAgent script looks like:
 
 public class RollerAgent : Agent
 {
-
     Rigidbody rBody;
     void Start () {
         rBody = GetComponent<Rigidbody>();
@@ -195,7 +194,7 @@ The Agent sends the information we collect to the Brain, which uses it to make a
 
 In our case, the information our agent collects includes:
 
-* Position of the target. In general, it is better to use the relative position of other objects rather than the absolute position for more generalizable training. Note that the agent only collects the x and z coordinates since the floor is aligned with the xz plane and the y component of the target's position never changes.
+* Position of the target. In general, it is better to use the relative position of other objects rather than the absolute position for more generalizable training. Note that the agent only collects the x and z coordinates since the floor is aligned with the x-z plane and the y component of the target's position never changes.
 
     // Calculate relative position
     Vector3 relativePosition = Target.position - this.transform.position;
diff --git a/docs/Learning-Environment-Design-Agents.md b/docs/Learning-Environment-Design-Agents.md
index 1aa182edaf..b6926df60d 100644
--- a/docs/Learning-Environment-Design-Agents.md
+++ b/docs/Learning-Environment-Design-Agents.md
@@ -58,8 +58,6 @@ For examples of various state observation functions, you can look at the [Exampl
         AddVectorObs(ball.transform.GetComponent<Rigidbody>().velocity.z);
     }
 
-
-
 The feature vector must always contain the same number of elements and observations must always be in the same position within the list. If the number of observed entities in an environment can vary you can pad the feature vector with zeros for any missing entities in a specific observation or you can limit an agent's observations to a fixed subset. For example, instead of observing every enemy agent in an environment, you could only observe the closest five.
 
 When you set up an Agent's brain in the Unity Editor, set the following properties to use a continuous vector observation:
@@ -94,11 +92,6 @@ Type enumerations should be encoded in the _one-hot_ style. That is, add an elem
     }
 
 
-
-
-
-
-
 #### Normalization
 
 For the best results when training, you should normalize the components of your feature vector to the range [-1, +1] or [0, 1]. When you normalize the values, the PPO neural network can often converge to a solution faster. Note that it isn't always necessary to normalize to these recommended ranges, but it is considered a best practice when using neural networks. The greater the variation in ranges between the components of your observation, the more likely that training will be affected.
@@ -129,7 +122,7 @@ In addition, make sure that the Agent's Brain expects a visual observation. In t
 
 ### Discrete Vector Observation Space: Table Lookup
 
-You can use the discrete vector observation space when an agent only has a limited number of possible states and those states can be enumerated by a single number. For instance, the [Basic example environment](Learning-Environment-Examples.md) in the ML Agent SDK defines an agent with a discrete vector observation space. The states of this agent are the integer steps between two linear goals. In the Basic example, the agent learns to move to the goal that provides the greatest reward.
+You can use the discrete vector observation space when an agent only has a limited number of possible states and those states can be enumerated by a single number. For instance, the [Basic example environment](Learning-Environment-Examples.md) in ML-Agents defines an agent with a discrete vector observation space. The states of this agent are the integer steps between two linear goals. In the Basic example, the agent learns to move to the goal that provides the greatest reward.
 
 More generally, the discrete vector observation identifier could be an index into a table of the possible states. However, tables quickly become unwieldy as the environment becomes more complex. For example, even a simple game like [tic-tac-toe has 765 possible states](https://en.wikipedia.org/wiki/Game_complexity) (far more if you don't reduce the number of observations by combining those that are rotations or reflections of each other).
 
@@ -310,7 +303,7 @@ To add an Agent to an environment at runtime, use the Unity `GameObject.Instanti
 
 ## Destroying an Agent
 
-Before destroying an Agent Gameobject, you must mark it as done (and wait for the next step in the simulation) so that the Brain knows that this agent is no longer active. Thus, the best place to destroy an agent is in the `Agent.AgentOnDone()` function:
+Before destroying an Agent GameObject, you must mark it as done (and wait for the next step in the simulation) so that the Brain knows that this agent is no longer active. Thus, the best place to destroy an agent is in the `Agent.AgentOnDone()` function:
 
 ```csharp
 public override void AgentOnDone()
diff --git a/docs/Learning-Environment-Design-Brains.md b/docs/Learning-Environment-Design-Brains.md
index f8f9574a05..b073b4e98c 100644
--- a/docs/Learning-Environment-Design-Brains.md
+++ b/docs/Learning-Environment-Design-Brains.md
@@ -24,7 +24,7 @@ The Brain Inspector window in the Unity Editor displays the properties assigned
     * `Space Type` - Corresponds to whether the observation vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
     * `Space Size` - Length of vector observation for brain (In _Continuous_ space type). Or number of possible values (in _Discrete_ space type).
     * `Stacked Vectors` - The number of previous vector observations that will be stacked before being sent to the brain.
-    * `Visual Observations` - Describes height, width, and whether to greyscale visual observations for the Brain.
+    * `Visual Observations` - Describes height, width, and whether to grayscale visual observations for the Brain.
 * `Vector Action`
     * `Space Type` - Corresponds to whether action vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
     * `Space Size` - Length of action vector for brain (In _Continuous_ state space). Or number of possible values (in _Discrete_ action space).
diff --git a/docs/Learning-Environment-Design-External-Internal-Brains.md b/docs/Learning-Environment-Design-External-Internal-Brains.md
index 799b47298f..a984a0fd93 100644
--- a/docs/Learning-Environment-Design-External-Internal-Brains.md
+++ b/docs/Learning-Environment-Design-External-Internal-Brains.md
@@ -24,7 +24,7 @@ The training algorithms included in the ML-Agents SDK produce TensorFlow graph m
 
 To use a graph model:
 
-1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy Gameobject and must have a Brain component.)
+1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy GameObject and must have a Brain component.)
 2. Set the **Brain Type** to **Internal**.
 
 **Note:** In order to see the **Internal** Brain Type option, you must [enable TensorFlowSharp](Using-TensorFlow-Sharp-in-Unity.md).
@@ -44,7 +44,7 @@ The default values of the TensorFlow graph parameters work with the model produc
 
 ![Internal Brain Inspector](images/internal_brain.png)
 
- * `Graph Model` : This must be the `bytes` file corresponding to the pretrained Tensorflow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector)
+ * `Graph Model` : This must be the `bytes` file corresponding to the pre-trained TensorFlow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector)
 
 Only change the following Internal Brain properties if you have created your own TensorFlow model and are not using an ML-Agents model:
 
diff --git a/docs/Learning-Environment-Design.md b/docs/Learning-Environment-Design.md
index d57373b923..2cb2df6d14 100644
--- a/docs/Learning-Environment-Design.md
+++ b/docs/Learning-Environment-Design.md
@@ -25,7 +25,7 @@ The ML-Agents Academy class orchestrates the agent simulation loop as follows:
 
 To create a training environment, extend the Academy and Agent classes to implement the above methods. The `Agent.CollectObservations()` and `Agent.AgentAction()` functions are required; the other methods are optional — whether you need to implement them or not depends on your specific scenario.
 
-**Note:** The API used by the Python PPO training process to communicate with and control the Academy during training can be used for other purposes as well. For example, you could use the API to use Unity as the simulation engine for your own machine learning algorithms. See [External ML API](Python-API.md) for more information.
+**Note:** The API used by the Python PPO training process to communicate with and control the Academy during training can be used for other purposes as well. For example, you could use the API to use Unity as the simulation engine for your own machine learning algorithms. See [Python API](Python-API.md) for more information.
 
 ## Organizing the Unity Scene
 
diff --git a/docs/Python-API.md b/docs/Python-API.md
index d7f22eb844..d92bd4b202 100644
--- a/docs/Python-API.md
+++ b/docs/Python-API.md
@@ -38,7 +38,7 @@ A BrainInfo object contains the following fields:
 * **`text_observations`** : A list of string corresponding to the agents text observations.
 * **`memories`** : A two dimensional numpy array of dimension `(batch size, memory size)` which corresponds to the memories sent at the previous step.
 * **`rewards`** : A list as long as the number of agents using the brain containing the rewards they each obtained at the previous step.
-* **`local_done`** : A list as long as the number of agents using the brain containing `done` flags (wether or not the agent is done).
+* **`local_done`** : A list as long as the number of agents using the brain containing `done` flags (whether or not the agent is done).
 * **`max_reached`** : A list as long as the number of agents using the brain containing true if the agents reached their max steps.
 * **`agents`** : A list of the unique ids of the agents using the brain.
 * **`previous_actions`** : A two dimensional numpy array of dimension `(batch size, vector action size)` if the vector action space is continuous and `(batch size, 1)` if the vector action space is discrete.
diff --git a/docs/Training-Imitation-Learning.md b/docs/Training-Imitation-Learning.md
index c96b3273e9..6bd686c1f9 100644
--- a/docs/Training-Imitation-Learning.md
+++ b/docs/Training-Imitation-Learning.md
@@ -16,7 +16,7 @@ There are a variety of possible imitation learning algorithms which can be used,
 8. From the Unity window, control the agent with the Teacher brain by providing "teacher demonstrations" of the behavior you would like to see.
 9. Watch as the agent(s) with the student brain attached begin to behave similarly to the demonstrations.
 10. Once the Student agents are exhibiting the desired behavior, end the training process with `CTL+C` from the command line.
-11. Move the resulting `*.bytes` file into the `TFModels` sub-directory of the Assets folder (or a sub-directority within Assets of your choosing) , and use with `Internal` brain.
+11. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the Assets folder (or a subdirectory within Assets of your choosing) , and use with `Internal` brain.
 
 ### BC Teacher Helper
 
diff --git a/docs/Training-ML-Agents.md b/docs/Training-ML-Agents.md
index 2078c0a6fc..5e813cf5a5 100644
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
@@ -43,7 +43,7 @@ While this example used the default training hyperparameters, you can edit the [
 
 In addition to passing the path of the Unity executable containing your training environment, you can set the following command line options when invoking `learn.py`:
 
-* `--curriculum=` – Specify a curriculum json file for defining the lessons for curriculum training. See [Curriculum Training](Training-Curriculum-Learning.md) for more information.
+* `--curriculum=` – Specify a curriculum JSON file for defining the lessons for curriculum training. See [Curriculum Training](Training-Curriculum-Learning.md) for more information.
 * `--keep-checkpoints=` – Specify the maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `save-freq` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
 * `--lesson=` – Specify which lesson to start with when performing curriculum training. Defaults to 0.
 * `--load` – If set, the training code loads an already trained model to initialize the neural network before training. The learning code looks for the model in `python/models//` (which is also where it saves models at the end of training). When not set (the default), the neural network weights are randomly initialized and an existing model is not loaded.
diff --git a/docs/Training-PPO.md b/docs/Training-PPO.md
index aa8e1994f3..3c7d0398e2 100644
--- a/docs/Training-PPO.md
+++ b/docs/Training-PPO.md
@@ -10,8 +10,6 @@ If you are using curriculum training to pace the difficulty of the learning task
 
 For information about imitation learning, which uses a different training algorithm, see [Imitation Learning](Training-Imitation-Learning).
 
-
-
 ## Best Practices when training with PPO
 
 Successfully training a Reinforcement Learning model often involves tuning the training hyperparameters. This guide contains some best practices for tuning the training process when the default parameters don't seem to be giving the level of performance you would like.
@@ -28,7 +26,7 @@ Typical Range: `2048` - `409600`
 #### Batch Size
 
 `batch_size` is the number of experiences used for one iteration of a gradient descent update. **This should always be a fraction of the
-`buffer_size`**. If you are using a continuous action space, this value should be large (in the order of 1000s). If you are using a discrete action space, this value
+`buffer_size`**. If you are using a continuous action space, this value should be large (in the order of 1000s). If you are using a discrete action space, this value
 should be smaller (in order of 10s).
 
 Typical Range (Continuous): `512` - `5120`
diff --git a/docs/Training-on-Amazon-Web-Service.md b/docs/Training-on-Amazon-Web-Service.md
index b6b3228659..1f3b080f3d 100644
--- a/docs/Training-on-Amazon-Web-Service.md
+++ b/docs/Training-on-Amazon-Web-Service.md
@@ -9,11 +9,11 @@ A public pre-configured AMI is available with the ID: `ami-30ec184a` in the `us-
 Instructions here are adapted from this [Medium post](https://medium.com/towards-data-science/how-to-run-unity-on-amazon-cloud-or-without-monitor-3c10ce022639) on running general Unity applications in the cloud.
 
 1. To begin with, you will need an EC2 instance which contains the latest Nvidia drivers, CUDA8, and cuDNN. There are a number of external tutorials which describe this, such as:
-    * [Getting CUDA 8 to Work With openAI Gym on AWS and Compiling Tensorflow for CUDA 8 Compatibility](https://davidsanwald.github.io/2016/11/13/building-tensorflow-with-gpu-support.html)
+    * [Getting CUDA 8 to Work With openAI Gym on AWS and Compiling TensorFlow for CUDA 8 Compatibility](https://davidsanwald.github.io/2016/11/13/building-tensorflow-with-gpu-support.html)
     * [Installing TensorFlow on an AWS EC2 P2 GPU Instance](http://expressionflow.com/2016/10/09/installing-tensorflow-on-an-aws-ec2-p2-gpu-instance/)
     * [Updating Nvidia CUDA to 8.0.x in Ubuntu 16.04 – EC2 Gx instance](https://aichamp.wordpress.com/2016/11/09/updating-nvidia-cuda-to-8-0-x-in-ubuntu-16-04-ec2-gx-instance/)
 2. Move `python` to remote instance.
-2. Install the required packages with `pip install .`.
+2. Install the required packages with `pip3 install .`.
 3. Run the following commands to install Xorg:
 ```
 sudo apt-get update
@@ -33,7 +33,7 @@ If you run `nvidia-smi`, you will have a list of processes running on the GPU, X
 sudo /usr/bin/X :0 &
 export DISPLAY=:0
 ```
-3. To ensure the installation was succesful, run `glxgears`. If there are no errors, then Xorg is correctly configured.
+3. To ensure the installation was successful, run `glxgears`. If there are no errors, then Xorg is correctly configured.
 4. There is a bug in _Unity 2017.1_ which requires the uninstallation of `libxrandr2`, which can be removed with :
 ```
 sudo apt-get remove --purge libwxgtk3.0-0v5
diff --git a/docs/Using-Docker.md b/docs/Using-Docker.md
index f61a75cd59..8f8f826d21 100644
--- a/docs/Using-Docker.md
+++ b/docs/Using-Docker.md
@@ -14,7 +14,7 @@ add the _Linux Build Support_ Component
 
 - [Download](https://www.docker.com/community-edition#/download) and install Docker if you don't have it setup on your machine.
 
-- Since Docker runs a container in an environment that is isolated from the host machine, a mounted directory in your host machine is used to share data, e.g. the Unity executable, curriculum files and tensorflow graph. For convenience, we created an empty `unity-volume` directory at the root of the repository for this purpose, but feel free to use any other directory. The remainder of this guide assumes that the `unity-volume` directory is the one used.
+- Since Docker runs a container in an environment that is isolated from the host machine, a mounted directory in your host machine is used to share data, e.g. the Unity executable, curriculum files and TensorFlow graph. For convenience, we created an empty `unity-volume` directory at the root of the repository for this purpose, but feel free to use any other directory. The remainder of this guide assumes that the `unity-volume` directory is the one used.
 
 ## Usage
 
diff --git a/docs/Using-TensorFlow-Sharp-in-Unity.md b/docs/Using-TensorFlow-Sharp-in-Unity.md
index 0fd1e7fde7..a85c18bca0 100644
--- a/docs/Using-TensorFlow-Sharp-in-Unity.md
+++ b/docs/Using-TensorFlow-Sharp-in-Unity.md
@@ -1,4 +1,4 @@
-# Using TensorFlowSharp in Unity _[Experimental]_
+# Using TensorFlowSharp in Unity (Experimental)
 
 ML-Agents allows you to use pre-trained [TensorFlow graphs](https://www.tensorflow.org/programmers_guide/graphs) inside your Unity games. This support is possible thanks to [the TensorFlowSharp project](https://github.com/migueldeicaza/TensorFlowSharp). The primary purpose for this support is to use the TensorFlow models produced by the ML-Agents own training programs, but a side benefit is that you can use any TensorFlow model.
 
@@ -137,4 +137,4 @@ To load and use a TensorFlow data graph in Unity:
 
     float[,] recurrent_tensor = runner.Run () [0].GetValue () as float[,];
 ```
-  Note that this example assumes the output array is a two-dimensional tensor of floats. Cast to a long array if your outputs are integers.
\ No newline at end of file
+  Note that this example assumes the output array is a two-dimensional tensor of floats. Cast to a long array if your outputs are integers.
diff --git a/docs/doxygen/Readme.md b/docs/doxygen/Readme.md
index 62f233aa88..7755c618da 100644
--- a/docs/doxygen/Readme.md
+++ b/docs/doxygen/Readme.md
@@ -2,4 +2,4 @@
 
 To generate the API reference as HTML files, run:
 
-    doxygen ml-agents.conf
\ No newline at end of file
+    doxygen ml-agents.conf
diff --git a/unity-environment/Assets/ML-Agents/Scripts/Agent.cs b/unity-environment/Assets/ML-Agents/Scripts/Agent.cs
index e86065b508..4bfe55a2b2 100755
--- a/unity-environment/Assets/ML-Agents/Scripts/Agent.cs
+++ b/unity-environment/Assets/ML-Agents/Scripts/Agent.cs
@@ -180,7 +180,7 @@ public class AgentParameters
 /// see the Examples/ directory within this Unity project.
 /// </summary>
 [HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/" +
-    "docs/Learning-Environment-Design-Agent.md")]
+    "docs/Learning-Environment-Design-Agents.md")]
 [System.Serializable]
 public abstract class Agent : MonoBehaviour
 {
diff --git a/unity-environment/Assets/ML-Agents/Scripts/Brain.cs b/unity-environment/Assets/ML-Agents/Scripts/Brain.cs
index 16ae9c0789..33ecbf4891 100755
--- a/unity-environment/Assets/ML-Agents/Scripts/Brain.cs
+++ b/unity-environment/Assets/ML-Agents/Scripts/Brain.cs
@@ -75,7 +75,8 @@ public class BrainParameters
     /**< \brief Defines if the state is discrete or continuous */
 }
 
-[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Agents-Editor-Interface.md#brain")]
+[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/" +
+    "docs/Learning-Environment-Design-Brains.md")]
 /**
  * Contains all high-level Brain logic.
  * Add this component to an empty GameObject in your scene and drag this
diff --git a/unity-environment/Assets/ML-Agents/Scripts/CoreBrainInternal.cs b/unity-environment/Assets/ML-Agents/Scripts/CoreBrainInternal.cs
index 1dde1cdb34..94af207cf5 100644
--- a/unity-environment/Assets/ML-Agents/Scripts/CoreBrainInternal.cs
+++ b/unity-environment/Assets/ML-Agents/Scripts/CoreBrainInternal.cs
@@ -496,7 +496,7 @@ public void OnInspector()
                 "order to use the internal brain.", MessageType.Error);
             if (GUILayout.Button("Show me how"))
             {
-                Application.OpenURL("https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md#setting-up-tensorflowsharp-support");
+                Application.OpenURL("https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md#embedding-the-trained-brain-into-the-unity-environment-experimental");
             }
 #endif
         }