From d69b303145c41353899a0ce42fde59f9089f1ec2 Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Thu, 6 Sep 2018 10:20:09 -0700
Subject: [PATCH 1/2] minor grammatical fix

---
 docs/Feature-Memory.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Feature-Memory.md b/docs/Feature-Memory.md
index 79b8eead50..3e4ef5efb9 100644
--- a/docs/Feature-Memory.md
+++ b/docs/Feature-Memory.md
@@ -1,6 +1,6 @@
 # Memory-enhanced agents using Recurrent Neural Networks
 
-## What are memories for
+## What are memories used for?
 
 Have you ever entered a room to get something and immediately forgot what you
 were looking for? Don't let that happen to your agents.

From 7479a966646865e50e53417b033a9a34ec47503a Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Thu, 6 Sep 2018 10:30:45 -0700
Subject: [PATCH 2/2] fixed definition in table

---
 docs/Training-ML-Agents.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/Training-ML-Agents.md b/docs/Training-ML-Agents.md
index 17e7966fa6..29945545ae 100644
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
@@ -158,7 +158,7 @@ after the GameObject containing the Brain component that should use these
 settings. (This GameObject will be a child of the Academy in your scene.)
 Sections for the example environments are included in the provided config file.
 
-| **Setting** | **Description** | **Applies To Trainer**|
+| **Setting** | **Description** | **Applies To Trainer\***|
 | :-- | :-- | :-- |
 | batch_size | The number of experiences in each iteration of gradient descent.| PPO, BC |
 | batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model.| BC |
@@ -183,7 +183,8 @@ Sections for the example environments are included in the provided config file.
 | trainer | The type of training to perform: "ppo" or "imitation".| PPO, BC |
 | use_curiosity | Train using an additional intrinsic reward signal generated from Intrinsic Curiosity Module. | PPO |
 | use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md).| PPO, BC |
-|| PPO = Proximal Policy Optimization, BC = Behavioral Cloning (Imitation)) ||
+
+\*PPO = Proximal Policy Optimization, BC = Behavioral Cloning (Imitation)
 
 For specific advice on setting hyperparameters based on the type of training you
 are conducting, see: