From 2a36ccab81f64ead8b32781c3b6d2585bc5f5265 Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Wed, 20 Dec 2017 14:20:34 -0800
Subject: [PATCH 1/6] Fixed typo and grammar

---
 docs/Organizing-the-Scene.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Organizing-the-Scene.md b/docs/Organizing-the-Scene.md
index b8a24fec37..4ad4df11a9 100644
--- a/docs/Organizing-the-Scene.md
+++ b/docs/Organizing-the-Scene.md
@@ -27,7 +27,7 @@ The Academy is responsible for:
 * Coordinating the Brains which must be set as children of the Academy.
 
 #### Brains
-Each brain corresponds to a specific Decision-making method. This often aligns with a specific neural network model. A Brains is responsible for deciding the action of all the Agents which are linked to it. There can be multiple brains in the same scene and multiple agents can subscribe to the same brain.
+Each brain corresponds to a specific Decision-making method. This often aligns with a specific neural network model. The brain is responsible for deciding the action of all the Agents which are linked to it. There can be multiple brains in the same scene and multiple agents can subscribe to the same brain.
 
 #### Agents
 Each agent within a scene takes actions according to the decisions provided by it's linked Brain. There can be as many Agents of as many types as you like in the scene. The state size and action size of each agent must match the brain's parameters in order for the Brain to decide actions for it.

From 3cc9fc7be0a545f3036fd6b2bead4a53feb31ee2 Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Wed, 20 Dec 2017 14:23:04 -0800
Subject: [PATCH 2/6] Update Agents-Editor-Interface.md

---
 docs/Agents-Editor-Interface.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Agents-Editor-Interface.md b/docs/Agents-Editor-Interface.md
index cfa5c52c43..3fd4949a09 100644
--- a/docs/Agents-Editor-Interface.md
+++ b/docs/Agents-Editor-Interface.md
@@ -32,7 +32,7 @@ values (in _Discrete_ action space).
 * `Action Descriptions` - A list of strings used to name the available actions for the Brain.
 * `State Space Type` - Corresponds to whether state vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
 * `Action Space Type` - Corresponds to whether action vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
-* `Type of Brain` - Describes how Brain will decide actions.
+* `Type of Brain` - Describes how the Brain will decide actions.
 * `External` - Actions are decided using Python API.
 * `Internal` - Actions are decided using internal TensorflowSharp model.
 * `Player` - Actions are decided using Player input mappings.

From 7f01bb764a502ff13822b0ba774326cc32c0472c Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Wed, 20 Dec 2017 14:33:25 -0800
Subject: [PATCH 3/6] Fixed typo

---
 docs/Unity-Agents-Overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Unity-Agents-Overview.md b/docs/Unity-Agents-Overview.md
index afbf97caa4..281a4b8dcc 100644
--- a/docs/Unity-Agents-Overview.md
+++ b/docs/Unity-Agents-Overview.md
@@ -2,7 +2,7 @@
 
 ![diagram](../images/agents_diagram.png)
 
-A visual depiction of how an Learning Environment might be configured within ML-Agents.
+A visual depiction of how a Learning Environment might be configured within ML-Agents.
 
 The three main kinds of objects within any Agents Learning Environment are:
 

From 301ea97e1368ca27fafc982662fc120e905ea417 Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Wed, 20 Dec 2017 14:38:22 -0800
Subject: [PATCH 4/6] Fixed vocabulary

---
 docs/best-practices.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/best-practices.md b/docs/best-practices.md
index 9c5592232b..bce6ef38d9 100644
--- a/docs/best-practices.md
+++ b/docs/best-practices.md
@@ -1,7 +1,7 @@
 # Environment Design Best Practices
 
 ## General
-* It is often helpful to being with the simplest version of the problem, to ensure the agent can learn it. From there increase
+* It is often helpful to start with the simplest version of the problem, to ensure the agent can learn it. From there increase
 complexity over time. This can either be done manually, or via Curriculum Learning, where a set of lessons which progressively increase in difficulty are presented to the agent ([learn more here](../docs/curriculum.md)).
 * When possible, It is often helpful to ensure that you can complete the task by using a Player Brain to control the agent.
 

From 4adcd47654fe6831c94d30c6ea41a665028b25d6 Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Wed, 20 Dec 2017 14:39:04 -0800
Subject: [PATCH 5/6] Fixed typo

---
 docs/best-practices.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/best-practices.md b/docs/best-practices.md
index bce6ef38d9..0d7967be37 100644
--- a/docs/best-practices.md
+++ b/docs/best-practices.md
@@ -9,7 +9,7 @@ complexity over time. This can either be done manually, or via Curriculum Learni
 * The magnitude of any given reward should typically not be greater than 1.0 in order to ensure a more stable learning process.
 * Positive rewards are often more helpful to shaping the desired behavior of an agent than negative rewards.
 * For locomotion tasks, a small positive reward (+0.1) for forward velocity is typically used.
-* If you want the agent the finish a task quickly, it is often helpful to provide a small penalty every step (-0.05) that the agent does not complete the task. In this case completion of the task should also coincide with the end of the episode.
+* If you want the agent to finish a task quickly, it is often helpful to provide a small penalty every step (-0.05) that the agent does not complete the task. In this case completion of the task should also coincide with the end of the episode.
 * Overly-large negative rewards can cause undesirable behavior where an agent learns to avoid any behavior which might produce the negative reward, even if it is also behavior which can eventually lead to a positive reward.
 
 ## States

From e2a526fe26cc68342cfd43aecbe762baaaf77299 Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Wed, 20 Dec 2017 14:39:22 -0800
Subject: [PATCH 6/6] Fixed typo

---
 docs/best-practices.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/best-practices.md b/docs/best-practices.md
index 0d7967be37..1fd491d10f 100644
--- a/docs/best-practices.md
+++ b/docs/best-practices.md
@@ -3,7 +3,7 @@
 ## General
 * It is often helpful to start with the simplest version of the problem, to ensure the agent can learn it. From there increase
 complexity over time. This can either be done manually, or via Curriculum Learning, where a set of lessons which progressively increase in difficulty are presented to the agent ([learn more here](../docs/curriculum.md)).
-* When possible, It is often helpful to ensure that you can complete the task by using a Player Brain to control the agent.
+* When possible, it is often helpful to ensure that you can complete the task by using a Player Brain to control the agent.
 
 ## Rewards
 * The magnitude of any given reward should typically not be greater than 1.0 in order to ensure a more stable learning process.
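As a quick illustration of the reward guidance these patches touch in `docs/best-practices.md`, below is a hypothetical C# sketch of an agent that follows it: a small per-step penalty (-0.05), a forward-velocity bonus capped well under 1.0, and an episode that ends when the task completes. It is not part of the patch series, and it assumes the 2017-era ML-Agents `Agent` base class (a `reward` field, a `done` flag, and `CollectState`/`AgentStep`/`AgentReset` overrides); the class name, Inspector fields, and threshold values are illustrative only, so rename members to whatever your ML-Agents version exposes.

```csharp
// Hypothetical example, not part of the patches above.
// Assumes the 2017-era ML-Agents Agent base class with a `reward` field,
// a `done` flag, and CollectState/AgentStep/AgentReset overrides;
// adjust member names for your ML-Agents version.
using System.Collections.Generic;
using UnityEngine;

public class ReachTargetAgent : Agent
{
    public Transform target;   // assigned in the Inspector; the goal to reach
    public Rigidbody body;     // assigned in the Inspector; used for velocity

    public override List<float> CollectState()
    {
        // Three floats: the target's position relative to the agent.
        // The Brain's State Size parameter must be set to 3 to match.
        Vector3 toTarget = target.position - transform.position;
        return new List<float> { toTarget.x, toTarget.y, toTarget.z };
    }

    public override void AgentStep(float[] act)
    {
        // (Apply `act` to move the agent here.)

        // Small per-step penalty so finishing quickly is preferred.
        reward = -0.05f;

        // Small positive shaping term for forward velocity, capped at +0.1
        // so no single reward exceeds 1.0 in magnitude.
        float forwardSpeed = Vector3.Dot(body.velocity, transform.forward);
        reward += 0.1f * Mathf.Clamp(forwardSpeed, 0f, 1f);

        // Task completion gives the main positive reward and ends the
        // episode, so the per-step penalty stops accumulating.
        if (Vector3.Distance(transform.position, target.position) < 1.0f)
        {
            reward = 1.0f;
            done = true;
        }
    }

    public override void AgentReset()
    {
        // Illustrative reset: return the agent to the origin.
        transform.position = Vector3.zero;
        body.velocity = Vector3.zero;
    }
}
```

Ending the episode at completion, rather than letting the -0.05 penalty keep accruing, keeps the total return bounded and avoids the overly-large negative totals that the same document warns can teach an agent to avoid the task altogether.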