Documentation 0.5 Release Check List (Part 1) #1154
Conversation
  This tutorial walks through the process of creating a Unity Environment. A Unity
  Environment is an application built using the Unity Engine which can be used to
- train Reinforcement Learning agents.
+ train Reinforcement Learning Agents.
Should be lowercased.
  steps:

- 1. Create an environment for your agents to live in. An environment can range
+ 1. Create an environment for your Agents to live in. An environment can range
Should be lowercased.
  The Agent sends the information we collect to the Brain, which uses it to make a
- decision. When you train the agent (or use a trained model), the data is fed
+ decision. When you train the Agent (or use a trained model), the data is fed
  into a neural network as a feature vector. For an agent to successfully learn a
Should be lowercased.
- * Position of the agent itself within the confines of the floor. This data is
+ * Position of the Agent itself within the confines of the floor. This data is
    collected as the agent's distance from each edge of the floor.
Should be lowercased.
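
(For context, this hunk and the previous one describe the RollerAgent tutorial's observation collection. A minimal sketch, assuming the 0.5-era C# API; the field names and the normalization are illustrative assumptions, not code quoted from the tutorial.)

```csharp
using UnityEngine;
using MLAgents; // namespace assumed for ML-Agents 0.4+; omit on older versions

// Hypothetical Agent subclass patterned on the RollerAgent tutorial.
public class RollerAgent : Agent
{
    public Transform target;         // assumed: the object the Agent must reach
    public float floorHalfSize = 5f; // assumed: half-extent of the square floor

    public override void CollectObservations()
    {
        // Position within the confines of the floor, encoded as the normalized
        // distance from each edge, as the passage describes.
        AddVectorObs((transform.position.x + floorHalfSize) / (2f * floorHalfSize));
        AddVectorObs((transform.position.z + floorHalfSize) / (2f * floorHalfSize));
        // Offset to the target; each AddVectorObs call appends to the single
        // feature vector the Brain feeds into the neural network.
        AddVectorObs(target.position - transform.position);
    }
}
```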
  the task. For example, the RollerAgent reward system provides a small reward if
- the agent moves closer to the target in a step and a small negative reward at
+ the Agent moves closer to the target in a step and a small negative reward at
  each step which encourages the agent to complete its task quickly.
Should be lowercased.
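
(The reward logic this hunk refers to can be sketched as follows, assuming the 0.5-era `AgentAction(float[], string)` signature. The distance threshold and reward magnitudes are recalled from that tutorial and may differ in your copy.)

```csharp
using UnityEngine;
using MLAgents; // namespace assumed for ML-Agents 0.4+; omit on older versions

public class RollerAgent : Agent
{
    public Transform target;                 // assumed target object
    float previousDistance = float.MaxValue; // distance at the previous step

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // (movement code omitted)
        float distanceToTarget = Vector3.Distance(transform.position, target.position);

        // Reached the target: full reward and end of episode.
        if (distanceToTarget < 1.42f)
        {
            SetReward(1.0f);
            Done();
        }

        // Small reward for moving closer to the target this step...
        if (distanceToTarget < previousDistance)
        {
            AddReward(0.1f);
        }

        // ...and a small negative reward every step, which encourages the
        // Agent to complete its task quickly.
        AddReward(-0.05f);
        previousDistance = distanceToTarget;
    }
}
```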
- Heuristics or Internal brains game sessions. You can then use this data to train
- an agent in a supervised context.
+ Heuristics or Internal Brains game sessions. You can then use this data to train
+ an Agent in a supervised context.
Should be lowercased.
  that you can use with the Internal Brain type.

- A __model__ is a mathematical relationship mapping an agent's observations to
+ A __model__ is a mathematical relationship mapping an Agent's observations to
Should be lowercased.
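
(Stated formally, as one possible reading of "mathematical relationship", not notation used in the docs: the exported model is a fixed function

    f_\theta : \mathcal{O} \to \mathcal{A}, \qquad a_t = f_\theta(o_t)

where \mathcal{O} is the observation space, \mathcal{A} the action space, and \theta the network weights frozen when the model is exported.)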
docs/Learning-Environment-Design.md (outdated)
  Reinforcement learning is an artificial intelligence technique that trains
  _agents_ to perform tasks by rewarding desirable behavior. During reinforcement
- learning, an agent explores its environment, observes the state of things, and,
+ learning, an Agent explores its environment, observes the state of things, and,
Should be lowercased.
docs/Learning-Environment-Design.md (outdated)
- state, the agent receives a positive reward. If it leads to a less desirable
+ state, the Agent receives a positive reward. If it leads to a less desirable
  state, then the agent receives no reward or a negative reward (punishment). As
  the agent learns during training, it optimizes its decision making so that it
Should be lowercased.
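
(Concretely, "maximum reward over time" is the standard reinforcement-learning objective, which the docs do not spell out: the Agent's policy is trained to maximize the expected discounted return

    G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma < 1

where r_t is the reward received at step t and the discount factor \gamma trades off immediate against future reward.)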
docs/Learning-Environment-Design.md (outdated)
  [Proximal Policy Optimization (PPO)](https://blog.openai.com/openai-baselines-ppo/).
- PPO uses a neural network to approximate the ideal function that maps an agent's
+ PPO uses a neural network to approximate the ideal function that maps an Agent's
  observations to the best action an agent can take in a given state. The
Should be lowercased.
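
(For readers who want more than the black-box view recommended later in this file: the network mentioned here is trained against PPO's clipped surrogate objective, quoted from the PPO paper rather than from the docs under review,

    L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

where \hat{A}_t is the advantage estimate and \epsilon is the clipping range, exposed as the epsilon hyperparameter in the trainer configuration.)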
docs/Learning-Environment-Design.md (outdated)
  **Note:** if you aren't studying machine and reinforcement learning as a subject
- and just want to train agents to accomplish tasks, you can treat PPO training as
+ and just want to train Agents to accomplish tasks, you can treat PPO training as
Should be lowercased.
docs/Learning-Environment-Design.md (outdated)
  a _black box_. There are a few training-related parameters to adjust inside
  Unity as well as on the Python training side, but you do not need in-depth
- knowledge of the algorithm itself to successfully create and train agents.
+ knowledge of the algorithm itself to successfully create and train Agents.
Should be lowercased.
  class. The Academy works with Agent and Brain objects in the scene to step
  through the simulation. When either the Academy has reached its maximum number
- of steps or all agents in the scene are _done_, one training episode is
+ of steps or all Agents in the scene are _done_, one training episode is
Should be lowercased.
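
(As a reminder of the API this passage refers to: an Academy is a concrete subclass in the scene, and the stepping behavior comes from optional overrides. A minimal sketch, assuming the 0.5-era virtual methods `AcademyReset`/`AcademyStep`.)

```csharp
using MLAgents; // namespace assumed for ML-Agents 0.4+; omit on older versions

// Minimal Academy subclass; both overrides are optional hooks invoked while
// the Academy steps the simulation.
public class RollerAcademy : Academy
{
    public override void AcademyReset()
    {
        // Runs at the start of each training episode, i.e. after the maximum
        // step count is reached or all Agents in the scene are done.
    }

    public override void AcademyStep()
    {
        // Runs every simulation step, before the Agents act.
    }
}
```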
docs/Learning-Environment-Design.md (outdated)
  An _environment_ in the ML-Agents toolkit can be any scene built in Unity. The
- Unity scene provides the environment in which agents observe, act, and learn.
+ Unity scene provides the environment in which Agents observe, act, and learn.
Should be lowercased.
  * You can put your executable on a remote machine for faster training.
  * You can use `Headless` mode for faster training.
- * You can keep using the Unity Editor for other tasks while the agents are
+ * You can keep using the Unity Editor for other tasks while the Agents are
Should be lowercased.
docs/ML-Agents-Overview.md (outdated)
  - **Observations** - what the medic perceives about the environment.
    Observations can be numeric and/or visual. Numeric observations measure
-   attributes of the environment from the point of view of the agent. For our
+   attributes of the environment from the point of view of the Agent. For our
Should be lowercased.
docs/ML-Agents-Overview.md (outdated)
  - Single-Agent. A single Agent linked to a single Brain, with its own reward
-   signal. The traditional way of training an agent. An example is any
+   signal. The traditional way of training an Agent. An example is any
Should be lowercased.
  - **Monitoring Agent’s Decision Making** - Since communication in ML-Agents is a
-   two-way street, we provide an agent Monitor class in Unity which can display
+   two-way street, we provide an Agent Monitor class in Unity which can display
    aspects of the trained agent, such as the agents perception on how well it is
Keep as it is. All others in this file should be lowercased.
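
(For reference, the Monitor class mentioned here is used roughly as follows; the exact `Monitor.Log` overloads are assumed from the 0.5-era source and worth double-checking against Monitor.cs in your version.)

```csharp
using UnityEngine;
using MLAgents; // namespace assumed for ML-Agents 0.4+; omit on older versions

// Hypothetical helper that displays a value above an Agent in the Game view.
public class RewardDisplay : MonoBehaviour
{
    public Agent agent; // the Agent whose decision making we want to watch

    void Update()
    {
        // Draw the Agent's cumulative reward next to its Transform; the
        // overload Log(string, float, Transform) is assumed, not confirmed.
        Monitor.Log("Reward", agent.GetCumulativeReward(), transform);
    }
}
```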
docs/Migrating.md (outdated)
  packages, `mlagents.env` and `mlagents.trainers`. `mlagents.env` can be used
  to interact directly with a Unity environment, while `mlagents.trainers`
- contains the classes for training agents.
+ contains the classes for training Agents.
Should be lowercased.
docs/Training-ML-Agents.md (outdated)
  The ML-Agents toolkit conducts training using an external Python training
  process. During training, this external process communicates with the Academy
- object in the Unity scene to generate a block of agent experiences. These
+ object in the Unity scene to generate a block of Agent experiences. These
Should be lowercased.
All in this file should be lowercased.
docs/Training-PPO.md (outdated)
  [Proximal Policy Optimization (PPO)](https://blog.openai.com/openai-baselines-ppo/).
- PPO uses a neural network to approximate the ideal function that maps an agent's
+ PPO uses a neural network to approximate the ideal function that maps an Agent's
  observations to the best action an agent can take in a given state. The
Should be lowercased.
@awjuliani hopefully the last round of capitalizations :) Let me know if any last-minute changes are needed.
Looks good @unityjeffrey! Thanks for making all these changes.
Only the checked items. There are some structure and flow issues to be resolved based on the directory changes. I will tie off with @dericp tomorrow.
Wanted to get these into review before I start the bigger changes.
Please see the GitHub Documentation Pre-Release Checklist for 0.5 for what has been addressed.