diff --git a/docs/Basic-Guide.md b/docs/Basic-Guide.md
index 91cee039c1..fb0f627e50 100644
--- a/docs/Basic-Guide.md
+++ b/docs/Basic-Guide.md
@@ -109,7 +109,41 @@ perform training. Please refer to
 [this page](Learning-Environment-Executable.md) for instructions on how to
 build and use an executable.
 
-![Training command example](images/training-command-example.png)
+```console
+ml-agents$ mlagents-learn config/trainer_config.yaml --run-id=first-run --train
+
+
+                        ▄▄▄▓▓▓▓
+                   ╓▓▓▓▓▓▓█▓▓▓▓▓
+              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
+            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
+          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
+        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
+        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
+          ^█▓▓▓        ▀▓▓▄     ▐▓▓▌   ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
+            '▀▓▓▓▄      ^▓▓▓    ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
+               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
+                   `▀█▓▓▓▓▓▓▓▓▓▌
+                        ¬`▀▀▀█▓
+
+
+INFO:mlagents.learn:{'--curriculum': 'None',
+ '--docker-target-name': 'Empty',
+ '--env': 'None',
+ '--help': False,
+ '--keep-checkpoints': '5',
+ '--lesson': '0',
+ '--load': False,
+ '--no-graphics': False,
+ '--num-runs': '1',
+ '--run-id': 'first-run',
+ '--save-freq': '50000',
+ '--seed': '-1',
+ '--slow': False,
+ '--train': True,
+ '--worker-id': '0',
+ '<trainer-config-path>': 'config/trainer_config.yaml'}
+```
 
 **Note**: If you're using Anaconda, don't forget to activate the ml-agents
 environment first.
@@ -117,7 +151,56 @@ environment first.
 If `mlagents-learn` runs correctly and starts training, you should see
 something like this:
 
-![Training running](images/training-running.png)
+```console
+INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
+INFO:mlagents.envs:
+'Ball3DAcademy' started successfully!
+Unity Academy name: Ball3DAcademy
+ Number of Brains: 1
+ Number of External Brains : 1
+ Reset Parameters :
+
+Unity brain name: Ball3DBrain
+ Number of Visual Observations (per agent): 0
+ Vector Observation space size (per agent): 8
+ Number of stacked Vector Observation: 1
+ Vector Action space type: continuous
+ Vector Action space size (per agent): [2]
+ Vector Action descriptions: ,
+INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain Ball3DBrain:
+ batch_size: 64
+ beta: 0.001
+ buffer_size: 12000
+ epsilon: 0.2
+ gamma: 0.995
+ hidden_units: 128
+ lambd: 0.99
+ learning_rate: 0.0003
+ max_steps: 5.0e4
+ normalize: True
+ num_epoch: 3
+ num_layers: 2
+ time_horizon: 1000
+ sequence_length: 64
+ summary_freq: 1000
+ use_recurrent: False
+ graph_scope:
+ summary_path: ./summaries/first-run-0
+ memory_size: 256
+ use_curiosity: False
+ curiosity_strength: 0.01
+ curiosity_enc_size: 128
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
+```
 
 ### After training
 
diff --git a/docs/Learning-Environment-Executable.md b/docs/Learning-Environment-Executable.md
index 89131ea93b..ca11d36399 100644
--- a/docs/Learning-Environment-Executable.md
+++ b/docs/Learning-Environment-Executable.md
@@ -100,10 +100,46 @@ For example, if you are training with a 3DBall executable you exported to
 the ml-agents/python directory, run:
 
 ```sh
-mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=firstRun --train
+mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
 ```
 
-![Training command example](images/training-command-example.png)
+And you should see something like
+
+```console
+ml-agents$ mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
+
+
+                        ▄▄▄▓▓▓▓
+                   ╓▓▓▓▓▓▓█▓▓▓▓▓
+              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
+            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
+          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
+        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
+        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
+          ^█▓▓▓        ▀▓▓▄     ▐▓▓▌   ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
+            '▀▓▓▓▄      ^▓▓▓    ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
+               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
+                   `▀█▓▓▓▓▓▓▓▓▓▌
+                        ¬`▀▀▀█▓
+
+
+INFO:mlagents.learn:{'--curriculum': 'None',
+ '--docker-target-name': 'Empty',
+ '--env': '3DBall',
+ '--help': False,
+ '--keep-checkpoints': '5',
+ '--lesson': '0',
+ '--load': False,
+ '--no-graphics': False,
+ '--num-runs': '1',
+ '--run-id': 'first-run',
+ '--save-freq': '50000',
+ '--seed': '-1',
+ '--slow': False,
+ '--train': True,
+ '--worker-id': '0',
+ '<trainer-config-path>': 'config/trainer_config.yaml'}
+```
 
 **Note**: If you're using Anaconda, don't forget to activate the ml-agents
 environment first.
@@ -111,7 +147,60 @@ environment first.
 If `mlagents-learn` runs correctly and starts training, you should see
 something like this:
 
-![Training running](images/training-running.png)
+```console
+CrashReporter: initialized
+Mono path[0] = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/Resources/Data/Managed'
+Mono config path = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/MonoBleedingEdge/etc'
+INFO:mlagents.envs:
+'Ball3DAcademy' started successfully!
+INFO:mlagents.envs:
+'Ball3DAcademy' started successfully!
+Unity Academy name: Ball3DAcademy
+ Number of Brains: 1
+ Number of External Brains : 1
+ Reset Parameters :
+
+Unity brain name: Ball3DBrain
+ Number of Visual Observations (per agent): 0
+ Vector Observation space size (per agent): 8
+ Number of stacked Vector Observation: 1
+ Vector Action space type: continuous
+ Vector Action space size (per agent): [2]
+ Vector Action descriptions: ,
+INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain Ball3DBrain:
+ batch_size: 64
+ beta: 0.001
+ buffer_size: 12000
+ epsilon: 0.2
+ gamma: 0.995
+ hidden_units: 128
+ lambd: 0.99
+ learning_rate: 0.0003
+ max_steps: 5.0e4
+ normalize: True
+ num_epoch: 3
+ num_layers: 2
+ time_horizon: 1000
+ sequence_length: 64
+ summary_freq: 1000
+ use_recurrent: False
+ graph_scope:
+ summary_path: ./summaries/first-run-0
+ memory_size: 256
+ use_curiosity: False
+ curiosity_strength: 0.01
+ curiosity_enc_size: 128
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
+```
 
 You can press Ctrl+C to stop the training, and your trained model will be at
 `models//_.bytes`, which corresponds
diff --git a/docs/Migrating.md b/docs/Migrating.md
index cd88716af5..cd56465e77 100644
--- a/docs/Migrating.md
+++ b/docs/Migrating.md
@@ -20,7 +20,8 @@
 
 * In order to run a training session, you can now use the command
   `mlagents-learn` instead of `python3 learn.py` after installing the `mlagents`
-  packages. This change is documented [here](Training-ML-Agents.md#training-with-mlagents-learn).
+  packages. This change is documented
+  [here](Training-ML-Agents.md#training-with-mlagents-learn).
 * It is now required to specify the path to the yaml trainer configuration file
   when running `mlagents-learn`. For example, see
   [trainer_config.yaml](../config/trainer_config.yaml).
diff --git a/docs/images/training-command-example.png b/docs/images/training-command-example.png
deleted file mode 100644
index 8634226c7f..0000000000
Binary files a/docs/images/training-command-example.png and /dev/null differ
diff --git a/docs/images/training-running.png b/docs/images/training-running.png
deleted file mode 100644
index a6325186d7..0000000000
Binary files a/docs/images/training-running.png and /dev/null differ