From 2d3b7eaf85e93995ffa3961bf016c317a0d1eac6 Mon Sep 17 00:00:00 2001
From: Chris Elion
Date: Wed, 24 Jul 2019 12:42:40 -0700
Subject: [PATCH 1/3] profiling docs

---
 docs/Profiling.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 docs/Profiling.md

diff --git a/docs/Profiling.md b/docs/Profiling.md
new file mode 100644
index 0000000000..677df31850
--- /dev/null
+++ b/docs/Profiling.md
@@ -0,0 +1,53 @@
+# Profiling ML-Agents in Python
+
+ml-agents provides a lightweight profiling system, in order to identity hotspots in the training process and help spot
+regressions from changes.
+
+Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if
+desired. This also means that a function that is called from multiple places in the code will appear in multiple
+places in the timing output.
+
+All timers operate using a "global" instance by default, but this can be overridden if necessary (mainly for testing).
+
+## Adding Profiling
+
+There are two ways to indicate code should be included in profiling. The simplest way is to add the `@timed`
+decorator to a function or method of interest.
+
+```python
+class TrainerController:
+    # ....
+    @timed
+    def advance(self, env: EnvManager) -> int:
+        # do stuff
+```
+
+You can also use the `hierarchical_timer` context manager.
+
+
+``` python
+with hierarchical_timer("communicator.exchange"):
+    outputs = self.communicator.exchange(step_input)
+```
+
+The context manager may be easier than the `@timed` decorator for profiling different parts of a large function, or
+profiling calls to abstract methods that might not use a decorator.
+
+## Output
+By default, at the end of training, timers are collected and written in JSON format to
+`{summaries_dir}/{run_id}_timers.json`. The output consists of node objects with the following keys:
+ * name (string): The name of the block of code.
+ * total (float): The total time in seconds spent in the block, including child calls.
+ * count (int): The number of times the block was called.
+ * self (float): The total time in seconds spent in the block, excluding child calls.
+ * children (list): A list of child nodes.
+ * is_parallel (bool): Indicates that the block of code was executed in multiple threads or processes (see below). This
+   is optional and defaults to false.
+
+### Parallel execution
+For code that executes in multiple processes (for example, SubprocessEnvManager), we periodically send the timer
+information back to the "main" process, aggregate the timers there, and flush them in the subprocess. Note that
+(depending on the number of processes) this can result in timers where the total time may exceed the parent's total
+time. This is analogous to the difference between "real" and "user" values reported from the Unix `time` command. In the
+timer output, blocks that were run in parallel are indicated by the `is_parallel` flag.
+

From 22ccf428ae1066c7fd88c8733ca7696df2562c7c Mon Sep 17 00:00:00 2001
From: Chris Elion
Date: Wed, 24 Jul 2019 17:24:00 -0700
Subject: [PATCH 2/3] clean up debug option, move csv info

---
 docs/Profiling.md          |  2 +-
 docs/Training-ML-Agents.md | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/docs/Profiling.md b/docs/Profiling.md
index 677df31850..1fc28dd314 100644
--- a/docs/Profiling.md
+++ b/docs/Profiling.md
@@ -1,6 +1,6 @@
 # Profiling ML-Agents in Python
 
-ml-agents provides a lightweight profiling system, in order to identity hotspots in the training process and help spot
+ML-Agents provides a lightweight profiling system, in order to identify hotspots in the training process and help spot
 regressions from changes.
 
 Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if
diff --git a/docs/Training-ML-Agents.md b/docs/Training-ML-Agents.md
index da597e584b..c1e1489c51 100644
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
@@ -144,10 +144,7 @@ environment, you can set the following command line options when invoking
   training doesn't involve visual observations (reading from Pixels). See
   [here](https://docs.unity3d.com/Manual/CommandLineArguments.html) for more
   details.
-* `--debug` - Specify this option to run ML-Agents in debug mode and log Trainer
-  Metrics to a CSV stored in the `summaries` directory. The metrics stored are:
-  brain name, time to update policy, time since start of training, time for last experience collection, number of experiences used for training, mean return. This
-  option is not available currently for Imitation Learning.
+* `--debug` - Specify this option to enable debug-level logging for some parts of the code.
 
 ### Training config file
 
@@ -204,3 +201,16 @@ You can also compare the
 to the corresponding sections of the `config/trainer_config.yaml` file for each
 example to see how the hyperparameters and other configuration variables have
 been changed from the defaults.
+
+### Output metrics
+Trainer Metrics are logged to a CSV stored in the `summaries` directory. The metrics stored are:
+ * brain name
+ * time to update policy
+ * time since start of training
+ * time for last experience collection
+ * number of experiences used for training
+ * mean return
+
+This option is not available currently for Imitation Learning.
+
+[Profiling](Profiling.md) information is also saved in the `summaries` directory.

From 14f4751b8c72e0ce4a7071a90cd0fcd26630cb7b Mon Sep 17 00:00:00 2001
From: Chris Elion
Date: Wed, 24 Jul 2019 17:32:15 -0700
Subject: [PATCH 3/3] Imitation Learning -> Behavioral Cloning

---
 docs/Training-ML-Agents.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Training-ML-Agents.md b/docs/Training-ML-Agents.md
index c1e1489c51..8b7b938ac1 100644
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
@@ -211,6 +211,6 @@ Trainer Metrics are logged to a CSV stored in the `summaries` directory. The met
  * number of experiences used for training
  * mean return
 
-This option is not available currently for Imitation Learning.
+This option is not available currently for Behavioral Cloning.
 
 [Profiling](Profiling.md) information is also saved in the `summaries` directory.