Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python timers #2180

Merged
merged 21 commits into from
Jul 11, 2019
Merged

python timers #2180

merged 21 commits into from
Jul 11, 2019

Conversation

chriselion
Copy link
Contributor

@chriselion chriselion commented Jun 24, 2019

Simple nested timers for profiling long-running jobs. To use, just import hierarchical_timer and use is as a context manager:

with hierarchical_timer("my_stuff"):
  do_stuff()

or

@timed # will generate a "my_func" timer
def my_func(x, y)
  return x + y

class Foo:
  @timed  # will generate a "Foo.bar" timer
  def bar()

Some follow-ups todo after the initial PR:

  • See if we can capture what the existing TrainerMetrics is doing, and delete the old code there
  • Track timings in the subprocess workers, and send stats back to the "main" process and merge them there.

@CLAassistant
Copy link

CLAassistant commented Jun 24, 2019

CLA assistant check
All committers have signed the CLA.

Represents the time spent in a block of code.
"""

__slots__ = ["children", "total", "count"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly a premature optimization, but this helps reduce the memory footprint and time overhead to the overall system.

Copy link
Contributor

@xiaomaogy xiaomaogy Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I didn't know this exists. This will be super useful once we use the profiling in a ton of places.

@@ -118,6 +120,11 @@ def _write_training_metrics(self):
if brain_name in self.trainer_metrics:
self.trainers[brain_name].write_training_metrics()

def _write_timing_tree(self) -> None:
timing_path = f"{self.summaries_dir}/{self.run_id}_timers.json"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions on where to put this/what to call it. But this will put it in the same path as the current TrainerMetrics csv files.

@@ -333,6 +343,7 @@ def advance(self, env: SubprocessEnvManager) -> int:
trainer.increment_step(len(new_step_infos))
if trainer.is_ready_update():
# Perform gradient descent with experience buffer
trainer.update_policy()
with hierarchical_timer("update_policy"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decorator is more concise and doesn't need the whitespace change, but this is probably better for parts of the code where we're making calls to an abstract class.



@contextmanager
def hierarchical_timer(name: str, timer_stack: TimerStack = None) -> Generator:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just be called "timer" or is that too generic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good, better to be precise to help people understand.

@@ -0,0 +1,172 @@
# # Unity ML-Agents Toolkit
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better suggestions on where this should live?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me for the short term.

@chriselion
Copy link
Contributor Author

based on the current PR, here's how hallway looks today:

{
  "total": 0.0,
  "count": 0,
  "self": 0.0,
  "children": [
    {
      "name": "TrainerController.advance",
      "total": 32.731460376,
      "count": 957,
      "self": 1.4661682279999795,
      "children": [
        {
          "name": "env_step",
          "total": 14.123654692000024,
          "count": 957,
          "self": 0.1792060960000832,
          "children": [
            {
              "name": "SubprocessEnvManager._take_step",
              "total": 1.2891531169999557,
              "count": 957,
              "self": 0.05311957899995745,
              "children": [
                {
                  "name": "PPOPolicy.evaluate",
                  "total": 1.2360335379999983,
                  "count": 957,
                  "self": 1.2360335379999983
                }
              ]
            },
            {
              "name": "recv",
              "total": 12.655295478999985,
              "count": 957,
              "self": 12.655295478999985
            }
          ]
        },
        {
          "name": "update_policy",
          "total": 17.141637455999998,
          "count": 14,
          "self": 0.13363037500003472,
          "children": [
            {
              "name": "PPOPolicy.update",
              "total": 17.008007080999963,
              "count": 678,
              "self": 17.008007080999963
            }
          ]
        }
      ]
    }
  ]
}

@chriselion chriselion changed the title [proof of concept] hierarchical timers python timers Jul 10, 2019
self.total: float = 0.0
self.count: int = 0

def get_child(self, name: str) -> "TimerNode":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a dumb question.. Why do you make the TimeNode into "TimeNode" here?

Copy link
Contributor Author

@chriselion chriselion Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a dumb question at all. This explains it better than I can: https://mypy.readthedocs.io/en/latest/kinds_of_types.html#class-name-forward-references

The runtime can't parse without the string:

______________________________________ ERROR collecting mlagents/trainers/tests/test_reward_signals.py ______________________________________
venv/lib/python3.7/site-packages/py/_path/local.py:701: in pyimport
    __import__(modname)
ml-agents/mlagents/trainers/__init__.py:6: in <module>
    from .trainer import *
ml-agents/mlagents/trainers/trainer.py:8: in <module>
    from mlagents.trainers import TrainerMetrics
ml-agents/mlagents/trainers/__init__.py:8: in <module>
    from .trainer_controller import *
ml-agents/mlagents/trainers/trainer_controller.py:16: in <module>
    from mlagents.envs.subprocess_env_manager import SubprocessEnvManager
ml-agents-envs/mlagents/envs/subprocess_env_manager.py:9: in <module>
    from mlagents.envs.timers import timed, hierarchical_timer
ml-agents-envs/mlagents/envs/timers.py:35: in <module>
    class TimerNode:
ml-agents-envs/mlagents/envs/timers.py:47: in TimerNode
    def get_child(self, name: str) -> TimerNode:
E   NameError: name 'TimerNode' is not defined

It looks like they're not needed in __init__ though, so I'll remove those ones.

if child_list:
res["children"] = child_list

return res
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that based on this implementation, the root TimerNode is treated as a special node (its total, count is always 0, it doesn't have a name). Since this is a recursive call, should we set a base case so that the root TimerNode also has the corresponding values?

Copy link
Contributor Author

@chriselion chriselion Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the handling of the root node is a little strange but I'm not sure what the best way to format the output is. A few ways I was thinking about:

  • Omit total, self, and count keys from the root
  • Just return the list of the root's children
  • For the root, set total = program execution time, self = total - sum(children), and count = 1

Sounds like you're suggesting the third one? Any other ways to do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm suggesting the third one. Also I'm suggesting adding a "name" attribute to the TimerNode class, and give the root TimerNode a key value pair as "name":"root". Right now you are adding the "name" attribute to the child TimerNode in the line 104, which was a little bit confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll do that (and I forgot about adding "name":"root")

I'll add some comments about the node names - since the parent has the name as the key, the node doesn't actually store it.


The total time and counts are tracked for each block of code; in this example "foo" and "context.foo" are considered
distinct blocks, and are tracked separately.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is helpful to point out(or give an example) that @timed is totally equivalent to "with hierarchical_timer()" given a certain condition (when function name equals the name passed in). So that for people who are not familiar with Decorator or doesn't want to go to the implementation, it is easier to understand?

@xiaomaogy
Copy link
Contributor

Overall looks good to me. I've learned a lot during the review.

@@ -284,8 +296,10 @@ def start_learning(
env_manager.close()
if self.train_model:
self._write_training_metrics()
self._write_timing_tree()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to write the timing tree even when we're not training (i.e., doing inference). I think if we just move it out of this if statement it will do that..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@ervteng ervteng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Might change the timer tree write to be always rather than just while training. This is going to be very useful.

@chriselion chriselion merged commit 3e242c7 into develop Jul 11, 2019
@chriselion chriselion deleted the develop-timers-poc branch July 11, 2019 22:25
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants