-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python timers #2180
python timers #2180
Conversation
Represents the time spent in a block of code. | ||
""" | ||
|
||
__slots__ = ["children", "total", "count"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly a premature optimization, but this helps reduce the memory footprint and time overhead to the overall system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow I didn't know this exists. This will be super useful once we use the profiling in a ton of places.
@@ -118,6 +120,11 @@ def _write_training_metrics(self): | |||
if brain_name in self.trainer_metrics: | |||
self.trainers[brain_name].write_training_metrics() | |||
|
|||
def _write_timing_tree(self) -> None: | |||
timing_path = f"{self.summaries_dir}/{self.run_id}_timers.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to suggestions on where to put this/what to call it. But this will put it in the same path as the current TrainerMetrics csv files.
@@ -333,6 +343,7 @@ def advance(self, env: SubprocessEnvManager) -> int: | |||
trainer.increment_step(len(new_step_infos)) | |||
if trainer.is_ready_update(): | |||
# Perform gradient descent with experience buffer | |||
trainer.update_policy() | |||
with hierarchical_timer("update_policy"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decorator is more concise and doesn't need the whitespace change, but this is probably better for parts of the code where we're making calls to an abstract class.
|
||
|
||
@contextmanager | ||
def hierarchical_timer(name: str, timer_stack: TimerStack = None) -> Generator: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be called "timer" or is that too generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good, better to be precise to help people understand.
@@ -0,0 +1,172 @@ | |||
# # Unity ML-Agents Toolkit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better suggestions on where this should live?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems good to me for the short term.
based on the current PR, here's how hallway looks today: {
"total": 0.0,
"count": 0,
"self": 0.0,
"children": [
{
"name": "TrainerController.advance",
"total": 32.731460376,
"count": 957,
"self": 1.4661682279999795,
"children": [
{
"name": "env_step",
"total": 14.123654692000024,
"count": 957,
"self": 0.1792060960000832,
"children": [
{
"name": "SubprocessEnvManager._take_step",
"total": 1.2891531169999557,
"count": 957,
"self": 0.05311957899995745,
"children": [
{
"name": "PPOPolicy.evaluate",
"total": 1.2360335379999983,
"count": 957,
"self": 1.2360335379999983
}
]
},
{
"name": "recv",
"total": 12.655295478999985,
"count": 957,
"self": 12.655295478999985
}
]
},
{
"name": "update_policy",
"total": 17.141637455999998,
"count": 14,
"self": 0.13363037500003472,
"children": [
{
"name": "PPOPolicy.update",
"total": 17.008007080999963,
"count": 678,
"self": 17.008007080999963
}
]
}
]
}
]
} |
self.total: float = 0.0 | ||
self.count: int = 0 | ||
|
||
def get_child(self, name: str) -> "TimerNode": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is a dumb question.. Why do you make the TimeNode into "TimeNode" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a dumb question at all. This explains it better than I can: https://mypy.readthedocs.io/en/latest/kinds_of_types.html#class-name-forward-references
The runtime can't parse without the string:
______________________________________ ERROR collecting mlagents/trainers/tests/test_reward_signals.py ______________________________________
venv/lib/python3.7/site-packages/py/_path/local.py:701: in pyimport
__import__(modname)
ml-agents/mlagents/trainers/__init__.py:6: in <module>
from .trainer import *
ml-agents/mlagents/trainers/trainer.py:8: in <module>
from mlagents.trainers import TrainerMetrics
ml-agents/mlagents/trainers/__init__.py:8: in <module>
from .trainer_controller import *
ml-agents/mlagents/trainers/trainer_controller.py:16: in <module>
from mlagents.envs.subprocess_env_manager import SubprocessEnvManager
ml-agents-envs/mlagents/envs/subprocess_env_manager.py:9: in <module>
from mlagents.envs.timers import timed, hierarchical_timer
ml-agents-envs/mlagents/envs/timers.py:35: in <module>
class TimerNode:
ml-agents-envs/mlagents/envs/timers.py:47: in TimerNode
def get_child(self, name: str) -> TimerNode:
E NameError: name 'TimerNode' is not defined
It looks like they're not needed in __init__
though, so I'll remove those ones.
if child_list: | ||
res["children"] = child_list | ||
|
||
return res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that based on this implementation, the root TimerNode is treated as a special node (its total, count is always 0, it doesn't have a name). Since this is a recursive call, should we set a base case so that the root TimerNode also has the corresponding values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the handling of the root node is a little strange but I'm not sure what the best way to format the output is. A few ways I was thinking about:
- Omit total, self, and count keys from the root
- Just return the list of the root's children
- For the root, set total = program execution time, self = total - sum(children), and count = 1
Sounds like you're suggesting the third one? Any other ways to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm suggesting the third one. Also I'm suggesting adding a "name" attribute to the TimerNode class, and give the root TimerNode a key value pair as "name":"root". Right now you are adding the "name" attribute to the child TimerNode in the line 104, which was a little bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll do that (and I forgot about adding "name":"root"
)
I'll add some comments about the node names - since the parent has the name as the key, the node doesn't actually store it.
|
||
The total time and counts are tracked for each block of code; in this example "foo" and "context.foo" are considered | ||
distinct blocks, and are tracked separately. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is helpful to point out(or give an example) that @timed is totally equivalent to "with hierarchical_timer()" given a certain condition (when function name equals the name passed in). So that for people who are not familiar with Decorator or doesn't want to go to the implementation, it is easier to understand?
Overall looks good to me. I've learned a lot during the review. |
@@ -284,8 +296,10 @@ def start_learning( | |||
env_manager.close() | |||
if self.train_model: | |||
self._write_training_metrics() | |||
self._write_timing_tree() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to write the timing tree even when we're not training (i.e., doing inference). I think if we just move it out of this if statement it will do that..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Might change the timer tree write to be always rather than just while training. This is going to be very useful.
Simple nested timers for profiling long-running jobs. To use, just import
hierarchical_timer
and use is as a context manager:or
Some follow-ups todo after the initial PR: