New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix profilers plot not being aligned to context manager enter time #9739
Conversation
Use them as bounds for visualized plots
Is there any way I can get someone to look at this? It includes relatively simple changes, but is a pretty big quality of life fix for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nudge @djhoese, apologies for the delayed response. The changes here generally seem good to me. Just one minor question
rprof.start_time = 0.0 | ||
rprof.end_time = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me if this is incorrect, it's been a while since I looked at this. But from what I see in this PR, the rprof.clear()
call before this for
loop sets start_time
and end_time
to None
(which matches what was in the clear method before this PR). Now that start_time
and end_time
are actually used for something in the visualization they can't be None
because of the arithmetic operations being done with them.
...yeah, this line (218 of profile.py):
Previously only results
were sent to the visualize function for the ResourceProfiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end_time here is arbitrary picked just to visualize the for the test, correct ? It could have also been a call to the current time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think that's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking it over more, this start and end setting could be seen as the initialization of fake results/usage of the profiler. I suppose this could be made more clear with a fixture or a helper context manager.
I merged in main for another round of testing. Once tests pass I think we should be good to merge |
Thanks @djhoese . Merging in |
As mentioned in #9478, the Profiler and CacheProfiler plots X (time) range is set based on the first and last element in their results. Since these elements are based on when dask tasks are computed you can run into alignment issues when combining plots with the ResourceProfiler which starts collecting data when its context manager is entered.
This PR fixes this by adding a
start_time
andend_time
property to all of the profiles that are set in__enter__
and__exit__
. They are then used for plotting which makes sure the plots are generally aligned with in a couple milliseconds depending on how fast all the context manager's__enter__
methods are called and that they include all the data for the entire time spent inside the context.Some questions, todos, and other context:
with
block. Without this PR I was having the opposite issue where stuff was happening in the context that wasn't being shown in the default x-range of the plots (ex. resource profiler going on longer as something is being saved to disk)._
) because I don't really like writing tests that check private properties and I thought they could be useful to users.visualize
function and that backwards compatibility isn't an issue.I'll see if I can add some checks to the plot tests too.The plots are all relative time on their X axis so without diving into private properties on the bokeh objects I'm not sure how to check that the plot looks as expected.Here is what the plots look like for the "no starter task" case from the issue with the default view/zoom with this PR:
Note the extra space on the right of the plots. I think this is due to the time it takes to kill the ResourceProfiler's subprocess. If the
end_time
was set before the process was terminated this would probably get rid of this. I'm not sure if it matters in a real world case where these graphs will be larger than 5 seconds in a lot of cases.pre-commit run --all-files