Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix profilers plot not being aligned to context manager enter time #9739

Merged
merged 4 commits into from Feb 23, 2023

Conversation

djhoese
Copy link
Contributor

@djhoese djhoese commented Dec 9, 2022

As mentioned in #9478, the Profiler and CacheProfiler plots X (time) range is set based on the first and last element in their results. Since these elements are based on when dask tasks are computed you can run into alignment issues when combining plots with the ResourceProfiler which starts collecting data when its context manager is entered.

This PR fixes this by adding a start_time and end_time property to all of the profiles that are set in __enter__ and __exit__. They are then used for plotting which makes sure the plots are generally aligned with in a couple milliseconds depending on how fast all the context manager's __enter__ methods are called and that they include all the data for the entire time spent inside the context.

Some questions, todos, and other context:

  1. In general my opinion is that if you don't want something to show up in the plots/profilers then it shouldn't happen inside the with block. Without this PR I was having the opposite issue where stuff was happening in the context that wasn't being shown in the default x-range of the plots (ex. resource profiler going on longer as something is being saved to disk).
  2. I made the start_time and end_time public (no _) because I don't really like writing tests that check private properties and I thought they could be useful to users.
  3. This PR assumes the plotting functions used by each profiler are not used outside of the visualize function and that backwards compatibility isn't an issue.
  4. I'll see if I can add some checks to the plot tests too. The plots are all relative time on their X axis so without diving into private properties on the bokeh objects I'm not sure how to check that the plot looks as expected.

Here is what the plots look like for the "no starter task" case from the issue with the default view/zoom with this PR:

image

Note the extra space on the right of the plots. I think this is due to the time it takes to kill the ResourceProfiler's subprocess. If the end_time was set before the process was terminated this would probably get rid of this. I'm not sure if it matters in a real world case where these graphs will be larger than 5 seconds in a lot of cases.

@djhoese djhoese changed the title Fix CacheProfiler plot not being aligned to context manager enter time Fix profilers plot not being aligned to context manager enter time Dec 9, 2022
@github-actions github-actions bot added the needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. label Jan 9, 2023
@djhoese
Copy link
Contributor Author

djhoese commented Jan 13, 2023

Is there any way I can get someone to look at this? It includes relatively simple changes, but is a pretty big quality of life fix for me.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nudge @djhoese, apologies for the delayed response. The changes here generally seem good to me. Just one minor question

Comment on lines +280 to +281
rprof.start_time = 0.0
rprof.end_time = 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive me if this is incorrect, it's been a while since I looked at this. But from what I see in this PR, the rprof.clear() call before this for loop sets start_time and end_time to None (which matches what was in the clear method before this PR). Now that start_time and end_time are actually used for something in the visualization they can't be None because of the arithmetic operations being done with them.

...yeah, this line (218 of profile.py):

https://github.com/dask/dask/pull/9739/files#diff-d1ec632f95ddf2554df9d54f70b8980b9f24699be167aa97c8b724c18a7da88dR218

Previously only results were sent to the visualize function for the ResourceProfiler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end_time here is arbitrary picked just to visualize the for the test, correct ? It could have also been a call to the current time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think that's right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking it over more, this start and end setting could be seen as the initialization of fake results/usage of the profiler. I suppose this could be made more clear with a fixture or a helper context manager.

@quasiben
Copy link
Member

I merged in main for another round of testing. Once tests pass I think we should be good to merge

@quasiben
Copy link
Member

Thanks @djhoese . Merging in

@quasiben quasiben merged commit 970da68 into dask:main Feb 23, 2023
@djhoese djhoese deleted the bugfix-diag-plot-alignment branch February 23, 2023 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Diagnostics graphs are not time aligned
3 participants