Skip to content

Consider custom profile interval in performance_report#4601

Open
gjoseph92 wants to merge 2 commits into
dask:mainfrom
gjoseph92:perf-report-custom-interval
Open

Consider custom profile interval in performance_report#4601
gjoseph92 wants to merge 2 commits into
dask:mainfrom
gjoseph92:perf-report-custom-interval

Conversation

@gjoseph92

Copy link
Copy Markdown
Collaborator

plot_data assumes an interval of 0.010 seconds. When setting a custom distributed.worker.profile.interval, the plot would show incorrect time values.

  • Tests added / passed
  • Passes black distributed / flake8 distributed

@jrbourbeau jrbourbeau left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks for adding this @gjoseph92. It looks like there may be one more place we need to account for a custom profile_interval:

data = profile.plot_data(state)

@gjoseph92

Copy link
Copy Markdown
Collaborator Author

I wasn't sure how to handle that one. Do we want to assume the client's local config matches the scheduler's? (I don't think we should.) We could have the scheduler return its profile interval to the client along with the state—probably the best solution. But theoretically, workers could even all have different profiling intervals, in which case that would be inaccurate, though other logic (like merge) would also be wrong as well.

@jrbourbeau jrbourbeau left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point. Right now I think we're implicitly assuming that the same profiling interval is being used across the cluster, though you're right that there are scenarios where this isn't the case. The current set of changes here seem like a strict improvement. Would you like to get this in as is, or are there more changes you'd like to include here?

@gjoseph92

Copy link
Copy Markdown
Collaborator Author

I'll add a quick fix for client.profile as well—it still don't address all possible problems, but in practice it'll probably resolve it for most cases.

@gjoseph92 gjoseph92 left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to go if this looks good to you

Comment thread distributed/client.py
workers = [workers]

state = await self.scheduler.profile(
state, interval = await self.scheduler.profile(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrbourbeau I opted not to return (state, interval) from client.profile since it would be a breaking API change, but perhaps it's worth doing?

@jrbourbeau jrbourbeau left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's some additional handling we need to do when calling get_profile inside performance_report now that scheduler.profile is returning the profiling interval

@gjoseph92 gjoseph92 requested a review from fjetter as a code owner January 23, 2024 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants