Kernel execution serialization #11

Open
yupinov opened this issue Apr 23, 2018 · 4 comments

Comments

yupinov commented Apr 23, 2018

Is there an option to make all kernels execute sequentially (especially when work is launched in multiple queues)? Coming from CUDA and nvprof, I was surprised not to find such a feature, which helps in understanding the performance of individual kernels.

chesik-amd (Collaborator)

When collecting performance counters, the profiler will introduce serialization to try to ensure that only one kernel is executing at a time. There is no option for this, as it is the default behavior.

pszi1ard

What about measuring performance in a real-life environment, under concurrent execution?

Additionally, does this imply that CodeXL traces can't be used to analyze kernel overlap?

chesik-amd (Collaborator) commented Feb 15, 2019

Serialization is only done when collecting performance counters, which is the mode you would use to analyze the performance of individual kernels. No additional serialization is introduced when collecting a trace, which is the mode you would use to analyze an entire application, including kernel overlap.

pszi1ard

I see. I'd suggest allowing serialization to be turned on and off.

Is there a way to measure only wall time, without serialization?
