consider enabling some KHR_ICD_TRACE messages by default #211

Open
bashbaug opened this issue Mar 23, 2023 · 2 comments

@bashbaug
Contributor

One of our users was recently debugging a tricky library dependency issue that was preventing one of our OpenCL devices from enumerating. Enabling KHR_ICD_TRACE messages (via the OCL_ICD_ENABLE_TRACE environment variable) was a huge help in debugging the issue, but most users (especially those who use OpenCL via higher-level languages or libraries) won't know to set this environment variable. To help debug similar issues in the future, does it make sense to enable some KHR_ICD_TRACE messages by default, say for exceptional conditions that prevent OpenCL or OpenCL devices from functioning?
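
For readers who reach OpenCL through a higher-level language, here is a minimal sketch of turning the trace messages on for a single run from Python. Only the variable name OCL_ICD_ENABLE_TRACE comes from this issue; the value "1" and the helper names `icd_trace_env` and `run_with_icd_trace` are assumptions made for illustration.

```python
import os
import subprocess


def icd_trace_env(base_env=None):
    """Return a copy of the environment with ICD loader tracing requested.

    The value "1" is an assumption; check the loader documentation for the
    values your loader version accepts.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OCL_ICD_ENABLE_TRACE"] = "1"
    return env


def run_with_icd_trace(command):
    """Run `command` (a list of arguments) with tracing requested.

    Both output streams are captured so that any trace messages can be
    inspected regardless of which stream the loader writes them to.
    """
    return subprocess.run(
        command,
        env=icd_trace_env(),
        capture_output=True,
        text=True,
    )
```

Something like `run_with_icd_trace(["clinfo"])` could then be used to collect the output of any OpenCL program on the system; `clinfo` is only an example and may not be installed.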

I can put together a more specific proposal but I wanted feedback on the overall concept first. Thanks!

@Kerilk
Contributor

Kerilk commented Mar 23, 2023

I would very much dislike having a chatty library on my systems. This is especially true in supercomputing environments, where OpenCL drivers may be installed on front-end machines but fail to load at runtime because no GPU is found, but it also applies to my laptop, where leftover .icd files are sometimes present. As a general rule, I would very much frown on a library producing any kind of unsolicited output unless it outright crashed and tried to report diagnostics.

Instead, maybe we could provide a tool alongside the loader, an opencl-smi that would help debugging these issues?
It could be based on the same kind of approach as the cllayerinfo tool (or supplement it, because we may want to debug both drivers and layers), which would allow it to present problems in a more user-friendly fashion than the logs. It would also be easier to use than setting environment variables for less tech-savvy users.
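
To make the idea concrete, here is a rough sketch of one check such a tool might perform on Linux: reading each .icd file and trying to load the library it names. This is not an existing tool and not how cllayerinfo works. The function name `check_icd_files` is hypothetical, and the sketch assumes the usual Linux layout in which `/etc/OpenCL/vendors` holds .icd files that each contain a library name or path.

```python
import ctypes
from pathlib import Path


def check_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Try to load the library named by each .icd file in `vendors_dir`.

    Returns a list of (icd_file_name, library, error) tuples, where `error`
    is None if the library loaded and a message string otherwise.
    Note that loading a library runs its initialization code.
    """
    results = []
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return results  # nothing registered, or a non-standard location

    for icd_file in sorted(directory.glob("*.icd")):
        library = icd_file.read_text(errors="replace").strip()
        if not library:
            results.append((icd_file.name, library, "empty .icd file"))
            continue
        try:
            ctypes.CDLL(library)  # raises OSError if it cannot be loaded
            error = None
        except OSError as exc:
            error = str(exc)  # typically names the missing dependency
        results.append((icd_file.name, library, error))
    return results
```

Printing the tuples returned by `check_icd_files()` would surface leftover .icd files and unresolved library dependencies without any output from the loader itself.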

We could also offer debug versions of the loader with tracing enabled by default, especially if we work on Linux packaging to make distributing these easy.

And maybe we should work on a dedicated documentation page showing users how to troubleshoot devices that do not show up.

If we really wanted to go the logging route, we should do it correctly and interface with the OS log system, and I am not sure anybody wants to open that can of worms.

@bashbaug
Contributor Author

> I would very much dislike having a chatty library on my systems [...]

Yes, agreed. I was hopeful we could constrain any output to exceptional conditions, but maybe the conditions I was thinking were exceptional are common enough in some use-cases that this won't be possible.

> Instead, maybe we could provide a tool alongside the loader, an opencl-smi that would help debugging these issues?

Interesting idea. This seems worth a few minutes of discussion in an upcoming teleconference.

> And maybe we should work on a dedicated documentation page showing users how to troubleshoot device not popping up.

This is a good idea too. I think this documentation could go here or in the OpenCL-Guide.

I could probably even adapt my OpenCL on Linux article for this purpose... though it also needs to be updated to include ICD loader trace messages...
