
Memory leaks in libOpenCL.so.1!khrIcdVendorAdd #157

Open
MichaelTutin opened this issue Nov 12, 2021 · 6 comments · May be fixed by #217
@MichaelTutin

Intel Inspector detects memory leaks for two allocations in khrIcdVendorAdd():

  1. Error: Memory leak: 16 Bytes:

libOpenCL.so.1!khrIcdVendorAdd - /OpenCL-ICD-Loader/loader/icd.c:135
libOpenCL.so.1!khrIcdOsVendorsEnumerate - /OpenCL-ICD-Loader/loader/linux/icd_linux.c:125

  2. Error: Memory leak: 80 Bytes:

libOpenCL.so.1!khrIcdVendorAdd - /OpenCL-ICD-Loader/loader/icd.c:153
libOpenCL.so.1!khrIcdOsVendorsEnumerate - /OpenCL-ICD-Loader/loader/linux/icd_linux.c:125

Looking at the code in icd.c, it seems that the vendor structure is initialized and added to the khrIcdVendors global list, but there is no explicit deinitialization function that would free those elements. One option to fix this issue would be to add a function with `__attribute__((destructor))` on Linux, or a DllMain() handler on Windows, to handle the library unload event and deallocate those blocks.

@Kerilk
Contributor

Kerilk commented Nov 12, 2021

Thanks for reporting the issue.
The termination of the loader is an open problem. I'll try to summarize the issues here:

  • We leak memory,
  • Libraries are not unloaded (drivers, layers),
  • We don't have a termination API for layers (so they must use atexit() for termination).

Adding termination to the loader nonetheless risks breaking user applications (less likely) or libraries (more likely), especially those that rely on atexit() or similar callbacks (C++ static destructors?) to call OpenCL APIs (usually to release OpenCL resources), since those callbacks could run after the loader is de-initialized.

I think this topic needs to be addressed, but it needs careful discussion and consideration. It should be noted that race conditions in shared library loading/unloading are not a subject I am particularly confident discussing, so everything I say here should be taken with a grain of salt.

note: ocl-icd is behaving similarly.

note: the layer infrastructure may be able to help here: we could add a no-op stub/layer once we enter de-initialization (and either always return SUCCESS or a dedicated error code?). This should prevent segfaults inside the loader. I think CUDA has a similar mechanism.
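A rough sketch of that stub idea, under the assumption that calls go through a function-pointer dispatch table: on entering tear-down, the table entries are swapped for no-op stubs, so late callers hit a harmless function instead of freed driver state. All names here (`loader_enter_teardown`, the one-entry table) are invented for illustration and are not the loader's real API.

```c
#include <stddef.h>

typedef int cl_int;
#define CL_SUCCESS 0

/* Illustrative one-entry dispatch table; the real loader dispatches
 * every OpenCL entry point this way. */
struct dispatch {
    cl_int (*clReleaseContext)(void *context);
};

static cl_int real_release_context(void *context)
{
    (void)context;
    return CL_SUCCESS; /* would forward to the driver here */
}

/* The no-op stub: never touches driver state, always returns a fixed
 * status (CL_SUCCESS, or possibly a dedicated error code). */
static cl_int noop_release_context(void *context)
{
    (void)context;
    return CL_SUCCESS;
}

static struct dispatch table = { real_release_context };

/* Called at the start of loader tear-down: from here on, late callers
 * (atexit handlers, C++ static destructors) reach harmless stubs and
 * cannot segfault inside the loader. */
void loader_enter_teardown(void)
{
    table.clReleaseContext = noop_release_context;
}

cl_int clReleaseContext(void *context)
{
    return table.clReleaseContext(context);
}
```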

@Yanfeng-Mi

Any updates on this issue?

@wenjiew

wenjiew commented Mar 7, 2023

Is there any update on fixing this issue? Thanks!

@Rikyu1

Rikyu1 commented Mar 15, 2023

How can we fix this issue?

JablonskiMateusz added a commit to JablonskiMateusz/OpenCL-ICD-Loader that referenced this issue Apr 11, 2023
fixes: KhronosGroup#157
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
@JablonskiMateusz

Created PR with fix: #217

@bashbaug
Contributor

bashbaug commented May 2, 2023

One item we briefly discussed in last week's teleconference: We recently added the ability to have "loader extensions" (see the ICD loader info PR #188). Could we add a similar "loader extension" that allows an application to safely and explicitly de-initialize the ICD loader on the application's schedule, rather than de-initializing the ICD loader implicitly when the ICD loader is unloaded?

This isn't a perfect solution because it requires an explicit call to "clean up", but it avoids some of the unloading race condition concerns described above.
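A sketch of how such an explicit de-initialization entry point might behave: idempotent, and refusing re-initialization afterwards, so a stray second call from an atexit handler stays safe. The names `loader_init` and `clUnloadICDLoader_sketch` are invented for illustration; the actual extension (see #217 and the discussion above) would define the real name and semantics.

```c
#include <stdlib.h>
#include <stdbool.h>

static void *g_state = NULL;       /* stand-in for the loader's allocations */
static bool g_terminated = false;  /* once torn down, stay torn down */

/* Stand-in for the loader's lazy initialization. */
void loader_init(void)
{
    if (!g_state && !g_terminated)
        g_state = malloc(64);
}

/* Hypothetical explicit de-init entry point: the application calls this
 * on its own schedule, before unloading the loader. Safe to call more
 * than once; initialization is not allowed again afterwards. */
int clUnloadICDLoader_sketch(void)
{
    if (!g_terminated) {
        free(g_state);
        g_state = NULL;
        g_terminated = true;
    }
    return 0;
}
```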
