Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread leaks in the thread pool #1553

Merged
merged 1 commit into from
Nov 8, 2022

Conversation

kbenzie
Copy link
Contributor

@kbenzie kbenzie commented Oct 28, 2022

While testing an OpenCL driver with ThreadSanitizer enabled the OpenCL-CTS suffers from thread leaks in conversions and bruteforce on posix systems. This is because pthread_join is never called in ThreadPool_Exit for the pthread_ts created by the thread pool. Instead, the threads are only informed to stop waiting on the condition variable which unblocks the worker thread but does not clean up after itself.

ThreadPool: thread 1 exiting.
ThreadPool: thread 5 exiting.
ThreadPool: thread 4 exiting.
ThreadPool: thread 2 exiting.
ThreadPool: thread 7 exiting.
ThreadPool: thread 0 exiting.
ThreadPool: thread 3 exiting.
ThreadPool: thread 6 exiting.
Thread pool exited in a orderly fashion.
==================
WARNING: ThreadSanitizer: thread leak (pid=2292842)
  Thread T9 (tid=2292855, finished) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x5ad75)
    #1 ThreadPool_Init() <null> (test_conversions+0x35b2c)
    #2 pthread_once ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1449 (libtsan.so.0+0x4057c)
    #3 GetThreadCount() <null> (test_conversions+0x36262)
    #4 DoTest(_cl_device_id*, Type, Type, SaturationMode, RoundingMode, _MTdata*) [clone .isra.0] <null> (test_conversions+0x10555)
    #5 test_conversions(_cl_device_id*, _cl_context*, _cl_command_queue*, int) <null> (test_conversions+0x13226)
    #6 callSingleTestFunction(test_definition, _cl_device_id*, int, int, unsigned long) <null> (test_conversions+0x2e66d)
    #7 parseAndCallCommandLineTests(int, char const**, _cl_device_id*, int, test_definition*, int, unsigned long, int) <null> (test_conversions+0x2fb3a)
    #8 runTestHarnessWithCheck(int, char const**, int, test_definition*, int, unsigned long, test_status (*)(_cl_device_id*)) <null> (test_conversions+0x349d8)
    #9 main <null> (test_conversions+0xd725)

  And 7 more similar thread leaks.

SUMMARY: ThreadSanitizer: thread leak (OpenCL-CTS/buildbin/conversions/test_conversions+0x35b2c) in ThreadPool_Init()

This patch adds global state to keep track of the pthread_ts created by pthread_create in ThreadPool_Init. The list of pthread_ts is then used by ThreadPool_Exit to call pthread_join to cleanup the pthread_ts correctly.

A near identical example, and additional explanation, can be found on stackoverflow.

On the Windows path, a similar change is not necessary because _beginthread is used which automatically cleans up after itself when the worker thread function returns.

@b-sumner
Copy link
Contributor

Is this preferable to creating the threads detached?

@kbenzie
Copy link
Contributor Author

kbenzie commented Oct 31, 2022

Is this preferable to creating the threads detached?

For me yes, since this resolves the ThreadSanitizer issue I came across.

I can't speak to the original intent of the thread pool implementation though. Having looked at the code I don't really see a good reason to make these threads detached. This cleanup is happening after main exits in an atexit handler and all work should already be complete, joining seems like the sensible thing to do to me.

bashbaug
bashbaug previously approved these changes Oct 31, 2022
Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Longer term it would be nice to shift this thread pool stuff to use std::threads rather than OS-specific codepaths.

@bashbaug
Copy link
Contributor

Note: possibly related to #1266?

@kbenzie
Copy link
Contributor Author

kbenzie commented Nov 1, 2022

Longer term it would be nice to shift this thread pool stuff to use std::threads rather than OS-specific codepaths.

Fully agree, although that's a larger task than I have bandwidth for.

Note: possibly related to #1266?

It does look related, yeah. I haven't come across any data races though because in my situation the OpenCL-CTS itself was not compiled with -fsanitize=thread only the application using it. That still means the posix threading functions are intercepted though, which is why I found these thread leaks.

While testing an OpenCL driver with ThreadSanitizer enabled the
OpenCL-CTS suffers from thread leaks in conversions and bruteforce on
posix systems. This is because `pthread_join` is never called in
`ThreadPool_Exit` for the `pthread_t`s created by the thread pool.
Instead, the threads are only informed to stop waiting on the condition
variable which unblocks the worker thread but does not clean up after
itself.

```
ThreadPool: thread 1 exiting.
ThreadPool: thread 5 exiting.
ThreadPool: thread 4 exiting.
ThreadPool: thread 2 exiting.
ThreadPool: thread 7 exiting.
ThreadPool: thread 0 exiting.
ThreadPool: thread 3 exiting.
ThreadPool: thread 6 exiting.
Thread pool exited in a orderly fashion.
==================
WARNING: ThreadSanitizer: thread leak (pid=2292842)
  Thread T9 (tid=2292855, finished) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x5ad75)
    KhronosGroup#1 ThreadPool_Init() <null> (test_conversions+0x35b2c)
    KhronosGroup#2 pthread_once ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1449 (libtsan.so.0+0x4057c)
    KhronosGroup#3 GetThreadCount() <null> (test_conversions+0x36262)
    KhronosGroup#4 DoTest(_cl_device_id*, Type, Type, SaturationMode, RoundingMode, _MTdata*) [clone .isra.0] <null> (test_conversions+0x10555)
    KhronosGroup#5 test_conversions(_cl_device_id*, _cl_context*, _cl_command_queue*, int) <null> (test_conversions+0x13226)
    KhronosGroup#6 callSingleTestFunction(test_definition, _cl_device_id*, int, int, unsigned long) <null> (test_conversions+0x2e66d)
    KhronosGroup#7 parseAndCallCommandLineTests(int, char const**, _cl_device_id*, int, test_definition*, int, unsigned long, int) <null> (test_conversions+0x2fb3a)
    KhronosGroup#8 runTestHarnessWithCheck(int, char const**, int, test_definition*, int, unsigned long, test_status (*)(_cl_device_id*)) <null> (test_conversions+0x349d8)
    KhronosGroup#9 main <null> (test_conversions+0xd725)

  And 7 more similar thread leaks.

SUMMARY: ThreadSanitizer: thread leak (OpenCL-CTS/buildbin/conversions/test_conversions+0x35b2c) in ThreadPool_Init()
```

This patch adds global state to keep track of the `pthread_t`s created
by `pthread_create` in `ThreadPool_Init`. The list of `pthread_t`s is
then used by `ThreadPool_Exit` to call `pthread_join` to cleanup the
`pthread_t`s correctly.

A near identical example, and additional explanation, can be found on
[stackoverflow](https://stackoverflow.com/questions/72435574/thread-leak-detected-when-using-condition-variable-instead-of-join-with-pthrea).

On the Windows path, a similar change is not necessary because
`_beginthread` is used which automatically cleans up after itself when
the worker thread function returns.
@bashbaug
Copy link
Contributor

bashbaug commented Nov 8, 2022

Has two approvals, merging.

@bashbaug bashbaug merged commit 191fd0f into KhronosGroup:main Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants