Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use _fmutex for pthread_mutex #9

Closed
dmik opened this issue Feb 19, 2021 · 6 comments
Closed

Use _fmutex for pthread_mutex #9

dmik opened this issue Feb 19, 2021 · 6 comments
Assignees

Comments

@dmik
Copy link
Contributor

dmik commented Feb 19, 2021

Currently, the pthread_mutex API implementation internally uses OS/2 mutexes (DosCreateMutexSem). This is not a good idea because Linux applications may create LOTS of pthread mutexes (one example is bitwiseworks/qtwebengine-chromium-os2#31) and on OS/2 there may be only 65512 of them per process (due to kernel 16-bitness). And besides that, the mutex is a rather expensive kernel primitive.

I will replace OS/2 mutexes with LIBC _fmutex primitives because they are in fact cheaper: they only create an OS/2 kernel primitive when there is a definite need for that: when two threads happen to request (lock) the mutex at the very same time. In this case, the thread which is a bit slower will create an OS/2 event semaphore (which is also cheaper than an OS/2 mutex) and wait on it until the first thread releases the lock. If there is only one thread locking and unlocking the mutex at the same time (and no overlaps with other threads wanting to do the same), then the semaphore is not created and locking is guarded by an atomic CPU operation.

Note that the original pthreads-win32 code our implementation is based upon, uses exactly the same optimization: they use an atomic flag variable and a Win32 event object. However, when @ydario imported it 10 years ago (see e4528d9), he decided to go with an OS/2 mutex instead — for simplicity, I guess. But now it shot us in the back.

As an alternative, I could reimport it again now and implement it using an atomic flag and an OS/2 event semaphore as well. This would also bring things like PTHREAD_MUTEX_ROBUST attr support (and all the other "current" pthread stuff) but it looks like a lot of work for now, given that I'm totally occupied with Chromium. So I will go with _fmutex for now and postpone the task of resynching for later.

@dmik dmik self-assigned this Feb 19, 2021
@dmik
Copy link
Contributor Author

dmik commented Feb 19, 2021

BTW, _fmutex mutexes don't support recursion. But many Linux apps need that (requested via PTHREAD_MUTEX_ROBUST attribute). However, our current implementation does not handle the pthread_mutex_attr_t argument in pthread_mutex_init at all — it is simply ignored now. The reason why it worked so far is because the OS/2 mutex used there for now is always recursive. So, in fact, non-recursive behavior (the default for pthread mutexes) was not supported at all. It could actually affect some apps (deadlocks instead of failures).

I will have to implement that part now. In Chromium, at least sqlite3 uses it.

@dmik
Copy link
Contributor Author

dmik commented Feb 21, 2021

I made it use _fmutex and it seems to work in Chromium but somehow the main browser process hangs up at termination (in pthread_cond_wait). Not sure why so far.

@dmik
Copy link
Contributor Author

dmik commented Feb 22, 2021

I was lucky to catch the hang in the debug build and here's the stack trace:

image

Somehow some sqlite3 lock cannot be acquired at termination. I bet this is a recursive pthread mutex...

@dmik
Copy link
Contributor Author

dmik commented Feb 23, 2021

Ok, the above was my own glitch in the new implementation, fixed now. However, I'm now noticing that the SIMPLEBROWSER process does not always terminate cleanly: sometimes it returns from main() but no atexit handlers are called, sometimes it simply does not even return from main() but just exits (no traces of any crash, no assertions or such).

This doesn't seem to be related to the kind of mutex PTHREAD is using though — happens with both the new impl and the old impl. Will keep looking outside this ticket (and going to close this one after some more testing).

@dmik
Copy link
Contributor Author

dmik commented Feb 23, 2021

As for the strange process termination, I created bitwiseworks/qtwebengine-chromium-os2#32 to track it.

@StevenLevine
Copy link

StevenLevine commented Feb 24, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants