Changed to use posix robust mutex #67

PomLover · 2018-11-09T20:29:58Z

Changed to use posix robust mutex to allow subsequent mutex lock can grab the lock from previous dead owner to void deadlock. And it also fixed the issue #65.

Hi Reviewers,
We ran into some issues while we are using boost::interprocess::named_recursive_mutex with posix implementation when the previous owner of the mutex has been dead without releasing the mutex, so all others which are waiting for the same mutex will be blocked forever. And that's why I added this fix in posix mutex implementation to avoid this issue.

…grab the lock from previous dead owner to void deadlock. And it also fixed the issue boostorg#65.

363568233 · 2018-12-11T05:42:03Z

Which boost version will update it?

PomLover · 2018-12-11T17:13:40Z

I don't know, it depends on when the maintainer can merge it.

363568233 · 2018-12-17T06:53:12Z

It only changed recursive_mutex.hpp, there is another mutex.hpp need to change？

363568233 · 2018-12-17T07:03:58Z

pthread_mutex_consistent(&m_mut); after it
Whether it should be use pthread_mutex_unlock(&m_mut) to unlock it?

PomLover · 2018-12-17T16:11:07Z

pthread_mutex_consistent(&m_mut); after it
Whether it should be use pthread_mutex_unlock(&m_mut) to unlock it?

The case is going to solve is when the new request to lock the mutex if the previous owner has been dead, the new owner is expecting to get the mutex lock anyway, if unlock is called, the new owner would have to call lock again to grab the lock which is NOT expected behavior.

PomLover · 2018-12-17T16:19:34Z

It only changed recursive_mutex.hpp, there is another mutex.hpp need to change？

Yeah, potentially for semaphore to, but we didn't use them and test them, so I didn't include any of those changes, but only recursive mutex since we are using it and testing it.

363568233 · 2018-12-18T01:00:39Z

The way to deal owner dead is different from windows mutex, windows mutex will release mutex and throw owner_dead_error exception, Do not posix mutex keep consistent with windows to tell user?

363568233 · 2018-12-18T01:02:58Z

I see the boost::interprocess::interprocess_mutex on linux is using the mutex.cpp default, the boost::interprocess::interprocess_mutex is used in rbtree_best_fit

PomLover · 2018-12-18T15:59:08Z

The way to deal owner dead is different from windows mutex, windows mutex will release mutex and throw owner_dead_error exception, Do not posix mutex keep consistent with windows to tell user?

Yeah, it could be aligned with windows mutex implementation in boost, but I would prefer to granting mutex to new owner instead of leave it unlocked which behavior is different from Windows API WaitforSingleObject.

PomLover · 2018-12-18T16:16:31Z

I see the boost::interprocess::interprocess_mutex on linux is using the mutex.cpp default, the boost::interprocess::interprocess_mutex is used in rbtree_best_fit

I didn't use rbtree_best_fit, but definitely the mutex.cpp can be changed to use robust mutex too if needed, but I didn't use it, so I would prefer to having someone else who is using and testing it making the changes.

PomLover · 2018-12-18T17:13:34Z

Actually I think the new owner release the mutex from previous dead owner is a bad implementation which break some expectations in mutex use cases.

…till keep the mutex locked with new owner

PomLover · 2019-11-22T16:07:18Z

Hi Maintainers and Contributors,
It's been a while since I submitted this PR, I'm wondering if there are any concerns or suggestions with this PR, and is it possible to merge this in at that some point, could some of major contributors here provide some feedback or suggestions for next step? Thank you!

msmolensky · 2021-03-12T17:41:58Z

Recently ran into the same problem. Implemented this solution for both non-recursive and recursive mutexes. I think this PR fixes an important problem and should be accepted

igaztanaga · 2021-08-19T22:25:06Z

Hi,

I've found some time review old bugs and patches. I think the proposed patch (and, in general, robust mutex implementation in C++) has some problems.

Issue 1, Throwing while locking should left the mutex unlocked: The final implementation (with commit 34fc89a), throws an exception even when the mutex is locked, something that breaks C++ Lock guarantees (std::unique_lock, scoped_lock, etc... asume that an exception when locking guarantees the mutex was no locked). And nearly every C++ programmer will use a Lock to call the mutex operations.

Issue 2, Unconditionally marking the mutex as consistent is wrong as only the application can decide on this: The first implementation (commit aba35f0), marks the mutex as consistent, which is coherent with not throwing an exception, but a mutex should be marked as consistent if the application can guarantee that no data structures have been corrupted while the original owner died (the original owner probably died while modifying shared data protected by the mutex, and that data might be half-modified). According to the POSIX specification:

"The pthread_mutex_consistent() function is only responsible for notifying the implementation that the state protected by the mutex has been recovered and that normal operations with the mutex can be resumed. It is the responsibility of the application to recover the state so it can be reused."

So it's not possible to take a decision inside "posix_recursive_mutex::lock()" or any other internal function because the application has no chance to inspect the data.

Issue 3, Impossible to fix Issue 1 and Issue 2 at the same time with POSIX: And finally, we can't unlock the mutex and thow an exception informing the previous owner died when EOWNERDEAD is returned because unlocking the mutex marks it as unrecoverable forever. So it's not possible to inform the application that the previous owner died without breaking Lock guarantees.

I'm inclined to think that the only sensible implementation when receiving EOWNERDEAD is to mark the mutex as unrecoverable (just unlocking it) and throwing an exception because certainly data might be corrupted. No new callers will be able to lock it again and will receive an exception, but they won't be blocked and can handle the exception and try to recreate the data structure in another shared memory or mapped file.

PomLover · 2021-08-23T16:06:12Z

@igaztanaga Thank you very much for your valuable inputs, I think I agree with all your comments above, and I have a few thoughts based on your comments, as you mentioned, application is responsible to decide if the mutex can be set to consistent and reusable, do you think it will be appropriate to provide another interface for application to set mutex to consistent state and continue using it, as I can see there are some needs in this case from different comments above. As in this case, it is going to be harder for application to notify all existing waiters to wake up and start using a new mutex instead, and that could cause a mess in terms of keeping the same order of the waiters after starting using the new mutex, then reusing existing mutex by setting it consistent will be a better fit. As library provider, it will be great to provide more flexibilities for applications to achieve different cases easily. My thought is that in this case, library provider can't assume the applications will not be able to recover from corrupted shared data by using the existing mutex either and mark the mutex as not recoverable. I think I still agree with you that application is responsible to make this decision, but currently there is no interface for application to make this decision either reusing mutex or creating a new one. It seems like creating a new one is the only choice which might not be ideal for some applications and/or even cause other problems.

igaztanaga · 2021-08-31T10:28:45Z

I'm not sure which interface would be reasonable. If you directly manage the mutex, then we could have some lock() overload that succeeds but also informs that the owner was dead, this could work for POSIX, I'm not sure about Win32 (Win32 has no notion of "mutex consistency").

The problem is harder to solve when we try to use more advanced clases (such as managed shared memory in Boost.Interprocess), where the mutex is deeply hidden. If locking returns that the owner is dead, user code is many levels away the mutex failure logic, ¿how can the internal logic of Boost.Interprocess call the user code without unlocking the mutex? (because unlocking would permantly declare the mutex as non-recoverable in POSIX). A callback that the user must pass layer by layer until the internal logic of Boost.Interprocess?

The more I think about it, the more I see the issue is really tricky.

PomLover · 2021-08-31T20:07:23Z

@igaztanaga Thanks for your new inputs. Here are my additional thoughts. Even though there is no concept of consistency in Win32, but Win32 returns WAIT_ABANDONED indicator for waiter to identify dead owner case, and allow applications to decide what to do either continue using this mutex with the normal behavior or do something else, I think that is what a lot of Windows Applications have been living with. The interface in my mind would be something like "recover", for POSIX, it will need to set mutex consistence, and for WIN32, there is nothing to do in this case, as WIN32 allows to use the same mutex without setting anything as long as the new owner releases it before exiting. I can understand it feels tricky while dealing with different cases in multi-platforms, but that is the benefit of boost library to deal with the hard portion so that applications don't need to worry about the tricky cases in multi-platforms, then a cross-platform applications will behave the same in multi-platforms.

msmolensky · 2021-10-03T19:15:57Z

While I agree that in general the library cannot make a decision, in some cases it is possible. In the systems with only one writer, for example, it is safe for the library to declare the mutex is consistent. Can the mutex be policy or configuration based? By default, it can be unrecoverable, but it may be specialized/configured to be consistent.

I still feel that unconditionally preventing an application from accessing the data is too restrictive. While better than deadlocking, it still won't allow the application to function.

Changed to use posix robust mutex to allow subsequent mutex lock can …

aba35f0

…grab the lock from previous dead owner to void deadlock. And it also fixed the issue boostorg#65.

throw a owner dead exception when previous owner has been dead, but s…

34fc89a

…till keep the mutex locked with new owner

igaztanaga closed this in 7c88937 Aug 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changed to use posix robust mutex #67

Changed to use posix robust mutex #67

PomLover commented Nov 9, 2018

363568233 commented Dec 11, 2018

PomLover commented Dec 11, 2018

363568233 commented Dec 17, 2018

363568233 commented Dec 17, 2018

PomLover commented Dec 17, 2018

PomLover commented Dec 17, 2018

363568233 commented Dec 18, 2018

363568233 commented Dec 18, 2018

PomLover commented Dec 18, 2018

PomLover commented Dec 18, 2018

PomLover commented Dec 18, 2018

PomLover commented Nov 22, 2019

msmolensky commented Mar 12, 2021

igaztanaga commented Aug 19, 2021

PomLover commented Aug 23, 2021

igaztanaga commented Aug 31, 2021

PomLover commented Aug 31, 2021

msmolensky commented Oct 3, 2021

Changed to use posix robust mutex #67

Changed to use posix robust mutex #67

Conversation

PomLover commented Nov 9, 2018

363568233 commented Dec 11, 2018

PomLover commented Dec 11, 2018

363568233 commented Dec 17, 2018

363568233 commented Dec 17, 2018

PomLover commented Dec 17, 2018

PomLover commented Dec 17, 2018

363568233 commented Dec 18, 2018

363568233 commented Dec 18, 2018

PomLover commented Dec 18, 2018

PomLover commented Dec 18, 2018

PomLover commented Dec 18, 2018

PomLover commented Nov 22, 2019

msmolensky commented Mar 12, 2021

igaztanaga commented Aug 19, 2021

PomLover commented Aug 23, 2021

igaztanaga commented Aug 31, 2021

PomLover commented Aug 31, 2021

msmolensky commented Oct 3, 2021