mds: switch submit_mutex to fair mutex for MDLog #44215
Conversation
Implementations of mutexes (e.g. std::mutex in C++) do not guarantee fairness: they do not guarantee that the lock is acquired by threads in the order they called lock(). In most cases this works well, but under overload the client request handling thread and _submit_thread can keep winning the submit_mutex for a long time, which can make MDLog::trim() get stuck. That means the MDS daemons keep writing journal entries into the metadata pool but cannot trim the expired segments in time. This switches the submit_mutex to a fair mutex, which ensures that all submit_mutex waiters are served in FIFO order and get a chance to run in time. Fixes: https://tracker.ceph.com/issues/40002 Signed-off-by: Xiubo Li <xiubli@redhat.com>
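The FIFO property the description relies on can be sketched with a ticket lock: each waiter takes a monotonically increasing ticket and is admitted strictly in ticket order. This is a minimal illustrative sketch under that assumption, not Ceph's actual fair-mutex implementation; the `FairMutex` and `stress_count` names are made up here.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Ticket-based fair mutex sketch: waiters are admitted in the order
// they took a ticket, so no thread can be barged past indefinitely.
class FairMutex {
 public:
  void lock() {
    std::unique_lock<std::mutex> l(m_);
    const uint64_t ticket = next_++;  // take a place in the queue
    cv_.wait(l, [&] { return ticket == serving_; });
  }
  void unlock() {
    std::lock_guard<std::mutex> l(m_);
    ++serving_;                       // hand off to the next waiter in line
    cv_.notify_all();
  }
 private:
  std::mutex m_;
  std::condition_variable cv_;
  uint64_t next_ = 0;
  uint64_t serving_ = 0;
};

// n threads each perform iters increments under the lock; returns the total,
// which equals n * iters if the lock provides mutual exclusion.
long stress_count(int nthreads, int iters) {
  FairMutex mu;
  long counter = 0;
  std::vector<std::thread> ts;
  for (int i = 0; i < nthreads; ++i)
    ts.emplace_back([&] {
      for (int j = 0; j < iters; ++j) {
        mu.lock();
        ++counter;
        mu.unlock();
      }
    });
  for (auto& t : ts) t.join();
  return counter;
}
```

A real implementation would also want try_lock() and a way to avoid notify_all() thundering-herd wakeups, but the hand-off logic above is the core of the fairness guarantee.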
LGTM - we probably need to put this under load testing.
Sounds reasonable. I think we also need to put the
Sure.
I have raised a tracker for this: https://tracker.ceph.com/issues/53520
I'm not confident this is fixing the real problem. I think it is more likely that the MDS finisher was starved, which deferred expiration of segments (e.g. Do you have a test case showing this fix resolves the original issue?
Yeah, it seems it will fix, or at least alleviate, some other issues like https://tracker.ceph.com/issues/53521.
Not yet. I was just using a simple testing script that creates/deletes dirs and files in two loops in two terminals, which sends a lot of client requests, while a third terminal kept doing heavy read/write at the same time. I could sometimes see the With this fix I haven't seen that yet.
The
That's the reason this PR is not "approved", although the switch to using a fair mutex does look reasonable to me. We'll only know whether log trimming stops getting stuck when we load test this change.
Yeah, as we know the In theory, if there is a large number of client requests queued in the dispatch thread, with the non-fair mutex it's possible that the Then I'm sure we will see the same issue with the
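The starvation scenario discussed above can be reproduced in miniature: busy request-handling threads hammer the lock in tight loops while a trim thread competes for it. With a FIFO ticket lock the trim thread waits behind at most the threads already queued, so it cannot be barged past indefinitely. This is an illustrative toy under those assumptions, not the MDS code; `TicketLock` and `contended_trim_demo` are hypothetical names.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Compact FIFO ticket lock (illustrative stand-in for a fair mutex).
struct TicketLock {
  std::mutex m;
  std::condition_variable cv;
  uint64_t next = 0, serving = 0;
  void lock() {
    std::unique_lock<std::mutex> l(m);
    const uint64_t t = next++;
    cv.wait(l, [&] { return t == serving; });
  }
  void unlock() {
    std::lock_guard<std::mutex> l(m);
    ++serving;
    cv.notify_all();
  }
};

// Two "request handling" threads take the lock in tight loops while a
// "trim" thread competes. FIFO ordering guarantees the trim thread makes
// progress even though the handlers never pause. Returns completed trims.
int contended_trim_demo() {
  TicketLock submit_mutex;   // stands in for MDLog::submit_mutex
  std::atomic<bool> stop{false};
  long journaled = 0;        // protected by submit_mutex
  int trimmed = 0;

  std::vector<std::thread> handlers;
  for (int i = 0; i < 2; ++i)
    handlers.emplace_back([&] {
      while (!stop.load()) {
        submit_mutex.lock();
        ++journaled;         // mimics queuing another journal entry
        submit_mutex.unlock();
      }
    });

  std::thread trim([&] {
    for (int i = 0; i < 100; ++i) {
      submit_mutex.lock();
      ++trimmed;             // each pass gets the lock within a bounded wait
      submit_mutex.unlock();
    }
    stop = true;
  });

  trim.join();
  for (auto& h : handlers) h.join();
  return trimmed;
}
```

With a plain std::mutex the same program may still terminate in practice, but nothing bounds how long the trim thread waits while the handlers keep re-acquiring the lock; the ticket hand-off is what turns that unbounded wait into a bounded one.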
With #44180, this really could resolve the problem. For more detail please see https://tracker.ceph.com/issues/40002#note-14.
I have fixed this in #44180, because there is a dependency.