reef: mds: revert standby-replay trimming changes #54716

Merged
merged 4 commits into from Feb 5, 2024

@batrick batrick commented Nov 29, 2023

backport tracker: https://tracker.ceph.com/issues/63676


backport of #48483
parent tracker: https://tracker.ceph.com/issues/48673

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

Fixes: 138fea6
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit e2b2e8e)
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit fe35c9b)
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit a1ca8e8)
Revert "mds: do not trim the inodes from the lru list in standby_replay"
Revert "mds: trim cache during standby replay"
This reverts commit 79bb44c.
This reverts commit c0fe25b.

standby-replay daemons were changed to keep minimal metadata from the journal
in cache but the original intent of standby-replay was to have a cache that's
as warm as the rank itself.  This reverts the two commits which changed that
behavior.

Part of the reason for this is that the new rapid cache trimming behavior was
not correct at all. The trimming loop would break when it runs into a dentry
with non-null linkage. This would nearly always be the case.  It was thought
that this was a problem introduced by [2] as MDCache::standby_trim_segment has
a different trim check [4] but the original issue (tracker 48673) is as old as
[1], indicating the problem predates [2].
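
To illustrate why a loop that breaks on the first dentry with non-null linkage
trims almost nothing, here is a minimal toy model. It is not the MDCache code:
`ToyDentry`, `has_inode`, and `trim_stop_at_linked` are hypothetical names
invented for this sketch, and the LRU is modeled as a plain deque.

```cpp
#include <cstddef>
#include <deque>

// Toy stand-in for a cached dentry; NOT the Ceph CDentry type.
struct ToyDentry {
  bool has_inode;  // true models "non-null linkage"
};

// Models the behavior described above: walk the LRU from the front and
// stop at the first dentry whose linkage is non-null.
// Returns how many dentries were actually trimmed.
std::size_t trim_stop_at_linked(std::deque<ToyDentry>& lru, std::size_t want) {
  std::size_t trimmed = 0;
  while (trimmed < want && !lru.empty()) {
    if (lru.front().has_inode) {
      break;  // a linked dentry ends the whole trim pass
    }
    lru.pop_front();
    ++trimmed;
  }
  return trimmed;
}
```

In this model, if the dentry at the front of the list is linked to an inode,
the pass trims zero entries no matter how many were requested. Since most
cached dentries are linked, that early exit is the common case.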

So, this commit reverts all of that. I have lingering suspicions that the
standby-replay daemon is not pinning some dentries properly which causes [5]
but this did not show up unless the MDS was rapidly evicting some dentries.
More research is needed there.

[1] c0fe25b
[2] 79bb44c
[3] https://github.com/ceph/ceph/blob/84fba097049ec4f72549588eaacc64f30c7a88a8/src/mds/MDCache.cc#L6816-L6820
[4] https://github.com/ceph/ceph/blob/84fba097049ec4f72549588eaacc64f30c7a88a8/src/mds/MDCache.cc#L7476-L7481
[5] https://tracker.ceph.com/issues/50246

Fixes: https://tracker.ceph.com/issues/48673
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 589e59a)
@batrick batrick added this to the reef milestone Nov 29, 2023
@batrick batrick added the cephfs Ceph File System label Nov 29, 2023
@mchangir mchangir left a comment
@yuriw yuriw merged commit 05a8d61 into ceph:reef Feb 5, 2024
11 checks passed
@batrick batrick deleted the wip-63676-reef branch February 14, 2024 01:17