mds: reset heartbeat when fetching or committing dentries #45107

Merged
4 commits merged on Apr 16, 2022

Conversation

lxbsz
Member

@lxbsz lxbsz commented Feb 21, 2022

Fixes: https://tracker.ceph.com/issues/54345
Signed-off-by: Xiubo Li <xiubli@redhat.com>


@lxbsz lxbsz requested a review from a team February 21, 2022 15:08
@github-actions github-actions bot added the cephfs Ceph File System label Feb 21, 2022
@lxbsz lxbsz requested a review from vshankar February 21, 2022 15:08
src/mds/CDir.cc Outdated
@@ -2123,6 +2125,9 @@ void CDir::_omap_fetched(bufferlist& hdrbl, map<string, bufferlist>& omap,
last_name = std::string_view(k_it->c_str(), n_key.name.length());
null_keys.emplace_back(std::move(n_key));
++k_it;

if (!(++count % 1000))
Contributor

Is there a particular reason to check every 1000 or 2000 iterations (or a multiple of those)?
Can we use a macro for these numbers? Then, if we need to change the values in future, we only have to update the macro.

Member Author

Where each loop iteration takes longer, I use 1000, as other code in mds/ does; where each iteration is faster, I use 2000 instead.

Sounds good. I will add one macro for the whole of mds/.
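
The `if (!(++count % 1000))` pattern discussed above, with the literal replaced by a named constant as the reviewer suggests, might look roughly like the sketch below. All names here (`kHeartbeatResetInterval`, `FakeMds`, `process_keys`) are hypothetical stand-ins for illustration and are not taken from the Ceph tree; only the call name `heartbeat_reset()` appears in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical named constant standing in for the literal 1000 in the diff.
constexpr std::uint64_t kHeartbeatResetInterval = 1000;

// Hypothetical stand-in for the MDS: it only counts how often the
// heartbeat was reset, so the behaviour of the loop can be observed.
struct FakeMds {
  std::uint64_t resets = 0;
  void heartbeat_reset() { ++resets; }
};

// Walk a long list of keys, resetting the heartbeat once every
// `interval` iterations so that a slow loop is not mistaken for a hang.
// Returns the number of keys visited.
inline std::uint64_t process_keys(
    const std::vector<std::string>& keys, FakeMds& mds,
    std::uint64_t interval = kHeartbeatResetInterval) {
  assert(interval > 0);  // a zero interval would divide by zero below
  std::uint64_t count = 0;
  for (const auto& key : keys) {
    (void)key;  // real code would decode the dentry here
    if (!(++count % interval))
      mds.heartbeat_reset();
  }
  return count;
}
```

With the default interval, a run over 2500 keys resets the heartbeat on iterations 1000 and 2000. Passing a larger `interval` covers the faster loops mentioned above without introducing a second literal.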

Contributor

@kotreshhr kotreshhr left a comment

What is the user impact of this bug?

Member Author

lxbsz commented Feb 24, 2022

> What is the user impact of this bug?

This fixes the bug reported in bz#2041660.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@kotreshhr
Contributor

@lxbsz The PR looks good to me. Is it possible to add a test case for this?

Member Author

lxbsz commented Feb 28, 2022

> @lxbsz The PR looks good to me. Is it possible to add a test case for this?

Hi Kotresh, let me try, thanks.

@@ -206,6 +206,14 @@ options:
services:
- mds
with_legacy: true
- name: mds_heartbeat_reset_grace
Contributor

Just a question: is this heartbeat also used to check whether the node is actually active? :)

Member Author

The heartbeat here only tracks the MDS daemon's liveness: the MDS should periodically tell the Monitors that it is alive and not stuck.

Contributor

Is this grace a time period or a number of iterations?
Would mds_heartbeat_reset_grace_period or mds_heartbeat_reset_grace_iterations be a more accurate name?
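
To make the naming question concrete, here is a minimal sketch of what the option would mean if it were read as an iteration count rather than a duration. This is an assumption for illustration only: the class `HeartbeatGrace` and its methods are hypothetical, and the thread itself does not settle which reading is intended.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the configured value. In this sketch the
// option is interpreted as a number of loop iterations, not a duration.
class HeartbeatGrace {
 public:
  explicit HeartbeatGrace(std::uint64_t base_iterations)
      : base_(base_iterations) {
    assert(base_iterations > 0);
  }

  // Iterations between resets. `factor` lets a cheap loop wait longer,
  // e.g. factor 2 turns a base of 1000 into 2000.
  std::uint64_t iterations(std::uint64_t factor = 1) const {
    assert(factor > 0);
    return base_ * factor;
  }

  // True when the `count`-th iteration is one on which to reset.
  bool due(std::uint64_t count, std::uint64_t factor = 1) const {
    return count % iterations(factor) == 0;
  }

 private:
  std::uint64_t base_;
};
```

Under this reading a call site would write something like `if (grace.due(++count)) mds.heartbeat_reset();`, and a name ending in `_iterations` would describe the value more precisely than one ending in `_period`.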

Contributor

@vshankar vshankar left a comment

This looks OK and fixes the warnings. However, we might want to find an alternative approach rather than spraying heartbeat_reset() calls all over the MDS.

Something to ponder...

@vshankar
Contributor

jenkins test windows

Labels
cephfs Ceph File System common
6 participants