mds: try to avoid false positive heartbeat timeouts #13807

Merged
merged 2 commits into from Mar 15, 2017
2 changes: 2 additions & 0 deletions src/mds/MDCache.cc
@@ -5606,6 +5606,8 @@ void MDCache::export_remaining_imported_caps()
mds->send_message_client_counted(stale, q->first);
}
}

mds->heartbeat_reset();
}

for (map<inodeno_t, list<MDSInternalContextBase*> >::iterator p = cap_reconnect_waiters.begin();
2 changes: 2 additions & 0 deletions src/mds/MDSRank.cc
@@ -439,6 +439,8 @@ bool MDSRank::_dispatch(Message *m, bool new_msg)
dout(0) << "unrecognized message " << *m << dendl;
return false;
}

heartbeat_reset();
}

if (dispatch_depth > 1)
7 changes: 6 additions & 1 deletion src/mds/MDSRank.h
@@ -226,7 +226,6 @@ class MDSRank {
bool _dispatch(Message *m, bool new_msg);

ceph::heartbeat_handle_d *hb; // Heartbeat for threads using mds_lock
void heartbeat_reset();

bool is_stale_message(Message *m) const;

@@ -297,6 +296,12 @@ class MDSRank {
void respawn();
// <<<

/**
* Call this periodically if inside a potentially long running piece
* of code while holding the mds_lock
*/
void heartbeat_reset();

/**
* Report state DAMAGED to the mon, and then pass on to respawn(). Call
* this when an unrecoverable error is encountered while attempting
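
The pattern this change applies (resetting the heartbeat from inside long loops that run under mds_lock) can be illustrated with a minimal standalone sketch. Everything in it is hypothetical and not taken from the Ceph tree: `Heartbeat`, `process_all`, and `big_lock` are illustrative stand-ins, and the real `ceph::heartbeat_handle_d` machinery is likely more involved. The idea shown is only that a watchdog comparing "time since last reset" against a grace period will report a false timeout unless the long-running code touches the heartbeat as it makes progress.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a heartbeat handle (NOT ceph::heartbeat_handle_d).
struct Heartbeat {
  using Clock = std::chrono::steady_clock;

  // Time of the last reset, stored as raw clock ticks so it can be atomic.
  std::atomic<Clock::rep> last{Clock::now().time_since_epoch().count()};
  // Number of resets, kept only so the sketch is easy to inspect.
  std::atomic<int> resets{0};

  // Record that the owning thread is still making progress.
  void reset() {
    last.store(Clock::now().time_since_epoch().count());
    ++resets;
  }

  // What a watchdog would check: has a reset happened within the grace period?
  bool is_healthy(Clock::duration grace) const {
    Clock::duration since(Clock::now().time_since_epoch().count() - last.load());
    return since < grace;
  }
};

// Hypothetical long-running routine that holds one big lock for its whole
// duration. Resetting the heartbeat after each item keeps the watchdog from
// mistaking a long but healthy loop for a hung thread.
int process_all(std::mutex& big_lock, Heartbeat& hb, const std::vector<int>& items) {
  std::lock_guard<std::mutex> guard(big_lock);
  int sum = 0;
  for (int item : items) {
    sum += item;  // placeholder for the real per-item work
    hb.reset();   // signal liveness while still holding the lock
  }
  return sum;
}
```

In this sketch the reset happens once per item; in real code the reset frequency is a trade-off between watchdog accuracy and the cost of the call, and resetting too eagerly could mask a genuinely stuck loop.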