mon: clear duplicated logic in MDSMonitor #11209
} else if (info.state == MDSMap::STATE_STANDBY_REPLAY ||
           info.state == MDSMap::STATE_STANDBY) {
  dout(10) << " failing and removing " << gid << " " << info.addr << " mds." << info.rank
           << "." << info.inc << " " << ceph_mds_state_name(info.state)
These cases were almost the same, but you've dropped the last_beacon.erase(gid); line. Did you check git logs to see how we got the duplicated cases?
@gregsfortytwo Yes, I dropped last_beacon.erase(gid) because fail_mds_gid(gid) already does this work here.
I checked the git log and this duplicated case was introduced back in 2011. At that time this part made sense because the logic was a little different from what it is now.
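To illustrate the point above, here is a minimal sketch rather than the actual MDSMonitor code. ToyMonitor, remove_old and remove_new are hypothetical names, and the body of fail_mds_gid only models the behaviour described in this comment (it drops the gid's beacon record). Under that assumption, an explicit erase after fail_mds_gid is a no-op:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for the monitor state; not the real MDSMonitor.
using mds_gid_t = std::uint64_t;

struct ToyMonitor {
  std::map<mds_gid_t, int> mds_info;        // gid -> rank, stands in for the map entries
  std::map<mds_gid_t, double> last_beacon;  // gid -> time of the last beacon seen

  // Assumed behaviour, per the comment above: failing a gid removes it
  // from the map and also drops its beacon record.
  void fail_mds_gid(mds_gid_t gid) {
    assert(mds_info.count(gid) == 1);  // caller must pass a gid that is in the map
    mds_info.erase(gid);
    last_beacon.erase(gid);
  }
};

// Old shape: explicit erase after fail_mds_gid.
inline void remove_old(ToyMonitor& m, mds_gid_t gid) {
  m.fail_mds_gid(gid);
  m.last_beacon.erase(gid);  // record is already gone, so this erases nothing
}

// New shape: rely on fail_mds_gid alone.
inline void remove_new(ToyMonitor& m, mds_gid_t gid) {
  m.fail_mds_gid(gid);
}
```

Both helpers leave the toy state identical, which is the reason given for dropping the extra line.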
fail_mds_gid(gid);
*mds_propose = true;
} else if (!info.laggy()) {
if (!info.laggy()) {
Also this might as well be an "else if" block instead of nested else-if, right?
@gregsfortytwo This was an "else if" back in 2011 and was later changed to a nested one. I think the current logic looks good and is easy to understand. If execution reaches this point, it means no replacement was found for the mds, and we cannot ignore this mds no matter what its current state is. We should record laggy_since once and report the warning.
The whole check is based almost entirely on mds state. If we add a separate branch to check whether the mds is already laggy, I don't think the logic becomes better or clearer than the current one.
What do you think? Thanks.
The last_beacon.erase part only made sense in the cases where we were actually removing an MDS. When we do last_beacon.erase for a GID that is laggy but staying in the map, it just gets replaced at the start of tick() next time (the "make sure last beacon is fully populated" section).
So I think you can delete that last_beacon.erase line, and then collapse the nested if into an else if, as greg suggests.
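A minimal sketch of why that erase has no lasting effect, assuming the start of tick() re-adds a beacon entry for every GID still in the map. ToyTickState and populate_last_beacon are hypothetical names, and stamping the new entry with the current time is an assumption, not taken from the real code:

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical, simplified model of the state touched by tick(); not the real MDSMonitor.
using mds_gid_t = std::uint64_t;

struct ToyTickState {
  std::set<mds_gid_t> in_map;               // GIDs currently in the map
  std::map<mds_gid_t, double> last_beacon;  // gid -> time of the last beacon seen

  // Models the "make sure last beacon is fully populated" step:
  // any GID in the map without a beacon record gets one.
  // Assumption: the new record is stamped with the current time.
  void populate_last_beacon(double now) {
    for (mds_gid_t gid : in_map) {
      if (last_beacon.count(gid) == 0) {
        last_beacon[gid] = now;
      }
    }
  }
};
```

In this model, erasing the record of a laggy GID that stays in in_map is undone by the next populate_last_beacon call, so the erase only matters when the GID is actually removed from the map.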
@jcsp yes, I looked through the code again and we can delay last_beacon.erase to the next tick().
Ping @david-z, looks like we're waiting on a few code changes from you. :)
@gregsfortytwo sorry for the late reply, I was on vacation for the past week.
Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>
c84719e to 85c3ca1
@gregsfortytwo @jcsp I've made the changes you suggested. Please review. Thanks.
Remove some duplicated logic in MDSMonitor when checking for a replacement for a failed MDS. This makes this part of the logic clearer and more readable.
Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>