luminous: MDSMonitor: allow beacons from stopping MDS that was laggy #25686

Merged
merged 1 commit into from Jan 12, 2019

@joscollin joscollin commented Dec 23, 2018

Otherwise, beacons from such an MDS get continually dropped.

Reproducing this manually:

o only have 2 MDS running (to prevent failover)
o max_mds=2
o create a lot of dirs with pins on rank 1 to make stopping take a while (as of this commit)
o max_mds=1
o immediately start dropping beacon packets to the mons from rank 1 using iptables
o wait ~30 seconds until the rank shows up as laggy
o remove the iptables rule
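
The steps above can be sketched as a dry-run command list. This is only an illustration under stated assumptions: the file system name `cephfs`, the mount point `/mnt/cephfs`, the monitor port `6789`, and the helper name `repro_commands` are hypothetical, and the host-wide `OUTPUT` rule assumes rank 1 runs on its own host (on a single-host test cluster it would also cut off the other MDS). The sketch also assumes exactly two MDS daemons are already running. Nothing is executed; the function only returns argument lists.

```python
def repro_commands(fs_name="cephfs", mount="/mnt/cephfs", mon_port=6789, n_dirs=3):
    """Return the reproduction steps as argv lists; nothing is executed.

    fs_name, mount and mon_port are placeholder assumptions, not values
    taken from the PR.
    """
    cmds = [["ceph", "fs", "set", fs_name, "max_mds", "2"]]

    # Create directories pinned to rank 1 so that stopping takes a while.
    for i in range(n_dirs):
        d = f"{mount}/pinned/dir{i}"
        cmds.append(["mkdir", "-p", d])
        cmds.append(["setfattr", "-n", "ceph.dir.pin", "-v", "1", d])

    # Shrink the cluster so rank 1 enters up:stopping.
    cmds.append(["ceph", "fs", "set", fs_name, "max_mds", "1"])

    # Drop traffic to the monitors (run on the rank 1 host), wait until
    # the rank is reported laggy, then remove the same rule again.
    rule = ["OUTPUT", "-p", "tcp", "--dport", str(mon_port), "-j", "DROP"]
    cmds.append(["iptables", "-A", *rule])
    cmds.append(["sleep", "30"])
    cmds.append(["iptables", "-D", *rule])
    return cmds


if __name__ == "__main__":
    for argv in repro_commands():
        print(" ".join(argv))
```

Keeping the steps as data rather than running them directly makes it easy to review the exact commands, and to adjust the port or paths, before touching a real cluster.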

After this commit, the debug output shows:

    2018-12-20 14:58:07.190 7fbe19f5d700  5 mon.a@0(leader).mds e148 preprocess_beacon mdsbeacon(34119/b up:stopping seq 155 v148) v7 from mds.1 127.0.0.1:6839/1223470631 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
    2018-12-20 14:58:07.190 7fbe19f5d700 10 mon.a@0(leader).mds e148 preprocess_beacon: GID exists in map: 34119
    2018-12-20 14:58:07.190 7fbe19f5d700  5 mon.a@0(leader).mds e148 _note_beacon mdsbeacon(34119/b up:stopping seq 155 v148) v7 noting time
    2018-12-20 14:58:07.190 7fbe19f5d700  7 mon.a@0(leader).mds e148 prepare_update mdsbeacon(34119/b up:stopping seq 155 v148) v7
    2018-12-20 14:58:07.190 7fbe19f5d700 12 mon.a@0(leader).mds e148 prepare_beacon mdsbeacon(34119/b up:stopping seq 155 v148) v7 from mds.1 127.0.0.1:6839/1223470631
    2018-12-20 14:58:07.190 7fbe19f5d700 15 mon.a@0(leader).mds e148 prepare_beacon got health from gid 34119 with 0 metrics.
    2018-12-20 14:58:07.190 7fbe19f5d700  0 log_channel(cluster) log [INF] : MDS health message cleared (mds.1): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 30 secs
    2018-12-20 14:58:07.190 7fbe19f5d700  1 -- 127.0.0.1:40495/0 --> 127.0.0.1:40495/0 -- log(1 entries from seq 129 at 2018-12-20 14:58:07.192368) v1 -- 0x5de9f11a80 con 0
    2018-12-20 14:58:07.190 7fbe19f5d700  1 mon.a@0(leader).mds e148 prepare_beacon clearing laggy flag on 127.0.0.1:6839/1223470631
    2018-12-20 14:58:07.190 7fbe19f5d700  5 mon.a@0(leader).mds e148 prepare_beacon mds.1 up:stopping -> up:stopping  standby_for_rank=-1
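
To check a monitor log for the same behaviour, the beacon fields and the "clearing laggy flag" message can be pulled out with a small script. This is a minimal sketch that assumes the log format shown above; the regular expression and the helper names `parse_beacon` and `laggy_cleared` are illustrative and not part of Ceph.

```python
import re

# Matches e.g. "mdsbeacon(34119/b up:stopping seq 155 v148)".
# Field names are labels chosen for this sketch.
BEACON_RE = re.compile(
    r"mdsbeacon\((?P<gid>\d+)/(?P<name>\S+) (?P<state>\S+) "
    r"seq (?P<seq>\d+) v(?P<version>\d+)\)"
)


def parse_beacon(line):
    """Return the beacon fields found in a log line, or None if absent."""
    m = BEACON_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "gid": int(d["gid"]),
        "name": d["name"],
        "state": d["state"],
        "seq": int(d["seq"]),
        "version": int(d["version"]),
    }


def laggy_cleared(lines):
    """True if any line reports that the laggy flag was cleared."""
    return any("clearing laggy flag" in line for line in lines)
```

For the log above, a beacon in state `up:stopping` followed by a "clearing laggy flag" line indicates that the monitor accepted the beacon rather than dropping it.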

Fixes: https://tracker.ceph.com/issues/37724
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 05e9037)
yuriw commented Jan 8, 2019

@yuriw yuriw merged commit 5a9d72c into ceph:luminous Jan 12, 2019
@joscollin joscollin deleted the wip-37737-luminous branch January 13, 2019 00:51