luminous: cephfs: MDSMonitor: lookup of gid in prepare_beacon that has been removed will cause exception #23990

Merged

yuriw merged 1 commit into ceph:luminous from batrick:i35852 on Sep 18, 2018

4 participants
@batrick
Member

batrick commented Sep 7, 2018

mon: test if gid exists in pending for prepare_beacon

If it does not, send a null map. The bug was introduced by
624efc6, which (correctly) made preprocess_beacon look only
at the current fsmap. prepare_beacon had relied on preprocess_beacon
doing that check against pending.
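As a rough illustration of the fix described above, the following C++ sketch models the pending map as a plain `std::map` keyed by gid. `SimpleFSMap`, `MDSInfo`, `BeaconReply`, `prepare_beacon_buggy` and `prepare_beacon_fixed` are hypothetical names, not Ceph's real types or signatures. An empty `std::optional` stands in for the null map, and the sketch assumes the unguarded lookup throws the way `std::map::at` does.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified stand-ins for Ceph's FSMap and MDS
// info records. These are NOT the real Ceph types.
using mds_gid_t = uint64_t;

struct MDSInfo {
  std::string name;
  std::string state;
};

struct SimpleFSMap {
  std::map<mds_gid_t, MDSInfo> mds_info;

  bool gid_exists(mds_gid_t gid) const { return mds_info.count(gid) > 0; }

  // Like std::map::at, this throws std::out_of_range for an unknown gid.
  const MDSInfo& get_info_gid(mds_gid_t gid) const { return mds_info.at(gid); }
};

// Hypothetical reply type: an empty optional models "send a null map".
using BeaconReply = std::optional<MDSInfo>;

// Pre-fix behaviour: assumes preprocess already verified the gid against
// pending, so the gid is looked up unconditionally.
BeaconReply prepare_beacon_buggy(const SimpleFSMap& pending, mds_gid_t gid) {
  const MDSInfo& info = pending.get_info_gid(gid);  // may throw
  return info;
}

// Post-fix behaviour: test the gid against pending first; if it has been
// removed (e.g. by `ceph mds fail`), reply with a null map instead.
BeaconReply prepare_beacon_fixed(const SimpleFSMap& pending, mds_gid_t gid) {
  if (!pending.gid_exists(gid)) {
    std::cout << "mds_beacon from gid " << gid << " is not in fsmap\n";
    return std::nullopt;  // models the null map reply
  }
  return pending.get_info_gid(gid);
}
```

With an empty `pending` map, `prepare_beacon_buggy(pending, 24412)` throws `std::out_of_range`, while `prepare_beacon_fixed(pending, 24412)` prints the "is not in fsmap" message and returns `std::nullopt`.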

Running:

    while sleep 0.5; do bin/ceph mds fail 0; done

is sufficient to reproduce this bug. You will see:

    2018-09-07 15:33:30.350 7fffe36a8700  5 mon.a@0(leader).mds e69 preprocess_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 from mds.0 127.0.0.1:6813/2891525302 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
    2018-09-07 15:33:30.350 7fffe36a8700 10 mon.a@0(leader).mds e69 preprocess_beacon: GID exists in map: 24412
    2018-09-07 15:33:30.350 7fffe36a8700  5 mon.a@0(leader).mds e69 _note_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 noting time
    2018-09-07 15:33:30.350 7fffe36a8700  7 mon.a@0(leader).mds e69 prepare_update mdsbeacon(24412/a up:reconnect seq 2 v69) v7
    2018-09-07 15:33:30.350 7fffe36a8700 12 mon.a@0(leader).mds e69 prepare_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 from mds.0 127.0.0.1:6813/2891525302
    2018-09-07 15:33:30.350 7fffe36a8700 15 mon.a@0(leader).mds e69 prepare_beacon got health from gid 24412 with 0 metrics.
    2018-09-07 15:33:30.350 7fffe36a8700  5 mon.a@0(leader).mds e69 mds_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 is not in fsmap (state up:reconnect)

in the mon leader log. The last line shows that the beacon from the removed gid was handled safely.
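The log above records a beacon that passes preprocess_beacon (the gid is still in the committed map) and then reaches prepare_beacon after `mds fail` has already removed the gid from pending. The toy model below replays that interleaving as a minimal sketch. `ToyMDSMonitor`, `fail_gid` and `commit` are hypothetical; the log strings only mimic the real messages; and real Paxos proposals, epochs and the null-map reply are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of the monitor's two-phase beacon
// handling: preprocess_* reads the committed map, prepare_* edits the
// pending map that the next proposal will commit. Not Ceph code.
using mds_gid_t = uint64_t;

struct ToyMDSMonitor {
  std::map<mds_gid_t, std::string> fsmap;    // committed: gid -> state
  std::map<mds_gid_t, std::string> pending;  // uncommitted next epoch
  std::vector<std::string> log;              // stands in for the mon log

  // Models `ceph mds fail`: the gid disappears from pending immediately
  // but stays in the committed map until the proposal is committed.
  void fail_gid(mds_gid_t gid) { pending.erase(gid); }

  void commit() { fsmap = pending; }

  // Read-only phase: only the committed map is consulted. Returning true
  // means "fully handled here, no update needed".
  bool preprocess_beacon(mds_gid_t gid) {
    if (fsmap.count(gid) == 0) {
      log.push_back("preprocess_beacon: unknown gid, drop");
      return true;
    }
    log.push_back("preprocess_beacon: GID exists in map: " + std::to_string(gid));
    return false;  // fall through to prepare_beacon
  }

  // Update phase: must re-check the gid against pending, because pending
  // may already have diverged from the committed map.
  void prepare_beacon(mds_gid_t gid, const std::string& state) {
    auto it = pending.find(gid);
    if (it == pending.end()) {
      log.push_back("mds_beacon " + std::to_string(gid) +
                    " is not in fsmap (state " + state + ")");
      return;  // the real monitor sends a null map at this point
    }
    it->second = state;
    log.push_back("prepare_beacon: updated gid " + std::to_string(gid));
  }

  void handle_beacon(mds_gid_t gid, const std::string& state) {
    if (!preprocess_beacon(gid)) prepare_beacon(gid, state);
  }
};
```

Committing gid 24412, calling `fail_gid(24412)` without committing, and then calling `handle_beacon(24412, "up:reconnect")` leaves two entries in `log`: the "GID exists in map" line from the read-only phase and the "is not in fsmap" line from the update phase, in the same order as in the mon leader log.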

Fixes: http://tracker.ceph.com/issues/35848

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit f26752a)

Conflicts:
    src/mon/MDSMonitor.cc: needs to use old message creation mechanism

@batrick batrick added the cephfs label Sep 7, 2018

@batrick batrick added this to the luminous milestone Sep 7, 2018

@batrick batrick changed the title from luminous: mds: runs out of file descriptors after several respawns to luminous: MDSMonitor: lookup of gid in prepare_beacon that has been removed will cause exception Sep 7, 2018

@smithfarm smithfarm requested review from liewegas, jdurgin and gregsfortytwo Sep 11, 2018

@yuriw

Contributor

yuriw commented Sep 13, 2018

@gregsfortytwo

Reviewed-by: Greg Farnum <gfarnum@redhat.com>

@yuriw yuriw merged commit f71602c into ceph:luminous Sep 18, 2018

4 checks passed

Docs: build check: OK, docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded

@batrick batrick deleted the batrick:i35852 branch Sep 26, 2018

@smithfarm smithfarm changed the title from luminous: MDSMonitor: lookup of gid in prepare_beacon that has been removed will cause exception to luminous: cephfs: MDSMonitor: lookup of gid in prepare_beacon that has been removed will cause exception Oct 26, 2018
