osd: heartbeat peers need to be updated when a new OSD added into an existed cluster #12069

liupan1111 · 2016-11-18T12:57:31Z

The case is:
There is a storage node with two OSDs. About one minute later, a new OSD joins. The monitor sees it as "up", but neither of the two existing OSDs sends heartbeats to the new one. When we kill the new OSD, no OSD reports the failure to the monitor, so the monitor still treats it as "up". If the user then runs fio, I/O may hang.
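A toy model of the failure, for illustration only. It assumes each OSD pings just its nearest up neighbour on either side of its id (wrapping around), and that an OSD's peer set changes only when that OSD recomputes it. `UpVector`, `compute_peers`, `refresh_all` and `pinger_count` are hypothetical names, not Ceph code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified model of heartbeat peering; not Ceph code.
// up[i] == true means osd.i is marked up in the OSD map.
using UpVector = std::vector<bool>;
using PeerSets = std::map<int, std::set<int>>;  // osd id -> OSDs it pings

// Each up OSD pings its nearest up neighbour on either side (wrapping).
std::set<int> compute_peers(const UpVector& up, int whoami) {
  const int n = static_cast<int>(up.size());
  assert(whoami >= 0 && whoami < n);
  std::set<int> peers;
  for (int step = 1; step < n; ++step) {  // next up OSD after whoami
    int id = (whoami + step) % n;
    if (up[id]) { peers.insert(id); break; }
  }
  for (int step = 1; step < n; ++step) {  // previous up OSD before whoami
    int id = (whoami - step + n) % n;
    if (up[id]) { peers.insert(id); break; }
  }
  return peers;
}

// Recompute the peer set of every up OSD from the current map.
PeerSets refresh_all(const UpVector& up) {
  PeerSets sets;
  for (int id = 0; id < static_cast<int>(up.size()); ++id)
    if (up[id]) sets[id] = compute_peers(up, id);
  return sets;
}

// Number of other OSDs that ping `target`, i.e. that could notice it going
// silent and report it to the monitor.
int pinger_count(const PeerSets& sets, int target) {
  int count = 0;
  for (const auto& entry : sets)
    if (entry.first != target && entry.second.count(target)) ++count;
  return count;
}
```

If the sets are computed while only osd.0 and osd.1 are up and only the newly started osd.2 computes its own peers, `pinger_count(sets, 2)` is 0, so nobody would report osd.2. Calling `refresh_all` on the new map gives osd.2 two pingers.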

Fixes: http://tracker.ceph.com/issues/18004


liupan1111 · 2016-11-18T13:02:12Z

@liewegas @tchaikov, please take a look.

liewegas · 2016-11-18T14:53:32Z

Note that this only happens if the new OSD gets no PGs mapped to it, which is pretty rare. Still, let's fix it!
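A hedged sketch of why the PG mapping matters. OSDs that share a PG already heartbeat each other as part of PG peering, so a new OSD that holds any PG is pinged regardless of the neighbour rule. The model reduces each PG to a plain list of acting OSD ids. `ActingSets` and `pg_heartbeat_peers` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: each PG is represented only by its acting set of OSD ids.
using ActingSets = std::vector<std::vector<int>>;

// OSDs that share at least one PG with `whoami`. In this model they become
// heartbeat peers through PG peering, independent of any neighbour rule.
std::set<int> pg_heartbeat_peers(const ActingSets& pgs, int whoami) {
  assert(whoami >= 0 && "OSD ids are non-negative");
  std::set<int> peers;
  for (const auto& acting : pgs) {
    // Skip PGs that do not include this OSD.
    if (std::find(acting.begin(), acting.end(), whoami) == acting.end())
      continue;
    for (int osd : acting)
      if (osd != whoami) peers.insert(osd);
  }
  return peers;
}
```

If osd.2 appears in any acting set together with osd.0, osd.0 pings it. If no PG maps to osd.2, it is absent from every PG-derived peer set, and only the neighbour peers can cover it.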

yuyuyu101 · 2016-11-18T14:54:09Z

@liewegas yes, I'm confused about why I/O would hang if no PGs are mapped to it.

yuyuyu101 · 2016-11-18T14:54:50Z

I think the problem solved here is the incorrect "up" status in the mon. @liupan1111, is that right?

liupan1111 · 2016-11-18T14:58:11Z

@yuyuyu101, in this case, when a new OSD joins and is then killed, the monitor still treats it as up, so clients still include this OSD in their computations.

liewegas · 2016-11-18T15:00:20Z

We add osd id +1 and -1 to the peer set to ensure that we always ping all OSDs and that the graph is fully connected. See maybe_update_heartbeat_peers().
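A minimal sketch of the neighbour rule described above. It assumes a simplified map where `up[i]` says whether osd.i is up. "+1 and -1" is read here as the nearest *up* OSD after and before our own id, wrapping around the id space (both are assumptions). The helpers are hypothetical and are not the real `maybe_update_heartbeat_peers()` code.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <vector>

// Simplified model (an assumption, not Ceph's OSDMap): up[i] is true when
// osd.i is marked up in the current map.
using UpVector = std::vector<bool>;

// Next up OSD after `whoami`, wrapping around the id space; -1 if none.
int next_up_after(const UpVector& up, int whoami) {
  const int n = static_cast<int>(up.size());
  assert(whoami >= 0 && whoami < n);
  for (int step = 1; step < n; ++step) {
    int id = (whoami + step) % n;
    if (up[id]) return id;
  }
  return -1;
}

// Previous up OSD before `whoami`, wrapping around the id space; -1 if none.
int prev_up_before(const UpVector& up, int whoami) {
  const int n = static_cast<int>(up.size());
  assert(whoami >= 0 && whoami < n);
  for (int step = 1; step < n; ++step) {
    int id = (whoami - step + n) % n;
    if (up[id]) return id;
  }
  return -1;
}

// Peers added regardless of PG placement: the up neighbours on either side.
std::set<int> neighbor_peers(const UpVector& up, int whoami) {
  std::set<int> peers;
  for (int id : {next_up_after(up, whoami), prev_up_before(up, whoami)})
    if (id >= 0) peers.insert(id);
  return peers;
}
```

Because every up OSD links to its successor and predecessor, the peers form a ring, which keeps the graph connected even for an OSD with no PGs. A new OSD only joins that ring once the existing OSDs recompute their neighbours from the updated map.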

yuriw · 2016-11-23T00:28:33Z

http://pulpito.ceph.com/yuriw-2016-11-22_16:38:08-rados-wip-yuri-testing2_2016_11_21-distro-basic-smithi/

yuriw · 2016-11-23T00:31:14Z

@liupan1111 - I see no tracker ticket reference; it'd be useful if you could link it.

liupan1111 · 2016-11-23T01:50:37Z

@yuriw, done. Thanks for the reminder.

Pan Liu added 2 commits November 18, 2016 20:01

OSD: heartbeat peers need to be updated when a new OSD added into an …

e95026f

…already existed cluster Signed-off-by: Pan Liu <pan.liu@istuary.com>

OSD: remove 'has_inst', which has the same function as 'is_up'

01dfc1b

Signed-off-by: Pan Liu <pan.liu@istuary.com>

liewegas changed the title ~~OSD: heartbeat peers need to be updated when a new OSD added into an existed cluster~~ osd: heartbeat peers need to be updated when a new OSD added into an existed cluster Nov 18, 2016

liewegas approved these changes Nov 18, 2016


liewegas added bug-fix core labels Nov 18, 2016

liewegas added the needs-qa label Nov 18, 2016

liewegas added this to the kraken milestone Nov 18, 2016

yuriw added the wip-yuri2-testing label Nov 21, 2016

yuriw merged commit 2c3cda2 into ceph:master Nov 23, 2016

liupan1111 deleted the wip-osd-up-heartbeat-peers branch November 23, 2016 01:27
