osd: heartbeat peers need to be updated when a new OSD added into an existed cluster #12069

Merged
merged 2 commits into from Nov 23, 2016

Conversation

liupan1111
Contributor

@liupan1111 liupan1111 commented Nov 18, 2016

The scenario:
A storage node has two OSDs. About one minute later, a new OSD joins. The monitor sees it as "up", but the two existing OSDs send no heartbeats to the new one. When we kill the new OSD, no OSD reports the failure to the monitor, so the monitor still treats it as "up". If the user then runs fio, I/O may hang.

Fixes: http://tracker.ceph.com/issues/18004

Pan Liu added 2 commits November 18, 2016 20:01
…already existed cluster

Signed-off-by: Pan Liu <pan.liu@istuary.com>
Signed-off-by: Pan Liu <pan.liu@istuary.com>
@liupan1111
Contributor Author

@liewegas @tchaikov, please take a look.

@liewegas liewegas changed the title from "OSD: heartbeat peers need to be updated when a new OSD added into an existed cluster" to "osd: heartbeat peers need to be updated when a new OSD added into an existed cluster" Nov 18, 2016
@liewegas
Member

Note that this only happens if the new OSD gets no PGs mapped to it, which is pretty rare. Still, let's fix it!

@liewegas liewegas added this to the kraken milestone Nov 18, 2016
@yuyuyu101
Member

@liewegas yes, I'm confused about why I/O would hang if no PGs are mapped to it.

@yuyuyu101
Member

I think the problem solved here is the incorrect "up" status in the mon. @liupan1111, is that right?

@liupan1111
Contributor Author

@yuyuyu101, in this case, when a new OSD joins and is then killed, the monitor still treats it as up, so clients will still include this OSD in their computations.

@liewegas
Member

We add osdid +1 and -1 to the peer set to ensure that we always ping all OSDs and that the graph is fully connected. See maybe_update_heartbeat_peers().
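
A minimal sketch of the neighbor idea, not the actual Ceph code: `ring_neighbors` is a hypothetical helper, and the plain `std::set<int>` of up OSD ids is an assumed stand-in for the OSD map. Each OSD adds the next and previous up OSD by id, wrapping at both ends, so every up OSD is pinged by someone even when it has no PGs mapped to it.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper (not a Ceph function): returns the up OSD ids adjacent
// to `whoami` in id order, wrapping around at both ends of the id range.
// `up` is the assumed set of OSD ids currently marked up.
std::set<int> ring_neighbors(const std::set<int>& up, int whoami) {
  std::set<int> peers;
  if (up.empty()) return peers;

  // Next up OSD after whoami; wrap to the smallest id if there is none.
  auto next = up.upper_bound(whoami);
  if (next == up.end()) next = up.begin();
  if (*next != whoami) peers.insert(*next);

  // Previous up OSD before whoami; wrap to the largest id if there is none.
  auto prev = up.lower_bound(whoami);
  if (prev == up.begin()) prev = up.end();
  --prev;
  if (*prev != whoami) peers.insert(*prev);

  // An OSD never lists itself as a heartbeat peer.
  assert(peers.count(whoami) == 0);
  return peers;
}
```

With up OSDs {0, 1, 2}, where 2 is the newly added OSD with no PGs, OSD 0 gets peers {1, 2} and OSD 1 gets peers {0, 2}, so both existing OSDs would ping the new one and could report it if it dies.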

@yuriw yuriw merged commit 2c3cda2 into ceph:master Nov 23, 2016
@yuriw
Contributor

yuriw commented Nov 23, 2016

@liupan1111 - I see no tracker ticket reference; it would be useful if you could link it.

@liupan1111 liupan1111 deleted the wip-osd-up-heartbeat-peers branch November 23, 2016 01:27
@liupan1111
Contributor Author

@yuriw, done. Thanks for the reminder.
