osd: heartbeat peers need to be updated when a new OSD is added to an existing cluster #12069
Signed-off-by: Pan Liu <pan.liu@istuary.com>
Note that this only happens if the new OSD gets no PGs mapped to it, which is pretty rare. Still, let's fix it!
@liewegas yes, I'm confused about why I/O would hang if no PGs are mapped to it.
I think the problem solved here is the incorrect "up" status in the monitor. @liupan1111, is that right?
@yuyuyu101, in this case, when a new OSD joins and is then killed, the monitor still treats it as up, so clients will still take this OSD into account in their calculations.
We add OSD id +1 and -1 to the peer set to ensure that we always ping all OSDs and that the heartbeat graph is fully connected. See maybe_update_heartbeat_peers().
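To make the +1/-1 rule concrete, here is a minimal Python sketch of the peer-selection idea. It is a simplified model, not Ceph's maybe_update_heartbeat_peers(): the helpers `ring_neighbors`, `heartbeat_peers` and `is_connected`, and the `pg_peers` dictionary (OSD id mapped to the set of OSDs it shares PGs with), are hypothetical. It shows why an OSD with no PGs is still reachable in the heartbeat graph once the next and previous up OSDs are added as peers.

```python
from collections import deque

# Hypothetical model of heartbeat peer selection; not Ceph code.


def ring_neighbors(osd_id, up_osds):
    """Next and previous up OSD ids around osd_id, wrapping at the ends."""
    ids = sorted(up_osds)
    if len(ids) < 2:
        return set()
    i = ids.index(osd_id)
    return {ids[(i + 1) % len(ids)], ids[(i - 1) % len(ids)]}


def heartbeat_peers(osd_id, up_osds, pg_peers, use_ring=True):
    """OSDs sharing a PG with osd_id, optionally plus its two ring neighbors."""
    peers = set(pg_peers.get(osd_id, ()))
    if use_ring:
        peers |= ring_neighbors(osd_id, up_osds)
    return peers - {osd_id}


def is_connected(up_osds, pg_peers, use_ring=True):
    """True if the undirected heartbeat graph reaches every up OSD (BFS)."""
    adj = {o: set() for o in up_osds}
    for o in up_osds:
        for p in heartbeat_peers(o, up_osds, pg_peers, use_ring):
            if p in adj:  # ignore peers that are not up
                adj[o].add(p)
                adj[p].add(o)
    start = min(up_osds)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == set(up_osds)


# OSDs 0 and 1 share PGs; OSD 2 has no PGs mapped to it.
up = {0, 1, 2}
pgs = {0: {1}, 1: {0}}
print(heartbeat_peers(2, up, pgs))            # {0, 1}
print(is_connected(up, pgs))                  # True
print(is_connected(up, pgs, use_ring=False))  # False
```

With PG peers alone, OSD 2 is an isolated node; the ring neighbors are what tie it into the graph.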
@liupan1111 - I see no tracker ticket reference; it would be useful if you could link it.
@yuriw, done. Thanks for the reminder.
The scenario is:
A storage node has two OSDs, and about a minute later a new OSD joins. The monitor sees it as "up", but neither of the two existing OSDs sends heartbeats to the new one. When we kill the new OSD, no OSD reports the failure to the monitor, so the monitor still treats it as "up". If the user then runs fio, I/O may hang.
Fixes: http://tracker.ceph.com/issues/18004
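The scenario above can be modelled with a small, hypothetical Python sketch. It assumes each OSD keeps a cached peer set (PG peers plus the next and previous up OSD) and that a dead OSD is only reported by OSDs whose cached set contains it. `compute_peers` and `failure_reporters` are illustrative names, not Ceph code, and the "refreshed" branch only models the idea in this PR's title (recomputing peers when a new OSD joins), not the actual patch.

```python
# Hypothetical model of the missed-failure scenario; not Ceph code.


def ring_neighbors(osd_id, up_osds):
    """Next and previous up OSD ids around osd_id, wrapping at the ends."""
    ids = sorted(up_osds)
    if len(ids) < 2:
        return set()
    i = ids.index(osd_id)
    return {ids[(i + 1) % len(ids)], ids[(i - 1) % len(ids)]}


def compute_peers(up_osds, pg_peers):
    """Peer set each up OSD would build: PG peers plus its two ring neighbors."""
    return {
        o: (set(pg_peers.get(o, ())) | ring_neighbors(o, up_osds)) - {o}
        for o in up_osds
    }


def failure_reporters(dead_osd, cached_peers):
    """OSDs whose cached peer set includes dead_osd, i.e. who would report it."""
    return {o for o, peers in cached_peers.items()
            if o != dead_osd and dead_osd in peers}


pgs = {0: {1}, 1: {0}}               # the new OSD 2 gets no PGs
before = compute_peers({0, 1}, pgs)  # peer sets built before OSD 2 joined
after = {0, 1, 2}

# Bug (modelled): OSDs 0 and 1 keep their old peer sets; only OSD 2 computes its own.
stale = dict(before)
stale[2] = compute_peers(after, pgs)[2]
print(failure_reporters(2, stale))      # set()

# Fix (modelled): every OSD recomputes its peers when a new OSD joins.
refreshed = compute_peers(after, pgs)
print(failure_reporters(2, refreshed))  # {0, 1}
```

In the stale case nobody pings OSD 2, so killing it produces no heartbeat failure report; once the existing OSDs refresh their peers, OSDs 0 and 1 both watch it.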