osd: improve OSD robustness. #50326
@neha-ojha - mind taking a look?
5dc1926 to 8d2a20e
Good feature. Excellent simplicity / effect ratio.
src/osd/OSD.cc
Outdated
int r = store->omap_get_values(
  service.meta_ch, OSD_SUPERBLOCK_GOBJECT, keys, &vals);
if (r < 0 || vals.size() == 0) {
  dout(10) << __func__ << " attempting read from omap replica" << dendl;

  r = store->read(service.meta_ch, OSD_SUPERBLOCK_GOBJECT, 0, 0, bl);
  if (r < 0) {
    return -ENOENT;
  }
  dout(10) << __func__ << " got omap replica" << dendl;
} else {
  std::swap(bl, vals.begin()->second);
}
I think that at some point you decided to give priority to data read from OMAP.
I agree with that, but the douts and the return did not get synced with that change.
In addition, I think that we should read both, emit a derr if they differ, and select one.
We should try to decode both, and select the one that decodes.
If both decode, we should select the one that has the later epoch.
As a last step, if they still differ, I vote for the OMAP one.
Comments updated. Also implemented attempting both superblock replicas and choosing the most recent one.
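The selection order proposed in this thread (decode both replicas, keep the one that decodes, prefer the later epoch, fall back to OMAP on a tie) can be sketched roughly as follows. This is only an illustration and not the code from the PR: Superblock, try_decode, Source, Choice and choose_superblock are hypothetical names, the "decoder" simply parses a decimal epoch, and the derr logging suggested above is left out.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for a decoded OSD superblock; only the epoch matters here.
struct Superblock {
  uint64_t epoch;
};

// Hypothetical decoder: treats the raw bytes as a decimal epoch and returns
// std::nullopt for empty or non-numeric input, standing in for a failed decode
// of a corrupted replica.
std::optional<Superblock> try_decode(const std::string& raw) {
  if (raw.empty()) return std::nullopt;
  uint64_t e = 0;
  for (char c : raw) {
    if (c < '0' || c > '9') return std::nullopt;
    e = e * 10 + static_cast<uint64_t>(c - '0');
  }
  return Superblock{e};
}

enum class Source { None, Omap, Object };

struct Choice {
  Source source;
  Superblock sb;
};

// Try both replicas and pick one:
//  - if only one decodes, use it;
//  - if both decode, use the one with the later epoch;
//  - if the epochs are equal, prefer the OMAP copy.
Choice choose_superblock(const std::string& omap_raw,
                         const std::string& object_raw) {
  std::optional<Superblock> from_omap = try_decode(omap_raw);
  std::optional<Superblock> from_obj = try_decode(object_raw);
  if (from_omap && from_obj) {
    if (from_obj->epoch > from_omap->epoch) {
      return Choice{Source::Object, *from_obj};
    }
    return Choice{Source::Omap, *from_omap};
  }
  if (from_omap) return Choice{Source::Omap, *from_omap};
  if (from_obj) return Choice{Source::Object, *from_obj};
  return Choice{Source::None, Superblock{0}};
}
```

With this ordering a corrupted object body no longer prevents startup as long as the OMAP copy decodes, and the reverse also holds.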
@@ -8184,7 +8205,15 @@ void OSD::handle_osd_map(MOSDMap *m)
{
  bufferlist bl;
  ::encode(pg_num_history, bl);
  t.write(coll_t::meta(), make_pg_num_history_oid(), 0, bl.length(), bl);
  auto oid = make_pg_num_history_oid();
  t.truncate(coll_t::meta(), oid, 0); // we don't need bytes left if new data
Doing truncate is an excellent idea.
It will empty the object content and force re-allocation of space for osd_superblock.
We will no longer have the case where osd_superblock is parked forever as the first data on the device, always updated in place by deferred writes.
I think a note to this effect should be added to the code comments.
done.
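To illustrate why truncating to zero before writing yields a full overwrite, here is a minimal sketch using a toy in-memory object. FakeObject, overwrite_in_place and truncate_then_write are hypothetical names for this example only; this is not the ObjectStore transaction API and it says nothing about on-disk allocation.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical in-memory object used only for illustration.
struct FakeObject {
  std::string data;

  // Overwrite bytes starting at offset, growing the object if needed.
  // Bytes beyond the written range are left untouched.
  void write(std::size_t offset, const std::string& bytes) {
    if (data.size() < offset + bytes.size()) {
      data.resize(offset + bytes.size());
    }
    data.replace(offset, bytes.size(), bytes);
  }

  // Shrink (or grow) the object to exactly `size` bytes.
  void truncate(std::size_t size) { data.resize(size); }
};

// Write at offset 0 without truncating: a shorter payload leaves the
// tail of the previous content in place.
std::string overwrite_in_place(std::string old_content,
                               const std::string& fresh) {
  FakeObject o{std::move(old_content)};
  o.write(0, fresh);
  return o.data;
}

// Truncate to zero first: the object ends up holding only the new payload.
std::string truncate_then_write(std::string old_content,
                                const std::string& fresh) {
  FakeObject o{std::move(old_content)};
  o.truncate(0);
  o.write(0, fresh);
  return o.data;
}
```

In the toy model, writing "new" over "OLD-LONG-DATA" in place leaves "new-LONG-DATA", while truncating first leaves just "new", so nothing from the previous content survives.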
Achieved by:
1. osd superblock data is replicated in the onode's OMAP, hence one can recover from it after the onode's content is corrupted.
2. pg_num_history object gets a full overwrite, which eliminates the need to merge with previous data (and hence reading corrupted data wouldn't kill the OSD).
Signed-off-by: Igor Fedotov <ifedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
8d2a20e to e7c08ec
@aclamk - hopefully I resolved your comments, please re-review.
jenkins test api
Achieved by:
This is a branch off from #48309.
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>