osd: improve OSD robustness. #50326

Merged
merged 2 commits into from
Jul 7, 2023

@ifed01 ifed01 commented Mar 1, 2023

Achieved by:

  1. The OSD superblock data is replicated in the onode's OMAP, so it can be recovered from there if the onode's content gets corrupted.
  2. The pg_num_history object is now fully overwritten, which eliminates the need to merge with previous data (so reading corrupted data no longer kills the OSD).
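
As a rough illustration of item 1, the sketch below keeps the encoded superblock both as object content and under an OMAP key, and reads the OMAP copy first, falling back to the content. `MetaObject`, `kSuperblockKey`, `write_superblock` and `read_superblock` are hypothetical names for an in-memory stand-in; this is not the ObjectStore API or the code in this PR.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory stand-in for one object in the OSD meta collection.
struct MetaObject {
  std::string data;                         // object content
  std::map<std::string, std::string> omap;  // key/value side storage
};

// Hypothetical OMAP key; the real key name is not shown in this conversation.
const std::string kSuperblockKey = "osd_superblock";

// Store the encoded superblock in both places so either copy can be used.
void write_superblock(MetaObject& obj, const std::string& encoded) {
  obj.data = encoded;
  obj.omap[kSuperblockKey] = encoded;
}

// Return the OMAP copy if present, otherwise fall back to the object content.
std::optional<std::string> read_superblock(const MetaObject& obj) {
  auto it = obj.omap.find(kSuperblockKey);
  if (it != obj.omap.end() && !it->second.empty()) return it->second;
  if (!obj.data.empty()) return obj.data;
  return std::nullopt;
}
```

With this layout, damage to the object content alone does not lose the superblock, because the OMAP copy is still readable.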

This is a branch off from #48309.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

@ifed01 ifed01 requested a review from a team as a code owner March 1, 2023 12:25
@github-actions github-actions bot added the core label Mar 1, 2023
@ifed01 ifed01 removed the bluestore label Mar 1, 2023
@ifed01 ifed01 changed the title from "os/bluestore: improve OSD robustness." to "osd: improve OSD robustness." Mar 1, 2023

ifed01 commented Mar 1, 2023

@neha-ojha - mind taking a look?

@ifed01 ifed01 force-pushed the wip-ifed-better-osd-robust branch from 5dc1926 to 8d2a20e March 1, 2023 12:28
@ifed01 ifed01 added the feature label Mar 1, 2023
@neha-ojha neha-ojha self-requested a review March 1, 2023 23:45

@aclamk aclamk left a comment

Good feature. Excellent simplicity / effect ratio.

src/osd/OSD.cc Outdated
Comment on lines 4763 to 4792
  int r = store->omap_get_values(
    service.meta_ch, OSD_SUPERBLOCK_GOBJECT, keys, &vals);
  if (r < 0 || vals.size() == 0) {
    dout(10) << __func__ << " attempting read from omap replica" << dendl;

    r = store->read(service.meta_ch, OSD_SUPERBLOCK_GOBJECT, 0, 0, bl);
    if (r < 0) {
      return -ENOENT;
    }
    dout(10) << __func__ << " got omap replica" << dendl;
  } else {
    std::swap(bl, vals.begin()->second);
  }
Contributor

I think that at some point you decided to give priority to data read from OMAP.
I agree with that, but the douts and the return did not get synced.

In addition, I think we should read both, emit a derr if they differ, and select one.
We should try to decode both and select the one that decodes.
If both decode, we should select the one with the later epoch.
As a last step, if they still differ, I vote for the OMAP one.
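
The selection rules proposed above could be sketched roughly as follows. `Superblock`, `try_decode`, `Source` and `choose_superblock` are hypothetical names, and the toy decoder (a decimal epoch in a string) only stands in for real superblock decoding; this is not the code that was merged.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical stand-in for the OSD superblock: only the epoch matters here.
struct Superblock {
  uint64_t epoch = 0;
};

// Hypothetical decoder: treats the buffer as a decimal epoch and returns
// std::nullopt for anything else, mimicking a decode failure.
std::optional<Superblock> try_decode(const std::string& buf) {
  if (buf.empty()) return std::nullopt;
  uint64_t e = 0;
  for (char c : buf) {
    if (c < '0' || c > '9') return std::nullopt;
    e = e * 10 + static_cast<uint64_t>(c - '0');
  }
  return Superblock{e};
}

enum class Source { None, Omap, Data };

// Pick a replica: whichever decodes; if both decode, the later epoch;
// on a tie, the OMAP copy.
Source choose_superblock(const std::string& omap_buf,
                         const std::string& data_buf,
                         Superblock* out) {
  auto from_omap = try_decode(omap_buf);
  auto from_data = try_decode(data_buf);
  if (from_omap && from_data && omap_buf != data_buf) {
    std::cerr << "superblock replicas differ" << std::endl;
  }
  if (!from_omap && !from_data) return Source::None;
  // Use the data copy only when OMAP is unusable or the data copy is newer.
  if (!from_omap || (from_data && from_data->epoch > from_omap->epoch)) {
    *out = *from_data;
    return Source::Data;
  }
  *out = *from_omap;
  return Source::Omap;
}
```

In this sketch, `choose_superblock("5", "7", &sb)` picks the data copy because its epoch is later, while equal epochs resolve to the OMAP copy.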

Contributor Author

Comments updated. I also implemented trying both superblock replicas and choosing the most recent one.

@@ -8184,7 +8205,15 @@ void OSD::handle_osd_map(MOSDMap *m)
       {
         bufferlist bl;
         ::encode(pg_num_history, bl);
-        t.write(coll_t::meta(), make_pg_num_history_oid(), 0, bl.length(), bl);
+        auto oid = make_pg_num_history_oid();
+        t.truncate(coll_t::meta(), oid, 0); // we don't need bytes left if new data
Contributor

Doing a truncate is an excellent idea.
It will empty the object content and force re-allocation of space for osd_superblock.
We will no longer have the case where osd_superblock is parked forever as the first data on the device, always updated in place by deferred writes.
I think a note to this effect should be added to the code comments.
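
A minimal sketch of why truncating before the write matters for the object's content: a plain write at offset 0 leaves any bytes beyond the new payload in place, whereas truncate-then-write leaves exactly the new payload. `ObjectData`, `write_at_zero` and `truncate_then_write` are hypothetical helpers operating on a string; the space re-allocation effect described above is a store-level behavior and is not modeled here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for object content; not the ObjectStore API.
using ObjectData = std::string;

// Overwrite in place: bytes beyond the new payload are left untouched.
void write_at_zero(ObjectData& obj, const std::string& payload) {
  if (obj.size() < payload.size()) obj.resize(payload.size());
  obj.replace(0, payload.size(), payload);
}

// Truncate to zero first so the object ends up holding exactly the payload.
void truncate_then_write(ObjectData& obj, const std::string& payload) {
  obj.clear();
  write_at_zero(obj, payload);
}
```

Starting from an object holding `OLD-LONG-VALUE`, `write_at_zero` with `new` leaves `new-LONG-VALUE`, while `truncate_then_write` leaves just `new`.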

Contributor Author

done.

ifed01 and others added 2 commits March 9, 2023 18:41
Achieved by
1. osd superblock data is replicated in onode's OMAP - hence one can
   recover from that after onode's content is corrupted.
2. pg_num_history object gets full overwrite which eliminates the need to
   merge with previous data (and hence reading corrupted data wouldn't
   kill OSD).

Signed-off-by: Igor Fedotov <ifedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
@ifed01 ifed01 force-pushed the wip-ifed-better-osd-robust branch from 8d2a20e to e7c08ec March 9, 2023 18:04

ifed01 commented Mar 9, 2023

@aclamk - hopefully I resolved your comments, please re-review.


ifed01 commented Mar 10, 2023

jenkins test api

@yuriw yuriw merged commit f4d9ed8 into ceph:main Jul 7, 2023
@ifed01 ifed01 deleted the wip-ifed-better-osd-robust branch July 7, 2023 21:18