Skip to content

osd: Correct scrub analysis for optimised EC#62952

Merged
ljflores merged 2 commits intoceph:mainfrom
aainscow:scrub_improvements
Jun 18, 2025
Merged

osd: Correct scrub analysis for optimised EC#62952
ljflores merged 2 commits intoceph:mainfrom
aainscow:scrub_improvements

Conversation

@aainscow
Copy link
Copy Markdown
Contributor

@aainscow aainscow commented Apr 24, 2025

This PR changes the scrub code, adjusting for new EC. It does not attempt to fix any bugs which were detected (these will come in EC PRs).

There are two major areas where optimised EC differs:

  1. Shard sizes. The shards are not all the same size, but are algorithmically related to the object size. Scrub now checks intended size.
  2. version. Not all shards are updated on every writes. This means that the object versions on non-updated shards are inaccurate. The "authoritative" shards contains a list of versions which are old. Scrub now uses this to check.
  3. New EC refuses to allow some shards to be primaries. These shards can never be authoritative in scrub.

NOTE: this PR is NOT implementing any deep-scrub CRC checking, this will be a later PR.

Tracker: https://tracker.ceph.com/issues/71285

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@aainscow aainscow requested a review from a team as a code owner April 24, 2025 14:25
@github-actions github-actions bot added the core label Apr 24, 2025
@aainscow aainscow marked this pull request as draft April 24, 2025 15:40
@ronen-fr
Copy link
Copy Markdown
Contributor

(I'll try to get to this on Sunday)


// used to verify our "cleanliness" before scrubbing
virtual bool is_waiting_for_unreadable_object() const = 0;
virtual bool get_is_nonprimary_shard(const pg_shard_t &pg_shard) const = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add. The nonprimary shard is one that can never be a primary (or by implication authoritative)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be forgetting something here, but: why can't it be authoritative? in replicated pools - it surely can.

Copy link
Copy Markdown
Contributor Author

@aainscow aainscow Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nonprimary shards in new EC contain an old OI and do not have the snapset attributes, as such I don't think they can be authoritative.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that we do not have any redundancy regarding the snapset attribute?

Copy link
Copy Markdown
Contributor Author

@aainscow aainscow May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapset (and all other xattrs) is on shard 0 and all the parity shards. This gives the same redundancy as the data (i.e. m failures for a k+m EC)

@ronen-fr
Copy link
Copy Markdown
Contributor

@aainscow - this is marked as draft, so, just to be sure: this is the correct PR to review? thanks

@aainscow
Copy link
Copy Markdown
Contributor Author

@aainscow - this is marked as draft, so, just to be sure: this is the correct PR to review? thanks

Yes please. I marked as draft as I was getting some scrub errors, but it turns out they were fast-EC bugs, rather than scrubs.

@aainscow aainscow marked this pull request as ready for review April 28, 2025 13:54
@ronen-fr
Copy link
Copy Markdown
Contributor

Can we implement the new mechanism for all EC PGs, "optimized" or not?

const bufferlist& auth_bl = auth_attr->second;
if (auth_oi.get_version_for_shard(shard.shard) == auth_oi.version) {
// The expected version of the shard and the authoritative shard are the
// same, so we can do a simple memcmp of the OI attr.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a separate PR (which can be backported)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shard_version in OI does not exist pre-tentacle. I think this whole PR needs backporting to tentacle.

I can split out if you want, however.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea. Not just for backporting, but it would help there for sure.

@aainscow
Copy link
Copy Markdown
Contributor Author

Can we implement the new mechanism for all EC PGs, "optimized" or not?

I am not sure what is useful for non-optimized EC in here. All the non-primary changes are new-EC specific, as is the shard versioning.

…xactly match.

With optimised EC, some shards do not receive every IO updates (partial writes).

In such cases, the OI on these "nonprimary" shards is expected to be out of date.
This commit allows for the mismatch and instead checks that the OI can be decoded
and checks the version against the shard versions in the authoritative OI.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
@aainscow aainscow force-pushed the scrub_improvements branch from 9bc8d43 to ab5284f Compare May 12, 2025 14:06
@aainscow aainscow requested a review from ronen-fr May 12, 2025 14:17
@ronen-fr
Copy link
Copy Markdown
Contributor

@aainscow - I am still missing something. isn't it that not_usable_no_err is set (also) for the replicas in replicated pools?

@ronen-fr ronen-fr requested a review from athanatos May 13, 2025 12:26
@aainscow
Copy link
Copy Markdown
Contributor Author

@aainscow - I am still missing something. isn't it that not_usable_no_err is set (also) for the replicas in replicated pools?

This is the only thing to set not_usable_no_err:
nonprimary?usable_t::not_usable_no_err:usable_t::usable

nonprimary can only be true for new EC profiles. So I don't agree, or am I looking at the wrong thing?

@JonBailey1993 JonBailey1993 force-pushed the scrub_improvements branch 2 times, most recently from 32cc9bf to 8c7e79f Compare May 15, 2025 10:12
@JonBailey1993 JonBailey1993 self-assigned this May 15, 2025
@JonBailey1993
Copy link
Copy Markdown
Contributor

I believe open comments should now be addressed. Requesting re-review.

@JonBailey1993 JonBailey1993 requested a review from ronen-fr May 15, 2025 10:15
@ronen-fr
Copy link
Copy Markdown
Contributor

LGTM. I have emailed @athanatos , and asked him to go over this PR and the other one.

@ronen-fr
Copy link
Copy Markdown
Contributor

jenkins test make check arm64

This is a collection of minor changes to make scrub compatible with new EC.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Copy link
Copy Markdown
Contributor

@athanatos athanatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

@ronen-fr
Copy link
Copy Markdown
Contributor

jenkins test make check arm64

@aainscow
Copy link
Copy Markdown
Contributor Author

aainscow commented May 18, 2025

jenkins retest this please

Looks like this doesn't clear the default thing!

@ronen-fr
Copy link
Copy Markdown
Contributor

ronen-fr commented May 18, 2025

@aainscow

the default thing

is not marked as 'required'. Just ignore it

@aainscow aainscow changed the title osd: Correct scrub analysis for OSDs osd: Correct scrub analysis for optimised EC Jun 11, 2025
@amathuria
Copy link
Copy Markdown
Contributor

@ljflores ljflores merged commit 46c0d5c into ceph:main Jun 18, 2025
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants