DAOS-18524 object: fix EC checksum in rebuild, aggregation #17908

Merged: gnailzenh merged 1 commit into release/2.8 from kccain/daos_18524_ec_csum_agg_csum_rel2p8 on Apr 8, 2026

kccain commented Apr 5, 2026

Fix two bugs causing -DER_CSUM failures with non-power-of-2 checksum chunk sizes on EC objects:

  1. In __migrate_fetch_update_bulk(), checksums were computed while IOD recxs were still in DAOS-space (needed for the preceding fetch), but stored at VOS-space offsets. When cksum_size does not evenly divide ec_cell_sz, chunk boundaries differ between coordinate spaces, producing mismatched checksums. Move mrone_recx_daos2_vos() before migrate_csum_calc() so checksums align with VOS storage offsets.

  2. In vos_csum_recalc.c, calc_csum_params() and csum_agg_verify() used unsigned int for extent offsets, truncating the PARITY_INDICATOR bit (0x8000000000000000) on parity extents. With non-power-of-2 chunk sizes, this truncation shifts chunk boundaries, causing aggregation checksum verification to always fail. Widen to uint64_t.
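
Both effects can be reproduced with plain modular arithmetic. The following is a minimal sketch, not code from the DAOS tree: `chunk_phase()`, `truncate_like_bug()`, and `daos2vos_sketch()` are hypothetical helpers, the striping formula is an assumed simplification of the real DAOS-to-VOS conversion, and `unsigned int` is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bit on parity extent offsets; the value is the one quoted in this PR. */
#define PARITY_INDICATOR 0x8000000000000000ULL

/* Distance of idx from the preceding checksum chunk boundary. */
static inline uint64_t
chunk_phase(uint64_t idx, uint64_t chunk)
{
	assert(chunk != 0);
	return idx % chunk;
}

/* Pre-fix pattern from bug 2: a 64-bit offset passed through unsigned int. */
static inline uint64_t
truncate_like_bug(uint64_t idx)
{
	return (unsigned int)idx; /* drops the high bits when unsigned int is 32-bit */
}

/*
 * HYPOTHETICAL stand-in for the DAOS-to-VOS conversion in bug 1 (not the
 * real mrone_recx_daos2_vos()): offsets are striped over data_cells cells
 * of cell_sz records, and each target stores its own cells back to back.
 */
static inline uint64_t
daos2vos_sketch(uint64_t daos_off, uint64_t cell_sz, uint64_t data_cells)
{
	uint64_t stripe = daos_off / (cell_sz * data_cells);

	return stripe * cell_sz + daos_off % cell_sz;
}
```

With a chunk size of 3000, the offset `PARITY_INDICATOR | 4096` sits 2904 past a chunk boundary at full width but 1096 past one after truncation. With a chunk size of 32768, both give 4096, which is why power-of-2 chunk sizes hid the problem. Likewise, with `cell_sz = 1024` and `data_cells = 2`, DAOS offset 2048 maps to 1024 in the sketch. Against a chunk size of 300 the two offsets sit 248 and 124 past a boundary, while a chunk size of 256 gives 0 in both spaces.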

Allow-unstable-test: true
Features: checksum rebuild ec aggregation

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>

github-actions Bot commented Apr 5, 2026

Ticket title is 'Observe DER_CSUM -2021 errors and "Data corruption found for recx" engine logging during reintegration and client data verify read-back'
Status is 'In Progress'
Labels: 'test_2.8'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-18524

github-actions Bot added the 'priority' label (Ticket has high priority; automatically managed) on Apr 5, 2026
kccain added the 'clean-cherry-pick' label (Cherry-pick from another branch that did not require additional edits) on Apr 6, 2026
kccain marked this pull request as ready for review on April 7, 2026 11:42
kccain requested review from a team as code owners on April 7, 2026 11:42
gnailzenh merged commit 7852828 into release/2.8 on Apr 8, 2026
41 checks passed
gnailzenh deleted the kccain/daos_18524_ec_csum_agg_csum_rel2p8 branch on April 8, 2026 01:00
