DAOS-18799 pool: Fix handle loading#18016

Open
liw wants to merge 1 commit into master from liw/iv_valid

@liw liw commented Apr 15, 2026

It appears that if the PS leader skips the ds_pool_iv_conn_hdls_update
call during pool_svc_step_up_cb because there's no pool handle in the
DB, IV fetches for pool handles will create invalid IV entries and
return unexpected -DER_NOTLEADERs. To prevent that, this patch changes
pool_svc_step_up_cb to call ds_pool_iv_conn_hdls_update even if there's
no pool handle in the DB.
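
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the DAOS implementation: `toy_iv_entry`, `toy_iv_fetch`, `toy_iv_update`, `toy_step_up`, and `TOY_ERR_NOTLEADER` are hypothetical names, and the real IV code paths are more involved. It assumes that a fetch against an entry the leader never published fails with a not-leader style error, and that publishing even an empty handle list makes the entry valid.

```c
#include <stdbool.h>

/* Hypothetical stand-in for -DER_NOTLEADER; not the real error value. */
#define TOY_ERR_NOTLEADER (-1)

/* Hypothetical, simplified IV entry held by the pool service leader. */
struct toy_iv_entry {
	bool iv_valid; /* true once the leader has published a value */
	int  nhandles; /* number of pool handles in the published value */
};

/* Toy fetch: an entry that was never published cannot be served. */
static int
toy_iv_fetch(const struct toy_iv_entry *e, int *nhandles)
{
	if (!e->iv_valid)
		return TOY_ERR_NOTLEADER;
	*nhandles = e->nhandles;
	return 0;
}

/* Toy update: publishing any value, even an empty one, validates the entry. */
static void
toy_iv_update(struct toy_iv_entry *e, int nhandles)
{
	e->nhandles = nhandles;
	e->iv_valid = true;
}

/*
 * Toy step-up. With skip_if_empty set, it models the behavior this PR
 * describes as problematic: no update is issued when the DB has no handles.
 */
static void
toy_step_up(struct toy_iv_entry *e, int nhandles_in_db, bool skip_if_empty)
{
	if (nhandles_in_db == 0 && skip_if_empty)
		return;
	toy_iv_update(e, nhandles_in_db);
}
```

In this model, stepping up with zero handles and `skip_if_empty` set leaves the entry invalid, so every later `toy_iv_fetch` fails. Stepping up with `skip_if_empty` cleared publishes an empty value, and fetches then succeed with zero handles.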

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@github-actions

Ticket title is 'Infinite DER_NOTLEADER loop on single-server DAOS cluster after pool create'
Status is 'In Progress'
https://daosio.atlassian.net/browse/DAOS-18799

Signed-off-by: Li Wei <liwei@hpe.com>

@daosbuild3
Collaborator

Test stage Unit Test completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18016/1/display/redirect

@liw liw marked this pull request as ready for review April 17, 2026 03:00
@liw liw requested review from a team as code owners April 17, 2026 03:00
@liw liw requested a review from wangshilong April 17, 2026 03:00
@liw
Contributor Author

liw commented Apr 17, 2026

Marking this as ready for review, because CI testing is too slow at the moment.

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18016/2/testReport/

@liw
Contributor Author

liw commented Apr 20, 2026

Build 2:

  • dfuse/daos_build: SRE-3745

Contributor

@wangshilong wangshilong left a comment

A bit tricky, but it looks workable. So if ->iv_valid is true, it means the IV has been updated, but it could hold an empty/dummy value.

@wangshilong
Contributor

This should probably go into 2.8 as well; the NOTLEADER errors are confusing and annoying.
