Skip to content

DAOS-16881 control: Fix daos_server scm prep for single missing ns#15632

Merged
daltonbohning merged 3 commits intomasterfrom
tanabarr/control-ds-scm-prep-nospc-fix
Jan 6, 2025
Merged

DAOS-16881 control: Fix daos_server scm prep for single missing ns#15632
daltonbohning merged 3 commits intomasterfrom
tanabarr/control-ds-scm-prep-nospc-fix

Conversation

@tanabarr
Copy link
Copy Markdown
Contributor

@tanabarr tanabarr commented Dec 17, 2024

When single socket is missing a pmem namespace on dual-socket host a
confusing no-space error can be returned from daos_server scm prepare.
The previously required workaround is to specify --socket. Fix this
issue by adding NumaNode in fall-back case where ndctl region idset
overflow requires matching of numa/socket via ipmctl region info
instead. Also add unit test cases to cover the situation.

Features: control
Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Features: control
Required-githooks: true

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested review from a team as code owners December 17, 2024 18:02
@tanabarr tanabarr self-assigned this Dec 17, 2024
@github-actions
Copy link
Copy Markdown

Ticket title is 'Erroneous behaviour of daos_server scm scan'
Status is 'In Review'
Labels: 'PMEM'
https://daosio.atlassian.net/browse/DAOS-16881

@tanabarr tanabarr added bug control-plane work on the management infrastructure of the DAOS Control Plane labels Dec 17, 2024
case 0:
return nil, nil, errors.New("no numa nodes were processed")
return nil, nil, errors.New("no namespaces created on regions with free capacity")
case 1:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not sure the case 1 still needed.
If yes, the switch could probably be replace by a simple if len(numaIDs) == 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think case 1 is still needed, although it confused me for a minute, too. Maybe clearer:

if len(numaIDs) == 0 {
    return return nil, nil, errors.New("no namespaces created on regions with free capacity")
}

if len(numaIDs) > 1 && sockSelector != sockAny {
    return nil, nil, errors.Errorf("unexpected number of numa nodes processed, want 1 got %d", len(numaIDs))
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear why an empty switch statement is problematic, it is just allowing a drop through much like a break, I'm requesting reviews again because I don't see the need to change it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't block on it. It is technically correct. I just thought it might be worth changing if it confused two of us on the first read.

Personally I find it harder to parse and understand greater-than and less-than checks when using switch in this way, as opposed to either phrasing the cases as conditionals (e.g. case len(numaIDs) > 1), or using if-statements. I effectively had to translate the switch cases into conditionals for my own understanding.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Copy Markdown
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looks OK, comment is small but I think important for readability.

case 0:
return nil, nil, errors.New("no numa nodes were processed")
return nil, nil, errors.New("no namespaces created on regions with free capacity")
case 1:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think case 1 is still needed, although it confused me for a minute, too. Maybe clearer:

if len(numaIDs) == 0 {
    return return nil, nil, errors.New("no namespaces created on regions with free capacity")
}

if len(numaIDs) > 1 && sockSelector != sockAny {
    return nil, nil, errors.Errorf("unexpected number of numa nodes processed, want 1 got %d", len(numaIDs))
}

@tanabarr tanabarr requested review from kjacque and knard38 December 19, 2024 13:52
kjacque
kjacque previously approved these changes Dec 19, 2024
…-scm-prep-nospc-fix

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Features: control
Required-githooks: true

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested a review from a team January 6, 2025 13:25
@daltonbohning daltonbohning merged commit fe389f9 into master Jan 6, 2025
@daltonbohning daltonbohning deleted the tanabarr/control-ds-scm-prep-nospc-fix branch January 6, 2025 17:00
tanabarr added a commit that referenced this pull request Jan 8, 2025
…15632)

When single socket is missing a pmem namespace on dual-socket host a
confusing no-space error can be returned from daos_server scm prepare.
The previously required workaround is to specify --socket. Fix this
issue by adding NumaNode in fall-back case where ndctl region idset
overflow requires matching of numa/socket via ipmctl region info
instead. Also add unit test cases to cover the situation.

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control
daltonbohning pushed a commit that referenced this pull request Jan 9, 2025
…15632) (#15701)

When single socket is missing a pmem namespace on dual-socket host a
confusing no-space error can be returned from daos_server scm prepare.
The previously required workaround is to specify --socket. Fix this
issue by adding NumaNode in fall-back case where ndctl region idset
overflow requires matching of numa/socket via ipmctl region info
instead. Also add unit test cases to cover the situation.

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@mjmac mjmac mentioned this pull request Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug control-plane work on the management infrastructure of the DAOS Control Plane

Development

Successfully merging this pull request may close these issues.

4 participants