DAOS-18495 md: 24k ABT stack size for md_on_ssd #17686

Merged
daltonbohning merged 11 commits into master from grom72/DAOS-18495-ABT_THREAD_STACKSIZE-24k-md-on-ssd on Mar 19, 2026

@grom72 grom72 commented Mar 11, 2026

Increase min ABT stack size to 24k for md_on_ssd

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).
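
The change described above ("Increase min ABT stack size to 24k for md_on_ssd") could look roughly like the sketch below. This is not the code from the PR: `engineConfig`, its fields, and `updateABTStackSize` are hypothetical stand-ins, the 24k value is assumed to mean 24 KiB in bytes, and raising a too-small value instead of rejecting it is also an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// Assumption: 24k means 24 KiB expressed in bytes. The constant name comes
// from the test snippet in this PR; the value here is illustrative.
const minABTThreadStackSizeMdOnSsd = 24 * 1024

const abtStackSizeKey = "ABT_THREAD_STACKSIZE"

// engineConfig is a hypothetical, simplified stand-in for the real engine
// configuration; only the two fields this sketch needs are modelled.
type engineConfig struct {
	hasBdevRoleMeta bool              // stand-in for the result of HasBdevRoleMeta()
	env             map[string]string // engine environment variables
}

// updateABTStackSize enforces the minimum stack size for md_on_ssd configs.
// Raising a too-small value (rather than rejecting it) is an assumption.
func (c *engineConfig) updateABTStackSize() error {
	if !c.hasBdevRoleMeta {
		return nil // not md_on_ssd: leave the environment untouched
	}
	if c.env == nil {
		c.env = make(map[string]string)
	}
	raw, ok := c.env[abtStackSizeKey]
	if !ok {
		// No explicit value: set the md_on_ssd minimum.
		c.env[abtStackSizeKey] = strconv.Itoa(minABTThreadStackSizeMdOnSsd)
		return nil
	}
	cur, err := strconv.Atoi(raw)
	if err != nil {
		return fmt.Errorf("invalid %s value %q: %w", abtStackSizeKey, raw, err)
	}
	if cur < minABTThreadStackSizeMdOnSsd {
		c.env[abtStackSizeKey] = strconv.Itoa(minABTThreadStackSizeMdOnSsd)
	}
	return nil
}

func main() {
	cfg := &engineConfig{
		hasBdevRoleMeta: true,
		env:             map[string]string{abtStackSizeKey: "16384"},
	}
	if err := cfg.updateABTStackSize(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(abtStackSizeKey + "=" + cfg.env[abtStackSizeKey])
}
```

Keeping the check behind a single predicate means configurations without a metadata bdev role are left untouched, which appears to be the intent of basing the final implementation on HasBdevRoleMeta().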

github-actions bot commented Mar 11, 2026

Ticket title is 'Random ranks are excluded with signal (6) with master branch.'
Status is 'In Review'
Labels: 'scrubbed_2.8,test_2.8,testp1'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-18495

@grom72 grom72 requested review from NiuYawei and kjacque and removed request for kjacque March 11, 2026 06:33
grom72 added 2 commits March 11, 2026 07:36
Increase min ABT stack size to 24k for md_on_ssd

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2
Priority: 2
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
@grom72 grom72 force-pushed the grom72/DAOS-18495-ABT_THREAD_STACKSIZE-24k-md-on-ssd branch from 6992eb3 to 71e9059 March 11, 2026 06:37
NiuYawei previously approved these changes Mar 11, 2026
@NiuYawei NiuYawei requested a review from tanabarr March 11, 2026 06:56
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2
Final implementation based on HasBdevRoleMeta().

Tests improvements.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2

Allow-unstable-test: true
@grom72 grom72 requested a review from NiuYawei March 11, 2026 11:42
NiuYawei previously approved these changes Mar 11, 2026
WithEnvVarAbtThreadStackSize(minABTThreadStackSizeMdOnSsd),
expABTthreadStackSize: minABTThreadStackSizeMdOnSsd,
},
"config for md_on_ssd without thread size should sed ABT_THREAD_STACKSIZE": {

typo

"sed"

Done: 94a2f4f

tanabarr previously approved these changes Mar 11, 2026
@tanabarr tanabarr left a comment

nonblocking suggestions, otherwise looks good

t.Run(name, func(t *testing.T) {
	err := tc.cfg.UpdateABTEnvarsMdOnSsd()
	test.CmpErr(t, tc.expErr, err)
	if err == nil {

return early if err != nil rather than nesting here

Done: 94a2f4f

	err := tc.cfg.UpdateABTEnvarsMdOnSsd()
	test.CmpErr(t, tc.expErr, err)
	if err == nil {
		if tc.expABTthreadStackSize == 0 {

nit: it would be tidier to call GetEnvVar before the if check and then check the result against the expected stack size, e.g. GetEnvVar; if expSS == 0 { Assert(err != nil); return }; Assert(err == nil) ... (note: no else clause).
In general, returning early where possible and keeping nesting to a minimum is considered good practice in Go because it improves readability.

Done: 94a2f4f

Only Unit Tests and NLT to be re-run to confirm that logic has not been
changed.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2
Cancel-prev-build: false
Skip-unit-test-memcheck: true

Skip-test: true
Skip-func-test: true

Skip-func-vm: true

Skip-func-hw-test: true
@grom72 grom72 dismissed stale reviews from tanabarr and NiuYawei via 94a2f4f March 11, 2026 13:35
@grom72 grom72 requested review from NiuYawei and tanabarr March 11, 2026 13:36
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2

Allow-unstable-test: true
@grom72 grom72 marked this pull request as ready for review March 12, 2026 07:36
@grom72 grom72 requested review from a team as code owners March 12, 2026 07:36
tanabarr previously approved these changes Mar 12, 2026
@tanabarr tanabarr left a comment

LGTM

NiuYawei previously approved these changes Mar 12, 2026
@grom72 grom72 requested review from a team as code owners March 16, 2026 07:47
…S-18495-ABT_THREAD_STACKSIZE-24k-md-on-ssd"

This reverts commit d5bc34e.

Priority: 2
Allow-unstable-test: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
@grom72 grom72 requested review from NiuYawei, kjacque and tanabarr March 16, 2026 12:40
@grom72 grom72 requested a review from a team March 17, 2026 13:35
@grom72 grom72 self-assigned this Mar 18, 2026
@daltonbohning daltonbohning removed request for a team March 18, 2026 13:55
@daltonbohning

Why is this running Functional on EL 8 instead of EL 9?

grom72 commented Mar 19, 2026

> Why is this running Functional on EL 8 instead of EL 9?

It looks like the Jenkins file parameters were not updated correctly:
https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17686/15/parameters/

The commit Revert "Merge remote-tracking branch 'origin/master' into grom72/DAOS… restored EL 8 as the primary functional VM test OS, but the commit Reapply "Merge remote-tracking branch 'origin/master' into grom72/DAO… did not restore the proper (EL 9) value.

The next build behaves properly.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2

Skip-unit-tests:true
Skip-unit-test: true
Skip-NLT: true
Skip-unit-test-memcheck: true

Skip-fault-injection-test: true
Skip-test-el-9.6-rpms: true
Skip-test-leap-15-rpms: true

Skip-func-hw-test: true
@grom72 grom72 requested a review from a team March 19, 2026 09:11
@daltonbohning daltonbohning added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Mar 19, 2026
@daltonbohning daltonbohning merged commit 6bdb8f9 into master Mar 19, 2026
32 checks passed
@daltonbohning daltonbohning deleted the grom72/DAOS-18495-ABT_THREAD_STACKSIZE-24k-md-on-ssd branch March 19, 2026 16:18
Labels

forced-landing: The PR has known failures or has intentionally reduced testing, but should still be landed.
priority: Ticket has high priority (automatically managed)
