DAOS-17468 control: Prevent start if transparent hugepages are enabled #16313
daltonbohning merged 23 commits into master
Ticket title is 'Prevent start if transparent hugepages are enabled'
Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
d46e506 to 5c07867
Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
kjacque
left a comment
All my issues were addressed. You'll want to fix the typo in the title of the PR (DARS -> DAOS), otherwise looks good.
@ryon-jensen @JohnMalmberg can we please ensure that the transparent hugepages feature is disabled on all CI test runners? If not, it will create problems with DAOS and this PR will cause failures. TIA
…-thp Features: control Allow-unstable-test: true Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
…-thp Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/7/execution/node/1095/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/7/execution/node/1086/log
…-thp Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/8/execution/node/1081/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/8/execution/node/1095/log
@ryon-jensen functional tests are failing, presumably because THP is enabled on the test runner: https://jenkins.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16313/8/#showFailuresLink I wonder whether THP needs to be enabled on the runner? If we find situations where THP needs to be enabled, e.g. VMs, then we can add an override flag to skip the check.
…-thp Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/9/execution/node/1056/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/9/execution/node/1113/log
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16313/20/testReport/
kjacque
left a comment
Overall looks good. Nothing I would block on.
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/21/execution/node/1254/log
docs/admin/deployment.md
Outdated
If `allow_thp: true` parameter is set in server config file global section, the behavior will change
and the server will start (with THP enabled. SCM tmpfs will be mounted with `huge=always` on `dmg
NIT
Suggested change:
- and the server will start (with THP enabled. SCM tmpfs will be mounted with `huge=always` on `dmg
+ and the server will start (with THP enabled). SCM tmpfs will be mounted with `huge=always` on `dmg
Will update in a follow-on or on repush.
CI run no. 21 passed all tests apart from one known OSADrain failure. NLT test was skipped in the unit stage. Rerunning only NLT.
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16313/23/testReport/
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
… NLT" This reverts commit 5892623. Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16313/26/execution/node/1393/log
This reverts commit 3d55554. Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Doc-only: false Priority: 2 Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16313/28/testReport/
kjacque
left a comment
Nothing I'd block on, just some suggestions for a follow-on.
seenScmClsIdx = idx

if seenScmHugeIdx != -1 && scmConf.Scm.DisableHugepages != seenScmHuge {
	log.Debugf("scm_hugepages_disabled entry %v in %d doesn't match %d",
Might be worth using Error here to log the details, since we error out anyway.
yes, will add in follow-on
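For illustration, a minimal Go sketch of the kind of consistency check discussed in this thread, where the mismatch details travel with the returned error instead of only appearing at debug level. The `engineConfig` type and `checkScmHugepagesConsistent` function are hypothetical names for this sketch and are not taken from the PR; the actual DAOS code may be structured differently.

```go
package main

import "fmt"

// engineConfig is a hypothetical stand-in for a per-engine config section.
// Only the field relevant to this check is modelled.
type engineConfig struct {
	ScmDisableHugepages bool
}

// checkScmHugepagesConsistent returns an error naming both engine indices
// when the scm_hugepages_disabled setting differs between engines.
func checkScmHugepagesConsistent(engines []engineConfig) error {
	seenIdx := -1
	var seenVal bool
	for idx, ec := range engines {
		if seenIdx != -1 && ec.ScmDisableHugepages != seenVal {
			// The details are part of the error, so they are visible to
			// the caller regardless of the configured log level.
			return fmt.Errorf("scm_hugepages_disabled entry %v in engine %d doesn't match engine %d",
				ec.ScmDisableHugepages, idx, seenIdx)
		}
		seenIdx, seenVal = idx, ec.ScmDisableHugepages
	}
	return nil
}

func main() {
	engines := []engineConfig{
		{ScmDisableHugepages: true},
		{ScmDisableHugepages: false},
	}
	if err := checkScmHugepagesConsistent(engines); err != nil {
		fmt.Println("config validation failed:", err)
	}
}
```

Returning the details in the error is one option; logging them at error level before returning, as suggested above, would work equally well.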
defer test.ShowBufferOnFailure(t, buf)

conf := DefaultServer().
	WithAllowTHP(true). // Enable differences between scm_hugepages_disabled.
Would be useful to have a case where allow_thp: false?
will add in other follow-on although as it's the default value it gets tested in multiple other places
#16313) When THP feature is enabled on linux platforms, SPDK related hugepage management in DAOS performs sub-optimally. Resulting problems relate to memory accounting and fragmentation. To remedy, refuse to start daos_server if THP is enabled on platform and recommend disabling THP by applying kernel commandline parameters effective on reboot. Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
When the THP feature is enabled on Linux platforms, SPDK-related
hugepage management in DAOS performs sub-optimally. The resulting problems
relate to memory accounting and fragmentation. To remedy this, refuse to
start daos_server if THP is enabled on the platform and recommend
disabling THP by applying kernel command-line parameters that take effect on
reboot.
Features: control
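As a rough illustration of the behaviour described above, here is a minimal Go sketch of a startup check. It reads the standard Linux sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`, where the active mode is the bracketed word (for example `always madvise [never]`). The function names `parseTHPMode` and `checkTHP` and the `allowTHP` parameter are hypothetical and not taken from this PR. Treating every mode other than `never` as "enabled" is also an assumption of the sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// thpEnabledPath is the Linux sysfs file that reports the system-wide THP mode.
const thpEnabledPath = "/sys/kernel/mm/transparent_hugepage/enabled"

// parseTHPMode extracts the active mode from sysfs content such as
// "always madvise [never]". The active mode is the word in brackets.
func parseTHPMode(content string) (string, error) {
	for _, field := range strings.Fields(content) {
		if len(field) > 2 && strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return field[1 : len(field)-1], nil
		}
	}
	return "", fmt.Errorf("no active THP mode found in %q", content)
}

// checkTHP (hypothetical) returns an error when THP is active and the
// caller has not opted in through allowTHP.
// Assumption: any mode other than "never" counts as enabled.
func checkTHP(content string, allowTHP bool) error {
	mode, err := parseTHPMode(content)
	if err != nil {
		return err
	}
	if mode != "never" && !allowTHP {
		return fmt.Errorf("transparent hugepages mode is %q; disable THP, "+
			"e.g. with transparent_hugepage=never on the kernel command line", mode)
	}
	return nil
}

func main() {
	data, err := os.ReadFile(thpEnabledPath)
	if err != nil {
		// The file is absent on kernels built without THP support.
		fmt.Println("cannot read THP state:", err)
		return
	}
	if err := checkTHP(string(data), false); err != nil {
		fmt.Println("would refuse to start:", err)
		return
	}
	fmt.Println("THP disabled; OK to start")
}
```

Passing `true` for `allowTHP` mirrors the idea of the `allow_thp` override discussed in the review: the check is skipped and startup proceeds even when THP is active.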