Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-6605 control: validate auto-generated config file #4676

Merged
merged 9 commits into from Feb 19, 2021

Conversation

tanabarr
Copy link
Contributor

@tanabarr tanabarr commented Feb 17, 2021

Signed-off-by: Tom Nabarro tom.nabarro@intel.com

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@tanabarr tanabarr changed the title DAOS-XXX control: validate auto-generated config file DAOS-6605 control: validate auto-generated config file Feb 17, 2021
mjmac
mjmac previously approved these changes Feb 17, 2021
berserk-fury
berserk-fury previously approved these changes Feb 17, 2021
@tanabarr tanabarr dismissed stale reviews from berserk-fury and mjmac via d6f6c93 February 17, 2021 23:19
@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-4676/2/display/redirect

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/302/log

@daosbuild1
Copy link
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/339/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/336/log

@daosbuild1
Copy link
Collaborator

Test stage Build on Ubuntu 20.04 with Clang completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/498/log

@daosbuild1
Copy link
Collaborator

Test stage Build on CentOS 7 debug completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/447/log

@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/378/log

@daosbuild1
Copy link
Collaborator

Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/389/log

@daosbuild1
Copy link
Collaborator

Test stage Build on CentOS 7 release completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/461/log

@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4676/5/execution/node/513/log

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@berserk-fury
Copy link
Contributor

The changes look good. I have a small suggestion.

I think that DAOS new users would benefit from adding the insecure flag in the Example RPM Deployment Workflow section.

IIRC by default DAOS runs in secure mode, and since we are trying to generate the config file, we need to override it by adding the -i flag. Otherwise, daos_server will fail to start.

$ daos_server start
DAOS Server config loaded from /etc/daos/daos_server.yml
No DAOS I/O Engines in configuration, DAOS Control Server starting in discovery mode
no control log file specified; logging to stdout
No DAOS I/O Engines in configuration, DAOS Control Server starting in discovery mode
ERROR: could not load key: open /etc/daos/certs/server.key: permission denied

The same would go for the dmg example. In my opinion, the certificate stuff might be confusing for starters. The certificates are not created or installed as part of the rpm installation process. I think that dmg auto-generate assumes that the certificates already exists. The certificates are setup later in the user guide.

@tanabarr
Copy link
Contributor Author

The changes look good. I have a small suggestion.

I think that DAOS new users would benefit from adding the insecure flag in the Example RPM Deployment Workflow section.

IIRC by default DAOS runs in secure mode, and since we are trying to generate the config file, we need to override it by adding the -i flag. Otherwise, daos_server will fail to start.

$ daos_server start
DAOS Server config loaded from /etc/daos/daos_server.yml
No DAOS I/O Engines in configuration, DAOS Control Server starting in discovery mode
no control log file specified; logging to stdout
No DAOS I/O Engines in configuration, DAOS Control Server starting in discovery mode
ERROR: could not load key: open /etc/daos/certs/server.key: permission denied

The same would go for the dmg example. In my opinion, the certificate stuff might be confusing for starters. The certificates are not created or installed as part of the rpm installation process. I think that dmg auto-generate assumes that the certificates already exists. The certificates are setup later in the user guide.

The certificates can be installed prior to RPM install, that's what I do in my workflow using the utils/certs/gen_certificates.sh script. Then insecure mode doesn't need to be used. I'm not sure we should be actively encouraging the use of insecure mode in production environments, it should for development and experimentation purposes only

Comment on lines +179 to +184
Note that there is currently a [bug](https://github.com/pmem/ndctl/issues/130)
in `ndctl` that prevents the NUMA affinity of an nvdimm namespace from being
correctly reported.
This prevents generation of dual engine configs using `dmg config generate`
command on CentOS7 operating systems, the issue has reportedly been fixed on
CentOS8.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This is not completely accurate. The bug is not in ndctl, it was in certain CentOS7.x kernels, and as we discussed, it is no longer a problem in CentOS7.9 (and possibly 7.8, but that's untested).

I suggest a quick doc-only update to the effect that some CentOS 7.x kernels from before the 7.9 release were known to have a defect that prevented ndctl from being able to report on the NUMA affinity for a namespace.

Given that this documentation will be referenced for the next year or so, it's important not to imply that this is an ongoing problem.

@mjmac mjmac merged commit 53762ab into master Feb 19, 2021
@mjmac mjmac deleted the tanabarr/control-autoconfig-fixes branch February 19, 2021 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants