DAOS-3730 tests: Starting agent and server with common params#2403
DAOS-3730 tests: Starting agent and server with common params#2403
Conversation
Ensuring the daos_server and daos_agent commands are started by TestWithServers using configuration files with equal common settings. These changes also enable configuring the transport credentials of both the daos_agent and daos_server through the test yaml file. Since the default configuration file settings are now stored in classes the src/tests/ftest/config_file_gen.py utility has been added to replace the src/tests/ftest/data/daos_*_baseline.yaml files by generating both files with a common access point and the defaults used by the tests. Test-tag-hw-large: pr,hw,large metadata_ior daosracer Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Test CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/570/log |
|
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/616/log |
|
Test stage Functional completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/675/log |
|
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/760/log |
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/826/log |
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Test CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/618/log |
|
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/719/log |
|
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/707/log |
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/690/log |
|
Test stage Functional completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/731/log |
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/4/execution/node/706/log |
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/4/execution/node/755/log |
|
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/4/execution/node/684/log |
|
Test stage Functional completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/4/execution/node/718/log |
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/5/execution/node/636/log |
|
Test stage Functional completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/5/execution/node/648/log |
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/5/execution/node/761/log |
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
daosbuild1
left a comment
There was a problem hiding this comment.
Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-2403/6/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules
FYI: Errors found in lines not modified in the patch:
src/tests/ftest/io/nvme_fragmentation.py:141:
(pylint-protected-access) Access to a protected member _get_result of a client class
|
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/6/execution/node/60/log |
|
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/6/execution/node/637/log |
|
Test stage Functional completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/6/execution/node/674/log |
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Functional completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/7/execution/node/649/log |
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/7/execution/node/759/log |
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/8/execution/node/824/log |
Dropping metadata_ior tag as this test is not run in PR nor weekly testing. Test-tag-hw-large: pr,hw,large daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
Test-tag-hw-large: pr,hw,large daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/10/execution/node/759/log |
Test-tag-hw-large: pr,hw,large daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage Functional completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-2403/11/testReport/(root)/ |
|
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/11/execution/node/764/log |
Test-tag-hw-large: pr,hw,large daosracer soak_smoke Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
|
Test stage run_test.sh completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/12/execution/node/372/log |
| "rc=0", | ||
| "if pgrep --list-full {}".format(pattern), | ||
| "then rc=1", | ||
| "sudo pkill {}".format(pattern), | ||
| "if pgrep --list-full {}".format(pattern), | ||
| "then sleep 5", | ||
| "pkill --signal KILL {}".format(pattern), | ||
| "fi", | ||
| "fi", | ||
| "exit $rc", |
There was a problem hiding this comment.
Not sure if this is the PR that was going to address the shutdown issue, but it seems like it would be better to set rc=1 if the second pgrep finds processes running. Also, seems like the sleep 5 should be after the first pkill in order to give things time to shutdown before moving on to more aggressive means of cleaning up.
There was a problem hiding this comment.
This PR may eventually have a role in the shutdown issue (by means of enabling clush to launch servers), but it is not intended to directly address the issue. As mentioned in the docstring, the intent of the exit status (rc) is to indicate whether or not processes where detected (and then attempted to be killed). Its current intended use is for ensuring the test starts clean by killing any leftover processes from a previous test and it is used in the kill() method of the DaosAgentManager and DaosServerManger. In terms of the sleep 5 placement, it probably should be moved as you suggested. Given that this method is currently called as a means of killing processes that should have been killed in the previous test's tearDown() and tests will be unlikely to hit the first pkill anyway (its more of a safety net), would you be ok with fixing it in a future PR or should I resolve it here?
There was a problem hiding this comment.
If I understand you correctly, this sequence isn't currently used as part of the shutdown, and therefore making the suggested changes on this PR won't solve the shutdown issue. If that's the case, then I'm fine with getting this landed as-is and adjusting later in a PR focused on the shutdown problem.
There was a problem hiding this comment.
Yes, you understand me correctly.
amjustin13
left a comment
There was a problem hiding this comment.
Just a few comments and questions, nothing major. I'm still going through a couple of the files, might have a couple more comments later.
| """ | ||
| super(DaosAgentConfig, self).get_params(test) | ||
| self.transport_params.get_params(test) | ||
| def include_local_host(hosts): |
There was a problem hiding this comment.
Not a major thing... just a thought. This wouldn't be used for other purposes? This seems like a very general thing to do and could be something that can live in general_utils.py.
There was a problem hiding this comment.
Ideally we should not need to use it, allowing the test control node to be independent of the hosts being used for testing. Commands like daos container create ... should be run on one of the hosts identified as a client - not the test control host.
| self.pattern = "Listening on " | ||
|
|
||
| # If specified use the configuration file from the YamlParameters object | ||
| default_yaml_file = None |
There was a problem hiding this comment.
The default_yaml_file is None by default and will only have a value if self.yaml is a YamlParameters object. So there isn't a way to set the default, other than setting self.yaml.filename. In which case, would it be beneficial to have the default_yaml_file default to some filename instead of None?
There was a problem hiding this comment.
Typically we will always pass in a yaml_cfg argument as it defines some of the other configuration file parameters that are common between the agent and server configuration files (and these values should be the same in order for communication to occur correctly), but if you wanted a different default for the DaosAgentCommand.config attribute, you could just call DaosAgentCommand.config.update_default(<new_deafult_value>) after creating a DaosAgentCommand object
| # Support specifying timeout values with units, e.g. "1d 2h 3m 4s". | ||
| # Any unit combination may be used, but they must be specified in | ||
| # descending order. Spaces can optionally be used between units and | ||
| # values. The first unit character is required; other unit characters | ||
| # are optional. The units are case-insensitive. | ||
| # The following units are supported: | ||
| # - days e.g. 1d, 1 day | ||
| # - hours e.g. 2h, 2 hrs, 2 hours | ||
| # - minutes e.g. 3m, 3 mins, 3 minutes | ||
| # - seconds e.g. 4s, 4 secs, 4 seconds | ||
| if isinstance(self.timeout, str): | ||
| pattern = r"" | ||
| for interval in ("days", "hours", "minutes", "seconds"): | ||
| pattern += r"(?:(\d+)(?:\s*{0}[{1}]*\s*)){{0,1}}".format( | ||
| interval[0], interval[1:]) | ||
| # pylint: disable=no-member | ||
| dhms = re.search(pattern, self.timeout, re.IGNORECASE).groups() | ||
| # pylint: enable=no-member | ||
| self.timeout = 0 | ||
| for index, multiplier in enumerate([24 * 60 * 60, 60 * 60, 60, 1]): | ||
| if dhms[index] is not None: | ||
| self.timeout += multiplier * int(dhms[index]) |
There was a problem hiding this comment.
Why not place this it's own function? or even place it in general_utils?
There was a problem hiding this comment.
It could be a function in general_utils.py. Currently I believe this is the only place it is used.
|
Unless there are major objections or concerns with landing this PR as-is, I'd like to advocate for getting it in sooner rather than later. Apparently there are more weekly tests that will be failing until I can land a fix for DAOS-4546. Update: Created PR #2511 based on the work in this PR. When this lands, that one should be all ready to go. Was pretty easy to do, nice work @phender! |
Ensuring the daos_server and daos_agent commands are started by TestWithServers using configuration files with equal common settings. These changes also enable configuring the transport credentials of both the daos_agent and daos_server through the test yaml file. Since the default configuration file settings are now stored in classes the src/tests/ftest/config_file_gen.py utility has been added to replace the src/tests/ftest/data/daos_*_baseline.yaml files by generating both files with a common access point and the defaults used by the tests. Signed-off-by: Phillip Henderson <phillip.henderson@intel.com> Conflicts: src/tests/ftest/control/daos_admin_privileged.py src/tests/ftest/control/super_block_versioning.py src/tests/ftest/data/daos_server_baseline.yaml src/tests/ftest/io/nvme_fragmentation.py src/tests/ftest/launch.py src/tests/ftest/server/metadata.py src/tests/ftest/util/agent_utils.py src/tests/ftest/util/dfuse_utils.py src/tests/ftest/util/fio_test_base.py src/tests/ftest/util/ior_test_base.py src/tests/ftest/util/ior_utils.py src/tests/ftest/util/mdtest_test_base.py src/tests/ftest/util/mdtest_utils.py src/tests/ftest/util/server_utils.py
Ensuring the daos_server and daos_agent commands are started by
TestWithServers using configuration files with equal common settings.
These changes also enable configuring the transport credentials of both
the daos_agent and daos_server through the test yaml file. Since the
default configuration file settings are now stored in classes the
src/tests/ftest/config_file_gen.py utility has been added to replace
the src/tests/ftest/data/daos_*_baseline.yaml files by generating both
files with a common access point and the defaults used by the tests.
Test-tag-hw-large: pr,hw,large metadata_ior daosracer
Signed-off-by: Phillip Henderson phillip.henderson@intel.com