Skip to content

DAOS-3730 tests: Starting agent and server with common params#2403

Merged
mjmac merged 16 commits intomasterfrom
DAOS-3730_2
Apr 23, 2020
Merged

DAOS-3730 tests: Starting agent and server with common params#2403
mjmac merged 16 commits intomasterfrom
DAOS-3730_2

Conversation

@phender
Copy link
Contributor

@phender phender commented Apr 11, 2020

Ensuring the daos_server and daos_agent commands are started by
TestWithServers using configuration files with equal common settings.
These changes also enable configuring the transport credentials of both
the daos_agent and daos_server through the test yaml file. Since the
default configuration file settings are now stored in classes the
src/tests/ftest/config_file_gen.py utility has been added to replace
the src/tests/ftest/data/daos_*_baseline.yaml files by generating both
files with a common access point and the defaults used by the tests.

Test-tag-hw-large: pr,hw,large metadata_ior daosracer

Signed-off-by: Phillip Henderson phillip.henderson@intel.com

phender added 2 commits April 11, 2020 10:05
Ensuring the daos_server and daos_agent commands are started by
TestWithServers using configuration files with equal common settings.
These changes also enable configuring the transport credentials of both
the daos_agent and daos_server through the test yaml file.  Since the
default configuration file settings are now stored in classes the
src/tests/ftest/config_file_gen.py utility has been added to replace
the src/tests/ftest/data/daos_*_baseline.yaml files by generating both
files with a common access point and the defaults used by the tests.

Test-tag-hw-large: pr,hw,large metadata_ior daosracer

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage Test CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/570/log

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/616/log

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/760/log

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/2/execution/node/826/log

phender added 2 commits April 12, 2020 22:47
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage Test CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/618/log

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/719/log

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/707/log

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/3/execution/node/690/log

@daosbuild1
Copy link
Collaborator

phender added 2 commits April 13, 2020 09:22
Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/4/execution/node/706/log

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/4/execution/node/755/log

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/4/execution/node/684/log

@daosbuild1
Copy link
Collaborator

Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/5/execution/node/636/log

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/5/execution/node/761/log

Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-2403/6/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules

FYI: Errors found in lines not modified in the patch:

src/tests/ftest/io/nvme_fragmentation.py:141:
(pylint-protected-access) Access to a protected member _get_result of a client class

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/6/execution/node/637/log

@daosbuild1
Copy link
Collaborator

Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1 daosbuild1 dismissed their stale review April 14, 2020 03:38

Updated patch

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/7/execution/node/759/log

Test-tag-hw-large: pr,hw,large metadata_ior daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/8/execution/node/824/log

phender added 3 commits April 15, 2020 11:24
Dropping metadata_ior tag as this test is not run in PR nor weekly
testing.

Test-tag-hw-large: pr,hw,large daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
Test-tag-hw-large: pr,hw,large daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@phender phender requested a review from mjean308 April 15, 2020 21:20
@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/10/execution/node/759/log

Test-tag-hw-large: pr,hw,large daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-2403/11/execution/node/764/log

phender added 2 commits April 17, 2020 14:20
Test-tag-hw-large: pr,hw,large daosracer soak_smoke

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
@daosbuild1
Copy link
Collaborator

@phender
Copy link
Contributor Author

phender commented Apr 20, 2020

Build 13 resulted in 2 test failures:

  • daos_test/daos_core_test.py variant 13 (test_C): known issue DAOS-3945
  • daos_test/daos_core_test-rebuild.py variant 13 (test_r_26): new issue DAOS-4630 (may be related to DAOS-4626)

Copy link
Contributor

@shimizukko shimizukko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Comment on lines +298 to +307
"rc=0",
"if pgrep --list-full {}".format(pattern),
"then rc=1",
"sudo pkill {}".format(pattern),
"if pgrep --list-full {}".format(pattern),
"then sleep 5",
"pkill --signal KILL {}".format(pattern),
"fi",
"fi",
"exit $rc",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the PR that was going to address the shutdown issue, but it seems like it would be better to set rc=1 if the second pgrep finds processes running. Also, seems like the sleep 5 should be after the first pkill in order to give things time to shutdown before moving on to more aggressive means of cleaning up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR may eventually have a role in the shutdown issue (by means of enabling clush to launch servers), but it is not intended to directly address the issue. As mentioned in the docstring, the intent of the exit status (rc) is to indicate whether or not processes where detected (and then attempted to be killed). Its current intended use is for ensuring the test starts clean by killing any leftover processes from a previous test and it is used in the kill() method of the DaosAgentManager and DaosServerManger. In terms of the sleep 5 placement, it probably should be moved as you suggested. Given that this method is currently called as a means of killing processes that should have been killed in the previous test's tearDown() and tests will be unlikely to hit the first pkill anyway (its more of a safety net), would you be ok with fixing it in a future PR or should I resolve it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand you correctly, this sequence isn't currently used as part of the shutdown, and therefore making the suggested changes on this PR won't solve the shutdown issue. If that's the case, then I'm fine with getting this landed as-is and adjusting later in a PR focused on the shutdown problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you understand me correctly.

Copy link
Contributor

@mjean308 mjean308 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@phender phender requested a review from a team April 22, 2020 14:59
Copy link
Contributor

@amjustin13 amjustin13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments and questions, nothing major. I'm still going through a couple of the files, might have a couple more comments later.

"""
super(DaosAgentConfig, self).get_params(test)
self.transport_params.get_params(test)
def include_local_host(hosts):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a major thing... just a thought. This wouldn't be used for other purposes? This seems like a very general thing to do and could be something that can live in general_utils.py.

Copy link
Contributor Author

@phender phender Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should not need to use it, allowing the test control node to be independent of the hosts being used for testing. Commands like daos container create ... should be run on one of the hosts identified as a client - not the test control host.

self.pattern = "Listening on "

# If specified use the configuration file from the YamlParameters object
default_yaml_file = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default_yaml_file is None by default and will only have a value if self.yaml is a YamlParameters object. So there isn't a way to set the default, other than setting self.yaml.filename. In which case, would it be beneficial to have the default_yaml_file default to some filename instead of None?

Copy link
Contributor Author

@phender phender Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically we will always pass in a yaml_cfg argument as it defines some of the other configuration file parameters that are common between the agent and server configuration files (and these values should be the same in order for communication to occur correctly), but if you wanted a different default for the DaosAgentCommand.config attribute, you could just call DaosAgentCommand.config.update_default(<new_deafult_value>) after creating a DaosAgentCommand object

Comment on lines +86 to 107
# Support specifying timeout values with units, e.g. "1d 2h 3m 4s".
# Any unit combination may be used, but they must be specified in
# descending order. Spaces can optionally be used between units and
# values. The first unit character is required; other unit characters
# are optional. The units are case-insensitive.
# The following units are supported:
# - days e.g. 1d, 1 day
# - hours e.g. 2h, 2 hrs, 2 hours
# - minutes e.g. 3m, 3 mins, 3 minutes
# - seconds e.g. 4s, 4 secs, 4 seconds
if isinstance(self.timeout, str):
pattern = r""
for interval in ("days", "hours", "minutes", "seconds"):
pattern += r"(?:(\d+)(?:\s*{0}[{1}]*\s*)){{0,1}}".format(
interval[0], interval[1:])
# pylint: disable=no-member
dhms = re.search(pattern, self.timeout, re.IGNORECASE).groups()
# pylint: enable=no-member
self.timeout = 0
for index, multiplier in enumerate([24 * 60 * 60, 60 * 60, 60, 1]):
if dhms[index] is not None:
self.timeout += multiplier * int(dhms[index])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not place this it's own function? or even place it in general_utils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a function in general_utils.py. Currently I believe this is the only place it is used.

@mjmac
Copy link
Contributor

mjmac commented Apr 22, 2020

Unless there are major objections or concerns with landing this PR as-is, I'd like to advocate for getting it in sooner rather than later. Apparently there are more weekly tests that will be failing until I can land a fix for DAOS-4546.

Update: Created PR #2511 based on the work in this PR. When this lands, that one should be all ready to go. Was pretty easy to do, nice work @phender!

@mjmac mjmac merged commit e27fe60 into master Apr 23, 2020
@mjmac mjmac deleted the DAOS-3730_2 branch April 23, 2020 16:16
saurabhtandan pushed a commit that referenced this pull request May 13, 2020
Ensuring the daos_server and daos_agent commands are started by
TestWithServers using configuration files with equal common settings.
These changes also enable configuring the transport credentials of both
the daos_agent and daos_server through the test yaml file.  Since the
default configuration file settings are now stored in classes the
src/tests/ftest/config_file_gen.py utility has been added to replace
the src/tests/ftest/data/daos_*_baseline.yaml files by generating both
files with a common access point and the defaults used by the tests.

Signed-off-by: Phillip Henderson <phillip.henderson@intel.com>
Conflicts:
	src/tests/ftest/control/daos_admin_privileged.py
	src/tests/ftest/control/super_block_versioning.py
	src/tests/ftest/data/daos_server_baseline.yaml
	src/tests/ftest/io/nvme_fragmentation.py
	src/tests/ftest/launch.py
	src/tests/ftest/server/metadata.py
	src/tests/ftest/util/agent_utils.py
	src/tests/ftest/util/dfuse_utils.py
	src/tests/ftest/util/fio_test_base.py
	src/tests/ftest/util/ior_test_base.py
	src/tests/ftest/util/ior_utils.py
	src/tests/ftest/util/mdtest_test_base.py
	src/tests/ftest/util/mdtest_utils.py
	src/tests/ftest/util/server_utils.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

6 participants