@portante portante commented Nov 21, 2020

Merge collect sysinfo and metadata log

Merge pbench-collect-sysinfo and pbench-metadata-log operations and behaviors into the Tool Meister framework.

To make it easier to review, this PR was broken into 4 commits:

  • All the changes to the agent/bench-scripts directory without unit tests (14 files)
  • All the changes to the agent/util-scripts and lib/pbench/agent hierarchies (23 files)
  • UML Sequence Diagram updates (1 file)
  • All testing related updates (100+ files)

Once reviewed, we merged them back down to a single commit.

Description of combined changes

Aside from the ssh commands issued by the bench-scripts, the last remaining ssh commands in the pbench-agent that are not involved in orchestrating the Tool Meisters are those that collect hostname -s data on all the registered remote tool hosts.

The Tool Meisters are now responsible for collecting information about their environment, including the version of the pbench agent and a more comprehensive set of host name information, and for returning that with their startup acknowledgement payload. This change also fixes a bug where registered host labels were no longer being used in the on-disk data capture directories, as they had been previously.
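
As a rough illustration of the kind of environment data a Tool Meister might gather for its startup acknowledgement, the following sketch uses only the Python standard library. The function name, the dictionary keys, and the `version` argument are assumptions made for illustration; they are not the names used in this PR.

```python
import json
import socket


def collect_host_info(version: str) -> dict:
    """Gather host name details for a startup acknowledgement payload.

    Hypothetical helper: the keys below are illustrative only.
    """
    hostname = socket.gethostname()
    return {
        # Agent version, supplied by the caller in this sketch
        "version": version,
        # Host name as reported by the operating system
        "hostname": hostname,
        # Short form: the name cut at the first dot, like `hostname -s`
        "hostname_short": hostname.split(".")[0],
        # Fully qualified name, as best the resolver can determine
        "hostname_fqdn": socket.getfqdn(),
    }


if __name__ == "__main__":
    # "0.0.0-example" is a placeholder, not a real agent version
    print(json.dumps(collect_host_info("0.0.0-example"), indent=2))
```

Returning a plain dictionary keeps the result easy to serialize as JSON alongside the rest of an acknowledgement payload.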

This collected data set is now recorded by the Tool Data Sink, after restructuring the action flow so that clients send action payloads to the Tool Data Sink, which then forwards them to the Tool Meisters. This gives the Tool Data Sink the ability to collect all the Tool Meister information and write it to disk before acknowledging to the client that startup is complete.
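
The restructured flow can be pictured with a small in-memory sketch. It ignores the transport between components entirely, and the class name, method name, and payload fields are hypothetical; it only shows the ordering of forward, record, then acknowledge.

```python
from typing import Callable, Dict, List

# A Tool Meister is modeled as a callable that takes an action payload and
# returns a reply. The real components are separate processes.
ToolMeister = Callable[[dict], dict]


class ToolDataSinkSketch:
    """Hypothetical stand-in for the Tool Data Sink's gatekeeper role."""

    def __init__(self, tool_meisters: Dict[str, ToolMeister]) -> None:
        self.tool_meisters = tool_meisters
        # Stands in for writing the collected data to disk
        self.recorded: List[dict] = []

    def handle_action(self, payload: dict) -> dict:
        # 1. Forward the client's payload to every Tool Meister.
        replies = {host: tm(payload) for host, tm in self.tool_meisters.items()}
        # 2. Record what the Tool Meisters reported before replying.
        self.recorded.append({"action": payload["action"], "replies": replies})
        # 3. Only now acknowledge to the client.
        ok = all(r.get("status") == "success" for r in replies.values())
        return {
            "action": payload["action"],
            "status": "success" if ok else "failure",
        }
```

Because the client only ever talks to the sink, the sink sees every reply before the client does, which is what makes "write to disk, then acknowledge" possible.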

We add an entirely new section in the metadata.log file which is per-host/per-tool, recording the tool options, and the output from the tool install checks (if any). This section supersedes the previous tool options in the per-host sections, though we leave those options there for compatibility for now.
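
A minimal sketch of what writing such a per-host/per-tool section could look like, assuming metadata.log is handled as an INI-style file via `configparser`. The section naming scheme (`tools/<host>/<tool>`) and the option names are assumptions for illustration, not necessarily what this PR writes.

```python
import configparser
import io


def add_tool_section(
    mdlog: configparser.ConfigParser,
    host: str,
    tool: str,
    options: str,
    install_check_output: str = "",
) -> None:
    """Add one per-host/per-tool section (hypothetical layout)."""
    section = f"tools/{host}/{tool}"
    mdlog.add_section(section)
    mdlog.set(section, "options", options)
    if install_check_output:
        # Only recorded when the install check produced output
        mdlog.set(section, "install_check_output", install_check_output)


if __name__ == "__main__":
    # interpolation=None so that a literal "%" in tool options is kept as-is
    mdlog = configparser.ConfigParser(interpolation=None)
    add_tool_section(mdlog, "host1.example.com", "iostat", "--interval=3")
    buf = io.StringIO()
    mdlog.write(buf)
    print(buf.getvalue())
```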

We add graceful handling of bad PUT requests from Tool Meisters to the Tool Data Sink.

With this refactoring, we also collapse all the "sysinfo", "init", and "end" actions into the respective pbench-tool-meister-start and pbench-tool-meister-stop interfaces to simplify the CLI behaviors.

To make this possible, we add a new Client API used by the CLI interfaces pbench-tool-meister-start and pbench-tool-meister-stop.

We have also undertaken a major refactoring of pbench-sysinfo-dump to remove the dependency on base for environment variables. As a result of this work, we drop the stockpile configuration entirely, as it is no longer required.

We have enhanced the Tool Meister tests to use persistent tools, and enhanced the unit test framework to properly clean up the Tool Meister test environment when tests fail.

We have updated the UML sequence diagram describing the Tool Meister operation to add the init/end/sysinfo steps and to show that the Tool Data Sink is the gatekeeper of actions, forwarding them on to the Tool Meisters.

Notes

Okay, this is where Python 3 falls down: replacing command line utilities available via bash with Python 3 modules.

Turns out you can't use shutil.copytree() inside a container to replace cp -RL.

That code attempts to use os.setxattr() at the lowest level to copy all the attributes properly. But when not running as a real root user in a container, you can't copy all attributes, and a "Permission denied" exception is raised.

The original pbench-metadata-log code just used cp -rL and it worked both in and out of a container. So we just invoke that command directly again from the new Python code in the tool_data_sink.py module.
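
Shelling out to `cp` from Python might look roughly like the sketch below. The helper name is hypothetical and this is not the code in tool_data_sink.py; it only shows the general pattern of invoking `cp -rL` through `subprocess`.

```python
import subprocess
from pathlib import Path


def copy_tree_following_links(src: Path, dst: Path) -> None:
    """Copy src to dst with `cp -rL`, dereferencing symbolic links.

    Hypothetical helper; raises subprocess.CalledProcessError if cp fails.
    """
    subprocess.run(
        ["cp", "-rL", str(src), str(dst)],
        check=True,  # a non-zero exit status becomes an exception
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # kept on the exception for error reporting
    )
```

Passing the arguments as a list avoids shell quoting problems with unusual path names.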

@portante portante added the enhancement, Agent, fio (pbench-fio benchmark related), trafficgen (pbench-trafficgen benchmark related), tools (of and related to the operation and behavior of various tools: iostat, sar, etc.), run-benchmark (pbench-run-benchmark related), specjbb2005 (related to the specjbb2005 benchmark script), and user-benchmark (of and relating to the pbench-user-benchmark interface) labels Nov 21, 2020
@portante portante added this to the v0.71 milestone Nov 21, 2020
@portante portante self-assigned this Nov 21, 2020
@portante portante force-pushed the merge-coll-sysinfo-metadata-log branch 2 times, most recently from 9af8657 to 048a86a Compare November 23, 2020 20:35
@portante portante left a comment

Still a work in progress: we need to implement the pbench-metadata-log functionality, perhaps as a callable API.

@portante portante force-pushed the merge-coll-sysinfo-metadata-log branch from 048a86a to 1a2fd2f Compare November 24, 2020 05:41
@portante portante force-pushed the merge-coll-sysinfo-metadata-log branch 2 times, most recently from 899e513 to dccd59c Compare November 30, 2020 05:05
Aside from the `ssh` commands issued by the `bench-scripts`, the last
remaining `ssh` commands in the pbench-agent that are not involved in
orchestrating the Tool Meisters are those that collect `hostname -s`
data on all the registered remote tool hosts.

The Tool Meisters are now given the responsibility for collecting
information about their environment, including the version of the
pbench agent, and a more comprehensive collection of host name
information, returning that with their startup acknowledgement
payload.  This change also fixes a bug where registered host labels
were no longer being used in the on-disk data capture directories, as
they had been previously.

This collected data set is now recorded by the Tool Data Sink, after
restructuring the action flow so that clients send action payloads to
the Tool Data Sink, which then forwards those payloads to the Tool
Meisters.  This gives the Tool Data Sink the ability to collect all the
Tool Meister information and write it to disk before acknowledging to
the client that startup is complete.

We add an entirely new section in the `metadata.log` file which is
per-host/per-tool, recording the tool options, and the output from the
tool install checks (if any).  This section supersedes the previous
tool options in the per-host sections, though we leave those options
there for compatibility for now.

We add graceful handling of bad PUT requests from Tool Meisters to the
Tool Data Sink.

With this refactoring, we also collapse all the "sysinfo", "init", and
"end" actions into the respective `pbench-tool-meister-start` and
`pbench-tool-meister-stop` interfaces to simplify the CLI behaviors.

To make this possible, we add a new `Client` API used by the CLI
interfaces `pbench-tool-meister-start` and `pbench-tool-meister-stop`.

We have also undertaken a major refactoring of `pbench-sysinfo-dump` to
remove the dependency on `base` for environment variables.  As a result
of this work, we drop the stockpile configuration entirely, as it is no
longer required.

The Tool Meister unit tests were enhanced to use persistent tools, and
the unit test framework now properly cleans up the Tool Meister test
environment when tests fail.

We moved the `pbench-sysinfo-dump` CLI command to the
`agent/util-scripts/tool-meister` directory so that users won't see
that internal command in their `PATH`.

We updated the UML sequence diagram to reflect the code, adding the
`init`/`end`/`sysinfo` actions, and the fact that the Tool Data Sink
is now the gatekeeper of actions, forwarding them on to the Tool
Meisters instead of the client sending actions directly to the Tool
Meisters.

Since we no longer have the CLI command `pbench-collect-sysinfo`, we
drop it from the unit tests, and replace it with more appropriate tests.

To that end:
  * tests 25 - 30 now just invoke `pbench-verify-sysinfo-options`
  * tests 54 & 55 just invoke the `--help` option on
    `pbench-tool-meister-start` and `-stop`
  * tests 23 & 24 are dropped

The Tool Meisters need to start their persistent tools first before the
collectors can run correctly.  The PCP pmlogger collectors wait for the
remote pmcds to start listening on the expected ports.  We have to pass
along the "init" action to the Tool Meisters before we set up the PCP
collectors.
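
Waiting for a remote service to start listening can be sketched as a
simple connect-and-retry loop.  The function name, timeout, and retry
interval below are illustrative assumptions, not the values or code
used by the PCP collectors in this PR.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 60.0, interval: float = 0.5
) -> bool:
    """Return True once host:port accepts a TCP connection.

    Returns False if no connection succeeds within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): pause, then retry
            time.sleep(interval)
    return False
```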

The `pbench-tool-meister-stop` command will now return an error when the
`end` operation fails, or if we can't create the directory for the `end`
operation to work.  We always wait for the local Tool Data Sink and
local Tool Meister to exit before killing the Redis server, regardless
of the success or failure of the `terminate` operation.
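
The shutdown ordering described above can be sketched with plain
callables standing in for the real operations.  All names and the exit
codes here are hypothetical, and the sketch assumes `terminate` is still
attempted after a failed `end`, which the text does not state.

```python
from typing import Callable


def stop_sketch(
    end: Callable[[], bool],
    terminate: Callable[[], bool],
    wait_for_local_exit: Callable[[], None],
    kill_redis: Callable[[], None],
) -> int:
    """Return 0 on success, 1 on failure (hypothetical exit codes)."""
    ret = 0
    if not end():
        # A failed `end` operation is reported as an error
        ret = 1
    try:
        if not terminate():
            ret = 1
    finally:
        # Runs whether `terminate` succeeded, failed, or raised: wait
        # for the local processes to exit first, then stop Redis.
        wait_for_local_exit()
        kill_redis()
    return ret
```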
@portante portante force-pushed the merge-coll-sysinfo-metadata-log branch from c11a78b to 8c4a0f6 Compare March 18, 2021 01:19
@portante portante requested a review from dbutenhof March 18, 2021 01:19
@Maxusmusti Maxusmusti left a comment

Re-tested with local and remote nodes, using both transient and persistent tools; works as expected.

@dbutenhof dbutenhof left a comment

Doesn't appear that GitHub thinks anything significant has changed since my last review, so let's get it in.

@portante portante merged commit 33027e8 into distributed-system-analysis:main Mar 18, 2021
@portante portante linked an issue Apr 12, 2021 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

  • Merge pbench-collect-sysinfo and pbench-metadata-log into pbench-tool-meister-start
  • Each Tool Meister instance should record its pbench-agent version
