Merge coll sysinfo metadata log #2005
portante left a comment:
Still a work in progress, as we need to implement the `pbench-metadata-log` functionality as a callable API, perhaps.
The last set of `ssh` commands issued by the pbench-agent that are not
involved in orchestrating the Tool Meisters, outside of those issued by
the `bench-scripts`, are for `hostname -s` data collection on all
the registered remote tool hosts.
The Tool Meisters are now given the responsibility for collecting
information about their environment, including the version of the
pbench agent, and a more comprehensive collection of host name
information, returning that with their startup acknowledgement
payload. This change also addresses a previous bug where registered
host labels were not being used in the on-disk data capture
directories as they were before.
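A minimal sketch of the idea in Python, assuming the Tool Meister builds its startup acknowledgement as a plain dictionary. The function names and payload keys (`collect_host_info`, `startup_ack`, `hostname_s`, and so on) are hypothetical, not the actual pbench code; `hostname -s` is emulated locally by truncating the host name at the first dot, so no `ssh` round trip is needed.

```python
import platform
import socket


def collect_host_info(pbench_version: str) -> dict:
    """Gather host-name details locally instead of running `hostname -s` over ssh.

    Hypothetical helper: the payload keys are illustrative only.
    """
    hostname = socket.gethostname()
    return {
        "pbench_version": pbench_version,
        "hostname": hostname,
        # Same result as `hostname -s`: everything before the first dot.
        "hostname_s": hostname.split(".", 1)[0],
        "platform_node": platform.node(),
    }


def startup_ack(label: str, pbench_version: str) -> dict:
    """Build a hypothetical startup acknowledgement carrying the host info.

    The registered host label travels with the payload so the receiver can
    name on-disk directories after it.
    """
    return {
        "kind": "ack",
        "label": label,
        "hostinfo": collect_host_info(pbench_version),
    }
```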
This collected data set is now recorded by the Tool Data Sink, after
restructuring the action flow so that clients send action payloads to
the Tool Data Sink, which then forwards those payloads to the Tool
Meisters. This gives the Tool Data Sink the ability to collect all the
Tool Meister information and write it to disk before acknowledging to
the client that startup is complete.
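The gatekeeper flow can be sketched in-process, with no Redis involved: the hypothetical `ToolDataSinkSketch` forwards a client's action to each (fake) Tool Meister, writes the collected replies to disk, and only then returns the acknowledgement. The class names and the `<action>-tm-info.json` file name are illustrative assumptions, not pbench's actual layout.

```python
import json
from pathlib import Path


class FakeToolMeister:
    """Hypothetical stand-in for a remote Tool Meister."""

    def __init__(self, label: str, hostinfo: dict):
        self.label = label
        self.hostinfo = hostinfo

    def handle(self, action: str) -> dict:
        # A real Tool Meister would act on the request; this one just reports.
        return {"label": self.label, "action": action,
                "status": "success", "hostinfo": self.hostinfo}


class ToolDataSinkSketch:
    """Hypothetical gatekeeper: forward, record, then acknowledge."""

    def __init__(self, meisters, result_dir: Path):
        self.meisters = meisters
        self.result_dir = result_dir

    def handle_client_action(self, action: str) -> dict:
        # 1. Forward the client's action to every Tool Meister.
        replies = [tm.handle(action) for tm in self.meisters]
        # 2. Persist what the Tool Meisters reported before acking the client.
        record = {r["label"]: r["hostinfo"] for r in replies}
        out = self.result_dir / f"{action}-tm-info.json"  # assumed file name
        out.write_text(json.dumps(record, indent=2, sort_keys=True))
        # 3. Only now acknowledge the client with the overall status.
        ok = all(r["status"] == "success" for r in replies)
        return {"action": action, "status": "success" if ok else "failure",
                "recorded": str(out)}
```

Writing before acknowledging means a client that sees "success" can rely on the data already being on disk.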
We add an entirely new section in the `metadata.log` file which is
per-host/per-tool, recording the tool options, and the output from the
tool install checks (if any). This section supersedes the previous
tool options in the per-host sections, though we leave those options
there for compatibility for now.
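Since `metadata.log` is an INI-style file, a per-host/per-tool section can be sketched with Python's `configparser`. The `tools/<host>/<tool>` section name and the `options` / `install_check_output` keys are assumptions for illustration, not necessarily the layout pbench writes.

```python
import configparser
import io


def add_tool_sections(cfg: configparser.ConfigParser, host: str, tools: dict) -> None:
    """Add one per-host/per-tool section for each registered tool.

    `tools` maps tool name -> (options string, install-check output or None).
    Section and key names are hypothetical.
    """
    for tool, (options, install_check) in tools.items():
        section = f"tools/{host}/{tool}"  # assumed naming scheme
        cfg[section] = {"options": options}
        if install_check:
            cfg[section]["install_check_output"] = install_check


def render(cfg: configparser.ConfigParser) -> str:
    """Serialize the configuration the way it would be written to disk."""
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

Creating the parser with `interpolation=None` avoids surprises if tool options ever contain a `%` character.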
We add graceful handling of bad PUT requests from Tool Meisters to the
Tool Data Sink.
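One way to make bad PUTs harmless is to validate the request up front and answer with a 4xx status instead of letting an exception escape. In this sketch the `content-length` and `md5sum` header names, and the use of MD5 as the integrity check, are assumptions; the actual Tool Data Sink checks may differ.

```python
import hashlib
from http import HTTPStatus


def handle_put(headers: dict, body: bytes) -> tuple:
    """Validate a Tool Meister PUT; return (status, message) instead of raising.

    Header names are hypothetical; keys are expected in lower case.
    """
    try:
        length = int(headers["content-length"])
    except (KeyError, ValueError):
        return HTTPStatus.BAD_REQUEST, "missing or invalid Content-Length"
    if length != len(body):
        return HTTPStatus.BAD_REQUEST, "Content-Length does not match body size"
    expected = headers.get("md5sum")
    if not expected:
        return HTTPStatus.BAD_REQUEST, "missing md5sum header"
    if hashlib.md5(body).hexdigest() != expected:
        return HTTPStatus.BAD_REQUEST, "md5sum mismatch"
    return HTTPStatus.OK, "received"
```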
With this refactoring, we also collapse all the "sysinfo", "init", and
"end" actions into the respective `pbench-tool-meister-start` and
`pbench-tool-meister-stop` interfaces to simplify the CLI behaviors.
To make this possible, we add a new `Client` API used by the CLI
interfaces `pbench-tool-meister-start` and `pbench-tool-meister-stop`.
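A rough sketch of how a `Client`-style API lets the two CLI commands fold the individual actions together. `ClientSketch` and its `send` callable are hypothetical stand-ins (the real requests go through the Tool Data Sink), and the action order inside `start()` and `stop()` is an assumption made for illustration.

```python
from typing import Callable, List


class ClientSketch:
    """Hypothetical stand-in for the `Client` API used by start/stop.

    `send` models the transport to the Tool Data Sink: it takes an action
    name and returns True when the Tool Data Sink reports success.
    """

    def __init__(self, send: Callable[[str], bool]):
        self.send = send
        self.sent: List[str] = []  # actions issued, kept for inspection

    def _do(self, action: str) -> bool:
        self.sent.append(action)
        return self.send(action)

    def start(self) -> int:
        # Assumed order: capture sysinfo, then "init" the tools.
        for action in ("sysinfo", "init"):
            if not self._do(action):
                return 1
        return 0

    def stop(self) -> int:
        # Assumed order: "end" the tools, then always "terminate".
        ok = self._do("end")
        self._do("terminate")  # issued even if "end" failed
        return 0 if ok else 1
```

Each CLI command then reduces to constructing a client and returning the exit code of `start()` or `stop()`.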
We have also undertaken a major refactoring of `pbench-sysinfo-dump` to
remove the dependency on `base` for environment variables. As a result
of this work, we drop the stockpile configuration entirely, as it is no
longer required.
The Tool Meister unit tests were enhanced to use persistent tools, and
the unit test framework now properly cleans up the Tool Meister test
environment when tests fail.
We moved the `pbench-sysinfo-dump` CLI command to the
`agent/util-scripts/tool-meister` directory so that users won't see
that internal command in their `PATH`.
We updated the UML sequence diagram to reflect the code, adding the
`init`/`end`/`sysinfo` actions, and the fact that the Tool Data Sink
is now the gatekeeper of actions, forwarding them on to the Tool
Meisters instead of the client sending actions directly to the Tool
Meisters.
Since we no longer have the CLI command `pbench-collect-sysinfo`, we
drop it from the unit tests, and replace it with more appropriate tests.
To that end:
* tests 25 - 30 now just invoke `pbench-verify-sysinfo-options`
* tests 54 & 55 just invoke the `--help` option on
`pbench-tool-meister-start` and `-stop`
* tests 23 & 24 are dropped
The Tool Meisters need to start their persistent tools first, before the
collectors can run correctly: the PCP pmlogger collectors wait for the
remote `pmcd` daemons to start listening on the expected ports. So we
have to pass the "init" action along to the Tool Meisters before we set
up the PCP collectors.
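The "wait for the remote listener" step amounts to a poll-until-connectable loop. This generic helper is not the pmlogger collector's code; it only illustrates the technique, assuming plain TCP reachability of the target (44321 is `pmcd`'s default port).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True once a connection succeeds, False if `timeout` seconds pass
    first. Hypothetical helper, shown only to illustrate the ordering issue.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

If the persistent tools have not been started yet, a loop like this simply times out, which is why "init" has to reach the Tool Meisters first.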
The `pbench-tool-meister-stop` command will now return an error when the
`end` operation fails, or if we can't create the directory for the `end`
operation to work. We always wait for the local Tool Data Sink and
local Tool Meister to exit before killing the Redis server, regardless
of the success or failure of the `terminate` operation.
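The shutdown ordering can be expressed with `try`/`finally`, so the local processes are always reaped before Redis goes away. `stop_local()` and `spawn_python()` are hypothetical helpers; `send_terminate` stands in for whatever delivers the `terminate` action and reports success.

```python
import subprocess
import sys


def spawn_python(code: str) -> subprocess.Popen:
    """Start a throwaway Python child (used here only to demo the ordering)."""
    return subprocess.Popen([sys.executable, "-c", code])


def stop_local(send_terminate, children, redis_proc, wait_timeout: float = 10.0) -> int:
    """Hypothetical shutdown sequence.

    `send_terminate` returns True on success; `children` are the local Tool
    Data Sink and Tool Meister processes; `redis_proc` is the Redis server.
    """
    rc = 0
    try:
        if not send_terminate():
            rc = 1  # report the failure, but keep shutting down
    finally:
        # Always wait for the local children first, whatever terminate returned.
        for child in children:
            try:
                child.wait(timeout=wait_timeout)
            except subprocess.TimeoutExpired:
                child.kill()
                child.wait()
        # Only then stop the Redis server they depend on.
        redis_proc.terminate()
        redis_proc.wait()
    return rc
```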
Maxusmusti left a comment:
Re-tested with local and remote nodes, transient tools + persistent tools; works as expected.
dbutenhof left a comment:
Doesn't appear that GitHub thinks anything significant has changed since my last review, so let's get it in.
Merge collect sysinfo and metadata log
Merge `pbench-collect-sysinfo` and `pbench-metadata-log` operations and behaviors into the Tool Meister framework.

To make it easier to review, this PR was broken into 4 commits:

* `agent/bench-scripts` directory without unit tests (14 files)
* `agent/util-scripts` and `lib/pbench/agent` hierarchies (23 files)

Once reviewed, we merged them back down to a single commit.
Description of combined changes
The last set of `ssh` commands issued by the pbench-agent that are not involved in orchestrating the Tool Meisters, outside of those issued by the `bench-scripts`, are for `hostname -s` data collection on all the registered remote tool hosts.

The Tool Meisters are now given the responsibility for collecting information about their environment, including the version of the pbench agent, and a more comprehensive collection of host name information, returning that with their startup acknowledgement payload. This change also addresses a previous bug where registered host labels were not being used in the on-disk data capture directories as they were before.

This collected data set is now recorded by the Tool Data Sink, after restructuring the action flow so that clients send action payloads to the Tool Data Sink, which then forwards them to the Tool Meisters. This gives the Tool Data Sink the ability to collect all the Tool Meister information and write it to disk before acknowledging to the client that startup is complete.

We add an entirely new section in the `metadata.log` file which is per-host/per-tool, recording the tool options, and the output from the tool install checks (if any). This section supersedes the previous tool options in the per-host sections, though we leave those options there for compatibility for now.

We add graceful handling of bad PUT requests from Tool Meisters to the Tool Data Sink.

With this refactoring, we also collapse all the "sysinfo", "init", and "end" actions into the respective `pbench-tool-meister-start` and `pbench-tool-meister-stop` interfaces to simplify the CLI behaviors.

To make this possible, we add a new `Client` API used by the CLI interfaces `pbench-tool-meister-start` and `pbench-tool-meister-stop`.

We have also undertaken a major refactoring of `pbench-sysinfo-dump` to remove the dependency on `base` for environment variables. As a result of this work, we drop the stockpile configuration entirely, as it is no longer required.

We have enhanced the Tool Meister tests to use persistent tools, and enhanced the unit test framework to properly clean up the Tool Meister test environment when tests fail.

We have updated the UML sequence diagram describing the Tool Meister operation to add the `init`/`end`/`sysinfo` steps, and the fact that the Tool Data Sink is the gatekeeper of actions, forwarding them on to the Tool Meisters.

Notes
Okay, this is where Python 3 falls down: replacing command line utilities available via `bash` with Python 3 modules.

Turns out you can't use `shutil.copytree()` inside a container to replace `cp -RL`. That code attempts to use `os.setxattr()` at the lowest level to copy all the attributes properly. But when not running as a real `root` user in a container, you can't copy all attributes, and a "Permission denied" exception is raised.

The original `pbench-metadata-log` code just used `cp -rL`, and it worked both in and out of a container. So we just invoke that command directly again from the new Python code in the `tool_data_sink.py` module.
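For reference, shelling out to `cp -rL` from Python can look like this. `copy_tree_deref()` is a hypothetical helper name, not the function in `tool_data_sink.py`; `check=True` turns a non-zero exit from `cp` into a `subprocess.CalledProcessError`. Plain `cp -rL` does not ask to preserve extended attributes (that needs `-a` or `--preserve=xattr`), so it sidesteps the `os.setxattr()` permission problem.

```python
import shutil
import subprocess


def copy_tree_deref(src: str, dst: str) -> None:
    """Copy `src` to `dst`, following symlinks, by running `cp -rL`.

    Hypothetical helper. If `dst` does not exist it becomes the copy of
    `src`; if it does exist, `cp` copies `src` inside it instead.
    """
    cp = shutil.which("cp")
    if cp is None:
        raise FileNotFoundError("cp command not found in PATH")
    subprocess.run([cp, "-rL", src, dst], check=True)
```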