Tool Metadata Abstraction #1778

Conversation
The goal of the "Tool Meister" is to encapsulate the starting and
stopping of tools into a wrapper daemon which is started once on each
node for the duration of a benchmark script. Instead of the start/stop
tools scripts using SSH to start/stop tools on local or remote hosts, a
Redis server is used to communicate with all the started Tool Meisters,
which execute the tool start/stop operations in response to messages
they receive via Redis's publish/subscribe pattern.
The Redis server location is passed as a set of parameters (host & port)
to the Tool Meister instance, along with the name of a "key" in the
Redis server which contains that Tool Meister's initial operating
instructions for the duration of the benchmark script's execution:
* Which Redis pub/sub channel to use
* Which tool group describes the tools to use and their options
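As a rough sketch of how a Tool Meister might pick up these initial parameters (the key name, JSON field names, and function below are illustrative, not the actual implementation):

```python
import json


def fetch_instructions(client, key):
    """Fetch and decode this Tool Meister's startup parameters.

    `client` is any Redis-like object with a `get` method, e.g. a
    `redis.Redis(host=host, port=port)` instance built from the host
    and port parameters passed to the Tool Meister.
    """
    raw = client.get(key)
    if raw is None:
        raise RuntimeError(f"no instructions found at Redis key {key!r}")
    params = json.loads(raw.decode("utf-8"))
    # Illustrative fields: the pub/sub channel to subscribe to, and the
    # tool group naming the registered tools and their options.
    return {"channel": params["channel"], "group": params["group"]}
```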
The Tool Meister then runs through a simple two-phase life-cycle for
tools until it is told to "`terminate`": "`start`" the registered tools
on this host, and "`stop`" the registered tools on this host.
The initially expected phase is "`start`": the Tool Meister waits for a
message published on the "tool meister" channel telling it to start its
tools. Once it starts one or more tools in the background via `screen`,
it waits for a "`stop`" message to invoke the running tools' `stop`
action.
This start/stop cycle is no different from the previous way tools were
started and stopped, except that the start and stop operations no longer
involve `ssh` operations to remote hosts.
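The message handling for this start/stop cycle could be sketched as follows; the JSON message shape and the `tools` objects are assumptions made for illustration:

```python
import json


def handle_message(raw, my_group, tools):
    """Dispatch one published start/stop/terminate message to the tools.

    `tools` is a list of objects with `start(directory)` and `stop()`
    methods wrapping the registered tool scripts.
    """
    msg = json.loads(raw.decode("utf-8"))
    if msg.get("group") != my_group:
        return None  # message is for a different tool group; ignore it
    action = msg["action"]
    if action == "start":
        for tool in tools:
            tool.start(msg["directory"])  # e.g. launched under `screen`
    elif action == "stop":
        for tool in tools:
            tool.stop()
    # "terminate" is returned to the caller so it can exit its loop
    return action
```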
Each `start` and `stop` message sent to the Tool Meisters is accompanied
by two parameters: the tool `group` of the registered tool set (only
used to ensure the context of the message is correct), and a path to a
`directory` on the host (the controller) driving the benchmark where all
the tool data will be collected.
Since the benchmark script ensures the directory is unique for each set
of tool data collected (iteration / sample / host), the Tool Meister
running on the same host as the controller just writes its collected
tool data in that given directory.
However, when a Tool Meister is running on a host remote from the
controller, that `directory` path is not present. The remote Tool
Meister uses a temporary directory in its place, treating the given
`directory` path as a unique context ID to track all the tool data
collected in temporary directories, so that specific tool data can be
retrieved when requested.
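One minimal way to realize this context-ID bookkeeping on a remote Tool Meister (class and method names are illustrative):

```python
import tempfile


class DataContexts:
    """Map the controller's `directory` path (a context ID) to local storage."""

    def __init__(self):
        self._by_context = {}

    def local_dir(self, directory):
        """Return (creating on first use) the temp dir for this context ID."""
        if directory not in self._by_context:
            self._by_context[directory] = tempfile.mkdtemp(prefix="tm-")
        return self._by_context[directory]
```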
Because we are no longer using `ssh` to copy the collected tool data
from the remote hosts to the local controller driving the benchmark, we
have added a "`send`" phase for gathering each tool data set collected
by a start / stop pair.
The controller running the benchmark driver determines when to request
the collected tool data be "sent" back to a new Tool Data Sink process
running on the controller. The `send` can be issued immediately
following a `stop`, or all of the `start`/`stop` sequences can be
executed before all the `send` requests are made, or some combination
thereof. The only requirement is that a `send` has to follow its
related `start`/`stop` sequence.
The Tool Data Sink is responsible for accepting data from remote Tool
Meisters, via an HTTP PUT method, whenever a "`send`" message is posted.
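The remote side of a "`send`" might then bundle a collected data set and PUT it to the Tool Data Sink. The URL scheme, the tarball format, and the use of the third-party `requests` package below are assumptions for illustration:

```python
import os
import tarfile
import tempfile


def make_tarball(local_dir, context_id):
    """Bundle one start/stop tool data set for use as a PUT body."""
    fd, path = tempfile.mkstemp(suffix=".tar.xz")
    os.close(fd)
    with tarfile.open(path, "w:xz") as tar:
        # Store the data under the context ID so the sink can place it
        # in the correct iteration/sample/host directory.
        tar.add(local_dir, arcname=context_id.strip("/"))
    return path


def sink_url(sink_host, sink_port, context_id):
    """Build an (illustrative) Tool Data Sink endpoint for this data set."""
    return f"http://{sink_host}:{sink_port}/tool-data{context_id}"


# The actual transfer could then be performed with, e.g.:
#   requests.put(sink_url(host, port, ctx), data=open(tarball, "rb"))
```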
The pseudo code for the use of the Tool Meisters in a benchmark script
is as follows:
```
pbench-tool-meister-start  # New interface
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-start-tools --group=${grp} --dir=${iter}/${sample}
        ... <benchmark> ...
        pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
        # New interface added for `send` operation
        pbench-send-tools --group=${grp} --dir=${iter}/${sample}
        pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
    done
done
pbench-tool-meister-stop  # New interface
```
Or having the tool data sent later:
```
pbench-tool-meister-start
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-start-tools --group=${grp} --dir=${iter}/${sample}
        ... <benchmark> ...
        pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
    done
done
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-send-tools --group=${grp} --dir=${iter}/${sample}
        pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
    done
done
pbench-tool-meister-stop
```
Note the addition of the new `pbench-send-tools` interface a caller can
use to indicate when remote tool data can be sent.
A behavioral change that comes with this work is that tool
post-processing is no longer performed remotely on the host where the
data is collected. Previous work added the necessary
"stop-post-processing" step, so that when tools are stopped, any
post-processing and environmental data collection required to make the
tool data usable off-host is performed at that time.
This work IMPLIES that we no longer need to record registered tools
remotely. We only need to start a Tool Meister remotely for each host,
passing the initial data it needs at start time via Redis.
Now that pbench-register-tool[-set] supports the ability for a caller to
register a tool [or tool set] for a list of hosts, we keep all the tool
data local on the pbench "controller" node where the pbench-agent's user
registers tools.
By doing this, we remove the need to manage a distributed data set
across multiple hosts, allowing for a "late" binding of tools to be run
on a set of hosts. In other words, the tool registration can be done
without a host being present, with the understanding that it must be
present when a workload is run.
This is particularly powerful for environments like OpenStack and
OpenShift, where installation of the tool software is provided by
container images, VM images (like `qcow2`), and other automated
installation environments.
This is an invasive change, as knowledge about how tool data is
represented on disk was spread out across different pieces of code. We
have attempted to consolidate that knowledge; future work might be
required to adhere to the DRY principle.
**NOTES**:
* The Tool Meister invokes the existing tools in `tool-scripts` as
they operate today without any changes
- [ ] Rewrite `pbench-tool-trigger` into a Python application that
  talks directly to the Redis server to initiate the start, stop, and
  send messages
- [ ] Add support for the Tool Meisters to collect the
  `pbench-sysinfo-dump` data.
This work adds the notion of a "collector" to the Tool Data Sink, and
"tools" which run continuously without cycling through the "start", "stop",
and "send" phases. The collector is responsible for continuously pulling
data from those tools which are now started during the new "init" phase,
and stopped during the new "end" phase.
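The two tool life-cycles described above could be contrasted in a small sketch; the phase tuples and the transient/persistent metadata split are illustrative names, not the actual implementation:

```python
TRANSIENT_PHASES = ("start", "stop", "send")   # cycled per iteration/sample
PERSISTENT_PHASES = ("init", "end")            # runs continuously; a collector pulls data


def phases_for(tool_name, metadata):
    """Pick a life-cycle based on the transient/persistent metadata split."""
    if tool_name in metadata.get("transient", {}):
        return TRANSIENT_PHASES
    if tool_name in metadata.get("persistent", {}):
        return PERSISTENT_PHASES
    raise KeyError(f"unknown tool: {tool_name}")
```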
The first actual implementation of this kind of collector is for the
Prometheus data collection environment, where a `node-exporter` "tool"
is run providing an endpoint from which a Prometheus server "collector"
pulls data and stores it locally under the run directory
(`${benchmark_run_dir}`, e.g. `${benchmark_run_dir}/collector/prometheus`).
Co-authored-by: maxusmusti <meyceoz@redhat.com>
dbutenhof left a comment:

Cool: quick prototyping.
```
@@ -0,0 +1,48 @@
{
    "transient": {
```
Interestingly, I'd imagined it "inside out", with `["node-exporter": {"type": "persistent"}, "blktrace": {"type": "transient"}, ...]`.
Yours is better, as you can do `tool in meta.transient.keys()`, which makes a lot more sense, and we can still easily define properties for each.
We can dynamically create the convenient way to access the data, too.
We absolutely want all behavior to be encapsulated behind a class API, with nobody outside the class (or singleton, since I generally dislike static members) knowing anything about the JSON schema.
```python
# ADDING METADATA GRAB HERE
meta_raw = redis_server.get("tool-metadata")
if meta_raw is None:
    logger.error("Metadata was never loaded.")
else:
    meta_str = meta_raw.decode("utf-8")
    tool_metadata = json.loads(meta_str)
    logger.info(f"Metadata: {tool_metadata}")
```
I'm not sure this is the only place we'll want access to the metadata. Maybe something like a ToolMetadata object that can be constructed either from the JSON or from the Redis document, and passed around as context with methods to abstract the representation?
Both the on-disk JSON document and the in-Redis JSON document should be the same, so an entity constructing the ToolMetadata object can just read from one or the other to construct the object.
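A possible shape for such a ToolMetadata object, constructible from either source and hiding the JSON schema behind methods (all names here are illustrative, not a committed design):

```python
import json
from pathlib import Path


class ToolMetadata:
    """Encapsulate the tool metadata JSON behind a small API."""

    def __init__(self, data):
        self._data = data

    @classmethod
    def from_file(cls, install_dir):
        # The on-disk copy, relative to the pbench-agent install directory.
        path = Path(install_dir, "tool-scripts", "meta.json")
        with path.open("r") as json_file:
            return cls(json.load(json_file))

    @classmethod
    def from_redis(cls, raw):
        # `raw` is the bytes value fetched from the "tool-metadata" key.
        return cls(json.loads(raw.decode("utf-8")))

    def is_transient(self, tool):
        return tool in self._data.get("transient", {})

    def is_persistent(self, tool):
        return tool in self._data.get("persistent", {})
```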
```python
# 2.5. Add tool metadata json to redis
# FIXME: ABSOLUTE PATH IS BAD (also add try/catch)
with open('/opt/pbench-agent/tool-scripts/meta.json') as json_file:
```
We can add the path in the pbench-agent config file I guess
We know the path; we just need the installation directory, then access the path relative to that:
```python
with Path(config.install_dir, "tool-scripts", "meta.json").open("r") as json_file:
    ...
```
Work continued in PR #1787
IN PROGRESS:
Working on tool metadata abstraction