Tool Metadata Abstraction #1778

Conversation
The goal of the "Tool Meister" is to encapsulate the starting and
stopping of tools into a wrapper daemon which is started once on each
node for the duration of a benchmark script. Instead of the start/stop
tools scripts using SSH to start/stop tools on local or remote hosts, a
Redis server is used to communicate with all the started Tool Meisters,
which execute the tool start/stop operations in response to messages
they receive via Redis's publish/subscribe pattern.
The Redis server location is passed as a set of parameters (host & port)
to the Tool Meister instance, along with the name of a "key" in the
Redis server which contains that Tool Meister's initial operating
instructions for the duration of the benchmark script's execution:
* Which Redis pub/sub channel to use
* Which tool group describes the tools to use and their options
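As a rough sketch of how a Tool Meister might pick up these initial parameters (the key name, JSON field names, and function below are illustrative, not the actual implementation):

```python
import json


def fetch_instructions(client, key):
    """Fetch and decode this Tool Meister's startup parameters.

    `client` is any Redis-like object with a `get` method, e.g. a
    `redis.Redis(host=host, port=port)` instance built from the host
    and port parameters passed to the Tool Meister.
    """
    raw = client.get(key)
    if raw is None:
        raise RuntimeError(f"no instructions found at Redis key {key!r}")
    params = json.loads(raw.decode("utf-8"))
    # Illustrative fields: the pub/sub channel to subscribe to, and the
    # tool group naming the registered tools and their options.
    return {"channel": params["channel"], "group": params["group"]}
```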
The Tool Meister then runs through a simple two-phase life-cycle for
tools until it is told to "`terminate`": "`start`" the registered tools
on this host, and "`stop`" the registered tools on this host.
The initially expected phase is "`start`": the Tool Meister waits for a
message published on the "tool meister" channel telling it to start its
tools. Once it starts one or more tools in the background via `screen`,
it waits for a "`stop`" message to invoke the running tools' `stop`
action.
This start/stop cycle is no different from the previous way tools were
started and stopped, except that the start and stop operations no longer
involve `ssh` operations to remote hosts.
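The message handling for this start/stop cycle could be sketched as follows; the JSON message shape and the `tools` objects are assumptions made for illustration:

```python
import json


def handle_message(raw, my_group, tools):
    """Dispatch one published start/stop/terminate message to the tools.

    `tools` is a list of objects with `start(directory)` and `stop()`
    methods wrapping the registered tool scripts.
    """
    msg = json.loads(raw.decode("utf-8"))
    if msg.get("group") != my_group:
        return None  # message is for a different tool group; ignore it
    action = msg["action"]
    if action == "start":
        for tool in tools:
            tool.start(msg["directory"])  # e.g. launched under `screen`
    elif action == "stop":
        for tool in tools:
            tool.stop()
    # "terminate" is returned to the caller so it can exit its loop
    return action
```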
Each `start` and `stop` message sent to the Tool Meisters is accompanied
by two parameters: the tool `group` of the registered tool set (only
used to ensure the context of the message is correct), and a path to a
`directory` on the host (the controller) driving the benchmark where all
the tool data will be collected.
Since the benchmark script ensures the directory is unique for each set
of tool data collected (iteration / sample / host), the Tool Meister
running on the same host as the controller just writes its collected
tool data in that given directory.
However, when a Tool Meister is running on a host remote from the
controller, that `directory` path is not present. The remote Tool
Meister uses a temporary directory in its place, treating the given
`directory` path as a unique context ID to track all the tool data
collected in temporary directories, so that specific tool data can be
retrieved when requested.
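One minimal way to realize this context-ID bookkeeping on a remote Tool Meister (class and method names are illustrative):

```python
import tempfile


class DataContexts:
    """Map the controller's `directory` path (a context ID) to local storage."""

    def __init__(self):
        self._by_context = {}

    def local_dir(self, directory):
        """Return (creating on first use) the temp dir for this context ID."""
        if directory not in self._by_context:
            self._by_context[directory] = tempfile.mkdtemp(prefix="tm-")
        return self._by_context[directory]
```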
Because we are no longer using `ssh` to copy the collected tool data
from the remote hosts to the local controller driving the benchmark, we
have added a "`send`" phase for gathering each tool data set collected
by a start / stop pair.
The controller running the benchmark driver determines when to request
the collected tool data be "sent" back to a new Tool Data Sink process
running on the controller. The `send` can be issued immediately
following a `stop`, or all of the `start`/`stop` sequences can be
executed before all the `send` requests are made, or some combination
thereof. The only requirement is that a `send` has to follow its
related `start`/`stop` sequence.
The Tool Data Sink is responsible for accepting data from remote Tool
Meisters, via an HTTP PUT method, whenever a "`send`" message is posted.
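The remote side of a "`send`" might then bundle a collected data set and PUT it to the Tool Data Sink. The URL scheme, the tarball format, and the use of the third-party `requests` package below are assumptions for illustration:

```python
import os
import tarfile
import tempfile


def make_tarball(local_dir, context_id):
    """Bundle one start/stop tool data set for use as a PUT body."""
    fd, path = tempfile.mkstemp(suffix=".tar.xz")
    os.close(fd)
    with tarfile.open(path, "w:xz") as tar:
        # Store the data under the context ID so the sink can place it
        # in the correct iteration/sample/host directory.
        tar.add(local_dir, arcname=context_id.strip("/"))
    return path


def sink_url(sink_host, sink_port, context_id):
    """Build an (illustrative) Tool Data Sink endpoint for this data set."""
    return f"http://{sink_host}:{sink_port}/tool-data{context_id}"


# The actual transfer could then be performed with, e.g.:
#   requests.put(sink_url(host, port, ctx), data=open(tarball, "rb"))
```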
The pseudo code for the use of the Tool Meisters in a benchmark script
is as follows:
```
pbench-tool-meister-start  # New interface
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-start-tools --group=${grp} --dir=${iter}/${sample}
        ... <benchmark> ...
        pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
        # New interface added for `send` operation
        pbench-send-tools --group=${grp} --dir=${iter}/${sample}
        pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
    done
done
pbench-tool-meister-stop  # New interface
```
Or having the tool data sent later:
```
pbench-tool-meister-start
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-start-tools --group=${grp} --dir=${iter}/${sample}
        ... <benchmark> ...
        pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
    done
done
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-send-tools --group=${grp} --dir=${iter}/${sample}
        pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
    done
done
pbench-tool-meister-stop
```
Note the addition of the new `pbench-send-tools` interface a caller can
use to indicate when remote tool data can be sent.
A behavioral change that comes with this work is that tool
post-processing is no longer performed remotely on the host where the
data is collected. Previous work added the necessary
"stop-post-processing" step, so that when tools are stopped, any
post-processing and environmental data collection required to make the
tool data usable off-host is performed at that time.
This work IMPLIES that we no longer need to record registered tools
remotely. We only need to start a Tool Meister remotely for each host,
passing the initial data it needs at start time via Redis.
Now that pbench-register-tool[-set] supports the ability for a caller to
register a tool [or tool set] for a list of hosts, we keep all the tool
data local on the pbench "controller" node where the pbench-agent's user
registers tools.
By doing this, we remove the need to manage a distributed data set
across multiple hosts, allowing for a "late" binding of tools to be run
on a set of hosts. In other words, the tool registration can be done
without a host being present, with the understanding that it must be
present when a workload is run.
This is particularly powerful for environments like OpenStack and
OpenShift, where installation of the tool software is provided by
container images, VM images (like `qcow2`), and other automated
installation environments.
This is an invasive change, as knowledge about how tool data is
represented on disk was spread out across different pieces of code. We
have attempted to consolidate that knowledge; future work might be
required to adhere to the DRY principle.
**NOTES**:
* The Tool Meister invokes the existing tools in `tool-scripts` as
they operate today without any changes
- [ ] Rewrite `pbench-tool-trigger` into a Python application that
  talks directly to the Redis server to initiate the start, stop, and
  send messages
- [ ] Add support for the Tool Meisters to collect the
  `pbench-sysinfo-dump` data.
This work adds the notion of a "collector" to the Tool Data Sink, and
"tools" which run continuously without cycling through the "start", "stop",
and "send" phases. The collector is responsible for continuously pulling
data from those tools which are now started during the new "init" phase,
and stopped during the new "end" phase.
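The two tool life-cycles described above could be contrasted in a small sketch; the phase tuples and the transient/persistent metadata split are illustrative names, not the actual implementation:

```python
TRANSIENT_PHASES = ("start", "stop", "send")   # cycled per iteration/sample
PERSISTENT_PHASES = ("init", "end")            # runs continuously; a collector pulls data


def phases_for(tool_name, metadata):
    """Pick a life-cycle based on the transient/persistent metadata split."""
    if tool_name in metadata.get("transient", {}):
        return TRANSIENT_PHASES
    if tool_name in metadata.get("persistent", {}):
        return PERSISTENT_PHASES
    raise KeyError(f"unknown tool: {tool_name}")
```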
The first actual implementation of this kind of collector is for the
Prometheus data collection environment, where a `node-exporter` "tool"
is run providing an endpoint from which a Prometheus server "collector"
pulls data and stores it locally under the run directory
(`${benchmark_run_dir}`, e.g. `${benchmark_run_dir}/collector/prometheus`).
Co-authored-by: maxusmusti <meyceoz@redhat.com>
dbutenhof left a comment:

Cool: quick prototyping.
```
@@ -0,0 +1,48 @@
{
    "transient": {
```
Interestingly, I'd imagined it "inside out", with `["node-exporter": {"type": "persistent"}, "blktrace": {"type": "transient"}, ...]`.
Yours is better, as you can do `tool in meta.transient.keys()`, which makes a lot more sense, and we can still easily define properties for each.
We can dynamically create the convenient way to access the data, too.
We absolutely want all behavior to be encapsulated behind a class API, with nobody outside the class (or singleton, since I generally dislike static members) knowing anything about the JSON schema.
```python
# ADDING METADATA GRAB HERE
meta_raw = redis_server.get("tool-metadata")
if meta_raw is None:
    logger.error("Metadata was never loaded.")
else:
    meta_str = meta_raw.decode("utf-8")
    tool_metadata = json.loads(meta_str)
    logger.info(f"Metadata: {tool_metadata}")
```
I'm not sure this is the only place we'll want access to the metadata. Maybe something like a ToolMetadata object that can be constructed either from the JSON or from the Redis document, and passed around as context with methods to abstract the representation?
Both the on-disk JSON document and the in-Redis JSON document should be the same, so an entity constructing the ToolMetadata object can just read from one or the other to construct the object.
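A possible shape for such a ToolMetadata object, constructible from either source and hiding the JSON schema behind methods (all names here are illustrative, not a committed design):

```python
import json
from pathlib import Path


class ToolMetadata:
    """Encapsulate the tool metadata JSON behind a small API."""

    def __init__(self, data):
        self._data = data

    @classmethod
    def from_file(cls, install_dir):
        # The on-disk copy, relative to the pbench-agent install directory.
        path = Path(install_dir, "tool-scripts", "meta.json")
        with path.open("r") as json_file:
            return cls(json.load(json_file))

    @classmethod
    def from_redis(cls, raw):
        # `raw` is the bytes value fetched from the "tool-metadata" key.
        return cls(json.loads(raw.decode("utf-8")))

    def is_transient(self, tool):
        return tool in self._data.get("transient", {})

    def is_persistent(self, tool):
        return tool in self._data.get("persistent", {})
```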
```python
# 2.5. Add tool metadata json to redis
# FIXME: ABSOLUTE PATH IS BAD (also add try/catch)
with open('/opt/pbench-agent/tool-scripts/meta.json') as json_file:
```
We can add the path in the pbench-agent config file I guess
We know the path; we just need the installation directory, then access the path relative to that:
```python
with Path(config.install_dir, "tool-scripts", "meta.json").open("r") as json_file:
    ...
```
Work continued in PR #1787
IN PROGRESS:
Working on tool metadata abstraction