pcp-pbench: [WIP] added PCPTools(tm), PCPPmlogger(tds), PCPPmie(tds) classes and pcp-mapping.json files #1759
Conversation
The goal of the "Tool Meister" is to encapsulate the starting and
stopping of tools into a wrapper daemon which is started once on each
node for the duration of a benchmark script. Instead of the start/stop
tools scripts using SSH to start/stop tools on local or remote hosts, a
Redis Server is used to communicate with all the started Tool Meisters,
which execute the tool start/stop operations as a result of messages
they receive using Redis's publish/subscribe pattern.
The Redis server location is passed as a set of parameters (host & port)
to the Tool Meister instance, along with the name of a "key" in the
Redis server which contains that Tool Meister's initial operating
instructions for the duration of the benchmark script's execution:
* Which Redis pub/sub channel to use
* Which tool group describes the tools to run and their options
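As a rough illustration of how a Tool Meister might fetch these instructions at startup, consider the sketch below; the host, port, key name, and JSON field names are assumptions made for the example, not the actual pbench wire format.
```python
# Minimal sketch: fetch a Tool Meister's initial operating instructions
# from the Redis server.  Host, port, key name, and JSON fields are all
# hypothetical examples, not pbench's actual format.
import json

import redis

redis_client = redis.Redis(host="controller.example.com", port=17001)

# The benchmark driver is assumed to have stored a JSON document under
# this key before launching this Tool Meister.
raw = redis_client.get("tm-default-host1.example.com")
params = json.loads(raw)

channel = params["channel"]  # Redis pub/sub channel to subscribe to
group = params["group"]      # tool group naming the registered tools
tools = params["tools"]      # the tools and their options for this host
```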
The Tool Meister then runs through a simple two-phase life cycle for
its tools until it is told to "`terminate`": "`start`" the registered tools
on this host, and "`stop`" the registered tools on this host.
The initial expected phase is "`start`": the Tool Meister waits for a
message published on the "tool meister" channel telling it to start its
tools. Once it starts one or more tools in the background via `screen`,
it waits for a "`stop`" message to invoke the running tools' `stop`
action.
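The message handling might look roughly like the following sketch; the channel name, message layout, and the placeholder start/stop helpers are illustrative assumptions, not the actual protocol.
```python
# Hypothetical sketch of the Tool Meister message loop; channel name and
# message layout are assumptions, not pbench's actual protocol.
import json

import redis


def start_tools(group, directory):
    """Placeholder: launch each registered tool in the background (e.g. via screen)."""


def stop_tools(group, directory):
    """Placeholder: invoke each running tool's stop action."""


redis_client = redis.Redis(host="controller.example.com", port=17001)
pubsub = redis_client.pubsub()
pubsub.subscribe("tool-meister-channel")

for msg in pubsub.listen():
    if msg["type"] != "message":
        continue  # skip subscribe confirmations
    payload = json.loads(msg["data"])
    if payload["action"] == "start":
        start_tools(payload["group"], payload["directory"])
    elif payload["action"] == "stop":
        stop_tools(payload["group"], payload["directory"])
    elif payload["action"] == "terminate":
        break
```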
This start/stop cycle is no different from the previous way tools were
started and stopped, except that the start and stop operations no longer
involve `ssh` operations to remote hosts.
Each `start` and `stop` message sent to the Tool Meisters is accompanied
by two parameters: the tool `group` of the registered tool set (only
used to ensure the context of the message is correct), and a path to a
`directory` on the host (the controller) driving the benchmark where all
the tool data will be collected.
Since the benchmark script ensures the directory is unique for each set
of tool data collected (iteration / sample / host), the Tool Meister
running on the same host as the controller just writes its collected
tool data in that given directory.
However, when a Tool Meister is running on a host remote from the
controller, that `directory` path does not exist, so the remote Tool
Meister writes to a temporary directory instead of the given `directory`
path. The given `directory` path is treated as a unique context ID used
to track the tool data collected in temporary directories so that
specific tool data can be retrieved when requested.
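One way to picture that bookkeeping (the names here are purely illustrative):
```python
# Illustrative only: a remote Tool Meister maps the controller's `directory`
# path (used purely as a context ID) to a local temporary directory where
# the tool data is actually written.
import tempfile

tool_data_dirs = {}


def local_dir_for(context_directory):
    """Return (creating on first use) the local temp dir for a given context."""
    if context_directory not in tool_data_dirs:
        tool_data_dirs[context_directory] = tempfile.mkdtemp(prefix="pbench-tm-")
    return tool_data_dirs[context_directory]
```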
Because we are no longer using `ssh` to copy the collected tool data
from the remote hosts to the local controller driving the benchmark, we
have added a "`send`" phase for gathering each tool data set collected
by a start / stop pair.
The controller running the benchmark driver determines when to request
the collected tool data be "sent" back to a new Tool Data Sink process
running on the controller. The `send` can be issued immediately
following a `stop`, or all of the `start`/`stop` sequences can be
executed before all the `send` requests are made, or some combination
thereof. The only requirement is that a `send` has to follow its
related `start`/`stop` sequence.
The Tool Data Sink is responsible for accepting data from remote Tool
Meisters, via an HTTP PUT method, whenever a "`send`" message is posted.
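A sketch of the remote side of a `send` might look like the following; the sink URL, port, and tar-ball layout are assumptions for illustration, not the actual Tool Data Sink API.
```python
# Hypothetical sketch: on a "send" message, tar up the local temp directory
# for the requested context and PUT it to the Tool Data Sink on the
# controller.  URL, port, and file layout are assumptions.
import os
import tarfile
import tempfile

import requests


def send_tool_data(context_directory, local_dir,
                   sink_url="http://controller.example.com:8080/tool-data"):
    with tempfile.TemporaryDirectory() as tmp_dir:
        tarball = os.path.join(tmp_dir, "tool-data.tar.xz")
        with tarfile.open(tarball, "w:xz") as tar:
            tar.add(local_dir, arcname="tool-data")
        with open(tarball, "rb") as fp:
            # The context directory only tells the sink which start/stop
            # pair this data belongs to.
            resp = requests.put(
                f"{sink_url}/{context_directory.lstrip('/')}", data=fp)
    resp.raise_for_status()
```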
The pseudo code for the use of the Tool Meisters in a benchmark script
is as follows:
```
pbench-tool-meister-start # New interface
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-start-tools --group=${grp} --dir=${iter}/${sample}
        ... <benchmark> ...
        pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
        # New interface added for `send` operation
        pbench-send-tools --group=${grp} --dir=${iter}/${sample}
        pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
    done
done
pbench-tool-meister-stop # New interface
```
Or having the tool data sent later:
```
pbench-tool-meister-start
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-start-tools --group=${grp} --dir=${iter}/${sample}
        ... <benchmark> ...
        pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
    done
done
for iter in ${iterations}; do
    for sample in ${samples}; do
        pbench-send-tools --group=${grp} --dir=${iter}/${sample}
        pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
    done
done
pbench-tool-meister-stop
```
Note the addition of the new `pbench-send-tools` interface, which a
caller can use to indicate when remote tool data should be sent.
A behavioral change that comes with this work is that tool
post-processing is no longer performed remotely on the host where the
data is collected. Previous work added the necessary
"stop-post-processing" step, so that when tools are stopped, any
post-processing and environmental data collection required to make the
tool data usable off-host is performed at that point.
This work IMPLIES that we no longer need to record registered tools
remotely. We only need to start a Tool Meister remotely for each host,
passing the initial data it needs at start time via Redis.
Now that `pbench-register-tool[-set]` supports the ability for a caller
to register a tool [or tool set] for a list of hosts, we keep all the
tool registration data local to the pbench "controller" node where the
pbench-agent user registers tools.
By doing this, we remove the need to manage a distributed data set
across multiple hosts, allowing for a "late" binding of tools to be run
on a set of hosts. In other words, the tool registration can be done
without a host being present, with the understanding that it must be
present when a workload is run.
This is particularly powerful for environments like OpenStack and
OpenShift, where tool installation is provided by container images, VM
images (like `qcow2`), and other automated installation mechanisms.
This is an invasive change, as knowledge about how tool data is
represented on disk was spread out across different pieces of code. We
have attempted to consolidate that knowledge; future work might be
required to fully adhere to the DRY principle.
**NOTES**:
* The Tool Meister invokes the existing tools in `tool-scripts` as
  they operate today, without any changes.
- [ ] Rewrite `pbench-tool-trigger` into a Python application that
  talks directly to the Redis server to initiate the start, stop, and
  send messages (a rough publisher sketch follows this list).
- [ ] Add support for the Tool Meisters to collect the
  `pbench-sysinfo-dump` data.
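As referenced in the checklist above, a controller-side trigger talking directly to Redis could be as simple as the sketch below; the channel name and message layout are assumptions matching the earlier subscriber sketch, not pbench's actual protocol.
```python
# Hypothetical controller-side publisher for start/stop/send messages.
import json

import redis


def publish(action, group, directory, host="localhost", port=17001,
            channel="tool-meister-channel"):
    client = redis.Redis(host=host, port=port)
    client.publish(channel, json.dumps(
        {"action": action, "group": group, "directory": directory}))


# Example use: publish("start", "default", "iter1/sample1"), run the
# workload, then publish("stop", ...) and later publish("send", ...).
```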
This work adds the notion of a "collector" to the Tool Data Sink, and
"tools" which run continuously without cycling through the "start",
"stop", and "send" phases. The collector is responsible for continuously
pulling data from those tools, which are now started during the new
"init" phase and stopped during the new "end" phase.
The first actual implementation of this kind of collector is for the
Prometheus data collection environment, where a `node-exporter` "tool"
is run, providing an endpoint for a Prometheus server "collector" to
pull data from and store locally under the run directory
(`${benchmark_run_dir}`, e.g. `${benchmark_run_dir}/collector/prometheus`).
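As a rough picture of the pull model only (the actual implementation runs a Prometheus server; the endpoint, interval, and file layout below are assumptions), a collector loop could scrape the `node-exporter` endpoint and store samples under the run directory:
```python
# Rough illustration of the pull model; endpoint, interval, and file layout
# are assumptions -- the real collector is a Prometheus server.
import pathlib
import time

import requests


def collect(benchmark_run_dir,
            endpoint="http://tool-host.example.com:9100/metrics",
            interval=10, iterations=3):
    out_dir = pathlib.Path(benchmark_run_dir) / "collector" / "prometheus"
    out_dir.mkdir(parents=True, exist_ok=True)
    for _ in range(iterations):
        resp = requests.get(endpoint)
        (out_dir / f"metrics-{int(time.time())}.txt").write_text(resp.text)
        time.sleep(interval)
```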
Co-authored-by: maxusmusti <meyceoz@redhat.com>
The origin branch likely needs to be rebased on the tool-meister branch to reflect the most up-to-date changes (will likely reduce the files changed from 302 to the proper amount).
Ah heck. Yeah. Will rebase it in a while.
The goal of the "Tool Meister" is to encapsulate the starting and
stopping of tools into a wrapper daemon which is started once on each
node for the duration of a benchmark script. Instead of the start/stop
tools scripts using SSH to start/stop tools on local or remote hosts, a
Redis Server is use to communicate with all the started Tool Meisters
which execute the tool start/stop operations as a result of messages
they receive using Redis's publish/subscribe pattern.
The Redis server location is passed as a set of parameters (host & port)
to the Tool Meister instance, along with the name of a "key" in the
Redis server which contains that Tool Meister's initial operating
instructions for the duration of the benchmark script's execution:
* What Redis pub/sub channel to use
* What tool group describing the tools to use and their options
The Tool Meister then runs through a simple two phase life-cycle for
tools until it is told to "`terminate`": "`start`" the registered tools
on this host, and "`stop`" the registered tools on this host.
The initial expected phase is "`start`", where it waits to be told when
to start its tools running from a published message on the "tool
meister" channel. Once it starts one or more tools in the background via
`screen`, it waits for a "`stop`" message to invoke the running tools'
`stop` action.
This start/stop cycle is no different from the previous way tools were
started and stopped, except that the start and stop operations no longer
involve `ssh` operations to remote hosts.
Each `start` and `stop` message sent to the Tool Meisters is accompanied
by two parameters: the tool `group` of the registered tool set (only
used to ensure the context of the message is correct), and a path to a
`directory` on the host (the controller) driving the benchmark where all
the tool data will be collected.
Since the benchmark script ensures the directory is unique for each set
of tool data collected (iteration / sample / host), the Tool Meister
running on the same host as the controller just writes its collected
tool data in that given directory.
However, when a Tool Meister is running on a host remote from the
controller, that `directory` path is not present. Instead the remote
Tool Meister uses a temporary directory instead of the given `directory`
path. The given `directory` path is treated as a unique context ID to
track all the tool data collected in temporary directories so that
specific tool data can be retrieved when requested.
Because we are no longer using `ssh` to copy the collected tool data
from the remote hosts to the local controller driving the benchmark, we
have added a "`send`" phase for gathering each tool data set collected
by a start / stop pair.
The controller running the benchmark driver determines when to request
the collected tool data be "sent" back to a new Tool Data Sink process
running on the controller. The `send` can be issued immediately
following a `stop`, or all of the `start`/`stop` sequences can be
executed before all the `send` requests are made, or some combination
thereof. The only requirement is that a `send` has to follow its
related `start`/`stop` sequence.
The Tool Data Sink is responsible for accepting data from remote Tool
Meisters, via an HTTP PUT method, whenever a "`send`" message is posted.
The pseudo code for the use of the Tool Meisters in a benchmark script
is as follows:
```
pbench-tool-meister-start # New interface
for iter in ${iterations}; do
for sample in ${samples}; do
pbench-start-tools --group=${grp} --dir=${iter}/${sample}
... <benchmark> ...
pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
# New interface added for `send` operation
pbench-send-tools --group=${grp} --dir=${iter}/${sample}
pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
done
done
pbench-tool-meister-stop # New interface
```
Or having the tool data sent later:
```
pbench-tool-meister-start
for iter in ${iterations}; do
for sample in ${samples}; do
pbench-start-tools --group=${grp} --dir=${iter}/${sample}
... <benchmark> ...
pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
done
done
for iter in ${iterations}; do
for sample in ${samples}; do
pbench-send-tools --group=${grp} --dir=${iter}/${sample}
pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
done
done
pbench-tool-meister-stop
```
Note the addition of the new `pbench-send-tools` interface a caller can
use to indicate when remote tool data can be sent.
A behavioral change that comes with this work is that tool
post-processing is no longer performed remotely on the host where it is
collected. Previous work added the necessary "stop-post-processing"
step, so that when tools are stopped any necessary post-processing and
environmental data collection required to allow the tool data to be used
off-host is collected.
This work IMPLIES that we no longer need to record registered tools
remotely. We only need to start a Tool Meister remotely for each host,
passing the initial data it needs at start time via Redis.
Now that pbench-register-tool[-set] supports the ability for a caller to
register a tool [or tool set] for a list of hosts, we keep all the tool
data local on the pbench "controller" node where the pbench-agent's user
registers tools.
By doing this, we remove the need to manage a distributed data set
across multiple hosts, allowing for a "late" binding of tools to be run
on a set of hosts. In other words, the tool registration can be done
without a host being present, with the understanding that it must be
present when a workload is run.
This is particularly powerful for environments like, OpenStack and
OpenShift, where software installation of tools are provided by
container images, VM images (like `qcow2`), and other automated
installation environments.
This is an invasive change, as knowledge about how tool data is
represented on disk was spread out across different pieces of code. We
have attempted to consolidate that knowledge, future work might be
required to adhere to the DRY principle.
**NOTES**:
* The Tool Meister invokes the existing tools in `tool-scripts` as
they operate today without any changes
- [ ] Rewrite `pbench-tool-trigger` into a python application that
talks directly to the Redis server to initiate the start, stop,
send messages
- [ ] Add support for the Tool Meisters to support collecting the
`pbench-sysinfo-dump` data.
This work adds the notion of a "collector" to the Tool Data Sink, and
"tools" which run continuously without cycling through the "start", "stop",
and "send" phases. The collector is responsible for continuously pulling
data from those tools which are now started during the new "init" phase,
and stopped during the new "end" phase.
The first actual implementation of this kind of collector is for the
prometheus data collection environment, where a `node-exporter` "tool" is
run providing a end-point for a prometheus server "collector" to pull data
from it and store it locally off the run directory (`${benchmark_run_dir}`,
e.g. `${benchmark_run_dir}/collector/prometheus`).
Co-authored-by: maxusmusti <meyceoz@redhat.com>
The rebase was a bit of a pain. But yep. Done.
* …ping.json files
* pcp-pbench: minor bug fix
* pcp-pbench: integration completed, testing & debugging remain
* pcp-pbench: fixed bugs with pcptool and string-json conversion
* rebasing errors debugged
* rebase errors fixed
May need to be re-rebased after the final code review updates; hopefully it should be much shorter this time 😅
This code needs a rebase and verification. It replaces portante#9.
Closing as we merged this PR into the

Replaced by #1986.
This PR integrates pbench with PCP; it is in its early stages.
File changes:
modified: agent/util-scripts/pbench-tool-data-sink
modified: agent/util-scripts/pbench-tool-meister
new file: agent/util-scripts/pcp-mapping.json
new file: agent/tool-scripts/pcptool
modified: agent/util-scripts/gold/pbench-register-tool/test-44.txt
modified: agent/util-scripts/gold/pbench-register-tool/test-46.txt
modified: agent/util-scripts/gold/pbench-register-tool/test-47.txt
modified: agent/util-scripts/pbench-tool-meister-start