Conversation

@not4win (Contributor) commented Aug 19, 2020

This PR integrates pbench with PCP (Performance Co-Pilot). It is in its early stages.

File changes:
modified: agent/util-scripts/pbench-tool-data-sink
modified: agent/util-scripts/pbench-tool-meister
new file: agent/util-scripts/pcp-mapping.json
new file: agent/tool-scripts/pcptool
modified: agent/util-scripts/gold/pbench-register-tool/test-44.txt
modified: agent/util-scripts/gold/pbench-register-tool/test-46.txt
modified: agent/util-scripts/gold/pbench-register-tool/test-47.txt
modified: agent/util-scripts/pbench-tool-meister-start

@portante portante force-pushed the tool-meister branch 2 times, most recently from a03231b to adf7478 on September 1, 2020 20:10
The goal of the "Tool Meister" is to encapsulate the starting and
stopping of tools into a wrapper daemon which is started once on each
node for the duration of a benchmark script.  Instead of the start/stop
tools scripts using SSH to start/stop tools on local or remote hosts, a
Redis Server is used to communicate with all the started Tool Meisters
which execute the tool start/stop operations as a result of messages
they receive using Redis's publish/subscribe pattern.

The Redis server location is passed as a set of parameters (host & port)
to the Tool Meister instance, along with the name of a "key" in the
Redis server which contains that Tool Meister's initial operating
instructions for the duration of the benchmark script's execution:

  * What Redis pub/sub channel to use
  * What tool group describes the tools to use and their options
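
For illustration only, a minimal sketch of how a Tool Meister might fetch those initial instructions from the given Redis key, assuming the value is a JSON document with `channel`, `group`, and `tools` fields (the field names are assumptions for this sketch, not the actual parameter format):

```python
# Illustrative sketch only; the actual pbench-tool-meister parameter format
# may differ.  Assumes redis-py and a JSON value stored under `param_key`.
import json

import redis


def load_params(host: str, port: int, param_key: str):
    client = redis.Redis(host=host, port=port)
    raw = client.get(param_key)  # value written by pbench-tool-meister-start
    if raw is None:
        raise ValueError(f"no parameters found under key {param_key!r}")
    params = json.loads(raw)
    # Hypothetical field names: the pub/sub channel to listen on, the tool
    # group name, and the registered tools with their options.
    return params["channel"], params["group"], params["tools"]
```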

The Tool Meister then runs through a simple two-phase life-cycle for
tools until it is told to "`terminate`": "`start`" the registered tools
on this host, and "`stop`" the registered tools on this host.

The initial expected phase is "`start`", where the Tool Meister waits for
a published message on the "tool meister" channel telling it to start its
tools. Once it starts one or more tools in the background via `screen`, it
waits for a "`stop`" message to invoke the running tools' `stop` action.
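
A minimal sketch of that two-phase loop, assuming a JSON message format with an `action` field and a `tool_commands` mapping of tool name to start/stop command lines (both are illustrative assumptions, not the actual Tool Meister implementation):

```python
# Sketch only: field names, helper names, and command strings are assumptions.
import json
import subprocess

import redis


def start_in_screen(name: str, command: str) -> None:
    # `screen -dmS <session> <cmd>` runs the tool detached in the background,
    # in its own named session, as described above.
    subprocess.run(
        ["screen", "-dmS", f"pbench-{name}", "sh", "-c", command], check=True
    )


def run_phases(client: redis.Redis, channel: str, tool_commands: dict) -> None:
    pubsub = client.pubsub()
    pubsub.subscribe(channel)
    for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        data = json.loads(msg["data"])
        if data["action"] == "terminate":
            break
        for name, cmds in tool_commands.items():
            if data["action"] == "start":
                start_in_screen(name, cmds["start"])
            elif data["action"] == "stop":
                subprocess.run(["sh", "-c", cmds["stop"]], check=True)
```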

This start/stop cycle is no different from the previous way tools were
started and stopped, except that the start and stop operations no longer
involve `ssh` operations to remote hosts.

Each `start` and `stop` message sent to the Tool Meisters is accompanied
by two parameters: the tool `group` of the registered tool set (only
used to ensure the context of the message is correct), and a path to a
`directory` on the host (the controller) driving the benchmark where all
the tool data will be collected.
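
For example, a controller-side sketch of publishing such a message; the `action`, `group`, and `directory` field names, the helper, and the channel name in the usage comment are assumptions for illustration:

```python
# Hypothetical helper, not the actual pbench-start-tools implementation.
import json

import redis


def publish_action(client: redis.Redis, channel: str, action: str,
                   group: str, directory: str) -> None:
    message = {"action": action, "group": group, "directory": directory}
    client.publish(channel, json.dumps(message))


# e.g. publish_action(client, "tool-meister-channel", "start",
#                     "default", "/var/lib/pbench-agent/run/iter1/sample1")
```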

Since the benchmark script ensures the directory is unique for each set
of tool data collected (iteration / sample / host), the Tool Meister
running on the same host as the controller just writes its collected
tool data in that given directory.

However, when a Tool Meister is running on a host remote from the
controller, that `directory` path does not exist there.  The remote Tool
Meister uses a temporary directory in place of the given `directory` path,
which is instead treated as a unique context ID to track all the tool data
collected in temporary directories so that specific tool data can be
retrieved when requested.
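
A sketch of that mapping on a remote Tool Meister, where the controller's `directory` string serves only as a context ID (the helper name and temp-dir prefix are illustrative, not the real bookkeeping):

```python
# Sketch only; the real Tool Meister's bookkeeping may differ.
import tempfile

# controller-provided `directory` (context ID) -> local temporary directory
_context_dirs = {}


def local_dir_for(directory: str) -> str:
    if directory not in _context_dirs:
        _context_dirs[directory] = tempfile.mkdtemp(prefix="pbench-tm-")
    return _context_dirs[directory]
```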

Because we are no longer using `ssh` to copy the collected tool data
from the remote hosts to the local controller driving the benchmark, we
have added a "`send`" phase for gathering each tool data set collected
by a start / stop pair.

The controller running the benchmark driver determines when to request
the collected tool data be "sent" back to a new Tool Data Sink process
running on the controller. The `send` can be issued immediately
following a `stop`, or all of the `start`/`stop` sequences can be
executed before all the `send` requests are made, or some combination
thereof.  The only requirement is that a `send` has to follow its
related `start`/`stop` sequence.

The Tool Data Sink is responsible for accepting data from remote Tool
Meisters, via an HTTP PUT method, whenever a "`send`" message is posted.
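
A sketch of the remote side of that exchange, pushing a tarball of collected tool data to the Tool Data Sink with an HTTP PUT; the URL layout and header name are assumptions for illustration, not the actual wire protocol:

```python
# Sketch only; endpoint path and header name are illustrative assumptions.
import requests


def send_tool_data(sink_host: str, sink_port: int, group: str,
                   directory: str, tarball_path: str) -> None:
    url = f"http://{sink_host}:{sink_port}/tool-data/{group}"
    with open(tarball_path, "rb") as tar:
        # The controller `directory` is passed along as the context ID so the
        # Tool Data Sink can place the data in the right location.
        resp = requests.put(url, data=tar,
                            headers={"X-Pbench-Tool-Data-Ctx": directory})
    resp.raise_for_status()
```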

The pseudo code for the use of the Tool Meisters in a benchmark script
is as follows:

```
pbench-tool-meister-start  # New interface
for iter in ${iterations}; do
  for sample in ${samples}; do
    pbench-start-tools --group=${grp} --dir=${iter}/${sample}
    ... <benchmark> ...
    pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
    # New interface added for `send` operation
    pbench-send-tools --group=${grp} --dir=${iter}/${sample}
    pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
  done
done
pbench-tool-meister-stop  # New interface
```

Or having the tool data sent later:

```
pbench-tool-meister-start
for iter in ${iterations}; do
  for sample in ${samples}; do
    pbench-start-tools --group=${grp} --dir=${iter}/${sample}
    ... <benchmark> ...
    pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
  done
done
for iter in ${iterations}; do
  for sample in ${samples}; do
    pbench-send-tools --group=${grp} --dir=${iter}/${sample}
    pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
  done
done
pbench-tool-meister-stop
```

Note the addition of the new `pbench-send-tools` interface, which a caller
can use to indicate when remote tool data can be sent.

A behavioral change that comes with this work is that tool
post-processing is no longer performed remotely on the host where the data
is collected.  Previous work added the necessary "stop-post-processing"
step, so that when tools are stopped, any post-processing and
environmental data collection required to allow the tool data to be used
off-host is performed.

This work IMPLIES that we no longer need to record registered tools
remotely.  We only need to start a Tool Meister remotely for each host,
passing the initial data it needs at start time via Redis.

Now that pbench-register-tool[-set] supports the ability for a caller to
register a tool [or tool set] for a list of hosts, we keep all the tool
registration data local to the pbench "controller" node where the
pbench-agent user registers tools.

By doing this, we remove the need to manage a distributed data set
across multiple hosts, allowing for a "late" binding of tools to be run
on a set of hosts.  In other words, the tool registration can be done
without a host being present, with the understanding that it must be
present when a workload is run.

This is particularly powerful for environments like OpenStack and
OpenShift, where the installation of tools is provided by container
images, VM images (like `qcow2`), and other automated installation
environments.

This is an invasive change, as knowledge about how tool data is
represented on disk was spread out across different pieces of code.  We
have attempted to consolidate that knowledge; future work might be
required to fully adhere to the DRY principle.

**NOTES**:

  * The Tool Meister invokes the existing tools in `tool-scripts` as
    they operate today, without any changes to them

 - [ ] Rewrite `pbench-tool-trigger` into a python application that
       talks directly to the Redis server to initiate the start, stop,
       and send messages

 - [ ] Add support for the Tool Meisters to collect the
       `pbench-sysinfo-dump` data.
Maxusmusti and others added 2 commits September 2, 2020 21:17
This work adds the notion of a "collector" to the Tool Data Sink, and
"tools" which run continuously without cycling through the "start", "stop",
and "send" phases.  The collector is responsible for continuously pulling
data from those tools which are now started during the new "init" phase,
and stopped during the new "end" phase.

The first actual implementation of this kind of collector is for the
Prometheus data collection environment, where a `node-exporter` "tool" is
run, providing an end-point for a Prometheus server "collector" to pull
data from and store locally under the run directory
(`${benchmark_run_dir}`, e.g. `${benchmark_run_dir}/collector/prometheus`).
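
As an illustration of that pull model only: the actual collector runs a Prometheus server, but the sketch below approximates the pull with direct scrapes of node-exporter's `/metrics` end-point, writing samples under the run directory; port, interval, and file naming are assumptions.

```python
# Simplified sketch: the real collector runs a Prometheus server; here we
# approximate the pull model with direct scrapes of node-exporter's /metrics.
import pathlib
import time

import requests


def scrape_node_exporter(benchmark_run_dir: str, host: str, port: int = 9100,
                         interval: float = 10.0, count: int = 3) -> None:
    out_dir = pathlib.Path(benchmark_run_dir) / "collector" / "prometheus"
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        metrics = requests.get(f"http://{host}:{port}/metrics", timeout=5).text
        (out_dir / f"metrics-{i:04d}.prom").write_text(metrics)
        time.sleep(interval)
```
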
Co-authored-by: maxusmusti <meyceoz@redhat.com>
@Maxusmusti (Member) commented:

The origin branch likely needs to be rebased on the tool-meister branch to reflect the most up-to-date changes (will likely reduce the files changed from 302 to the proper amount)

@not4win (Contributor, Author) commented Sep 3, 2020

Ah heck. Yeah. Will rebase it in a while.

portante and others added 3 commits September 4, 2020 02:11
@not4win not4win force-pushed the pcp-pbench-int branch 2 times, most recently from 92f07c5 to 086c899 on September 3, 2020 20:53
@not4win (Contributor, Author) commented Sep 3, 2020

The rebase was a bit of a pain. But yep, done.

…ping.json files

pcp-pbench: minor bug fix

pcp-pbench: integration completed, testing & debugging remain

pcp-pbench: fixed bugs with pcptool and string-json conversion

rebasing errors debugged

rebase errors fixed
@portante portante added the enhancement, Agent, and tools labels Sep 4, 2020
@portante portante self-assigned this Sep 4, 2020
@portante portante added this to the v0.70 milestone Sep 4, 2020
@Maxusmusti (Member) commented:

May need to be re-rebased after final code review updates, hopefully should be much shorter this time 😅

@portante portante force-pushed the tool-meister branch 2 times, most recently from ecbe594 to e756ca2 on September 11, 2020 01:24
@portante portante force-pushed the tool-meister branch 3 times, most recently from 8e3f649 to 9dd82cf on September 29, 2020 01:01
@portante (Member) commented:

This code needs a rebase and verification. It replaces portante#9.

@portante portante force-pushed the tool-meister branch 7 times, most recently from 0b88e17 to 9c29bc0 on October 1, 2020 21:23
@portante portante modified the milestones: v0.70, v0.71 Oct 1, 2020
@portante (Member) commented:

Closing as we merged this PR into the pcp-tool-meister branch so that @Maxusmusti could continue this work with his PR #1956.

@portante (Member) commented:

Replaced by #1986.
