Conversation

@not4win (Contributor) commented Aug 19, 2020

This PR integrates pbench with PCP (Performance Co-Pilot). It is in its early stages.

File changes:
modified: agent/util-scripts/pbench-tool-data-sink
modified: agent/util-scripts/pbench-tool-meister
new file: agent/util-scripts/pcp-mapping.json
new file: agent/tool-scripts/pcptool
modified: agent/util-scripts/gold/pbench-register-tool/test-44.txt
modified: agent/util-scripts/gold/pbench-register-tool/test-46.txt
modified: agent/util-scripts/gold/pbench-register-tool/test-47.txt
modified: agent/util-scripts/pbench-tool-meister-start

@portante portante force-pushed the tool-meister branch 2 times, most recently from a03231b to adf7478 on September 1, 2020 20:10
The goal of the "Tool Meister" is to encapsulate the starting and
stopping of tools into a wrapper daemon which is started once on each
node for the duration of a benchmark script.  Instead of the start/stop
tools scripts using SSH to start/stop tools on local or remote hosts, a
Redis Server is used to communicate with all the started Tool Meisters
which execute the tool start/stop operations as a result of messages
they receive using Redis's publish/subscribe pattern.

The Redis server location is passed as a set of parameters (host & port)
to the Tool Meister instance, along with the name of a "key" in the
Redis server which contains that Tool Meister's initial operating
instructions for the duration of the benchmark script's execution:

  * What Redis pub/sub channel to use
  * What tool group describes the tools to use and their options
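
For illustration only, a minimal sketch of how a Tool Meister might fetch those initial instructions from the given Redis key, assuming the value is a JSON document with `channel`, `group`, and `tools` fields (the field names are assumptions for this sketch, not the actual parameter format):

```python
# Illustrative sketch only; the actual pbench-tool-meister parameter format
# may differ.  Assumes redis-py and a JSON value stored under `param_key`.
import json

import redis


def load_params(host: str, port: int, param_key: str):
    client = redis.Redis(host=host, port=port)
    raw = client.get(param_key)  # value written by pbench-tool-meister-start
    if raw is None:
        raise ValueError(f"no parameters found under key {param_key!r}")
    params = json.loads(raw)
    # Hypothetical field names: the pub/sub channel to listen on, the tool
    # group name, and the registered tools with their options.
    return params["channel"], params["group"], params["tools"]
```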

The Tool Meister then runs through a simple two-phase life-cycle for
tools until it is told to "`terminate`": "`start`" the registered tools
on this host, and "`stop`" the registered tools on this host.

The initial expected phase is "`start`", where the Tool Meister waits for
a published message on the "tool meister" channel telling it to start its
tools. Once it starts one or more tools in the background via `screen`, it
waits for a "`stop`" message to invoke the running tools' `stop` action.
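
A minimal sketch of that two-phase loop, assuming a JSON message format with an `action` field and a `tool_commands` mapping of tool name to start/stop command lines (both are illustrative assumptions, not the actual Tool Meister implementation):

```python
# Sketch only: field names, helper names, and command strings are assumptions.
import json
import subprocess

import redis


def start_in_screen(name: str, command: str) -> None:
    # `screen -dmS <session> <cmd>` runs the tool detached in the background,
    # in its own named session, as described above.
    subprocess.run(
        ["screen", "-dmS", f"pbench-{name}", "sh", "-c", command], check=True
    )


def run_phases(client: redis.Redis, channel: str, tool_commands: dict) -> None:
    pubsub = client.pubsub()
    pubsub.subscribe(channel)
    for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        data = json.loads(msg["data"])
        if data["action"] == "terminate":
            break
        for name, cmds in tool_commands.items():
            if data["action"] == "start":
                start_in_screen(name, cmds["start"])
            elif data["action"] == "stop":
                subprocess.run(["sh", "-c", cmds["stop"]], check=True)
```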

This start/stop cycle is no different from the previous way tools were
started and stopped, except that the start and stop operations no longer
involve `ssh` operations to remote hosts.

Each `start` and `stop` message sent to the Tool Meisters is accompanied
by two parameters: the tool `group` of the registered tool set (only
used to ensure the context of the message is correct), and a path to a
`directory` on the host (the controller) driving the benchmark where all
the tool data will be collected.
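
For example, a controller-side sketch of publishing such a message; the `action`, `group`, and `directory` field names, the helper, and the channel name in the usage comment are assumptions for illustration:

```python
# Hypothetical helper, not the actual pbench-start-tools implementation.
import json

import redis


def publish_action(client: redis.Redis, channel: str, action: str,
                   group: str, directory: str) -> None:
    message = {"action": action, "group": group, "directory": directory}
    client.publish(channel, json.dumps(message))


# e.g. publish_action(client, "tool-meister-channel", "start",
#                     "default", "/var/lib/pbench-agent/run/iter1/sample1")
```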

Since the benchmark script ensures the directory is unique for each set
of tool data collected (iteration / sample / host), the Tool Meister
running on the same host as the controller just writes its collected
tool data in that given directory.

However, when a Tool Meister is running on a host remote from the
controller, that `directory` path does not exist there.  The remote Tool
Meister uses a temporary directory in place of the given `directory` path,
which is instead treated as a unique context ID to track all the tool data
collected in temporary directories so that specific tool data can be
retrieved when requested.
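
A sketch of that mapping on a remote Tool Meister, where the controller's `directory` string serves only as a context ID (the helper name and temp-dir prefix are illustrative, not the real bookkeeping):

```python
# Sketch only; the real Tool Meister's bookkeeping may differ.
import tempfile

# controller-provided `directory` (context ID) -> local temporary directory
_context_dirs = {}


def local_dir_for(directory: str) -> str:
    if directory not in _context_dirs:
        _context_dirs[directory] = tempfile.mkdtemp(prefix="pbench-tm-")
    return _context_dirs[directory]
```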

Because we are no longer using `ssh` to copy the collected tool data
from the remote hosts to the local controller driving the benchmark, we
have added a "`send`" phase for gathering each tool data set collected
by a start / stop pair.

The controller running the benchmark driver determines when to request
the collected tool data be "sent" back to a new Tool Data Sink process
running on the controller. The `send` can be issued immediately
following a `stop`, or all of the `start`/`stop` sequences can be
executed before all the `send` requests are made, or some combination
thereof.  The only requirement is that a `send` has to follow its
related `start`/`stop` sequence.

The Tool Data Sink is responsible for accepting data from remote Tool
Meisters, via an HTTP PUT method, whenever a "`send`" message is posted.
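
A sketch of the remote side of that exchange, pushing a tarball of collected tool data to the Tool Data Sink with an HTTP PUT; the URL layout and header name are assumptions for illustration, not the actual wire protocol:

```python
# Sketch only; endpoint path and header name are illustrative assumptions.
import requests


def send_tool_data(sink_host: str, sink_port: int, group: str,
                   directory: str, tarball_path: str) -> None:
    url = f"http://{sink_host}:{sink_port}/tool-data/{group}"
    with open(tarball_path, "rb") as tar:
        # The controller `directory` is passed along as the context ID so the
        # Tool Data Sink can place the data in the right location.
        resp = requests.put(url, data=tar,
                            headers={"X-Pbench-Tool-Data-Ctx": directory})
    resp.raise_for_status()
```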

The pseudo code for the use of the Tool Meisters in a benchmark script
is as follows:

```
pbench-tool-meister-start  # New interface
for iter in ${iterations}; do
  for sample in ${samples}; do
    pbench-start-tools --group=${grp} --dir=${iter}/${sample}
    ... <benchmark> ...
    pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
    # New interface added for `send` operation
    pbench-send-tools --group=${grp} --dir=${iter}/${sample}
    pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
  done
done
pbench-tool-meister-stop  # New interface
```

Or having the tool data sent later:

```
pbench-tool-meister-start
for iter in ${iterations}; do
  for sample in ${samples}; do
    pbench-start-tools --group=${grp} --dir=${iter}/${sample}
    ... <benchmark> ...
    pbench-stop-tools --group=${grp} --dir=${iter}/${sample}
  done
done
for iter in ${iterations}; do
  for sample in ${samples}; do
    pbench-send-tools --group=${grp} --dir=${iter}/${sample}
    pbench-postprocess-tools --group=${grp} --dir=${iter}/${sample}
  done
done
pbench-tool-meister-stop
```

Note the addition of the new `pbench-send-tools` interface, which a caller
can use to indicate when remote tool data can be sent.

A behavioral change that comes with this work is that tool
post-processing is no longer performed remotely on the host where the data
is collected.  Previous work added the necessary "stop-post-processing"
step, so that when tools are stopped, any post-processing and
environmental data collection required to allow the tool data to be used
off-host is performed.

This work IMPLIES that we no longer need to record registered tools
remotely.  We only need to start a Tool Meister remotely for each host,
passing the initial data it needs at start time via Redis.

Now that pbench-register-tool[-set] supports the ability for a caller to
register a tool [or tool set] for a list of hosts, we keep all the tool
registration data local to the pbench "controller" node where the
pbench-agent user registers tools.

By doing this, we remove the need to manage a distributed data set
across multiple hosts, allowing for a "late" binding of tools to be run
on a set of hosts.  In other words, the tool registration can be done
without a host being present, with the understanding that it must be
present when a workload is run.

This is particularly powerful for environments like OpenStack and
OpenShift, where the installation of tools is provided by container
images, VM images (like `qcow2`), and other automated installation
environments.

This is an invasive change, as knowledge about how tool data is
represented on disk was spread out across different pieces of code.  We
have attempted to consolidate that knowledge; future work might be
required to fully adhere to the DRY principle.

**NOTES**:

  * The Tool Meister invokes the existing tools in `tool-scripts` as
    they operate today, without any changes to them

 - [ ] Rewrite `pbench-tool-trigger` into a python application that
       talks directly to the Redis server to initiate the start, stop,
       and send messages

 - [ ] Add support for the Tool Meisters to collect the
       `pbench-sysinfo-dump` data.
Maxusmusti and others added 2 commits September 2, 2020 21:17
This work adds the notion of a "collector" to the Tool Data Sink, and
"tools" which run continuously without cycling through the "start", "stop",
and "send" phases.  The collector is responsible for continuously pulling
data from those tools which are now started during the new "init" phase,
and stopped during the new "end" phase.

The first actual implementation of this kind of collector is for the
Prometheus data collection environment, where a `node-exporter` "tool" is
run, providing an end-point for a Prometheus server "collector" to pull
data from and store locally under the run directory
(`${benchmark_run_dir}`, e.g. `${benchmark_run_dir}/collector/prometheus`).
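
As an illustration of that pull model only: the actual collector runs a Prometheus server, but the sketch below approximates the pull with direct scrapes of node-exporter's `/metrics` end-point, writing samples under the run directory; port, interval, and file naming are assumptions.

```python
# Simplified sketch: the real collector runs a Prometheus server; here we
# approximate the pull model with direct scrapes of node-exporter's /metrics.
import pathlib
import time

import requests


def scrape_node_exporter(benchmark_run_dir: str, host: str, port: int = 9100,
                         interval: float = 10.0, count: int = 3) -> None:
    out_dir = pathlib.Path(benchmark_run_dir) / "collector" / "prometheus"
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        metrics = requests.get(f"http://{host}:{port}/metrics", timeout=5).text
        (out_dir / f"metrics-{i:04d}.prom").write_text(metrics)
        time.sleep(interval)
```
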
Co-authored-by: maxusmusti <meyceoz@redhat.com>
@Maxusmusti (Member) commented:

The origin branch likely needs to be rebased on the tool-meister branch to reflect the most up-to-date changes (will likely reduce the files changed from 302 to the proper amount)

@not4win (Contributor, Author) commented Sep 3, 2020

Ah heck. Yeah. Will rebase it in a while.

portante and others added 3 commits September 4, 2020 02:11
@not4win not4win force-pushed the pcp-pbench-int branch 2 times, most recently from 92f07c5 to 086c899 on September 3, 2020 20:53
@not4win (Contributor, Author) commented Sep 3, 2020

The rebase was a bit of a pain. But yep, done.

…ping.json files

pcp-pbench: minor bug fix

pcp-pbench: integration completed, testing & debugging remain

pcp-pbench: fixed bugs with pcptool and string-json conversion

rebasing errors debugged

rebase errors fixed
@portante portante added the enhancement, Agent, and tools labels Sep 4, 2020
@portante portante self-assigned this Sep 4, 2020
@portante portante added this to the v0.70 milestone Sep 4, 2020
@Maxusmusti (Member) commented:

May need to be re-rebased after final code review updates, hopefully should be much shorter this time 😅

@portante portante force-pushed the tool-meister branch 2 times, most recently from ecbe594 to e756ca2 on September 11, 2020 01:24
@portante portante force-pushed the tool-meister branch 3 times, most recently from 8e3f649 to 9dd82cf on September 29, 2020 01:01
@portante (Member) commented:

This code needs a rebase and verification. It replaces portante#9.

@portante portante force-pushed the tool-meister branch 7 times, most recently from 0b88e17 to 9c29bc0 on October 1, 2020 21:23
@portante portante modified the milestones: v0.70, v0.71 Oct 1, 2020
@portante (Member) commented:

Closing as we merged this PR into the pcp-tool-meister branch so that @Maxusmusti could continue this work with his PR #1956.

@portante (Member) commented:

Replaced by #1986.
