-
Notifications
You must be signed in to change notification settings - Fork 107
PCP Update #1956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PCP Update #1956
Conversation
|
This pull request introduces 9 alerts when merging 95d627d into a86cbb6 - view on LGTM.com new alerts:
|
| "node-exporter": {"collector": "prometheus", "port": "9100"}, | ||
| "dcgm": {"collector": "prometheus", "port": "8000"} | ||
| "dcgm": {"collector": "prometheus", "port": "8000"}, | ||
| "pcptool": {"collector": "pcp", "port": "44321"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's drop the existing pcp tool entirely, and just replace it with this one.
| ANSIBLE_CONF = "/etc/ansible/ansible.cfg" | ||
| ANSIBLE_ROLES_PATH = "/etc/ansible/roles" | ||
| PLAYBOOK_PATH = "" | ||
| TOOLS_PMDA_MAPPING = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we consider making this a class variable of PCPTools?
95d627d to
f046eca
Compare
| "probe": "bpftrace" | ||
| }, | ||
| "cpuacct": { | ||
| "ident": "cpu accouting Information", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space between "accouting" (which is misspelled) and "Information" (which shouldn't be capitalized).
| "probe": "dmcache.cache.used values" | ||
| }, | ||
| "docker": { | ||
| "ident": "Docker Information", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "voice" of these "ident" strings are all over the place. It'd be nice to get them all consistent. For example, this one is basically a title, with both words capitalized, and no real indication of what it means; whereas dm-cache was a paragraph introducing the metric. Most people will probably know what Docker is and produce some good guesses about what "Docker Information" might imply, but "docker usage statistics" might be more useful. Or even "docker controller daemon statistics" since the metrics below suggest we're really looking at those rather than wider statistics on container usage within the system.
| "probe": "kernel.percpu.cpu" | ||
| }, | ||
| "kvmstat": { | ||
| "ident": "metrics used by the pcp-kvm(1) command ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there whitespace at the end of this? Again, these "doc strings" are all over the place, and that's distracting at best.
| """ | ||
| input: takes list of hosts and tools | ||
| what init function does: | ||
| 1. checks whether pmcd is active amongst the hosts mentions in the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"hosts mentioned"? Or maybe just "hosts in the list". I'd probably nuke "amongst" as well and just go with "active in one of the listed hosts".
| input: takes list of hosts and tools | ||
| what init function does: | ||
| 1. checks whether pmcd is active amongst the hosts mentions in the list. | ||
| 2. reads the pcp-mappig file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe be a bit more specific about "pcp-mapping.json" (ignoring the missing "n")
| if ( | ||
| stdout.decode().strip() | ||
| != f'Creating config file "{host}.config" using default settings ...\n.' | ||
| ): | ||
| # pmlogger failed to stop. Log it | ||
| self.logger.error("not able to build pmlogconf file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't check the exit status? Judging success by matching output strings is always risky.
(Not to mention that this module has now adding template string formatting to %s formatting and .format formatting, making the full sweep of Python string formatting styles and making the whole thing look just a bit more messy. I'm assuming this is an inherited issue, but it'd be nice to clean it up.)
| stdout.decode().strip() | ||
| != f'Creating config file "{host}.config" using default settings ...\n.' | ||
| ): | ||
| # pmlogger failed to stop. Log it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Failed to stop"? Were we trying to stop it???
|
|
||
| self.pmlog_config_files.append("{}.config".format(host)) | ||
| except Exception as e: | ||
| # log exception as we are not able to stop logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again we failed to stop it when trying to write a config file. I don't really know how pmlogconf works, although the man page I found didn't suggest it does any more than writing a formatted config file.
| except Exception as e: | ||
| # log exception as we are not able to stop logger | ||
| self.logger.error( | ||
| "not able to create a logger conf file\n", e, exc_info=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exc_info=True causes the logger to add exception information, so I think the e argument is unused since there's no substitution placeholder in the message string.
This pattern is repeated several times below.
| def check_connection(self, hosts): | ||
| """ Check if pmcd is running in given list of hosts""" | ||
| self.logger.info("checking connection") | ||
|
|
||
| for host in hosts: | ||
| res = "" | ||
| try: | ||
| self.logger.debug("checking connection on host:%s", host) | ||
| res = pmapi.pmContext(api.PM_CONTEXT_HOST, host + ":44321") | ||
| except Exception: | ||
| res = None | ||
|
|
||
| if res is not None: | ||
| self.logger.info("Port 44321 is open in:{}".format(host)) | ||
| self.hosts.append( | ||
| host | ||
| ) # logging only on those hosts with port 44321 active | ||
| else: | ||
| self.logger.error("Port 44321 is not open in:{}".format(host)) | ||
| self.logger.error("Ignoring host:%s", host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be repeated twice. Why not promote it to BaseCollector (or a new intermediary PCP class) so they can share?
06f3e84 to
a2a838d
Compare
|
This pull request introduces 1 alert when merging 36b79f8 into ad3e4d3 - view on LGTM.com new alerts:
|
dbutenhof
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stacking tool meister on top of Ansible just seems... "off". This sort of cross-node orchestration of tool start/stop seems to be exactly what tool meister was designed to do in the first place. Unfortunate if we need to stack like this to manage PCP.
| from ansible.parsing.dataloader import DataLoader | ||
| from ansible.playbook.play import Play | ||
|
|
||
| # from ansible.plugins.callback import CallbackBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deliberately left commented-out, or sad and forgotten? ;-)
| loader=self.loader, inventory=self.inventory | ||
| ) | ||
|
|
||
| def find_roles_path(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ansible doesn't support APIs to manage roles cleanly without assuming and poking into internals? This really doesn't make me feel good.
| command = "ls -lA " + self.roles_path | ||
| out = subprocess.Popen( | ||
| [command], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True | ||
| ) | ||
| stdout, stderr = out.communicate() | ||
| # decode the output | ||
| output = stdout.decode() | ||
| self.logger.debug( | ||
| "output from command:%s is:%s\n and stderr:%s", | ||
| command, | ||
| stdout.decode(), | ||
| stderr, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a very cumbersome way to just list files in a directory.
| pass | ||
|
|
||
|
|
||
| class PCPTools: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great to pull this class out of "tool meister" along with the mapping table, and just import it here.
| if output_err is not None: | ||
| self.logger.error("not able to download role. error:%s", output_err) | ||
|
|
||
| def start(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a lot of duplication between start and stop that could probably be factored out into a common method to simplify maintenance.
| # code to start logging | ||
| self.logger.info("starting logger") | ||
|
|
||
| command = PCP.pmGetConfig("PCP_BINADM_DIR") + "/pmlogger_check -c {}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed verbally today, let's work on containerizing the pmlogger much like the way we handle Prometheus.
| return | ||
|
|
||
| # given role does not exist. We have to download it from ansible galaxy | ||
| command = "ansible-galaxy collection install performancecopilot.metrics" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we talked about today verbally, let's drop the use of Ansible to install and start PCP on local/remote systems. Philosophically, the pbench-agent should not be in the business of install software on hosts, or changing the host configuration to suite its needs. Instead let's build a container for pmcd which encapsulates what we need, and do the same for node-exporter as well.
|
Replaced by #1986. |
Finally got everything into a good enough state that it's worth putting up a draft.
NOTE: there is still a lot of work left for me to do here, like...
I will continue updating this draft PR regularly
ANOTHER NOTE: The first commit in this draft needs credit to be given to Ashwin @not4win as a lot of the code was from his original work, but I'm not sure how to do that...