Skip to content

Conversation

@Maxusmusti
Copy link
Member

@Maxusmusti Maxusmusti commented Nov 5, 2020

Finally got everything into a good enough state that it's worth putting up a draft.
NOTE: there is still a lot of work left for me to do here, like...

  • Cleaner integration with tool metadata
  • Further abstracting and optimizing solution to work within persistent tool system rather than separately
  • Fix unit test compatibility

I will continue updating this draft PR regularly

ANOTHER NOTE: The first commit in this draft needs credit to be given to Ashwin @not4win as a lot of the code was from his original work, but I'm not sure how to do that...

@lgtm-com
Copy link

lgtm-com bot commented Nov 5, 2020

This pull request introduces 9 alerts when merging 95d627d into a86cbb6 - view on LGTM.com

new alerts:

  • 4 for Except block handles 'BaseException'
  • 3 for Unused import
  • 2 for Unused local variable

@portante portante added Agent enhancement tools Of and related to the operation and behavior of various tools (iostat, sar, etc.) labels Nov 6, 2020
@portante portante added this to the v0.71 milestone Nov 6, 2020
"node-exporter": {"collector": "prometheus", "port": "9100"},
"dcgm": {"collector": "prometheus", "port": "8000"}
"dcgm": {"collector": "prometheus", "port": "8000"},
"pcptool": {"collector": "pcp", "port": "44321"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop the existing pcp tool entirely, and just replace it with this one.

ANSIBLE_CONF = "/etc/ansible/ansible.cfg"
ANSIBLE_ROLES_PATH = "/etc/ansible/roles"
PLAYBOOK_PATH = ""
TOOLS_PMDA_MAPPING = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider making this a class variable of PCPTools?

"probe": "bpftrace"
},
"cpuacct": {
"ident": "cpu accouting Information",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space between "accouting" (which is misspelled) and "Information" (which shouldn't be capitalized).

"probe": "dmcache.cache.used values"
},
"docker": {
"ident": "Docker Information",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "voice" of these "ident" strings are all over the place. It'd be nice to get them all consistent. For example, this one is basically a title, with both words capitalized, and no real indication of what it means; whereas dm-cache was a paragraph introducing the metric. Most people will probably know what Docker is and produce some good guesses about what "Docker Information" might imply, but "docker usage statistics" might be more useful. Or even "docker controller daemon statistics" since the metrics below suggest we're really looking at those rather than wider statistics on container usage within the system.

"probe": "kernel.percpu.cpu"
},
"kvmstat": {
"ident": "metrics used by the pcp-kvm(1) command ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there whitespace at the end of this? Again, these "doc strings" are all over the place, and that's distracting at best.

"""
input: takes list of hosts and tools
what init function does:
1. checks whether pmcd is active amongst the hosts mentions in the list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"hosts mentioned"? Or maybe just "hosts in the list". I'd probably nuke "amongst" as well and just go with "active in one of the listed hosts".

input: takes list of hosts and tools
what init function does:
1. checks whether pmcd is active amongst the hosts mentions in the list.
2. reads the pcp-mappig file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe be a bit more specific about "pcp-mapping.json" (ignoring the missing "n")

Comment on lines +391 to +396
if (
stdout.decode().strip()
!= f'Creating config file "{host}.config" using default settings ...\n.'
):
# pmlogger failed to stop. Log it
self.logger.error("not able to build pmlogconf file")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't check the exit status? Judging success by matching output strings is always risky.

(Not to mention that this module has now adding template string formatting to %s formatting and .format formatting, making the full sweep of Python string formatting styles and making the whole thing look just a bit more messy. I'm assuming this is an inherited issue, but it'd be nice to clean it up.)

stdout.decode().strip()
!= f'Creating config file "{host}.config" using default settings ...\n.'
):
# pmlogger failed to stop. Log it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Failed to stop"? Were we trying to stop it???


self.pmlog_config_files.append("{}.config".format(host))
except Exception as e:
# log exception as we are not able to stop logger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again we failed to stop it when trying to write a config file. I don't really know how pmlogconf works, although the man page I found didn't suggest it does any more than writing a formatted config file.

except Exception as e:
# log exception as we are not able to stop logger
self.logger.error(
"not able to create a logger conf file\n", e, exc_info=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exc_info=True causes the logger to add exception information, so I think the e argument is unused since there's no substitution placeholder in the message string.

This pattern is repeated several times below.

Comment on lines +524 to +543
def check_connection(self, hosts):
""" Check if pmcd is running in given list of hosts"""
self.logger.info("checking connection")

for host in hosts:
res = ""
try:
self.logger.debug("checking connection on host:%s", host)
res = pmapi.pmContext(api.PM_CONTEXT_HOST, host + ":44321")
except Exception:
res = None

if res is not None:
self.logger.info("Port 44321 is open in:{}".format(host))
self.hosts.append(
host
) # logging only on those hosts with port 44321 active
else:
self.logger.error("Port 44321 is not open in:{}".format(host))
self.logger.error("Ignoring host:%s", host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be repeated twice. Why not promote it to BaseCollector (or a new intermediary PCP class) so they can share?

@lgtm-com
Copy link

lgtm-com bot commented Nov 12, 2020

This pull request introduces 1 alert when merging 36b79f8 into ad3e4d3 - view on LGTM.com

new alerts:

  • 1 for Syntax error

Copy link
Member

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stacking tool meister on top of Ansible just seems... "off". This sort of cross-node orchestration of tool start/stop seems to be exactly what tool meister was designed to do in the first place. Unfortunate if we need to stack like this to manage PCP.

from ansible.parsing.dataloader import DataLoader
from ansible.playbook.play import Play

# from ansible.plugins.callback import CallbackBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deliberately left commented-out, or sad and forgotten? ;-)

loader=self.loader, inventory=self.inventory
)

def find_roles_path(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ansible doesn't support APIs to manage roles cleanly without assuming and poking into internals? This really doesn't make me feel good.

Comment on lines +174 to +186
command = "ls -lA " + self.roles_path
out = subprocess.Popen(
[command], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True
)
stdout, stderr = out.communicate()
# decode the output
output = stdout.decode()
self.logger.debug(
"output from command:%s is:%s\n and stderr:%s",
command,
stdout.decode(),
stderr,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a very cumbersome way to just list files in a directory.

pass


class PCPTools:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to pull this class out of "tool meister" along with the mapping table, and just import it here.

if output_err is not None:
self.logger.error("not able to download role. error:%s", output_err)

def start(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a lot of duplication between start and stop that could probably be factored out into a common method to simplify maintenance.

# code to start logging
self.logger.info("starting logger")

command = PCP.pmGetConfig("PCP_BINADM_DIR") + "/pmlogger_check -c {}".format(
Copy link
Member

@portante portante Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed verbally today, let's work on containerizing the pmlogger much like the way we handle Prometheus.

return

# given role does not exist. We have to download it from ansible galaxy
command = "ansible-galaxy collection install performancecopilot.metrics"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we talked about today verbally, let's drop the use of Ansible to install and start PCP on local/remote systems. Philosophically, the pbench-agent should not be in the business of install software on hosts, or changing the host configuration to suite its needs. Instead let's build a container for pmcd which encapsulates what we need, and do the same for node-exporter as well.

@portante
Copy link
Member

Replaced by #1986.

@portante portante closed this Nov 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Agent enhancement tools Of and related to the operation and behavior of various tools (iostat, sar, etc.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants