Store performance data not working #4941

Closed
JGjorgji opened this issue Aug 27, 2016 · 30 comments

7 participants
@JGjorgji

commented Aug 27, 2016

It still only collects data when I'm logged in.

OS:
CentOS 7 with latest updates as of this moment
Cockpit version:
cockpit-pcp-0.114-2.el7.centos.x86_64
PCP version:
pcp-3.10.6-2.el7.x86_64

I only installed the following packages, since I didn't want Docker and the default package was pulling it in:

cockpit-pcp
cockpit-bridge
cockpit-shell
cockpit-ws

@petervo

Contributor

commented Aug 29, 2016

If you check the PCP archives from the command line, do they show collected data?

@petervo petervo added the question label Aug 29, 2016

@JGjorgji

Author

commented Aug 29, 2016

Good question. I looked into the pmlogger logs and found a permission denied error for the config. These were the permissions:


/var/lib/pcp/config/pmlogger>ls -laZ
drwxrwxr-x. pcp  pcp  system_u:object_r:pcp_var_lib_t:s0 .
drwxr-xr-x. root root system_u:object_r:pcp_var_lib_t:s0 ..
-rw-r--r--. root root system_u:object_r:tmp_t:s0       config.default
-rw-r--r--. root root system_u:object_r:pcp_var_lib_t:s0 config.pmstat
-rw-r--r--. root root system_u:object_r:pcp_var_lib_t:s0 crontab.docker

I changed the SELinux context of the default config to match the other ones; let's wait and see.
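Picking out a mislabeled file like config.default from such a listing can be done mechanically. The sketch below is a minimal, hypothetical helper (find_mislabeled is not part of PCP or the SELinux tools); it assumes the RHEL 7 ls -laZ column layout shown above (mode, owner, group, context, name) and treats pcp_var_lib_t, the type carried by the sibling files, as the expected one.

```shell
#!/bin/sh
# Hypothetical helper: read `ls -laZ` output (RHEL 7 layout:
# mode owner group context name) on stdin and print the name of every
# entry whose SELinux type is not pcp_var_lib_t.
find_mislabeled() {
    awk '
        # Only consider lines whose 4th field looks like user:role:type:level
        $4 ~ /^[^:]+:[^:]+:[^:]+:/ {
            split($4, ctx, ":")
            # ctx[3] is the SELinux type; $NF is the last field (the name)
            if (ctx[3] != "pcp_var_lib_t") print $NF
        }
    '
}
```

For the listing above, ls -laZ /var/lib/pcp/config/pmlogger | find_mislabeled would print config.default. Copying the label from a correctly labelled sibling, as was done by hand here, is chcon --reference=config.pmstat config.default; restorecon -v config.default would instead reset the file to whatever the loaded policy specifies for that path.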

@JGjorgji

Author

commented Aug 30, 2016

Apparently not, but I noticed there were errors in the logs. Here they are: http://paste.fedoraproject.org/417455/77100147/

@petervo

Contributor

commented Aug 30, 2016

You may want to check the permissions on /var/lib/pcp/config/pmlogconf/tools/cockpit. Since there were SELinux issues with the pmlogger files, there might be similar ones here.

@petervo

Contributor

commented Aug 30, 2016

So it looks like, for some reason, Cockpit is getting "Unknown metric name" errors even though pminfo reports those metrics as available. http://paste.fedoraproject.org/417481/14725785/

There was also another error:

  pmlogger: error creating primary logger socket symbolic link /var/run/pcp/pmlogger.primary.socket: Permission denied

That was fixed by

chown pcp:pcp /var/run/pcp

But the missing metrics problem remains. @mvollmer or @fche, any ideas about what might cause this?

@fche

commented Aug 31, 2016

Those libpcp segvs should never happen. The pcp metrics being requested should always be available. Something peculiar is going on. Does this problem occur reproducibly on multiple machines, or is it a one-off? What does /usr/bin/pcp report there? Have you tried sudo service pmcd restart?

@JGjorgji

Author

commented Sep 1, 2016

I just tried this on a fresh CentOS 7 VM and got the same result. I did try restarting the pmcd service, both before and after resolving some of the issues above, but it didn't help. On the VM I did not apply any of the fixes mentioned before; I just installed the packages and turned on logging. Here are the results from /usr/bin/pcp.

Original physical host:

Performance Co-Pilot configuration on arrakis:

 platform: Linux arrakis 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64
 hardware: 2 cpus, 3 disks, 1 node, 15943MB RAM
 timezone: CEST-2
 services: pmcd
     pmcd: Version 3.10.6-1, 7 agents, 1 client
     pmda: root pmcd proc xfs linux mmv jbd2
 pmlogger: primary logger: /var/log/pcp/pmlogger/arrakis/20160901.00.10

Test VM:

Performance Co-Pilot configuration on centos72:

 platform: Linux centos72 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 993MB RAM
 timezone: CEST-2
 services: pmcd
     pmcd: Version 3.10.6-1, 7 agents
     pmda: root pmcd proc xfs linux mmv jbd2
@JGjorgji

Author

commented Sep 5, 2016

Any thoughts on this? Can you try replicating this by installing a fresh CentOS 7 with only the packages I installed? Maybe it's some configuration provided by the main package?

Any other info you would find helpful?

@fche

commented Sep 5, 2016

Short of reproducing the problem here on a new VM, what comes to mind is that maybe the way the cockpit-pcp module installs & initializes its customizations on pcp makes it worse somehow. Maybe SELinux factors in, dunno. If you already have a VM handy, could you try (re)installing pcp only, checking that it is working (e.g. pminfo -f kernel.all.cpu.nice produces results), then installing cockpit-pcp and seeing whether that pminfo still works? If not, boo hoo, but we get something closer to look at. (Also, the /var/log/pcp directory contains log files that may help diagnose.)
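The before/after check suggested here comes down to whether pminfo -f prints a value for the metric. A rough sketch with hypothetical helper names, assuming that pminfo -f reports each fetched value on a line containing the word value set off by whitespace (e.g. "    value 1234") and prints an error message otherwise:

```shell
#!/bin/sh
# Hypothetical helper: succeed if `pminfo -f` output read on stdin
# contains at least one fetched value line (assumed format, see above).
has_values() {
    grep -q '[[:space:]]value[[:space:]]'
}

# Hypothetical wrapper: fetch one metric and report whether it has data.
check_metric() {
    metric=${1:-kernel.all.cpu.nice}
    # Merge stderr so error messages are also scanned (and never match).
    if pminfo -f "$metric" 2>&1 | has_values; then
        echo "ok: $metric"
    else
        echo "no values: $metric"
        return 1
    fi
}
```

Running check_metric once after installing pcp alone and again after installing cockpit-pcp gives the comparison asked for; a pass followed by a failure would point at the cockpit-pcp setup step.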

@mvollmer

Member

commented Sep 9, 2016

| But the missing metrics problem remains.

It's been some time and I might misremember, but when accessing an archive, only the metrics that are actually stored in the archive are "known". So this error might mean that the archive is empty.

@mvollmer mvollmer self-assigned this Sep 9, 2016

@mvollmer

Member

commented Sep 9, 2016

I had a similar situation on my development VM, where the machinery all seemed to be working okay, but no actual data would be collected.

The reason was that /var/lib/pcp/config/pmlogger/config.default was empty, which meant that pmlogger would write (essentially) empty archives.

Cockpit installs the file /var/lib/pcp/config/pmlogconf/tools/cockpit and calls /usr/share/pcp/pmlogg reload. This is supposed to include the Cockpit metrics in /var/lib/pcp/config/pmlogger/config.default, but this wasn't happening for some reason.

In fact, none of the metrics in /var/lib/pcp/config/pmlogconf/ was included in config.default.

The reason seems to be that pmlogconf considers an empty file invalid and won't touch it:

# echo >/var/lib/pcp/config/pmlogger/config.default
# pmlogconf -c /var/lib/pcp/config/pmlogger/config.default 
pmlogconf: Error: existing "/var/lib/pcp/config/pmlogger/config.default" is not a pmlogconf control file

Removing config.default makes everything work:

# rm /var/lib/pcp/config/pmlogger/config.default
# /usr/share/pcp/pmlogg reload
# grep cockpit /var/lib/pcp/config/pmlogger/config.default
#+ tools/cockpit:y:default:

I don't know how I ended up with a broken config.default file. Maybe yours is broken, too.
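Whether a given config.default is in this state can be checked before deleting anything. A minimal sketch; config_is_blank is a hypothetical name, and it only treats a file with no non-whitespace characters (like the one produced by echo > config.default above) as blank, without checking that a non-blank file is a valid pmlogconf control file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given file exists but contains
# nothing except whitespace, the state in which (per this thread)
# pmlogconf refuses to update config.default.
config_is_blank() {
    [ -f "$1" ] && ! grep -q '[^[:space:]]' "$1"
}

# Example use against the default pmlogger config path from this thread.
CONFIG=/var/lib/pcp/config/pmlogger/config.default
if config_is_blank "$CONFIG"; then
    echo "$CONFIG is blank; remove it and reload pmlogger as shown above"
fi
```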

@fche

commented Sep 9, 2016

FWIW, service-pmmgr always deletes and recomputes pmlogger configuration files. The cron-job-based service-pmlogger is less predictable. @natoscott

@mvollmer

Member

commented Sep 9, 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1374703

Thanks! A system crash might indeed be how I ended up with an empty config.default file. I am pretty harsh with that VM and force-reboot it all the time.

@mvollmer

Member

commented Sep 9, 2016

| FWIW, service-pmmgr always deletes and recomputes pmlogger configuration files. The cron-job-based service-pmlogger is less predictable.

I see. How do we decide which one to enable/start?

Right now the UI button enables/starts pmlogger.service. Should we change that to pmmgr.service?

@fche

commented Sep 9, 2016

| How do we decide which one to enable/start?

Way back when y'all were first building out the pcp bridge, we talked about it. Not quite sure why we didn't go that way. In principle, it is only:

  • arranging to install the pcp subpackage that includes pmmgr (pcp-manager on RH)
  • dropping any pcp-bridge %spec/etc. code that manually runs pmlogconf or edits files or whatever; just drop the new cockpit pmlogconf file into its place
  • starting service pmmgr instead of service pmlogger
  • switching the pcp bridge to look for the resulting archive under /var/log/pcp/pmmgr/$HOSTNAME instead of /var/log/pcp/pmlogger/$HOSTNAME
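The last step amounts to choosing a different archive directory depending on which service does the logging. A hypothetical sketch of that path choice (archive_dir is not an existing Cockpit or PCP function); it uses PCP_LOG_DIR from /etc/pcp.conf if that variable is exported and otherwise assumes /var/log/pcp:

```shell
#!/bin/sh
# Hypothetical helper: print the archive directory the pcp bridge would
# need to read for a given logging service, following the two layouts
# listed above. Defaults the host to the local hostname.
archive_dir() {
    service=$1
    host=${2:-$(hostname)}
    base=${PCP_LOG_DIR:-/var/log/pcp}
    case $service in
        pmlogger|pmmgr) printf '%s/%s/%s\n' "$base" "$service" "$host" ;;
        *) echo "unknown logging service: $service" >&2; return 1 ;;
    esac
}
```

For example, archive_dir pmmgr arrakis prints /var/log/pcp/pmmgr/arrakis when PCP_LOG_DIR is unset.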
@natoscott

Contributor

commented Sep 11, 2016

@mvollmer to balance things a little (@fche is pmmgr's author so may be slightly biased :) ...

  • be aware pmmgr is a new daemon that will need to always be running - some people don't like additional daemons if they don't really need them
  • be aware switching to pmmgr means incompatible file formats with existing pmlogger and pmie control files, which might cause upgrade / downgrade headaches for Cockpit users
  • pmlogger service is not really less predictable in practice, there were some years-old, years-since-fixed bugs where it didn't always regenerate configuration files when it should have, but that's not an issue these days. One could compare this to pmmgr's old tendency to cause system OOM, which we believe to be fixed nowadays .... though every now and again there's a suggestion of otherwise.
  • pmmgr handles pmie relatively poorly (compared to direct pmie service use), restarting it inappropriately at times and inadvertently breaking some classes of pmie rule.
  • pmmgr is not aware of modern pmlogger's ability to reconnect to pmcd, so it stops+starts it needlessly at times, causing more temporary archives than necessary ... which can flow into other open fd-resource bugs that the pmlogger service is not exposed to.

IOW, YMMV - all code has bugs & trade-offs of course. There is no one approach to rule them all.

Also, pmmgr is feature-rich - you may wish to expose other functionality it offers too - host discovery via Avahi, and support for running other helper daemons like pcp2graphite(1), pcp2influxdb(1), etc.

Hence, I'd recommend a dual approach - where existing setups can be preserved & supported, rather than a breaking-switch to pmmgr (perhaps have new UI options for pmmgr, alongside existing pmlogger and pmie services?) - and people can opt-in to the more resource-heavy (i.e. permanent daemon) pmmgr use if they wish.

Not to ignore the original problem, I'll look into fixing up that empty file issue this week too.

@fche

commented Sep 12, 2016

| be aware switching to pmmgr means incompatible file formats with existing pmlogger and pmie control files

I don't know what you are referring to. The config files that pmmgr-invoked pmlogger / pmie use are the exact same format that shell-invoked pmlogger / pmie use, because they are the same programs. Their output files are the same format because they are the same programs.

| pmlogger service is not really less predictable in practice

Please identify the documentation where it spells out under what conditions pmlogconf is rerun by service-pmlogger's scripts (pmlogger_check ?).

@natoscott

Contributor

commented Sep 12, 2016

| be aware switching to pmmgr means incompatible file formats with existing pmlogger and pmie
| control files
^^^^^^^^^^

| I don't know what you are referring to.

Take time, read carefully.

natoscott added a commit to performancecopilot/pcp that referenced this issue Sep 12, 2016

@natoscott

Contributor

commented Sep 12, 2016

@JGjorgji @mvollmer the empty file issue is resolved in PCP now and the fix will be in pcp-3.11.5 onward (3.11.5 is scheduled within a couple of weeks).

commit 8e9f44151e47abacd93dbb75b843996d50458652
Author: Nathan Scott <nathans@redhat.com>
Date:   Mon Sep 12 11:13:05 2016 +1000

    pmieconf, pmlogconf: allow empty files as input

    Resolves https://github.com/cockpit-project/cockpit/issues/4941
    Resolves Red Hat BZ #1374703.

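Since the fix ships from pcp-3.11.5 onward, whether an installation already has it reduces to a version comparison. A sketch relying on GNU sort -V (present in CentOS 7's coreutils); the function names are hypothetical, and on RPM-based systems the installed version could be obtained with rpm -q --qf '%{VERSION}' pcp:

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2, using GNU sort's version ordering (-V): the smaller of the
# two sorts first, so $1 >= $2 exactly when $2 comes out on top.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeed if the given PCP version includes the
# empty-file fix for pmlogconf/pmieconf (3.11.5 and later).
has_empty_config_fix() {
    version_ge "$1" 3.11.5
}
```

With the versions reported in this thread, has_empty_config_fix 3.10.6 fails and has_empty_config_fix 3.11.8 succeeds.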
@mvollmer

Member

commented Sep 12, 2016

| How do we decide which one to enable/start?

| Way back when y'all were first building out the pcp bridge, we talked about it. Not quite sure why we didn't go that way.

Yeah, I remember. I think at the time pmmgr was still pretty new, maybe not even ready yet, and I made a note to look at it more closely once I got the basics working. Of course, I never did that... mostly because pmlogger.service was working well enough after all.

I really don't want to decide between pmlogger and pmmgr, given that you guys don't agree. Can we hide that choice from Cockpit and its users, while still making it accessible for experts?

Maybe via a data-logging.target systemd unit, which contains pmlogger.service by default but can be cleanly changed to contain pmmgr.service instead? I don't know enough about target units to know whether this would work out, unfortunately.

@mvollmer

Member

commented Sep 12, 2016

| Also, pmmgr is feature-rich - you may wish to expose other functionality it offers too - host discovery via Avahi, and support for running other helper daemons like pcp2graphite(1), pcp2influxdb(1), etc.

A powerful, PCP specific UI would be awesome. But that's a different topic altogether and we would need to look at ManageIQ etc as well, I guess. We don't want to compete against ourselves, really. :-)

@natoscott

Contributor

commented Sep 12, 2016

| I really don't want to decide between pmlogger and pmmgr, given that you guys don't agree

+1 - probably best to consider them completely orthogonal - the direct pmlogger/pmie use will always be available with PCP, and pmmgr is there for people who need the additional extras (and compromises to achieve those) that it offers. If Cockpit could offer up a rich UI for optional pmmgr use someday, I'm all for that.

| A powerful, PCP specific UI would be awesome

BTW, we do get requests for features sometimes that Cockpit could probably help with - e.g. the Red Hat customer support folk would like an easy way to query and set the default pmlogger recording interval for a given site. Being able to see which pmie(1) rules are enabled, and enable/disable individual rules via a clean UI is something we get occasionally prodded for too.

@mvollmer

Member

commented Sep 12, 2016

| I really don't want to decide between pmlogger and pmmgr, given that you guys don't agree

| +1 - probably best to consider them completely orthogonal - the direct pmlogger/pmie use will always be available with PCP, and pmmgr is there for people who need the additional extras (and compromises to achieve those) that it offers. If Cockpit could offer up a rich UI for optional pmmgr use someday, I'm all for that.

The thing is, none of the additional extras of pmmgr are visible in the Cockpit UI, so we would allow people to make an obscure choice without any visible effect, no?

| BTW, we do get requests for features sometimes that Cockpit could probably help with - e.g. the Red Hat customer support folk would like an easy way to query and set the default pmlogger recording interval for a given site. Being able to see which pmie(1) rules are enabled, and enable/disable individual rules via a clean UI is something we get occasionally prodded for too.

Interesting. Should we start writing down some use cases and make some mock-ups? What you mention doesn't seem difficult; we only have to find a nice place in the UI for it...

@JGjorgji

Author

commented Sep 12, 2016

So I reinstalled this, and the issue with the configuration file no longer occurs (I still needed the permission fix on /var/lib/pcp/config/pmlogger/config.default). Performance metrics are still not shown on the graphs, and cockpit-pcp segfaults, though the errors about missing metrics are now gone.

@natoscott

Contributor

commented Sep 13, 2016

| Interesting. Should we start writing down some use cases and make some mock ups?

I'll steer some of the interested customer support folk towards this issue, to see if they want to chime in or start a discussion elsewhere directly with you. Thanks @mvollmer

@mvollmer

Member

commented Sep 13, 2016

| Performance metrics are still not shown on the graphs, and cockpit-pcp segfaults.

Ouch. Is the crash repeatable? E.g., does it happen every time you log into Cockpit?

Is there any chance that you can get a stack trace of the crash? Can you make the core dump available to us somehow?

@JGjorgji

Author

commented Sep 15, 2016

It crashes every time, but only after I select a longer time period to review (say, 1 week).

For which service would coredumps need to be enabled? The main cockpit one?

@stefwalter stefwalter removed the question label Nov 30, 2016

@JGjorgji

Author

commented Oct 15, 2017

Tried this again today and it's still happening: same server, Cockpit 148, PCP 3.11.8.

@mvollmer

Member

commented Jan 11, 2019

Some crashes in cockpit-pcp have been fixed, I think. Please reopen if it still crashes for you.

@mvollmer mvollmer closed this Jan 11, 2019
