Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix intel_mp RRD counter collection #1178

Closed
wingo opened this issue Sep 11, 2018 · 3 comments
Closed

Fix intel_mp RRD counter collection #1178

wingo opened this issue Sep 11, 2018 · 3 comments

Comments

@wingo
Copy link

wingo commented Sep 11, 2018

When we made the 2018.06.01 release, we had some issues with intel_mp counters that had just been resolved prior to the release. Unfortunately that refactoring broke RRD recording for intel_mp counters.

The fundamental issue is that the ptree manager uses inotify (or periodic polling) to see what counters are present for its set of workers. However the symlink mechanism meant that the intel counters are no longer under the workers, and instead in /var/run/snabb/intel-mp. As that directory wasn't being inotified, we didn't see its counters, and thusly the ptree manager failed to create RRD files.

Now, we could monitor /var/run/snabb/intel-mp. However that has a problem, in that it could be there are other instances there; we only want to monitor the general statistics and the queue-specific statistics of the stats queue for the ptree workers.

I propose to fix this by making the symlink tree a bit deeper -- instead of symlinking directories, we only symlink files (in practice, .counter files). If we need a directory, we create it under the app instead of symlinking. That way inotify will see things automatically.

This will need some refactors in "snabb top" as well.

@eugeneia
Copy link
Collaborator

eugeneia commented Sep 11, 2018

Not a huge fan of /var/run/snabb/intel-mp, why is this not under .../<pid>/group?

@wingo
Copy link
Author

wingo commented Sep 11, 2018

@eugeneia The issue is that you can have multiple independent users of, say, PCI device 02:00.0 -- via RSS or VMDq -- but only one of those users should collect statistics from the card. So the statistics are a kind of shared resource. But that sharing doesn't necessarily correspond to the group mechanism, so instead we drop it into the intel-mp directory.

We could rename /var/run/snabb/intel-mp to /var/run/snabb/pci, if that's more useful. WDYT?

@takikawa
Copy link
Member

takikawa commented Sep 11, 2018

Symlinking the individual counters sounds fine to me. I tried this at one point in PR #1148 in this commit (373d195) which I subsequently undid. I think I only undid the change because it was more code and I thought it was unnecessary (and hadn't realized it broke the RRD recording of course). So I think the approach should work and having RRD recording back sounds like a win!

Re: the intel-mp shm folder, it's used for a few other pieces of state related to VMDq too in the intel-mp driver but having a different name sounds fine and shouldn't be a problem for the driver (presumably the driver can still assume no other drivers will touch the contents of its pci folder while running).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants