Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share queue stats among all apps on a NIC #1148

Merged
merged 9 commits into from
Jul 12, 2018

Conversation

takikawa
Copy link
Member

@takikawa takikawa commented Jul 6, 2018

This PR revises queue stats handling by providing all the queue stats in each process' pci shm directory. It does this by symlinking all of them to a "global" shm sub-directory in the /intel-mp directory.

This doesn't quite solve the snabb top issues yet because I think symlinks are not yet followed by some layer that snabb top uses. Once snabb top follows the symlinks it should show all the counters properly. I'm looking into fixing it, but if anyone has any ideas on where to find the cause (maybe at the inotify level?) it would be helpful.

Also, this exposes the rxcounter and txcounter values in an extra counter, which could be used to filter the queue stats shown (since there may be non-zero queue counters of no interest for a given process). This part is WIP but could be useful if snabb top actually looks at it.

Instead of storing intel_mp stats in a per-process stats
directory, store it in the global intel_mp directory that's
indexed by PCI address (the same place where pool number
allocations are tracked).

This lets intel_mp apps access counters updated by the main
process even if they live in different processes. It also makes
it easier to track queue counters in `snabb top`

Also exposes the rxcounter/txcounter assignment in the per-process
shm area.
The run_stats flag should be used to specify which app
is in charge of syncing stats. With commit
59f2f8a it is unnecessary
to set this for the lwAFTR.
@dpino
Copy link
Member

dpino commented Jul 6, 2018

Nice work. It looks good to me so far.

I also think snabb top issues are at inotify level. I would need to look into it to actually understand what's the issue.

@dpino
Copy link
Member

dpino commented Jul 6, 2018

It seems like inotify doesn't report events on symbolic links. Here's an example using user-space tool inotifywait:

$ mkdir temp
$ inotifywait -m temp          
Setting up watches.                         
Watches established.
temp $ touch foo
temp/ OPEN foo
temp/ ATTRIB foo
temp/ CLOSE_WRITE,CLOSE foo

But if the observed file is a symbolic link to a directory

$ ln -s temp tmp
$ inotifywait -m tmp
Setting up watches.
Watches established.
temp $ touch bar

Nothing happens.

Good thing is that when monitoring a directory, creation of symbolic links is reported.

$ inotifywait -m temp          
Setting up watches.                         
Watches established.
temp $ ln -s bar qax
temp/ CREATE qax

@dpino
Copy link
Member

dpino commented Jul 6, 2018

I look into the issue. I don't know yet what's the solution but here are some ideas.

I think at a high level view what's necessary to do is when a create event is received, we check whether the filename is a symbolic link or not. If it's a symbolic link, actually we don't monitor this file (because we won't get any event) but its real path.

To check whether a file is a symbolic link:

local function is_symlink (name)
   local stat = S.lstat(name)
   return stat and stat.islnk
end

To convert a symbolic link to a real path there's a C runtime function called realpath. It seems this function is not part of ljsyscall, so it would be necessary to define it with ffi.cdef.

local ffi = require('ffi')
local C = ffi.C

ffi.cdef[[
char *realpath(const char *path, char *resolved_path);
]]

local function resolve_symlink (name)
   realpath = ffi.new("char[?]", 1024)
   C.realpath(name, realpath)
   return realpath
end

In addition, when converting a symbolic link like /var/run/... we're going to obtain a path link /run/.. because /var/run is actually a symbolic link to /run.

This is just simply an idea, maybe there's another way.

@takikawa
Copy link
Member Author

takikawa commented Jul 6, 2018

Update: I think I've solved the inotify issue. It was just a matter of changing how the snabb library defines what a directory is for this case. The queue stats now show up again.

I'll see if I can make it filter based on the actually enabled counters.

This makes `snabb top` work on symlinked pci directories
for intel_mp stats
This filters out queue counters that are unused by a particular
process in its tree view, even if the NIC it's using has traffic
on those queues from another process

Also adjusts some tests to use rx/txcounter in order to show
stats separately in `snabb top` while tests run
@takikawa
Copy link
Member Author

I pushed another commit that filters out irrelevant queue counters in snabb top. This change was more complicated than I thought and to be honest I'm not that happy with it, but it works and passes the intel_mp tests. I think it could use more testing though.

The complication comes from how the rxcounters / txcounters counters work. They're counters that store a bitmap indicating which queue counters are used by a process. Since this is pid-specific, it shouldn't go in the shared shm folder. This means we can't symlink the whole folder, and have to symlink each individual shared counter instead (otherwise the writes would go to the same counter).

I guess an alternative approach would be to add another folder inside the shared shm folder that's indexed by pid, but then the folder structure looks a little silly 8251/pci/02:00.0/8251/rxcounters.counter with the repeated pid. Or the counter could be stored in a path entirely separate from the stats counters, like 8251/intel_mp_config/02:00.0/rxcounters.counter. I'm starting to think the latter approach with a new folder might be better now, so I will try it too.

Use a simpler method to figure out which queue stats to filter
out. Revert change to symlink each counter for intel_mp and
just symlink the whole folder again. Then add a counter in the
per-app shm that just stores the rxcounter/txcounter.

Snabb top then looks through all the apps in a process and filters
out queue counters that don't match any app. This is not totally precise
in the case that a process uses multiple NICs with counters set
(it may show more counters than necessary).
@takikawa takikawa changed the title [WIP] share queue stats among all apps on a NIC Share queue stats among all apps on a NIC Jul 10, 2018
@dpino
Copy link
Member

dpino commented Jul 11, 2018

Related with this PR takikawa#4

takikawa and others added 3 commits July 12, 2018 00:00
Instead of changing inotify to crawl links too, we will treat
the pci link specially in snabb top instead. This also reverts
commit 0d641de which made the
inotify change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants