Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle RX buffer allocation failure; TX queue full as debug message #155

Conversation

joft-mle
Copy link
Contributor

@joft-mle joft-mle commented Jul 5, 2023

Hi Alex,

NOTE: As discussed, this is a pull request is a replacement for this pull request, which was against your "bleeding" edge repository.

In this pull request with a small number of patches, we introduce

  • the possibility to let RX buffer allocation fail and output a hint for the user on how to proceed
  • turning the "TX ring full" message into a debug message, which can be made visible on-demand (given dynamic debug support in the kernel)

…buffers

This basically allows the "refilling" of RX buffers to fail. A fail to allocate
RX buffers can occur on bringing a network device up or when changing the number
of RX channels at run-time through `ethtool -L <ndev> rx <num>`.

Signed-off-by: Joachim Foerster <joachim.foerster@missinglinkelectronics.com>
…able DEBUG is provided

Signed-off-by: Joachim Foerster <joachim.foerster@missinglinkelectronics.com>
…d debug message

To not overwhelm any logging daemon. Full TX queues can be quite common
depending on a system's context and capabilities. Mostly all the details are not
required anyway in such cases.

To enable such debug messages and have their text turn up in dmesg and logging
daemons, the mqnic module has to be compiled with CPP symbol DEBUG being (`make
DEBUG=yes`) defined, or the kernel in use has to support "Dynamic Debug" [1]. In
the latter case all such debug messages in module mqnic can be enabled with:

echo 'module mqnic +pflmt' | sudo tee /sys/kernel/debug/dynamic_debug/control

To enable just this specific messages, based on its (partial) format:

echo 'format "TX ring %d full" +pflmt' | sudo tee /sys/kernel/debug/dynamic_debug/control

[1] https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html

Signed-off-by: Joachim Foerster <joachim.foerster@missinglinkelectronics.com>
@joft-mle
Copy link
Contributor Author

joft-mle commented Jul 5, 2023

Regarding your previous comments (comment 1, comment 2) regarding printk_ratelimited() and netdev_*_once():

For netdev_*_once() we would essentially introduce a custom macro and we also need some background thread to reset the __print_once boolean. "Feels" a little bit odd to me.

If not netdev_dbg(), I would tend towards a printk_ratelimited() based solution. Unfortunately there is no such wrapper in the netdev_* realm - yet. In fact, we went for this at the beginning. However for our "taste", there were still "too many" messages. Sure, the reason for it is the fact, that we do traffic shaping per queue in our custom solution, which can result in often-full queues in even more cases. That's why we settled on netdev_dbg().

@alexforencich
Copy link
Member

That's a good point. I also took a look at some of the other NIC drivers, and it seems like generally nothing is printed in this case. So, another option is to do away with the print entirely. But, maybe it actually makes sense to more fully utilize the debug message filtering capability - apply that to even more messages, like the timestamp-related ones, and perhaps even add some additional debug messages that can be turned on when necessary. This is not necessarily something we need to take care of in this PR, but could be something to consider for the longer term.

@alexforencich
Copy link
Member

After playing around with dynamic debug a bit, it's actually quite nice to use. I think there are probably a few other prints that should be switched from info to debug, and since the overhead of disabled debug prints is quite low, it might also make sense to add a few more here and there.

@alexforencich alexforencich merged commit 1d72981 into corundum:master Jul 7, 2023
10 checks passed
@joft-mle joft-mle deleted the mle/jf/buffer-error-handling branch July 7, 2023 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants