cake: tc shows number of active flows within class #1

ldir-EDB0 · 2015-08-11T11:20:37Z

Hi Dave,

A small tweak to tc to show the number of active flows reported by cake. This also updates pkt_sched.h to match sch_cake's definition of the structure.

Kevin

As Stephen Hemminger mentioned on the last submission, the new_json_obj function is always called with fp == stdout, so right now there's no need for this extra argument.

The background for the rework is the following: ip monitor didn't call `new_json_obj` (even in a non-JSON context), so the static FILE *_fp variable wasn't initialized, which caused a SIGSEGV in ipaddress.c. This patch should fix this issue for good; new paths won't have to call `new_json_obj`.

How to reproduce:

    $ ip -t mon label link

    (gdb) bt
    #0  _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d "%d: ", ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
    #1  0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d "%d: ") at color.c:108
    #2  0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc "ifindex", fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
    #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
    #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
    #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
    #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at libnetlink.c:761
    #7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
    #8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
    #9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311

Fixes: 6377572 ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
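A minimal C sketch of the failure mode and of the idea behind the fix: a file-scope `FILE *` starts out NULL, so any path that never initializes it hands NULL to `fprintf()`. Falling back to stdout removes the need for a setup call. `out_fp`, `get_out_fp()` and `print_int_sketch()` are hypothetical names for illustration, not the actual iproute2 code.

```c
#include <stdio.h>

/* Hypothetical stand-in for the static FILE *_fp described above.
 * Static storage is zero-initialized, so this is NULL until set,
 * which is the state ip monitor ran into. */
static FILE *out_fp;

/* Sketch of the fix's idea: every caller wanted stdout anyway,
 * so default to it instead of requiring new_json_obj() first. */
static FILE *get_out_fp(void)
{
	return out_fp ? out_fp : stdout;
}

/* Hypothetical helper loosely modelled on print_int(); it can no
 * longer pass a NULL stream to fprintf(). Returns fprintf()'s result. */
static int print_int_sketch(const char *fmt, int value)
{
	return fprintf(get_out_fp(), fmt, value);
}
```

With this shape, `print_int_sketch("%d: ", 5)` writes `5: ` to stdout even if nothing ever assigned `out_fp`.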

Ido Schimmel says:

====================

From: Ido Schimmel <idosch@mellanox.com>

This patchset adds devlink-trap support in iproute2.

Patch #1 increases the number of options devlink can handle. Patches #2-#3 gradually add support for all devlink-trap commands. Patch #4 adds a man page for devlink-trap.

See individual commit messages for example usage and output.

Changes in v2:
* Remove report option and monitor command since monitoring is done using drop monitor

====================

Signed-off-by: David Ahern <dsahern@gmail.com>

Petr Machata says:

====================

A new Qdisc, "ETS", has been accepted into Linux at kernel commit 6bff00170277 ("Merge branch 'ETS-qdisc'"). Add iproute2 support for this Qdisc.

Patch #1 changes libnetlink to admit NLA_F_NESTED in nested attributes. Patch #2 then adds ETS support as such.

Examples (taken from the kernel patchset):

- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:

    # tc qdisc add dev swp1 root handle 1: \
          ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5

- Tweak the quantum of one of the classes of the previous Qdisc:

    # tc class ch dev swp1 classid 1:4 ets quantum 1000
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
    # tc class ch dev swp1 classid 1:3 ets quantum 1000
    Error: Strict bands do not have a configurable quantum.

- Purely strict Qdisc with 1:1 mapping between priorities and TCs:

    # tc qdisc add dev swp1 root handle 1: \
          ets strict 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

- Use "bands" to specify the number of bands explicitly. Underspecified bands are implicitly ETS and their quantum is taken from the MTU. The following thus gives each band the same weight:

    # tc qdisc add dev swp1 root handle 1: \
          ets bands 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
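The ETS examples can be cross-checked with a little arithmetic. The sketch below, a hypothetical illustration rather than tc or kernel code, computes each ETS band's share from its quantum (4500/3000/2500 gives the 45%-30%-25% split quoted above) and expands a short priomap to 16 entries. The fill rule, mapping unmentioned priorities to the last band, is inferred from the `qdisc sh` output in the examples.

```c
#include <stddef.h>

#define PRIO_COUNT 16 /* the show output above prints 16 priomap entries */

/* Integer percentage share of each ETS band, computed from its quantum.
 * For quanta 4500 3000 2500 this yields 45 30 25. */
static void ets_weights(const unsigned int *quanta, size_t n, unsigned int *pct)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += quanta[i];
	for (size_t i = 0; i < n; i++)
		pct[i] = total ? (unsigned int)(100UL * quanta[i] / total) : 0;
}

/* Expand a user-supplied priomap to PRIO_COUNT entries. Priorities the
 * user did not mention go to the last band (bands - 1), which matches
 * the trailing entries in the examples above. */
static void ets_fill_priomap(const unsigned int *given, size_t n_given,
			     unsigned int bands, unsigned int *out)
{
	for (size_t i = 0; i < PRIO_COUNT; i++)
		out[i] = i < n_given ? given[i] : bands - 1;
}
```

For the first example, `ets_fill_priomap()` with `0 1 1 1 2 3 4 5` and 6 bands reproduces the printed `... 4 5 5 5 5 5 5 5 5 5` tail.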

Petr Machata says:

====================

To allow configuring user-defined actions as a result of inner workings of a qdisc, a concept of qevents was recently introduced to the kernel. Qevents are attach points for TC blocks, where filters can be put that are executed as the packet hits well-defined points in the qdisc algorithms. The attached blocks can be shared in a manner similar to clsact ingress and egress blocks, arbitrary classifiers with arbitrary actions can be put on them, etc.

For example:

    # tc qdisc add dev eth0 root handle 1: \
          red limit 500K avpkt 1K qevent early_drop block 10
    # tc filter add block 10 \
          matchall action mirred egress mirror dev eth1

This patch set introduces the corresponding iproute2 support. Patch #1 adds the new netlink attribute enumerators. Patch #2 adds a set of helpers to implement qevents, and #3 adds generic documentation to tc.8. Patch #4 then adds two new qevents to the RED qdisc: mark and early_drop.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

Petr Machata says:

====================

When a list of filters at a given block is requested, tc first validates that the block exists before doing the filter query. Currently the validation routine checks ingress and egress blocks. But now that blocks can be bound to qevents as well, qevent blocks should be checked too:

    # ip link add up type dummy
    # tc qdisc add dev dummy1 root handle 1: \
          red min 30000 max 60000 avpkt 1000 qevent early_drop block 100
    # tc filter add block 100 pref 1234 handle 102 matchall action drop
    # tc filter show block 100
    Cannot find block "100"

This patchset fixes this issue:

    # tc filter show block 100
    filter protocol all pref 1234 matchall chain 0
    filter protocol all pref 1234 matchall chain 0 handle 0x66
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 2 ref 1 bind 1

In patch #1, the helpers and necessary infrastructure are introduced, including a new qdisc_util callback that implements sniffing out bound blocks in a given qdisc. In patch #2, RED implements the new callback.

v3:
- Patch #1:
    - Do not pass &ctx->found directly to has_block. Do it through a helper variable, so that the callee does not overwrite the result already stored in ctx->found.

v2:
- Patch #1:
    - In tc_qdisc_block_exists_cb(), do not initialize 'q'.
    - Propagate upwards errors from q->has_block.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
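The v3 note is easiest to see in code. In this hypothetical sketch (the types and the loop are illustrative; tc actually walks qdiscs via a netlink dump callback), `has_block()` always writes its output, so passing `&ctx->found` directly would let a later miss erase an earlier hit. Routing the result through a local variable and only ever setting `ctx->found` to true avoids that, and errors are propagated upwards as the v2 note asks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical qdisc descriptor with up to four bound block indices. */
struct qdisc_sketch {
	unsigned int blocks[4];
	size_t n_blocks;
};

struct ctx_sketch {
	bool found;
};

/* Hypothetical has_block()-style callback: it always writes *found,
 * so a miss stores false regardless of what was there before. */
static int has_block(const struct qdisc_sketch *q, unsigned int block,
		     bool *found)
{
	*found = false;
	for (size_t i = 0; i < q->n_blocks; i++)
		if (q->blocks[i] == block)
			*found = true;
	return 0;
}

/* Check every qdisc. The per-qdisc answer lands in a local variable and
 * is only OR-ed into ctx->found, so a later miss cannot undo a hit. */
static int block_exists(const struct qdisc_sketch *qs, size_t n,
			unsigned int block, struct ctx_sketch *ctx)
{
	for (size_t i = 0; i < n; i++) {
		bool found_here;
		int err = has_block(&qs[i], block, &found_here);

		if (err)
			return err; /* propagate callback errors upwards */
		if (found_here)
			ctx->found = true;
	}
	return 0;
}
```

Had the loop passed `&ctx->found` straight to `has_block()`, block 100 on the first qdisc would be found and then forgotten when the second qdisc is checked.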

Petr Machata says:

====================

The Linux DCB interface allows configuration of a broad range of hardware-specific attributes, such as TC scheduling, flow control, per-port buffer configuration, TC rate, etc.

Currently a common libre tool for configuration of DCB is OpenLLDP. This suite contains a daemon that uses the Linux DCB interface to configure HW according to the DCB TLVs exchanged over an interface. The daemon can also be controlled by a client, through which the user can adjust and view the configuration. The downside of using OpenLLDP is that it is somewhat heavyweight and difficult to use in scripts, and it does not support extensions such as buffer and rate commands.

For access to many HW features, one would be perfectly fine with a fire-and-forget tool along the lines of "ip" or "tc". For scripting in particular, this would be ideal.

This author is aware of one such tool, mlnx_qos from the Mellanox OFED scripts collection[1]. The downside here is that the tool is very verbose, the command line language is awkward to use, it is not packaged in Linux distros, and it generally has the appearance of a very vendor-specific tool, despite not being one.

This patchset addresses the above issues by providing a seed of a clean, well-documented, easily usable, extensible fire-and-forget tool for DCB configuration:

    # dcb ets set dev eni1np1 \
          tc-tsa all:strict 0:ets 1:ets 2:ets \
          tc-bw all:0 0:33 1:33 2:34

    # dcb ets show dev eni1np1 tc-tsa tc-bw
    tc-tsa 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
    tc-bw 0:33 1:33 2:34 3:0 4:0 5:0 6:0 7:0

    # dcb ets set dev eni1np1 tc-bw 1:30 2:37

    # dcb -j ets show dev eni1np1 | jq '.tc_bw[2]'
    37

The patchset proceeds as follows:

- Many tools in iproute2 have an option to work in batch mode, where the commands to run are given in a file. The code to handle batching is largely the same independent of the tool in question. In patch #1, add a helper to handle the batching, and migrate individual tools to use it.
- A number of configuration options come in the form of an on-off switch. This in turn can be considered a special case of parsing one of a given set of strings. In patch #2, extract helpers to parse one of a number of strings, on top of which build an on-off parser. Currently each tool open-codes the logic to parse the on-off toggle. A future patch set will migrate instances of this code over to the new helpers.

- The on/off toggles from the previous list item sometimes need to be dumped. While in the FP output one typically wishes to maintain consistency with the command line and show the actual strings, "on" and "off", in JSON output one would rather use booleans. This logic is somewhat annoying to have to open-code time and again. Therefore in patch #3, add a helper to do just that.

- The DCB tool is built on top of libmnl. Several routines will be basically the same in DCB as they currently are in devlink. In patches #4-#6, extract them to a new module, mnl_utils, for easy reuse.

- Much of DCB is built around arrays. A syntax similar to iplink_vlan's ingress-qos-map / egress-qos-map is very handy for describing changes done to such arrays. Therefore in patch #7, extract a helper, parse_mapping(), which manages parsing of key-value arrays. In patch #8, fix a buglet in the helper, and in patch #9, extend it to allow setting of all array elements in one go.

- In patch #10, add a skeleton of "dcb", which contains common helpers and dispatches to subtools for handling of individual objects. The skeleton is empty as of this patch. In patch #11, add "dcb_ets", a module for handling of specifically DCB ETS objects.

The intention is to gradually add handlers for at least PFC, APP, peer configuration, buffers and rates.

[1] https://github.com/Mellanox/mlnx-tools/tree/master/ofed_scripts

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
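The key:value syntax used by `dcb ets set` (for example `tc-bw all:0 0:33 1:33 2:34`) can be illustrated with a small parser. This is a hypothetical sketch of what a parse_mapping()-like helper does, not iproute2's actual signature: tokens are applied left to right, `all` sets every element, and later tokens override earlier ones. It only handles numeric values, whereas the real tool also accepts words such as `strict` and `ets`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse_mapping()-style helper. Each token has the form
 * KEY:VALUE, where KEY is an array index or "all". Returns 0 on success,
 * -1 on a malformed token or an out-of-range key. */
static int parse_mapping_sketch(int argc, char **argv,
				unsigned int *arr, size_t n)
{
	for (int i = 0; i < argc; i++) {
		char *colon = strchr(argv[i], ':');
		char *end;
		unsigned long val;

		if (!colon)
			return -1;
		val = strtoul(colon + 1, &end, 10);
		if (end == colon + 1 || *end != '\0')
			return -1;

		if (colon - argv[i] == 3 && strncmp(argv[i], "all", 3) == 0) {
			/* "all:V" sets every element in one go */
			for (size_t k = 0; k < n; k++)
				arr[k] = (unsigned int)val;
		} else {
			unsigned long key = strtoul(argv[i], &end, 10);

			/* the key must be a non-empty number ending at ':' */
			if (end != colon || end == argv[i] || key >= n)
				return -1;
			arr[key] = (unsigned int)val;
		}
	}
	return 0;
}
```

Because tokens apply in order, putting `all:0` first and specific keys after it yields the `tc-bw 0:33 1:33 2:34 3:0 ...` state shown above.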

Ido Schimmel says:

====================

From: Ido Schimmel <idosch@nvidia.com>

Patch #1 prints the recently added 'RTNH_F_TRAP' flag.

Patch #2 makes sure that nexthop flags are always printed for nexthop objects, even when the nexthop does not have a device, such as a blackhole nexthop or a group.

Example output with netdevsim:

    $ ip nexthop
    id 1 via 192.0.2.2 dev eth0 scope link trap
    id 2 blackhole trap
    id 3 group 2 trap

Example output with mlxsw:

    $ ip nexthop
    id 1 via 192.0.2.2 dev swp3 scope link offload
    id 2 blackhole offload
    id 3 group 2 offload

Tested with fib_nexthops.sh that uses "ip nexthop" output:

    Tests passed: 164
    Tests failed:   0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
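Patch #2's point is that flag printing should not be tied to the presence of a device. A hypothetical sketch of a formatter that only looks at the flags word follows; the numeric values match `RTNH_F_OFFLOAD` and `RTNH_F_TRAP` in `<linux/rtnetlink.h>` but are copied under local names so the example builds without kernel headers, and `format_nh_flags()` is not an iproute2 function.

```c
#include <stddef.h>
#include <string.h>

/* Values of RTNH_F_OFFLOAD and RTNH_F_TRAP from <linux/rtnetlink.h>. */
#define SKETCH_RTNH_F_OFFLOAD 8
#define SKETCH_RTNH_F_TRAP    64

/* Write the offload/trap keywords for a nexthop into buf (len >= 1).
 * Only the flags are consulted, so blackhole nexthops and groups,
 * which have no device, get their flags printed as well. */
static void format_nh_flags(unsigned int flags, char *buf, size_t len)
{
	buf[0] = '\0';
	if (flags & SKETCH_RTNH_F_OFFLOAD)
		strncat(buf, " offload", len - strlen(buf) - 1);
	if (flags & SKETCH_RTNH_F_TRAP)
		strncat(buf, " trap", len - strlen(buf) - 1);
}
```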

Petr Machata says:

====================

The DCB tool will have commands that deal with buffer sizes and traffic rates. TC is another tool that has a number of such commands, and functions to support them: get_size(), get_rate/64(), s/print_size() and s/print_rate(). In this patchset, these functions are moved from TC to lib/ for possible reuse and modernized.

s/print_rate() has a hidden parameter of a global variable use_iec, which made the conversion non-trivial. The parameter was made explicit, print_rate() converted to a mostly json_print-like function, and sprint_rate() retired in favor of the new print_rate. Patches #1 and #2 deal with this.

The intention was to treat s/print_size() similarly, but unfortunately two use cases of sprint_size() cannot be converted to a json_print-like print_size(), and the function sprint_size() had to remain as a discouraged backdoor to print_size(). This is done in patch #3.

Patch #4 then improves the code of sprint_size() a little bit.

Patch #5 fixes a buglet in formatting small rates in IEC mode.

Patches #6 and #7 handle a routine movement of, respectively, get_rate/64() and get_size() from tc to lib.

This patchset does not actually add any new uses of these functions. A follow-up patchset will add subtools for management of DCB buffer and DCB maxrate objects that will make use of them.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
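To show what making use_iec an explicit parameter means, here is a hypothetical rate formatter that takes the mode as an argument instead of reading a global. Decimal mode scales by 1000 with K/M/G/T prefixes, IEC mode by 1024 with Ki/Mi/Gi/Ti; the unit spelling and rounding are assumptions for the sketch and may differ from tc's own helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: use_iec is a parameter, not a global.
 * Returns snprintf()'s result. */
static int format_rate(char *buf, size_t len, double bits_per_sec,
		       bool use_iec)
{
	static const char *const prefixes[] = { "", "K", "M", "G", "T" };
	const size_t nprefixes = sizeof(prefixes) / sizeof(prefixes[0]);
	const double base = use_iec ? 1024.0 : 1000.0;
	size_t i = 0;

	/* scale down until the value fits the largest applicable prefix */
	while (bits_per_sec >= base && i + 1 < nprefixes) {
		bits_per_sec /= base;
		i++;
	}
	/* only prefixed units carry the IEC "i"; plain "bit" does not */
	return snprintf(buf, len, "%g%s%sbit", bits_per_sec, prefixes[i],
			use_iec && i > 0 ? "i" : "");
}
```

The last comment marks the kind of corner case small rates raise in IEC mode: a rate below 1024 bit/s has no prefix, so no "i" should be printed.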

Petr Machata says:

====================

Add support to the dcb tool for the following three DCB objects:

- PFC, for "Priority-based Flow Control", allows configuration of priority lossiness, and related toggles.

- DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and allow configuration of port headroom buffers.

- DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces and allow configuration of the rate with which traffic in a given traffic class is sent.

Patches #1-#4 fix small issues in the current DCB code and man pages. Patch #5 adds new helpers to the DCB dispatcher.

Patches #6 and #7 add support for command line arguments -s and -i. These enable, respectively, display of statistical counters, and ISO/IEC mode of rate units.

Patches #8-#10 add the subtools themselves and their man pages.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
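Per-priority PFC settings are naturally a bitmap over the eight 802.1p priorities. The sketch below packs on/off settings into an 8-bit mask with bit N standing for priority N, the same shape as the `pfc_en` field of `struct ieee_pfc` in `<linux/dcbnl.h>`. The helper names are hypothetical and this is not the dcb tool's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_NUM 8 /* 802.1p priorities 0..7 */

/* Pack per-priority PFC settings into a mask; bit N set means PFC is
 * enabled (lossless behavior) for priority N. */
static uint8_t pfc_mask(const bool prio_pfc[PRIO_NUM])
{
	uint8_t mask = 0;

	for (int p = 0; p < PRIO_NUM; p++)
		if (prio_pfc[p])
			mask |= (uint8_t)(1u << p);
	return mask;
}

/* Query a single priority in a packed mask. */
static bool pfc_enabled(uint8_t mask, int prio)
{
	return (mask >> prio) & 1u;
}
```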

Petr Machata says:

====================

Add support to the dcb tool for the following two DCB objects:

- APP, which allows configuration of traffic prioritization rules based on several possible packet headers.

- DCBX, which is a 1-byte bitfield of flags that configure whether the DCBX protocol is implemented in the device or in the host, and which version of the protocol should be used.

Patch #1 adds a new helper for finding the name of a given dsfield value. This is useful for APP DSCP-to-priority rules, which can use human-readable DSCP names.

Patches #2, #3 and #4 extend existing interfaces for, respectively, parsing of the X:Y mappings, setting a DCB object, and getting a DCB object.

In patch #5, support for the command line argument -N / --Numeric is added. The APP tool later uses it to decide whether to format DSCP values as human-readable strings or as plain numbers.

Patches #6 and #7 add the subtools themselves and their man pages.

v2:
- Two patches dropped and sent to the iproute2 branch as "dcb: Fixes". This patch set now depends on that one.
- Patch #5:
    - Make it -N / --Numeric instead of -n / --no-nice-names
    - Rename the flag from no_nice_names to numeric as well
- Patch #6:
    - Adjust to s/no_nice_names/numeric/ from another patch.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
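Since DCBX is a single byte of flags, decoding it is a matter of testing bits. The values below mirror the `DCB_CAP_DCBX_*` constants in `<linux/dcbnl.h>` (host 0x01, LLD-managed 0x02, CEE 0x04, IEEE 0x08, static 0x10), copied under local names so the sketch builds standalone. The keyword strings and `dcbx_to_str()` are hypothetical choices for this illustration, not necessarily what `dcb dcbx show` prints.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors of the DCB_CAP_DCBX_* bits from <linux/dcbnl.h>. */
#define SKETCH_DCBX_HOST        0x01
#define SKETCH_DCBX_LLD_MANAGED 0x02
#define SKETCH_DCBX_VER_CEE     0x04
#define SKETCH_DCBX_VER_IEEE    0x08
#define SKETCH_DCBX_STATIC      0x10

/* Render the DCBX byte as space-separated flag names into buf (len >= 1). */
static void dcbx_to_str(uint8_t dcbx, char *buf, size_t len)
{
	static const struct {
		uint8_t bit;
		const char *name;
	} flags[] = {
		{ SKETCH_DCBX_HOST, "host" },
		{ SKETCH_DCBX_LLD_MANAGED, "lld-managed" },
		{ SKETCH_DCBX_VER_CEE, "cee" },
		{ SKETCH_DCBX_VER_IEEE, "ieee" },
		{ SKETCH_DCBX_STATIC, "static" },
	};

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(dcbx & flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, flags[i].name, len - strlen(buf) - 1);
	}
}
```

For example, a byte of 0x09 decodes to `host ieee`: DCBX runs in the host and uses the IEEE version of the protocol.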

Petr Machata says:

====================

Support for resilient next-hop groups was recently accepted to the Linux kernel[1]. Resilient next-hop groups add a layer of indirection between the SKB hash and the next hop. Thus the hash is used to reference a hash table bucket, which is then used to reference a particular next hop. This allows the system more flexibility when assigning SKB hash space to next hops. Previously, each next hop had to be assigned a continuous range of SKB hash space. With a hash table as an intermediate layer, it is possible to reassign next hops with a hash table bucket granularity. In turn, this mends issues with traffic flow redirection resulting from next hop removal or adjustments in next-hop weights.

In this patch set, introduce support for resilient next-hop groups to iproute2.

- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.

- Patches #2 and #3 add new helpers that will be useful later.

- Patch #4 extends the ip/nexthop sub-tool to accept group type as a command line argument, and to dispatch based on the specified type.

- Patch #5 adds the support for resilient next-hop groups.

- Patch #6 adds the support for the resilient next-hop group bucket interface.

To illustrate the usage, consider the following commands:

    # ip nexthop add id 1 via 192.0.2.2 dev dummy1
    # ip nexthop add id 2 via 192.0.2.3 dev dummy1
    # ip nexthop add id 10 group 1/2 type resilient \
          buckets 8 idle_timer 60 unbalanced_timer 300

The last command creates a resilient next-hop group. It will have 8 buckets; each bucket will be considered idle when no traffic hits it for at least 60 seconds, and if the table remains out of balance for 300 seconds, it will be forcefully brought into balance.
And this is how the next-hop group bucket interface looks:

    # ip nexthop bucket show id 10
    id 10 index 0 idle_time 5.59 nhid 1
    id 10 index 1 idle_time 5.59 nhid 1
    id 10 index 2 idle_time 8.74 nhid 2
    id 10 index 3 idle_time 8.74 nhid 2
    id 10 index 4 idle_time 8.74 nhid 1
    id 10 index 5 idle_time 8.74 nhid 1
    id 10 index 6 idle_time 8.74 nhid 1
    id 10 index 7 idle_time 8.74 nhid 1

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
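The hash-to-bucket-to-next-hop indirection described above can be modelled in a few lines. This toy sketch uses the bucket layout from the `ip nexthop bucket show` output (six buckets on nhid 1, two on nhid 2). It is not the kernel's algorithm: the real code rebalances according to weights and the idle/unbalanced timers. It only shows why removing a next hop touches just the buckets that pointed at it.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8

/* Toy resilient group: the packet hash selects a bucket, and the bucket
 * names the next hop. */
struct res_group {
	uint32_t bucket_nhid[NBUCKETS];
};

static uint32_t res_select(const struct res_group *g, uint32_t hash)
{
	return g->bucket_nhid[hash % NBUCKETS];
}

/* Remove a next hop by reassigning only its own buckets (here, naively,
 * all to one replacement). Flows hashing to other buckets keep their
 * next hop. Returns the number of buckets moved. */
static size_t res_remove_nh(struct res_group *g, uint32_t nhid, uint32_t repl)
{
	size_t moved = 0;

	for (size_t i = 0; i < NBUCKETS; i++) {
		if (g->bucket_nhid[i] == nhid) {
			g->bucket_nhid[i] = repl;
			moved++;
		}
	}
	return moved;
}
```

Removing nhid 2 from that table moves only buckets 2 and 3, so the six other buckets, and the flows that hash to them, are undisturbed.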

cake: tc shows number of active flows within class

e5e1663

ldir-EDB0 closed this Sep 25, 2015


ldir-EDB0 commented Aug 11, 2015
