
[Feat]: network interface hardware statistics #17015

Open
Forza-tng opened this issue Feb 15, 2024 · 1 comment
Labels: feature request (New features), needs triage (Issues which need to be manually labelled)

Comments

@Forza-tng

Problem

Hi, I think it would be valuable to be able to monitor the hardware statistics of network interfaces.

This would make it possible to see things like queue overruns, checksum errors, and other events that happen before packets reach the Linux network stack.

I have had a case where the hardware queue was too short and the NIC dropped Ethernet packets. Using ethtool I could see this and increase the queue length (e.g. with ethtool -G <interface> rx <n>).

Description

These can be viewed using ethtool -S eth0.

# ethtool -S eth0
NIC statistics:
     rx_queue_0_packets: 77805
     rx_queue_0_bytes: 12909598
     rx_queue_0_drops: 0
     rx_queue_0_xdp_packets: 0
     rx_queue_0_xdp_tx: 0
     rx_queue_0_xdp_redirects: 0
     rx_queue_0_xdp_drops: 0
     rx_queue_0_kicks: 2
     rx_queue_1_packets: 341999
     rx_queue_1_bytes: 34136159
     rx_queue_1_drops: 0
     rx_queue_1_xdp_packets: 0
     rx_queue_1_xdp_tx: 0
     rx_queue_1_xdp_redirects: 0
     rx_queue_1_xdp_drops: 0
     rx_queue_1_kicks: 6
     rx_queue_2_packets: 24064
     rx_queue_2_bytes: 2173385
     rx_queue_2_drops: 0
     rx_queue_2_xdp_packets: 0
     rx_queue_2_xdp_tx: 0
     rx_queue_2_xdp_redirects: 0
     rx_queue_2_xdp_drops: 0
     rx_queue_2_kicks: 1
     rx_queue_3_packets: 156304
     rx_queue_3_bytes: 10067930
     rx_queue_3_drops: 0
     rx_queue_3_xdp_packets: 0
     rx_queue_3_xdp_tx: 0
     rx_queue_3_xdp_redirects: 0
     rx_queue_3_xdp_drops: 0
     rx_queue_3_kicks: 3
     tx_queue_0_packets: 910
     tx_queue_0_bytes: 86309
     tx_queue_0_xdp_tx: 0
     tx_queue_0_xdp_tx_drops: 0
     tx_queue_0_kicks: 909
     tx_queue_0_tx_timeouts: 0
     tx_queue_1_packets: 189926
     tx_queue_1_bytes: 12405502
     tx_queue_1_xdp_tx: 0
     tx_queue_1_xdp_tx_drops: 0
     tx_queue_1_kicks: 189926
     tx_queue_1_tx_timeouts: 0
     tx_queue_2_packets: 111
     tx_queue_2_bytes: 9099
     tx_queue_2_xdp_tx: 0
     tx_queue_2_xdp_tx_drops: 0
     tx_queue_2_kicks: 111
     tx_queue_2_tx_timeouts: 0
     tx_queue_3_packets: 4726
     tx_queue_3_bytes: 208283
     tx_queue_3_xdp_tx: 0
     tx_queue_3_xdp_tx_drops: 0
     tx_queue_3_kicks: 4725
     tx_queue_3_tx_timeouts: 0
# ethtool -S internal0
NIC statistics:
     rx_packets: 11623715
     tx_packets: 21833984
     rx_bytes: 6100699638
     tx_bytes: 26428740420
     rx_broadcast: 17955
     tx_broadcast: 259583
     rx_multicast: 12863
     tx_multicast: 133802
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 12863
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 893032
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 1
     rx_flow_control_xoff: 1
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_csum_offload_good: 11577469
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     rx_dma_failed: 0
     tx_dma_failed: 0
     rx_hwtstamp_cleared: 0
     uncorr_ecc_errors: 0
     corr_ecc_errors: 0
     tx_hwtstamp_timeouts: 0
     tx_hwtstamp_skipped: 0
# ethtool -g internal0
Ring parameters for internal0:
Pre-set maximums:
RX:                     4096
RX Mini:                n/a
RX Jumbo:               n/a
TX:                     4096
TX push buff len:       n/a
Current hardware settings:
RX:                     4096
RX Mini:                n/a
RX Jumbo:               n/a
TX:                     4096
RX Buf Len:             n/a
CQE Size:               n/a
TX Push:                off
RX Push:                off
TX push buff len:       n/a
TCP data split:         n/a

Importance

nice to have

Value proposition

Being able to monitor hardware queues and statistics can help find otherwise hard-to-find network problems. High-end network cards in particular have multiple offloading and other features that can be problematic if not monitored closely.

Proposed implementation

As far as I know these hardware statistics can only be monitored using ethtool, but there is probably a kernel interface that could be used directly.
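As a rough illustration of what a collector would have to do, here is a minimal sketch that parses `ethtool -S` output into a metrics dict. The sample text is a shortened copy of the virtio output shown above; a real collector would invoke the command (or use the kernel's ethtool interface) rather than hard-code a string, and the function name is just for this sketch.

```python
# Shortened copy of the `ethtool -S eth0` output shown earlier in this issue.
SAMPLE = """\
NIC statistics:
     rx_queue_0_packets: 77805
     rx_queue_0_bytes: 12909598
     rx_queue_0_drops: 0
     tx_queue_0_packets: 910
"""

def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse `name: value` pairs from ethtool -S output into a dict."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the "NIC statistics:" header and anything without a value.
        if line.endswith(":") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            pass  # some drivers emit non-numeric entries; ignore them
    return stats

print(parse_ethtool_stats(SAMPLE)["rx_queue_0_packets"])  # 77805
```

Note that this only gets raw counters; turning them into charts still needs per-driver knowledge of what the names mean, as discussed below in the thread.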

Forza-tng added the feature request and needs triage labels on Feb 15, 2024
@k0ste (Contributor) commented Feb 18, 2024

This is a nice and complex feature.

> As far as I know it is only possible to monitor these hw statistics using ethtool, but there is probably an interface that could be used directly.

The kernel interfaces are the ethtool ioctl and ethtool_netlink.

Have there been any attempts to add this to netdata? Yes, for example #14674


Implementing these driver-provided fields requires a special approach to normalize the metrics: the stat names one driver exposes will not work for another. That is, what you presented for Mellanox will not work for Intel, and it won't work for virtio, at least. To solve this the plugin needs at least a parser; a good example implementation is the ethq project, where the parser determines from the driver name which stat names it needs to look for (example).
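The ethq-style dispatch described above can be sketched as a table keyed by driver name (as reported by `ethtool -i`), mapping driver-specific stat names onto a normalized (direction, queue, counter) tuple. The patterns below are illustrative assumptions, not the actual name sets used by these drivers or by ethq.

```python
import re

# Hypothetical per-driver patterns for per-queue stats (illustrative only).
DRIVER_PATTERNS = {
    "virtio_net": re.compile(r"^(rx|tx)_queue_(\d+)_(packets|bytes|drops)$"),
    "mlx5_core":  re.compile(r"^(rx|tx)(\d+)_(packets|bytes)$"),
}

def normalize(driver: str, stat_name: str):
    """Return (direction, queue, counter) or None if the stat is unknown
    for this driver or is not a per-queue counter."""
    pattern = DRIVER_PATTERNS.get(driver)
    if pattern is None:
        return None  # driver family not supported yet
    m = pattern.match(stat_name)
    if m is None:
        return None  # global counter, or a name this pattern doesn't cover
    direction, queue, counter = m.groups()
    return direction, int(queue), counter

print(normalize("virtio_net", "rx_queue_1_drops"))  # ('rx', 1, 'drops')
print(normalize("mlx5_core", "tx3_bytes"))          # ('tx', 3, 'bytes')
```

The point of the normalization step is that charts can then be defined once over the (direction, queue, counter) tuples, with only the per-driver pattern table growing as new hardware is supported.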


Currently netdata doesn't cover any part of the ethtool interface, but once this interface is available, other parts of it could be covered as well.

What this implementation gives the netdata product is an expansion of the equipment on which netdata will be indispensable: "white boxes", the Ethernet switches with Linux as the control plane (for example Edge-Core), would be added alongside servers and virtual machines.
