feat(packetparser): Allow sampling of packets #1628

mmckeen · 2025-05-27T21:01:00Z

Description

For nodes with lots of forwarded packets packetparser can easily get overwhelmed and utilize a large amount of resources.

This allows for optional sampling of packetparser packets with best-effort weighting of metrics based on observed packets. This only applies to dataAggregationLevel: high.

There's only a minor change in the existing behavior when the default sampling rate of 1 is applied, this is that we only look at the current packet for deciding when to report for important TCP flags. Previously we would report if any important flag had been seen on the connection. In practice this would mean we'd always report packets since SYN was considered an important flag. Since we also take into account whether or not any new flags are seen on the connection the number of packets now being reported at the reporting interval versus always as before should be minimal.

Checklist

I have read the contributing documentation.
I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
I have correctly attributed the author(s) of the code.
I have tested the changes locally.
I have followed the project's style guidelines.
I have updated the documentation, if necessary.
I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

At sampling rate of 100, forwarded bytes and packets remain similar to those reported by cadvisor.

This correlation gets stronger the lower the sampling rate (below at 2).

Additional Notes

I was unsure whether or not reporting packets for important TCP flags across the entire lifetime of the connection was intentional or not. Without looking only at the current packet sampling doesn't work. I can change the logic to keep the existing behavior for non-sampled packets if it makes sense.

Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

SRodi

Thanks so much for this contribution and the detailed context, @mmckeen! The sampling feature is a great idea for improving resource utilization on nodes with high traffic. I noticed there are also some improvements to existing packet reporting logic bundled with the new implementation. To keep reviews focused and changes easier to track, would you be open to splitting the new sampling functionality and the general code improvements into two separate PRs? This would help us ensure each change gets the attention it deserves.

Really appreciate your work and your openness to feedback. Let us know how we can support you with the split—happy to help however we can! cc @nddq

mmckeen · 2025-06-04T16:42:01Z

Thanks so much for this contribution and the detailed context, @mmckeen! The sampling feature is a great idea for improving resource utilization on nodes with high traffic. I noticed there are also some improvements to existing packet reporting logic bundled with the new implementation. To keep reviews focused and changes easier to track, would you be open to splitting the new sampling functionality and the general code improvements into two separate PRs? This would help us ensure each change gets the attention it deserves.

Really appreciate your work and your openness to feedback. Let us know how we can support you with the split—happy to help however we can! cc @nddq

@SRodi do you mind pointing out the code improvements?

If you mean the change to the reporting of important packets I can split that out. Without it though the sampling won't work (as mentioned in the PR description).

SRodi · 2025-06-05T13:27:55Z

If you mean the change to the reporting of important packets I can split that out. Without it though the sampling won't work (as mentioned in the PR description).

@mmckeen I would prefer to see the change to packets reporting and the sampling split into 2 separate PRs.

mmckeen · 2025-06-06T17:24:11Z

I split those changes off into #1665, as noted that PR relies on this one to report accurate packet counts and maintain backwards compatibility.

I can also split the metric weighting and counting of packets between reports into that PR so it's isolated, let me know what makes sense 🙇.

SRodi · 2025-06-09T15:21:45Z

I can also split the metric weighting and counting of packets between reports into that PR so it's isolated, let me know what makes sense 🙇.

Yes, that makes sense. Thanks. cc @mmckeen

mmckeen · 2025-06-11T23:03:17Z

@SRodi #1665 has been extended to include the weighting functionality, I'll put this into draft until I have the chance to rebase this on top of that if things look good 🙇

…s, improve scalability (#1665) # Description Previously `packetparser` in `high` `dataAggregationLevel` would report (mostly) every single packet since important flags were observed over the lifetime of the connection. This changes that behavior to only observe the important flags on individual packets and report when necessary. This will mean less packets are reported. However, it also adds back weighting for bytes, packets, and TCP flags so that metrics remain accurate versus before. I also noticed the current docs for the TCP flags metrics are inaccurate, we only report a subset of the supported flags. Not sure if this is intentional, however supporting more flags will put more memory pressure on both conntrack as well as performance pressure on packet reporting. With sampling in place, this should be more than worth it but there may be repercussions for the performance of `low` `dataAggregationLevel`. ## Checklist - [X] I have read the [contributing documentation](https://retina.sh/docs/Contributing/overview). - [X] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits. - [X] I have correctly attributed the author(s) of the code. - [X] I have tested the changes locally. - [X] I have followed the project's style guidelines. - [X] I have updated the documentation, if necessary. - [X] I have added tests, if applicable. ## Screenshots (if applicable) or Testing Completed eBPF objects compile and load as expected. # `main` Branch <img width="1463" alt="tcpflags main" src="https://github.com/user-attachments/assets/167908f0-7c37-4498-a7f6-20a41110d925" /> <img width="1463" alt="prometheus packets retina main" src="https://github.com/user-attachments/assets/de8ad834-ed91-4673-8643-aa9cc51b3451" /> <img width="1463" alt="prometheus bytes retina main" src="https://github.com/user-attachments/assets/420f54dc-fbbe-4a7b-a290-24c5d6666518" /> # This Branch <img width="1463" alt="tcpflags patched" src="https://github.com/user-attachments/assets/88460ee0-f769-4992-b9be-392b38a64b19" /> <img width="1463" alt="prometheus packets retina patched" src="https://github.com/user-attachments/assets/304bbb97-8c57-477f-9c19-91bb1510b9c7" /> <img width="1463" alt="prometheus bytes retina patched" src="https://github.com/user-attachments/assets/ef917562-487e-45be-8fbb-c9573fa708c1" /> ## Additional Notes #1628 will be a follow-up to this to add additional sampling functionality. --- Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project. Signed-off-by: Matthew McKeen <matthew.mckeen@fastly.com>

mmckeen requested a review from a team as a code owner May 27, 2025 21:01

mmckeen requested review from mainred and byte-msft May 27, 2025 21:01

mmckeen force-pushed the packetParserSampling branch 4 times, most recently from 28b1e68 to 345f2aa Compare May 27, 2025 22:51

nddq requested review from nddq and SRodi and removed request for mainred and byte-msft May 28, 2025 18:34

nddq added type/enhancement area/plugins area/ebpf labels May 28, 2025

SRodi reviewed Jun 4, 2025

View reviewed changes

mmckeen force-pushed the packetParserSampling branch from 0d5a28a to 345f2aa Compare June 5, 2025 22:07

mmckeen mentioned this pull request Jun 6, 2025

fix(packetparser): Fix under reporting of TCP flags and packet metrics, improve scalability #1665

Merged

7 tasks

mmckeen marked this pull request as draft June 11, 2025 23:02

mmckeen force-pushed the packetParserSampling branch 2 times, most recently from 371e50c to 0e9cd0c Compare July 22, 2025 21:19

mmckeen closed this Jul 22, 2025

mmckeen force-pushed the packetParserSampling branch from 0e9cd0c to 1983b55 Compare July 22, 2025 21:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(packetparser): Allow sampling of packets #1628

feat(packetparser): Allow sampling of packets #1628

Uh oh!

mmckeen commented May 27, 2025 •

edited

Loading

Uh oh!

SRodi left a comment

Uh oh!

mmckeen commented Jun 4, 2025

Uh oh!

SRodi commented Jun 5, 2025

Uh oh!

mmckeen commented Jun 6, 2025 •

edited

Loading

Uh oh!

SRodi commented Jun 9, 2025

Uh oh!

mmckeen commented Jun 11, 2025

Uh oh!

Uh oh!

feat(packetparser): Allow sampling of packets #1628

feat(packetparser): Allow sampling of packets #1628

Uh oh!

Conversation

mmckeen commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Screenshots (if applicable) or Testing Completed

Additional Notes

Uh oh!

SRodi left a comment

Choose a reason for hiding this comment

Uh oh!

mmckeen commented Jun 4, 2025

Uh oh!

SRodi commented Jun 5, 2025

Uh oh!

mmckeen commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SRodi commented Jun 9, 2025

Uh oh!

mmckeen commented Jun 11, 2025

Uh oh!

Uh oh!

mmckeen commented May 27, 2025 •

edited

Loading

mmckeen commented Jun 6, 2025 •

edited

Loading