-
Notifications
You must be signed in to change notification settings - Fork 250
feat(packetparser): Allow sampling of packets #1628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
28b1e68
to
345f2aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for this contribution and the detailed context, @mmckeen! The sampling feature is a great idea for improving resource utilization on nodes with high traffic. I noticed there are also some improvements to existing packet reporting logic bundled with the new implementation. To keep reviews focused and changes easier to track, would you be open to splitting the new sampling functionality and the general code improvements into two separate PRs? This would help us ensure each change gets the attention it deserves.
Really appreciate your work and your openness to feedback. Let us know how we can support you with the split—happy to help however we can! cc @nddq
@SRodi do you mind pointing out the code improvements? If you mean the change to the reporting of important packets I can split that out. Without it though the sampling won't work (as mentioned in the PR description). |
@mmckeen I would prefer to see the change to packets reporting and the sampling split into 2 separate PRs. |
0d5a28a
to
345f2aa
Compare
I split those changes off into #1665, as noted that PR relies on this one to report accurate packet counts and maintain backwards compatibility. I can also split the metric weighting and counting of packets between reports into that PR so it's isolated, let me know what makes sense 🙇. |
Yes, that makes sense. Thanks. cc @mmckeen |
…s, improve scalability (#1665) # Description Previously `packetparser` in `high` `dataAggregationLevel` would report (mostly) every single packet since important flags were observed over the lifetime of the connection. This changes that behavior to only observe the important flags on individual packets and report when necessary. This will mean less packets are reported. However, it also adds back weighting for bytes, packets, and TCP flags so that metrics remain accurate versus before. I also noticed the current docs for the TCP flags metrics are inaccurate, we only report a subset of the supported flags. Not sure if this is intentional, however supporting more flags will put more memory pressure on both conntrack as well as performance pressure on packet reporting. With sampling in place, this should be more than worth it but there may be repercussions for the performance of `low` `dataAggregationLevel`. ## Checklist - [X] I have read the [contributing documentation](https://retina.sh/docs/Contributing/overview). - [X] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits. - [X] I have correctly attributed the author(s) of the code. - [X] I have tested the changes locally. - [X] I have followed the project's style guidelines. - [X] I have updated the documentation, if necessary. - [X] I have added tests, if applicable. ## Screenshots (if applicable) or Testing Completed eBPF objects compile and load as expected. # `main` Branch <img width="1463" alt="tcpflags main" src="https://github.com/user-attachments/assets/167908f0-7c37-4498-a7f6-20a41110d925" /> <img width="1463" alt="prometheus packets retina main" src="https://github.com/user-attachments/assets/de8ad834-ed91-4673-8643-aa9cc51b3451" /> <img width="1463" alt="prometheus bytes retina main" src="https://github.com/user-attachments/assets/420f54dc-fbbe-4a7b-a290-24c5d6666518" /> # This Branch <img width="1463" alt="tcpflags patched" src="https://github.com/user-attachments/assets/88460ee0-f769-4992-b9be-392b38a64b19" /> <img width="1463" alt="prometheus packets retina patched" src="https://github.com/user-attachments/assets/304bbb97-8c57-477f-9c19-91bb1510b9c7" /> <img width="1463" alt="prometheus bytes retina patched" src="https://github.com/user-attachments/assets/ef917562-487e-45be-8fbb-c9573fa708c1" /> ## Additional Notes #1628 will be a follow-up to this to add additional sampling functionality. --- Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project. Signed-off-by: Matthew McKeen <matthew.mckeen@fastly.com>
371e50c
to
0e9cd0c
Compare
0e9cd0c
to
1983b55
Compare
Description
For nodes with lots of forwarded packets
packetparser
can easily get overwhelmed and utilize a large amount of resources.This allows for optional sampling of
packetparser
packets with best-effort weighting of metrics based on observed packets. This only applies todataAggregationLevel: high
.There's only a minor change in the existing behavior when the default sampling rate of 1 is applied, this is that we only look at the current packet for deciding when to report for important TCP flags. Previously we would report if any important flag had been seen on the connection. In practice this would mean we'd always report packets since
SYN
was considered an important flag. Since we also take into account whether or not any new flags are seen on the connection the number of packets now being reported at the reporting interval versus always as before should be minimal.Checklist
git commit -S -s ...
). See this documentation on signing commits.Screenshots (if applicable) or Testing Completed
At sampling rate of
100
, forwarded bytes and packets remain similar to those reported by cadvisor.This correlation gets stronger the lower the sampling rate (below at
2
).Additional Notes
I was unsure whether or not reporting packets for important TCP flags across the entire lifetime of the connection was intentional or not. Without looking only at the current packet sampling doesn't work. I can change the logic to keep the existing behavior for non-sampled packets if it makes sense.
Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.