feat(packetparser): Only report important packets #1665

Open: mmckeen wants to merge 1 commit into main from reportImportantPacketsTweak

mmckeen
Contributor

@mmckeen mmckeen commented Jun 6, 2025

Description

Previously, packetparser at the high dataAggregationLevel would report (mostly) every single packet, since important flags were observed over the lifetime of the connection.

This changes that behavior to only observe the important flags on individual packets and report when necessary.

This will mean fewer packets are reported. However, it also adds back weighting for bytes, packets, and TCP flags so that metrics remain accurate versus before.
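To illustrate the weighting idea, here is a minimal sketch and not the actual eBPF code. It assumes each connection keeps counters for traffic seen since its last report and attaches them to the next reported event, so skipped packets are still counted. The names `ConnState` and `observe` are hypothetical, and the sketch covers only packets and bytes, not TCP flag counts.

```python
from dataclasses import dataclass


@dataclass
class ConnState:
    """Hypothetical per-connection counters for traffic since the last report."""
    pending_packets: int = 0
    pending_bytes: int = 0


def observe(state, packet_bytes, should_report):
    """Count the packet; when reporting, return (packets, bytes) weights and reset."""
    state.pending_packets += 1
    state.pending_bytes += packet_bytes
    if not should_report:
        return None  # packet is skipped, but its weight is carried forward
    weights = (state.pending_packets, state.pending_bytes)
    state.pending_packets = 0
    state.pending_bytes = 0
    return weights


state = ConnState()
events = [
    observe(state, 100, True),   # reported immediately
    observe(state, 200, False),  # skipped
    observe(state, 300, False),  # skipped
    observe(state, 50, True),    # reported, carries the two skipped packets
]
reported = [e for e in events if e is not None]
total_packets = sum(p for p, _ in reported)
total_bytes = sum(b for _, b in reported)
```

In this sketch, two reported events account for all four packets and 650 bytes, which is the property the weighting is meant to preserve.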

I also noticed that the current docs for the TCP flags metrics are inaccurate: we only report a subset of the supported flags. I'm not sure whether this is intentional; however, supporting more flags will put more memory pressure on conntrack and more performance pressure on packet reporting. With sampling in place, this should be more than worth it, but there may be repercussions for the performance of the low dataAggregationLevel.

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

eBPF objects compile and load as expected.

main Branch

[Screenshots: tcpflags (main); Prometheus packets, Retina (main); Prometheus bytes, Retina (main)]

This Branch

[Screenshots: tcpflags (patched); Prometheus packets, Retina (patched); Prometheus bytes, Retina (patched)]

Additional Notes

#1628 will be a follow-up to this to add additional sampling functionality.


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@mmckeen mmckeen requested a review from a team as a code owner June 6, 2025 17:21
@mmckeen mmckeen requested review from vipul-21 and QxBytes June 6, 2025 17:21
@mmckeen mmckeen force-pushed the reportImportantPacketsTweak branch 6 times, most recently from 4384e47 to e6e6161 Compare June 12, 2025 23:31
@nddq nddq requested review from nddq and SRodi and removed request for vipul-21 and QxBytes June 13, 2025 15:12
@mmckeen mmckeen force-pushed the reportImportantPacketsTweak branch from e6e6161 to 5d502a9 Compare June 16, 2025 15:00
Signed-off-by: Matthew McKeen <matthew.mckeen@fastly.com>
@mmckeen mmckeen force-pushed the reportImportantPacketsTweak branch from 5d502a9 to 11be8f2 Compare June 18, 2025 14:37
@mmckeen
Contributor Author

mmckeen commented Jun 20, 2025

@nddq @SRodi this is ready for review 🙇

@nddq
Member

nddq commented Jun 26, 2025

@mmckeen sorry for the delay, I just got back from a break. I’ve gone through your proposed change a couple of times, and it looks solid to me. In fact, it addresses something we initially overlooked when conntrack was introduced. First, a few points to make sure we're aligned:

> Previously, packetparser at the high dataAggregationLevel would report (mostly) every single packet, since important flags were observed over the lifetime of the connection.

That’s correct. For any given packet, we currently:

  • Report it if it contains a flag set we haven’t seen before for this specific connection
  • Report it if a certain amount of time has passed since the last reported packet for this connection (default: 30 seconds; applies to both dataAggregation levels)
  • Otherwise, we skip it

In a typical TCP connection, the reported events would likely look like:
SYN, SYN-ACK, ACK, PSH, PSH-ACK, (30 secs), PSH, PSH-ACK, ... FIN, FIN-ACK, FIN-ACK

As a result, we ignore all packets during those 30-second windows, which skews the reported packet, byte, and TCP flag counts from the actual values.
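The decision described above can be sketched roughly as follows. This is a simplified model and not the conntrack implementation: it tracks a single direction, it assumes "a flag set we haven't seen before" means any individual flag bit not yet seen on the connection, and the names `Conn` and `should_report` are hypothetical. The flag values are the standard TCP header bits.

```python
REPORT_INTERVAL_S = 30  # default reporting interval mentioned above

# Standard TCP header flag bits.
FIN, SYN, PSH, ACK = 0x01, 0x02, 0x08, 0x10


class Conn:
    """Hypothetical per-connection state (one direction only)."""

    def __init__(self):
        self.seen_flags = 0      # union of all flags seen on this connection
        self.last_report = None  # time of the last reported packet, in seconds


def should_report(conn, flags, now):
    """Report on a not-yet-seen flag bit or when the interval has elapsed."""
    new_flags = flags & ~conn.seen_flags
    conn.seen_flags |= flags
    timed_out = (conn.last_report is None
                 or now - conn.last_report >= REPORT_INTERVAL_S)
    if new_flags or timed_out:
        conn.last_report = now
        return True
    return False  # skipped: this packet never reaches the metrics


conn = Conn()
trace = [(0, SYN), (1, ACK), (2, PSH | ACK), (5, PSH | ACK),
         (10, PSH | ACK), (33, PSH | ACK), (34, FIN | ACK)]
decisions = [should_report(conn, flags, t) for t, flags in trace]
```

In this trace the packets at t=5 and t=10 are skipped, so without any weighting they would be missing from the packet, byte, and flag counts.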

That said:

> This changes that behavior to only observe the important flags on individual packets and report when necessary.

Could you clarify this part? From what I see, conntrack already behaves this way today — so I’m not sure this change introduces new behavior?

> This will mean fewer packets are reported. However, it also adds back weighting for bytes, packets, and TCP flags so that metrics remain accurate versus before.

This is great — it addresses the gap I mentioned earlier around ignored packets. So in that sense, this feels more like a bug fix than a new feature 🙂

@mmckeen
Contributor Author

mmckeen commented Jun 26, 2025

I think it's a bit of both a bug fix and a new feature.

This will also always report a packet if it carries an important flag such as TCP_URG or TCP_ECE, but only the packet with that flag and not the rest of the connection.

But yes, overall I think this is more of a bug fix, and that part is just a minor change to reflect the expected functionality: the 30-second reporting window is respected for connections without new flags.
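As a rough illustration of the per-packet check, here is a sketch under stated assumptions and not the code in this PR. The exact set of important flags is assumed (only TCP_URG and TCP_ECE are named in this thread), the function name `should_report` is hypothetical, and it is also an assumption that a flag-triggered report restarts the 30-second window.

```python
# Standard TCP header flag bits, lowest bit first.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))

# Assumed set of "important" flags; only URG and ECE are named in the thread.
IMPORTANT_FLAGS = SYN | FIN | RST | URG | ECE


def should_report(last_report, flags, now, interval=30):
    """Decide from this packet's own flags; return (report, new_last_report)."""
    has_important = bool(flags & IMPORTANT_FLAGS)
    window_elapsed = last_report is None or now - last_report >= interval
    if has_important or window_elapsed:
        return True, now
    return False, last_report


trace = [(0, SYN), (1, PSH | ACK | URG), (2, PSH | ACK),
         (3, PSH | ACK | URG), (4, PSH | ACK), (40, PSH | ACK)]
last = None
decisions = []
for t, flags in trace:
    report, last = should_report(last, flags, t)
    decisions.append(report)
```

Every URG packet is reported here, while the plain PSH-ACK packets between them wait for the reporting window, since no connection-lifetime flag state influences the decision.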
