Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble: Populate traffic direction for trace and drop events #11062

Merged
merged 2 commits into from Apr 23, 2020

Conversation

gandro
Copy link
Member

@gandro gandro commented Apr 20, 2020

Where possible, populate the TrafficDirection field from the trace
and drop notifications. This is done by comparing the Source
endpoint field of the trace/drop event with source address of the
captured IP packet.

For drop notifications, we assume there are no ongoing connections
which this packet belongs to and just compare the source address. For
trace notifications however, we do assume that there might an ongoing
connection. Therefore, we rely on the connection tracking state to
revert the traffic direction for reply packages. For trace observation
points without connection tracking, we do not assign any traffic
direction.

Pulling in the datapath team for a quick review as well to confirm that.

/cc @michi-covalent

@gandro gandro added pending-review release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay labels Apr 20, 2020
@gandro gandro requested review from a team April 20, 2020 12:53
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 Apr 20, 2020
@gandro
Copy link
Member Author

gandro commented Apr 20, 2020

test-me-please

@coveralls
Copy link

coveralls commented Apr 20, 2020

Coverage Status

Coverage increased (+0.06%) to 44.697% when pulling fa7f33e on pr/gandro/hubble-tracenotify-trafficdirection into 531e61f on master.

// We need to access the connection tracking result from the `Reason`
// field to invert the direction for reply packets. The `Reason` field
// is populated for TRACE_TO_{LXC,HOST,STACK,PROXY} events.
// TRACE_TO_PROXY events happen for both ingress and egress flows,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TRACE_TO_PROXY events happen for both ingress and egress flows

So do TraceToLxc, TraceToStack, TraceToHost flows depending on whether the endpoint, remote peer or host initiates the connection or is responding to the connection. decodeIsReply() below appears to be relying on connection-tracking state to determine whether the traffic is a reply or not, so this seems to be more to do with whether CT runs on the particular path through the datapath or not.

I'd expect for us to decide to send something to the proxy, we'd need CT state so should be able to provide the same visibility, though it's possible that for historic reasons the trace messages don't always contain this information.

I guess this brings us to basically, it looks like these observation points reliably provide the information today, and while other observation points might be able to provide the info, they may/may not be reliably providing it with the current implementation. If this PR is improving the ability to detect flow direction, then great. I wouldn't be surprised if we need to come back and revisit this for other observation points though (for instance, we're missing stuff like ToOverlay which is a common deployment setup but will handle traffic from local endpoints OR the local host, so there's no guarantee that we run the connection tracker).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the feedback @joestringer! Much appreciated.

TRACE_TO_PROXY events happen for both ingress and egress flows

So do TraceToLxc, TraceToStack, TraceToHost flows depending on whether the endpoint, remote peer or host initiates the connection or is responding to the connection.

You are, of course, absolutely correct.

I somehow completely overlooked the obvious fact that I also have to look at the source/destination of the traffic do determine its direction. The code in this PR is not sufficient to determine the traffic direction 🤦‍♂️ And yes, I should be also able to support ToProxy, it reliably contains the CT result. I'm putting this PR back into Draft mode.

FWIW: The reason why ToOverlay is absent from the list, is that it doesn't seem to actually populate the CT reason in the send_trace_notify call (second last argument):

cilium/bpf/lib/encap.h

Lines 138 to 139 in ae5588c

send_trace_notify(ctx, TRACE_TO_OVERLAY, seclabel, 0, 0, ENCAP_IFINDEX,
0, monitor);

Copy link
Member Author

@gandro gandro Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I have reworked the code to actually infer the traffic direction. It is now comparing the endpoint ID behind the actual source of the IP packet with the endpoint ID of the trace event to determine the traffic direction.

Note that I still consider the traffic information best effort. I have added support for DropNotify as well, with the (in reality not always valid) assumption that we don't need connection tracking there, since for dropped packets there is (probably) no ongoing connection. However, since we don't have the connection tracking state available for drop events anyways, there is not much we can do otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now comparing the endpoint ID behind the actual source of the IP packet with the endpoint ID of the trace event to determine the traffic direction.

I don't quite follow this part. A packet can be coming from or going to arbitrary peers, but that doesn't tell you whether the connection was initiated from there or somewhere else. That's why we bring in the connection tracker, which stores state that tells you, based upon a 5-tuple lookup, the originally-seen packet 5tuple (and hence direction when compared to the current tuple).

However, since we don't have the connection tracking state available for drop events anyways, there is not much we can do otherwise.

On average for drops we can roughly assume they're forward direction because we're dropping them and it's hard to sustain a communication channel if one of the directions is dropped :) ...but yeah in the corner cases like policy transition from allow -> deny, or other weird cases like Cilium restart (and hence proxy connection gets disrupted), drops can get tricky to classify with 100% confidence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation I think we're on the same page on these points.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for the review and input @joestringer!

You have made me think about some more cases that I haven't considered. For example I'm currently only comparing the source address of the packet. I wonder if I should check for tn.Source == dstEP as well (and assign traffic direction unknown if neither source nor destination are the endpoint at which the event occurred). I also have not considered if NAT has an impact on correctness here as well. Maybe the proper way would be to just extend the trace points in the datapath with a direction flag, instead of trying to re-implement the logic here.

For now, I think I going to leave it as is. I might revisit the approach in the future, but I currently don't want to allocate too many cycles on a best-effort feature.

@gandro gandro marked this pull request as draft April 21, 2020 09:01
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Where possible, populate the TrafficDirection field from the trace
and drop notifications. This is done by comparing the `Source`
endpoint field of the trace/drop event with source address of the
captured IP packet.

For drop notifications, we assume there are no ongoing connections
which this packet belongs to and just compare the source address. For
trace notifications however, we do assume that there might an ongoing
connection. Therefore, we rely on the connection tracking state to
revert the traffic direction for reply packages. For trace observation
points without connection tracking, we do not assign any traffic
direction.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/hubble-tracenotify-trafficdirection branch from f60216c to fa7f33e Compare April 22, 2020 16:26
@gandro
Copy link
Member Author

gandro commented Apr 22, 2020

test-me-please

@gandro gandro marked this pull request as ready for review April 22, 2020 16:27
@gandro gandro changed the title hubble: Populate traffic direction for TraceTo{Lxc,Host,Stack} events hubble: Populate traffic direction for trace and drop events Apr 22, 2020
@pchaigno pchaigno merged commit 688e42b into master Apr 23, 2020
1.8.0 automation moved this from In progress to Merged Apr 23, 2020
@pchaigno pchaigno deleted the pr/gandro/hubble-tracenotify-trafficdirection branch April 23, 2020 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Projects
No open projects
1.8.0
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

6 participants