Fix issue where traffic from a pod could be dropped despite allow policy when DNS L7 rules are used #11764
Conversation
Cilium defines a precedence ordering between the sources of IP-to-Identity mappings, defined in pkg/source/source.go. This ordering determines which Identity should be used for an IP when multiple sources provide conflicting reports of the Identity to be associated with that IP. For source.Generated (CIDR/FQDN-sourced) identities, this ordering was handled by inserting up to two overlapping entries into the ipcache.IPIdentityCache whenever an endpoint's IP is the same as a CIDR mapping:

* an endpoint Identity would be inserted with the key 'w.x.y.z', and
* a CIDR Identity would be inserted with the key 'w.x.y.z/32' (IPv6: /128).

During Upsert() and Delete(), when overlapping entries existed in the map, this overlap would be resolved by directly checking whether another entry exists with/without the '/32' suffix (IPv6: /128), and either hiding the update from listeners (when a shadowed mapping is upserted) or converting the delete into an update (when a shadowing entry is deleted, revealing the underlying shadowed entry).

During DumpToListenerLocked(), however, this shadowing would not be resolved; instead, both entries would be dumped to the caller in an arbitrary order. This is particularly notable on Linux 4.11 to 4.15, where LPM map support is available but deletion from such maps is not supported. On these kernels, Cilium periodically dumps the ipcache to a listener using this function and populates the BPF ipcache map from the dump. Each time this dump occurs (every 5 minutes by default), the ipcache mapping could end up mapping the IP to either of the cached identities. Depending on the Go runtime in use by the version of Cilium, this may or may not consistently produce a particular dump ordering.

Resolve this issue by keeping track of shadowed entries with an explicit boolean field in the map value, and avoiding dumping such entries in the DumpToListenerLocked() function.
Fixes: cilium#11517
Reported-by: Will Deuschle <wdeuschle@palantir.com>
Suggested-by: Jarno Rajahalme <jarno@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
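The shadow-tracking approach in the fix can be illustrated with a minimal Go sketch. This is not Cilium's actual code: the names `cache`, `upsertEndpoint`, `deleteEndpoint`, and `dump` are hypothetical stand-ins for `ipcache.IPIdentityCache` and its methods, showing how a boolean `shadowed` field in the map value lets overlapping entries coexist while exposing only one to listeners.

```go
package main

import "fmt"

// Identity is a numeric security identity (a simplification of
// Cilium's identity type).
type Identity uint32

// entry records whether a higher-precedence key (the bare endpoint
// IP) currently shadows this CIDR key, mirroring the fix's approach.
type entry struct {
	id       Identity
	shadowed bool
}

// cache is a toy stand-in for ipcache.IPIdentityCache, keyed by string
// so that "w.x.y.z" (endpoint) and "w.x.y.z/32" (CIDR) can coexist.
type cache map[string]entry

// upsertEndpoint inserts an endpoint IP mapping and marks the
// matching /32 CIDR entry, if any, as shadowed.
func (c cache) upsertEndpoint(ip string, id Identity) {
	c[ip] = entry{id: id}
	cidr := ip + "/32"
	if e, ok := c[cidr]; ok {
		e.shadowed = true
		c[cidr] = e
	}
}

// deleteEndpoint removes the endpoint mapping and, if a shadowed CIDR
// entry exists, reveals it again.
func (c cache) deleteEndpoint(ip string) {
	delete(c, ip)
	cidr := ip + "/32"
	if e, ok := c[cidr]; ok {
		e.shadowed = false
		c[cidr] = e
	}
}

// dump returns only unshadowed entries, as DumpToListenerLocked()
// does after the fix, so listeners never see conflicting mappings.
func (c cache) dump() map[string]Identity {
	out := map[string]Identity{}
	for k, e := range c {
		if !e.shadowed {
			out[k] = e.id
		}
	}
	return out
}

func main() {
	c := cache{}
	c["10.0.0.1/32"] = entry{id: 1000} // CIDR identity from CIDR/FQDN policy
	c.upsertEndpoint("10.0.0.1", 42)   // endpoint identity takes precedence
	fmt.Println(c.dump())              // only the endpoint mapping is dumped
	c.deleteEndpoint("10.0.0.1")
	fmt.Println(c.dump())              // the CIDR mapping is revealed again
}
```

Deleting the endpoint flips the `shadowed` bit back, so the dump falls back to the CIDR identity without ever exposing both entries at once.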
These unit tests validate the bug fixed by the prior commit, where entries dumped from the ipcache may not consistently map IPs to the correct Identity. Note that these tests depend on Go map iteration ordering, so depending on the Go version in use they may or may not consistently fail. They consistently fail for me prior to the fix (e.g. v1.8.0-rc1) and consistently pass with the fix, but YMMV.

Signed-off-by: Joe Stringer <joe@cilium.io>
test-me-please
retest-4.9 (previous failure: https://jenkins.cilium.io/job/Cilium-PR-K8s-newest-kernel-4.9/465)
retest-4.19 (previous failure: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Kernel/1800)
why is it ok to keep shadowed identities in ipToIdentityCache but not in the listeners? and which identity is going to be shadowed?
@aanm good question, I didn't cover that in the commit/PR details.

**Summary**

The core issue is the concern of a policy having two entries:

In this scenario, Cilium's 'shadowing' rules state that the endpoint identity takes precedence over the /32 CIDR allow. You can't accidentally allow access to/from an endpoint based on the IP it happens to be assigned; you must explicitly declare the intent to allow traffic to/from that endpoint based on its labels. The tricky part with shadowing is how we resolve it when dynamic changes occur, such as creating/destroying endpoints with the overlapping IP; I'll discuss this later in this explanation.

**Listeners vs. IPIdentityCache**

The listeners represent the BPF datapath map / proxy IP-to-identity map. The ipcache there maps exactly one identity to each CIDR prefix; there's no way to store multiple values for the same IP. So at that layer, there must be exactly one identity for a given IP.

For the in-memory IPIdentityCache, the rules are a bit looser: it's keyed by a string. The most common case of overlapping identities is between an endpoint IP and a CIDR allocated through policy. To ensure that insertion/deletion of IP->Identity mappings behaves consistently, we keep track of both. This way we can support the exact case implemented in the unit tests: if an endpoint shadows an IP used in CIDR policy, then the endpoint labels take precedence and we can allow traffic to/from that endpoint based on those labels. If we then remove the endpoint, we fall back to resolving the IP to its CIDR identity, allowing us to implement policy via the CIDR/FQDN logic.

**Overlapping IP->Identity mappings**

Consider first, for a moment, if we have a fresh Cilium and import a CIDR policy that allows

**Implementation**

Implementation-wise, the current version tracks both shadowed and non-shadowed entries in a single map, with a value field recording whether each entry is shadowed. We could instead consider using a map to a list, or two maps (shadowed and non-shadowed). This could potentially help ensure that only the latest, unshadowed entries are propagated to the listeners and API, but in practical terms I'm not sure there are any differences from the current implementation.
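To make the listener-side constraint concrete, here is a minimal Go sketch of the pre-fix behaviour. `buggyDump` is a hypothetical stand-in for the old `DumpToListenerLocked()` path, not Cilium's actual code: because the datapath map holds exactly one identity per prefix and Go map iteration order is unspecified, dumping both overlapping entries lets either identity win.

```go
package main

import (
	"fmt"
	"strings"
)

// buggyDump mimics the pre-fix dump path: both the endpoint key
// ("w.x.y.z") and the CIDR key ("w.x.y.z/32") are emitted to the
// listener. The BPF ipcache stores exactly one identity per prefix,
// so when the keys collide, the last write wins, and Go map
// iteration order is unspecified.
func buggyDump(cache map[string]uint32) map[string]uint32 {
	bpf := map[string]uint32{}
	for key, id := range cache {
		// The datapath map is keyed by CIDR, so a bare endpoint IP
		// collides with its own /32 prefix.
		if !strings.Contains(key, "/") {
			key += "/32"
		}
		bpf[key] = id // overlapping keys: result depends on iteration order
	}
	return bpf
}

func main() {
	cache := map[string]uint32{
		"10.0.0.1":    42,   // endpoint identity
		"10.0.0.1/32": 1000, // CIDR identity from CIDR/FQDN policy
	}
	// Repeated dumps may install either identity into the datapath;
	// this is the bug: a policy allowing one identity can drop
	// traffic whenever the other identity happens to win.
	seen := map[uint32]bool{}
	for i := 0; i < 100; i++ {
		seen[buggyDump(cache)["10.0.0.1/32"]] = true
	}
	fmt.Println(seen)
}
```

The fix removes the collision at the source: shadowed entries are simply never dumped, so the listener always receives the single highest-precedence identity per prefix.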
From the commit message:
This issue would require a pod on the node with the affected pod to make a request for a DNS name that maps to an IP which represents a remote pod, for that request to be processed via L7 DNS policy, and for the remote pod to subsequently make requests to the affected pod in direct routing mode on Linux 4.11-4.15.
This was only observed on v1.7.x, however I've marked it for backport to v1.6.x as well in an abundance of caution. Cilium v1.6 and v1.7 use different Go versions (v1.12 vs v1.13) so it's likely that the map iteration behaviour is different between these versions. Go doesn't make any guarantees about map iteration ordering, so this PR addresses this by explicitly resolving the overlap between identities for the same IP.
Shout outs also to @tklauser @jrajahalme for their help tracking this down.
Fixes: #11517