
Fix issue where traffic from a pod could be dropped despite allow policy when DNS L7 rules are used #11764

Merged
merged 2 commits into cilium:master from joestringer:submit/ipcache-bug on May 29, 2020

Conversation

joestringer
Member

@joestringer joestringer commented May 29, 2020

From the commit message:

Cilium defines a precedence ordering between sources for mappings from
IP to Identity, defined in pkg/source/source.go. These are used to
determine which Identity should be used for an IP when multiple sources
provide conflicting reports of the Identity to be associated with an IP.

This ordering was handled in the case of source.Generated CIDR/FQDN
-sourced identities by inserting potentially two overlapping entries
into the ipcache.IPIdentityCache when an endpoint's IP is the same as a
CIDR mapping:

* An endpoint Identity would be inserted with the key 'w.x.y.z', and
* A CIDR Identity would be inserted with the key 'w.x.y.z/32'. (IPv6: /128)

During Upsert() and Delete(), when overlapping entries existed in the
map, this overlap would be resolved by directly checking whether another
entry exists with/without the `/32` suffix (IPv6: /128) and either
hiding the update from listeners (when a shadowed mapping is upserted),
or converting the delete to an update (when a shadowing entry is
deleted, revealing the underlying shadowed entry).

During DumpToListenerLocked() however, this shadowing would not be
resolved and instead both entries would be dumped to the caller in an
arbitrary order. This is particularly notable on Linux 4.11 to 4.15
where LPM support is available but deletion is not supported. In these
cases, Cilium periodically dumps the ipcache to a listener using this
function, and populates the BPF ipcache map using this dump. Any time this
dump occurs (default 5 minutes), it would be possible for the ipcache
mapping to be modified to map the IP to either of the cached identities.
Depending on the Go runtime in use by the version of Cilium, this may or
may not consistently provide particular dump ordering.

Resolve this issue by keeping track of shadowed entries with an
explicit boolean field in the value of the map, and avoiding dumping
such entries in the DumpToListenerLocked() function.

Triggering this issue requires all of the following:

* a pod on the same node as the affected pod makes a request for a DNS name that maps to an IP representing a remote pod,
* that request is processed via L7 DNS policy, and
* the remote pod subsequently makes requests to the affected pod in direct routing mode on Linux 4.11-4.15.

This was only observed on v1.7.x; however, I've marked it for backport to v1.6.x as well out of an abundance of caution. Cilium v1.6 and v1.7 use different Go versions (v1.12 vs v1.13), so it's likely that map iteration behaviour differs between these versions. Go makes no guarantees about map iteration ordering, so this PR explicitly resolves the overlap between identities for the same IP rather than relying on dump order.
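To illustrate the map-iteration point: ranging over a Go map yields an unspecified (deliberately randomized) order, so a dump that emits both overlapping entries could let either one land last. A tiny standalone demo (illustrative only, not Cilium code):

```go
package main

import "fmt"

func main() {
	// Two conflicting mappings for the same address, as in the bug: the bare
	// IP (endpoint identity) and the /32 prefix (CIDR identity).
	entries := map[string]uint32{
		"10.0.0.1":    12345,    // endpoint identity
		"10.0.0.1/32": 16777216, // CIDR identity
	}
	// Go intentionally randomizes map iteration order, so which entry a naive
	// dump emits last (and therefore "wins" in the datapath map) can differ
	// between iterations, runs, and Go versions.
	for i := 0; i < 3; i++ {
		for key, id := range entries {
			fmt.Printf("%s -> %d\t", key, id)
		}
		fmt.Println()
	}
}
```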

Shout-out also to @tklauser and @jrajahalme for their help tracking this down.

Fixes: #11517

Reported-by: Will Deuschle <wdeuschle@palantir.com>
Suggested-by: Jarno Rajahalme <jarno@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.6 labels May 29, 2020
@joestringer joestringer requested a review from a team as a code owner May 29, 2020 00:08
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 May 29, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.7.5 May 29, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.6.9 May 29, 2020
@joestringer joestringer changed the title from "Fix issue where traffic from a pod could be dropped despite policy when DNS L7 rules are used" to "Fix issue where traffic from a pod could be dropped despite allow policy when DNS L7 rules are used" May 29, 2020
These unit tests validate the bug fixed by the prior commit where
entries dumped from the ipcache may not consistently map IPs to the
correct Identity. Note that there is a potential Go map dump ordering
aspect to these tests, so depending on the Go version used they may or
may not consistently fail. They consistently fail for me prior to the
fix (e.g. v1.8.0-rc1) and consistently pass with the fix, but YMMV.

Signed-off-by: Joe Stringer <joe@cilium.io>
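For a flavour of the property being checked, here's a self-contained sketch with hypothetical names; it is not the actual test code added in this commit, which exercises the real IPIdentityCache:

```go
package ipcachetest

import "testing"

type identity uint32

type entry struct {
	id       identity
	shadowed bool
}

// dump mimics the fixed behaviour: shadowed entries are never emitted.
func dump(entries map[string]entry) map[string]identity {
	out := make(map[string]identity)
	for key, e := range entries {
		if !e.shadowed {
			out[key] = e.id
		}
	}
	return out
}

func TestDumpSkipsShadowedCIDR(t *testing.T) {
	const (
		endpointID identity = 12345
		cidrID     identity = 16777216
	)
	entries := map[string]entry{
		"10.0.0.1":    {id: endpointID},             // endpoint identity
		"10.0.0.1/32": {id: cidrID, shadowed: true}, // shadowed CIDR identity
	}
	out := dump(entries)
	if got := out["10.0.0.1"]; got != endpointID {
		t.Fatalf("want endpoint identity %d, got %d", endpointID, got)
	}
	if _, ok := out["10.0.0.1/32"]; ok {
		t.Fatal("shadowed CIDR entry must not be dumped to listeners")
	}
}
```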
@joestringer
Member Author

test-me-please

@joestringer joestringer added the priority/high This is considered vital to an upcoming release. label May 29, 2020
@coveralls

coveralls commented May 29, 2020

Coverage Status

Coverage increased (+0.03%) to 36.9% when pulling 272484d on joestringer:submit/ipcache-bug into 1439b27 on cilium:master.

@tklauser
Member

tklauser commented May 29, 2020

retest-4.9 (previous failure: https://jenkins.cilium.io/job/Cilium-PR-K8s-newest-kernel-4.9/465)

@tklauser
Member

tklauser commented May 29, 2020

retest-4.19 (previous failure: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Kernel/1800)

Member

@aanm aanm left a comment


why is it ok to keep shadowed identities in ipToIdentityCache but not in the listeners? and which identity is going to be shadowed?

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 29, 2020
@aanm aanm merged commit d909b14 into cilium:master May 29, 2020
1.8.0 automation moved this from In progress to Merged May 29, 2020
@joestringer joestringer deleted the submit/ipcache-bug branch May 29, 2020 15:47
@joestringer
Member Author

joestringer commented May 29, 2020

why is it ok to keep shadowed identities in ipToIdentityCache but not in the listeners? and which identity is going to be shadowed?

@aanm good question, I didn't cover that in the commit/PR details.

Summary

The core issue arises when a policy has two entries:

  • One that allows traffic to/from an IP based on a CIDR /32 (note: This could also be based on FQDN policy, but I'll stick to CIDR here for simplicity)
  • One that allows traffic to/from pod labels for a pod which happens to have the same IP as the CIDR /32 in the policy rule above.

In this scenario, the 'shadowing' rules of Cilium state that the endpoint identity takes precedence over the /32 CIDR allow. You can't accidentally allow access to/from an endpoint based on the IP it gets assigned; you must explicitly declare the intent to allow traffic to/from that endpoint based on its labels. The tricky part with the shadowing is how we resolve it when dynamic changes occur, such as creating/destroying endpoints with the overlapping IP; I'll discuss this later in this explanation.

Listeners vs. IPIdentityCache

The listeners represent the BPF datapath map / proxy ip-to-identity map. The ipcache there maps exactly from a CIDR prefix to one identity; there's no way to store multiple values for the same IP. So at that layer, there must be exactly one identity for a given IP.

For the in-memory IPIdentityCache, the rules are a bit looser - it's keyed by a string. The most common case of overlapping identities is between an endpoint IP and a CIDR allocated through policy. To ensure that insertion/deletion of IP->Identity mappings consistently behaves in the same manner, we keep track of both. This way, we can support the exact case that is exercised in the unit test: if an endpoint shadows an IP used in CIDR policy, then the endpoint labels take precedence and we can allow traffic to/from that endpoint based on the labels. If we then remove the endpoint again, we fall back to resolving the IP to its CIDR identity, allowing us to implement policy via CIDR/FQDN logic.

Overlapping IP->Identity mappings

Consider first a fresh Cilium where we import a CIDR policy that allows w.x.y.z/32. We will allocate a CIDR identity A for policy use and push down an ipcache mapping that maps w.x.y.z -> A. Let's say we then create a pod that has the same IP, w.x.y.z, with labels-based identity B. Now we push down the mapping w.x.y.z -> B, and we can apply label-based policy to control access from that endpoint. But wait, what happens when we remove the endpoint? Do we remove w.x.y.z -> B? If we simply removed the mapping for w.x.y.z, then we would also no longer allow the traffic based on CIDR; the datapath ipcache would fall back to resolving the identity as 'world' based on LPM rules. This would be inconsistent with the actual policy in place right now, which allows traffic to/from w.x.y.z/32. It's also inconsistent with the initial state after we inserted the CIDR policy but didn't yet create the endpoint. To mitigate this, we track the CIDR identities that were previously used (shadowed), and reinstate them when the pod identity is removed.
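That sequence, as a hedged sketch of the intended resolution at each step (a hypothetical simplified model; Cilium's actual Upsert()/Delete() logic lives in pkg/ipcache):

```go
package main

import "fmt"

type identity uint32

// Simplified model of the resolution order only: endpoint entries ("w.x.y.z")
// take precedence over CIDR entries ("w.x.y.z/32") for the same address. The
// real cache keeps a single map and marks the CIDR entry as shadowed.
type cache struct {
	endpoint map[string]identity
	cidr     map[string]identity
}

func (c *cache) visible(ip string) (identity, bool) {
	if id, ok := c.endpoint[ip]; ok {
		return id, true // endpoint identity shadows any CIDR identity
	}
	id, ok := c.cidr[ip+"/32"]
	return id, ok
}

func main() {
	const (
		A identity = 16777216 // CIDR identity allocated for the w.x.y.z/32 policy
		B identity = 12345    // labels-based endpoint identity
	)
	c := &cache{endpoint: map[string]identity{}, cidr: map[string]identity{}}

	c.cidr["10.0.0.1/32"] = A
	fmt.Println(c.visible("10.0.0.1")) // CIDR policy imported: resolves to A

	c.endpoint["10.0.0.1"] = B
	fmt.Println(c.visible("10.0.0.1")) // pod appears: B shadows A

	delete(c.endpoint, "10.0.0.1")
	fmt.Println(c.visible("10.0.0.1")) // pod removed: shadowed A is reinstated
}
```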

Implementation

Implementation-wise, the current version tracks both shadowed and non-shadowed entries in one map, with a value field recording whether each entry is shadowed. We could consider instead using a map to a list, or two maps (shadowed and non-shadowed). That could potentially help to ensure that only the latest, unshadowed entries are propagated to the listeners and API, but in practical terms I'm not sure there are any differences from the current implementation.
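For concreteness, a minimal sketch of that single-map-plus-flag approach (hypothetical names and simplified types, not the actual pkg/ipcache implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// Identity is a numeric security identity (simplified).
type Identity uint32

// cacheEntry carries an explicit flag so a CIDR entry can stay in the cache
// while being hidden from listeners by an endpoint entry for the same IP.
type cacheEntry struct {
	identity Identity
	shadowed bool
}

type ipCache struct {
	mu      sync.RWMutex
	entries map[string]cacheEntry // keyed by "w.x.y.z" or "w.x.y.z/32"
}

// dumpToListenerLocked emits only unshadowed entries, so a periodic dump can
// never overwrite an endpoint identity with its shadowed CIDR identity.
func (c *ipCache) dumpToListenerLocked(emit func(key string, id Identity)) {
	for key, e := range c.entries {
		if e.shadowed {
			continue
		}
		emit(key, e.identity)
	}
}

func main() {
	c := &ipCache{entries: map[string]cacheEntry{
		"10.0.0.1":    {identity: 12345},                    // endpoint identity
		"10.0.0.1/32": {identity: 16777216, shadowed: true}, // shadowed CIDR identity
	}}
	c.mu.RLock()
	defer c.mu.RUnlock()
	c.dumpToListenerLocked(func(key string, id Identity) {
		fmt.Printf("%s -> %d\n", key, id)
	})
}
```

Keeping the flag in the value (rather than in a second map) lets insertion, deletion, and the dump path all work off a single source of truth, which matches the behaviour described above.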

@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.5 Jun 3, 2020
@joestringer joestringer moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.5 Jun 3, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.6 in 1.6.9 Jun 4, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.6 to Backport done to v1.6 in 1.6.9 Jun 4, 2020
Labels
priority/high This is considered vital to an upcoming release.
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.6.9
Backport done to v1.6
1.7.5
Backport done to v1.7
1.8.0
Merged
Development

Successfully merging this pull request may close these issues.

Cilium drops pod traffic that should be allowed by policy (due to CIDR / FQDN identity)
5 participants