Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix kube-apiserver policy matching feature with tunneling enabled #18527

Merged

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Jan 18, 2022

  • bpf: Fix kube-apiserver policy drop in tunneling
  • test/K8sT: Add case for duplicate kube-apiserver policy
  • docs: Remove kube-apiserver policy feature limitation

See commit msgs.

Fixes: #18049
Updates: #17962
Updates: #17829

@christarazi christarazi requested a review from a team January 18, 2022 20:32
@christarazi christarazi requested a review from a team as a code owner January 18, 2022 20:32
@christarazi christarazi requested a review from a team January 18, 2022 20:32
@christarazi christarazi added release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. kind/bug This is a bug in the Cilium logic. needs-backport/1.11 labels Jan 18, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jan 18, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.11.1 Jan 18, 2022
@christarazi christarazi force-pushed the pr/christarazi/kube-apiserver-tunneling branch from a7cdab0 to dcda34e Compare January 18, 2022 20:33
Comment on lines 77 to 93

/* Maybe overwrite the REMOTE_NODE_ID with
* KUBE_APISERVER_NODE_ID to support upgrade. After v1.12,
* this should be removed. */
if (*identity == REMOTE_NODE_ID) {
struct remote_endpoint_info *info;

/* Look up the ipcache for the src IP, it will give us
* the real identity of that IP */
info = ipcache_lookup6(&IPCACHE_MAP,
(union v6addr *) &ip6->saddr,
V6_CACHE_KEY_LEN);
if (info)
*identity = info->sec_label;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to get a look from the @cilium/sig-encryption team on this change.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit in the commit message.

bpf/bpf_overlay.c Show resolved Hide resolved
@christarazi christarazi force-pushed the pr/christarazi/kube-apiserver-tunneling branch 2 times, most recently from f63ec15 to 1fb0de3 Compare January 18, 2022 21:04
When running in tunneling mode with a kube-apiserver in the cluster, the
node with the kube-apiserver will transmit packets towards remote pods
with identity 6 (remote-node).

The recipient node will receive the packet and trust the identity as
"remote-node" and not "kube-apiserver". If the receiving pod has a
policy to only allow "from kube-apiserver" then this policy would not
work, and thus drop the packet.

To fix this, we switch the source identity in bpf_overlay from
"remote-node" to the identity of the destination IP via the ipcache,
which we expect to be the "kube-apiserver" identity. If the ipcache
entry doesn't exist or the identity in the ipcache is "remote-node"
already, then this is a no-op (besides the extra ipcache lookup).

Going forward, there are two future changes that are needed in order:

1) A subsequent change to make any packet associated with the
   kube-apiserver IP have ID 7 (rather than ID 6; without needing the
   logic added by this commit).
2) Once v1.12.y is released, another subsequent change to remove this
   specific code added in this commit because we can expect all of this
   traffic to be transmitted with "kube-apiserver" identity (7),
   **after** (1) is implemented.

For a hitless upgrade, we would only support a path from Cilium v1.11.x
(this commit would need to be backported to v1.11.x) to v1.12.y. This is
because only v1.11.x or later can successfully receive this traffic and
handle policy correctly, as guaranteed by the above.

Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/kube-apiserver-tunneling branch from 1fb0de3 to a9316a6 Compare January 18, 2022 21:11
@christarazi
Copy link
Member Author

christarazi commented Jan 18, 2022

/test

Job 'Cilium-PR-K8s-1.21-kernel-5.4' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sPolicyTestExtended Validate toEntities KubeAPIServer Still allows connection to KubeAPIServer with a duplicate policy

Failure Output

FAIL: policy /home/jenkins/workspace/Cilium-PR-K8s-1.21-kernel-5.4/src/github.com/cilium/cilium/test/k8sT/manifests/cnp-to-entities-kube-apiserver.yaml cannot be deleted in "default" namespace

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-5.4 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sPolicyTestExtended Validate toEntities KubeAPIServer Still allows connection to KubeAPIServer with a duplicate policy

Failure Output

FAIL: policy /home/jenkins/workspace/Cilium-PR-K8s-GKE/src/github.com/cilium/cilium/test/k8sT/manifests/cnp-to-entities-kube-apiserver.yaml cannot be deleted in "default" namespace

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@joestringer joestringer added this to Needs backport from master in 1.11.2 Jan 18, 2022
@joestringer joestringer removed this from Needs backport from master in 1.11.1 Jan 18, 2022
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one small nit: Please update Documentation/policy/language.rst to remove the text about the known issue. (Won't need re-running the full CI, can add a small patch at the end before merging or follow up with the change in another PR after this)

Additional sanity check coverage that the kube-apiserver policy matching
feature still works with a duplicate policy added and the previous
removed.

Updates: cilium#17829

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Now that the limitation has been fixed, remove it from the docs.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/kube-apiserver-tunneling branch from a9316a6 to d1d3203 Compare January 19, 2022 22:38
@christarazi christarazi requested a review from a team as a code owner January 19, 2022 22:38
@christarazi
Copy link
Member Author

christarazi commented Jan 19, 2022

I've added an extra commit to address Joe's comment above. The CI failures were legitimate (related to test setup). They should be fixed now.

@christarazi
Copy link
Member Author

christarazi commented Jan 19, 2022

/test

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sServicesTest Checks graceful termination of service endpoints Checks client terminates gracefully on service endpoint deletion

Failure Output

FAIL: Timed out after 60.001s.

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.23-kernel-net-next' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sPolicyTest Multi-node policy test validates fromEntities policies Validates fromEntities all policy

Failure Output

FAIL: Can not connect to service "10.0.0.31" from outside cluster (1/1)

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-net-next so I can create a new GitHub issue to track it.

@christarazi
Copy link
Member Author

@joamaki ping :)

@christarazi christarazi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 26, 2022
@kkourt kkourt merged commit bd7ee73 into cilium:master Jan 26, 2022
@christarazi christarazi deleted the pr/christarazi/kube-apiserver-tunneling branch January 26, 2022 21:17
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.11.2 Jan 31, 2022
@joamaki joamaki added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Feb 8, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.11 in 1.11.2 Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
1.11.2
Backport done to v1.11
Development

Successfully merging this pull request may close these issues.

kube-apiserver policy matching does not work with tunneling mode
7 participants