Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pod2pod strict mode for WireGuard #21856

Merged
merged 1 commit into from
Jun 30, 2023

Conversation

3u13r
Copy link
Contributor

@3u13r 3u13r commented Oct 23, 2022

When pod2pod encryption is enabled, there is a slight time window, where one pod may send unencrypted data to another one. This happens when a new pod is created but the information of the new endpoint has not propagated to the other nodes.

To prevent this from happening, we block all unencrypted pod2pod traffic. This is done via a filter in the datapath. The filter is configured at the same time the datapth is set up, sine we cannot rely on data which is only eventually updated at runtime.
The filter drops any unencrypted tcp/udp egress traffic which originates from and is sent to the PodCIDR and also leaves the node.

Signed-off-by: Benedict Schlueter bs@edgeless.systems
Signed-off-by: Leonard Cohnen lc@edgeless.systems

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Thanks for contributing!
Add strict mode for WireGuard Pod2Pod encryption

@3u13r 3u13r requested review from a team as code owners October 23, 2022 20:22
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 23, 2022
@maintainer-s-little-helper
Copy link

Commit 690e4daeb0252ee5a248889db024d3fbfb7210a4 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 23, 2022
@3u13r 3u13r force-pushed the feat/wireguardP2PStrictmode branch from 690e4da to 299bcc8 Compare October 23, 2022 23:58
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 23, 2022
@3u13r 3u13r force-pushed the feat/wireguardP2PStrictmode branch from 299bcc8 to b5ba062 Compare October 24, 2022 00:40
@gandro gandro self-requested a review October 24, 2022 10:21
@gandro gandro added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. labels Nov 2, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 2, 2022
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this contribution. I focused on the agent side of things only, few comments.

daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
install/kubernetes/cilium/values.yaml Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
@gandro gandro requested a review from brb November 2, 2022 09:47
@3u13r 3u13r requested a review from a team as a code owner November 7, 2022 12:58
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

pkg/option/config.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
@3u13r 3u13r requested a review from a team as a code owner November 9, 2022 15:38
@3u13r
Copy link
Contributor Author

3u13r commented Nov 9, 2022

Thanks for running the tests. I hope I fixed all of the linting errors. I will try to investigate the other errors as soon as they arise.

@brb
Copy link
Member

brb commented Jun 7, 2023

The CI failure looks legit. The IPsec tests, which are executed after the newly added WG tests, started to fail. In particular, the connectivity between Pods:

time="2023-06-06T17:50:58Z" level=debug msg="running local command: kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254"
time="2023-06-06T17:51:07Z" level=error msg="Error executing local command 'kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254'" error="exit status 1"
time="2023-06-06T17:51:07Z" level=error msg="Error executing local command 'kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254'" error="exit status 1"
cmd: "kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254" exitCode: 1 duration: 9.189357468s stdout:
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.

I think we are hitting some form of #14959 with the CI

@3u13r Did you check cilium-agent logs to see whether they hit the same symptoms?

@3u13r
Copy link
Contributor Author

3u13r commented Jun 7, 2023

The CI failure looks legit. The IPsec tests, which are executed after the newly added WG tests, started to fail. In particular, the connectivity between Pods:

time="2023-06-06T17:50:58Z" level=debug msg="running local command: kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254"
time="2023-06-06T17:51:07Z" level=error msg="Error executing local command 'kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254'" error="exit status 1"
time="2023-06-06T17:51:07Z" level=error msg="Error executing local command 'kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254'" error="exit status 1"
cmd: "kubectl exec -n 202306061750k8sdatapathconfigtransparentencryptiondirectrouting testclient-7jrvs -- ping -W 5 -c 5 10.0.0.254" exitCode: 1 duration: 9.189357468s stdout:
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.

I think we are hitting some form of #14959 with the CI

@3u13r Did you check cilium-agent logs to see whether they hit the same symptoms?

As far as I can tell, the actual strict mode test are skipped in all 3 failed CI runs, correct? When my interpretation is that it's unlikely the strict mode test itself but the only other line that has changed in the tests i.e. the deletion of the ciliumnode resource.

@brb
Copy link
Member

brb commented Jun 7, 2023

kubectl.DeleteResource("ciliumnode", "--all --wait")

Do we actually need those?

@3u13r
Copy link
Contributor Author

3u13r commented Jun 7, 2023

kubectl.DeleteResource("ciliumnode", "--all --wait")

Do we actually need those?

Yes, we need to delete the ciliumnode resouce in order to change the PodCIDR during the strict mode tests. We basically need to clean up the allocated PodCIDR for each node after each strict mode test.
Moving the clean-up to only the strict mode tests itself could theoretically be possible though.

@brb
Copy link
Member

brb commented Jun 7, 2023

Moving the clean-up to only the strict mode tests itself could theoretically be possible though.

Yep, but that would be a ticking bomb, as WG and IPsec tests might eventually end up running in the same job. Anyway, if the transition of the tests from the Ginkgo to ci-e2e / BPF unit tests (I think both can be rewritten into the BPF unit tests) happens soon, then ACK this approach.

@brb
Copy link
Member

brb commented Jun 8, 2023

@3u13r could you do the suggested change (delete ciliumnode obj only in the strict wg tests)? The feature freeze for v1.14 is next Fri (16th June). It would be nice to get this feature merged before that date.

@joestringer joestringer added dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. labels Jun 16, 2023
@joestringer
Copy link
Member

Due to changes in CI, this will need rebasing to get CI to pass. From Martynas' last comment it looks like there's feedback remaining to address, but otherwise this seems close. We've initiated the feature freeze for v1.14 now so it'll miss that release series, but please do fix up the PR, we should be able to push this forward towards merging in a week or two once we've completed the branching process. Thanks for your continued interest in contributing to Cilium :-)

@3u13r 3u13r force-pushed the feat/wireguardP2PStrictmode branch 2 times, most recently from e92152d to 2e765ad Compare June 19, 2023 14:06
@3u13r 3u13r requested a review from a team as a code owner June 19, 2023 14:06
@3u13r
Copy link
Contributor Author

3u13r commented Jun 19, 2023

Sadly, I've been sick over the past week and the obvious changes first didn't work. With the current version I'm able to execute the K8sDatapath tests on netnext (executing the strict mode tests) and without netnext (executing the direct routing ipsec test).
Currently not sure about the failing/aborting test K8sUpstreamNetConformance / kubernetes-e2e-net-conformance (ipv4). This test seems to be successful on other branches.

@brb
Copy link
Member

brb commented Jun 20, 2023

/test

@ldelossa
Copy link
Contributor

@3u13r If you'd like to push this PR forward, can you please rebase and push? The original base this PR is one has an issue with tests. This will be fixed with a rebase.

When pod2pod encryption is enabled, there is a slight time window,
where one pod may send unencrypted data to another one. This happens
when a new pod is created but the information of the new endpoint has
not propagated to the other nodes.

To prevent this from happening, we block all unencrypted pod2pod
traffic. This is done via a filter in the datapath. The filter
is configured at the same time the datapth is set up, sine we cannot
rely on data which is only eventually updated at runtime.
The filter drops any unencrypted tcp/udp egress traffic which originates from
and is sent to the PodCIDR and also leaves the node.

Signed-off-by: Benedict Schlueter <benedict.schlueter@inf.ethz.ch>
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>

Co-authored-by: Benedict Schlueter <benedict.schlueter@inf.ethz.ch>
@3u13r 3u13r force-pushed the feat/wireguardP2PStrictmode branch from 2e765ad to 269df16 Compare June 28, 2023 09:42
@gandro
Copy link
Member

gandro commented Jun 28, 2023

/test

@ldelossa ldelossa removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jun 29, 2023
@ldelossa ldelossa merged commit 5da5882 into cilium:main Jun 30, 2023
65 checks passed
@gandro
Copy link
Member

gandro commented Jul 3, 2023

Glad to see this finally merged 🎉

@3u13r
Copy link
Contributor Author

3u13r commented Jul 3, 2023

Yay 🥳. Thanks a lot for the feedback and discussions from everyone! I'm excited that the first step is merged. Next tasks include IPv6 support, migrating the tests, node-to-node strict mode, ...

brb added a commit that referenced this pull request Nov 6, 2023
Previously, the strict encrypt check [1] was running in bpf_overlay (in
addition to bpf_host). That particular check was assuming that no
pod-to-pod unencrypted packet should be seen by bpf_overlay.

However, after the previous commit it's no longer the case. So, remove
the check, and only keep the one in bpf_host.

A nice side-effect of the previous commit is that for WG+tunnel we
automatically enforce the strict mode w/o relying on strict_allow().
I.e., any tunnel encaped traffic is going to be dropped until
cilium-agent has propogated destination node's IP addr into
WG's allowed-ips list for that node.

This commit also drops the WG strict mode test case for tunneling, as
the test configuration is no longer applicable, and the test is going to
be migrated to the CLI connectivity suite.

[1]: #21856

Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb added a commit that referenced this pull request Nov 7, 2023
Previously, the strict encrypt check [1] was running in bpf_overlay (in
addition to bpf_host). That particular check was assuming that no
pod-to-pod unencrypted packet should be seen by bpf_overlay.

However, after the previous commit it's no longer the case. So, remove
the check, and only keep the one in bpf_host.

A nice side-effect of the previous commit is that for WG+tunnel we
automatically enforce the strict mode w/o relying on strict_allow().
I.e., any tunnel encaped traffic is going to be dropped until
cilium-agent has propogated destination node's IP addr into
WG's allowed-ips list for that node.

This commit also drops the WG strict mode test case for tunneling, as
the test configuration is no longer applicable, and the test is going to
be migrated to the CLI connectivity suite.

[1]: #21856

Signed-off-by: Martynas Pumputis <m@lambda.lt>
aanm pushed a commit that referenced this pull request Nov 7, 2023
Previously, the strict encrypt check [1] was running in bpf_overlay (in
addition to bpf_host). That particular check was assuming that no
pod-to-pod unencrypted packet should be seen by bpf_overlay.

However, after the previous commit it's no longer the case. So, remove
the check, and only keep the one in bpf_host.

A nice side-effect of the previous commit is that for WG+tunnel we
automatically enforce the strict mode w/o relying on strict_allow().
I.e., any tunnel encaped traffic is going to be dropped until
cilium-agent has propogated destination node's IP addr into
WG's allowed-ips list for that node.

This commit also drops the WG strict mode test case for tunneling, as
the test configuration is no longer applicable, and the test is going to
be migrated to the CLI connectivity suite.

[1]: #21856

Signed-off-by: Martynas Pumputis <m@lambda.lt>
pjablonski123 pushed a commit to pjablonski123/cilium that referenced this pull request Dec 15, 2023
Previously, the strict encrypt check [1] was running in bpf_overlay (in
addition to bpf_host). That particular check was assuming that no
pod-to-pod unencrypted packet should be seen by bpf_overlay.

However, after the previous commit it's no longer the case. So, remove
the check, and only keep the one in bpf_host.

A nice side-effect of the previous commit is that for WG+tunnel we
automatically enforce the strict mode w/o relying on strict_allow().
I.e., any tunnel encaped traffic is going to be dropped until
cilium-agent has propogated destination node's IP addr into
WG's allowed-ips list for that node.

This commit also drops the WG strict mode test case for tunneling, as
the test configuration is no longer applicable, and the test is going to
be migrated to the CLI connectivity suite.

[1]: cilium#21856

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@burgerdev burgerdev deleted the feat/wireguardP2PStrictmode branch January 24, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. kind/community-contribution This was a contribution made by a community member. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet