Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix L7 ingress proxy in endpoint routes mode #8930

Merged
merged 5 commits into from
Aug 27, 2019

Conversation

joestringer
Copy link
Member

@joestringer joestringer commented Aug 16, 2019

Fix per-endpoint route mode with the cilium-health endpoint, and configure
it such that the daemon prefers to connect over IPv4 rather than IPv6 when
both protocols are enabled in the daemon. Otherwise we hit CI issues in
the endpoint routes mode datapath configuration test (See #8956).

Fix up the per-endpoint route mode tests so they actually test per-endpoint
routes in the supported direct routing mode rather than in tunnel mode.

Fix routing from a host proxy towards local pods in endpoint routes mode:

When endpoint routes are enabled, we don't need to install the ip rules
or routes to ensure that the traffic from the ingress proxy goes into
cilium_host; Instead, we configure BPF programs attached to the TC
egress filter of the endpoint device to ensure that it implements the
policy logic.

Treat this case as datapath mode "routed", where we will now skip those
rules. All other datapath configuration should be equivalent to direct
mode.

If we don't do this, then when applying L7 ingress policy, the traffic
from the ingress proxy towards the pod is first directed towards the
cilium_host device, then subsequently passed to the stack and re-routed
towards the endpoint device, which causes the BPF to lose the context
that this traffic comes from the proxy and hence the BPF program treats
it as newly inbound traffic from the world. As a result, inbound HTTP
requests see a message like:

```
$ kubectl exec tiefighter -- curl -s -XPOST 192.168.73.130/v1/request-landing                                                                                                                                                                  
upstream connect error or disconnect/reset before headers. reset reason: connection failure
```

This fixes the following policy cases:

  • Ingress L7 policy in direct routing mode for traffic from the same node
  • Ingress L7 policy in direct routing mode for traffic from a remote node

This change is Reviewable

@joestringer joestringer added wip sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Aug 16, 2019
@joestringer joestringer requested review from a team August 16, 2019 00:29
@joestringer joestringer added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Aug 16, 2019
@joestringer
Copy link
Member Author

This currently causes some havoc with cilium-health IPv6 connectivity which I'm still figuring out. It's easy to reproduce with enable-ipv6: true and enable-endpoint-routes: true, and may be something to do with the new routed mode in bpf/init.sh.

@coveralls
Copy link

coveralls commented Aug 16, 2019

Coverage Status

Coverage decreased (-0.02%) to 44.05% when pulling 9dc92b5 on joestringer:submit/fix-l7-ingress-rules into 125fe08 on cilium:master.

@ianvernon ianvernon added this to To Merge To Master in 1.6.0-rc6 Aug 16, 2019
@ianvernon ianvernon removed this from To Merge To Master in 1.6.0-rc6 Aug 16, 2019
@ianvernon ianvernon added this to To Merge To Master in 1.6.1 Aug 16, 2019
@joestringer joestringer force-pushed the submit/fix-l7-ingress-rules branch 2 times, most recently from 1ef149b to 0460d73 Compare August 19, 2019 21:22
@joestringer
Copy link
Member Author

joestringer commented Aug 19, 2019

test-me-please

EDIT: I think we're finally making progress. Now it's only connectivity to the remote node's cilium-health endpoint IP that is broken in direct-routing mode.

@joestringer joestringer requested review from a team as code owners August 20, 2019 23:13
@joestringer joestringer added the ci/fail-fast This label makes a CI build fail immediately if any test across all test suites fails. label Aug 20, 2019
@joestringer
Copy link
Member Author

test-me-please

@joestringer
Copy link
Member Author

test-me-please

@joestringer joestringer removed the ci/fail-fast This label makes a CI build fail immediately if any test across all test suites fails. label Aug 21, 2019
@joestringer
Copy link
Member Author

test-me-please

@joestringer
Copy link
Member Author

test-me-please

@joestringer joestringer force-pushed the submit/fix-l7-ingress-rules branch 2 times, most recently from 6f05efd to 883685b Compare August 23, 2019 19:45
@joestringer
Copy link
Member Author

joestringer commented Aug 23, 2019

test-me-please

EDIT: Successful k8s-1.10 run:
https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/14623/flowGraphTable/

Other runs failed due to CI infrastructure issues.

@joestringer
Copy link
Member Author

joestringer commented Aug 23, 2019

test-me-please

EDIT: Green CI run.

@joestringer joestringer changed the title Fix L7 ingress proxy in direct routing mode Fix L7 ingress proxy in endpoint routes mode Aug 26, 2019
@ianvernon ianvernon added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Aug 26, 2019
@ianvernon
Copy link
Member

@joestringer needs a rebase to deal with bpf.sha conflict

The endpoint routes mode has only ever been validated in direct routing
mode, but the test was merged without specifying this which meant that
it relied upon tunneling (and avoiding endpoint routes) for forwarding.
Fix it.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
Reorder the initialization of 'healthIP' to prefer IPv4 over IPv6 when
both are enabled. This makes the health endpoint less likely to fail out
in some of the newly introduced datapath configuration modes.

Signed-off-by: Joe Stringer <joe@cilium.io>
Previously, even when the daemon is configured with
--enable-endpoint-routes, the cilium-health endpoint was not configured
with per-endpoint routes which meant that the routing for that endpoint
is different from other endpoints. Fix it by applying the appropriate
endpoint datapath configuration.

Signed-off-by: Joe Stringer <joe@cilium.io>
When endpoint routes are enabled, we don't need to install the ip rules
or routes to ensure that the traffic from the ingress proxy goes into
cilium_host; Instead, we configure BPF programs attached to the TC
egress filter of the endpoint device to ensure that it implements the
policy logic.

Treat this case as datapath mode "routed", where we will now skip those
rules. All other datapath configuration should be equivalent to direct
mode.

If we don't do this, then when applying L7 ingress policy, the traffic
from the ingress proxy towards the pod is first directed towards the
cilium_host device, then subsequently passed to the stack and re-routed
towards the endpoint device, which causes the BPF to lose the context
that this traffic comes from the proxy and hence the BPF program treats
it as newly inbound traffic from the world. As a result, inbound HTTP
requests see a message like:

    upstream connect error or disconnect/reset before headers

Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer
Copy link
Member Author

test-me-please

@joestringer joestringer removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Aug 26, 2019
@tgraf tgraf merged commit 9eeca30 into cilium:master Aug 27, 2019
@joestringer joestringer deleted the submit/fix-l7-ingress-rules branch August 27, 2019 14:29
@ianvernon ianvernon moved this from To Merge To Master to Backport Pending in 1.6.1 Aug 28, 2019
@ianvernon ianvernon moved this from Backport Pending to Done in 1.6.1 Aug 28, 2019
@aanm aanm added the release-note/misc This PR makes changes that have no direct user impact. label Nov 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/bug This PR fixes an issue in a previous release of Cilium. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.6.1
Done
Development

Successfully merging this pull request may close these issues.

None yet

7 participants