Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: Fix incorrect packet path with IPsec and endpoint routes #17000

Merged
merged 1 commit into from Jul 29, 2021

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Jul 26, 2021

When endpoint routes are enabled, we attach a BPF program on the way to the container and add a Linux route to the lxc interface. So when coming from bpf_network with IPsec, we should use that route to go directly to the lxc device and its attached BPF program.

In contrast, when endpoint routes are disabled, we run the BPF program for ingress pod policies from cilium_host, via a tail call in bpf_host. Therefore, in that case, we need to jump from bpf_network to cilium_host first, to follow the correct path to the lxc interface.

That's what commit 287f49c ("cilium: encryption, fix redirect when endpoint routes enabled") attempted to implement for when endpoint routes are enabled. It's goal was to go directly from bpf_network to the stack in that case, to use the per-endpoint Linux routes to the lxc device. That commit however implements a noop change: ENABLE_ENDPOINT_ROUTES is defined as a per-endpoint setting, but then used in bpf_network, which is not tied to any endpoint. In practice, that means the macro is defined in the ep_config.h header files used by bpf_lxc, whereas bpf_network (from which the macro is used) relies on the node_config.h header file.

The fix is therefore simple: we need to define ENABLE_ENDPOINT_ROUTES as a global config, written in node_config.h.

Click to show the reproduction steps.

To reproduce the bug and validate the fix, I deployed Cilium on GKE (where endpoint routes are enabled by default) with:

helm install cilium ./cilium --namespace kube-system \
    --set nodeinit.enabled=true \
    --set nodeinit.reconfigureKubelet=true \
    --set nodeinit.removeCbrBridge=true \
    --set cni.binPath=/home/kubernetes/bin \
    --set gke.enabled=true \
    --set ipam.mode=kubernetes \
    --set nativeRoutingCIDR=$NATIVE_CIDR \
    --set nodeinit.restartPods=true \
    --set image.repository=docker.io/pchaigno/cilium-dev \
    --set image.tag=fix-ipsec-ep-routes \
    --set operator.image.repository=quay.io/cilium/operator \
    --set operator.image.suffix="-ci" \
    --set encryption.enabled=true \
    --set encryption.type=ipsec

I then deployed the below manifest and attempted a curl request from pod client to the echo-a service.

metadata:
  name: echo-a
  labels:
    name: echo-a
spec:
  template:
    metadata:
      labels:
        name: echo-a
    spec:
      containers:
      - name: echo-a-container
        env:
        - name: PORT
          value: "8080"
        ports:
        - containerPort: 8080
        image: quay.io/cilium/json-mock:v1.3.0
        imagePullPolicy: IfNotPresent
        readinessProbe:
          timeoutSeconds: 7
          exec:
            command:
            - curl
            - -sS
            - --fail
            - --connect-timeout
            - "5"
            - -o
            - /dev/null
            - localhost:8080
  selector:
    matchLabels:
      name: echo-a
  replicas: 1
apiVersion: apps/v1
kind: Deployment
---
metadata:
  name: echo-a
  labels:
    name: echo-a
spec:
  ports:
  - name: http
    port: 8080
  type: ClusterIP
  selector:
    name: echo-a
apiVersion: v1
kind: Service
---
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "l3-rule"
spec:
  endpointSelector:
    matchLabels:
      name: client
  ingress:
  - fromEndpoints:
    - matchLabels:
        name: echo-a
---
apiVersion: v1
kind: Pod
metadata:
  name: client
  labels:
    name: client
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - echo-a
        topologyKey: kubernetes.io/hostname
  containers:
  - name: netperf
    args:
    - sleep
    - infinity
    image: cilium/netperf

Fixes: #15048
/cc @jrfastab

Fix incorrect packet path with IPsec and endpoint routes, which can cause incorrect policy drops.

When endpoint routes are enabled, we attach a BPF program on the way to
the container and add a Linux route to the lxc interface. So when coming
from bpf_network with IPsec, we should use that route to go directly to
the lxc device and its attached BPF program.

In contrast, when endpoint routes are disabled, we run the BPF program for
ingress pod policies from cilium_host, via a tail call in bpf_host.
Therefore, in that case, we need to jump from bpf_network to cilium_host
first, to follow the correct path to the lxc interface.

That's what commit 287f49c ("cilium: encryption, fix redirect when
endpoint routes enabled") attempted to implement for when endpoint
routes are enabled. It's goal was to go directly from bpf_network to the
stack in that case, to use the per-endpoint Linux routes to the lxc
device. That commit however implements a noop change:
ENABLE_ENDPOINT_ROUTES is defined as a per-endpoint setting, but then
used in bpf_network, which is not tied to any endpoint. In practice,
that means the macro is defined in the ep_config.h header files used by
bpf_lxc, whereas bpf_network (from which the macro is used) relies on
the node_config.h header file.

The fix is therefore simple: we need to define ENABLE_ENDPOINT_ROUTES as
a global config, written in node_config.h.

To reproduce the bug and validate the fix, I deploy Cilium on GKE (where
endpoint routes are enabled by default) with:

    helm install cilium ./cilium --namespace kube-system \
        --set nodeinit.enabled=true \
        --set nodeinit.reconfigureKubelet=true \
        --set nodeinit.removeCbrBridge=true \
        --set cni.binPath=/home/kubernetes/bin \
        --set gke.enabled=true \
        --set ipam.mode=kubernetes \
        --set nativeRoutingCIDR=$NATIVE_CIDR \
        --set nodeinit.restartPods=true \
        --set image.repository=docker.io/pchaigno/cilium-dev \
        --set image.tag=fix-ipsec-ep-routes \
        --set operator.image.repository=quay.io/cilium/operator \
        --set operator.image.suffix="-ci" \
        --set encryption.enabled=true \
        --set encryption.type=ipsec

I then deployed the below manifest and attempted a curl request from pod
client to the service echo-a.

    metadata:
      name: echo-a
      labels:
        name: echo-a
    spec:
      template:
        metadata:
          labels:
            name: echo-a
        spec:
          containers:
          - name: echo-a-container
            env:
            - name: PORT
              value: "8080"
            ports:
            - containerPort: 8080
            image: quay.io/cilium/json-mock:v1.3.0
            imagePullPolicy: IfNotPresent
            readinessProbe:
              timeoutSeconds: 7
              exec:
                command:
                - curl
                - -sS
                - --fail
                - --connect-timeout
                - "5"
                - -o
                - /dev/null
                - localhost:8080
      selector:
        matchLabels:
          name: echo-a
      replicas: 1
    apiVersion: apps/v1
    kind: Deployment
    ---
    metadata:
      name: echo-a
      labels:
        name: echo-a
    spec:
      ports:
      - name: http
        port: 8080
      type: ClusterIP
      selector:
        name: echo-a
    apiVersion: v1
    kind: Service
    ---
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "l3-rule"
    spec:
      endpointSelector:
        matchLabels:
          name: client
      ingress:
      - fromEndpoints:
        - matchLabels:
            name: echo-a
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: client
      labels:
        name: client
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - echo-a
            topologyKey: kubernetes.io/hostname
      containers:
      - name: netperf
        args:
        - sleep
        - infinity
        image: cilium/netperf

Fixes: 287f49c ("cilium: encryption, fix redirect when endpoint routes enabled")
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. needs-backport/1.10 labels Jul 26, 2021
@pchaigno pchaigno requested a review from a team as a code owner July 26, 2021 22:14
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.4 Jul 26, 2021
@pchaigno
Copy link
Member Author

test-me-please

@pchaigno
Copy link
Member Author

pchaigno commented Jul 27, 2021

Previous run failed with a Docker hub rate limiting: https://github.com/cilium/cilium/actions/runs/1069119537.

ci-l4lb

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 28, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 29, 2021
@vadorovsky vadorovsky merged commit 7ef59aa into cilium:master Jul 29, 2021
@pchaigno pchaigno deleted the fix-ipsec-ep-routes branch July 29, 2021 22:23
christarazi added a commit to cilium/cilium-cli that referenced this pull request Aug 4, 2021
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit splits up the "allow-all" test into two, which now does
"allow-all" as well as "allow-all-except-world", where non-world traffic
is *not* allowed. This should cover the datapath special case.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@joestringer
Copy link
Member

Ah, OK, yes you're right. I was confused between this and ENABLE_ROUTING which is an entirely different config based on different inputs:

$ git grep -B 1 -e ENABLE_ROUTING -e ENDPOINT_R pkg/datapath/linux/config/config.go
pkg/datapath/linux/config/config.go-    if option.Config.EnableEndpointRoutes {
pkg/datapath/linux/config/config.go:            cDefinesMap["ENABLE_ENDPOINT_ROUTES"] = "1"
--
pkg/datapath/linux/config/config.go-    if e.RequireRouting() {
pkg/datapath/linux/config/config.go:            fmt.Fprintf(fw, "#define ENABLE_ROUTING 1\n")
$ git grep ENABLE_ENDPOINT_ROUTES
bpf/lib/encrypt.h:#ifdef ENABLE_ENDPOINT_ROUTES
pkg/datapath/linux/config/config.go:            cDefinesMap["ENABLE_ENDPOINT_ROUTES"] = "1"
$ git grep ENABLE_ROUTING
bpf/Makefile:   -DENABLE_HOST_REDIRECT=1 -DENABLE_ROUTING=1 -DNO_REDIRECT=1 \
bpf/Makefile:    -DENABLE_IPV6:-DENCAP_IFINDEX:-DTUNNEL_MODE:-DPOLICY_VERDICT_NOTIFY:-DENABLE_IPV4:-DENABLE_ROUTING: \
bpf/Makefile:   -DENABLE_ROUTING -DENABLE_IPSEC -DIP_POOLS -DPOLICY_VERDICT_NOTIFY
bpf/bpf_host.c:#if defined(ENABLE_HOST_FIREWALL) && !defined(ENABLE_ROUTING)
bpf/bpf_host.c:#endif /* ENABLE_HOST_FIREWALL && !ENABLE_ROUTING */
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:  if (is_defined(ENABLE_ROUTING) || hairpin_flow) {
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:#endif /* ENABLE_ROUTING */
bpf/bpf_lxc.c:#if defined(ENABLE_HOST_FIREWALL) && !defined(ENABLE_ROUTING)
bpf/bpf_lxc.c:#endif /* ENABLE_HOST_FIREWALL && !ENABLE_ROUTING */
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:  /* Allow a hairpin packet to be redirected even if ENABLE_ROUTING is
bpf/bpf_lxc.c:  if (is_defined(ENABLE_ROUTING) || hairpin_flow) {
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:#endif /* ENABLE_ROUTING */
bpf/bpf_lxc.c:#if defined(ENABLE_HOST_FIREWALL) && !defined(ENABLE_ROUTING)
bpf/bpf_lxc.c:#endif /* ENABLE_HOST_FIREWALL && !ENABLE_ROUTING */
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:#ifdef ENABLE_ROUTING
bpf/bpf_lxc.c:#if !defined(ENABLE_ROUTING) && defined(TUNNEL_MODE) && !defined(ENABLE_NODEPORT)
bpf/bpf_lxc.c:#endif /* ENABLE_ROUTING && TUNNEL_MODE && !ENABLE_NODEPORT */
bpf/bpf_lxc.c:#if !defined(ENABLE_ROUTING) && defined(TUNNEL_MODE) && !defined(ENABLE_NODEPORT)
bpf/bpf_lxc.c:#endif /* ENABLE_ROUTING && TUNNEL_MODE && !ENABLE_NODEPORT */
bpf/bpf_lxc.c:#if defined(ENABLE_HOST_FIREWALL) && !defined(ENABLE_ROUTING)
bpf/bpf_lxc.c:#endif /* ENABLE_HOST_FIREWALL && !ENABLE_ROUTING */
bpf/lib/encrypt.h:#endif /* ENABLE_ROUTING */
pkg/datapath/linux/config/config.go:            fmt.Fprintf(fw, "#define ENABLE_ROUTING 1\n")

@jrajahalme
Copy link
Member

@pchaigno Will this need to be revisited now that this was reverted in #17057?

@pchaigno
Copy link
Member Author

pchaigno commented Aug 6, 2021

@jrajahalme Yes.

christarazi added a commit to cilium/cilium-cli that referenced this pull request Aug 10, 2021
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit to cilium/cilium-cli that referenced this pull request Aug 10, 2021
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.4 Aug 10, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.10 Aug 10, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.10 in 1.10.4 Aug 10, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.9 in 1.9.10 Aug 10, 2021
@pchaigno
Copy link
Member Author

I'm removing this PR from backports until we've figured out #17022.

christarazi added a commit to cilium/cilium-cli that referenced this pull request Jan 26, 2022
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit to cilium/cilium-cli that referenced this pull request Jan 26, 2022
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit to cilium/cilium-cli that referenced this pull request Feb 2, 2022
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit to cilium/cilium-cli that referenced this pull request Feb 9, 2022
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
jibi pushed a commit to cilium/cilium-cli that referenced this pull request Feb 15, 2022
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
aditighag pushed a commit to aditighag/cilium-cli that referenced this pull request Apr 21, 2023
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium/cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
michi-covalent pushed a commit to michi-covalent/cilium that referenced this pull request May 30, 2023
In the Cilium datapath, the identity "world" is a special case. If
traffic cannot be identified, then the datapath falls back to assigning
it as "world". Having only "allow-all" in the connectivity test will
mask failures in which we have datapath bugs that incorrectly assign
traffic as "world", but the traffic is still allowed. One such case is
cilium#17000.

This commit replaces the "allow-all" test with "allow-all-except-world"
(and unmanaged), thereby covering the datapath special case. We don't
want to allow unmanaged traffic either because it could also lead mark
underlying datapath bugs, such as a delay in propagation of identities.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants