
Host Firewall: Attaching a label to a node before deploying Cilium breaks the host firewall #13676

Closed
qmonnet opened this issue Oct 21, 2020 · 3 comments · Fixed by #15780

Comments


qmonnet commented Oct 21, 2020

Bug report

When trying to deploy the host firewall, attaching a label to the node before deploying Cilium makes the agent unable to pick up the label and deploy the host policy correctly.

While the issue was observed with the host firewall, the fact that the label is not displayed in cilium endpoint list may not be specific to this feature.

General Information

  • Cilium version: v1.9.0-rc2
  • Kernel version: 4.19.112
  • Orchestration system version in use: GKE, K8s v1.15.12-gke.20

How to reproduce the issue

  1. Create a GKE cluster (see https://docs.cilium.io/en/latest/gettingstarted/k8s-install-gke/).
  2. Attach the node-access=ssh label to a node, as described in the GSG at https://docs.cilium.io/en/v1.9.0-rc2/gettingstarted/host-firewall/:
    $ kubectl label node $NODE_NAME node-access=ssh
    
  3. Deploy Cilium with the host firewall enabled (don't forget to disable per-endpoint routing, currently incompatible with the host firewall):
    $ kubectl create namespace cilium
    $ helm install cilium ./cilium \
      --namespace cilium \
      --set nodeinit.enabled=true \
      --set nodeinit.reconfigureKubelet=true \
      --set nodeinit.removeCbrBridge=true \
      --set cni.binPath=/home/kubernetes/bin \
      --set ipam.mode=kubernetes \
      --set nativeRoutingCIDR=$NATIVE_CIDR \
      --set hostFirewall=true \
      --set devices=eth0 \
      --set endpointRoutes.enabled=false \
      --set tunnel=disabled
    
    Note: the exact steps I ran were: a) Deploy Cilium, b) Attach the label, c) Uninstall then redeploy Cilium.
  4. To follow the GSG, enable the audit mode, then apply a policy (a sketch of the policy's shape is shown after these steps):
    $ CILIUM_NAMESPACE=cilium
    $ CILIUM_POD_NAME=$(kubectl -n $CILIUM_NAMESPACE get pods -o json | jq -r '.items[] | select( .metadata.labels."k8s-app" == "cilium" and .spec.nodeName == env.NODE_NAME ).metadata.name')
    $ HOST_EP_ID=$(kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -- cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}')
    $ kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -- cilium endpoint config $HOST_EP_ID PolicyAuditMode=Enabled
    Endpoint 293 configuration updated successfully
    $ kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -- cilium endpoint config $HOST_EP_ID | grep PolicyAuditMode
    PolicyAuditMode          Enabled
    $ kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9.0-rc2/examples/policies/host/demo-host-policy.yaml
    ciliumclusterwidenetworkpolicy.cilium.io/demo-host-policy created
    
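For context on why the node label matters: the demo policy applied in step 4 selects nodes through a nodeSelector on the node-access=ssh label, so the host endpoint must carry that label for the policy to take effect. A minimal sketch of such a clusterwide host policy (the field values are illustrative and may differ from the actual demo-host-policy.yaml):

$ cat <<EOF | kubectl apply -f -
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: demo-host-policy
spec:
  description: "Allow SSH and intra-cluster traffic to nodes labeled node-access=ssh"
  nodeSelector:
    matchLabels:
      node-access: ssh
  ingress:
  # Allow all traffic from within the cluster.
  - fromEntities:
    - cluster
  # Allow SSH from anywhere.
  - toPorts:
    - ports:
      - port: "22"
        protocol: TCP
EOF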

The node-access=ssh label should then appear on the host endpoint in the list of endpoints. The ingress policy enforcement should also be marked as Disabled (Audit). What happens instead is that neither the label nor the policy is reported:

$ kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -- cilium endpoint list
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                      IPv6   IPv4          STATUS   
           ENFORCEMENT        ENFORCEMENT                                                                                        
293        Disabled           Disabled          1          k8s:cloud.google.com/gke-nodepool=default-pool                        ready   
                                                           k8s:cloud.google.com/gke-os-distribution=cos                                  
                                                           reserved:host                                                                 
1867       Disabled           Disabled          4          reserved:health                                         10.60.3.167   ready

However, Kubernetes is aware of the label:

$ kubectl get nodes $NODE_NAME --show-labels
NAME                                         STATUS   ROLES    AGE   VERSION           LABELS
gke-test-cluster-default-pool-fa00e972-xziz   Ready    <none>   1d   v1.15.12-gke.20   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,cloud.google.com/gke-nodepool=default-pool,cloud.google.com/gke-os-distribution=cos,failure-domain.beta.kubernetes.io/region=europe-west2,failure-domain.beta.kubernetes.io/zone=europe-west2-a,kubernetes.io/arch=amd64,kubernetes.io/hostname=gke-test-cluster-default-pool-fa00e972-xziz,kubernetes.io/os=linux,node-access=ssh
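
A narrower check (not part of the original report) confirms just this one label on the node:

$ kubectl get node $NODE_NAME -o jsonpath='{.metadata.labels.node-access}'
ssh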

Attempting to display the policy verdicts for the host with cilium monitor prints nothing.
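
The exact monitor invocation is not quoted in the report; the equivalent check from the GSG is presumably along these lines (the policy-verdict event type and the --related-to filter come from the guide, not from this report):

$ kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -- \
    cilium monitor -t policy-verdict --related-to $HOST_EP_ID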

The issue can be solved by removing and re-creating the label (or possibly by simply overwriting it; I have not tried that):

$ kubectl label node $NODE_NAME node-access-
$ kubectl label node $NODE_NAME node-access=ssh

After that, everything works as expected.
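
For reference, the overwrite variant mentioned above (untested here) would be:

$ kubectl label --overwrite node $NODE_NAME node-access=ssh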

A note is being added to the GSG in #13673 to make sure users attach the label first.

@qmonnet qmonnet added kind/bug This is a bug in the Cilium logic. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. area/host-firewall Impacts the host firewall or the host endpoint. labels Oct 21, 2020

stale bot commented Dec 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Dec 25, 2020
@pchaigno pchaigno added pinned These issues are not marked stale by our issue bot. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Jan 4, 2021
@borkmann borkmann added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Jan 12, 2021

pchaigno commented Apr 15, 2021

I'm currently unable to reproduce with a GKE cluster. I tried with both Cilium master and v1.9.0-rc2. I also tried two series of steps:

Series A:
  1. Create cluster
  2. Add label
  3. Install Cilium

Series B:
  1. Create cluster
  2. Install Cilium
  3. Uninstall Cilium
  4. Add label
  5. Install Cilium

The one difference that may matter is the K8s version. I tried with v1.18.16-gke.502 because that's what GKE supports now, and I wasn't able to get a working GKE cluster with v1.15.12-gke.20. I'll see if I can reproduce with the development VM and K8s 1.15.

@pchaigno

I was able to reproduce in #15714. Fix is incoming.

@pchaigno pchaigno self-assigned this Apr 19, 2021
@pchaigno pchaigno removed the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Apr 20, 2021
pchaigno added a commit that referenced this issue Apr 21, 2021
Our current host policy tests need to apply the policy on all nodes. They
therefore use an empty nodeSelector. However, if we want a basic test of
the node label watcher, we can instead implement this with a
nodeSelector matching on a label added to all nodes. This commit
implements that change.

The goal is also to help us catch such potential bugs as [1, 2].

1 - #13676
2 - #13455
Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno added a commit that referenced this issue Apr 22, 2021
Our current host policy tests need to apply the policy on all nodes. They
therefore use an empty nodeSelector. However, if we want a basic test of
the node label watcher, we can instead implement this with a
nodeSelector matching on a label added to all nodes. This commit
implements that change.

The goal is also to help us catch such potential bugs as [1, 2].

1 - #13676
2 - #13455
Signed-off-by: Paul Chaignon <paul@cilium.io>