
Validate "ownership" of hostPort service being deleted #22587

Merged
1 commit merged into cilium:main from podHostPortBug on Jul 7, 2023

Conversation

yasz24
Contributor

@yasz24 yasz24 commented Dec 6, 2022

This addresses a bug where deleting a completed pod would
delete the service map entry for the corresponding running pod on
the same node. This is due to the hostPort mapping keys being the
same for both old and new pods.

Ideally we should validate whether the (completed) pod being deleted
"owns" the host port service -- in order to prevent breaking host port
connectivity for any running pods that use the same service as their frontend.

This PR adds such a validation.
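
To make the failure mode concrete, here is a minimal, self-contained sketch (the types and the ownership check below are hypothetical illustrations, not Cilium's actual datapath code): both the Completed pod and its Running replacement resolve to the same nodeIP:hostPort frontend key, so a delete keyed only on the frontend would remove the entry installed for the running pod, while an ownership check keeps it.

```go
package main

import "fmt"

// frontend identifies a hostPort service entry; both the old (Completed)
// pod and its replacement map to the same key, since they share the node
// IP and the hostPort. (Hypothetical types, for illustration only.)
type frontend struct {
	NodeIP   string
	HostPort int
}

// serviceMap maps a hostPort frontend to the backend pod IP currently
// serving it.
type serviceMap map[frontend]string

// deletePodHostPort removes the entry for the given frontend only if the
// pod being deleted still owns it, i.e. its IP is the recorded backend.
// Without the ownership check, deleting the stale Completed pod would
// also wipe the entry installed for the new Running pod.
func deletePodHostPort(m serviceMap, fe frontend, podIP string) {
	if backend, ok := m[fe]; ok && backend == podIP {
		delete(m, fe)
	}
}

func main() {
	m := serviceMap{}
	fe := frontend{NodeIP: "10.0.0.5", HostPort: 8081}

	m[fe] = "10.244.1.23" // the new Running pod installs the mapping

	// Deleting the old Completed pod (IP 10.244.1.7) is now a no-op,
	// because it no longer owns the frontend.
	deletePodHostPort(m, fe, "10.244.1.7")
	fmt.Println(m[fe]) // still 10.244.1.23
}
```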

Testing-
Automated (Control Plane):
This fix is captured by a control plane test case that does the
following:

  1. Create a hostport pod, and terminate its running containers to mark
    it as "Completed".
  2. Create another hostport pod using the same port as the "Completed" pod.
  3. Delete the "Completed" pod, and verify that the hostport service has
    not been deleted in the Datapath.

Manual Testing -

  1. Add the GracefulNodeShutdown in the kubelet config on all nodes by
    modifying the configuration in /var/lib/kubelet/config.yaml
     featureGates:
       GracefulNodeShutdown: true
     shutdownGracePeriod: 30s
     shutdownGracePeriodCriticalPods: 10s
    
  2. Run sudo systemctl restart kubelet on each node to apply the kubelet
    config change
  3. Deploy an nginx web server with hostPort set, as well as a
    nodeSelector, so pods get scheduled on the same node after node
    restarts.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 1
      template:
        metadata:
          labels:
            app: nginx
        spec:
          nodeSelector:
            kubernetes.io/hostname: <node-name>
          containers:
          - name: nginx
            image: nginx:1.14.2
            ports:
            - containerPort: 80
              hostPort: 8081
    
  4. Run systemctl reboot on the worker node to restart the machine.
  5. After reboot, observe the old pod in the Completed state, while the new pod
    is Running.
    $ kubectl get pods
    NAME                                READY   STATUS      RESTARTS    AGE
    nginx-deployment-645797c867-8p2hp   0/1     Completed   0           13m
    nginx-deployment-645797c867-dx2m8   1/1     Running     0           4m2s
    
  6. Run curl nodeIP:hostPort and verify it successfully returns the result.
  7. Manually delete the old pod, which is in the Completed state.
    $ kubectl delete pod/nginx-deployment-645797c867-8p2hp
    
  8. Rerun curl nodeIP:hostPort and verify it still succeeds; the hostPort service has been preserved. (A scripted version of steps 6-8 is sketched after this list.)
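
For convenience, a minimal sketch that scripts the same before/after connectivity check from steps 6-8, using only the Go standard library and kubectl; the node IP is a placeholder and the pod name is the one from the example output above:

```go
package main

import (
	"fmt"
	"net/http"
	"os/exec"
	"time"
)

// check issues an HTTP GET against the node's hostPort, mirroring the
// manual `curl nodeIP:hostPort` step.
func check(url string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("GET", url, "->", resp.Status)
	return nil
}

func main() {
	url := "http://10.0.0.5:8081" // placeholder nodeIP:hostPort

	// Step 6: hostPort connectivity works while both pods exist.
	if err := check(url); err != nil {
		panic(err)
	}

	// Step 7: delete the stale Completed pod (pod name from the example above).
	out, err := exec.Command("kubectl", "delete", "pod", "nginx-deployment-645797c867-8p2hp").CombinedOutput()
	fmt.Println(string(out))
	if err != nil {
		panic(err)
	}

	// Step 8: connectivity must still work, i.e. the hostPort service
	// entry for the Running pod was preserved.
	if err := check(url); err != nil {
		panic(err)
	}
}
```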

Fixes: #22460

Signed-off-by: Yash Shetty <yashshetty@google.com>

@yasz24 yasz24 requested a review from a team as a code owner December 6, 2022 23:21
@maintainer-s-little-helper

Commit 704e71af6c06ac5edc5dae6f0babf71d4e012466 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@yasz24 yasz24 requested a review from squeed December 6, 2022 23:21
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Dec 6, 2022
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Dec 6, 2022
@maintainer-s-little-helper

Commits 704e71af6c06ac5edc5dae6f0babf71d4e012466, 16b827c2c54f8ff20a20658a61f685d2bbc6bd67 do not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Dec 6, 2022
Member

@aanm aanm left a comment


Thank you for the PR. What happens in case there is only one pod running on the node and that pod is removed? If we are skipping the generation of service mappings, will the entries in the bpf maps be deleted?

@aanm aanm added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Dec 6, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 6, 2022
@aanm aanm requested a review from aditighag December 6, 2022 23:33
Member

@aditighag aditighag left a comment


👋 The fix doesn't look right to me, see the inline comment.

pkg/k8s/watchers/pod.go (outdated review thread, resolved)
@yasz24
Contributor Author

yasz24 commented Dec 7, 2022

Thank you for the PR. What happens in case there is only one pod running on the node and that pod is removed? If we are skipping the generation of service mappings, will the entries in the bpf maps be deleted?

If the single pod on the node is in Running status, we would delete the map entries. But, as pointed out, we might still want to delete service map entries for Completed-status pods that were not hostPort pods (pods associated with completed Jobs, etc.), so we'll need to find another way to fix the hostPort connectivity breakage, I guess. Any suggestions for a possible approach would be really appreciated!

@squeed
Contributor

squeed commented Dec 8, 2022

One possible option - though it's a pretty significant refactor - is to manage HostPort mappings via CNI capability arguments rather than via pod objects. This would then serialize hostport mappings via the kubelet.
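
For context on what this suggestion involves, here is a minimal sketch of the shape hostPort mappings take when a runtime hands them to a CNI plugin as capability arguments (runtimeConfig.portMappings); the struct below approximates the convention used by the reference portmap plugin and is illustrative only, not Cilium code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// portMapping approximates the per-port entry that runtimes pass to CNI
// plugins under runtimeConfig.portMappings when the plugin declares the
// "portMappings" capability (as the reference portmap plugin does).
type portMapping struct {
	HostPort      int    `json:"hostPort"`
	ContainerPort int    `json:"containerPort"`
	Protocol      string `json:"protocol"`
	HostIP        string `json:"hostIP,omitempty"`
}

func main() {
	// Example runtimeConfig fragment a kubelet-driven runtime might send
	// for the nginx deployment used in the manual test above.
	raw := []byte(`{"portMappings":[{"hostPort":8081,"containerPort":80,"protocol":"tcp"}]}`)

	var cfg struct {
		PortMappings []portMapping `json:"portMappings"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg.PortMappings)
}
```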

@squeed
Contributor

squeed commented Dec 8, 2022

I also wonder if we can elide the pod deletion entirely if the pod is newly "Completed" when we first see it?

@yasz24
Contributor Author

yasz24 commented Dec 13, 2022

One possible option - though it's a pretty significant refactor - is to manage HostPort mappings via CNI capability arguments rather than via pod objects. This would then serialize hostport mappings via the kubelet.

@squeed IIUC, the issue isn't really caused by a race condition, so serializing the hostport mappings wouldn't help us, I believe. I think what we really need is a way to know whether the hostPort mappings "belong" to the "completed" pod being deleted or not. One way to do that would be to inspect the hostPort mappings for the service in question and verify that the IP of the "completed" pod is one of the backends. But I wasn't able to find any APIs to do a simple lookup of a service's hostPort mappings to verify this (GetService or similar on the svcManager could help accomplish this). Wondering if it would be okay to add such a method on the interface?
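
For illustration, a minimal sketch of the kind of lookup being proposed here; the service-lookup interface, GetService method, and types below are hypothetical stand-ins, not Cilium's actual svcManager API:

```go
package main

import "fmt"

// hostPortService is a hypothetical, simplified view of a hostPort
// service: a frontend address plus the pod IPs currently backing it.
type hostPortService struct {
	Frontend string   // e.g. "10.0.0.5:8081"
	Backends []string // pod IPs currently backing the frontend
}

// serviceLookup stands in for a "GetService or similar" method on the
// service manager: given a frontend, return the service if it exists.
type serviceLookup interface {
	GetService(frontend string) (hostPortService, bool)
}

// ownsHostPort reports whether the pod being deleted is actually one of
// the backends of the hostPort service, i.e. whether deleting the pod
// should also delete the service entry.
func ownsHostPort(sl serviceLookup, frontend, podIP string) bool {
	svc, ok := sl.GetService(frontend)
	if !ok {
		return false
	}
	for _, b := range svc.Backends {
		if b == podIP {
			return true
		}
	}
	return false
}

// staticLookup is a toy implementation for the example below.
type staticLookup map[string]hostPortService

func (s staticLookup) GetService(frontend string) (hostPortService, bool) {
	svc, ok := s[frontend]
	return svc, ok
}

func main() {
	sl := staticLookup{
		"10.0.0.5:8081": {Frontend: "10.0.0.5:8081", Backends: []string{"10.244.1.23"}},
	}
	// The Completed pod's IP is no longer a backend, so it does not own
	// the service and the delete should be skipped.
	fmt.Println(ownsHostPort(sl, "10.0.0.5:8081", "10.244.1.7"))  // false
	fmt.Println(ownsHostPort(sl, "10.0.0.5:8081", "10.244.1.23")) // true
}
```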

@maintainer-s-little-helper

Commit b5e248597f70f568cc41b78a9bcbd79bbf61a9a0 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Dec 16, 2022
@maintainer-s-little-helper

Commit b5e248597f70f568cc41b78a9bcbd79bbf61a9a0 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@yasz24 yasz24 changed the title Skip service map generation for non "Running" pods Validate "ownership" of hostPort service being deleted Dec 16, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Dec 16, 2022
@yasz24 yasz24 force-pushed the podHostPortBug branch 2 times, most recently from eef7982 to 9a6623e on December 16, 2022 23:43
@yasz24 yasz24 removed the request for review from squeed December 16, 2022 23:46
@aojea
Contributor

aojea commented Jun 27, 2023

lgtm, two nits only about the empty line at the end of two files

Contributor

@squeed squeed left a comment


nice work!

Member

@nebril nebril left a comment


Thanks for the PR! LGTM overall, please add the missing unit test, otherwise it's good to go!

pkg/k8s/watchers/pod.go (outdated review thread, resolved)
pkg/k8s/watchers/pod.go (outdated review thread, resolved)
This addresses a bug where deleting a completed pod would wind up
deleting the service map entry for the corresponding running pod on
the same node, due to the hostPort mapping key being the same for
the old and the new pod.

Ideally we want to validate whether the completed pod "owns" the
host port service, before deleting it, thus preventing breakage of
host port connectivity for any running pods with the same service
as frontend.

This commit adds such a validation.

Testing-
Automated (Control Plane):
This fix is captured by a control plane test case that does the
following:
1. Create a hostport pod, and terminate its running containers to mark
   it as "Completed".
2. Create another hostport pod using the same port as the "Completed" pod.
3. Delete the "Completed" pod, and verify that the hostport service has
   not been deleted in the Datapath.

Manual Testing -
1. Add the GracefulNodeShutdown in the kubelet config on all nodes by
   modifying the configuration in `/var/lib/kubelet/config.yaml`
   ```
    featureGates:
      GracefulNodeShutdown: true
    shutdownGracePeriod: 30s
    shutdownGracePeriodCriticalPods: 10s
   ```
2. Run `sudo systemctl restart kubelet` on each node to apply the kubelet
   config change
3. Deploy an nginx web server with hostPort set, as well as a
   nodeSelector, so pods get scheduled on the same node after node
   restarts.
   ```
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: nginx-deployment
   spec:
     selector:
       matchLabels:
         app: nginx
     replicas: 1
     template:
       metadata:
         labels:
           app: nginx
       spec:
         nodeSelector:
           kubernetes.io/hostname: <node-name>
         containers:
         - name: nginx
           image: nginx:1.14.2
           ports:
           - containerPort: 80
             hostPort: 8081
   ```
4. Run `systemctl reboot` on the worker node to restart the machine.
5. After reboot, observe the old pod in the `Completed` state, while the new pod
   is `Running`.
   ```
   $ kubectl get pods
   NAME                                READY   STATUS      RESTARTS    AGE
   nginx-deployment-645797c867-8p2hp   0/1     Completed   0           13m
   nginx-deployment-645797c867-dx2m8   1/1     Running     0           4m2s
   ```
6. Run `curl nodeIP:hostPort` and verify it successfully returns the result.
7. Manually delete the old pod, which is in the Completed state.
   ```
   $ kubectl delete pod/nginx-deployment-645797c867-8p2hp
   ```
8. Rerun `curl nodeIP:hostPort` and verify it still succeeds; the hostPort service has been preserved.

Signed-off-by: Yash Shetty <yashshetty@google.com>
@aanm aanm self-requested a review July 4, 2023 09:28
@aanm aanm removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. pinned These issues are not marked stale by our issue bot. labels Jul 4, 2023
@aanm
Member

aanm commented Jul 4, 2023

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 6, 2023
@borkmann borkmann merged commit af4d761 into cilium:main Jul 7, 2023
63 of 65 checks passed
@borkmann borkmann added the needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch label Jul 7, 2023
@borkmann
Member

borkmann commented Jul 7, 2023

@yasz24 looks like the PR broke Go code precheck test on master, could you take a look and fix? Thanks

https://github.com/cilium/cilium/actions/runs/5483316043/jobs/9989522587

[...]
contrib/scripts/lock-check.sh
contrib/scripts/check-viper.sh
contrib/scripts/custom-vet-check.sh
-: # github.com/cilium/cilium/test/controlplane/pod/hostport
Error: test/controlplane/pod/hostport/hostport.go:39:2: not enough arguments in call to test.UpdateObjectsFromFile(abs("init.yaml")).SetupEnvironment(modConfig).StartAgent
	have ()
	want (func(*"github.com/cilium/cilium/pkg/option".DaemonConfig), ...cell.Cell)
Error: test/controlplane/pod/hostport/hostport.go:41:20: too many arguments in call to test.UpdateObjectsFromFile(abs("init.yaml")).SetupEnvironment
	have (func(daemonCfg *"github.com/cilium/cilium/pkg/option".DaemonConfig, _ *"github.com/cilium/cilium/operator/option".OperatorConfig))
	want ()
Error: /home/runner/work/cilium/cilium/src/github.com/cilium/cilium/test/controlplane/pod/hostport/hostport.go:41:20: too many arguments in call to test.UpdateObjectsFromFile(abs("init.yaml")).SetupEnvironment
	have (func(daemonCfg *"github.com/cilium/cilium/pkg/option".DaemonConfig, _ *"github.com/cilium/cilium/operator/option".OperatorConfig))
	want ()
Error: /home/runner/work/cilium/cilium/src/github.com/cilium/cilium/test/controlplane/pod/hostport/hostport.go:42:14: not enough arguments in call to test.UpdateObjectsFromFile(abs("init.yaml")).SetupEnvironment(modConfig).StartAgent
	have ()
	want (func(*"github.com/cilium/cilium/pkg/option".DaemonConfig), ...cell.Cell)
customvet: 3 errors during loading
exit status 1
make: *** [Makefile:653: precheck] Error 1
Error: Process completed with exit code 2.

@pippolo84
Member

@yasz24 looks like the PR broke Go code precheck test on master, could you take a look and fix? Thanks

The fix is very simple; here is the PR.

@yasz24 yasz24 deleted the podHostPortBug branch July 7, 2023 18:34
@jibi jibi mentioned this pull request Jul 10, 2023
19 tasks
@jibi jibi added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Jul 10, 2023
@julianwiedmann julianwiedmann added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Jul 11, 2023
Labels
backport-done/1.14 The backport for Cilium 1.14.x for this PR is done.
kind/bug This is a bug in the Cilium logic.
kind/community-contribution This was a contribution made by a community member.
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
release-note/bug This PR fixes an issue in a previous release of Cilium.
Development

Successfully merging this pull request may close these issues.

Pod hostPort not reachable after node restart when GracefulNodeShutdown kubelet config set