Service without selector but with matching Endpoint will cause timeout #12438

Closed
galexrt opened this issue Jul 6, 2020 · 7 comments
Labels
kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.

galexrt commented Jul 6, 2020

Bug report

General Information

  • Cilium version (run cilium version)
Client: 1.8.1 5ce2bc7b3 2020-07-02T20:04:47+02:00 go version go1.14.4 linux/amd64
Daemon: 1.8.1 5ce2bc7b3 2020-07-02T20:04:47+02:00 go version go1.14.4 linux/amd64

Cilium is installed as a replacement for kube-proxy; kubeadm init was run with the flag that skips the kube-proxy phase.

  • Kernel version (run uname -a)

Ubuntu 20.04 5.4.x and Fedora 32 5.7.7-200

  • Orchestration system version in use (e.g. kubectl version, Mesos, ...)
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:39:24Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Link to relevant artifacts (policies, deployments scripts, ...)
  • Generate and upload a system zip: Please let me know if sysdumps are needed in this case.

How to reproduce the issue

A Service without a selector, for which a matching Endpoints object has been created, does not work: connections to the Service ClusterIP time out. cilium service list does not show the Service when it has no selector but does have a matching Endpoints object.
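
One way to confirm the missing entry is to look for the Service's ClusterIP and port in the agent's service table. The helper below is a minimal sketch and not part of the original report: the function name cilium_has_frontend is hypothetical, and it assumes that the Cilium agent runs as the DaemonSet kube-system/cilium and that cilium service list prints the frontend (IP:port) in its second column.

```shell
# Sketch: succeed if Cilium has a service entry for the given frontend.
# ASSUMPTION: the agent is reachable via "ds/cilium" in kube-system, and the
# second column of "cilium service list" is the frontend as IP:port.
cilium_has_frontend() {
  ip="$1"
  port="$2"
  kubectl -n kube-system exec ds/cilium -- cilium service list \
    | awk -v fe="${ip}:${port}" '$2 == fe { found = 1 } END { exit !found }'
}
```

For example, once SERVICE_IP has been set as in the steps below, cilium_has_frontend "$SERVICE_IP" 80 || echo "no Cilium service entry" would report the situation described here.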

Steps:

kubectl create -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: my-nginx
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: my-nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  type: ClusterIP
  selector:
    run: my-nginx
status:
  loadBalancer: {}
EOF
kubectl rollout status deployment my-nginx
SERVICE_IP="$(kubectl get service/my-nginx -o jsonpath='{.spec.clusterIP}')"
curl -v $SERVICE_IP:80

curl should return the default nginx index / welcome page.

Run commands:

kubectl get svc my-nginx -o yaml > service.yaml
kubectl get endpoints my-nginx -o yaml > endpoints.yaml
kubectl delete -f service.yaml

Remove the selector: field from service.yaml.

vim service.yaml endpoints.yaml
kubectl create -f service.yaml
kubectl create -f endpoints.yaml
SERVICE_IP="$(kubectl get service/my-nginx -o jsonpath='{.spec.clusterIP}')"
curl -v $SERVICE_IP:80

curl should now time out or fail, even though there is a valid Endpoints object with Pod target endpoints in it.
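
As an alternative to exporting and hand-editing the live objects, the two manifests can be written out directly. This is a minimal sketch under assumptions: the Pod IPs 10.0.0.10 and 10.0.0.11 are placeholders (not taken from the report) and would need to be replaced with the real IPs of the my-nginx Pods.

```shell
# Sketch: write a selector-less Service and a hand-maintained Endpoints object.
# ASSUMPTION: the default IPs below are placeholders; take the real ones from
# "kubectl get pods -l run=my-nginx -o wide".
POD_IP_1="${POD_IP_1:-10.0.0.10}"
POD_IP_2="${POD_IP_2:-10.0.0.11}"

# Service without a selector field, so Kubernetes does not manage its Endpoints.
cat > service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  type: ClusterIP
EOF

# Endpoints object; name and namespace must match the Service.
cat > endpoints.yaml <<EOF
apiVersion: v1
kind: Endpoints
metadata:
  name: my-nginx
  namespace: default
subsets:
- addresses:
  - ip: ${POD_IP_1}
  - ip: ${POD_IP_2}
  ports:
  - port: 80
    protocol: TCP
EOF
```

The generated files can then be created with kubectl create -f service.yaml and kubectl create -f endpoints.yaml, as in the commands above.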

In minikube this works as expected: a Service without a selector but with a matching Endpoints object is still reachable.

(see https://app.slack.com/client/T1MATJ4SZ/threads/thread/C53TG4J4R-1593434412.267800)

@borkmann borkmann added this to WIP (Martynas + Daniel) in 1.9 kube-proxy removal & general dp optimization Jul 7, 2020
@borkmann borkmann added kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Jul 7, 2020
@borkmann borkmann moved this from WIP (Martynas + Daniel) to WIP (Daniel) in 1.9 kube-proxy removal & general dp optimization Jul 7, 2020
@galexrt
Author

galexrt commented Jul 7, 2020

Sysdump attached to this comment.

cilium-sysdump-20200707-103546.zip

@borkmann borkmann moved this from WIP (Daniel) to WIP (Deepesh) in 1.9 kube-proxy removal & general dp optimization Aug 21, 2020
@fristonio
Member

I was able to reproduce the issue and have figured out why it happens. On K8s versions that support EndpointSlices, we don't start a watcher for Endpoints objects.
https://github.com/cilium/cilium/blob/master/pkg/k8s/watchers/watcher.go#L439-L450
In the example above, an Endpoints object associated with the Service is created by hand. The watcher never picks it up, so the Service remains without a backend, and as a result Cilium has no service entry for it. As a simple workaround, create an EndpointSlice object associated with the Service instead of an Endpoints object.
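
A minimal sketch of that workaround, under stated assumptions: the object name my-nginx-manual and the two Pod IPs are placeholders, and discovery.k8s.io/v1beta1 is assumed to be the EndpointSlice API version served by the K8s 1.18 cluster from the report (newer clusters serve discovery.k8s.io/v1). The kubernetes.io/service-name label is what associates the slice with the Service.

```shell
# Sketch: write an EndpointSlice manifest for the selector-less my-nginx Service.
# ASSUMPTION: placeholder Pod IPs and a v1beta1 API version; adjust both to
# match the actual cluster.
POD_IP_1="${POD_IP_1:-10.0.0.10}"
POD_IP_2="${POD_IP_2:-10.0.0.11}"
ENDPOINTSLICE_API="${ENDPOINTSLICE_API:-discovery.k8s.io/v1beta1}"

cat > endpointslice.yaml <<EOF
apiVersion: ${ENDPOINTSLICE_API}
kind: EndpointSlice
metadata:
  name: my-nginx-manual   # hypothetical name
  namespace: default
  labels:
    # Ties this slice to the Service named my-nginx.
    kubernetes.io/service-name: my-nginx
addressType: IPv4
ports:
- name: ""
  protocol: TCP
  port: 80
endpoints:
- addresses:
  - ${POD_IP_1}
- addresses:
  - ${POD_IP_2}
EOF
```

The manifest would then be created with kubectl create -f endpointslice.yaml in place of endpoints.yaml.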

I will start a discussion on Slack about how we should handle these kinds of issues and will push a fix if we need one.

@fristonio
Member

Edit on how to reproduce this:
Make sure you have installed Cilium in kube-proxy-replacement=strict mode and are running a K8s version that supports EndpointSlices.
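
Both preconditions can be checked from the command line. The helpers below are a sketch with hypothetical function names; they assume a standard install where Cilium's settings live in the ConfigMap kube-system/cilium-config under the key kube-proxy-replacement.

```shell
# Sketch: succeed if the API server serves the EndpointSlice resource.
cluster_has_endpointslices() {
  kubectl api-resources --api-group=discovery.k8s.io -o name 2>/dev/null \
    | grep -q '^endpointslices'
}

# Sketch: print the configured kube-proxy replacement mode.
# ASSUMPTION: ConfigMap name and key follow the standard Cilium install.
cilium_kpr_mode() {
  kubectl -n kube-system get configmap cilium-config \
    -o jsonpath='{.data.kube-proxy-replacement}'
}
```

If cluster_has_endpointslices succeeds and cilium_kpr_mode prints strict, the setup matches the conditions described above.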

@fristonio fristonio added sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Aug 26, 2020
@brb
Member

brb commented Aug 27, 2020

@fristonio Good catch! Is this a duplicate of #12513 then?

@brb brb unassigned brb and borkmann Aug 27, 2020
@galexrt
Author

galexrt commented Aug 27, 2020

@brb Yeah, seems like it from my point of view. I'll update my K8s cluster to v1.19.x and report back here / in #12513 by the end of the week(end).

@fristonio
Member

Tested on Kubernetes 1.19 with kubeadm; it seems to work fine with the latest Kubernetes release. @brb

@brb brb closed this as completed Aug 27, 2020
@galexrt
Author

galexrt commented Aug 27, 2020

I can confirm it now too: updating the K8s cluster to v1.19.0 fixed the issue, thanks to the newly added EndpointSlice mirroring controller.

fristonio added a commit that referenced this issue Aug 27, 2020
Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com>
borkmann pushed a commit that referenced this issue Aug 27, 2020
Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com>
christarazi pushed a commit that referenced this issue Sep 3, 2020
[ upstream commit 326487a ]

Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
joestringer pushed a commit that referenced this issue Sep 3, 2020
[ upstream commit 326487a ]

Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>