Gateway API on EKS not working as expected #27493

Closed
Smana opened this issue Aug 14, 2023 · 10 comments
Labels
feature/k8s-gateway-api · kind/bug · kind/community-report · sig/agent · sig/datapath

Comments

@Smana
Contributor

Smana commented Aug 14, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I'm not able to configure the gateway.

My Cilium configuration seems ok:

cilium-cli config view | grep -w "enable-gateway-api"
enable-gateway-api                                true
enable-gateway-api-secrets-sync                   true

I followed this guide. When I create the Gateway along with the HTTPRoute, a LoadBalancer Service is created:

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/HEAD/examples/kubernetes/gateway/basic-http.yaml
gateway.gateway.networking.k8s.io/my-gateway created
httproute.gateway.networking.k8s.io/http-app-1 created

kubectl get svc
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)        AGE
cilium-gateway-my-gateway   LoadBalancer   172.20.100.22    k8s-default-ciliumga-fa85db4e7d-83a59527c5457ff3.elb.eu-west-3.amazonaws.com   80:30311/TCP   4s

The HTTP route seems to be properly configured:

kubectl describe httproutes.gateway.networking.k8s.io http-app-1 | grep -A20 'Status:'
Status:
  Parents:
    Conditions:
      Last Transition Time:  2023-08-15T08:32:55Z
      Message:               Service reference is valid
      Observed Generation:   1
      Reason:                ResolvedRefs
      Status:                True
      Type:                  ResolvedRefs
      Last Transition Time:  2023-08-15T08:32:55Z
      Message:               Accepted HTTPRoute
      Observed Generation:   1
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
    Controller Name:         io.cilium/gateway-controller
    Parent Ref:
      Group:      gateway.networking.k8s.io
      Kind:       Gateway
      Name:       my-gateway
      Namespace:  default
Events:           <none>

However, the Gateway never becomes ready:

Status:
  Conditions:
    Last Transition Time:  2023-08-14T16:22:53Z
    Message:               Unable to create Service resource
    Observed Generation:   1
    Reason:                NoResources
    Status:                False
    Type:                  Accepted
    Last Transition Time:  2023-08-14T16:22:53Z
    Message:               Address is not ready
    Observed Generation:   1
    Reason:                ListenersNotReady
    Status:                False
    Type:                  Programmed
  Listeners:
    Attached Routes:  1
    Conditions:
      Last Transition Time:  2023-08-14T16:22:53Z
      Message:               Listener Programmed
      Observed Generation:   1
      Reason:                Programmed
      Status:                True
      Type:                  Programmed
      Last Transition Time:  2023-08-14T16:22:53Z
      Message:               Listener Accepted
      Observed Generation:   1
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
    Name:                    web-gw
    Supported Kinds:
      Group:  gateway.networking.k8s.io
      Kind:   HTTPRoute

The curl request does reach Envoy, but it returns a 404:

curl --fail -s http://15.188.175.188/details/1 -vvv
* processing: http://15.188.175.188/details/1
*   Trying 15.188.175.188:80...
* Connected to 15.188.175.188 (15.188.175.188) port 80
> GET /details/1 HTTP/1.1
> Host: 15.188.175.188
> User-Agent: curl/8.2.1
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< date: Mon, 14 Aug 2023 16:36:18 GMT
< server: envoy
< content-length: 0
* The requested URL returned error: 404
* Closing connection

Am I missing something on EKS? Note that I'm using a Kyverno workaround to set the Service annotations properly.

Cilium Version

1.14.0

Kernel Version

I can't SSH into the instance right now, but it's running the latest Bottlerocket AMI. I'll try enabling SSM.

Kubernetes Version

1.27

Sysdump

cilium-sysdump-20230814-183832.zip

Relevant log output

No response

Anything else?

I'm writing a blog post about Cilium with Gateway API on EKS. My code is here if you want to look at it.

Code of Conduct

  • I agree to follow this project's Code of Conduct
Smana added the kind/bug, kind/community-report and needs/triage labels Aug 14, 2023
@Smana
Contributor Author

Smana commented Aug 15, 2023

I just tried again with Amazon Linux instances (kernel version: 5.10.184-175.749.amzn2.x86_64) and got exactly the same behavior: the Gateway is not properly configured, and its status shows Programmed=False.

ti-mo added the sig/datapath, sig/agent and feature/k8s-gateway-api labels and removed the needs/triage label Aug 16, 2023
@sergeyshevch
Contributor

sergeyshevch commented Aug 17, 2023

Same behaviour here; I created a duplicate issue for this. Here are the resources and logs:

Gateway with status

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: rest-external
spec:
  gatewayClassName: cilium
  listeners:
    - allowedRoutes:
        namespaces:
          from: All
      hostname: '*.myhost'
      name: web-gw
      port: 443
      protocol: HTTP
status:
  conditions:
    - lastTransitionTime: "2023-08-17T09:19:21Z"
      message: Unable to create Service resource
      observedGeneration: 2
      reason: NoResources
      status: "False"
      type: Accepted
    - lastTransitionTime: "2023-08-17T09:19:21Z"
      message: Address is not ready
      observedGeneration: 2
      reason: ListenersNotReady
      status: "False"
      type: Programmed
  listeners:
    - attachedRoutes: 1
      conditions:
        - lastTransitionTime: "2023-08-17T09:19:21Z"
          message: Listener Programmed
          observedGeneration: 2
          reason: Programmed
          status: "True"
          type: Programmed
        - lastTransitionTime: "2023-08-17T09:19:21Z"
          message: Listener Accepted
          observedGeneration: 2
          reason: Accepted
          status: "True"
          type: Accepted
      name: web-gw
      supportedKinds:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute

Service

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
  finalizers:
    - service.k8s.aws/resources
  labels:
    io.cilium.gateway/owning-gateway: rest-external
  name: cilium-gateway-rest-external
  namespace: gateway-api
  ownerReferences:
    - apiVersion: gateway.networking.k8s.io/v1beta1
      kind: Gateway
      name: rest-external
      uid: 66cbf2c8-67ff-44cd-aa66-862c922d8114
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.4.59.0
  clusterIPs:
    - 10.4.59.0
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  loadBalancerClass: service.k8s.aws/nlb
  ports:
    - name: port-443
      nodePort: 32762
      port: 443
      protocol: TCP
      targetPort: 443
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
      - hostname: actualhostname-1234567890.us-east-1.elb.amazonaws.com

And some logs from the cilium-operator:

level=error msg="Unable to create Service" controller=gateway error="Service \"cilium-gateway-rest-external\" is invalid: spec.loadBalancerClass: Invalid value: \"null\": may not change once set" resource=gateway-api/rest-external subsys=gateway-controller

@sergeyshevch
Contributor

@Smana I found another workaround for this issue. You need to slightly tune the policy you're using (AFAIK also written by me):

      mutate:
        patchStrategicMerge:
          metadata:
            annotations:
              service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
              service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
              service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
          spec:
            loadBalancerClass: service.k8s.aws/nlb # This fixes the issue
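For context, a complete Kyverno ClusterPolicy built around this patch might look like the sketch below. The policy and rule names are hypothetical, and matching on the `io.cilium.gateway/owning-gateway` label (which Cilium sets on the Services it creates for Gateways, as seen in the manifests in this thread) is an assumption about how the match was scoped; the thread only shows the `mutate` section. Kyverno mutate rules apply to UPDATE as well as CREATE admission requests by default, so re-adding `loadBalancerClass` presumably keeps the cilium-operator's later updates, which omit the field, from hitting the "may not change once set" validation error.

```yaml
# Sketch only: names are hypothetical; the mutate block mirrors the snippet above.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: mutate-cilium-gateway-services   # hypothetical name
spec:
  rules:
    - name: nlb-annotations-and-class    # hypothetical name
      match:
        any:
          - resources:
              kinds:
                - Service
              # Restrict the mutation to Services created by Cilium for a Gateway
              selector:
                matchExpressions:
                  - key: io.cilium.gateway/owning-gateway
                    operator: Exists
      mutate:
        patchStrategicMerge:
          metadata:
            annotations:
              service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
              service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
          spec:
            # Set on matching create and update requests, so updates that omit it keep the value
            loadBalancerClass: service.k8s.aws/nlb
```

The policy has to exist before the Gateway's Service is created; otherwise, recreate the Gateway (or just its Service) after applying it.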

@Smana
Contributor Author

Smana commented Aug 17, 2023

@sergeyshevch Cool! I'm going to give it a try right now. Just curious: how did you find this option? I've never heard of it.

Smana closed this as completed Aug 17, 2023
Smana reopened this Aug 17, 2023
@Smana
Contributor Author

Smana commented Aug 17, 2023

Well, it doesn't seem to work on my side even though I defined the loadBalancerClass.

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: echo-mycluster-0.cloud.ogenki.io
    policies.kyverno.io/last-applied-patches: |
      mutate-svc-annotations.mutate-cilium-gateway-echo-gateway.kyverno.io: added /metadata/annotations
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
  labels:
    io.cilium.gateway/owning-gateway: echo-gateway
  name: cilium-gateway-echo-gateway
  namespace: echo
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.20.58.62
  clusterIPs:
  - 172.20.58.62
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  loadBalancerClass: service.k8s.aws/nlb
  ports:
  - name: port-80
    nodePort: 30131
    port: 80
  type: LoadBalancer

I have this error in the cilium-operator logs:

cilium-operator-76c4f97d54-mtlbg cilium-operator level=error msg="Unable to create Service" controller=gateway error="Service \"cilium-gateway-echo-gateway\" is invalid: spec.loadBalancerClass: Invalid value: \"null\": may not change once set" resource=echo/echo-gateway subsys=gateway-controller

@Smana
Contributor Author

Smana commented Aug 20, 2023

I managed to get it working, but only with a Classic LB. I had to configure the AWS Load Balancer Controller accordingly, with the option enableServiceMutatorWebhook set to false.
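If the controller is installed from the aws-load-balancer-controller Helm chart, this option is a top-level chart value. A minimal values fragment, assuming a Helm-based install (`my-cluster` is a placeholder):

```yaml
# values.yaml for the aws-load-balancer-controller Helm chart (sketch)
clusterName: my-cluster   # placeholder: your EKS cluster name
# Stop the controller's webhook from defaulting spec.loadBalancerClass
# to service.k8s.aws/nlb on newly created LoadBalancer Services
enableServiceMutatorWebhook: false
```

With the webhook disabled, the Service keeps a null `loadBalancerClass`, so cilium-operator updates no longer conflict with it; without the class or other controller-specific annotations, the Service is presumably picked up by the in-tree AWS cloud provider, which would explain why a Classic LB is provisioned.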

kubectl get svc -n echo cilium-gateway-echo-gateway -o yaml | head -n 10
apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: echo-mycluster-0.cloud.ogenki.io
    policies.kyverno.io/last-applied-patches: |
      mutate-svc-annotations.mutate-cilium-gateway-echo-gateway.kyverno.io: added /metadata/annotations
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
...
GATEWAY=$(kubectl get gateway echo-gateway -n echo -o jsonpath='{.status.addresses[0].value}')

curl http://$GATEWAY |jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2491  100  2491    0     0   101k      0 --:--:-- --:--:-- --:--:--  105k
{
  "host": {
    "hostname": "aa4ae59f99b1b4d20b74f35a06aa7ab7-228497968.eu-west-3.elb.amazonaws.com",
    "ip": "::ffff:10.0.15.188",
    "ips": []
  },
  "http": {
    "method": "GET",
    "baseUrl": "",
    "originalUrl": "/",
...

I'm not sure what the recommendation is here. Since Classic LBs are deprecated, I guess I'll have to wait for this issue to be addressed.

Note: It also works fine with the https example.

@sergeyshevch
Contributor

@Smana It works with my workaround policy, but you need to apply the policy first and then recreate the Gateway (or just the underlying Service).

@Smana
Contributor Author

Smana commented Sep 10, 2023

@sergeyshevch Thank you! I just tried one more time and it does indeed work fine.

@lucidprogrammer

Did anyone manage to get an HTTP-to-HTTPS redirect working with the configuration mentioned in this issue? I tried a redirect HTTPRoute and I'm getting an infinite redirect loop. Any ideas?

@Smana
Contributor Author

Smana commented Dec 13, 2023

@lucidprogrammer, I think you're facing this issue: kubernetes-sigs/gateway-api#1185
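One possible source of such a loop with the setup shown earlier in this thread: the NLB terminates TLS (the `aws-load-balancer-ssl-*` annotations) and forwards plain HTTP to a listener declared with `protocol: HTTP` on port 443, so Envoy cannot tell which requests originally arrived over HTTPS. A redirect route attached to every listener would then redirect those requests as well. A sketch that scopes the redirect to a single listener, assuming you add a second plain-HTTP listener named `http` on port 80 to the `rest-external` Gateway and create the route in the Gateway's namespace (the route name and the `http` listener are hypothetical):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http-to-https-redirect   # hypothetical name
spec:
  parentRefs:
    - name: rest-external        # Gateway shown earlier in this thread
      sectionName: http          # hypothetical port-80 listener; web-gw (443) is not targeted
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
```

The application HTTPRoute would correspondingly reference `sectionName: web-gw`, so that only the port-80 listener serves the redirect.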

bdalpe added a commit to bdalpe/cilium that referenced this issue Mar 1, 2024
Related to the comment here: cilium#30038 (comment)

If the Service contains either label indicating that it is managed by Cilium, return all nodes as LoadBalancer IPs, or filter the nodes based on an Annotation applied to the Service `io.cilium.nodeipam/match-labels`.

Otherwise, the original Node IPAM LB behavior is left intact for other, non-Cilium-managed Services.

Services for Gateways still need to have a mutating webhook apply the loadBalancerClass value, as there is not currently a way to do this with the Gateway config. See cilium#27493.

Signed-off-by: Brendan Dalpe <bdalpe@gmail.com>