Regression: KubernetesEndpointResolver doesn’t work #3369

Closed
eddy-curv opened this issue Apr 28, 2021 · 4 comments
Labels: t:bug (Something isn't working)

eddy-curv commented Apr 28, 2021

Describe the bug
I’ve been trying to enable the k8s endpoint resolver in order to use more advanced load balancing. Despite precisely following the documentation and the official helm charts, I still can’t get it to work.

To Reproduce
Steps to reproduce the behavior:

  1. Use Ambassador OSS 1.13.1 (single namespace) with configuration from k8s service annotations.
  2. Define RBAC according to the official helm chart (single namespace).
  3. Don't create the CRDs or their related RBAC.
  4. Define a KubernetesEndpointResolver.
  5. Use the resolver in one of the mappings (for example, with the round_robin load_balancer policy) that points to a NodePort k8s service.
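
For readers unfamiliar with annotation-based configuration, the steps above might look roughly like the following sketch. It assumes the `getambassador.io/config` annotation is used on the Service; the selector, labels, and target port are hypothetical placeholders, and the Mapping is trimmed to the keys relevant to the resolver (the full Mapping from the report appears further down).

```yaml
# Illustrative sketch only: selector, labels, and targetPort are assumptions,
# not taken from the reporter's cluster.
apiVersion: v1
kind: Service
metadata:
  name: notification-http
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: getambassador.io/v2
      kind: KubernetesEndpointResolver
      name: endpoint
      ---
      apiVersion: getambassador.io/v2
      kind: Mapping
      name: notification-server
      prefix: /api/notifications/
      service: notification-http:80
      # Refers to the resolver defined above by name.
      resolver: endpoint
      load_balancer:
        policy: round_robin
spec:
  type: NodePort
  selector:
    app: notification-server   # hypothetical pod label
  ports:
    - name: http
      port: 80
      targetPort: 8080         # hypothetical container port
```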

Expected behavior
I would expect endpoint-based resolving to work.

Versions (please complete the following information):

  • Ambassador OSS 1.13.1
  • GKE 1.17

Additional context

Some details

  • I use single namespace configuration (AMBASSADOR_SINGLE_NAMESPACE="true")
  • Ambassador is configured using K8s service annotations
  • Architecture: Ingress(GCE)->Ambassador->services(nodePort)
  • In the Ambassador logs I saw "no healthy host for HTTP connection pool" for services that use the resolver.
  • I defined RBAC in a single-namespace scope (using the official helm chart for reference).
  • I didn’t create CRDs, since k8s service annotations are used for configuration.
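
As a point of comparison for the RBAC detail above, a namespace-scoped Role that lets Ambassador watch Endpoints could be sketched as follows. This is not the helm chart's exact rule set; the namespace, object names, and ServiceAccount name are hypothetical, and the chart grants additional resources beyond what is shown here.

```yaml
# Illustrative sketch only: names and namespace are placeholders, and the
# rule list is a minimal assumption, not a copy of the official chart.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ambassador
  namespace: my-namespace
rules:
  - apiGroups: [""]
    # "endpoints" is the resource the endpoint resolver needs to read.
    resources: ["services", "endpoints", "secrets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ambassador
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ambassador
subjects:
  - kind: ServiceAccount
    name: ambassador
    namespace: my-namespace
```

If the Role were missing the `endpoints` resource, one would normally expect permission errors in the logs, which the report says were absent.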

What I tried to do

  • Although I’m running in a single-namespace configuration (the namespace is automatically prefixed to generated Envoy clusters), I tried adding the namespace as a suffix (.${AMBASSADOR_NAMESPACE}) to the service name, as suggested by @cindy Mullins. It didn’t help.
  • Deleting the port from the service (i.e. service: notification-http:80 -> service: notification-http)
  • Adding the namespace to the KubernetesEndpointResolver definition
  • Changing the k8s service from NodePort to ClusterIP

Attached are

  • yaml spec of one mapping which uses the resolver
  • useful screenshots from Ambassador diagnostic UI
  • Ambassador module configuration
  • TLSContext is omitted for brevity

Needless to say, when I delete the resolver and load_balancer keys from the mappings, everything works as before, so I know that the rest of my configuration is correct and that the issue is specific to KubernetesEndpointResolver.
My guess is that the misconfiguration is related to namespaces, because there are no permission-related error logs in the a8r container.

---
apiVersion: getambassador.io/v2
kind: Mapping
name: notification-server
prefix: /api/notifications/
precedence: 10
service: notification-http:80
rewrite: ""
timeout_ms: 10000
resolver: endpoint
load_balancer:
  policy: round_robin
labels:
  ambassador:
    - header_request_label:
        - pathheader:
            header: ":path"
        - headerkey:
            header: ":method"
    - authorization_request_label:
        - authorizationheader:
            header: "Authorization"
            omit_if_not_present: true
    - proxy_authenticate_request_label:
        - proxyauthenticateheader:
            header: "Proxy-Authenticate"
            omit_if_not_present: true
remove_response_headers:
  - x-envoy-upstream-service-time
  - x-envoy-upstream-healthchecked-cluster
  - x-envoy-decorator-operation
---
apiVersion: getambassador.io/v2
kind: Module
name: ambassador
config:
  service_port: 443
  diagnostics:
    enabled: true
  liveness_probe:
    enabled: true
  readiness_probe:
    enabled: true
  use_ambassador_namespace_for_service_resolution: true
  preserve_external_request_id: true
  default_label_domain: ambassador
  default_labels:
    ambassador:
      defaults:
        - remote_address
  # allow retry + json log outputs
  envoy_log_type: json
  enable_http10: true
---
apiVersion: getambassador.io/v2
kind: KubernetesEndpointResolver
name: endpoint

[Four screenshots from the Ambassador diagnostic UI, taken 2021-04-28]

Also this is a log from Ambassador when a request is made to another service that uses the resolver:

{"bytes_sent":"84","upstream_cluster":"cluster_ui_http_80_c11726v18_er_round_robin","downstream_remote_address":"194.5.53.169:0","path":"/","authority":"c11726v18.dev.curv.co","protocol":"HTTP/2","upstream_service_time":"-","upstream_local_address":"-","duration":"3002","downstream_local_address":"10.44.11.71:443","upstream_transport_failure_reason":"-","dd.trace_id":"7099345945080560459","response_code":"503","response_flags":"LR","requested_server_name":"-","bytes_received":"0","istio_policy_status":"-","dd.span_id":"7099345945080560459","user_agent":"curl/7.64.1","start_time":"2021-04-28T14:44:57.925Z","method":"GET","request_id":"5ef335a7-ec34-985d-97a1-6c46ddc6ad52","upstream_host":"10.44.17.68:80","x_forwarded_for":"194.5.53.169, 35.244.148.53,10.142.0.24"}

eddy-curv commented Apr 28, 2021

I’ve noticed the following behaviour.

  1. deleting/restarting the pods exposed by the k8s service in the mapping which uses the endpoints resolver
  2. deleting/restarting Ambassador pods

Now Ambassador will discover the endpoints and route traffic to the relevant pods.
I don’t understand why it works like that. Shouldn’t endpoint discovery be automatic? Our deployments are upgraded quite frequently, and I don’t see a reason to restart Ambassador after each deployment.

khussey changed the title from "KubernetesEndpointResolver doesn’t work" to "Regression: KubernetesEndpointResolver doesn’t work" on Apr 28, 2021
khussey added the t:bug (Something isn't working) label on Apr 28, 2021
khussey added this to the 2021 Cycle 3 Cool-down milestone on Apr 28, 2021
eddy-curv commented

Hi, @khussey
I've noticed that you assigned the ticket to a developer. So can you confirm that this is indeed a bug on Ambassador's side?
Thanks,

kflynn commented Apr 29, 2021

@eddy-curv Hey! We can confirm it’s a bug, we have a fix thanks to @acookin’s fast work, and we expect it to be in your hands very soon. 🙂

khussey commented Apr 30, 2021

This is fixed in the 1.13.2 release, which is now available.

khussey closed this as completed on Apr 30, 2021