Bug: with ECS, /etc/hosts file in application container pollutes the envoy container as well #471

Closed
cormac-ainc opened this issue Jul 7, 2023 · 1 comment
cormac-ainc commented Jul 7, 2023

Summary
Following the Troubleshooting guide, I appended some lines to /etc/hosts in my application container so that the envoy proxy could pick up requests to those hostnames. It turns out that, in awsvpc networking mode, /etc/hosts is actually a shared bind mount, and so modifying the /etc/hosts file in the application container ALSO modifies it for the envoy proxy container.

This prevents envoy from discovering the correct IP addresses for the backend hosts. Instead, it uses whatever random non-loopback address you put in the hosts file, and returns 404 for every request you make.
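
You can confirm the shared bind mount from inside either container: /etc/hosts shows up as its own mount entry (the exact device and options will vary by host, so treat this output as illustrative):

   $ grep 'etc/hosts' /proc/mounts
   /dev/nvme0n1p1 /etc/hosts ext4 rw,noatime 0 0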

Steps to Reproduce

  1. Make an ECS cluster + service with the enable-execute-command flag set (a CLI sketch follows these steps).
  2. (Optional: Register a virtual service in an App Mesh cluster called backend-service.mesh.local, and spin up a task that registers it there.)
  3. Define an application container + envoy sidecar in the task definition as usual. Tell the envoy sidecar to register in a virtual node that has backend-service.mesh.local as one of its backends. The application container should be:
FROM nginx:alpine
RUN echo "10.10.10.10\tbackend-service.mesh.local" >> /etc/hosts
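
A minimal sketch of step 1's service creation, assuming the cluster and task definition already exist; every name and ID here is a placeholder. Note that execute-command also requires the ssmmessages permissions on the task role.

aws ecs create-service \
  --cluster my-cluster \
  --service-name repro-service \
  --task-definition repro-task:1 \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0abc123],securityGroups=[sg-0abc123]}' \
  --enable-execute-command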

Then

aws ecs execute-command --cluster <cluster> --task <task> --container app --interactive --command '/bin/sh'
   $ cat /etc/hosts
   ...
   10.10.10.10    backend-service.mesh.local
   # that's all as expected.... but...
   
   $ apk add curl
   $ curl http://backend-service.mesh.local -vvv
   ...
   < HTTP/1.1 404 Not Found
   < date: Fri, 07 Jul 2023 05:42:50 GMT
   < server: envoy
   < content-length: 0

   # envoy is looking very confused
   $ curl http://localhost:9901/clusters
   ...
   cds_egress_mesh_service_http_80::10.10.10.10:80::hostname::backend-service.mesh.local
   cds_egress_mesh_service_http_80::10.10.10.10:80::health_flags::/failed_active_hc/active_hc_timeout
   
   # it really is a shared volume mount of some kind...
   $ echo "hello" > /etc/hosts
aws ecs execute-command --cluster <cluster> --task <task> --container envoy --interactive --command '/bin/sh'
   $ cat /etc/hosts
   ...
   10.10.10.10    backend-service.mesh.local
   hello
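
One more check that makes the sharing obvious: stat the file from each container and compare the device:inode pair (the values will differ per host, but should match across the two containers):

   $ stat -c '%d:%i' /etc/hosts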

Are you currently working around this issue?
I am seriously thinking of using a competitor? Idk, this cluster isn't in production yet.

  • The only other options according to the docs are spinning up a DNS resolver, or a Route 53 private hosted zone (sketched after this list)
  • I do not feel like spinning up anything more complicated than a hosts file to serve a dummy IP address. It is a single 32-bit number. Please.
  • I am not convinced that making a private hosted zone won't also affect the envoy container, leading to exactly the same problem, so I almost can't be bothered trying.
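
For reference, this is roughly what the hosted-zone option would look like via the CLI (an untested sketch; the zone ID, VPC ID, and region are placeholders):

aws route53 create-hosted-zone \
  --name mesh.local \
  --caller-reference mesh-local-repro-1 \
  --vpc VPCRegion=ap-southeast-2,VPCId=vpc-0abc123

aws route53 change-resource-record-sets \
  --hosted-zone-id Z0ABC123EXAMPLE \
  --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"backend-service.mesh.local","Type":"A","TTL":300,"ResourceRecords":[{"Value":"10.10.10.10"}]}}]}'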
cormac-ainc (Author) commented
Never mind. I missed a single line of Terraform: setting the user ID of the envoy container to 1337 to match the IgnoredUID in the proxy configuration on the task definition. Without it, envoy itself was also subject to the proxying, so its own outbound traffic was redirected back into itself. Man, that was tough to diagnose.
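
For anyone else who hits this, the relevant Terraform wiring looks roughly like the sketch below. The image tag, ports, and names are placeholders; the only point is that the envoy container's user must match IgnoredUID:

resource "aws_ecs_task_definition" "app" {
  family       = "repro-task"
  network_mode = "awsvpc"
  # ...

  container_definitions = jsonencode([
    {
      name  = "envoy"
      image = "public.ecr.aws/appmesh/aws-appmesh-envoy:v1.27.0.0-prod"
      user  = "1337" # the missing line: must match IgnoredUID below
      # ...
    },
    # ... app container ...
  ])

  proxy_configuration {
    type           = "APPMESH"
    container_name = "envoy"
    properties = {
      AppPorts         = "80"
      IgnoredUID       = "1337"
      ProxyIngressPort = "15000"
      ProxyEgressPort  = "15001"
      EgressIgnoredIPs = "169.254.170.2,169.254.169.254"
    }
  }
}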
