[HotelReservation] Stale Service IPs accumulate in Consul due to missing Graceful Shutdown (Signal Handling) #367

@jukebox03

I have encountered an issue where stale (zombie) IP addresses accumulate in Consul when running the hotelReservation benchmark on Kubernetes. This leads to connection failures and high CPU usage due to persistent SYN-SENT states.

Observed Behavior: Each service pod retrieves the target service's IP address via grpc.Dial("consul://...") and listens for updates via long-polling. However, when a pod is deleted or restarted (e.g., during scaling or manual restart), the old IP address is not removed from Consul.

As a result, upstream services continue to receive dead IP addresses from Consul and attempt to connect to them. This causes:

  • A large number of connections stuck in the SYN-SENT state.
  • Wasted CPU time from continuous connection retries.
  • Duplicate entries for the same service in the Consul catalog.

Checking the Consul service catalog shows multiple duplicate and unreachable IPs for the same service (e.g., srv-rate).

$ kubectl exec -it netshoot -n hotel-res -- curl -s http://consul:8500/v1/catalog/service/srv-rate | jq '.[].ServiceAddress'
"10.244.0.2"
"10.244.0.108"
"10.244.0.2"  <-- Duplicate/Stale
"10.244.0.108"
"10.244.0.2"
"10.244.0.53"
...
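
To quantify the duplication rather than eyeball the list, the catalog response can be tallied per address. The following is a minimal sketch, not part of the benchmark: it assumes the response is the usual JSON array with `ServiceID` and `ServiceAddress` fields, and the sample payload and service IDs are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// catalogEntry holds the two fields of a Consul catalog service entry
// that matter here; all other fields in the response are ignored.
type catalogEntry struct {
	ServiceID      string
	ServiceAddress string
}

// countAddresses parses a /v1/catalog/service/<name> response body and
// returns how many catalog entries point at each address.
func countAddresses(body []byte) (map[string]int, error) {
	var entries []catalogEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, e := range entries {
		counts[e.ServiceAddress]++
	}
	return counts, nil
}

func main() {
	// Hypothetical sample payload; the IDs are invented for this sketch.
	sample := []byte(`[
		{"ServiceID": "id-a", "ServiceAddress": "10.244.0.2"},
		{"ServiceID": "id-b", "ServiceAddress": "10.244.0.108"},
		{"ServiceID": "id-c", "ServiceAddress": "10.244.0.2"},
		{"ServiceID": "id-d", "ServiceAddress": "10.244.0.53"}
	]`)

	counts, err := countAddresses(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}

	// Sort addresses so the report is stable across runs.
	addrs := make([]string, 0, len(counts))
	for a := range counts {
		addrs = append(addrs, a)
	}
	sort.Strings(addrs)
	for _, a := range addrs {
		fmt.Printf("%s registered %d time(s)\n", a, counts[a])
	}
}
```

Any address that is registered more than once, or that no longer belongs to a running pod, is a candidate stale entry.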

Root Cause: The Go microservices in hotelReservation do not seem to have a signal handler defined for SIGTERM or SIGINT. When Kubernetes terminates a pod, the application exits immediately without calling registry.Deregister(). Consequently, the service ID remains in Consul even after the pod is gone.

Suggested Fix: Implement a signal handler in the main function of each service (e.g., cmd/rate/main.go, cmd/search/main.go, etc.) to intercept syscall.SIGINT and syscall.SIGTERM. The handler should invoke registry.Deregister() to clean up the service entry before the process exits.
