fix: use getent instead of nslookup for redis readiness check #201

Merged
vishnu-narayanan merged 2 commits into main from fix/init-redis-nslookup-hang
Apr 1, 2026

vishnu-narayanan (Member) commented Apr 1, 2026

Summary

Fixes #200
Fixes https://linear.app/chatwoot/issue/INF-65

The init-redis init container in the migration job hangs indefinitely on upgrade to 2.0.19. The root cause is that nslookup in BusyBox v1.37+ (shipped in the chatwoot app image) exits with status 1 when intermediate search domain lookups return NXDOMAIN, even when the final lookup succeeds. With Kubernetes' default ndots:5, any hostname with fewer than five dots is expanded through the search domains, so those NXDOMAIN lookups occur on every attempt, the until loop's condition never succeeds, and the loop never exits.
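To make the ndots interaction concrete, here is a minimal sketch of how a resolver builds its list of candidate names under the resolv.conf `ndots` rule. The `candidates` function, the `chatwoot` namespace, and the `redis.example.com` hostname are hypothetical and only illustrate the expansion order; they are not taken from the chart.

```shell
#!/bin/sh
# Illustrative only: print the names a resolver would try, in order,
# for a given host, ndots value, and search list.
# usage: candidates NAME NDOTS [SEARCH_DOMAIN...]
candidates() {
  cand_name=$1
  cand_ndots=$2
  shift 2
  # A trailing dot marks the name as absolute: no search domains apply.
  case $cand_name in
    *.) printf '%s\n' "$cand_name"; return 0 ;;
  esac
  # Count the dots in the name (arithmetic expansion strips wc padding).
  cand_dots=$(($(printf '%s' "$cand_name" | tr -cd '.' | wc -c)))
  if [ "$cand_dots" -ge "$cand_ndots" ]; then
    # Enough dots: try the name as-is first, then the search domains.
    printf '%s.\n' "$cand_name"
    for cand_domain in "$@"; do printf '%s.%s.\n' "$cand_name" "$cand_domain"; done
  else
    # Fewer dots than ndots: search domains first, the bare name last.
    for cand_domain in "$@"; do printf '%s.%s.\n' "$cand_name" "$cand_domain"; done
    printf '%s.\n' "$cand_name"
  fi
}

# Hypothetical pod search list for a namespace called "chatwoot".
candidates redis.example.com 5 \
  chatwoot.svc.cluster.local svc.cluster.local cluster.local
```

For an external hostname like this one, the three search-domain candidates come first and would typically return NXDOMAIN before the bare name is tried, which is the pattern the summary describes as tripping up BusyBox nslookup.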

This replaces nslookup with getent hosts, which uses the system C library resolver. It returns the first successful result and exits 0 regardless of search domain behavior.
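A readiness wait built on getent hosts could look roughly like the sketch below. This is not the chart's actual script: the `retry_until` helper, the attempt limit, the delay, and the `REDIS_HOST` variable are assumptions made for illustration, and the attempt limit in particular is an addition that the original unbounded until loop does not have.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or attempts run out.
# usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  # Only the exit status of the command matters; its output is discarded.
  until "$@" >/dev/null 2>&1; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      return 1
    fi
    retry_attempt=$((retry_attempt + 1))
    sleep "$retry_delay"
  done
  return 0
}

# Example use in an init container (REDIS_HOST is an assumed variable):
#   retry_until 60 3 getent hosts "$REDIS_HOST" || exit 1
```

Bounding the attempts means a resolver problem surfaces as a failed init container rather than a silent hang, at the cost of failing the job if Redis takes longer than the limit to become resolvable.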

Why not the other suggested fixes?

  • Revert to busybox:1.28: This was changed intentionally in 2.0.19 (for INF-61) because orgs with strict image pull policies block external images like busybox. Reverting reintroduces that problem.
  • FQDN with .Release.Namespace.svc.cluster.local: Only works for bundled Redis. The chatwoot.redis.host template can return an external hostname when redis.enabled=false, and appending .namespace.svc.cluster.local to that would break resolution.
  • getent hosts: Works for both bundled and external Redis. Uses the same resolver path as every other application in the container. Available in the chatwoot image (confirmed).

Test Plan

  • Deploy both 2.0.19 (broken) and the fix to separate namespaces on a test cluster

nslookup in BusyBox v1.37+ (shipped in the chatwoot image) exits 1
when intermediate search domain lookups return NXDOMAIN, even if the
final lookup succeeds. This causes the init-redis until loop to hang
indefinitely on clusters with ndots:5 (the Kubernetes default).

getent hosts uses the system C library resolver which returns the first
successful result and exits 0, matching the expected behavior.

Fixes #200
vishnu-narayanan merged commit 67f0a92 into main Apr 1, 2026
1 check passed
vishnu-narayanan deleted the fix/init-redis-nslookup-hang branch April 1, 2026 12:50

Development

Successfully merging this pull request may close these issues.

[Bug] init-redis init container image change in 2.0.19 causes migrate job to hang indefinitely on upgrade