
fix: auto-detect system DNS, add forward cache with TTL to resolver #43

Merged
tito merged 1 commit into main from mathieu/dns-resolution-and-cache on Apr 9, 2026


@tito commented Apr 9, 2026

Closes GreyhavenHQ/greywall#52

Problem

Under sustained load, greyproxy was performing a live DNS lookup on every new upstream connection via net.DefaultResolver.LookupIP(). With large payloads creating many concurrent in-flight connections, the system resolver would saturate, causing timeouts and NXDOMAIN failures that cascaded into broken connections.

Additionally, the DNS proxy forwarder had 1.1.1.1:53 hardcoded, which could be rate-limited under benchmark load.

Changes

Auto-detect system DNS upstream

  • Removes the hardcoded 1.1.1.1:53 from greyproxy.yml; the DNS proxy forwarder is now injected at startup from the host's system resolver
  • On Linux/macOS: reads /etc/resolv.conf, with a fallback to /run/systemd/resolve/resolv.conf when only the systemd-resolved stub (127.0.0.53) is present (handles container environments)
  • On Windows: reads from the registry
  • Falls back to 1.1.1.1:53 only when detection fails entirely
  • If the user already has a forwarder configured in their YAML, it is left completely untouched (opt-out)
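A minimal sketch of the Linux/macOS selection order described above, assuming the file contents are passed in as strings so the logic can be exercised without touching the filesystem. The function names (`parseNameservers`, `onlyStub`, `detectUpstreams`) are hypothetical, not greyproxy's actual code, and Windows registry detection is not shown. It mirrors this PR's behaviour, including the swap to the systemd upstream file that the follow-up commit at the end of this thread removes.

```go
package main

import (
	"bufio"
	"net"
	"strings"
)

const (
	systemdStub = "127.0.0.53:53" // systemd-resolved stub listener
	fallbackDNS = "1.1.1.1:53"    // last resort when detection yields nothing
)

// parseNameservers returns the "nameserver" entries of resolv.conf-formatted
// text as host:port strings, in file order. Comments, options and malformed
// lines are skipped.
func parseNameservers(text string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || fields[0] != "nameserver" {
			continue
		}
		ip := net.ParseIP(fields[1])
		if ip == nil {
			continue // not an IP literal
		}
		// JoinHostPort brackets IPv6 literals, e.g. "[::1]:53".
		servers = append(servers, net.JoinHostPort(ip.String(), "53"))
	}
	return servers
}

// onlyStub reports whether every detected server is the systemd-resolved stub.
func onlyStub(servers []string) bool {
	for _, s := range servers {
		if s != systemdStub {
			return false
		}
	}
	return len(servers) > 0
}

// detectUpstreams (hypothetical) applies the order described in this PR:
// /etc/resolv.conf first, the systemd upstream file when only the stub is
// listed, and 1.1.1.1:53 when nothing usable is found.
func detectUpstreams(etcResolv, systemdResolv string) []string {
	servers := parseNameservers(etcResolv)
	if onlyStub(servers) {
		if upstream := parseNameservers(systemdResolv); len(upstream) > 0 {
			servers = upstream
		}
	}
	if len(servers) == 0 {
		return []string{fallbackDNS}
	}
	return servers
}
```

Keeping parsing separate from file I/O makes the stub and fallback cases easy to unit-test with literal strings.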

TTL-aware forward DNS cache in the resolver plugin

  • Switches from net.DefaultResolver.LookupIP() to a direct miekg/dns query against the detected system resolver, which returns the actual record TTL
  • Cache TTL is clamped between 10s (floor, so aggressive CDN records don't defeat the cache) and 5m (ceiling, so providers with TTL=86400 don't serve stale IPs all day)
  • Falls back to net.DefaultResolver with a 30s TTL when the raw query fails (mDNS, split-horizon DNS, etc.)

Static localhost mappings

  • Adds 127.0.0.1 and ::1 as static entries for localhost in hosts-0 so gost never triggers a DNS lookup for localhost under load

Behaviour

| Situation | Result |
|---|---|
| No forwarder in config | System DNS auto-detected and injected |
| Detection fails | Falls back to 1.1.1.1:53 |
| User has a forwarder configured | Left completely alone |
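The three rows of the behaviour table reduce to a small decision function. This is a sketch only: `forwarderConfig` is a hypothetical stand-in for the forwarder section of greyproxy.yml, and the real config type and field names may differ.

```go
package main

// forwarderConfig is a hypothetical stand-in for the forwarder section of
// greyproxy.yml; the real config type and field names may differ.
type forwarderConfig struct {
	Nameservers []string
}

const fallbackDNS = "1.1.1.1:53"

// injectForwarder applies the behaviour table and reports whether it
// changed the config.
func injectForwarder(cfg *forwarderConfig, detected []string) bool {
	if len(cfg.Nameservers) > 0 {
		return false // user-configured forwarder: left completely alone
	}
	if len(detected) == 0 {
		cfg.Nameservers = []string{fallbackDNS} // detection failed
		return true
	}
	cfg.Nameservers = append([]string(nil), detected...) // auto-detected
	return true
}
```

Checking for an existing forwarder first is what makes the behaviour opt-out: any user configuration wins over detection.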

At startup, greyproxy logs two lines confirming which DNS server is active: one for the forwarder and one for the resolver plugin.

- Remove hardcoded 1.1.1.1:53 from greyproxy.yml; DNS proxy forwarder is
  now injected at startup from the host's system resolver. If the user
  already has a forwarder configured it is left untouched. Falls back to
  1.1.1.1:53 only when detection fails.
- Detect system DNS on Linux/macOS via /etc/resolv.conf with a fallback to
  /run/systemd/resolve/resolv.conf when only the systemd-resolved stub
  (127.0.0.53) is present (handles container environments). Windows reads
  from the registry.
- Add TTL-aware forward DNS cache to the Resolver plugin using miekg/dns
  to query the system resolver directly. Cache TTL is clamped between 10s
  and 5m to handle both aggressive CDN records and overly long provider
  TTLs. Falls back to net.DefaultResolver with a 30s TTL when the raw
  query fails.
- Add static localhost mappings (127.0.0.1, ::1) to hosts-0 so gost never
  triggers a DNS lookup for localhost under load.
@tito merged commit a0d7d95 into main on Apr 9, 2026
3 checks passed
@tito deleted the mathieu/dns-resolution-and-cache branch on April 9, 2026 15:17
tito added a commit that referenced this pull request on Apr 14, 2026
## Summary
- Stop bypassing systemd-resolved in `linuxMacDNSServers()`: read
`/etc/resolv.conf` verbatim instead of swapping the stub out for the raw
upstreams in `/run/systemd/resolve/resolv.conf`.
- Fall back to `/run/systemd/resolve/stub-resolv.conf` only when
`/etc/resolv.conf` is absent (minimal images). The raw upstream file is
never read.
- Add `sysdns_test.go` covering the 127.0.0.53 regression,
multi-nameserver order, IPv6 bracketing, malformed lines, and missing
files.

## Why
0.4.1 (#43) replaced 127.0.0.53 with the raw Mullvad upstream from
`/run/systemd/resolve/resolv.conf`, so greyproxy tried plain UDP/53 to
an upstream that systemd-resolved had been reaching over DoT. On hosts
configured with `DNSOverTLS=opportunistic` (matclab's setup in #47) the
direct UDP path is unreachable and every lookup times out. Letting
queries flow through 127.0.0.53 keeps DoT, DNSSEC, and split-DNS intact.

The container concern the original code cited (127.x.x.x being
container-local) doesn't apply: greyproxy runs on the host, and the
sandboxed client connects to it via the host's loopback, so 127.0.0.53
is reachable.

Fixes #47

## Test plan
- [x] `go test ./cmd/greyproxy/ -run TestResolvConf -v`
- [x] `go build ./...`
- [x] `go vet ./cmd/greyproxy/`

Linked issue: dns issue with wsl
