🐞 WithServiceBinding + WithExec fails: dnsmasq addnhosts has IP entries with empty hostnames

### What is the issue?

When a consumer container has any `WithServiceBinding(alias, svc)` and is then evaluated with `WithExec(...).Stdout(ctx)` (or `.Sync(ctx)`), the exec fails immediately at host-alias setup time with:

```
lookup <hash> for hosts file: lookup <hash> on 10.87.0.1:53: no such host
lookup <hash>.<session>.dagger.local on 10.87.0.1:53: no such host
```

This happens **before the consumer's command runs at all** — the consumer's command can be a harmless `echo`, never referencing the bound alias, and it still fails. Plain `WithExec` (no service binding) works. Starting the service in isolation (`Service.Start(ctx)` + `Service.Hostname(ctx)` + `Service.Endpoint(ctx)`) also works.

Direct inspection of the engine's dnsmasq hosts file while the service is running shows the bug's mechanism:

```sh
$ docker exec dagger-engine-v0.20.8 cat /var/run/containers/cni/dnsname/dagger/addnhosts
10.87.0.2	
10.87.0.3	
...
10.87.0.17	
```

Every entry has an IP but an **empty hostname**. The expected line for the service container (e.g. `10.87.0.18 <hash>`) is never written. So when `engine/buildkit/executor_spec.go` does `net.LookupIP(<hash>)` to populate the consumer's `/etc/hosts` (around line 283), dnsmasq has no record and returns NXDOMAIN.
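To check for this condition without reading the file by eye, the dump can be scanned for lines that carry an IP and nothing else. The following is a minimal sketch under stated assumptions: `emptyHostEntries` is a hypothetical helper, not part of Dagger or dnsmasq, and the sample content mirrors the output above, with one made-up, well-formed entry (`abc123`) added for contrast.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// emptyHostEntries returns the IPs of hosts-file lines that have an address
// but no hostname after it. Hypothetical helper for illustration only.
func emptyHostEntries(content string) []string {
	var ips []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		// Fields splits on any whitespace, so "10.87.0.2\t" yields one field.
		fields := strings.Fields(sc.Text())
		if len(fields) == 1 {
			ips = append(ips, fields[0])
		}
	}
	return ips
}

func main() {
	// Sample shaped like the observed addnhosts file; the last line is an
	// assumed example of what a correctly registered service would look like.
	sample := "10.87.0.2\t\n10.87.0.3\t\n10.87.0.18\tabc123\n"
	fmt.Println(emptyHostEntries(sample))
}
```

Piping the `docker exec ... cat` output into a check like this would flag every entry in the file shown above.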

Looks related to #6951 (closed; same error string) and #13060 (open; different trigger). This reproduction does not use `PRIVATE` cache mounts or modules, so it's not #13060 specifically.

### Dagger version

`dagger v0.20.8 (image://registry.dagger.io/engine:v0.20.8) linux/amd64`

Also reproduces on `v0.18.19`, where the failure was a *silent, indefinite hang* rather than a fast error; the upgrade made it diagnosable.

### Steps to reproduce

Minimal Go test (`go 1.26`, `dagger.io/dagger v0.20.8`):

```go
//go:build daggerE2ETest

package minrepro

import (
	"context"
	"testing"
	"time"

	"dagger.io/dagger"
	"github.com/stretchr/testify/require"
)

func TestServiceBindingDNS(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	dag, err := dagger.Connect(ctx, dagger.WithLogOutput(testWriter{t}))
	require.NoError(t, err)
	t.Cleanup(func() { _ = dag.Close() })

	svc := dag.Container().
		From("alpine:3.20").
		WithExposedPort(8888).
		AsService(dagger.ContainerAsServiceOpts{
			Args: []string{"sh", "-c", "echo serving; nc -lk -p 8888 -s 0.0.0.0"},
		})

	// Consumer binds the alias but its exec never references it.
	out, err := dag.Container().
		From("alpine:3.20").
		WithServiceBinding("served", svc).
		WithExec([]string{"sh", "-c", "echo 'consumer running'; cat /etc/hosts"}).
		Stdout(ctx)
	require.NoError(t, err, "bound-but-unused consumer must succeed")
	t.Logf("stdout:\n%s", out)
}

type testWriter struct{ t *testing.T }

func (w testWriter) Write(p []byte) (int, error) {
	w.t.Logf("[dagger] %s", string(p))
	return len(p), nil
}
```

Run with `go test -tags daggerE2ETest -count=1 -v -run TestServiceBindingDNS .`.

### What we ruled out

- **Test design / runtime DNS use**: consumer's command never resolves the alias; the failure is at `WithExec` network setup time, not inside the container.
- **Stale state**: failure is identical on a fresh `docker rm -f` engine container.
- **Contention**: failure is identical on a near-idle host (loadavg ~5) and a loaded one.
- **IPv4-only listener / readiness probe mismatch**: explicit `nc -s 0.0.0.0` (IPv4 wildcard) makes no difference.
- **Cache key drift (#13060 trigger)**: this repro uses no `PRIVATE` cache mounts.
- **`Start(ctx)` anchoring**: pre-anchoring the service identity via `svc.Start(ctx)` before binding to a consumer makes no difference.
- **Custom hostname**: `Service.WithHostname("named")` makes no difference — same error with the chosen name in place of the random hash.

### Workaround that works

Calling `dag.Host().Tunnel(svc).Start(ctx)` and then dialing the tunnel endpoint from the host process succeeds in ~1s on v0.20.8. (Tunnel was reportedly broken on v0.18.19 in similar environments.) This sidesteps the `WithServiceBinding` consumer setup path entirely.

### Where it appears to break (engine side, code references)

- Lookup site: `engine/buildkit/executor_spec.go:283` — `net.LookupIP(qualified)` returns NXDOMAIN because dnsmasq doesn't know the service hostname.
- Registration site: `cmd/dnsname/files.go:14-35` writes `IP\tpodname\t…` to addnhosts. With `podname == ""`, the empty-hostname entries we observe are produced.
- CNI args wiring: `internal/buildkit/util/network/cniprovider/cni.go:269-277` — `K8S_POD_NAME` is only passed when the namespace is created with `hostname != ""`. Pool slots (created via `pool.provider.newNS(ctx, "")`) intentionally have empty hostnames. **What's unclear is why the *service* container — which should be created via `c.newNS(ctx, hostname)` — also lacks a hostname entry in addnhosts.** Three plausible failure modes:
  1. The service container is being routed through the pool path instead of the named-hostname path.
  2. `K8S_POD_NAME` *is* passed but lost between Dagger and the dnsname plugin on this host (libcni / kernel version interaction).
  3. The plugin runs but silently fails to write the hostname for hostnamed containers.
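The chain described above can be modeled in a few lines to show how an absent `K8S_POD_NAME` ends up as an IP followed by a bare tab. This is a simplified sketch, not the engine's or the plugin's actual code: `cniArgs` and `addnhostsLine` are hypothetical names, and the record is reduced to `IP\tpodname` without the trailing fields.

```go
package main

import "fmt"

// cniArgs models the described wiring: K8S_POD_NAME is only set when the
// namespace was created with a non-empty hostname. Hypothetical function.
func cniArgs(hostname string) map[string]string {
	args := map[string]string{}
	if hostname != "" {
		args["K8S_POD_NAME"] = hostname
	}
	return args
}

// addnhostsLine models the described record format, reduced to IP<TAB>podname.
// A missing K8S_POD_NAME reads as "" from the map, giving an empty hostname.
func addnhostsLine(ip string, args map[string]string) string {
	return ip + "\t" + args["K8S_POD_NAME"]
}

func main() {
	// Pool slot: no hostname, so the line ends right after the tab.
	fmt.Printf("%q\n", addnhostsLine("10.87.0.2", cniArgs("")))
	// Named namespace: the hostname (made-up value here) follows the tab.
	fmt.Printf("%q\n", addnhostsLine("10.87.0.18", cniArgs("abc123")))
}
```

Under this model, any of the three failure modes listed above would leave the service's line indistinguishable from a pool slot's.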

I don't have an engine dev-loop set up to instrument further, but I'm happy to test patches.

### Log output

```
=== RUN   TestServiceBindingDNS
    ✘ withExec sh -c 'echo …' ERROR
    ! lookup <hash> for hosts file: lookup <hash> on 10.87.0.1:53: no such host
      lookup <hash>.<session>.dagger.local on 10.87.0.1:53: no such host
--- FAIL: TestServiceBindingDNS (0.9s)
```
