Skip to content

Fix fully-qualified dns name#686

Merged
lhotari merged 3 commits into
apache:masterfrom
smbecker:sb-fix-dnsnames
May 18, 2026
Merged

Fix fully-qualified dns name#686
lhotari merged 3 commits into
apache:masterfrom
smbecker:sb-fix-dnsnames

Conversation

@smbecker
Copy link
Copy Markdown
Contributor

@smbecker smbecker commented May 8, 2026

Motivation

The generated dnsNames on certificates were incorrect for fully-qualified service names. The template emitted *.<release>-<component>.<namespace>.svc.<clusterDomain> for every component, but a wildcard subdomain only makes sense for headless services that resolve individual pod hostnames (pod-name.<headless-service>....). For non-headless services (e.g. proxy, standalone) the cert should match the plain <release>-<component>.<namespace>.svc.<clusterDomain> name, and clients connecting via the fully-qualified service name would otherwise fail hostname verification.

Modifications

  • charts/pulsar/templates/_certs.tpl: emit headless SANs (with wildcard) for components backed by a headless StatefulSet (broker, zookeeper, plus bookkeeper via the existing headless block), and emit the wildcard regular-service SAN only for non-headless components. The plain <release>-<component>.<namespace>.svc.<clusterDomain> SAN is always included so clients can connect via the fully-qualified service name.
  • .ci/helm.sh: update the producer/consumer smoke tests to use the fully-qualified DNS names (pulsar-ci-standalone.<namespace>.svc.cluster.local, pulsar-ci-proxy.<namespace>.svc.cluster.local) so TLS hostname verification exercises the corrected cert SANs.

Verifying this change

  • Make sure that the change passes the CI checks.

@lhotari
Copy link
Copy Markdown
Member

lhotari commented May 8, 2026

The generated dnsNames configuration on certificates is incorrect for fully qualified names. It generates *.\u003Crelease>-\u003Ccomponent>.namespace.svc.local rather than \u003Crelease>-\u003Ccomponent>.namespace.svc.local

I'm assuming that a wildcard certificate should also exist so that TLS hostname verification works when connections are made to a pod in a headless service used for either the broker or zookeeper STS. Please adds CI tests to cover possible gaps.

@smbecker
Copy link
Copy Markdown
Contributor Author

When I tested this locally, the wildcard does not allow me to hit the service using the fully-qualified name. Only removing the wildcard allows the fully-qualified name to work.

@smbecker
Copy link
Copy Markdown
Contributor Author

I updated the ci tests to use the fully qualified name when testing producer/consumer. Is that sufficient?

@lhotari
Copy link
Copy Markdown
Member

lhotari commented May 18, 2026

Thanks for the clarification @smbecker. I dug into the templates and I think the PR is half-right — it correctly drops *. on the regular service line, but the wildcard on the headless service line is still needed. Some details below.

Why the wildcard is still needed on the headless line

TLS wildcards (RFC 6125) match only one DNS label, so *.foo.bar matches x.foo.bar but not foo.bar itself. That's why your test passed once *. was removed from the regular service line — but the same rule means the wildcard on the headless line is covering a different set of names entirely:

  • broker-statefulset.yaml:33 uses serviceName: {{ pulsar.broker.service.headless }}. StatefulSet pods get DNS records of the form <pod>.<release>-broker-headless.<ns>.svc.<clusterDomain> (e.g. pulsar-ci-broker-0.pulsar-ci-broker-headless.default.svc.cluster.local). These pod-level FQDNs are what brokers advertise to each other and to clients/proxies during lookups, so they need to match the SAN.
  • Same story for zookeeper StatefulSet via zookeeper-headless-service.
  • standalone-deployment.yaml:90-91,158 sets hostname: standalone + subdomain: {{ pulsar.standalone.service.headless }} and runs --advertised-address "$(hostname -f)", which expands to standalone.<release>-standalone-headless.<ns>.svc.<clusterDomain> — again a pod-level FQDN under the headless service.

Without *.<release>-<component>-headless.<ns>.svc.<clusterDomain> in the SAN, TLS hostname verification will fail for any client (or peer broker) that uses the advertised/pod FQDN. The local test in this PR connects to the regular service FQDN, which doesn't exercise this path — that's why it looks OK locally.

Suggested change

{{- if or (eq .componentConfig.component "broker") (eq .componentConfig.component "zookeeper") }}
- {{ printf "*.%s-%s-headless.%s.svc.%s" ... | quote }}   # keep — pod FQDNs
- {{ printf "%s-%s-headless.%s.svc.%s" ... | quote }}     # optional — bare headless name
{{- end }}
- {{ printf "%s-%s.%s.svc.%s" ... | quote }}              # PR change is correct here
- {{ printf "%s-%s" ... | quote }}

The wildcard for the regular service was indeed useless (DNS for x.<regular-service>.<ns>... doesn't resolve) and your PR's change there is good — just please keep the wildcard on the headless line.

CI gap

The current ci::test_pulsar_producer_consumer only verifies client → proxy/standalone connectivity. To catch this regression we'd want a test that exercises a pod-level FQDN under a headless service with TLS — e.g. a broker → broker or proxy → broker lookup that ends up using <pod>.<release>-broker-headless..., or a direct TLS connect from the toolset to a broker pod via its headless FQDN. Could you add something along those lines?

Comment thread charts/pulsar/templates/_certs.tpl
Comment thread charts/pulsar/templates/_certs.tpl
…s services used for BK, ZK and broker statefulset pods
@lhotari lhotari merged commit 6b7ab04 into apache:master May 18, 2026
76 of 78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants