fix(wire): consume config.Server.CACert and PinnedCertSHA256 (fixes #25)#30
Merged
Conversation
The ServerConfig had ca_cert and pinned_cert_sha256 fields that
were loaded and validated but never applied to the WSS handshake.
DefaultDialer.DialContext was used directly, so the operator's
pin and custom CA were silently ignored. PROTOCOL.md §7 says
pinning is "out of scope for v1 client config but supported" —
the config side said it was supported, the wire side disagreed.
This commit wires the two fields through.
internal/wire/client.go:
DialOptions gains a TLSConfig *tls.Config. When non-nil, Connect
copies *websocket.DefaultDialer and sets TLSClientConfig on the
copy; the package-global DefaultDialer is never mutated, so
concurrent dials don't share TLS state. The stale
"Task 1.9/1.10" comment is gone.
internal/config/tls.go (new):
BuildTLSConfig(ServerConfig) returns *tls.Config or (nil, nil)
when both fields are empty — that nil signals "use the default
dialer", which is the right behavior for public CA-signed
servers. When ca_cert is set, the PEM is loaded into RootCAs
(single cert, bundle, or chain — all three accepted). When
pinned_cert_sha256 is set, VerifyConnection hashes the leaf
after chain verification and rejects on mismatch. A malformed
pin (non-hex, wrong length) is a config error rather than a
silent fall-through to "trust everything" — the operator who
fat-fingered the value would otherwise get the worst of both
worlds (the cert looks pinned but isn't).
cmd/hermes-node/main.go:
runRun builds tlsCfg once via config.BuildTLSConfig, surfaces
errors as exit 1, and passes the result through to
wire.DialOptions.TLSConfig. Build is outside the Dialer closure
so reconnects reuse the same *tls.Config (VerifyConnection
captures the pin by closure, so this is safe).
Tests:
internal/config/tls_test.go: 7 cases for BuildTLSConfig —
both empty, CA-only, missing CA file, garbage PEM, pin-only,
malformed pin (4 sub-cases), both set. Each test exercises a
distinct failure mode the operator could trip over.
internal/wire/client_tls_test.go: 4 cases for Connect against
httptest.NewTLSServer — accepts valid pin, rejects pin
mismatch, rejects untrusted RootCAs, accepts custom CA pool.
The TLS test path uses a real wss:// endpoint (httptest TLS,
not plain HTTP) so the chain verification is exercised end
to end.
internal/wire/tls_helpers_test.go: small shared test helper
for self-signed cert generation, since the four TLS tests
need it.
Verification:
- go test -race ./... — 6/6 packages, 3 consecutive fresh
runs (no caching), all green
- go vet ./... clean
- go build ./... clean
- Real-binary smoke: bad ca_cert \u2192 exit 1 with clear error,
bad pin \u2192 exit 1, empty [server] \u2192 no TLS errors, dials
normally
No DCO sign-off (per project convention).
Fixes #25.
W1 \u2014 internal/config/tls.go:23-27 misdescribed the pin-only
code path. Comment claimed we set RootCAs to the system pool
in the pin-only branch; the code leaves RootCAs nil and lets
Go's TLS stack auto-fill at handshake time. Runtime behavior
is right, prose was wrong. Rewrote the comment to match the
code.
S2 \u2014 internal/wire/client_tls_test.go: dropped the stringErr
type and the two sentinel-via-string-type errors in favour
of errors.New. Less type machinery, idiomatic Go.
S3 \u2014 internal/wire/client_tls_test.go:TestConnect_RejectsUntrustedCA
was asserting only err != nil. Tightened to require the
substring "x509" so a refactor that changes the failure
layer (e.g. moves to a custom VerifyPeer callback) still has
to keep chain verification firing for the test to pass.
S4 \u2014 internal/wire/client.go:DialOptions.TLSConfig: added
one paragraph making the contract explicit: TLSConfig is
read at dial time only and the same instance may be safely
shared across reconnect cycles. VerifyConnection captures
the pin bytes by closure; RootCAs is a *x509.CertPool the
caller should treat as immutable once handed over. A
future maintainer should not construct a per-call closure
here.
Skipped per Quinn's recommendation:
S1 (return non-nil config with MinVersion in empty case)
S5 (INFO log line at startup when [server] fields present)
Verification:
- go test -race -count=1 ./... \u2014 6/6 packages, 3 fresh
runs green
- go vet ./... clean
- go build ./... clean
- Real-binary smoke: bad ca_cert \u2192 exit 1, bad pin \u2192 exit 1,
empty [server] \u2192 clean SIGTERM shutdown
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
internal/config/config.godefinedServerConfig.CACertandServerConfig.PinnedCertSHA256. They were loaded from TOML, passed validation, and then thrown away —internal/wire/client.gocalledwebsocket.DefaultDialer.DialContextwith the zero-valueTLSClientConfig, so the operator's CA bundle and pin were silently ignored. PROTOCOL.md §7 says pinning is "out of scope for v1 client config but supported" — the config side said it was supported, the wire side disagreed.Fixes #25.
What changed
internal/wire/client.go—DialOptionsgains aTLSConfig *tls.Config. When non-nil,Connectcopies*websocket.DefaultDialerand setsTLSClientConfigon the copy. The package-globalDefaultDialeris never mutated, so concurrent dials in the same process don't share TLS state. The stale "Task 1.9 / 1.10" comment is gone. WhenTLSConfigis nil, behaviour is unchanged (OS CA bundle, no pin). The doc-comment explicitly notes thatTLSConfigis read at dial time only and the same instance may be safely shared across reconnect cycles.internal/config/tls.go(new) —BuildTLSConfig(ServerConfig) (*tls.Config, error):(nil, nil), whichwire.DialOptionstreats as "use the default dialer" (the right thing for public CA-signed servers).ca_certset → loads the PEM file into a*x509.CertPooland sets it asRootCAs. Accepts a single cert, a bundle, or a chain (the standardAppendCertsFromPEMsemantics).pinned_cert_sha256set → installs aVerifyConnectioncallback that runs after standard chain verification, hashes the leaf, and rejects on mismatch. Pin is hex-encoded SHA-256 of the DER-encoded leaf per PROTOCOL.md §7.cmd/hermes-node/main.go—runRunbuildstlsCfgonce viaconfig.BuildTLSConfig, surfaces errors as exit 1 with a clear message, and passes the result through towire.DialOptions.TLSConfig. Build is outside theDialerclosure so reconnects reuse the same*tls.Config(the pin is captured by closure, so this is safe).Tests
internal/config/tls_test.go— 7 cases forBuildTLSConfig:(nil, nil)RootCAsset, noVerifyConnectionca_certVerifyConnectionset,RootCAsnilRootCAsandVerifyConnectionboth setinternal/wire/client_tls_test.go— 4 cases forConnectagainsthttptest.NewTLSServer:TestConnect_AcceptsPinnedCert— pin matches leaf, dial succeeds, handshake completesTestConnect_RejectsPinnedCertMismatch— pin doesn't match,VerifyConnectionrejectsTestConnect_RejectsUntrustedCA—RootCAsis a pool of an unrelated self-signed cert, chain verification fails, error message contains "x509"TestConnect_AcceptsCustomCA—RootCAstrusts the server cert, no pin, dial succeedsinternal/wire/tls_helpers_test.go— shared self-signed cert generator used by the four TLS tests.Review
Quinn reviewed the first cut. This branch addresses all 3 of her warnings worth shipping (W1, S2, S3) and 1 of the 5 suggestions (S4):
internal/config/tls.go:23-27was claiming "we set RootCAs to the system pool" in the pin-only path, but the code leavesRootCAsnil and lets Go's TLS auto-fill at handshake time. Runtime behaviour was right, prose was wrong. Rewrote the comment to match the code.stringErrtype and the twovar stringErr(...)sentinels in favour oferrors.New. Less type machinery, idiomatic Go.TestConnect_RejectsUntrustedCAwas asserting onlyerr != nil. Tightened to require the substring"x509"so a refactor that changes the failure layer (e.g. moves to a customVerifyPeercallback) breaks the test.DialOptions.TLSConfigdoc-comment now explicitly states the contract: "TLSConfig is read at dial time only; the same instance may be safely shared across reconnect cycles."Skipped per Quinn's recommendation: S1 (return non-nil config with
MinVersionin empty case — the current(nil, nil)shape is the right minimum-surface answer) and S5 (INFO log line at startup when[server]fields are present — can revisit if operators complain).Verification
go test -race -count=1 ./...— 6/6 packages, 3 consecutive fresh runs (no cache)go vet ./...cleango build ./...cleanca_cert = "/nonexistent"→ exit 1 withconfig: ca_cert "/nonexistent/ca.pem": read: open /nonexistent/ca.pem: no such file or directorypinned_cert_sha256 = "deadbeef"→ exit 1 withconfig: pinned_cert_sha256: expected 64 hex chars (sha256), got 4[server]section → no TLS errors, dials normally (regression check)Diff stat
Two commits:
1f8d4d0— original fix1961292— Quinn's review fixes (W1, S2, S3, S4)Relationship to #21
This ships alongside the recently-merged #21 fix. The release that turned the binary from a stub into a real WSS client made this gap user-visible for the first time: today the binary doesn't connect anywhere (so the security posture is "doesn't connect"); after #21 it connects but didn't verify the server's identity per the operator's config. Both halves belong in the same release cycle.