Skip to content

fix(wire): consume config.Server.CACert and PinnedCertSHA256 (fixes #25)#30

Merged
blaspat merged 2 commits into
mainfrom
fix/25-consume-tls-config
Jun 11, 2026
Merged

fix(wire): consume config.Server.CACert and PinnedCertSHA256 (fixes #25)#30
blaspat merged 2 commits into
mainfrom
fix/25-consume-tls-config

Conversation

@blaspat

@blaspat blaspat commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Summary

internal/config/config.go defined ServerConfig.CACert and ServerConfig.PinnedCertSHA256. They were loaded from TOML, passed validation, and then thrown away — internal/wire/client.go called websocket.DefaultDialer.DialContext with the zero-value TLSClientConfig, so the operator's CA bundle and pin were silently ignored. PROTOCOL.md §7 says pinning is "out of scope for v1 client config but supported" — the config side said it was supported, the wire side disagreed.

Fixes #25.

What changed

internal/wire/client.goDialOptions gains a TLSConfig *tls.Config. When non-nil, Connect copies *websocket.DefaultDialer and sets TLSClientConfig on the copy. The package-global DefaultDialer is never mutated, so concurrent dials in the same process don't share TLS state. The stale "Task 1.9 / 1.10" comment is gone. When TLSConfig is nil, behaviour is unchanged (OS CA bundle, no pin). The doc-comment explicitly notes that TLSConfig is read at dial time only and the same instance may be safely shared across reconnect cycles.

internal/config/tls.go (new)BuildTLSConfig(ServerConfig) (*tls.Config, error):

  • Both fields empty → returns (nil, nil), which wire.DialOptions treats as "use the default dialer" (the right thing for public CA-signed servers).
  • ca_cert set → loads the PEM file into a *x509.CertPool and sets it as RootCAs. Accepts a single cert, a bundle, or a chain (the standard AppendCertsFromPEM semantics).
  • pinned_cert_sha256 set → installs a VerifyConnection callback that runs after standard chain verification, hashes the leaf, and rejects on mismatch. Pin is hex-encoded SHA-256 of the DER-encoded leaf per PROTOCOL.md §7.
  • A malformed pin (non-hex, wrong length) is a configuration error rather than a silent fall-through to "trust everything" — the operator who fat-fingered the value would otherwise get the worst of both worlds (the cert looks pinned but isn't).

cmd/hermes-node/main.gorunRun builds tlsCfg once via config.BuildTLSConfig, surfaces errors as exit 1 with a clear message, and passes the result through to wire.DialOptions.TLSConfig. Build is outside the Dialer closure so reconnects reuse the same *tls.Config (the pin is captured by closure, so this is safe).

Tests

internal/config/tls_test.go — 7 cases for BuildTLSConfig:

  • both empty → (nil, nil)
  • CA-only with valid PEM → RootCAs set, no VerifyConnection
  • CA file missing → error mentioning ca_cert
  • CA file is garbage PEM → error mentioning "no certificates"
  • pin-only → VerifyConnection set, RootCAs nil
  • pin malformed (4 sub-cases: non-hex, too short, too long, right length but non-hex)
  • both set → RootCAs and VerifyConnection both set

internal/wire/client_tls_test.go — 4 cases for Connect against httptest.NewTLSServer:

  • TestConnect_AcceptsPinnedCert — pin matches leaf, dial succeeds, handshake completes
  • TestConnect_RejectsPinnedCertMismatch — pin doesn't match, VerifyConnection rejects
  • TestConnect_RejectsUntrustedCARootCAs is a pool of an unrelated self-signed cert, chain verification fails, error message contains "x509"
  • TestConnect_AcceptsCustomCARootCAs trusts the server cert, no pin, dial succeeds

internal/wire/tls_helpers_test.go — shared self-signed cert generator used by the four TLS tests.

Review

Quinn reviewed the first cut. This branch addresses all 3 of her warnings worth shipping (W1, S2, S3) and 1 of the 5 suggestions (S4):

  • W1internal/config/tls.go:23-27 was claiming "we set RootCAs to the system pool" in the pin-only path, but the code leaves RootCAs nil and lets Go's TLS auto-fill at handshake time. Runtime behaviour was right, prose was wrong. Rewrote the comment to match the code.
  • S2 — dropped the stringErr type and the two var stringErr(...) sentinels in favour of errors.New. Less type machinery, idiomatic Go.
  • S3TestConnect_RejectsUntrustedCA was asserting only err != nil. Tightened to require the substring "x509" so a refactor that changes the failure layer (e.g. moves to a custom VerifyPeer callback) breaks the test.
  • S4DialOptions.TLSConfig doc-comment now explicitly states the contract: "TLSConfig is read at dial time only; the same instance may be safely shared across reconnect cycles."

Skipped per Quinn's recommendation: S1 (return non-nil config with MinVersion in empty case — the current (nil, nil) shape is the right minimum-surface answer) and S5 (INFO log line at startup when [server] fields are present — can revisit if operators complain).

Verification

  • go test -race -count=1 ./... — 6/6 packages, 3 consecutive fresh runs (no cache)
  • go vet ./... clean
  • go build ./... clean
  • ✅ Real-binary smoke:
    • ca_cert = "/nonexistent" → exit 1 with config: ca_cert "/nonexistent/ca.pem": read: open /nonexistent/ca.pem: no such file or directory
    • pinned_cert_sha256 = "deadbeef" → exit 1 with config: pinned_cert_sha256: expected 64 hex chars (sha256), got 4
    • Empty [server] section → no TLS errors, dials normally (regression check)

Diff stat

 cmd/hermes-node/main.go           |  10 ++
 internal/config/tls.go            | 140 +++++++++++++++++
 internal/config/tls_test.go       | 163 ++++++++++++++++++
 internal/wire/client.go           |  37 +++-
 internal/wire/client_tls_test.go  | 313 +++++++++++++++++++++++++++++++++++++
 internal/wire/tls_helpers_test.go |  51 +++++
 6 files changed, 707 insertions(+), 7 deletions(-)

Two commits:

  1. 1f8d4d0 — original fix
  2. 1961292 — Quinn's review fixes (W1, S2, S3, S4)

Relationship to #21

This ships alongside the recently-merged #21 fix. The release that turned the binary from a stub into a real WSS client made this gap user-visible for the first time: today the binary doesn't connect anywhere (so the security posture is "doesn't connect"); after #21 it connects but didn't verify the server's identity per the operator's config. Both halves belong in the same release cycle.

blaspat added 2 commits June 11, 2026 17:44
The ServerConfig had ca_cert and pinned_cert_sha256 fields that
were loaded and validated but never applied to the WSS handshake.
DefaultDialer.DialContext was used directly, so the operator's
pin and custom CA were silently ignored. PROTOCOL.md §7 says
pinning is "out of scope for v1 client config but supported" —
the config side said it was supported, the wire side disagreed.

This commit wires the two fields through.

internal/wire/client.go:
  DialOptions gains a TLSConfig *tls.Config. When non-nil, Connect
  copies *websocket.DefaultDialer and sets TLSClientConfig on the
  copy; the package-global DefaultDialer is never mutated, so
  concurrent dials don't share TLS state. The stale
  "Task 1.9/1.10" comment is gone.

internal/config/tls.go (new):
  BuildTLSConfig(ServerConfig) returns *tls.Config or (nil, nil)
  when both fields are empty — that nil signals "use the default
  dialer", which is the right behavior for public CA-signed
  servers. When ca_cert is set, the PEM is loaded into RootCAs
  (single cert, bundle, or chain — all three accepted). When
  pinned_cert_sha256 is set, VerifyConnection hashes the leaf
  after chain verification and rejects on mismatch. A malformed
  pin (non-hex, wrong length) is a config error rather than a
  silent fall-through to "trust everything" — the operator who
  fat-fingered the value would otherwise get the worst of both
  worlds (the cert looks pinned but isn't).

cmd/hermes-node/main.go:
  runRun builds tlsCfg once via config.BuildTLSConfig, surfaces
  errors as exit 1, and passes the result through to
  wire.DialOptions.TLSConfig. Build is outside the Dialer closure
  so reconnects reuse the same *tls.Config (VerifyConnection
  captures the pin by closure, so this is safe).

Tests:
  internal/config/tls_test.go: 7 cases for BuildTLSConfig —
  both empty, CA-only, missing CA file, garbage PEM, pin-only,
  malformed pin (4 sub-cases), both set. Each test exercises a
  distinct failure mode the operator could trip over.

  internal/wire/client_tls_test.go: 4 cases for Connect against
  httptest.NewTLSServer — accepts valid pin, rejects pin
  mismatch, rejects untrusted RootCAs, accepts custom CA pool.
  The TLS test path uses a real wss:// endpoint (httptest TLS,
  not plain HTTP) so the chain verification is exercised end
  to end.

  internal/wire/tls_helpers_test.go: small shared test helper
  for self-signed cert generation, since the four TLS tests
  need it.

Verification:
  - go test -race ./... — 6/6 packages, 3 consecutive fresh
    runs (no caching), all green
  - go vet ./... clean
  - go build ./... clean
  - Real-binary smoke: bad ca_cert \u2192 exit 1 with clear error,
    bad pin \u2192 exit 1, empty [server] \u2192 no TLS errors, dials
    normally

No DCO sign-off (per project convention).

Fixes #25.
W1 \u2014 internal/config/tls.go:23-27 misdescribed the pin-only
  code path. Comment claimed we set RootCAs to the system pool
  in the pin-only branch; the code leaves RootCAs nil and lets
  Go's TLS stack auto-fill at handshake time. Runtime behavior
  is right, prose was wrong. Rewrote the comment to match the
  code.

S2 \u2014 internal/wire/client_tls_test.go: dropped the stringErr
  type and the two sentinel-via-string-type errors in favour
  of errors.New. Less type machinery, idiomatic Go.

S3 \u2014 internal/wire/client_tls_test.go:TestConnect_RejectsUntrustedCA
  was asserting only err != nil. Tightened to require the
  substring "x509" so a refactor that changes the failure
  layer (e.g. moves to a custom VerifyPeer callback) still has
  to keep chain verification firing for the test to pass.

S4 \u2014 internal/wire/client.go:DialOptions.TLSConfig: added
  one paragraph making the contract explicit: TLSConfig is
  read at dial time only and the same instance may be safely
  shared across reconnect cycles. VerifyConnection captures
  the pin bytes by closure; RootCAs is a *x509.CertPool the
  caller should treat as immutable once handed over. A
  future maintainer should not construct a per-call closure
  here.

Skipped per Quinn's recommendation:
  S1 (return non-nil config with MinVersion in empty case)
  S5 (INFO log line at startup when [server] fields present)

Verification:
  - go test -race -count=1 ./... \u2014 6/6 packages, 3 fresh
    runs green
  - go vet ./... clean
  - go build ./... clean
  - Real-binary smoke: bad ca_cert \u2192 exit 1, bad pin \u2192 exit 1,
    empty [server] \u2192 clean SIGTERM shutdown
@blaspat blaspat merged commit 1495fa7 into main Jun 11, 2026
@blaspat blaspat deleted the fix/25-consume-tls-config branch June 12, 2026 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

config.Server.CACert and PinnedCertSHA256 are read but never consumed

1 participant