fix(box): make CRI reachable over UDS — patch h2 to accept grpc-go authority#10

Merged: ZhiXiao-Lin merged 3 commits into release/v2.0.4 from fix/cri-grpc-uds-authority on May 31, 2026
@ZhiXiao-Lin

P0-#0: a3s-box-cri was unreachable by every standard gRPC client

While bringing up the critest scoreboard (CRI-maturity roadmap #2) on a /dev/kvm host, crictl could not even connect — every call failed at CRI-API validation:

rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR

Root cause

grpc-go >= 1.57 (used by crictl, the kubelet, and critest) sends the percent-encoded UDS socket path as the HTTP/2 :authority pseudo-header (e.g. %2Frun%2Fa3s-box.sock). The server's h2 layer rejects it:

h2::server: malformed headers: malformed authority (b"%2F…%2F….sock"): invalid authority
→ send Reset { stream_id: 1, error_code: PROTOCOL_ERROR }

So the request is killed before any CRI RPC runs. The crictl_smoke integration test is #[ignore], which is why this was never caught.
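To make the failing value concrete, here is a minimal sketch of how a socket path turns into the `:authority` string quoted above. It is not grpc-go's actual encoder: `percent_encode_path` is a hypothetical helper, and the choice to escape everything outside the RFC 3986 unreserved set is an assumption made only to reproduce the example from this description.

```rust
/// Hypothetical helper (not grpc-go code): percent-encode every byte that is
/// not in the RFC 3986 "unreserved" set. Assumed here only to reproduce the
/// authority string shown in the PR description.
pub fn percent_encode_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for b in path.bytes() {
        match b {
            // Unreserved characters pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Everything else, including '/', becomes %XX (uppercase hex).
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}
```

Under these assumptions, `percent_encode_path("/run/a3s-box.sock")` yields `%2Frun%2Fa3s-box.sock`, which is the value h2 reports as a malformed authority.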

Why not a version bump

  • h2 #487 (relax UDS authority) was never merged into any release.
  • grpc/grpc #38142 was closed "not planned" — the client behaviour is permanent, and crictl/kubelet expose no authority override.

tonic/prost are used only in the cri crate (the runtime gRPC uses raw UnixStream framing), so the fix is contained to a workspace patch.

Fix

Vendor h2 0.3.27 into third_party/h2 with a surgical relaxation in src/server.rs: when :authority fails to parse and looks like a UDS path (empty, leading /, contains %2F, or ends in .sock), drop it instead of resetting the stream — gRPC routes by :path, so the authority is unused server-side. Wired via [patch.crates-io].
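The relaxation rule can be sketched as a standalone predicate. This is an illustration of the heuristic described above, not the vendored patch itself: `looks_like_uds_authority`, `decide`, and `AuthorityDecision` are hypothetical names, and matching `%2F` case-insensitively is an assumption beyond what the description states.

```rust
/// Hypothetical outcome of handling the `:authority` pseudo-header.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthorityDecision {
    /// Authority parsed fine: keep it.
    Keep,
    /// Unparseable but UDS-like: drop it and let the request proceed.
    Drop,
    /// Unparseable and not UDS-like: reset the stream as before.
    Reset,
}

/// Heuristic from the description: empty, leading '/', contains %2F,
/// or ends in ".sock". Case-insensitive %2F matching is an assumption.
pub fn looks_like_uds_authority(raw: &[u8]) -> bool {
    raw.is_empty()
        || raw[0] == b'/'
        || raw.windows(3).any(|w| w.eq_ignore_ascii_case(b"%2F"))
        || raw.ends_with(b".sock")
}

/// `parsed_ok` stands in for the result of the normal authority parse;
/// the heuristic is consulted only when that parse has already failed.
pub fn decide(parsed_ok: bool, raw: &[u8]) -> AuthorityDecision {
    if parsed_ok {
        AuthorityDecision::Keep
    } else if looks_like_uds_authority(raw) {
        AuthorityDecision::Drop
    } else {
        AuthorityDecision::Reset
    }
}
```

Because the check runs only after a parse failure, authorities that are malformed and do not look like a UDS path still get a PROTOCOL_ERROR reset; for example, `decide(false, b"exa mple.com")` returns `Reset` in this sketch, while `decide(false, b"%2Frun%2Fa3s-box.sock")` returns `Drop`.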

Verification (real /dev/kvm host)

| Step | Before | After |
| --- | --- | --- |
| crictl version | ❌ PROTOCOL_ERROR | a3s-box v2.0.4, ApiVersion v1 |
| images / runp / create / start / ps | | ✅ pod Ready, container Running |
| stop / rm / stopp / rmp | | ✅ all succeed |
| exec / logs / stats | | ⚠️ unwired (tracked separately) |

Release build + clippy -p a3s-box-cri -D warnings clean with the patch in place.

Refs: hyperium/h2#487, grpc/grpc#38142

Roy Lin added 3 commits May 31, 2026 16:38
…thority

a3s-box-cri rejected every standard gRPC client (crictl, the kubelet,
critest). grpc-go >= 1.57 sends the percent-encoded UDS socket path as the
HTTP/2 `:authority` pseudo-header (e.g. "%2Frun%2Fa3s-box.sock"), which h2's
server validation treats as a malformed authority and answers with a
PROTOCOL_ERROR stream reset — before any CRI RPC runs. The `crictl_smoke`
integration test is `#[ignore]`, so this was never caught in CI.

The fix cannot be a version bump: h2 PR #487 (relax UDS authority) was never
merged into any release, and grpc/grpc#38142 was closed "not planned" — the
client behaviour is here to stay, and crictl/kubelet expose no authority
override. tonic/prost are used only in the cri crate (the runtime gRPC uses
raw UnixStream framing), so the fix is contained to a workspace patch.

Vendor h2 0.3.27 into third_party/h2 with a surgical relaxation: when
`:authority` fails to parse AND looks like a UDS path (empty, leading '/',
contains %2F, or ends in .sock), drop it instead of resetting the stream —
gRPC routes by `:path`, so the authority is unused server-side. Wired via
`[patch.crates-io]` in the workspace manifest.

Verified on a /dev/kvm host: crictl now completes the full pod+container
lifecycle (version, images, runp, create, start, ps, stop, rm, stopp, rmp)
against a3s-box-cri; previously every call failed at CRI-API validation with
PROTOCOL_ERROR. Streaming exec/logs/stats remain unwired (separate items).

Refs: hyperium/h2#487, grpc/grpc#38142
…il / 17 skip)

First critest scoreboard, now that the CRI is reachable over UDS. Captured on a
/dev/kvm host with test images mapped to a single cached alpine. Core
pod+container lifecycle is conformant; failures group into SecurityContext,
seccomp/AppArmor/sysctls, streaming (exec/attach), container logs, volumes/
mounts, namespaces, and networking — each mapped to a roadmap item. ~8 image
specs fail as test-setup artifacts (single-image mapping, no registry egress).
@ZhiXiao-Lin ZhiXiao-Lin merged commit d5bbc5a into release/v2.0.4 May 31, 2026