feat: add cycle detection debug trace for LookupResources and LookupSubjects#3036
feat: add cycle detection debug trace for LookupResources and LookupSubjects#3036Jdepp007004 wants to merge 8 commits intoauthzed:mainfrom
Conversation
…ubjects Fixes authzed#2056 - Add LookupDebugTrace proto message to all 3 Lookup dispatch responses - Add enable_debug_trace bool to all 3 Lookup dispatch requests - Add internal/graph/debug.go with zero-cost traversalTracker - Wire trackVisit in lookupresources2.go, lookupresources3.go, lookupsubjects.go - Wire debug trace enable via requestmeta header in permissions.go - Add unit tests in internal/graph/cycle_detection_test.go - Add integration tests in internal/dispatch/graph/cycle_detection_test.go
|
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
|
How would these actually be consumed if you're not changing the public proto interface? The dispatch paths are for internal communication between SpiceDB nodes - a client never invokes them directly and wouldn't receive their output unless it's propagated through the external interface. |
|
@tstirrat15 — fair point, I missed this. The trace gets populated through the dispatch layer but never actually surfaces to the client , it's dropped at the permissions.go stream handler and never sent back. |
|
@tstirrat15 - Just pushed the fixes for the trailer surfacing we talked about. As discussed: Built the Snapshot() logic to assemble the trace internally, keeping it off the public proto. Wrapped the trailer writing in a defer block so the trace still surfaces even if the lookup terminates early (like hitting a max depth error). |
|
Ah I'm sorry - this isn't going to work either. We originally put I think we also recently had a discussion about changing how debug information is propagated, which might mean that we'd want a different shape here. Lemme see if i can dig that up. |
- remove all trailer-based debug tracing (responsemeta, SerializeLookupDebugTrace) - fix streaming logic (buffer lastResp, no double send) - attach DebugInformationV2 only on final streamed response - update tests to validate DebugInformationV2 instead of trailers - clean up unused imports and legacy code Note: build errors expected until authzed/api and authzed-go updates land
1eaa4fa to
50c858b
Compare
|
Updated PR:
Build errors are expected until authzed/api changes propagate to authzed-go. |
- add root node to DebugTree for clearer traversal structure - ensure stable node identifiers for debug annotations - minor test validation improvement - no behavioral changes
- replace traversal map with stack-based trace - capture per-edge traversal frames (resource, relation, permission) - emit trace on MaxRecursionDepthExceeded when debug is enabled - ensure concurrency safety via CloneTraversalStack - remove batch-based sampling and fix incorrect traversal representation - update tests for stack-based validation Note: current implementation prioritizes trace correctness over batching by splitting dispatches per ID. This may introduce additional overhead and can be optimized in a follow-up by preserving batching while maintaining per-edge trace accuracy.
- restore batched dispatch behavior (remove per-ID dispatch) - decouple traversal trace from execution - maintain append-only traversal stack to preserve trace at error boundary - attach trace via gRPC error details on MaxRecursionDepthExceeded - fix stack being cleared before snapshot - ensure trace is non-empty and visible via status.FromError(err).Details() - update tests to validate error detail trace extraction preserves execution performance while enabling debug trace visibility
- remove placeholder identifiers (*batch*, id1,...) - ensure frames represent real traversal edges - add depth field for deterministic ordering - improve trace readability and structure - strengthen tests for ordering and validity - verify error details extraction no changes to execution behavior
… set (#3070) ## Description Opening as an alternative/continuation to #3036. This takes a slightly different approach, which is to wait until a max depth error is encountered and then build the trace as the dispatch stack is unwound. It tracks across dispatch boundaries as well. ## Notes This code is a no-op if the request flag isn't present or if a max recursion depth error isn't encountered. ## Changes Will annotate. ## Testing Review. See that tests pass. See authzed/zed#682 and its testing section for an example of code that drives this. --------- Co-authored-by: Jdepp007004 <johnnydepp050403@gmail.com> Co-authored-by: Maria Ines Parnisari <maria.ines.parnisari@authzed.com>
|
Closing in favor of #3070 |
Fixes #2056
What this does
Adds opt-in debug trace output to
LookupResourcesandLookupSubjectsthat tracksnode visits during graph traversal and flags any node visited more than once as cyclic.
Mirrors the philosophy of
CheckDebugTraceused byCheckPermission.Changes
dispatch.proto— newLookupDebugTracemessage,enable_debug_traceon all 3 Lookup requests,debug_traceon all 3 Lookup responsesinternal/graph/debug.go— sharedtraversalTracker,nodeKey,trackVisit,buildLookupDebugTrace(zero-cost when disabled)lookupresources2.go,lookupresources3.go,lookupsubjects.go—trackVisitwired at each recursion site,EnableDebugTracepropagated to nested dispatch callsinternal/services/v1/permissions.go— debug trace enabled via samerequestmeta.RequestDebugInformationgRPC header as Checkinternal/graph/cycle_detection_test.gointernal/dispatch/graph/cycle_detection_test.goHow to test
Enable tracing by setting the
x-spicedb-debug: 1gRPC metadata header on anyLookupResourcesorLookupSubjectsrequest. Cyclic nodes will appear in theLookupDebugTracewithis_cyclic = trueandtraversal_count > 1.Notes
enable_debug_trace = false— no change to hot path behaviourbuf generateconfirmed working on Windows (buf 1.67.0)