perf(expand): pre-prune duplicate (source, descendant) actions (CE-703) by arreyder · Pull Request #805 · ConductorOne/baton-sdk

arreyder · 2026-05-06T20:30:28Z

Summary

expander.RunSingleStep previously generated next-depth actions without checking whether the same (source, descendant) edge was already queued or had just been yielded earlier in the same iteration. This was a throughput cost — MarkEdgeExpanded is idempotent so correctness was not at risk, but every duplicate paid one full connector `ListGrants` round-trip plus the SQLite write to expand the same edge twice.

Why this matters

The path runs once per depth-bump in expandGrantsForEntitlements, which is the dominant phase for enterprise Entra-class connectors. Today's a large tenant Entra-Global expansion processed ~2161 actions over 12 hours with a c1z file growing to 142 GB. Even a small fraction of duplicates would compound across that envelope.

The most likely real-world reproducer: SCC cycle reduction collapses a cycle into a single node whose EntitlementIDs slice carries the merged entitlements. If the same ID ends up listed twice (which can happen across some manual graph edits or a JSON round-trip), GetExpandableEntitlements yields the source twice, the inner loop runs twice, and we end up with two identical actions in the queue.

Fix

Build a `seen` set seeded from existing `graph.Actions` and skip duplicates as we iterate:

type edgeKey struct{ source, descendant string }
seen := make(map[edgeKey]struct{}, len(e.graph.Actions))
for _, a := range e.graph.Actions {
    seen[edgeKey{a.SourceEntitlementID, a.DescendantEntitlementID}] = struct{}{}
}
for src := range e.graph.GetExpandableEntitlements(ctx) {
    for dst, grantInfo := range e.graph.GetExpandableDescendantEntitlements(ctx, src) {
        k := edgeKey{src, dst}
        if _, dup := seen[k]; dup { skipped++; continue }
        seen[k] = struct{}{}
        e.graph.Actions = append(e.graph.Actions, &EntitlementGraphAction{...})
    }
}

When `skipped > 0` we log an Info line with `depth`, `queued`, and `skipped` so operators can grep duplicate-rate per depth without flipping debug globally. The log fires at most once per depth-bump (the same cadence as the action-generation loop).

Seeding the seen-set from existing `graph.Actions` is a safety net: the production code path only enters action-generation when the queue is empty, but a future code change (checkpoint-restore, retry-with-saved-queue) could call it with non-empty Actions, and the dedup would handle that correctly.

Tests

TestExpander_DedupQueuesEachEdgeOnce\ — constructs a node whose `EntitlementIDs` slice contains "A" twice (the SCC-style reproducer), runs `RunSingleStep`, asserts the queue contains `(A,B)` exactly once instead of twice.
TestExpander_DedupAlreadyQueuedAction\ — companion test documenting the existing-queue safety-net invariant.
All existing `./pkg/sync/expand/...` tests still pass.

Validation

go test -count=1 ./pkg/sync/expand/...\ — green
go vet ./pkg/sync/expand/...\ — clean
gofmt -l pkg/sync/expand/\ — clean

Linear

CE-703

Companion PR

#804 (CE-702) adds OTel metrics for grant-expansion progress. Once both ship and operators see real `skipped` counts on Entra-Global syncs, we'll know whether this fix is meaningful in practice or just a safety net.

When the action queue empties and RunSingleStep walks GetExpandableEntitlements × GetExpandableDescendantEntitlements to generate the next depth's actions, it previously appended without checking whether the same (source, descendant) edge was already in the queue or had been yielded earlier in the same iteration. In the SCC-merged case (and after some manual graph edits) the same entitlement ID can appear in a node's EntitlementIDs slice more than once, causing GetExpandableEntitlements to yield it repeatedly and the inner loop to enqueue identical actions. MarkEdgeExpanded is idempotent so correctness was not at risk, but every duplicate paid one full connector ListGrants round-trip plus the SQLite I/O to write the same expansion twice — directly multiplied by the slowest part of large enterprise tenants (Eli Lilly Entra-Global ran ~2161 actions over 12h with a 142 GB c1z file; even a small fraction of duplicates compounds across that envelope). The fix builds a `seen` set seeded from existing actions and skips duplicates as the iteration progresses. A skipped > 0 path emits an Info log with depth, queued, and skipped counts so operators can grep without flipping debug, but stays rate-limited because the loop fires once per depth-bump. Refs: CE-703

linear-code · 2026-05-06T20:30:32Z

CE-703 Pre-prune duplicate (source, descendant) actions in grant-expander queueing

Address review feedback on the initial CE-703 commit: - Drop the no-op TestExpander_DedupAlreadyQueuedAction. The test never called RunSingleStep — it asserted only on the slice it had just set itself. Replace with two real tests: - TestExpander_DedupHandlesDestinationSideDuplicate covers the inner-loop yield-twice case (destination node EntitlementIDs duplicate), the mirror of the existing source-side test. - TestExpander_DedupEmitsLogOnSkip verifies the operator-visible "expander: pre-pruned duplicate (source, descendant) actions" log line carries the expected `queued` and `skipped` fields, since those are the signal operators will grep for to estimate dedup win across syncs. - Remove the dead seen-set seeding loop over e.graph.Actions. The control flow above guarantees the queue is empty at the action-generation branch, so iterating it was a no-op. Updated the surrounding comment to explain why. - Resize the seen map hint from len(e.graph.Actions) (always 0 here) to len(e.graph.Nodes), a coarse upper bound that avoids rehashes on large graphs. - Drop the redundant zap.Int("depth", e.graph.Depth) field on the dedup log line — `l.With(...)` at the top of RunSingleStep already attaches it. - Hoist edgeKey out of RunSingleStep to package scope so the type is declared once. Refs: CE-703

btipling · 2026-05-08T01:51:06Z

but every duplicate paid one full connector ListGrants round-trip plus the SQLite write to expand the same edge twice.

This is false. expansion does not have anything to do with connector ListGrants round-trips

btipling

I ran the benhmarks as so:

go test -run=NONE -bench='BenchmarkExpand|BenchmarkSCC_FixCycles' -benchmem -benchtime=1x -count=6 ./pkg/sync/expand/

I did this on main and this branch

  - `ExpandSmall` — main 42.02 s → branch 59.40 s
  - `ExpandSmallMedium` — main 55.02 s → branch 79.29 s
  - `SCC_FixCycles_Large N=10000 M=100000` — main 64.82 ms → branch 93.19 ms
  - `SCC_FixCycles_Large N=20000 M=200000` — main 155.1 ms → branch 242.1 ms

Seems like a regression in perf

ggreer · 2026-05-08T22:12:14Z

I've tried with a few c1zs and can't seem to get a duplicate source + descendant action. I tested this out by checking out this branch and adding some logging in the conditional. It never went into that code.

ggreer · 2026-05-08T22:14:16Z

Performance improvements have been merged since this PR was created, so this branch would need to be rebased for benchmarks to be accurate. I notice no difference in performance, but that's because my test data never hits a duplicate source + descendant.

arreyder · 2026-05-15T15:01:13Z

Closing this — rebased onto current main (which now includes #807 SQLite perf, #781 slim-blob, #826 grants index, and #804 the companion CE-702 metrics) and re-ran @btipling's bench command (-count=6 -benchtime=1x):

Benchmark	main	rebased	p
`ExpandSmall`	107.7s	104.3s	0.180
`ExpandSmallMedium`	134.5s	133.5s	0.093
`SCC_FixCycles_Large N=10k`	122.3ms	121.9ms	0.937
`SCC_FixCycles_Large N=20k`	254.1ms	259.4ms	0.240

All deltas statistically insignificant. The regression @btipling saw earlier was against pre-#807 main — it doesn't survive the rebase. But the perf claim also doesn't survive: allocs/op is byte-identical between branches, confirming @ggreer's observation that the dedup branch doesn't fire on representative c1z data.

Leaving CE-703 open so we can revisit if production ever surfaces real (source, descendant) duplicates (e.g. via the CE-702 metrics from #804).

btipling self-assigned this May 6, 2026

btipling requested changes May 8, 2026

View reviewed changes

arreyder closed this May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(expand): pre-prune duplicate (source, descendant) actions (CE-703)#805

perf(expand): pre-prune duplicate (source, descendant) actions (CE-703)#805
arreyder wants to merge 2 commits into
mainfrom
chrisrhodes/ce-703-pre-prune-duplicate-source-descendant-actions-in-grant

arreyder commented May 6, 2026 •

edited

Loading

Uh oh!

linear-code Bot commented May 6, 2026

Uh oh!

btipling commented May 8, 2026

Uh oh!

btipling left a comment •

edited

Loading

Uh oh!

ggreer commented May 8, 2026 •

edited

Loading

Uh oh!

ggreer commented May 8, 2026

Uh oh!

arreyder commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

arreyder commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this matters

Fix

Tests

Validation

Linear

Companion PR

Uh oh!

linear-code Bot commented May 6, 2026

Uh oh!

btipling commented May 8, 2026

Uh oh!

btipling left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ggreer commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ggreer commented May 8, 2026

Uh oh!

arreyder commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

arreyder commented May 6, 2026 •

edited

Loading

btipling left a comment •

edited

Loading

ggreer commented May 8, 2026 •

edited

Loading