Skip to content

fix: enterprise SAML emails lost on warm-start re-sync#140

Merged
madison-c-evans merged 1 commit intomainfrom
cxh-1180/fix-resourcesyncers-gate
Apr 8, 2026
Merged

fix: enterprise SAML emails lost on warm-start re-sync#140
madison-c-evans merged 1 commit intomainfrom
cxh-1180/fix-resourcesyncers-gate

Conversation

@madison-c-evans
Copy link
Copy Markdown
Contributor

Summary

  • fix: remove enterpriseLicensesAvailable gate from ResourceSyncers — the SDK calls ResourceSyncers() before Validate(), so the gate was always false and enterprises were never passed to the user builder
  • fix: use session-based check for enterprise SAML fetch — the enterpriseSAMLFetched struct flag persisted across sync cycles on Lambda warm starts, but the session store is scoped per SyncID. On the second sync the flag said "already fetched" while the session was empty, silently dropping all SAML emails. Replaced with a session-based sentinel (presence of enterpriseSAMLKeysIndex in the current sync's session)

Test plan

  • go build ./cmd/baton-github compiles clean
  • go test ./... passes
  • Local sync with PAT + enterprise + 2 orgs produces users with SAML emails
  • Deploy to C1, trigger sync, confirm emails persist through both sync cycles (resource-only + full)

🤖 Generated with Claude Code

@madison-c-evans madison-c-evans requested a review from a team April 8, 2026 17:06
@linear
Copy link
Copy Markdown

linear bot commented Apr 8, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Claude finished @madison-c-evans's task in 1m 37s —— View job


PR Review: fix: enterprise SAML emails lost on warm-start re-sync

  • Gather context and read changed files
  • Review connector.go changes
  • Review user.go changes
  • Post review findings

Review

Overall: This is a well-targeted fix for two real bugs that interact to silently drop enterprise SAML emails. The root causes are clearly identified and the fixes are appropriate. Approving with one observation and one minor concern.

Fix 1: Remove enterpriseLicensesAvailable gate from ResourceSyncers

The analysis is correct — the SDK calls ResourceSyncers() before Validate(), so the enterpriseLicensesAvailable field was always false at the time it was checked. This meant enterprises were never passed to userBuilder or enterpriseRoleBuilder.

The fix correctly removes the gate and passes gh.enterprises directly. The Validate() call now just logs a warning if the API is unreachable, which is the right behavior — errors will surface naturally at sync time with proper context.

One thought: the Validate log message was changed from "will be skipped" to "may fail at sync time" (connector.go:215, connector.go:237). This is accurate since there's no longer a gate, but if the consumed-licenses API is genuinely inaccessible (e.g., wrong token scopes), the sync will now fail hard in fetchAndStoreEnterpriseSAML rather than gracefully degrading. The List() fallback at user.go:158-163 catches this by disabling SAML state, so this is handled — just wanted to confirm that was intentional. Looks like it is, since the l.Debug("failed to fetch enterprise SAML emails, falling back to REST API emails") path covers it.

Fix 2: Replace struct flag with session-based sentinel ✅

This is the more subtle and important fix. The enterpriseSAMLFetched bool on the userResourceType struct persisted across Lambda warm starts because the struct is reused, but the session store is scoped per SyncID. On the second sync:

  1. enterpriseSAMLFetched is true (stale from previous sync)
  2. Session store is empty (new SyncID)
  3. loadEnterpriseSAMLEmails returns nothing → SAML emails silently lost

The fix replaces this with session.GetJSON[[]string](ctx, opts.Session, enterpriseSAMLKeysIndex) — checking presence of the key index in the session itself. This is the correct approach because the session is the source of truth for what's been fetched in the current sync.

Good details:

  • The empty sentinel write on failure (user.go:161) prevents retry storms across orgs in the same sync
  • Moving the key index write outside the len(samlByLogin) > 0 check (user.go:481-486) ensures the sentinel is written even when zero SAML mappings are found — this is critical for correctness

Remaining warm-start concern: samlStates map

The samlStates map (user.go:114) is also a struct-level cache that persists across warm starts. Unlike enterpriseSAMLFetched, this is likely safe because the SAML configuration for an org shouldn't change between syncs in practice. However, if an org enables/disables SAML between syncs, the stale cache would return the wrong state. This is pre-existing behavior and out of scope for this PR, but worth noting for future hardening — the same session-based pattern could be applied here if it ever becomes an issue.

Nit

user.go:161 — the error from session.SetJSON is silently discarded with _ =. This is in a fallback/error path so it's probably fine, but a l.Debug("failed to write empty SAML sentinel", zap.Error(err)) would make debugging easier if the session store ever misbehaves.


Verdict: The two bugs are real, the fixes are correct, and the session-based approach is the right pattern for Lambda warm-start resilience. LGTM. 👍

@madison-c-evans madison-c-evans force-pushed the cxh-1180/fix-resourcesyncers-gate branch from 2596990 to 5d97fad Compare April 8, 2026 17:16
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Claude finished @madison-c-evans's task in 1m 18s —— View job


PR Review: fix: enterprise SAML emails lost on warm-start re-sync

  • Gather context and read changed files
  • Review connector.go changes
  • Review user.go changes
  • Post review findings

Overall: Both bugs are real, well-diagnosed, and correctly fixed. The changes are minimal and targeted. LGTM.

Fix 1: Remove enterpriseLicensesAvailable gate from ResourceSyncersconnector.go

The SDK calls ResourceSyncers() before Validate(), so enterpriseLicensesAvailable was always false — enterprises were never passed to the user builder. Removing the gate and passing gh.enterprises directly is the correct fix.

The Validate() method now just logs a warning instead of gating. This is safe because:

  • fetchAndStoreEnterpriseSAML already has a fallback path (user.go:157-166) that catches failures and disables SAML enrichment for the rest of the sync
  • Enterprise role sync would surface a clear error at sync time rather than silently skipping

The log message change from "will be skipped" to "may fail at sync time" (connector.go:215, connector.go:237) accurately reflects the new behavior.

Fix 2: Replace struct flag with session-based sentinel — user.go

This is the more important fix. The enterpriseSAMLFetched bool on userResourceType persisted across Lambda warm starts (struct reused), but the session store is scoped per SyncID. On the second sync: flag says "already fetched", session is empty, SAML emails silently lost.

The fix uses session.GetJSON[[]string](ctx, opts.Session, enterpriseSAMLKeysIndex) to check whether SAML data was fetched in the current sync session. Key correctness details:

  • Empty sentinel on failure (user.go:161): Writing an empty []string{} to enterpriseSAMLKeysIndex prevents retry storms across orgs in the same sync — good.
  • Key index written unconditionally (user.go:483-488): The enterpriseSAMLKeysIndex is now written even when zero SAML mappings are found. This is critical — without it, zero-mapping enterprises would re-fetch on every List() call.
  • SetManyJSON still guarded (user.go:478): The len(samlByLogin) > 0 guard remains on SetManyJSON, avoiding a no-op call with an empty map. Correct.

Pre-existing: samlStates map

The samlStates map (user.go:114) is also a struct-level cache that persists across warm starts. Unlike enterpriseSAMLFetched, this is likely safe because SAML configuration for an org is effectively static between syncs. If it ever changes, a cold restart would pick it up. This is out of scope but worth noting for future hardening.

Minor observation

user.go:161-162 — the error from session.SetJSON (the empty sentinel write) is now logged at Debug level, which is an improvement over the previous review's nit about silently discarding it. Good.

Verdict: The two bugs interact to silently drop all enterprise SAML emails on warm-start re-syncs. Both fixes are correct, minimal, and use the right pattern (session as source of truth). 👍


…m restarts

The enterpriseSAMLFetched struct flag persisted across sync cycles when
the connector stayed warm (Lambda warm start), but the session store is
scoped per SyncID. On the second sync the flag said "already fetched"
while the session was empty, silently dropping all SAML emails.

Replace the struct flag with a session-based sentinel: the presence of
the enterpriseSAMLKeysIndex key in the current sync's session indicates
whether we already fetched. Also always write the key index (even when
empty) so the sentinel works when there are zero SAML mappings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@madison-c-evans madison-c-evans force-pushed the cxh-1180/fix-resourcesyncers-gate branch from 5d97fad to 218ec2c Compare April 8, 2026 17:22
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Claude finished @madison-c-evans's task in 1m 33s —— View job


PR Review: fix: enterprise SAML emails lost on warm-start re-sync

  • Gather context and read changed files
  • Review changes against origin/main
  • Post review findings

Review

Overall: This is a clean, well-targeted fix for a real warm-start bug. The session-based sentinel approach is the correct pattern. LGTM.

What changed

The enterpriseSAMLFetched bool field on userResourceType was removed and replaced with a session-based presence check on enterpriseSAMLKeysIndex. This is the only change in the PR — just user.go.

Bug analysis

The root cause is clear: on Lambda warm starts, the userResourceType struct is reused across sync cycles, but the session store is scoped per SyncID. After the first sync:

  1. enterpriseSAMLFetched = true (stale, persists on struct)
  2. Session store is empty (new SyncID)
  3. loadEnterpriseSAMLEmails finds nothing → all SAML emails silently dropped

The fix replaces the struct flag with session.GetJSON[[]string](ctx, opts.Session, enterpriseSAMLKeysIndex) — checking the session itself as the source of truth. This is the right approach.

Key correctness details

  • Empty sentinel on failure (user.go:160-163): If fetchAndStoreEnterpriseSAML fails, an empty []string{} is written to enterpriseSAMLKeysIndex so remaining orgs in the same sync don't retry. Good.
  • Key index written unconditionally (user.go:483-488): Moved outside the len(samlByLogin) > 0 guard. This is critical — without it, an enterprise with zero SAML mappings would never write the sentinel and would re-fetch on every List() call for every org.
  • SetManyJSON still guarded (user.go:478): The bulk write remains inside len(samlByLogin) > 0, correctly avoiding a no-op with an empty map.
  • Error logging on sentinel write (user.go:162): The setErr from the empty sentinel write is logged at Debug rather than silently discarded. Good practice.

Pre-existing: samlStates map

The samlStates map (user.go:114) is also a struct-level cache that persists across warm starts. This is likely safe since SAML configuration for an org is effectively static between syncs, and a cold restart would reset it. However, if an org toggles SAML between syncs, the stale cache would return the wrong state. Out of scope for this PR but worth noting for future hardening — the same session-based pattern could be applied.

One observation on the keys slice allocation

At user.go:474, keys is built from samlByLogin before the len > 0 check:

keys := make([]string, 0, len(samlByLogin))
for k := range samlByLogin {
    keys = append(keys, k)
}

When samlByLogin is empty, this allocates a []string{} with capacity 0 and iterates nothing — which is fine, it's just a no-op allocation. No action needed, just noting the flow is correct.

Verdict: The bug is real, the fix is minimal and correct, and the session-based pattern is the right approach for Lambda warm-start resilience. Approved.


@madison-c-evans madison-c-evans merged commit c8b04dd into main Apr 8, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants