fix(local): tolerate NATS startup races #46

Merged: FrankSpitulski merged 1 commit into main from fix/nvbug-6230705-6231022-nats-startup on May 28, 2026

@FrankSpitulski
Collaborator

Summary

  • keep auth-callout alive when the initial NATS connection is unavailable by enabling initial-connect retry and unlimited reconnects
  • split process liveness onto /livez while keeping /healthz NATS-backed for readiness
  • extend only local surveyor and auth-callout rollout waits for NATS startup/backoff races

NVBugs

  • 6230705
  • 6231022

Validation

  • go test ./src/internal/service
  • make test
  • make test-helm
  • git diff --check
  • helm template nats-event-bus deploy/nats-event-bus --show-only charts/auth-callout/templates/deployment.yaml
  • make -C local deploy-nats
  • make -C local test-functional
  • adversarial subagent review: no blocking findings

@copy-pr-bot

copy-pr-bot Bot commented May 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@FrankSpitulski force-pushed the fix/nvbug-6230705-6231022-nats-startup branch 6 times, most recently from 6c62ea8 to 641bc85, on May 28, 2026 22:48
@FrankSpitulski marked this pull request as ready for review on May 28, 2026 22:54
@FrankSpitulski requested a review from a team on May 28, 2026 22:54
@github-actions

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-28 22:55:32 UTC | Commit: 641bc85

@github-actions

🛡️ CodeQL Analysis

✅ No security issues found!

💡 Note: Enable GitHub Advanced Security to see full details in the Security tab.

🕐 Last updated: 2026-05-28 22:57:38 UTC | Commit: 641bc85


@bryan-aguilar bryan-aguilar left a comment

Looks good other than linting.

Add auth-callout initial NATS reconnect handling and split liveness from NATS-backed health checks.

The service now stays alive while NATS starts.

Readiness reports unavailable until the NATS connection is established.
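The liveness/readiness split can be sketched with the standard library alone. This is an illustration under assumptions, not the handler code from this change: `connChecker`, `fakeConn`, `newProbeMux`, and `probe` are hypothetical names, and the only NATS detail relied on is that `*nats.Conn` exposes an `IsConnected() bool` method.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// connChecker is all the readiness probe needs to know about NATS.
// *nats.Conn from github.com/nats-io/nats.go has a method of this shape.
type connChecker interface {
	IsConnected() bool
}

// fakeConn is a hypothetical stand-in for a NATS connection.
type fakeConn struct{ up atomic.Bool }

func (f *fakeConn) IsConnected() bool { return f.up.Load() }

// newProbeMux registers /livez (process only) and /healthz (NATS-backed).
func newProbeMux(nc connChecker) *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness never consults NATS, so a slow NATS start cannot cause a restart.
	mux.HandleFunc("/livez", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness returns 503 until the NATS connection is established.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		if !nc.IsConnected() {
			http.Error(w, "nats not connected", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues an in-memory GET and returns the response status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	conn := &fakeConn{}
	mux := newProbeMux(conn)

	// NATS still starting: alive but not ready.
	fmt.Println(probe(mux, "/livez"), probe(mux, "/healthz")) // prints "200 503"

	// NATS connection established: alive and ready.
	conn.up.Store(true)
	fmt.Println(probe(mux, "/livez"), probe(mux, "/healthz")) // prints "200 200"
}
```

Pointing the kubelet liveness probe at /livez and the readiness probe at /healthz keeps the pod out of service while NATS is down without restarting it.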

Extend only local surveyor and auth-callout rollout waits to cover kubelet backoff during local Kind deploys.

Signed-off-by: Frank Spitulski <fspitulski@nvidia.com>
@FrankSpitulski force-pushed the fix/nvbug-6230705-6231022-nats-startup branch from 641bc85 to 852b9fa on May 28, 2026 23:01
@FrankSpitulski merged commit d7538de into main on May 28, 2026 (15 checks passed)
@FrankSpitulski deleted the fix/nvbug-6230705-6231022-nats-startup branch on May 28, 2026 23:05