Skip to content

fix(rig-bridge): stop auth retry storm and rate-limit server log warnings#1041

Merged
accius merged 2 commits into
accius:Stagingfrom
ceotjoe:fix/rig-bridge-auth-log-spam
May 31, 2026
Merged

fix(rig-bridge): stop auth retry storm and rate-limit server log warnings#1041
accius merged 2 commits into
accius:Stagingfrom
ceotjoe:fix/rig-bridge-auth-log-spam

Conversation

@ceotjoe
Copy link
Copy Markdown
Collaborator

@ceotjoe ceotjoe commented May 31, 2026

Summary

  • rig-bridge client (plugins/cloud-relay.js): on a 401/403 response, a new handleCredentialsInvalid() helper permanently stops both the push timer and the long-poll loop, logging once with a clear action for the user. Previously, pushState kept retrying every 2 s and longPollCommands retried every 1 s on any non-200 — flooding the server log at ~1 warn/s indefinitely.
  • Server (server/routes/rig-bridge.js): requireRelayAuth now rate-limits logWarn to once per 5 minutes per sessionId. MeshCom ingest failure warnings are similarly throttled to once per minute per subtype.

Test plan

  • Configure rig-bridge with a stale/wrong session token and confirm the server log shows only one auth warning, then silence for 5 minutes
  • Confirm rig-bridge logs a single clear error message and stops retrying after receiving a 401
  • Re-run Connect Cloud Relay in OHC Settings → Rig Bridge, confirm normal relay operation resumes with fresh credentials
  • Verify no regression in normal relay auth (valid credentials still pass through)

🤖 Generated with Claude Code

ceotjoe and others added 2 commits May 31, 2026 15:37
…ings

When rig-bridge holds a stale session token (e.g. after a server restart
invalidates the token store), two retry loops caused continuous log spam:

- pushState() fired every 2 s and kept retrying on 401
- longPollCommands() retried every 1 s on any non-200 response, including 401

Server-side: requireRelayAuth now rate-limits logWarn to once per 5 minutes
per sessionId instead of logging every rejected request. MeshCom ingest
warnings are similarly throttled to once per minute per subtype.

Client-side (cloud-relay.js): a shared handleCredentialsInvalid() helper
permanently stops both the push timer and the long-poll loop on a 401/403
response, logging once and instructing the user to re-run Connect Cloud Relay.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@accius accius merged commit 80ab645 into accius:Staging May 31, 2026
6 checks passed
@accius
Copy link
Copy Markdown
Owner

accius commented May 31, 2026

Couple of small things noticed during review, neither blocking, both are follow-up-able whenever you feel like it:

The authWarnLastLogged map keys on sessionId prefix but never clears stale entries. If a misbehaving client (or a deliberate one) cycles through a lot of bad sessionIds, that map slowly grows. The relayPollCountByIP cleanup pass that runs every 5 minutes a few lines up could probably absorb this one with a small tweak.

The other one is just messaging. The error tells the user to "re-run Connect Cloud Relay in OHC Settings" but in practice they also need to restart rig-bridge for the new credentials to actually take effect, since the old token is still in memory and credentialsInvalid stays set for the life of the process. Maybe worth adding "and restart rig-bridge" to that string so people don't bounce back wondering why it's still broken after clicking the button.

accius pushed a commit that referenced this pull request Jun 1, 2026
* fix(rig-bridge): stop auth retry storm and rate-limit server log warnings

When rig-bridge holds a stale session token (e.g. after a server restart
invalidates the token store), two retry loops caused continuous log spam:

- pushState() fired every 2 s and kept retrying on 401
- longPollCommands() retried every 1 s on any non-200 response, including 401

Server-side: requireRelayAuth now rate-limits logWarn to once per 5 minutes
per sessionId instead of logging every rejected request. MeshCom ingest
warnings are similarly throttled to once per minute per subtype.

Client-side (cloud-relay.js): a shared handleCredentialsInvalid() helper
permanently stops both the push timer and the long-poll loop on a 401/403
response, logging once and instructing the user to re-run Connect Cloud Relay.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(rig-bridge): bump version to 2.2.1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(rig-bridge): address PR review feedback

- Prune authWarnLastLogged entries in the existing 5-minute cleanup
  interval (entries older than 2× AUTH_WARN_INTERVAL) so the map
  doesn't grow unboundedly when clients cycle through many bad sessionIds
- Add "then restart rig-bridge" to the credentials-invalid error message
  since credentialsInvalid stays set for the life of the process and
  clicking Connect Cloud Relay alone is not enough to resume the relay

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@accius accius mentioned this pull request Jun 2, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants