fix: rate-limit notification retries for non-existent windows by UserAd · Pull Request #26 · UserAd/AgentMail

UserAd · 2026-01-15T05:03:13Z

Summary

Fix infinite retry loop when mailman tries to notify non-existent tmux windows
Check window existence before attempting notification (cleaner approach)
If window doesn't exist, skip entirely - no "failed" errors in logs
Keep rate-limiting as fallback for other notification failures

Test plan

Added TestStatelessNotification_WindowNotExists to verify window check works
Updated TestStatelessNotification_NotifyFailure to verify rate-limiting on other failures
All existing tests pass with go test -race ./...
Manual test: send message to non-existent window, verify logs show "window does not exist" (not "failed")

When notification failed for a stateless agent (e.g., tmux window doesn't exist), the agent was not marked as notified. This caused mailman to retry every cycle, creating an infinite loop of failed notifications. Now notification failures also mark the agent as notified, rate-limiting retries to once per interval (60 seconds). The retry will stop entirely if the message gets read.

coderabbitai · 2026-01-15T05:05:41Z

📝 Walkthrough

Walkthrough

When stateless agent notification fails, the code now marks the agent as notified via StatelessTracker to rate-limit retry attempts, preventing immediate retries. The successful notification path remains unchanged. This failure-handling branch complements existing behavior to avoid repeated notification attempts.

Changes

Cohort / File(s)	Summary
Loop notification handling `internal/daemon/loop.go`	Added `StatelessTracker.MarkNotified(mailboxRecipient)` call in failure path of `CheckAndNotifyWithNotifier` to rate-limit retries on notification failures (+4 lines)
Loop notification tests `internal/daemon/loop_test.go`	Updated `StatlessNotification` test (T033) to verify agent is marked as notified on failure, enforcing rate-limited retry behavior; test now validates that immediate retries are blocked and subsequent attempts occur only after tracker interval elapses (+20/-7 lines)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: rate-limit notification retries for non-existent windows' directly and clearly describes the main change—implementing rate-limiting for notification retries in a specific failure scenario.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e7c67d and b596843.

📒 Files selected for processing (2)

internal/daemon/loop.go
internal/daemon/loop_test.go

🧰 Additional context used

📓 Path-based instructions (2)

**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.go: Store messages in JSONL format in .agentmail/mailboxes/<recipient>.jsonl with file locking for atomic operations
Use only Go standard library (os/exec for tmux, encoding/json for JSONL) without external dependencies
Build system must pass go vet ./... without errors
Build system must pass go fmt ./... without errors
Build system must pass govulncheck ./... without errors
Build system must pass gosec ./... without errors
Use JSONL file format in .agentmail/ directory for message storage and state management
Use standard library only (os, filepath, syscall, encoding/json, time, sync, os/signal, bufio, fmt, io) for Go implementation

Files:

internal/daemon/loop.go
internal/daemon/loop_test.go

**/*_test.go

📄 CodeRabbit inference engine (CLAUDE.md)

Build system must pass go test -v -race ./... without errors

Files:

internal/daemon/loop_test.go

🧬 Code graph analysis (1)

internal/daemon/loop_test.go (1)

internal/daemon/loop.go (1)

CheckAndNotifyWithNotifier (136-276)

🔇 Additional comments (2)

internal/daemon/loop.go (1)

252-258: LGTM! Rate-limiting on failure prevents infinite retry loops.

The change correctly marks the stateless agent as notified even when the notification fails. This ensures that subsequent notification cycles won't immediately retry, rate-limiting attempts to once per StatelessNotifyInterval (60 seconds). The approach is appropriate because:

If the window doesn't exist, repeated immediate retries are wasteful

The window might become available later, so retries after the interval are still attempted

If the mailbox is cleared (message read), retries stop entirely as there are no unread messages

The log message suffix "(after failure)" clearly distinguishes this path from successful notifications.

internal/daemon/loop_test.go (1)

1316-1376: LGTM! Test correctly validates the rate-limiting behavior on notification failure.

The updated test thoroughly verifies the new failure-handling semantics:

First call (lines 1341-1348): Notification fails, notifyAttempts = 1

Rate-limit check (lines 1350-1354): Confirms ShouldNotify returns false after failure

Immediate retry blocked (lines 1356-1363): Second call doesn't trigger notify due to rate-limiting

Interval elapsed (lines 1365-1366): Waits 60ms (> 50ms interval)

Retry allowed (lines 1368-1375): Third call succeeds, notifyAttempts = 2

The test interval (50ms) and sleep duration (60ms) provide adequate margin for timing reliability. The mock notifier correctly returns success on retry (notifyAttempts > 1), validating that recovery is possible after the rate-limit window.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Instead of letting notification fail for non-existent windows and then rate-limiting, check upfront if the tmux window exists. This provides cleaner logs (no "failed" errors for expected cases) and avoids unnecessary notification attempts. Added WindowCheckerFunc to make the check mockable in tests.

UserAd merged commit eb86928 into main Jan 15, 2026
2 checks passed

UserAd deleted the stuck-in-cycle branch January 15, 2026 05:08

UserAd mentioned this pull request Jan 15, 2026

fix: rate-limit window existence checks for non-existent windows #27

Merged

3 tasks

This was referenced Jan 15, 2026

fix: change notification flag to 60s debounce with 1h work protection #28

Merged

Install gocyclo and analyze code complexity #31

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: rate-limit notification retries for non-existent windows#26

fix: rate-limit notification retries for non-existent windows#26
UserAd merged 2 commits intomainfrom
stuck-in-cycle

UserAd commented Jan 15, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Jan 15, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

UserAd commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

coderabbitai bot commented Jan 15, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

UserAd commented Jan 15, 2026 •

edited

Loading