Skip to content

fix(tracker): add rate limiter to prevent event dispatch loops#2515

Merged
tusharmath merged 4 commits intomainfrom
posthog-error-diagnosis
Mar 11, 2026
Merged

fix(tracker): add rate limiter to prevent event dispatch loops#2515
tusharmath merged 4 commits intomainfrom
posthog-error-diagnosis

Conversation

@tusharmath
Copy link
Collaborator

@tusharmath tusharmath commented Mar 10, 2026

Summary

Add rate limiter to the event tracker to prevent runaway dispatch loops that can occur when stderr/stdout are closed, causing each write error to trigger another error event.

Context

The tracker can enter a runaway loop scenario when stdout/stderr becomes unavailable (e.g., when the output stream is closed). Each write error would trigger another error event, potentially creating an infinite loop of error events. This was identified as a potential issue during error handling improvements.

Changes

  • Added new RateLimiter module (rate_limit.rs) with a thread-safe, atomic-based rate limiter
  • Configured maximum of 1,000 events per minute to allow normal tracking while preventing loops
  • Integrated rate limiting check into the dispatch method - events exceeding the limit are silently dropped
  • Added unit tests for the rate limiter to verify behavior

Key Implementation Details

The rate limiter uses atomic operations for thread-safety without requiring a mutex:

  • Uses AtomicU64 for window start timestamp and AtomicUsize for event count
  • Resets the window and counter every 60 seconds using compare-exchange to avoid race conditions
  • Returns true if event is allowed, false if rate limit exceeded (event is dropped)

The 1,000 events/minute limit was chosen to:

  • Allow normal tracking for long-running sessions
  • Prevent runaway loops from overwhelming the system
  • Be generous enough for legitimate high-frequency events

Use Cases

  • Long-running CLI sessions with continuous event tracking
  • Preventing infinite error loops when stdout/stderr becomes unavailable
  • High-frequency operation monitoring without system overload

Testing

# Run the tracker tests
cargo test -p forge_tracker

# Run just the rate limiter tests
cargo test -p forge_tracker rate_limit

Links

  • Related issue: (add issue number if applicable)

@github-actions github-actions bot added type: feature Brand new functionality, features, pages, workflows, endpoints, etc. type: fix Iterations on existing features or infrastructure. labels Mar 10, 2026
@tusharmath tusharmath changed the title feat(tracker): add rate limiter to prevent event dispatch loops fix(tracker): add rate limiter to prevent event dispatch loops Mar 10, 2026
@tusharmath tusharmath enabled auto-merge (squash) March 10, 2026 11:51
Comment on lines +26 to +45
pub fn check(&self) -> bool {
let now = Utc::now().timestamp() as u64;
let window_start = self.window_start.load(Ordering::Relaxed);

// If a minute has passed, reset the window and counter
if now.saturating_sub(window_start) >= 60 {
// We use compare_exchange to avoid race conditions when multiple threads try to
// reset
if self
.window_start
.compare_exchange(window_start, now, Ordering::SeqCst, Ordering::Relaxed)
.is_ok()
{
self.count.store(0, Ordering::SeqCst);
}
}

// Increment and check the rate limit
self.count.fetch_add(1, Ordering::Relaxed) < self.max_per_minute
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical race condition: The window reset is not atomic with the count reset. After window_start is updated via compare_exchange (line 36), but before count.store(0) executes (line 39), other threads may see the new window_start value and skip the reset check (line 31), then increment the old (un-reset) count value (line 44). This allows events beyond the rate limit.

Scenario:

  1. Thread A: CAS succeeds, sets window_start to new time
  2. Thread A: Gets preempted before executing count.store(0)
  3. Thread B: Loads new window_start, sees no reset needed (time diff < 60)
  4. Thread B: Increments old count value (e.g., 1005) via fetch_add
  5. Thread A: Resumes and resets count to 0
  6. Result: Events from the old window weren't properly limited

Fix: Use a single atomic operation or add a generation counter:

let current_count = self.count.fetch_add(1, Ordering::Relaxed);
if now.saturating_sub(window_start) >= 60 && current_count >= self.max_per_minute {
    if self.window_start.compare_exchange(window_start, now, Ordering::SeqCst, Ordering::Relaxed).is_ok() {
        self.count.store(1, Ordering::SeqCst);
        return true;
    }
}
current_count < self.max_per_minute

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

tusharmath and others added 2 commits March 11, 2026 15:26
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
@tusharmath tusharmath merged commit 884fe78 into main Mar 11, 2026
9 checks passed
@tusharmath tusharmath deleted the posthog-error-diagnosis branch March 11, 2026 10:01
@tusharmath tusharmath removed the type: feature Brand new functionality, features, pages, workflows, endpoints, etc. label Mar 11, 2026
tusharmath added a commit that referenced this pull request Mar 13, 2026
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: fix Iterations on existing features or infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant