Conversation
Prevent() blocks requests while the circuit is closed, so emitting ErrShortCircuit is semantically wrong — it implies the circuit is open. Properly addressing this metric gap requires a new ErrPrevented method on the RunMetrics interface, which is a breaking change for a future major version.
cep21
added a commit
that referenced
this pull request
Feb 26, 2026
- fix: use CAS in openCircuit/close to emit Opened()/Closed() exactly once - fix: copy config in ConsecutiveErrOpenerFactory closure (race hygiene, matches PR #150) - fix: read Fallback.MaxConcurrentRequests once in throttle check (matches PR #150) - fix: use nil-safe props.now() in Opener.SetConfigNotThreadSafe - fix: guard NumBuckets==0 in RollingCounter.GetBuckets (fixes String() panic) - fix: nil-check pointer fields in RollingCounter.UnmarshalJSON - fix: MockClock.AfterFunc fires immediately for d<=0 (matches real time.AfterFunc) All fixes are backwards-compatible (panic->works, race->clean, or strictly additive guards). Regression tests added for each. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cep21
added a commit
that referenced
this pull request
Feb 27, 2026
* Fix 7 bugs: TOCTOU races, panics, and missed cases from PR #150 - fix: use CAS in openCircuit/close to emit Opened()/Closed() exactly once - fix: copy config in ConsecutiveErrOpenerFactory closure (race hygiene, matches PR #150) - fix: read Fallback.MaxConcurrentRequests once in throttle check (matches PR #150) - fix: use nil-safe props.now() in Opener.SetConfigNotThreadSafe - fix: guard NumBuckets==0 in RollingCounter.GetBuckets (fixes String() panic) - fix: nil-check pointer fields in RollingCounter.UnmarshalJSON - fix: MockClock.AfterFunc fires immediately for d<=0 (matches real time.AfterFunc) All fixes are backwards-compatible (panic->works, race->clean, or strictly additive guards). Regression tests added for each. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: UnmarshalJSON returns error on incomplete JSON instead of leaving inconsistent state The previous fix (nil-check each pointer field and skip) made zero-value receivers safe but created a new deferred-panic vector: on a pre-initialized counter, UnmarshalJSON({}) would set buckets=nil while leaving NumBuckets unchanged, causing GetBuckets to panic later with index out of range. UnmarshalJSON is a round-trip API — MarshalJSON always emits all fields. Incomplete JSON is corrupt/truncated input. Return an error (receiver unmodified) rather than silently partial-apply. Backwards-compatible vs released master: panic -> error (strictly better). Tests added: - IncompleteInput: 5 partial-JSON variants x 2 receiver states (all must error, receiver preserved) - RoundTrip: 3 counter states marshal+unmarshal cleanly (the BC guarantee) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * tests: fix 6 timing-based flakes Systematic audit for timing-sensitive tests that can flake on loaded CI runners. - TestCircuitAttemptsToReopenOnlyOnce: convert to MockClock for deterministic timing. The previous real-clock version with SleepWindow: 1ms flaked when CI scheduling delays exceeded 1ms (same pattern PR #141 fixed in the sister test TestCircuitAttemptsToReopen). This is the flake that caused the CI failure. - TestThrottled, TestFallbackCircuitConcurrency: replace time.Sleep-based concurrency with a barrier pattern. Wait until ALL goroutines have either entered the throttled section or been rejected before releasing — guarantees the over-limit goroutine attempts while slots are occupied, regardless of scheduling. Also fixed: errCount++ was not atomic. - TestRollingCounter_Inc: use StringAt(now)/RollingSumAt(now) instead of String()/RollingSum(). The latter use real time.Now() which can advance past the 10ms rolling window on a slow runner, rolling out buckets before later Inc(now) calls. - TestTickUntil: register AfterFunc callbacks BEFORE starting the TickUntil goroutine. If TickUntil advances mock time first, callbacks get scheduled relative to the advanced time, breaking the final m.Now() == now+3h assertion. - TestManagerConcurrentFactoryConfiguration: drop require.NoError on Execute. With 300 concurrent executions under -race and only a 2x sleep/timeout margin, CI jitter can cause spurious timeouts. The test is about concurrent factory creation, not execution timing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
cep21
added a commit
that referenced
this pull request
Feb 28, 2026
TestRollingCounterBucketRolloverRace had a rollingSum < 0 check inside the hot concurrent loop since PR #139/#150. This check was always wrong for the lock-free RollingCounter design (documented at rolling_bucket.go): Thread A (Inc): buckets[idx].Add(1) Thread B (rollover): Swap(0) returns the incremented value, rollingSum.Add(-toDec) — briefly over-decrements Thread A (Inc cont): rollingSum.Add(1) — corrects A reader between B and A's rollingSum write observes a transient negative. The value is correct once writers quiesce. Moved the < 0 check to after wg.Wait(). The rollingSum <= TotalSum check stays in the hot loop (safe: reads are correctly ordered).
cep21
added a commit
that referenced
this pull request
Feb 28, 2026
…efile (#152) (#152) * Add property/fuzz/invariant tests; fix 2 fuzz-found bugs; restore Makefile ## Test infrastructure - circuit_property_test.go: invariant-asserting stress tests + deterministic state-machine property test (testing/quick + MockClock + reference model). Verified catches the TOCTOU bugs from PR #151 by temporarily reverting CAS. - TestCircuit_RunMetricsExactlyOnce_Concurrent: verifies the exactly-once RunMetrics contract under 100K concurrent Execute calls - TestCircuit_TransitionAlternation_Concurrent: verifies Opened/Closed alternate correctly (closed ≤ opened ≤ closed+1) under contention - TestCircuit_FallbackMetricsExactlyOnce_Concurrent: same for fallback path - TestCircuit_StateMachineProperty: 500 random op sequences replayed against real circuit + trivial reference model, 6 invariants checked per step, fully deterministic via MockClock - faststats/fuzz_test.go: 5 Go-native fuzz targets + naiveRollingCounter differential oracle. Seeds run as unit tests in ~5ms. - FuzzRollingCounterOps: Inc/RollingSumAt/GetBuckets vs naive oracle - FuzzRollingCounterJSON: round-trip + hostile-input safety - FuzzSortedDurationsPercentile: bounds + monotonicity - FuzzRollingBucketAdvance: index bounds + LastAbsIndex monotonicity - FuzzTimedCheckJSON: round-trip + hostile-input safety - Tightened existing stress tests (were assertion-free, just t.Logf): - circuit_stress_test.go: TestCircuitStateTransitionRace now verifies Opened/Closed alternation + ConcurrentCommands balance - faststats/rolling_stress_test.go: deleted 2 permanent-skip tests, added real TestTimedCheckConcurrency, added RollingSum≤TotalSum check ## Bug fixes (found by fuzzers in ~10s) - faststats/rolling_percentile.go: Percentile(NaN) panicked — NaN falls through both p<=0 and p>=100 guards, int(math.Floor(NaN)) is INT64_MIN on amd64, index-out-of-range. Now returns -1. - faststats/rolling_counter.go: UnmarshalJSON accepted hostile JSON with NumBuckets > len(Buckets), then GetBuckets panicked. Now rejected at unmarshal time with a descriptive error. ## Build tooling - Makefile: restored after Jan 2024 deletion. make (fast dev loop), make ci (pre-PR gate mirroring GitHub Actions), make fuzz, make bench, make cover, make fix, make help. - .github/workflows/build.yml: Build/Test steps now call make build / make test-race (same commands, now DRY). - .github/workflows/fuzz.yml: daily scheduled fuzzing, 5min per target, uploads crash inputs as artifacts on failure. Manual dispatch enabled. - README.md: Development section points to make targets. Fixed dead Makefile link. - .gitignore: added /coverage.html (from make cover). * Fix pre-existing flake: RollingSumAt < 0 check must be post-quiescence TestRollingCounterBucketRolloverRace had a rollingSum < 0 check inside the hot concurrent loop since PR #139/#150. This check was always wrong for the lock-free RollingCounter design (documented at rolling_bucket.go): Thread A (Inc): buckets[idx].Add(1) Thread B (rollover): Swap(0) returns the incremented value, rollingSum.Add(-toDec) — briefly over-decrements Thread A (Inc cont): rollingSum.Add(1) — corrects A reader between B and A's rollingSum write observes a transient negative. The value is correct once writers quiesce. Moved the < 0 check to after wg.Wait(). The rollingSum <= TotalSum check stays in the hot loop (safe: reads are correctly ordered).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes 14 bugs found through deep analysis of the circuit breaker library, ranging from runtime panics and data races to incorrect metric output and unbounded recursion. All fixes are backwards-compatible and include regression tests. Benchmarks show no performance regression.
19 files changed, 332 insertions, 56 deletions
Bug Fixes
Panics & Crashes
close(nil)panic inwaitForErrors(gowrapper.go)When
skipCatchPanicsis true,panicResultsis nil. The deferredclose(panicResults)panicked. Added a nil guard.SimpleBadRequest.Error()panics ifErris nil (errors.go)Calling
.Error()onSimpleBadRequest{Err: nil}caused a nil pointer dereference. Now returns"bad request".MetricEventStream.Close()panics on double-close (metriceventstream/metriceventstream.go)Closing the event stream twice panicked on
close(m.closeChan). Guarded withsync.Once.RollingPercentile.AddDurationpanics on pre-start times (faststats/rolling_percentile.go)Adding a duration with a timestamp before
StartTimeproduced a negative index, causing an out-of-bounds panic. Addedidx < 0guard.Correctness
SortedDurations.Var()wrong percentile scale (faststats/rolling_percentile.go)Var()calledPercentile(.25)etc, butPercentile()expects[0-100], not[0-1]. This made p25/p50/p90/p99 in expvar output nearly identical to the minimum value. Fixed callers to use25,50,90,99.RollingBuckets.Advancebackward-time guard was dead code (faststats/rolling_bucket.go)The condition
indexDiff >= r.NumBucketswas always false whenindexDiff < 0. Fixed to-indexDiff >= r.NumBuckets.OpenCircuitusedtime.Now()instead ofc.now()(circuit.go)OpenCircuitignored the configuredTimeKeeper.Now, breaking mock-time testing. Changed toc.now().Double
now()call at timeout boundary (circuit.go)Two separate
now()calls (endTimeandrunFuncDoneTime) could straddle a timeout boundary, causing a spurious timeout even though the function completed in time. Collapsed to a single call.Double read of
MaxConcurrentRequestsin throttle check (circuit.go)MaxConcurrentRequests.Get()was called twice in the same condition, allowing a TOCTOU race where the value could change between reads. Read once into a local variable.MockClock.AfterFuncreturned nil (internal/clock/clock.go)Callers calling
.Stop()on the returned*time.Timerwould panic. Now returns a stopped real timer.Concurrency
Factory closures mutated shared captured
config(closers/hystrix/closer.go,closers/hystrix/opener.go)CloserFactoryandOpenerFactorycalledconfig.Merge()on the captured config value, mutating it across concurrent invocations. Fixed by copying config before merging.RollingBuckets.Advanceunbounded recursion under contention (faststats/rolling_bucket.go)Two recursive
return r.Advance(now, clearBucket)calls could stack overflow under high concurrency. Converted to an iterative loop.Code Quality (from review)
MockClock.AfterFuncdeadlock whend==0— calledf()while holdingm.mu. Released lock first.cancelledfield intimedCallbacks— never set to true. Removed along with the check.TestOpenCircuitUsesMockTime.What is NOT in this PR
Prevent()metric gap: WhenClosedToOpen.Prevent()blocks a request, no metric is emitted. An earlier version of this PR emittedErrShortCircuit, but this is semantically wrong —Preventoperates on a closed circuit, whileErrShortCircuitimplies the circuit is open. Properly fixing this requires addingErrPreventedto theRunMetricsinterface, which is a breaking change for a future major version.Testing
go test ./... -racepasses with no races detected