Skip to content

refactor: drain as runFunc adapter (v0.2.0)#5

Merged
klaidliadon merged 14 commits into
feat/drain-and-tickerfrom
refactor/runfunc-adapter
May 12, 2026
Merged

refactor: drain as runFunc adapter (v0.2.0)#5
klaidliadon merged 14 commits into
feat/drain-and-tickerfrom
refactor/runfunc-adapter

Conversation

@klaidliadon
Copy link
Copy Markdown
Contributor

Per marino39's PR #4 review, drain belongs in a runFunc adapter, not as an Option coupled into core lifecycle. This refactor moves drain and ticker into a new `adapters/` subpackage. Core `runnable.go` returns to its pre-v0.1 single-purpose `Stop`. Hard cut — no deprecated shims. v0.2.0.


Summary

  • New `runnable/adapters` subpackage: `Draining`, `Ticker`, `Stopping`, `ErrDrainTimedOut`.
  • Core `runnable` package strips drain fields (3) and the drain logic from `Stop` (~50 LoC). Keeps the `runCancel` snapshot fix from PR feat: drain-on-Stop and NewTicker primitive #4 as memory-model defense.
  • `with_retry.go` loses its `Stopping(ctx)` check — drain is invisible to it now.
  • `NewGroup` + `Draining` works by construction. New test verifies (`TestNewGroup_DrainEnabledChild`).
  • SIGTERM-via-ctx no longer bypasses drain. Example passes `signal.NotifyContext` directly to `Run` again.

Behavioral change

`Stop(ctx)`'s ctx is no longer a drain budget. v0.1: caller's short ctx aborted drain mid-flight. v0.2: caller's ctx only governs how long they wait for `Stop` to return; drain runs on its own fixed timer.

Test plan

  • `go test -race ./...` — all green
  • `go test -race -count=10 ./adapters/...` — stable
  • `go vet ./...` — clean
  • New `TestNewGroup_DrainEnabledChild` — proves the v0.1 NewGroup gap is closed

Spec & plan

  • Spec: `tmp/runfunc-refactor/2026-05-02-design.md` (untracked, in branch tree)
  • Plan: `tmp/runfunc-refactor/2026-05-02-plan.md` (untracked, in branch tree)

WithDrain is now an adapter (adapters.Draining); core lifecycle has no drain knowledge.
Functionality moved to adapters.Draining and adapters.Ticker.
sigCtx can now be passed directly to Run; Draining intercepts cancellation.
- runnable.go: comment on runCancel snapshot was describing v0.1's
  failure mode (calling runCancel after waiting for drain). The actual
  defense in v0.2 is memory-model: Run overwrites the field across
  cycles, so reading it without synchronization races. Reword.
- README: drop the contrived `<-time.After` case from the Draining
  example body and pull the "always select on both" note out as prose.
  The full pattern lives in examples/ticker-with-drain.
Copy link
Copy Markdown
Contributor

@marino39 marino39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three review items worth addressing before merge. Order is severity.

1. Panic in Draining's work goroutine crashes the program (regression vs v0.1)

WithRecoverer (with_recoverer.go:35-63) installs defer recover() around r.runFunc. In the canonical composition the example demonstrates:

runnable.New(
    adapters.Draining(10*time.Second,           // runFunc, wrapped by WithRecoverer
        adapters.Ticker(2*time.Second, reconcile),
    ),
    runnable.WithRecoverer(...),
)

the recover runs on the goroutine that calls r.runFunc(ctx) (runnable.go:111). But Draining spawns a second goroutine at adapters/draining.go:38:

go func() { done <- work(workCtx) }()

A panic inside work (the Ticker, the user's tick body, anything below Draining) happens on that goroutine. Go's recover only fires on the goroutine where the deferred recover lives. The runtime crashes the whole process.

This is a behavioral regression: on master, WithRecoverer caught tick panics because there was no extra goroutine between the recover and the work. It also contradicts the new example at examples/ticker-with-drain/main.go:50-55, which composes exactly this shape and visibly markets WithRecoverer as catching tick panics — it won't.

Fix (smallest diff): recover inside Draining's goroutine and surface the panic as an error.

go func() {
    defer func() {
        if rec := recover(); rec != nil {
            done <- fmt.Errorf("adapters: panic in draining work: %v\n%s", rec, debug.Stack())
        }
    }()
    done <- work(workCtx)
}()

Trade-off: WithRecoverer's Report / StackPrinter hooks don't fire — the panic becomes an error returned from Run. Stack is captured in the error string.

Cleaner long-term: a Recovering runFunc adapter so users can write Draining(t, Recovering(reporter, Ticker(...))), with WithRecoverer eventually deprecated in favor of it. This is consistent with PROPOSAL.md's open question about moving WithRetry/WithRecoverer to adapters — and arguably should be done as a follow-up, not a blocker.

For this PR, at minimum: recover-as-error inside Draining and update the example so users aren't misled.


2. TestTicker_FiresOnInterval is wall-clock flaky (adapters/ticker_test.go:17-33)

Sleeps 175ms with a 50ms ticker, then asserts 2 ≤ count ≤ 4. The lower bound is reasonable. The upper bound is fragile:

  • count is read after Stop returns (line 30). Stop races t.C — if a tick lands between time.Sleep returning and the inner select observing ctx.Done, count goes to 4. On a loaded CI runner with two queued ticks, count can hit 5.
  • time.Sleep itself is unbounded in practice — nanosleep can wake significantly late under load.

Passed -count=20 locally; will eventually go red on a busy GitHub Actions runner.

Fix: count signals via channel instead of asserting wall-clock arithmetic.

ticks := make(chan struct{}, 8)
tick := func(ctx context.Context) error {
    select { case ticks <- struct{}{}: default: }
    return nil
}
r := runnable.New(adapters.Ticker(20*time.Millisecond, tick))
go func() { _ = r.Run(context.Background()) }()
for i := 0; i < 3; i++ { <-ticks }   // wait until we've seen 3 ticks
require.NoError(t, r.Stop(context.Background()))

Pins the behavioral claim ("Ticker fires repeatedly on interval") without depending on wall-clock timing.


3. README's Runnable Function with Stop example teaches a bad pattern (README.md:38-69)

for {
    select {
    case <-ctx.Done():
        return nil
    default:
    }
    time.Sleep(1 * time.Second)
    fmt.Println("Running...")
}

The default: arm means the select returns immediately on every iteration, then sits in time.Sleep for a full second with no responsiveness to ctx cancellation. When Stop fires, the work doesn't observe cancellation until the in-flight sleep completes — up to a full second of avoidable shutdown latency. The example "works" only because Stop's ctx is Background.

This is exactly the pattern adapters.Ticker is built to replace cleanly, and it contradicts the README's own narrative about responsive shutdown via Draining / Stopping.

Fix: make the sleep cancellable.

for {
    select {
    case <-ctx.Done():
        return nil
    case <-time.After(time.Second):
    }
    fmt.Println("Running...")
}

Same fix applies to Runnable Function with timeout (README.md:79-87), which has the identical pattern.

Even better for didactic flow: replace this example with the same shape using adapters.Ticker — drives readers toward the new API on page one.


Priority if only some land in this PR: 1 is the only one that's a real regression. 2 and 3 are fair as follow-ups.

Panics in work ran on Draining's spawned goroutine, where the outer
runnable.WithRecoverer's defer cannot reach them — recover only fires
on the goroutine where the deferred call lives. The runtime would
crash the whole process. The example explicitly composed
Draining + Ticker + WithRecoverer and implied recovery worked; it
didn't.

Recover inside Draining's goroutine and surface the panic as an error
containing the panic value and stack trace. Update the example to
drop WithRecoverer (no longer load-bearing in this composition) and
document the trade-off: WithRecoverer's Report/StackPrinter hooks do
not fire; tick panics arrive as errors on runErr.

A future Recovering adapter could give back the rich hooks for users
who want them; deferred to a follow-up.

Adds TestDraining_RecoversPanicAsError — fails on the unfixed code
(work goroutine panic crashes the test process).
Old form asserted 2 <= count <= 4 after 175ms of wall sleep with a
50ms ticker. The upper bound flaked under load — Stop races t.C and
loaded CI runners can queue extra ticks. New form counts signals via
a channel until 3 ticks are observed, then stops. Pins the behavioral
claim ("fires repeatedly on interval") without depending on wall-clock
arithmetic.
Two pre-existing examples used `select { case <-ctx.Done(): ...; default: }`
followed by `time.Sleep(1s)`, which means Stop has to wait up to a
full second for the in-flight sleep before the loop observes
cancellation. The pattern directly contradicts the new "Adapters"
section's narrative about responsive shutdown via Draining/Stopping.

Replace with `case <-time.After(time.Second)` in the select itself so
cancellation is immediate. Also captures+defers the cancel from
context.WithTimeout in the timeout example (the discarded form was a
context-leak warning from go vet on master).
@klaidliadon
Copy link
Copy Markdown
Contributor Author

@marino39 thanks — all three addressed.

  • [Feedback #1] Panic in Draining's goroutine — Addressed in 3f29830. Recover inside the spawned goroutine, surface as a formatted error with %v on the panic value and debug.Stack() appended. WithRecoverer dropped from the example since it doesn't load-bear in this composition; doc comment on Draining + the example both call out the trade-off explicitly. A Recovering adapter remains a clean follow-up if anyone wants the rich Sentry-style hooks back.
  • [Implement WithPanicsAsErrors #2] TestTicker_FiresOnInterval flakiness — Addressed in a65853b. Counts ticks on a buffered channel and waits for 3 observations; no wall-clock arithmetic remaining. Survives -count=10 -race.
  • [recoverer - report panics in runnable functions #3] Legacy README examples — Addressed in e24dcb4. Both Runnable Function with Stop and Runnable Function with timeout use case <-time.After(time.Second): so cancellation is immediate. Picked up the pre-existing govet lostcancel diagnostic on the timeout example as a side effect (captured + deferred the cancel func) — disclosed in the commit message.

Skipped your larger suggestion to rewrite the legacy README examples in terms of adapters.Ticker — the plain Runnable form is useful pedagogically before introducing adapters. Happy to take it on if you'd prefer.

@klaidliadon klaidliadon merged commit 92b3741 into feat/drain-and-ticker May 12, 2026
@klaidliadon klaidliadon deleted the refactor/runfunc-adapter branch May 12, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants