Skip to content

Simplify posixage lock acquisition API #525

@Benehiko

Description

@Benehiko

Problem

The shape of the API conflates two unrelated concepts:

  1. LockTimeout is a wait duration — exactly what context.Context
    exists for. Callers already pass a ctx to Save, Get, etc. Adding
    WithLockTimeout on top means callers now have two ways to express
    "how long am I willing to wait" — one via the ctx they pass per-call,
    and one via a store-level option configured at construction. This
    becomes ambiguous when both are set (the implementation picks
    whichever fires first), and it forces callers to think about lock
    waits separately from operation deadlines.

  2. StaleLockTimeout is a file-age threshold, not a wait duration —
    it controls when a .posixage.lock file is considered abandoned and
    may be removed. It's a semantically distinct concept that genuinely
    belongs as a store-level option.

Bundling both into a flock.Config struct passed to every TryLock call
obscures this distinction and adds API surface (a struct, two options,
two new validation paths) for capability that is already expressible via
context.Context.

Proposal

Drop WithLockTimeout and the flock.Config struct entirely. Lock
acquisition retries until the caller context is canceled — callers that
want a bounded wait pass context.WithTimeout(ctx, …); callers that want
no timeout (the disable-timeout use case the PR was motivated by) pass a
context without a deadline.

Keep WithStaleLockTimeout as the only locking knob, since it controls
a file-age threshold rather than a wait duration. The single-attempt
stale-recovery trigger moves from "after LockTimeout expires" to "after
the first failed acquisition attempt" — semantically the same, but no
longer dependent on a separately-configured wait duration.

Resulting API surface

// flock package
func TryLock(ctx context.Context, root *os.Root, staleAfter time.Duration) (UnlockFunc, error)
func TryRLock(ctx context.Context, root *os.Root, staleAfter time.Duration) (UnlockFunc, error)

// posixage store
posixage.WithStaleLockTimeout(d time.Duration) Options  // unchanged in spirit
// posixage.WithLockTimeout — removed

Trade-off

Callers lose the ability to set a short lock-retry while keeping a long
overall operation deadline (e.g. "this Save can run for 30s but fail
the lock acquisition after 100ms"). This isn't a real use case in
practice: if the lock is held by a live process for longer than 100ms,
stale recovery won't help (the file isn't actually stale), so the caller
would just fail. The previous 100ms default was effectively a fail-fast
heuristic, which the new first-attempt-then-recover flow expresses more
directly.

Why this matters now

Doing this refactor before #521 merges (or as a follow-up before it
ships externally) avoids landing an API we already know we want to
remove. WithLockTimeout would become a deprecation-and-removal cycle
otherwise.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No fields configured for Task.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions