Background
`atomic_agents/_locks.py` uses `fcntl` filesystem locks (`AgentLock` writes a `.lock` file at the agent root and acquires it via `fcntl.flock`). Every agent call holds the lock for the duration of the run.
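For reference, a minimal sketch of that pattern. Only `AgentLock`, the `.lock` file at the agent root, and `fcntl.flock` come from the code; the constructor and method names are assumptions:

```python
import fcntl
import os


class AgentLock:
    """Sketch of the current lock: an exclusive flock on a .lock file
    at the agent root, held for the duration of the run."""

    def __init__(self, agent_root: str):
        self._path = os.path.join(agent_root, ".lock")
        self._fd: int | None = None

    def acquire(self) -> None:
        # O_CREAT: the first caller creates the lock file.
        self._fd = os.open(self._path, os.O_RDWR | os.O_CREAT)
        # LOCK_EX blocks until no other process on *this kernel* holds
        # the flock, which is exactly the guarantee NFS does not
        # reliably extend across hosts.
        fcntl.flock(self._fd, fcntl.LOCK_EX)

    def release(self) -> None:
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None
```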
This works perfectly on a single box. It breaks the moment you try:
- Multiple processes on different hosts (NFS doesn't reliably honor `fcntl`)
- Containerized deployments where the lock dir is shared but the kernel isn't
- Cloud Run / Lambda / serverless where filesystems are ephemeral
- Redis-backed scale-out where locks should be Redis advisory locks
This is the most urgent of the protocol-pattern abstractions — every other primitive has a single-box workaround, but locks are the cliff for multi-process deployments. Quote from internal scaling review (2026-05-08): "the one that actually breaks first if anyone tries to run atomic-agents on more than one box."
Why it matters
Tier 1 of the framework is single-tenant single-box. Tier 2 is multi-process or multi-host. Without a `LockBackend` protocol, Tier 2 is structurally impossible without forking the framework or replacing every lock site individually.
Concrete users blocked: Meridian wants to run atomic-agents-driven workflows on Cloud Run. Bishop's gizmo deployment wants to run multiple agents in parallel without race conditions on shared memory. Any future SaaS deployment.
What to change
Mirror the `MemoryBackend` pattern (#57):
- New module `atomic_agents/locks/` with `backend.py` (Protocol) and `filesystem.py` (default `FilesystemLockBackend` wrapping the current `fcntl` logic).
- `LockBackend` protocol exposes: `acquire(name, timeout)`, `release(handle)`, `is_held(name)`, and capability advertisement (single-host vs distributed); see the sketch after this list.
- Replace direct `AgentLock` instantiation in `agent.py`, `dream.py`, and any other lock site with `agent.lock_backend.acquire(...)`.
- Backend registry: `register_backend("filesystem", FilesystemLockBackend)`. Future `RedisLockBackend` and `PostgresAdvisoryLockBackend` plug in identically.
- Spec doc `docs/spec/21-lock-backend.md` describing the protocol + acceptable backends.
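A minimal sketch of how the protocol and registry could line up, mirroring the `MemoryBackend` shape. Everything beyond the operations named above (the `LockHandle` type, the registry internals) is an assumption, not settled API:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class LockHandle:
    """Opaque token returned by acquire() and passed back to release().
    The fields here are illustrative only."""
    name: str
    token: object


class LockBackend(Protocol):
    # Capability advertisement: lets callers refuse to run a
    # multi-host workload on a single-host backend.
    distributed: bool

    def acquire(self, name: str, timeout: float | None = None) -> LockHandle:
        """Block up to `timeout` seconds for the named lock;
        timeout=0 raises immediately if the lock is held."""
        ...

    def release(self, handle: LockHandle) -> None:
        ...

    def is_held(self, name: str) -> bool:
        ...


# Registry, mirroring the MemoryBackend pattern (#57).
_BACKENDS: dict[str, type[LockBackend]] = {}


def register_backend(name: str, backend: type[LockBackend]) -> None:
    _BACKENDS[name] = backend
```

Call sites would then go through the agent's configured backend, e.g. `handle = agent.lock_backend.acquire(name, timeout=30.0)`, instead of constructing `AgentLock` directly.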
Acceptance
- All existing dream/agent tests pass with `FilesystemLockBackend` as default.
- Protocol conformance test suite (~15 tests): `acquire` returns a handle, concurrent acquires of the same name block one, `release` releases, `acquire` with `timeout=0` raises if held, `is_held` reflects state, etc. Reusable for any future backend; sketched below.
- A Redis-shaped mock backend implements the protocol correctly to prove distributed-shaped locks fit the contract.
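A sketch of what a few of those conformance tests could look like as a pytest module. The import path, the constructor taking a lock directory, and the choice of `TimeoutError` are all assumptions:

```python
import pytest

from atomic_agents.locks.filesystem import FilesystemLockBackend  # assumed path


@pytest.fixture
def backend(tmp_path):
    # Assumed constructor: a directory to hold the .lock files. A future
    # Redis backend would join via fixture parametrization, which is what
    # makes the suite reusable across backends.
    return FilesystemLockBackend(tmp_path)


def test_acquire_returns_handle(backend):
    handle = backend.acquire("agent-a", timeout=1.0)
    assert handle is not None
    backend.release(handle)


def test_timeout_zero_raises_if_held(backend):
    handle = backend.acquire("agent-a", timeout=1.0)
    with pytest.raises(TimeoutError):  # exact exception type is an open choice
        backend.acquire("agent-a", timeout=0)
    backend.release(handle)


def test_is_held_reflects_state(backend):
    assert not backend.is_held("agent-a")
    handle = backend.acquire("agent-a", timeout=1.0)
    assert backend.is_held("agent-a")
    backend.release(handle)
    assert not backend.is_held("agent-a")
```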
Open questions for design
- Lock granularity: agent-level (current) vs note-level vs run-level. Memory backend's optimistic concurrency (`expected_content_sha256`) reduces some lock pressure; does that change the granularity story?
- Reentrancy: current `fcntl` lock is per-process; Redis locks would need explicit reentrancy. Protocol contract?
- Lease + heartbeat: Redis advisory locks need TTL + renewal. How does that surface in the protocol without leaking Redis-isms?
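One possible shape, purely a design sketch: an optional lease duration on `acquire` plus a backend-neutral `renew` heartbeat. Every name below is an assumption:

```python
from typing import Any, Protocol


class LeasedLockBackend(Protocol):
    """Hypothetical lease-aware variant of LockBackend. A filesystem
    backend can implement renew() as a no-op (the kernel holds the
    flock until release); a Redis backend maps it to extending the
    key's TTL. No Redis term appears in the API itself."""

    def acquire(self, name: str, timeout: float | None = None,
                lease_seconds: float | None = None) -> Any:
        # lease_seconds=None means "hold until release" where the
        # backend can guarantee it; lease-only backends may reject None.
        ...

    def renew(self, handle: Any, lease_seconds: float) -> None:
        # Heartbeat: extend the lease before it expires. Raises if the
        # lease already lapsed and the lock was lost.
        ...
```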
Context
- `MemoryBackend` from PR "refactor(memory): extract MemoryBackend protocol; FilesystemBackend default" (#57)