
Lockstep


Coyote-style controlled concurrency testing for the BEAM.

You write what looks like a regular ExUnit test. Lockstep runs it many times, each time picking a different message-passing schedule via a configurable strategy. When it finds a bug, it prints the schedule that caused it and saves it to disk so you can reproduce.


Installation

Add to your mix.exs:

def deps do
  [
    {:lockstep, "~> 0.1.0", only: :test}
  ]
end

Status: 0.1.0, initial release. The surface documented here is stable for now, but expect breaking changes as we tighten things in response to real-world usage. SemVer applies starting at 1.0.

Using Lockstep with AI tools

Lockstep ships with first-class AI integration:

  • AGENTS.md — guide for AI coding agents (Claude Code, Cursor, Copilot) to invoke Lockstep via the CLI. Documents the input/output contract for mix test / mix lockstep.replay / mix lockstep.shrink.
  • skill/lockstep/SKILL.md — Anthropic Skill format. Loadable by the Anthropic Agent SDK so agents understand when and how to use Lockstep.
  • mcp_server/ — Model Context Protocol server. Wraps mix commands as MCP tools (lockstep_test, lockstep_replay, lockstep_shrink, lockstep_dump_trace). Plug into Claude Code via claude_desktop_config.json.

The killer agentic workflow:

  1. Agent reads code, hypothesizes a race
  2. Agent generates a Lockstep test file
  3. Agent calls lockstep_test (MCP) — Lockstep runs N iterations
  4. If a bug is found, agent reads the trace via lockstep_replay
  5. Agent generates a fix
  6. Agent re-runs lockstep_test with the same seed to verify

Need this on your system?

If you maintain a high-stakes BEAM codebase — a payment processor, a distributed coordinator, a high-throughput pipeline, a custom GenServer that must be linearizable — and you want a thorough concurrency audit rather than a one-off bug report, I take on this work directly.

That includes: applying Lockstep to your code, writing custom test scaffolds and models, running multi-node race-hunting campaigns, hardening hot paths, and delivering a written report with reproducible counterexamples and fixes.

baris@erdem.dev

Why bother

Concrete numbers from test/value_prop/lost_update_comparison_test.exs running the same lost-update race two ways: ordinary GenServer.call in a tight loop (200 runs) vs Lockstep.GenServer.call under PCT (5 seeds × up to 200 iterations).

|  | Vanilla GenServer.call × 200 | Lockstep PCT × 5 seeds |
| --- | --- | --- |
| Bug detected | 198/200 runs (99%) | 5/5 seeds (100%) |
| Time to first detection | ~3.6 ms wall-clock | iterations 1, 2, 2, 4, 15 |
| Failing schedule recorded | No | Yes (.lockstep file) |
| Failing schedule prints in human form | No | Yes (alias-mapped trace) |
| Same input → same failing schedule | No | Yes (same seed → same iteration → identical trace) |
| Replay on demand | No | Yes (Lockstep.Replay.run/2) |

Vanilla testing finds this bug just fine — running the body in a tight loop catches it 99% of the time. What it can't do is hand you a specific failing schedule that you can stare at, save, debug, and replay. Each failing run is an opaque "the counter ended at 1 instead of 2" with no record of which interleaving caused it.

Lockstep's value isn't finding races — multi-core BEAM finds them too, eventually. It's giving you a reproducible handle to a specific failing schedule:

  • Same seed → same iteration → byte-identical trace. Three runs of the lost-update test with seed: 0xC0DE_C0FFEE all find the bug at iteration 2 with structurally identical traces.
  • Trace prints as a human-readable schedule with aliased pids (P0, P1, ...), step numbers, and a <-- FAILED HERE marker.
  • Replay re-fires the bug through Lockstep.Replay.run/2 so you can attach a debugger / add IO.inspect / step through.

Vanilla assert_raise in a loop tells you the test is flaky. Lockstep tells you which interleaving makes it flaky.

Which strategy?

Lockstep ships five strategies (:random, :pct, :fair_pct, :pos, :idpct) plus :replay for trace-driven reruns. They behave noticeably differently as race depth grows. From test/value_prop/strategy_comparison_test.exs on a 3-subscribe + 1-publish ordering race, 10 seeds × up to 100 iterations each:

| strategy | bug found | min iter | median | mean | max |
| --- | --- | --- | --- | --- | --- |
| :random | 10/10 | 1 | 3.0 | 3.5 | 7 |
| :pct | 10/10 | 1 | 1.0 | 1.4 | 2 |
| :fair_pct | 10/10 | 1 | 1.0 | 1.4 | 2 |
| :pos | 10/10 | 1 | 1.5 | 1.8 | 3 |

Lower iteration counts mean the strategy steered toward the bug faster. PCT and Fair-PCT find this race in 1-2 iterations every time; random takes ~3.5 on average. POS sits between them. On shallower races (the depth-2 lost-update demo) all four strategies tie at iteration 1 — the differentiation only shows up once the race has real depth.

Default is :pct because it gives the most consistent low-iteration finds across both shallow and deep races. Switch to :fair_pct if your code has spin loops where pure PCT can starve a low-priority process; switch to :pos if PCT plateaus on a particular bug shape.
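
Switching is a one-line change in the case template (a sketch; the iteration count is illustrative):

# Spin loops starving under pure PCT? Switch the module to Fair-PCT:
use Lockstep.Test, iterations: 500, strategy: :fair_pct
# ...or to POS when PCT plateaus on a particular bug shape:
# use Lockstep.Test, iterations: 500, strategy: :pos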

What's shipping

Engine

  • Lockstep.Test — ExUnit case template with a ctest/3 macro that runs the body N times.
  • A user-level scheduler (Lockstep.Controller) that intercepts every Lockstep sync point and decides who runs next.
  • Five strategies: :random, :pct, :fair_pct, :pos, :idpct.
    • PCT is Burckhardt et al., ASPLOS 2010 — bug-depth probabilistic guarantees.
    • Fair-PCT is Coyote's hybrid: PCT then random, for liveness.
    • POS is Yuan et al., OOPSLA 2025 (Fray) — re-randomizes priorities each step.
    • IDPCT is iterative-deepening PCT — cycles depth in a configured range per iteration.
  • Trace-driven replay: :replay strategy + Lockstep.Replay.run/2.

Test surface

  • Selective receive via Lockstep.recv_first/1 — predicate-based pattern matching over the controller-side mailbox.
  • Lockstep.GenServer — start_link/2 / call/2 / cast/2 / stop/2 wrapper for ordinary GenServer-style modules. Returns {:ok, pid} so OTP-shaped patterns ({:ok, srv} = ...) work identically after rewriting.
  • Lockstep.GenStatem — minimal :gen_statem-shaped wrapper (handle_event_function mode + :reply actions).
  • Lockstep.Task — async/1 / await/1 / await_many/1.
  • Virtual time — Lockstep.now/0, Lockstep.send_after/3, Lockstep.cancel_timer/1. Time advances only when no managed process is ready and a timer is the only way forward.
  • Process monitors — Lockstep.monitor/1, Lockstep.demonitor/2, with :DOWN messages delivered through the controller's mailbox so monitor-based crash detection (NimblePool, Phoenix.PubSub, supervisor patterns) works under controlled scheduling. Monitored crashes don't trigger the child-crash bug verdict — the monitor is treated as "I expected this death."
  • Links + :trap_exit — Lockstep.spawn_link/1, Lockstep.link/1, Lockstep.unlink/1, Lockstep.flag/2. The controller maintains the link graph and propagates abnormal exits with cycle detection: non-trapping linkers cascade-die, trapping linkers receive {:EXIT, dying_pid, reason}, normal exits don't propagate. Lets Lockstep run real OTP code that depends on link semantics (NimblePool's worker monitoring, Task supervision, GenServer trees).
  • Process.alive? — Lockstep.alive?/1 consults the controller's view; the call is itself a sync point so the classic if Process.alive?(pid), do: GenServer.call(pid, ...) TOCTOU race surfaces (see test/examples/toctou_alive_test.exs — Lockstep finds it on iteration 3 under PCT).
  • Registry — Lockstep.Registry with :unique / :duplicate keys, monitor-driven automatic cleanup on registering-process death, dispatch/3 broadcast, and the full :via callback contract so Lockstep.GenServer.start_link(M, A, name: {:via, Lockstep.Registry, {reg, :foo}}) works. Most-common-subset implementation; partitioning, listeners, and match/4 patterns aren't modeled.
  • Supervisor — Lockstep.Supervisor, :one_for_one strategy, built on spawn_link + trap_exit. Restart options :permanent, :transient, :temporary. Enforces max_restarts / max_seconds intensity using Lockstep's virtual clock so restart-loop limits are reproducible across replays.
  • Compile-time linter (Lockstep.Linter) — warns on bare send/receive/spawn, plain GenServer.call / Task.async / Process.send_after, and ExUnit assert_receive/refute_receive inside ctest bodies.
  • Compile-time rewriter (Lockstep.Rewriter) — automatically maps vanilla GenServer.call, Task.async, send, spawn, Process.send_after, and receive blocks to their Lockstep.* equivalents inside a test body. Lets you write vanilla-OTP-shaped tests and still get controlled scheduling. Enable per-module via use Lockstep.Test, rewrite: true, or per-test via Lockstep.Test.vanilla_run/2.
  • Mix compiler (Mix.Tasks.Compile.LockstepRewrite) — applies the same rewriter at the project level: every lib/**/*.ex is rewritten to a build directory before the standard Elixir compiler runs. Lets you point Lockstep at unmodified production modules in your own application. Opt in via env-specific compilers: / elixirc_paths: settings (see "Project-level rewriting" below).
  • Erlang AST rewriter (Lockstep.ErlangRewriter) — same vanilla-OTP-routing trick, but on .erl source. Handles gen_server:call, erlang:spawn, Pid ! Msg, bare receive, process_flag(:trap_exit, ...), the bare BIF auto-imports (spawn(F)), and 4-arg gen_server:start_link({local, Name}, ...) atom registration. Lets us pull in foundational Erlang libraries whose gen_server machinery is the actual concurrency surface (:pg, supervisors, gen_statem, :ssl's session manager, etc.). Demonstrated on OTP's pg.erl (~720 LOC of pure Erlang, the foundation of Phoenix.PubSub) — compiles through the rewriter and runs under Lockstep's controlled scheduling.
  • Lockstep.Agent — drop-in for OTP Agent. Without it, Agent.get/3-4 bypasses Lockstep's controller entirely (it's a bare :gen_server.call); with it, every agent operation is a sync point. Necessary for finding races in Agent-backed code.
  • Lockstep.Task / Lockstep.Task.Supervisor — extended to cover Task.async_stream/3,4 (honouring :max_concurrency), Task.async/3 (MFA form), and Task.Supervisor.{start_link, async, start_child, async_stream}.
  • NIF-backed shared state (Lockstep.ETS, Lockstep.Atomics, Lockstep.PersistentTerm) — ETS / :atomics / :persistent_term ops are individually atomic but compose racefully (the classic lookup; insert(v+1) lost-update). Each wrapper inserts a sync point before delegating to the underlying NIF, so the strategy can interleave between calls. The rewriter routes :ets.*, :atomics.*, :persistent_term.* calls through these wrappers. Outside a Lockstep test (no controller in scope) the wrappers fall through to the underlying NIF — same code runs identically in production. A condensed sketch follows this list.
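
A condensed sketch of the raceful composition these wrappers make schedulable, assuming Lockstep.ETS mirrors the :ets call names (new/2, insert/2, lookup/2) — the real demos drive the rewriter instead:

table = Lockstep.ETS.new(:counters, [:set, :public])
Lockstep.ETS.insert(table, {:hits, 0})

for _ <- 1..2 do
  Lockstep.spawn(fn ->
    # Each op is individually atomic, but the pair composes racefully:
    [{:hits, n}] = Lockstep.ETS.lookup(table, :hits)   # sync point: read
    Lockstep.ETS.insert(table, {:hits, n + 1})         # sync point: write
  end)
end
# Some schedules interleave both reads before either write — a lost update.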

Tooling

  • Trace capture as :erlang.term_to_binary to .lockstep files.
  • mix lockstep.replay --trace path — print the saved schedule + repro recipe.
  • mix lockstep.replay --trace path --run "Mod.fn" --file path.exs — re-run a specific test body under the recorded schedule.
  • mix lockstep.dump_trace --trace path — raw dump for tooling.
  • Deterministic seeds: same top-level seed reproduces the same iterations.

What's not shipping yet

  • No DPOR / exhaustive search. Lockstep is a randomized tester, not a verifier. Concuerror integration is on the roadmap.
  • No distributed Erlang. Single-node only.
  • No async: true ExUnit modules. use Lockstep.Test forces async: false.
  • No live iteration progress. A long-running iterations: 10_000 run is silent until it either finishes or finds a bug.
  • Lockstep.GenStatem is handle_event_function-mode only; :state_functions mode and built-in timeout actions aren't wrapped.

Quick start

# mix.exs
{:lockstep, "~> 0.1.0", only: :test}

Write a controlled test:

defmodule MyApp.RaceTest do
  use Lockstep.Test, iterations: 200, strategy: :pct

  ctest "lost update on shared counter" do
    parent = self()
    state = Lockstep.spawn(fn -> stateful_loop(0) end)

    for _ <- 1..2 do
      Lockstep.spawn(fn ->
        Lockstep.send(state, {:get, self()})
        v = Lockstep.recv()
        Lockstep.send(state, {:put, v + 1})
        Lockstep.send(parent, :done)
      end)
    end

    for _ <- 1..2, do: Lockstep.recv()

    Lockstep.send(state, {:get, self()})
    assert Lockstep.recv() == 2
  end

  defp stateful_loop(n) do
    case Lockstep.recv() do
      {:get, from} -> Lockstep.send(from, n); stateful_loop(n)
      {:put, x}    -> stateful_loop(x)
    end
  end
end

Run it:

mix test test/my_app/race_test.exs

Expected output on a buggy schedule:

Lockstep found a concurrency bug on iteration 2.

  seed:        828370911214
  iter seed:   1417509209
  strategy:    :pct
  trace path:  traces/counter_race-iter2-seed828370911214.lockstep

Reason:
  assertion failed: lost update detected; counter is 1

Processes:
  P0(root) = #PID<0.186.0>
  P1 = #PID<0.187.0>
  ...

Schedule:
  step    6  send    P3 -> P1  {:get, ...}
  step    8  send    P2 -> P1  {:get, ...}
  step   11  recv    P3  0
  step   18  recv    P2  0    # both workers read the same value!
  ...

Replay with:
    mix lockstep.replay --trace traces/counter_race-iter2-seed828370911214.lockstep

Showcase: bug classes Lockstep finds

The test/examples/ directory contains worked demos. Each is a side-by-side of a buggy implementation that Lockstep flags within tens of iterations and a fixed implementation that it explores at length without finding a problem.

Atomicity violation: lost-uniqueness in a job queue

test/examples/unique_jobs_test.exs. A textbook bug from the Elixir job-queue ecosystem: naive submit_unique does a lookup then an insert, which under concurrency lets two callers both observe the key as missing and both insert. DB-backed queues have hit this exact race historically; the fix is always atomic upsert / INSERT ... ON CONFLICT.

# Buggy: split read+write — Lockstep finds duplicate submissions on iter ~15.
case Lockstep.GenServer.call(db, {:lookup, key}) do
  nil -> Lockstep.GenServer.call(db, {:insert, key, val}); :submitted
  _   -> :already_exists
end

# Fixed: single atomic call serialized by the GenServer — 100/100 iters pass.
Lockstep.GenServer.call(db, {:put_new, key, val})

The trace pinpoints the failure: both workers call :lookup (steps 6 and 8), both receive nil (steps 11 and 15), both call :insert.

Atomicity violation: read-modify-write on shared state

test/examples/genserver_race_test.exs. Two workers each do Counter.get then Counter.put(val + 1). With N=2 workers expecting the counter to reach 2, some interleavings let both reads happen first, both writes set the counter to 1, and one update is lost. PCT finds it on iteration 4.

The control test in the same file uses cast({:add, 1}) instead — the GenServer serializes the read-modify-write — and Lockstep proves it race-free across all explored schedules.
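
The fixed shape, sketched with the Lockstep wrapper (the Counter module with a handle_cast({:add, 1}) clause appears under "GenServer wrapper" below):

{:ok, counter} = Lockstep.GenServer.start_link(Counter, 0)

# Fixed: the read-modify-write happens inside the handle_cast callback,
# so no interleaving can split it.
for _ <- 1..2 do
  Lockstep.spawn(fn -> Lockstep.GenServer.cast(counter, {:add, 1}) end)
end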

Deadlock: circular GenServer calls

test/examples/circular_call_deadlock_test.exs. Two Echo GenServers wired up as peers. Each one's handle_call(:echo, ...) synchronously delegates to its peer. Two concurrent callers — one to A, one to B — and both servers end up mid-handle_call waiting for the other to reply. Classic A-waits-for-B-waits-for-A.

Lockstep's check_progress step notices that every alive process is blocked on recv_match with no pending timer that could unblock anyone, and aborts the iteration with :lockstep_deadlock. The trace shows the exact sequence of gen_calls that closed the cycle.

This bug class is hard to catch with property-based testing because it needs a specific interleaving (both callers in flight before either peer-call returns) — Lockstep's PCT strategy finds it on iteration 1.
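
The cycle in miniature — a sketch of the shape, not the actual Echo module from the test file (the peer field is an assumed state layout):

# Each server's handle_call synchronously calls its peer. With one caller in
# flight to A and one to B, both servers block mid-handle_call forever.
def handle_call(:echo, _from, %{peer: peer} = state) do
  reply = Lockstep.GenServer.call(peer, :echo)   # blocks while the peer blocks on us
  {:reply, reply, state}
end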

Auto-rewrite: vanilla OTP code, controlled scheduling

test/examples/auto_rewrite_demo_test.exs. Same lost-update race, written entirely with vanilla GenServer.call, Task.async, Task.await_many — no Lockstep.* calls in the test body. The Lockstep.Test.vanilla_run/2 macro walks the body's AST and rewrites every recognized OTP call to its Lockstep equivalent before handing it to the runner.

test "auto-rewrite finds the race in vanilla OTP code" do
  assert_raise Lockstep.BugFound, fn ->
    Lockstep.Test.vanilla_run iterations: 100, strategy: :pct, seed: 1 do
      {:ok, c} = GenServer.start_link(Counter, 0)

      tasks =
        for _ <- 1..2 do
          Task.async(fn ->
            v = GenServer.call(c, :get)
            :ok = GenServer.call(c, {:set, v + 1})
          end)
        end

      Task.await_many(tasks)

      final = GenServer.call(c, :get)
      if final != 2, do: raise "lost update; counter is #{final}"
    end
  end
end

Lockstep finds the race on iteration 2 with the same trace it would have produced for the manually-written equivalent. Useful when you want test bodies to read like ordinary Elixir, when you're porting an existing flaky test to Lockstep, or when you want to demonstrate a race in a vanilla snippet without forcing readers to learn the Lockstep.send/Lockstep.recv API up-front.

The rewriter handles GenServer.{call, cast, start_link, stop}, Task.{async, await, await_many}, send, Kernel.send, :erlang.send, spawn, spawn_link, Process.send_after, Process.cancel_timer, and receive do ... end blocks (translated to Lockstep.recv_first + case). Helper functions defined outside the test body are not rewritten by vanilla_run/ctest rewrite: true — inline race code into the body, or use the project-level Mix compiler below to rewrite an entire lib/.

Project-level rewriting (Mix compiler)

To test unmodified modules in your lib/ (or in a dep) without changing source, plug Lockstep into your test compile pipeline:

# mix.exs
def project, do: [
  ...
  compilers: compilers(Mix.env()),
  elixirc_paths: elixirc_paths(Mix.env()),
  lockstep_rewrite: lockstep_rewrite(Mix.env()),
]

defp compilers(:test), do: [:lockstep_rewrite] ++ Mix.compilers()
defp compilers(_),     do: Mix.compilers()

defp elixirc_paths(:test), do: ["_build/test/lockstep_rewritten/lib", "test/support"]
defp elixirc_paths(_),     do: ["lib"]

defp lockstep_rewrite(:test), do: %{paths: ["lib/**/*.ex"]}
defp lockstep_rewrite(_),     do: nil

In :test, every matching .ex is rewritten under _build/test/lockstep_rewritten/lib/ and the standard Elixir compiler picks it up from there. Production / dev compilation is untouched — no Lockstep.* runtime cost.

The same machinery works for dep code: include "deps/<dep>/lib/**/*.ex" in the paths: list. The rewriter handles GenServer.{call, cast, start_link, stop, reply}, Task.{async, await, await_many}, send, spawn, spawn_link, Process.{send_after, cancel_timer, monitor, demonitor}, and receive blocks. Process.flag(:trap_exit, …) and :erlang.spawn/3 (MFA form) are left alone.

End-to-end NimblePool dep-rewrite demo: test/phase2_dep_rewrite_test.exs.

CubDB historical race (commit 60fb38f)

test/examples/cubdb_compaction_race_test.exs. Models the race upstream CubDB fixed in commit 60fb38f ("Fix race condition during compaction"). Quoting the maintainer:

The compaction process actually happens in two phases: actual compaction, and catch-up. The previous implementation was starting the catch-up phase asynchronously, introducing a race condition: right after actual compaction completed, and before catch-up started, there was the possibility to erroneously start another concurrent compaction.

The shape: a multi-phase operation guarded by a busy flag. Phase 1 completion clears busy AND queues phase 2 as a separate message. Between those two events a new operation passes the busy? check and starts a concurrent compaction. Phase 2 then sees state overwritten by the second compaction.

Detection in our model: each compaction has a unique ID, written to state.current_id on start_compaction. Phase 2 verifies the ID it holds matches state.current_id; if a second compaction has run in between and overwritten the field, the assertion fires.
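
A condensed sketch of that shape (hypothetical callback and field names; the actual model lives in the test file):

# A new compaction may start whenever :busy is false:
def handle_call(:start_compaction, _from, %{busy: false} = state) do
  id = state.next_id
  {:reply, {:ok, id}, %{state | busy: true, current_id: id, next_id: id + 1}}
end

# Buggy: phase 1 clears :busy and queues phase 2 as a *separate* message,
# opening a window in which the clause above can fire again.
def handle_info({:compaction_done, id}, state) do
  Lockstep.send(self(), {:catch_up, id})
  {:noreply, %{state | busy: false}}
end

# Detection: phase 2 compares the ID it carried with state.current_id.
def handle_info({:catch_up, id}, state) do
  if id != state.current_id, do: raise("concurrent compaction overlapped catch-up")
  {:noreply, state}
end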

Sweep result over 4 strategies × 5 seeds × 200 iterations each:

| strategy | finds the race | typical iter |
| --- | --- | --- |
| :pos | 5/5 seeds | 1, 1, 3, 5, 6 |
| :random | 5/5 seeds | 1, 3, 3, 4, 9 |
| :fair_pct | 1/5 seeds | 53 |
| :pct | 0/5 seeds | n/a |

PCT's bug-depth heuristic happens to bias away from this particular race shape — the change-point insertion at d-1 random steps doesn't align with the phase-1/phase-2 mailbox window. POS's per-step priority resampling finds it consistently. A real consequence of having multiple strategies bundled.

The fix in the model (and upstream): keep busy=true until BOTH phases complete. The intermediate mailbox state — phase 2 queued but not run — no longer admits a new :start_compaction. Verified race-free across the same 200 iterations × :pct.

Honeydew historical race (commit 41831c8)

test/examples/honeydew_status_race_test.exs. Models the race upstream Honeydew fixed in commit 41831c8:

Fixing race condition with Honeydew.status/1 where monitors could die before being queried for their current job.

The original buggy pattern:

busy_workers =
  queue
  |> get_all_members(:monitors)
  |> Enum.map(&GenServer.call(&1, :current_job))

Race: between get_all_members/2 returning a list of monitor pids and GenServer.call/2 reaching each one, a monitor can exit. The call raises :exit, crashing the entire status/1 invocation.

We can't compile and run upstream Honeydew under Lockstep — it's a full OTP application with :pg, supervisors, and Mnesia plumbing beyond our rewriter's reach — but the race shape is universal. The minimal reproduction (3 monitors, one killed concurrently with the iteration) fails on iteration 2 under PCT. The fixed pattern (try/catch :exit, _ around each call) passes 200 iterations.
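
One way to write that guard (a sketch of the fix described above, reusing get_all_members/2 from the buggy snippet; dropping dead monitors via flat_map is our choice, not necessarily upstream's):

busy_workers =
  queue
  |> get_all_members(:monitors)
  |> Enum.flat_map(fn monitor ->
    try do
      [GenServer.call(monitor, :current_job)]
    catch
      # The monitor died between listing and calling; skip it instead of
      # letting the :exit crash the whole status/1 invocation.
      :exit, _ -> []
    end
  end)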

The point isn't that we found this bug — the maintainer fixed it in 2017. The point is that the same pattern in any monitor-based library would surface under Lockstep, and the failing schedule is a one-line change in user code (the missing try/catch) away from a clean test.

Real bug found: Phoenix.Tracker's pre-fc5686f ETS race

test/phoenix_tracker_prefix_race_test.exs. Lockstep finds the race fixed upstream in fc5686f ("Update ETS atomically on tracker update").

Pre-fix Tracker.update did State.leave then put_presence — two ETS writes back-to-back. A concurrent Tracker.list could observe the table in between, with the entry temporarily missing. This is exactly the kind of race that's invisible at the message-passing level — it lives at the ETS NIF level.

With Lockstep.ETS providing sync points at every ETS call, the strategy can interleave Tracker.list between the leave and the put. POS finds it on iteration 1. The whole-of-Tracker plus Phoenix.PubSub plus rewritten OTP :pg (~3,700 LOC) runs unmodified, with sync points injected at NIF boundaries by our rewriter.

Real bug found in NimblePool's history

Lockstep finds a real race in NimblePool's checkout! timeout handling, fixed upstream in commit e18f45f ("Cancel requests on client timeout").

test/nimble_pool_timeout_race_test.exs takes the pre-fix NimblePool source (just one commit before the fix), compiles it through Lockstep.MixCompiler, and runs a realistic scenario:

  1. Capacity-1 pool
  2. Holder client checks out and refuses to release
  3. Waiter client calls checkout!(pool, ..., 50ms) and uses try/catch :exit, _ to handle the deadline (real production pattern)
  4. Waiter's timeout fires
  5. Holder is then told to release

Under wall-clock testing this race rarely surfaces — it depends on the scheduler interleaving the timeout and the holder's release in a particular way. Lockstep's PCT strategy finds it on iteration 1, producing the exit:

child crashed: exit(:unexpected_remove)

The pool process itself dies. In production, anyone calling checkout! with a try/catch :exit, _ on this version would intermittently bring down their entire pool.

The upstream fix has two parts: send a cancel message from the client before exiting, and turn the :unexpected_remove crash into a no-op for late-arriving removals. With the fix applied (current NimblePool releases) Lockstep's same scenario runs clean.

Real-world race patterns

Beyond the libraries the rewriter handles directly, the real-world-patterns suite reproduces bug shapes seen in production BEAM systems in standalone form, exercising Lockstep's Registry / Supervisor / link / monitor support.

test/examples/real_world_patterns/cache_thundering_herd_test.exs. The classic "lookup → miss → populate → put" anti-pattern. N concurrent clients each see a miss, each populate, end up overwriting each other. Lockstep finds it on iteration 1 under PCT. The locked-populate fix runs 100 iterations clean.
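
The anti-pattern in miniature (a sketch with a hypothetical cache API and expensive_compute/1 helper; the real test drives a GenServer-backed cache):

# Every client that observes a miss pays the compute cost and then blindly
# writes, so later writers overwrite earlier ones.
case Lockstep.GenServer.call(cache, {:lookup, key}) do
  nil ->
    value = expensive_compute(key)
    Lockstep.GenServer.call(cache, {:put, key, value})
    value

  value ->
    value
end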

test/examples/real_world_patterns/pubsub_fanout_ack_test.exs. PubSub broadcast that waits for acks from every subscriber. If one subscriber dies between Registry.lookup and acking, the broadcaster blocks forever. Found on iteration 1 under PCT. The fix is monitor-based ack accounting — treat :DOWN as a counted ack — and runs clean across 50 iterations.

These patterns are what Lockstep.Registry, Lockstep.Supervisor, Lockstep.spawn_link, Lockstep.flag(:trap_exit, true), and Lockstep.alive? were built for. Each lives in the bug zoo regression suite so future engine changes can't quietly break their findability.

Real libraries verified under Lockstep

Beyond the bug-class showcase above, Lockstep has been pointed at production BEAM libraries — both pre-fix versions to confirm it catches known races, and current versions to gather verification evidence ("no race in N controlled schedules").

| Library | Mode | Result | Test |
| --- | --- | --- | --- |
| Hammer | rewrite + run | Real bug — Atomic.FixWindow.inc/4 lost-update vs hit/5, found under PCT | test/hammer_test.exs |
| Phoenix.Tracker (pre-fc5686f) | rewrite + run | Real bug — ETS race fixed in fc5686f; POS finds it iter 1 | test/phoenix_tracker_prefix_race_test.exs |
| NimblePool (pre-e18f45f) | rewrite + run | Real bug — checkout timeout race; PCT finds it iter 1 | test/nimble_pool_timeout_race_test.exs |
| Phoenix.PubSub (current) | rewrite + run | Verified clean — 70 PCT iterations across 3 race-hunt scenarios | test/phoenix_pubsub_race_hunt_test.exs |
| GenStage (current) | rewrite + run | Verified clean — single + fan-out pipeline scenarios | test/gen_stage_test.exs |
| OTP :pg (Erlang) | Erlang rewriter + run | Verified clean — joins/broadcasts under interleaved scheduling | test/erlang_pg_test.exs |
| Mutex (lud/mutex) | rewrite + run | Verified clean — 10,000 schedules across 5 scenarios | test/real_libraries_test.exs |
| ConCache.Lock | rewrite + run | Verified clean — 4,000 schedules across 5 scenarios | test/concache_lock_hunt_test.exs |
| NimblePool port | reimplemented | Verified clean — 1,200 schedules across 6 scenarios | test/examples/nimble_pool_demo_test.exs |
| Cachex | test-only Hex dep | Linearizable — 200 iter mixed ops + 500 iter R-M-W | test/cachex_test.exs |

The work modes:

  • rewrite + run — Lockstep.MixCompiler rewrites the upstream source, recompiles it, and the test drives the actual upstream API under controlled scheduling. Sync points sit at every GenServer.call, :ets.*, Process.monitor, send, receive, etc.
  • Erlang rewriter + run — same pattern but for .erl source: Lockstep.ErlangRewriter walks the Erlang abstract format and routes gen_server:call, erlang:spawn, Pid ! Msg, bare receive, etc. through the controller. Used for OTP modules like :pg, :gen_statem, supervisor internals.
  • test-only Hex dep — the library compiles unmodified; the test wraps its API in Lockstep.GenServer.call-shape sync points at the boundary. Useful when the lib is too large or macro-heavy to rewrite cleanly.

For the methodology behind picking a mode, writing the model, and sizing iterations, see METHODOLOGY.md.

Configuration

use Lockstep.Test accepts:

| Option | Default | Meaning |
| --- | --- | --- |
| :iterations | 100 | How many randomized schedules to try. |
| :strategy | :pct | One of :random, :pct, :fair_pct, :pos, :idpct, or :replay. |
| :seed | random | Top-level RNG seed; per-iteration seeds derive from it. |
| :max_steps | 1000 | Hard limit on scheduling steps per iteration. |
| :iter_timeout | 30_000 | Hard wall-clock timeout per iteration (ms). |
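
All together (a sketch; every value below is illustrative, the seed is the one from the lost-update demo above):

use Lockstep.Test,
  iterations: 500,
  strategy: :pct,
  seed: 0xC0DE_C0FFEE,
  max_steps: 2_000,
  iter_timeout: 60_000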

Testing methodology

For applying Lockstep to a real BEAM library — the model-based linearizability template, the compile-rewrite vs test-only-dep paths, strategy selection, fault injection, and case studies (Cachex, ConCache, lud/mutex, nimble_pool, Hammer) — see METHODOLOGY.md. Includes a real bug we found in Hammer's Hammer.Atomic.FixWindow.inc/4 (lost-update race against hit/5).

API

The runtime API for use inside ctest bodies:

| Call | Description |
| --- | --- |
| Lockstep.spawn/1 | Spawn a managed child process. |
| Lockstep.spawn_link/1 | Spawn a managed child and bidirectionally link to it (atomic). |
| Lockstep.send/2 | Send to another managed process (recorded by the controller). |
| Lockstep.recv/0 | Receive the next message in delivery order. |
| Lockstep.recv_first/1 | Selective receive: take the first message matching a predicate. |
| Lockstep.now/0 | Read the controller's virtual clock. |
| Lockstep.send_after/3 | Schedule a message after delay_ms of virtual time. Returns a timer ref. |
| Lockstep.cancel_timer/1 | Cancel a pending timer; returns ms remaining or false. |
| Lockstep.sleep/1 | Virtual-time sleep (sentinel timer + selective receive under the hood). |
| Lockstep.monitor/1 | Monitor a managed process; :DOWN is delivered through Lockstep's mailbox. |
| Lockstep.demonitor/2 | Stop monitoring; :flush removes any already-delivered :DOWN. |
| Lockstep.alive?/1 | Consult the controller's view; falls back to vanilla for unmanaged pids. |
| Lockstep.link/1 | Bidirectionally link to another managed process. |
| Lockstep.unlink/1 | Remove the link. |
| Lockstep.flag/2 | :trap_exit is modeled at the controller level; other flags are placeholders. |

Higher-level wrappers built on these primitives:

  • Lockstep.GenServer.{start_link, call, cast, stop} — test modules that follow the GenServer callback contract. start_link/3 accepts name: {:via, Module, term}.
  • Lockstep.Task.{async, await, await_many} — Task.async-style fan-out / fan-in with selective-receive on a unique ref.
  • Lockstep.Registry.{start_link, register, unregister, lookup, dispatch, count, keys} — controller-aware key/pid registry with :via integration.
  • Lockstep.Supervisor.{start_link, which_children, count_children, start_child, terminate_child, restart_child} — :one_for_one supervisor with permanent/transient/temporary restart semantics and intensity limits.

Every primitive is a sync point — every call hands control back to the controller, which picks the next process to run per the configured strategy. The send / spawn variants don't block the user-side caller for delivery semantics; the controller records / queues the action and resumes the caller at its next scheduling turn. recv/0 and recv_first/1 block until the controller delivers a message.

Lockstep.alive?/1 being a sync point is the point: TOCTOU bugs (if Process.alive?(pid), do: GenServer.call(pid, ...)) surface naturally because the strategy may interleave another process between the check and the subsequent call.
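
The shape in two lines (a sketch; :ping is an illustrative request):

# Both the check and the call are sync points; the strategy can let the
# target die in between, so the call lands on a dead process.
if Lockstep.alive?(server) do
  Lockstep.GenServer.call(server, :ping)
end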

Lockstep.link/1 and friends model the BEAM link graph at the controller level. When a linked process dies abnormally, the cascade is propagated through Lockstep's link graph (with cycle detection): non-trapping linkers die transitively, and the chain terminates either at a flag(:trap_exit, true) process — which receives a synthetic {:EXIT, dying_pid, reason} message — or at the test root, which finalizes the iteration as a bug.

Selective receive

ref = make_ref()
Lockstep.send(server, {:request, self(), ref})

reply =
  Lockstep.recv_first(fn
    {^ref, payload} -> {:ok, payload}
    _              -> false   # any non-truthy means "skip"
  end)

Other messages already in the mailbox are not disturbed.

GenServer wrapper

defmodule Counter do
  def init(initial), do: {:ok, initial}
  def handle_call(:get, _from, n),         do: {:reply, n, n}
  def handle_call({:put, x}, _from, _n),   do: {:reply, :ok, x}
  def handle_cast({:add, x}, n),           do: {:noreply, n + x}
end

ctest "lost update on counter" do
  {:ok, srv} = Lockstep.GenServer.start_link(Counter, 0)

  for _ <- 1..2 do
    Lockstep.spawn(fn ->
      v = Lockstep.GenServer.call(srv, :get)
      :ok = Lockstep.GenServer.call(srv, {:put, v + 1})
    end)
  end

  # ... barrier ...
  assert Lockstep.GenServer.call(srv, :get) == 2
end

The wrapper itself is a thin loop on top of Lockstep.recv / Lockstep.send / Lockstep.recv_first. See lib/lockstep/gen_server.ex for the ~80-line implementation.

Virtual time

ctest "GenServer with periodic tick" do
  {:ok, pid} = Lockstep.GenServer.start_link(MyTimerServer, [])
  Lockstep.send_after(pid, :start, 0)

  # Drain the test forward until 350ms of virtual time has passed.
  Lockstep.send_after(self(), :wakeup, 351)
  _ = Lockstep.recv()

  assert Lockstep.GenServer.call(pid, :tick_count) >= 3
end

Virtual time only advances when the controller would otherwise deadlock — when every managed process is blocked on receive and the only way to unblock someone is to fire the next pending timer. So Lockstep.send_after(pid, :ding, 86_400_000) followed by recv returns in milliseconds of wall-clock time, not a day. Same trick :meck/Mox let you do for :erlang.system_time, but built into the scheduler.

Reproducing a bug

Three ways, in order of laziness:

1. Re-run the suite with the original seed. The trace records both the top-level seed and the iteration. Re-run with seed: <top_seed> and iterations: >= <iter>. Per-iteration seed derivation is phash2({top_seed, iteration}), so iteration K of the same top seed always behaves identically.

2. Replay from the CLI. Extract the test body into a named function, then:

mix lockstep.replay --trace traces/foo.lockstep \
                    --run "MyApp.RaceTest.lost_update_body" \
                    --file test/my_app/race_test.exs

The mix task loads the test file (so the module becomes available), then runs Lockstep.Replay.run/2 with the schedule from the trace.

3. Replay programmatically. Same idea, written as an ordinary ExUnit test:

defmodule MyApp.RaceTest do
  use Lockstep.Test, ...

  # The body is extracted into a function so it can be called from
  # both the ctest and a one-off replay test.
  def lost_update_body do
    # ... the actual test scenario ...
  end

  ctest "lost update" do
    lost_update_body()
  end
end

defmodule MyApp.ReplayTest do
  use ExUnit.Case

  test "replay the lost-update bug" do
    Lockstep.Replay.run(
      &MyApp.RaceTest.lost_update_body/0,
      "traces/MyApp-iter2-seed828370911214.lockstep"
    )
  end
end

If the user code is fully deterministic (no raw :rand, no real I/O, no NIFs that yield asymmetrically), the replay produces a structurally identical trace and re-fires the bug. If something diverges, you get a Lockstep.ReplayDivergence with a step number and a hint.

You can also compare two traces structurally:

Lockstep.Replay.compare_traces!(original_trace, replayed_trace)

Pids and refs are sanitized before comparison since they're recreated on each run.

Shrinking a trace

A failing trace can be hundreds of events. Most of those are necessary setup but obscure the actual race. Lockstep.Shrink.shrink/3 searches for a shorter replay_pid_order that still triggers the same bug signature under strict replay, then saves a .shrunk.lockstep next to the original.

mix lockstep.shrink --trace traces/counter_race-iter5-seed42.lockstep \
                    --run "MyApp.CounterRaceTest.lost_update_body" \
                    --file test/my_app/counter_race_test.exs

Output:

Lockstep shrink succeeded:
  original schedule decisions: 87
  shrunk schedule decisions:   12
  attempts:                    156
  bug signature:               {:exception, RuntimeError}
  shrunk trace:                traces/counter_race-iter5-seed42.shrunk.lockstep

The shrunk trace replays exactly like the original — point mix lockstep.replay at it for the same bug at fewer scheduling decisions. Programmatic version:

{:ok, info} =
  Lockstep.Shrink.shrink(
    fn -> MyApp.CounterRaceTest.lost_update_body() end,
    "traces/counter_race-iter5-seed42.lockstep",
    verbose: true
  )

info.original_length  # 87
info.new_length       # 12
info.new_trace_path   # "traces/...shrunk.lockstep"

The algorithm is two-pass — truncation (shorter prefixes) then decimation (drop individual decisions) — repeated to a fixed point. Bug-signature comparison is coarse (same exception type, same deadlock category, etc.), so the shrunk trace is a cleaner repro, not a byte-identical reproduction.

Long runs and progress

iterations: 10_000 is silent by default until it either finishes or finds a bug. Pass progress: to opt in:

Lockstep.Runner.run(
  fn -> MyTest.race_body() end,
  iterations: 10_000,
  strategy: :pct,
  progress: true                 # prints every ~5%
  # progress: 100                # or: every 100 iterations
  # progress: fn iter -> ... end # or: a custom hook (telemetry, ...)
)

Bug-finding regression suite

test/bug_zoo_test.exs is a curated set of known concurrency races, each declared with a (strategy, seed, max_iterations, expected_at_most) configuration. The test fails if any race takes more iterations than its baseline — catching engine regressions in bug-finding power as the controller and strategies evolve.

When making engine changes, run mix test test/bug_zoo_test.exs to confirm bug-finding hasn't slipped. Add new entries as you wire up new classes of bug.
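
A hypothetical entry shape (the real declarations live in test/bug_zoo_test.exs; the field names here are illustrative, the tuple of settings is the one described above):

%{
  name: "lost update on shared counter",
  strategy: :pct,
  seed: 42,
  max_iterations: 200,
  expected_at_most: 5   # regression if the bug now needs more iterations than this
}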

Design

The shape mirrors Microsoft Coyote, mapped to BEAM:

| Coyote (.NET) | Lockstep (BEAM) |
| --- | --- |
| coyote rewrite (IL rewriter) | Runtime Lockstep.send/recv/spawn API |
| TestingEngine | Lockstep.Runner.run/2 |
| IActorRuntime | Lockstep.Controller GenServer |
| Strategies (random, PCT, ...) | Lockstep.Strategy behaviour |
| coyote replay <trace> | mix lockstep.replay |

License

Apache-2.0.