Context
Zilla integration tests run the engine and the k3po zilla-transport side by side. When a binding's attach() initiates outbound writes (e.g., MCP cache hydration calling manager.start()), those writes target an external worker's data{N} ring-buffer file that k3po owns. Because k3po does not open its accepts or create those files until the test body invokes k3po.finish(), the engine can race ahead and write into a partially initialized buffer, leaving the engine worker spinning forever in ManyToOneRingBuffer.claimCapacity.
A localized workaround landed in commit f873afa (zilla.binding.mcp.cache.start.delay, default PT0S, configured to PT0.2S in McpProxyCacheIT). It papers over the race for MCP by deferring manager.start() past a timer-wheel tick. This issue proposes a clean architectural fix in the engine that lets tests serialize binding startup against any external initialization signal — k3po being the immediate use case but not the only possible one.
Proposal
1. Split Engine.start() into two phases
```java
public interface Engine extends AutoCloseable {
    // Start workers and boss: ring buffers active, doWork looping,
    // no config read, no bindings attached.
    // Idempotent: subsequent calls are no-ops.
    void init();

    // If init() has not been called, call it implicitly first.
    // Then read config and attach bindings (each binding's attach()
    // runs as today, including any synchronous runtime work such as
    // manager.start()).
    void start();
}
```
Because start() implicitly invokes init() when needed, existing callers (Zilla CLI start command, embedded users) keep working unchanged. The new value comes when a caller wants to do something between init() and start() — for example, wait for an external system to be ready — in which case they call them in two steps explicitly.
In production, engine.start() is called immediately (no explicit init() first) and behavior is observably identical to today. No binding code changes are required — bindings keep doing their work in attach().
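A minimal sketch of the two-phase contract, assuming a single-process stand-in: SketchEngine and its event names are hypothetical and only record lifecycle steps instead of starting real workers or attaching bindings. It shows init() being idempotent and start() running init() implicitly for callers that never call it.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseEngineSketch {
    // Hypothetical stand-in for Engine: records lifecycle steps instead of running real workers.
    static final class SketchEngine {
        final List<String> events = new ArrayList<>();  // demo log, only touched under the lock or single-threaded
        private boolean initialized;

        // Phase 1: start workers and boss; no config read, no bindings attached.
        // synchronized so a concurrent second caller waits until the first has finished.
        synchronized void init() {
            if (initialized) {
                return;                                  // idempotent: later calls are no-ops
            }
            initialized = true;
            events.add("workers-started");
        }

        // Phase 2: implicitly runs init() if needed, then reads config and attaches bindings.
        synchronized void start() {
            init();                                      // preserves today's single-call behavior
            events.add("config-read");
            events.add("bindings-attached");
        }
    }

    public static void main(String[] args) {
        SketchEngine production = new SketchEngine();
        production.start();                              // CLI-style caller: start() only
        System.out.println(production.events);
        // prints [workers-started, config-read, bindings-attached]

        SketchEngine test = new SketchEngine();
        test.init();
        test.init();                                     // no-op
        test.events.add("external-ready");               // stand-in for waiting on a coordinator
        test.start();                                    // init() already done, so only the attach steps run
        System.out.println(test.events);
        // prints [workers-started, external-ready, config-read, bindings-attached]
    }
}
```

Guarding init() with synchronized rather than a bare compare-and-set means a second concurrent caller does not return until the first has finished starting workers; whether the real engine needs that guarantee is an assumption of this sketch.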
2. EngineRule adds an injectable hook between init() and start()
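```java
import java.util.concurrent.atomic.AtomicReference;

public final class EngineRule {
    private Runnable beforeStart = () -> {};

    public EngineRule beforeStart(Runnable hook) {
        this.beforeStart = hook;
        return this;
    }

    // inside Statement.evaluate() throws Throwable:
    engine.init();                          // workers and boss running, no bindings attached yet
    AtomicReference<Throwable> startFailure = new AtomicReference<>();
    Thread starter = new Thread(() -> {
        try {
            beforeStart.run();              // default no-op; with external coordinator, blocks
            engine.start();                 // attach bindings; outbound writes hit ready files
        } catch (Throwable ex) {
            startFailure.set(ex);           // hand startup failures back to the test thread
        }
    }, "engine-start");
    starter.start();
    try {
        base.evaluate();                    // test body
    } finally {
        starter.join();
        engine.close();
    }
    if (startFailure.get() != null) {
        throw startFailure.get();           // otherwise a failed start() would be silently dropped
    }
}
```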
Defaults preserve current single-call behavior. The name beforeStart reflects the hook's position in Zilla's engine lifecycle without baking in any assumption about what it coordinates with.
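To illustrate why the hook and start() run on their own thread, here is a self-contained sketch. It assumes a CountDownLatch-based Coordinator as a hypothetical stand-in for the k3po await() proposed in the follow-up (which does not exist yet), and uses event strings in place of real engine calls. Because the hook blocks until the test body signals readiness, start() is ordered after the external files exist.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class BeforeStartHookSketch {
    // Hypothetical coordinator standing in for an external system such as k3po:
    // await() blocks until ready() has been called once.
    static final class Coordinator {
        private final CountDownLatch latch = new CountDownLatch(1);

        void ready() {
            latch.countDown();
        }

        void await() {
            try {
                latch.await();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
    }

    // Mirrors the EngineRule flow: init() on the test thread, then the hook and start()
    // on a separate thread while the test body runs. Engine calls are replaced by event strings.
    static List<String> run(Runnable beforeStart, Coordinator coordinator) {
        List<String> events = new CopyOnWriteArrayList<>();
        events.add("init");                      // stand-in for engine.init()
        Thread starter = new Thread(() -> {
            beforeStart.run();                   // blocks until the coordinator is ready
            events.add("start");                 // stand-in for engine.start(): attach bindings
        }, "engine-start");
        starter.start();
        events.add("external-files-created");    // test body: stand-in for k3po creating its files
        coordinator.ready();                     // ...then signalling readiness
        try {
            starter.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return events;
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        System.out.println(run(coordinator::await, coordinator));
        // prints [init, external-files-created, start]
    }
}
```

If the blocking hook ran on the test thread instead, the test body would never get the chance to signal readiness and the two would deadlock; running it on a separate thread is what lets the external system finish its initialization first.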
Behavior matrix
| Scenario | beforeStart | Engine flow |
|---|---|---|
| Production (CLI, embedded; engine.start() only) | n/a | start() implicitly runs init(), then attach (unchanged) |
| Test without external coordinator | not set, defaults to no-op | init() → no-op → start() (same as today) |
| Test with external coordinator (e.g. k3po) | beforeStart(coordinator::await) | init() → hook blocks until the coordinator is ready → start() |
Acceptance criteria
- Engine.init() and Engine.start() are separate API methods.
- Engine.init() is idempotent.
- Engine.start() implicitly calls Engine.init() if it has not been called yet, preserving today's behavior for all existing callers (CLI start command, embedded API users, anything calling only start()).
- EngineRule.beforeStart(Runnable) builder method exists and defaults to a no-op.
Follow-up
Once this lands and k3po exposes a corresponding await() (tracked separately in aklivity/k3po), McpProxyCacheIT can be migrated to .beforeStart(k3po::await) and zilla.binding.mcp.cache.start.delay removed. That follow-up will be filed as a separate PR against this repo after both prerequisites land.
References
- f873afa: the localized cache.start.delay workaround that this issue's follow-up will retire.