Split Engine.start() into init() + start() with optional beforeStart hook #1807

@jfallows

Context

Zilla integration tests run the engine and the k3po zilla-transport side by side. When a binding's attach() initiates outbound writes (e.g., MCP cache hydration calling manager.start()), those writes target an external worker's data{N} ring-buffer file that k3po owns. Because k3po doesn't open accepts and create those files until the test body invokes k3po.finish(), the engine can race ahead and write into a partially-initialized buffer, leaving the engine worker spinning forever in ManyToOneRingBuffer.claimCapacity.

A localized workaround landed in commit f873afa (zilla.binding.mcp.cache.start.delay, default PT0S, configured to PT0.2S in McpProxyCacheIT). It papers over the race for MCP by deferring manager.start() past a timer-wheel tick. This issue proposes a clean architectural fix in the engine that lets tests serialize binding startup against any external initialization signal — k3po being the immediate use case but not the only possible one.

Proposal

1. Split Engine.start() into two phases

public interface Engine extends AutoCloseable {
    void init();   // start workers and boss; ring buffers active, doWork looping,
                   // no config read, no bindings attached.
                   // Idempotent: subsequent calls are no-ops.

    void start();  // if init() has not been called, call it implicitly first.
                   // Then read config and attach bindings (bindings' attach() runs
                   // as today, including any synchronous runtime work like
                   // manager.start()).
}

Because start() implicitly invokes init() when needed, existing callers (Zilla CLI start command, embedded users) keep working unchanged. The new value comes when a caller wants to do something between init() and start(), such as waiting for an external system to be ready; that caller invokes the two methods explicitly, in order.

In production, engine.start() is called immediately (no explicit init() first) and behavior is observably identical to today. No binding code changes are required — bindings keep doing their work in attach().
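The two-phase contract described above could be sketched roughly as follows. This is a minimal illustration, not the engine's actual implementation: TwoPhaseEngineSketch and its events() accessor are hypothetical names, and the recorded strings merely stand in for starting workers and attaching bindings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the engine; it only records lifecycle events.
final class TwoPhaseEngineSketch implements AutoCloseable {
    private final List<String> events = new ArrayList<>();
    private boolean initialized;

    // Phase 1: would start workers and boss. Repeated calls are no-ops.
    public synchronized void init() {
        if (initialized) {
            return;
        }
        initialized = true;
        events.add("init");
    }

    // Phase 2: runs init() implicitly if needed, then would read config
    // and attach bindings. The monitor is reentrant, so calling init()
    // from here does not deadlock.
    public synchronized void start() {
        init();
        events.add("start");
    }

    @Override
    public synchronized void close() {
        events.add("close");
    }

    // Snapshot of the events recorded so far, in order.
    public synchronized List<String> events() {
        return new ArrayList<>(events);
    }
}
```

In this sketch, calling only start() and calling init() twice followed by start() both record the same sequence: init, then start. Guarding init() with the object monitor is one assumption about how idempotence could be achieved; the real engine may use a different mechanism.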

2. EngineRule adds an injectable hook between init() and start()

public final class EngineRule {
    private Runnable beforeStart = () -> {};

    public EngineRule beforeStart(Runnable hook) {
        this.beforeStart = hook;
        return this;
    }

    // inside Statement.evaluate():
    engine.init();
    Thread t = new Thread(() -> {
        beforeStart.run();    // default no-op; with external coordinator, blocks
        engine.start();       // attach bindings; outbound writes hit ready files
    });
    t.start();
    try {
        base.evaluate();      // test body
    } finally {
        t.join();
        engine.close();
    }
}

Defaults preserve current single-call behavior. The name beforeStart reflects the hook's position in Zilla's engine lifecycle without baking in any assumption about what it coordinates with.
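To illustrate how a blocking hook serializes startup against an external signal, here is a small simulation of the rule's flow. It uses no JUnit or Zilla types: BeforeStartSketch, run, and awaiting are hypothetical names, a CountDownLatch stands in for the external coordinator, and the recorded strings stand in for the engine calls.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical simulation of the proposed EngineRule flow.
final class BeforeStartSketch {
    // Records "init", runs the hook and "start" on a background thread,
    // runs the test body on the calling thread, then joins and records "close".
    static void run(List<String> events, Runnable beforeStart, Runnable testBody) {
        events.add("init");                 // stands in for engine.init()
        Thread starter = new Thread(() -> {
            beforeStart.run();              // blocks until the coordinator signals
            events.add("start");            // stands in for engine.start()
        });
        starter.start();
        try {
            testBody.run();                 // stands in for base.evaluate()
        } finally {
            try {
                starter.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
            events.add("close");            // stands in for engine.close()
        }
    }

    // Adapts a latch to the Runnable hook shape, preserving the interrupt flag.
    static Runnable awaiting(CountDownLatch ready) {
        return () -> {
            try {
                ready.await();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        };
    }
}
```

A caller might wire it up like this, with the test body signaling readiness the way k3po.finish() is expected to:

```java
List<String> events = new CopyOnWriteArrayList<>();
CountDownLatch ready = new CountDownLatch(1);
BeforeStartSketch.run(events, BeforeStartSketch.awaiting(ready), () -> {
    events.add("ready");
    ready.countDown();
});
// events now holds: init, ready, start, close
```

One consequence visible in the sketch: if the test body fails before the signal is sent, the join in the finally block waits indefinitely. A real rule would probably want a bounded wait or an interrupt on that path.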

Behavior matrix

| Scenario | beforeStart | Engine flow |
| --- | --- | --- |
| Production (CLI, embedded; engine.start() only) | n/a | start() implicitly runs init(), then attach (unchanged) |
| Test without external coordinator | not set, defaults to no-op | init() → no-op → start() (same as today) |
| Test with external coordinator (e.g. k3po) | beforeStart(coordinator::await) | init() → watcher waits → start() |

Acceptance criteria

  • Engine.init() and Engine.start() are separate API methods.
  • Engine.init() is idempotent.
  • Engine.start() implicitly calls Engine.init() if it has not been called yet, preserving today's behavior for all existing callers (CLI start command, embedded API users, anything calling only start()).
  • EngineRule.beforeStart(Runnable) builder method exists and defaults to a no-op.
  • Full IT suite stays green with the new engine lifecycle (no functional test changes yet).

Follow-up

Once this lands and k3po exposes a corresponding await() (tracked separately in aklivity/k3po), McpProxyCacheIT can be migrated to .beforeStart(k3po::await) and zilla.binding.mcp.cache.start.delay removed. That follow-up will be filed as a separate PR against this repo after both prerequisites land.

References

  • f873afa — the localized cache.start.delay workaround that this issue's follow-up will retire.
