Conversation
Agent-Logs-Url: https://github.com/OpenAF/mini-a/sessions/c0190ccb-ed30-46ee-a90e-3a3430a8f32f
Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>
Copilot changed the title from "[WIP] Fix identified bugs and improve delegation functionality" to "Fix 10 delegation/worker bugs in mini-a-subtask.js and mini-a-worker.yaml" on Mar 30, 2026.
This PR fixes a set of correctness, thread-safety, and resource-leak bugs in the delegation and worker subsystems. The fixes range from a broken timeout accumulation, to a thread-unsafe global variable swap, to CIDR notation that was documented but never actually parsed.
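For the CIDR fix (#8), the matcher could look roughly like the following standalone sketch. The actual `global.__worker_cidrMatch` lives inside `mini-a-worker.yaml`; this version is only an illustration of the bitwise subnet arithmetic and input validation described in the summary, not the shipped code:

```javascript
// Sketch of an IPv4 CIDR matcher with input validation (illustrative only).
function cidrMatch(ip, cidr) {
  // Parse a dotted-quad address into an unsigned 32-bit integer, or null if invalid.
  const toInt = (addr) => {
    const parts = String(addr).split(".");
    if (parts.length !== 4) return null;
    let n = 0;
    for (const p of parts) {
      const o = Number(p);
      // octet range + NaN guards: digits only, 0-255
      if (!/^\d+$/.test(p) || isNaN(o) || o < 0 || o > 255) return null;
      n = (n * 256) + o;
    }
    return n;
  };
  const [net, prefixStr] = String(cidr).split("/");
  const prefix = Number(prefixStr);
  if (isNaN(prefix) || prefix < 0 || prefix > 32) return false; // prefix 0-32 only
  const ipN = toInt(ip), netN = toInt(net);
  if (ipN === null || netN === null) return false;
  // Mask keeps the top `prefix` bits; JS shift counts are taken modulo 32,
  // so /0 needs a special case, and >>> 0 forces unsigned 32-bit arithmetic.
  const mask = prefix === 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) >>> 0;
  return ((ipN & mask) >>> 0) === ((netN & mask) >>> 0);
}
```

With this shape, `192.168.1.100` correctly matches `192.168.1.0/24`, which the old `indexOf(...) === 0` prefix comparison could not do.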
mini-a-subtask.js:

- `waitForAll` timeout (#1): `timeoutMs - (results.length > 0 ? 0 : 0)` always subtracted zero — each `waitFor` received the full budget, allowing up to N × `timeoutMs` of total blocking. Now tracks `startAll` and passes `Math.max(1, timeoutMs - elapsed)` on each iteration.
- `workerMaxFailures` (#4): `_recordWorkerFailure` used `defaultMaxAttempts` (default 2) as the per-worker death threshold — one transient error on two tasks killed the worker permanently. Added a dedicated `workerMaxFailures` option (default 5).
- `_failOrRetrySubtask` blanket kill (#5): Replaced `_markWorkerDead` with `_recordWorkerFailure` so logic failures (bad goal, LLM refusal) don't immediately evict the worker; the threshold from #4 now applies.
- `_processQueue` inconsistency (#6): The item was only removed via `shift()` in the `catch` branch; the success path relied on `start()`'s internal `indexOf + splice`. Now always `shift()`s before `start()`, with a `status !== "pending"` guard for stale entries.
- Dead code (#7): Removed the unused `_nextWorker` method (superseded by `_nextWorkerForSubtask`).
- Watchdog leak (#10): The `while (true)` loop had no exit path. Added `this._running = true` in the constructor, `while (parent._running)` in the watchdog, and a `destroy()` method.

mini-a-worker.yaml:

- Thread-unsafe `request` swap (#2): `/message:send` reassigned the shared `request` global and copy-pasted the entire `/task` body verbatim — not thread-safe under concurrent requests. Extracted `global.__worker_submitTask(postData, wargs)`, called from both handlers.
- `/task` response status (#3): The response hardcoded `status: "queued"` even after `global.__worker_tasks[taskId].status` had already been set to `"running"`. Now returns the actual status.
- CIDR matching (#8): `apiallow` documented CIDR support (e.g. `192.168.1.0/24`) but used a naive `indexOf(...) === 0` prefix match — `192.168.1.100` would not match. Added `global.__worker_cidrMatch(ip, cidr)` with bitwise subnet arithmetic and full input validation (octet range, prefix 0–32, NaN guards).
- `SubtaskManager` init (#9): The manager was lazily created on the first request using that request's merged `args` (including `maxsteps`, `format`, etc.) — all subsequent tasks then shared a manager misconfigured by the first caller. Now initialized eagerly in `Init` with a clean base `wargs`.

Original prompt
Overview

Fix all identified bugs and apply significant improvements to the delegation/worker functionality across `mini-a-subtask.js` and `mini-a-worker.yaml`.

Bug Fixes
1. `waitForAll` — timeout never decrements (`mini-a-subtask.js`, line ~1106)

The expression `timeoutMs - (results.length > 0 ? 0 : 0)` always subtracts zero, so each `waitFor` call in the loop gets the full `timeoutMs`. The aggregate `waitForAll` can block for up to N × `timeoutMs`.

Fix: Track a start timestamp before the loop and subtract elapsed time on each iteration:
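A minimal sketch of that fix, assuming each pending subtask exposes a blocking `waitFor(timeoutMs)` as described above (the loop shape is illustrative, not the exact file contents):

```javascript
// Give each waitFor only the remaining budget, so the aggregate wait
// is bounded by timeoutMs instead of N * timeoutMs.
function waitForAll(subtasks, timeoutMs) {
  const startAll = Date.now(); // when the aggregate wait began
  const results = [];
  for (const st of subtasks) {
    const elapsed = Date.now() - startAll;
    // never pass 0 or a negative budget; 1ms floor keeps waitFor well-defined
    results.push(st.waitFor(Math.max(1, timeoutMs - elapsed)));
  }
  return results;
}
```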
2. Non-thread-safe `request` variable swap in `/message:send` (`mini-a-worker.yaml`, lines ~246–308)

The `/message:send` handler reassigns the shared global `request` variable to re-use the `/task` logic inline. This is not thread-safe: concurrent HTTP requests can corrupt each other's `request` reference. Additionally, the entire task-submission logic is copy-pasted verbatim from the `/task` handler — a significant maintenance burden and source of bugs.

Fix: Extract the shared task-submission logic into a reusable JavaScript helper function (e.g., `global.__worker_submitTask(postData, wargs)`) called from both `/task` and `/message:send`, eliminating the unsafe `request` swap and the duplication. The helper should:

- accept `postData` and `wargs` as arguments
- create the `SubtaskManager` if needed (but see fix #9 — ideally done in `Init`)
- register the task in `global.__worker_tasks` and start it
- return `{ taskId, createdAt }`

3. `/task` response always says `"queued"` even when the task is already `"running"` (`mini-a-worker.yaml`, line ~206)

After calling `global.__worker_taskManager.start(subtaskId)`, the task status is updated to `"running"`, but the HTTP response body still hardcodes `status: "queued"`.

Fix: Use the actual task status in the response:
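A minimal standalone sketch of the behaviour, with an in-memory table standing in for `global.__worker_tasks` (handler and manager plumbing omitted; names are illustrative):

```javascript
const tasks = {};

// Stand-in for the manager's start(): may flip status synchronously.
function start(taskId) {
  tasks[taskId].status = "running";
}

function submitTask(taskId) {
  tasks[taskId] = { status: "queued", createdAt: Date.now() };
  start(taskId);
  // BUG (before): the response hardcoded { status: "queued" }.
  // FIX (after): report whatever the status actually is now.
  return { taskId: taskId, status: tasks[taskId].status };
}

console.log(submitTask("t1").status); // "running", not the stale "queued"
```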
4. Worker is killed using the task retry threshold (`mini-a-subtask.js`, line ~373)

`_recordWorkerFailure` marks a worker dead when `failures >= this.defaultMaxAttempts`. `defaultMaxAttempts` is a per-subtask retry count (default: 2), not a meaningful per-worker failure budget. A single transient error on two tasks immediately kills the worker.

Fix: Introduce a dedicated `workerMaxFailures` option (default: 5) and use it in `_recordWorkerFailure`:

- add `this.workerMaxFailures = _$(opts.workerMaxFailures, "opts.workerMaxFailures").isNumber().default(5)` in the constructor
- change the death check to `if (failures >= this.workerMaxFailures)`

5. Worker marked dead on any task exhaustion, not just transport errors (`mini-a-subtask.js`, lines ~632–633)

In `_failOrRetrySubtask`, when a subtask exhausts all retries, the worker that ran it is immediately marked dead — even if the task failed due to a bad goal, LLM refusal, or logic error (not a worker fault).

Fix: Only mark the worker dead if there is evidence of a transport/connectivity problem. Remove the blanket `_markWorkerDead` call from `_failOrRetrySubtask`. Worker health should be tracked exclusively through `_recordWorkerFailure` (which uses the new `workerMaxFailures` threshold from fix #4).

6. Inconsistent queue removal in `_processQueue` (`mini-a-subtask.js`, lines ~1202–1215)

The queue item is only removed via `pendingQueue.shift()` in the `catch` branch. In the success path, `start()` removes it internally via `indexOf + splice`. This is inconsistent and fragile — if the queue changes between `pendingQueue[0]` and the error-path `shift()`, the wrong item could be removed.

Fix: Always shift the item off the queue before calling `start()`, and handle rollback on failure:
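A minimal sketch of the corrected flow (names follow the PR description; the surrounding manager code and the real `start()` are assumed, and the stale-entry guard comes from fix #6):

```javascript
// Deterministically remove the next queue item before starting it,
// and put it back at the front if start() throws.
function processNext(pendingQueue, startFn) {
  while (pendingQueue.length > 0) {
    const item = pendingQueue.shift();       // always remove here, not inside start()
    if (item.status !== "pending") continue; // skip stale entries (cancelled/already started)
    try {
      startFn(item);
    } catch (e) {
      pendingQueue.unshift(item); // rollback: re-queue for a later retry
      throw e;
    }
    break;
  }
}
```

Because removal now happens in exactly one place, the success and error paths can no longer disagree about which item left the queue.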