
fix: opt workspace-maintenance tasks out of the agent-context gate#565

Merged
chubes4 merged 1 commit into main from fix-disk-cleanup-agent-context
Jun 6, 2026

@chubes4
Member

@chubes4 chubes4 commented Jun 6, 2026

Fixes #564

Root cause

The recurring workspace-maintenance system tasks are registered in data-machine-code.php (datamachine_recurring_schedules filter, ~lines 339–383) as agent-less schedules — no agent_id/agent_slug, not per_agent. But every one of them extends SystemTask without overriding requiresAgentContext(), so they inherited the base-class default of true.

In data-machine core, TaskScheduler::schedule() (inc/Engine/Tasks/TaskScheduler.php ~161) checks for agent context. When there is none (empty($context['agent_slug']) && empty($context['agent_id'])) and requiresAgentContext() === true, it logs "TaskScheduler: queued task requires agent context" (error_code: task_scheduler_agent_context_required) and returns false, so the task is rejected before it runs.
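A minimal sketch of that gate, assuming only the condition quoted above. The function name gate_allows_schedule() is hypothetical and stands in for the relevant branch of TaskScheduler::schedule(); it is not the real method.

```php
<?php
// Simplified stand-in for the agent-context gate; NOT the real TaskScheduler.
// gate_allows_schedule() is a hypothetical name used only for illustration.
function gate_allows_schedule(array $context, bool $requiresAgentContext): bool
{
    // empty() is safe on missing keys and treats 0, '', and null as empty.
    $agentless = empty($context['agent_slug']) && empty($context['agent_id']);

    if ($agentless && $requiresAgentContext) {
        // The real scheduler logs task_scheduler_agent_context_required here.
        return false;
    }

    return true;
}

var_dump(gate_allows_schedule([], true));                // agent-less, gate on
var_dump(gate_allows_schedule(['agent_id' => 7], true)); // agent present
var_dump(gate_allows_schedule([], false));               // agent-less, opted out
```

Only the first call is rejected, which matches the agent-less recurring schedules described above.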

On the live extrachill.com install, workspace_disk_emergency_cleanup fired hourly and logged 213 identical errors (2026-05-30 → 2026-06-06, 100% of the DM error log); the emergency disk cleanup never executed.

Tasks changed and why

All fixed by overriding public function requiresAgentContext(): bool { return false; } — the opt-out the SystemTask docblock explicitly documents for "pure internal maintenance tasks." Not per_agent — disk cleanup is site-scoped; you do not want N cleanups fanned out per agent. No agent identity is fabricated.
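The shape of the override, sketched against a stand-in base class. SketchSystemTask, SketchDiskCleanupTask, and SketchAgentTask are hypothetical names; the real tasks extend core's SystemTask, whose other members are not modeled here.

```php
<?php
// Hypothetical stand-in for core's SystemTask; only the gate method is modeled.
abstract class SketchSystemTask
{
    public function requiresAgentContext(): bool
    {
        return true; // base-class default described in this PR
    }
}

// A site-scoped maintenance task opts out of the gate explicitly.
final class SketchDiskCleanupTask extends SketchSystemTask
{
    public function requiresAgentContext(): bool
    {
        return false;
    }
}

// A task that does not override the method keeps the default.
final class SketchAgentTask extends SketchSystemTask
{
}

var_dump((new SketchDiskCleanupTask())->requiresAgentContext());
var_dump((new SketchAgentTask())->requiresAgentContext());
```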

| Task | Schedule | Default | Status before fix |
| --- | --- | --- | --- |
| workspace_disk_emergency_cleanup | hourly | enabled | actively failing hourly (the 213 logged errors) |
| workspace_retention_cleanup | daily | enabled | actively failing daily (same gate, less log volume) |
| worktree_cleanup | daily | disabled | would fail the moment an operator enables it |
| workspace_hygiene_report | weekly | disabled | would fail the moment an operator enables it |
| worktree_cleanup_chunk | child (fan-out) | n/a | see below |

Sibling tasks affected and fixed in this PR

workspace_retention_cleanup is default_enabled => true and was silently failing on the same gate daily. worktree_cleanup and workspace_hygiene_report are disabled by default but are registered the identical agent-less way and would fail immediately on enable. All three are fixed here.

Why worktree_cleanup_chunk is also fixed

worktree_cleanup_chunk is the child task that performs the actual destructive cleanup (artifact deletion / worktree removal). The recurring parents schedule it via TaskScheduler::scheduleBatch(), forwarding 'agent_id' => (int)($params['agent_id'] ?? 0). Under an agent-less recurring run that is agent_id = 0, which is empty(), so scheduleBatch() → schedule() would hit the same gate and reject every chunk. Without this override, the fix would only move the failure one layer down, and the real cleanup would still never run.
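The empty() behavior this relies on can be checked in isolation. The helper forwarded_agent_id() is a hypothetical wrapper around the forwarding expression quoted above, not a function from the plugin.

```php
<?php
// Hypothetical helper mirroring the forwarding expression from the parent tasks.
function forwarded_agent_id(array $params): int
{
    return (int) ($params['agent_id'] ?? 0);
}

// An agent-less recurring run passes no agent_id, so the child context gets 0.
$context = ['agent_id' => forwarded_agent_id([])];

var_dump($context['agent_id']);        // int(0)
var_dump(empty($context['agent_id'])); // bool(true): 0 counts as "no agent"
```

Because integer 0 is empty() in PHP, a child scheduled this way looks agent-less to the gate even though the agent_id key is set.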

Evidence that executeTask() is pure maintenance (no agent-scoped calls)

Read each executeTask() end-to-end. Every one:

  • gates on PluginSettings::get(SETTING_KEY, ...),
  • delegates to the DataMachineCode\Workspace\Workspace service (disk/file/git operations),
  • logs via the datamachine_log action and calls completeJob()/failJob().

None read agent_id from the engine snapshot and none invoke an agent-scoped ability. The only agent_id reference is (int)($params['agent_id'] ?? 0) forwarded into scheduleBatch() for child-job linkage — it is never used as an ownership/authorization input.

Verification

  • php -l clean on all five touched files.
  • Runtime check: loaded the five touched task classes against core's SystemTask and reflected requiresAgentContext(); all return false, each declared on its own class:
WorkspaceDiskEmergencyCleanupTask    requiresAgentContext()=false (declared in DataMachineCode\Tasks\WorkspaceDiskEmergencyCleanupTask)
WorktreeCleanupTask                  requiresAgentContext()=false (declared in DataMachineCode\Tasks\WorktreeCleanupTask)
WorkspaceRetentionCleanupTask        requiresAgentContext()=false (declared in DataMachineCode\Tasks\WorkspaceRetentionCleanupTask)
WorkspaceHygieneReportTask           requiresAgentContext()=false (declared in DataMachineCode\Tasks\WorkspaceHygieneReportTask)
WorktreeCleanupChunkTask             requiresAgentContext()=false (declared in DataMachineCode\Tasks\WorktreeCleanupChunkTask)
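
A check of this kind can be sketched with PHP's Reflection API. The classes below are hypothetical stand-ins (the real ones live in DataMachineCode\Tasks), and describe_agent_gate() is an illustrative helper, not the script that produced the output above.

```php
<?php
// Hypothetical stand-ins for core's SystemTask and two task classes.
abstract class ReflectBaseTask
{
    public function requiresAgentContext(): bool
    {
        return true;
    }
}

final class ReflectOverridingTask extends ReflectBaseTask
{
    public function requiresAgentContext(): bool
    {
        return false;
    }
}

final class ReflectInheritingTask extends ReflectBaseTask
{
}

/**
 * Report the method's return value and the class that declares it.
 *
 * @return array{value: bool, declared_in: string}
 */
function describe_agent_gate(string $class): array
{
    $method = new ReflectionMethod($class, 'requiresAgentContext');
    // Skip the constructor so the check does not depend on task dependencies.
    $instance = (new ReflectionClass($class))->newInstanceWithoutConstructor();

    return [
        'value'       => $method->invoke($instance),
        'declared_in' => $method->getDeclaringClass()->getName(),
    ];
}

foreach ([ReflectOverridingTask::class, ReflectInheritingTask::class] as $class) {
    $info = describe_agent_gate($class);
    printf(
        "%-24s requiresAgentContext()=%s (declared in %s)\n",
        $class,
        var_export($info['value'], true),
        $info['declared_in']
    );
}
```

Reporting the declaring class distinguishes a real override from an inherited value: a task that merely inherits the method would report the base class instead of its own name.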

Per the TaskScheduler::schedule() gate logic (lines 161–175), requiresAgentContext() === false means an agent-less recurring run no longer takes the error branch — it builds the system job snapshot and enqueues the workflow. Full end-to-end "completed job row" verification on prod requires the deployed plugin to carry this code (deploy is out of scope for this PR).

The recurring workspace-maintenance system tasks (workspace_disk_emergency_cleanup,
worktree_cleanup, workspace_retention_cleanup, workspace_hygiene_report) are
registered as agent-less recurring schedules in data-machine-code.php, but each
extends SystemTask without overriding requiresAgentContext(). They therefore
inherited the base default of true, so TaskScheduler::schedule() logged
"queued task requires agent context" (error_code task_scheduler_agent_context_required)
and returned false before the task ran. On the live install
workspace_disk_emergency_cleanup logged 213 identical hourly errors and the
disk cleanup never ran.

These are pure disk/file/git maintenance tasks driven by the Workspace service
and gated by PluginSettings; none act as an agent or invoke an agent-scoped
ability (the only agent_id they touch is read from task params, defaulting to 0,
to forward into child cleanup chunk jobs). The honest fix is to opt out of the
gate via requiresAgentContext(): false, not to fabricate agent identity.

WorktreeCleanupChunkTask is fixed for the same reason: it is fanned out by the
recurring parents which forward agent_id=0, so without the override the actual
destructive cleanup chunks would be rejected by the same gate even after the
parents are fixed.

Fixes #564
@homeboy-ci
Contributor

homeboy-ci Bot commented Jun 6, 2026

Homeboy Results — data-machine-code

Lint

lint — passed

ℹ️ Full options: homeboy docs commands/lint
ℹ️ Save lint baseline: homeboy lint data-machine-code --baseline
Deep dive: homeboy lint data-machine-code --changed-since 06dcabf

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-code-lint-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-code-lint-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine-code/actions/runs/27066548463

Test

test — passed

ℹ️ No impacted tests found for --changed-since 06dcabf
ℹ️ Run full suite if needed: homeboy test data-machine-code
Deep dive: homeboy test data-machine-code --changed-since 06dcabf

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-code-test-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-code-test-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine-code/actions/runs/27066548463

Audit

audit — passed

  • audit — 4 finding(s)
  • Total: 4 finding(s)

Deep dive: homeboy audit data-machine-code --changed-since 06dcabf

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-code-audit-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-code-audit-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine-code/actions/runs/27066548463
Tooling versions
  • Homeboy CLI: homeboy 0.220.3+a02394a
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: e34defd5
  • Action: unknown@unknown


Development

Successfully merging this pull request may close these issues.

workspace_disk_emergency_cleanup fails hourly: queued task requires agent context (cleanup dead 7+ days)