Problem
Data Machine Code already tracks workspace identity end-to-end:
- `WorktreeContextInjector` persists per-worktree metadata (`repo`, `branch`, `handle`, `path`, `origin_site`, `origin_agent`, `origin_user`, `origin_task`) in the `datamachine_worktree_metadata` site option and the `WorktreeInventoryRepository` table
- Workspace abilities (`datamachine/workspace-*`) take a workspace `handle` input and resolve it through `WorkspaceAliasResolver` against that metadata
- The CI runner (`homeboy-extensions/datamachine-agent-ci.yml`) configures a `runner_workspace` block describing the target repo to clone for an agent run
But none of this reaches AI execution payloads. When an AI step fires during a CI-driven agent run (e.g. `Automattic/docs-agent` against `Extra-Chill/extrachill-artist-platform`), the AI directives and the agent's tool calls cannot ask "which repo are we documenting?" because that identity lives only in DMC's workspace layer, never in DM's engine_data or directive payload.
Concrete consumer
`Extra-Chill/extrachill-docs#33` (just merged as #35) registers a `docs` agent execution mode that injects platform-wide voice rules into the AI's system context whenever `agent_modes: ['docs']` is active. The voice rules are network-wide and ship as-is.
The next layer — per-target context — needs to tell the agent "you are currently documenting the Artist Platform; your readers are musicians and artist managers" vs "you are currently documenting the Newsletter; your readers are subscribers managing their preferences." That per-target message is straightforward to compose from `runner-configs/platform-map.yml` in `extrachill-docs`, but the directive callback needs to know which repo the current run is against.
Today there is no public path to that knowledge in PHP. The `runner_workspace` config exists only in YAML/JSON at the homeboy-extensions layer; it never enters PHP runtime state.
Proposal
Write the active workspace identity into Data Machine's engine_data at agent-run start so any directive, ability, or tool call can read it.
Shape
When a workspace is bound to an AI execution context (CI runner setup, chat session referencing a workspace, system task running against a worktree), DMC writes a structured `active_workspace` entry into the job's engine_data:
```json
{
  "active_workspace": {
    "handle": "extrachill-artist-platform@docs-agent-run-12345",
    "repo": "extrachill-artist-platform",
    "owner": "Extra-Chill",
    "full_name": "Extra-Chill/extrachill-artist-platform",
    "branch": "docs-agent-run-12345",
    "path": "/var/lib/datamachine/workspace/extrachill-artist-platform@docs-agent-run-12345",
    "origin_site": "...",
    "task_url": "..."
  }
}
```
Directives and abilities read it via the standard `$engine->get('active_workspace')` accessor that already exists on the EngineData object (see `GitHubAbilities` reading `run_artifact_egress_policy` the same way).
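A minimal sketch of the consumer side, assuming the accessor returns either the entry as an array or `null` when no workspace is bound. The helper name `describe_active_workspace` is hypothetical and only illustrates a defensive read of the shape above.

```php
<?php
/**
 * Hypothetical helper: turn an active_workspace entry into a one-line label.
 * Returns null when no workspace is bound or the entry lacks a full_name.
 */
function describe_active_workspace( ?array $active_workspace ): ?string {
	// empty() tolerates a null entry or a missing key without raising a warning.
	if ( empty( $active_workspace['full_name'] ) ) {
		return null;
	}

	$branch = (string) ( $active_workspace['branch'] ?? '' );

	return '' !== $branch
		? sprintf( '%s (branch %s)', $active_workspace['full_name'], $branch )
		: $active_workspace['full_name'];
}

// Inside a directive callback, assuming $engine is the job's EngineData object:
// $label = describe_active_workspace( $engine->get( 'active_workspace' ) );
```

Treating a missing entry as "no workspace bound" keeps consumers working for jobs that never had a workspace in scope.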
Where the write happens
The natural injection point is wherever the CI runner currently configures `runner_workspace` to seed the worktree. Today that's the homeboy-extensions workflow constructing a runner config JSON. DMC needs a bootstrap hook that:
- Reads `runner_workspace` from the runner config (or any equivalent bootstrap argument when the runtime isn't CI — e.g. chat or system tasks)
- Resolves the matching `WorktreeContextInjector` metadata record (or constructs one for the bound workspace)
- Writes the `active_workspace` entry into engine_data via `EngineData::merge` at job start
Concretely this is probably a new `Runtime/WorkspaceBootstrap.php` class in DMC that hooks into job creation (Action Scheduler new-job event, or DM's `datamachine_job_start` action if one exists; if not, add one) and writes the entry when a workspace is in scope.
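The conversion step could look roughly like the following. This is a sketch under stated assumptions: the keys of the `runner_workspace` payload are not specified in this issue, so `repo` (as `owner/name`) and `branch` are guesses, and `build_active_workspace` is a hypothetical name. The `origin_site` and `task_url` fields are left out because their sources are not described here.

```php
<?php
/**
 * Hypothetical converter from a runner_workspace payload to the active_workspace shape.
 * Assumes the payload carries `repo` as "owner/name" plus `branch`; the real keys may differ.
 */
function build_active_workspace( array $runner_workspace, string $workspace_root ): ?array {
	$full_name = (string) ( $runner_workspace['repo'] ?? '' );
	$branch    = (string) ( $runner_workspace['branch'] ?? '' );
	$parts     = explode( '/', $full_name, 2 );

	if ( 2 !== count( $parts ) || '' === $parts[0] || '' === $parts[1] || '' === $branch ) {
		return null; // No usable workspace identity, so nothing should be written.
	}

	[ $owner, $repo ] = $parts;
	$handle           = $repo . '@' . $branch; // Mirrors the handle format in the example entry.

	return array(
		'handle'    => $handle,
		'repo'      => $repo,
		'owner'     => $owner,
		'full_name' => $full_name,
		'branch'    => $branch,
		'path'      => rtrim( $workspace_root, '/' ) . '/' . $handle,
	);
}
```

Returning `null` for an incomplete payload lets the caller skip the write entirely instead of persisting a half-filled entry.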
Generic, not docs-specific
The entry shape and the writer should be generic — nothing about docs, documentation, voice rules, or any consumer-specific concern leaks into DMC. `active_workspace` is just "here's the workspace this run is bound to," available to any consumer.
This issue is explicitly NOT about layering docs/voice/audience concerns into DMC. Those stay in their own plugin. DMC's job is to surface the workspace identity DMC already knows about.
Filter for extension
A filter like `datamachine_code_active_workspace` lets other plugins enrich the entry without forking DMC. Example downstream uses:
- `extrachill-docs` reads `active_workspace.full_name`, looks up the matching `platform-map.yml` entry, and stacks per-target context onto its existing `docs` mode guidance
- A future security ability uses `active_workspace.owner` to scope tool access
- Audit logging includes the workspace identity in every recorded tool call
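As an illustration of the enrichment path, a downstream plugin might attach a derived flag to the entry. The callback name, the `owner_trusted` key, and the trusted-owner list are all hypothetical; only the filter name comes from this proposal. The two-argument signature assumes the filter is applied with `$entry` and `$context`.

```php
<?php
/**
 * Hypothetical enrichment callback for the proposed `datamachine_code_active_workspace` filter.
 * Adds an illustrative `owner_trusted` flag derived from the entry's owner.
 */
function example_flag_trusted_owner( array $entry, array $context = array() ): array {
	$trusted_owners = array( 'Extra-Chill' ); // Illustrative list, not part of the proposal.

	$entry['owner_trusted'] = in_array( $entry['owner'] ?? '', $trusted_owners, true );

	return $entry;
}

// Register only when WordPress is loaded, so the file is also safe to include in isolation.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'datamachine_code_active_workspace', 'example_flag_trusted_owner', 10, 2 );
}
```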
Why this belongs in DMC
DMC already owns:
- Workspace registry (`WorkspaceRepositoryLifecycle`, `Workspace.php`)
- Worktree metadata (`WorktreeContextInjector`)
- Inventory storage (`WorktreeInventoryRepository`)
- Alias resolution (`WorkspaceAliasResolver`)
- The integration with the CI runner (`runner_workspace` JSON is consumed by DMC-supplied tools like `workspace_worktree_add`)
Surfacing the active workspace into engine_data closes the loop between DMC's workspace-side knowledge and DM's AI-side runtime. No other plugin has the right combination of metadata and lifecycle hooks to do this.
DM core stays generic. Consumers of `active_workspace` read it through the existing `EngineData` accessor; they don't need to know it came from DMC.
Implementation sketch
- New file: `inc/Runtime/WorkspaceBootstrap.php` — single static class with a `bootstrap_for_job( int $job_id, array $context ): void` method that builds the `active_workspace` entry from available metadata and calls `EngineData::merge`
- Trigger: hook into job-creation flow. If DM exposes a `datamachine_job_created` action, use it. If not, this issue depends on adding one to DM (small upstream change)
- Runner config consumption: when the CI driver builds the WP runtime, it writes a runner config JSON containing `runner_workspace`, which the ci-driver fixture reads. The fixture should convert the `runner_workspace` payload into the `active_workspace` shape and pass it to `WorkspaceBootstrap::bootstrap_for_job`
- Filter: `apply_filters( 'datamachine_code_active_workspace', $entry, $context )` before persisting
- Tests: unit tests for the shape; an end-to-end test that runs a fake job with a bound workspace and asserts the entry lands in engine_data
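The pieces above could fit together roughly as follows. This is a sketch, not the final class: the signature of `EngineData::merge` is not shown in this issue, so the write goes through an injectable `$writer` callable standing in for it, and the `active_workspace` key inside `$context` is an assumption about how the caller hands over the prepared entry.

```php
<?php
/**
 * Sketch of the proposed bootstrap class.
 * $writer is a hypothetical stand-in for the EngineData::merge call.
 */
final class WorkspaceBootstrap {
	/** @var callable|null Receives ( int $job_id, array $data ). */
	public static $writer = null;

	public static function bootstrap_for_job( int $job_id, array $context ): void {
		$entry = $context['active_workspace'] ?? null;

		if ( ! is_array( $entry ) || empty( $entry['handle'] ) ) {
			return; // No workspace in scope for this job.
		}

		// Let other plugins enrich the entry before it is persisted.
		if ( function_exists( 'apply_filters' ) ) {
			$entry = apply_filters( 'datamachine_code_active_workspace', $entry, $context );
		}

		// Guard against a filter returning a non-array, then hand off to the writer.
		if ( is_array( $entry ) && is_callable( self::$writer ) ) {
			( self::$writer )( $job_id, array( 'active_workspace' => $entry ) );
		}
	}
}
```

Keeping the writer injectable also makes the shape tests straightforward: a test can capture what would have been merged without touching real job storage.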
Acceptance criteria
Non-goals
- Layering docs / voice / audience concerns into DMC. Stay generic.
- Mutating workspace state from the directive layer. `active_workspace` is read-only context.
- Persisting `active_workspace` past job completion. It's per-job runtime context.
- Cross-job workspace sharing. One workspace identity per job.
Downstream blockers
This issue is the missing leg for per-target context in `Extra-Chill/extrachill-docs#33` (merged in #35 as platform-wide voice only). Once this lands, extrachill-docs adds a second callback on `datamachine_agent_mode_docs` at a later priority that reads `active_workspace.full_name`, looks up `runner-configs/platform-map.yml`, and stacks per-target context onto the same mode.
Related