Expose active workspace identity (repo, handle, branch) to AI execution context #423

@chubes4

Problem

Data Machine Code already tracks workspace identity end-to-end:

  • `WorktreeContextInjector` persists per-worktree metadata (`repo`, `branch`, `handle`, `path`, `origin_site`, `origin_agent`, `origin_user`, `origin_task`) in the `datamachine_worktree_metadata` site option and the `WorktreeInventoryRepository` table
  • Workspace abilities (`datamachine/workspace-*`) take a workspace `handle` input and resolve it through `WorkspaceAliasResolver` against that metadata
  • The CI runner (`homeboy-extensions/datamachine-agent-ci.yml`) configures a `runner_workspace` block describing the target repo to clone for an agent run

But none of this surfaces into AI execution payloads. When an AI step fires during a CI-driven agent run (e.g. `Automattic/docs-agent` against `Extra-Chill/extrachill-artist-platform`), the AI directives and the agent's tool calls cannot ask "which repo are we documenting?" because that identity lives only in DMC's workspace layer, never in DM's engine_data or directive payload.

Concrete consumer

`Extra-Chill/extrachill-docs#33` (just merged as #35) registers a `docs` agent execution mode that injects platform-wide voice rules into the AI's system context whenever `agent_modes: ['docs']` is active. The voice rules are network-wide and ship as-is.

The next layer — per-target context — needs to tell the agent "you are currently documenting the Artist Platform; your readers are musicians and artist managers" vs "you are currently documenting the Newsletter; your readers are subscribers managing their preferences." That per-target message is straightforward to compose from `runner-configs/platform-map.yml` in `extrachill-docs`, but the directive callback needs to know which repo the current run is against.

Today there is no public path to that knowledge in PHP. The runner_workspace config exists only in YAML/JSON at the homeboy-extensions layer; it never enters PHP runtime state.

Proposal

Expose active workspace identity into Data Machine's engine_data at agent-run start so any directive, ability, or tool call can read it.

Shape

When a workspace is bound to an AI execution context (CI runner setup, chat session referencing a workspace, system task running against a worktree), DMC writes a structured `active_workspace` entry into the job's engine_data:

```json
{
  "active_workspace": {
    "handle": "extrachill-artist-platform@docs-agent-run-12345",
    "repo": "extrachill-artist-platform",
    "owner": "Extra-Chill",
    "full_name": "Extra-Chill/extrachill-artist-platform",
    "branch": "docs-agent-run-12345",
    "path": "/var/lib/datamachine/workspace/extrachill-artist-platform@docs-agent-run-12345",
    "origin_site": "...",
    "task_url": "..."
  }
}
```

Directives and abilities read it via the standard `$engine->get('active_workspace')` accessor that already exists on the EngineData object (see `GitHubAbilities` reading `run_artifact_egress_policy` the same way).
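
A minimal sketch of the read side, assuming PHP 8. `EngineDataStub` is a hypothetical stand-in that models only the `get()` accessor mentioned above; the real EngineData class may differ, and `active_workspace_full_name()` is an illustrative helper, not an existing DM or DMC function.

```php
<?php

/**
 * Hypothetical stand-in for DM's EngineData object, reduced to the
 * get() accessor this issue refers to. The real class may differ.
 */
final class EngineDataStub
{
    /** @param array<string, mixed> $data */
    public function __construct(private array $data = [])
    {
    }

    public function get(string $key): mixed
    {
        return $this->data[$key] ?? null;
    }
}

/**
 * Returns "owner/repo" for the bound workspace, or null when the job
 * has no workspace bound (or the entry lacks full_name).
 */
function active_workspace_full_name(EngineDataStub $engine): ?string
{
    $workspace = $engine->get('active_workspace');

    if (!is_array($workspace) || !isset($workspace['full_name'])) {
        return null;
    }

    return (string) $workspace['full_name'];
}
```

Returning `null` rather than throwing keeps consumers safe on jobs that run without a bound workspace.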

Where the write happens

The natural injection point is wherever the CI runner currently configures `runner_workspace` to seed the worktree. Today that's the homeboy-extensions workflow constructing a runner config JSON. DMC needs a bootstrap hook that:

  1. Reads `runner_workspace` from the runner config (or any equivalent bootstrap argument when the runtime isn't CI — e.g. chat or system tasks)
  2. Resolves the matching `WorktreeContextInjector` metadata record (or constructs one for the bound workspace)
  3. Writes the `active_workspace` entry into engine_data via `EngineData::merge` at job start

Concretely, this is probably a new `inc/Runtime/WorkspaceBootstrap.php` class in DMC that hooks into job creation (an Action Scheduler new-job event, or a DM `datamachine_job_created` action if one exists; if not, add one) and writes the entry when a workspace is in scope.
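
The three steps above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name `WorkspaceBootstrapSketch`, the `workspace` context key, the caller-supplied `owner` field, and the `repo@branch` fallback for the handle (inferred from the example entry). The static `$store` array stands in for engine_data, since the signature of `EngineData::merge` is not shown in this issue.

```php
<?php

/**
 * Hypothetical sketch of the proposed bootstrap. The in-memory $store
 * stands in for DM's engine_data; a real writer would call
 * EngineData::merge instead.
 */
final class WorkspaceBootstrapSketch
{
    /** @var array<int, array<string, mixed>> Stand-in engine_data, keyed by job ID. */
    public static array $store = [];

    /**
     * @param array<string, mixed> $context May carry a 'workspace' metadata
     *                                      record (assumed key name).
     */
    public static function bootstrap_for_job(int $job_id, array $context): void
    {
        $meta = $context['workspace'] ?? null;

        // No bound workspace: leave engine_data untouched.
        if (!is_array($meta) || empty($meta['repo']) || empty($meta['branch'])) {
            return;
        }

        $entry = [
            // Fallback handle format is an assumption based on the example entry.
            'handle' => $meta['handle'] ?? ($meta['repo'] . '@' . $meta['branch']),
            'repo'   => $meta['repo'],
            'branch' => $meta['branch'],
            'path'   => $meta['path'] ?? null,
        ];

        // 'owner' is assumed to be supplied by the caller; the worktree
        // metadata fields listed in this issue do not include it.
        if (!empty($meta['owner'])) {
            $entry['owner']     = $meta['owner'];
            $entry['full_name'] = $meta['owner'] . '/' . $meta['repo'];
        }

        self::$store[$job_id] = array_merge(
            self::$store[$job_id] ?? [],
            ['active_workspace' => $entry]
        );
    }
}
```

The sketch leaves `origin_site` and `task_url` out because the issue does not say which metadata fields they map from.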

Generic, not docs-specific

The entry shape and the writer should be generic — nothing about docs, documentation, voice rules, or any consumer-specific concern leaks into DMC. `active_workspace` is just "here's the workspace this run is bound to," available to any consumer.

This issue is explicitly NOT about layering docs/voice/audience concerns into DMC. Those stay in their own plugin. DMC's job is to surface the workspace identity DMC already knows about.

Filter for extension

A filter like `datamachine_code_active_workspace` lets other plugins enrich the entry without forking DMC. Example downstream uses:

  • `extrachill-docs` reads `active_workspace.full_name`, looks up the matching `platform-map.yml` entry, and stacks per-target context onto its existing `docs` mode guidance
  • A future security ability uses `active_workspace.owner` to scope tool access
  • Audit logging includes the workspace identity in every recorded tool call
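
As a rough illustration of the extension point, a downstream plugin might register a callback like the one below. The filter name comes from this proposal and does not exist yet; the function name and the `audit_label` key are invented for the example. `add_filter()` is the standard WordPress API, guarded here so the snippet also runs outside WordPress.

```php
<?php

/**
 * Hypothetical downstream callback for the proposed
 * `datamachine_code_active_workspace` filter. It adds an `audit_label`
 * key (invented for illustration) and leaves existing keys intact.
 *
 * @param array<string, mixed> $entry   The active_workspace entry.
 * @param array<string, mixed> $context Bootstrap context (unused here).
 * @return array<string, mixed>
 */
function example_enrich_active_workspace(array $entry, array $context = []): array
{
    if (isset($entry['full_name'], $entry['branch'])) {
        $entry['audit_label'] = $entry['full_name'] . '#' . $entry['branch'];
    }

    return $entry;
}

// Register only when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_filter')) {
    add_filter('datamachine_code_active_workspace', 'example_enrich_active_workspace', 10, 2);
}
```

The callback always returns the entry, including when it has nothing to add, so later callbacks on the same filter still receive a usable array.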

Why this belongs in DMC

DMC already owns:

  • Workspace registry (`WorkspaceRepositoryLifecycle`, `Workspace.php`)
  • Worktree metadata (`WorktreeContextInjector`)
  • Inventory storage (`WorktreeInventoryRepository`)
  • Alias resolution (`WorkspaceAliasResolver`)
  • The integration with the CI runner (`runner_workspace` JSON is consumed by DMC-supplied tools like `workspace_worktree_add`)

Surfacing the active workspace into engine_data closes the loop between DMC's workspace-side knowledge and DM's AI-side runtime. No other plugin has the right combination of metadata and lifecycle hooks to do this.

DM core stays generic. Consumers of `active_workspace` read it through the existing `EngineData` accessor; they don't need to know it came from DMC.

Implementation sketch

  • New file: `inc/Runtime/WorkspaceBootstrap.php` — single static class with a `bootstrap_for_job( int $job_id, array $context ): void` method that builds the `active_workspace` entry from available metadata and calls `EngineData::merge`
  • Trigger: hook into job-creation flow. If DM exposes a `datamachine_job_created` action, use it. If not, this issue depends on adding one to DM (small upstream change)
  • Runner config consumption: when the CI driver builds the WP runtime, it currently writes a runner config JSON containing `runner_workspace`. That JSON is read by the ci-driver fixture. The fixture should call `WorkspaceBootstrap::bootstrap_for_job` with the runner_workspace payload converted into the active_workspace shape
  • Filter: `apply_filters( 'datamachine_code_active_workspace', $entry, $context )` before persisting
  • Tests: unit tests for the shape; an end-to-end test that runs a fake job with a bound workspace and asserts the entry lands in engine_data
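
For the shape tests, a small helper along these lines may be enough. The key list is copied from the example entry in this issue; whether every key is mandatory is an assumption, and both the constant and the function name are hypothetical.

```php
<?php

/**
 * Keys taken from the example entry in this issue. Whether all of them
 * are required is an assumption; adjust to the documented shape.
 */
const ACTIVE_WORKSPACE_KEYS = [
    'handle', 'repo', 'owner', 'full_name', 'branch', 'path', 'origin_site', 'task_url',
];

/**
 * @param array<string, mixed> $entry
 * @return string[] Keys from ACTIVE_WORKSPACE_KEYS that are absent from $entry.
 */
function missing_active_workspace_keys(array $entry): array
{
    // array_diff preserves the original indexes, so reindex for clean comparison.
    return array_values(array_diff(ACTIVE_WORKSPACE_KEYS, array_keys($entry)));
}
```

A unit test can then assert that the helper returns an empty array for the entry the bootstrap writes.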

Acceptance criteria

  • `active_workspace` shape documented in the `inc/Runtime/WorkspaceBootstrap.php` docblock
  • Bootstrap fires at job start when a workspace is bound, no-ops otherwise
  • CI runner path (homeboy-extensions `runner_workspace` → DMC bootstrap) works end-to-end
  • Chat and system task paths (when an explicit handle is provided) also populate `active_workspace`
  • `datamachine_code_active_workspace` filter documented and tested
  • At least one downstream consumer demonstrated — likely `extrachill-docs` reading `full_name` to stack per-target context onto its `docs` mode
  • No DM core changes required, OR if a `datamachine_job_created` hook is needed, that's filed as a tiny upstream DM PR first
  • No regression: jobs without bound workspaces still run cleanly

Non-goals

  • Layering docs / voice / audience concerns into DMC. Stay generic.
  • Mutating workspace state from the directive layer. `active_workspace` is read-only context.
  • Persisting `active_workspace` past job completion. It's per-job runtime context.
  • Cross-job workspace sharing. One workspace identity per job.

Downstream blockers

This issue is the missing leg for per-target context in `Extra-Chill/extrachill-docs#33` (merged in #35 as platform-wide voice only). Once this lands, extrachill-docs adds a second callback on `datamachine_agent_mode_docs` at a later priority that reads `active_workspace.full_name`, looks up `runner-configs/platform-map.yml`, and stacks per-target context onto the same mode.
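
A sketch of what that second callback's core logic might look like. The structure of `platform-map.yml` is not shown in this issue, so `EXAMPLE_PLATFORM_MAP` and its `product` and `readers` keys are invented, as is the function name. The snippet deliberately does not register anything on `datamachine_agent_mode_docs`, because that hook's callback signature is not given here.

```php
<?php

/**
 * Hypothetical parsed form of runner-configs/platform-map.yml. The real
 * file's structure is not shown in this issue; these keys are invented.
 */
const EXAMPLE_PLATFORM_MAP = [
    'Extra-Chill/extrachill-artist-platform' => [
        'product' => 'the Artist Platform',
        'readers' => 'musicians and artist managers',
    ],
];

/**
 * Builds the per-target sentence for a bound workspace. Returns an
 * empty string when no workspace is bound or no map entry matches.
 *
 * @param array<string, mixed>|null $active_workspace
 */
function example_per_target_context(?array $active_workspace): string
{
    $target = EXAMPLE_PLATFORM_MAP[$active_workspace['full_name'] ?? ''] ?? null;

    if ($target === null) {
        return '';
    }

    return sprintf(
        'You are currently documenting %s; your readers are %s.',
        $target['product'],
        $target['readers']
    );
}
```

Returning an empty string for unmapped repos leaves the platform-wide voice rules as the only guidance, which matches the current behavior.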
