Summary
Add a GitSync primitive to data-machine-code that binds an arbitrary site-owned local directory to a remote git repository, with pull/push semantics, per-binding policies, and scheduled sync support.
This is the layer above the existing Workspace abilities. Workspace operates on agent-owned checkouts under ~/.datamachine/workspace/<repo> for ad-hoc code edits. GitSync operates on site-owned subtrees (e.g. wp-content/uploads/markdown/wiki/, wp-content/uploads/datamachine-files/agents/<slug>/) that plugins want to keep in lockstep with a remote over time.
Intelligence is the first consumer — it needs to git-sync wiki content subtrees (Automattic/intelligence#31) and wiki-generator agent definitions (Automattic/intelligence#125). Both are the same underlying pattern: bind a local path to a remote, pull periodically, optionally push local changes back. The primitive belongs here, not duplicated in Intelligence.
Why this layering is right
Intelligence is the consumer. Data Machine Code owns the git substrate:
Workspace/ classes already wrap every git operation we'd need (clone_repo, git_pull, git_push, git_add, git_commit, git_status, git_diff, git_log)
WorkspaceAbilities already exposes these as abilities — permission-gated, callable from CLI/REST/MCP/chat
GitHubAbilities handles GitHub API (PR creation, issue wiring) at the same boundary
datamachine_workspace_git_policies already has the policy shape we need: per-repo write_enabled, push_enabled, allowed_paths, fixed_branch
The only missing concept is "bind an external path to a remote". Workspace is opinionated about path structure — it manages checkouts under ~/.datamachine/workspace/<repo>. A GitSync binding points at a site path (wp-content/uploads/... or anywhere else the site controls) and keeps that path as a git working tree against a declared remote.
Having Intelligence reimplement this would duplicate Workspace's git plumbing and policy substrate. Moving it into DMC means:
One place owns "the site has git-synced directories" semantics
Any plugin (Intelligence, a WooCommerce extension, a docs plugin) gets the primitive for free
Scheduled sync slots naturally into the DM flow/cron infrastructure that DMC is already adjacent to
Proposed shape
New module
Binding shape
Stored in the datamachine_gitsync_bindings option. This is separate from datamachine_workspace_git_policies: the concerns differ (site subtrees vs agent workspace repos), so don't cram both into one option.
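As a rough illustration of what one entry in that option might hold, here is a language-neutral sketch in Python (not the plugin's PHP). The field names are assumptions pieced together from the CLI flags and policy terms used elsewhere in this issue (local path, remote, branch, auto_pull, pull_interval, last_pulled, push_enabled, conflict policy); the real binding shape may differ.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Binding:
    """Hypothetical binding record; field names are illustrative only."""
    slug: str                          # e.g. "intelligence-wiki"
    local_path: str                    # site-owned directory to keep in sync
    remote: str                        # remote repository URL
    branch: str = "main"               # pinned branch
    auto_pull: bool = False            # opt in to scheduled pulls
    pull_interval: str = "hourly"      # "hourly" or "daily"
    push_enabled: bool = False         # pushing is off unless an admin enables it
    conflict: str = "fail"             # "upstream_wins" | "fail" | "manual"
    last_pulled: Optional[int] = None  # unix timestamp of the last pull, if any


def to_option(bindings):
    """Serialize bindings into a single array-style option keyed by slug."""
    return {b.slug: asdict(b) for b in bindings}


wiki = Binding(
    slug="intelligence-wiki",
    local_path="/uploads/markdown/wiki/",
    remote="https://github.com/Automattic/a8c-wiki-woocommerce",
    auto_pull=True,
)
option = to_option([wiki])
```

Keying the option by slug mirrors how the CLI addresses bindings by name, and the defaults reflect the conservative posture described under Security posture (push disabled, conflict policy fail).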
CLI surface
# Bind a local directory to a remote
wp datamachine-code gitsync bind intelligence-wiki \
  --local=/uploads/markdown/wiki/ \
  --remote=https://github.com/Automattic/a8c-wiki-woocommerce \
  --branch=main \
  --auto-pull=hourly
# Pull / push / status
wp datamachine-code gitsync pull intelligence-wiki
wp datamachine-code gitsync push intelligence-wiki --commit-message="..."
wp datamachine-code gitsync status intelligence-wiki
# List all bindings
wp datamachine-code gitsync list
# Update policy
wp datamachine-code gitsync policy intelligence-wiki --push-enabled=true
# Remove (does NOT delete the local directory — just the binding metadata)
wp datamachine-code gitsync unbind intelligence-wiki
Programmatic API for consumers
Intelligence (and other plugins) should not call git operations directly. They should call the GitSync service, or the equivalent ability surface (callable via MCP/REST/CLI).
Scheduled sync
A single DM scheduled task (datamachine_gitsync_tick, hourly) iterates every binding with auto_pull=true, dispatches a per-binding pull via GitSync::pull(), and logs results. This reuses DM's Action Scheduler group and respects pull_interval semantics: a binding with pull_interval=daily is skipped until 24h have passed since last_pulled.
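The interval gating described here can be sketched as follows. This is a Python illustration of the logic only, not the plugin's PHP task; the dictionary keys and the `pull` callback are stand-ins for whatever the binding store and GitSync::pull() end up looking like.

```python
INTERVAL_SECONDS = {"hourly": 3600, "daily": 86400}


def is_due(binding, now):
    """A binding is due when auto_pull is on and its interval has elapsed."""
    if not binding.get("auto_pull"):
        return False
    last = binding.get("last_pulled")
    if last is None:
        return True  # never pulled yet
    interval = INTERVAL_SECONDS[binding.get("pull_interval", "hourly")]
    return now - last >= interval


def tick(bindings, now, pull):
    """Run one scheduler tick: pull every due binding and record the outcome."""
    results = {}
    for slug, binding in bindings.items():
        if not is_due(binding, now):
            continue
        try:
            pull(slug)
            binding["last_pulled"] = now
            results[slug] = "ok"
        except Exception as exc:  # one failing binding must not stop the rest
            results[slug] = f"error: {exc}"
    return results


bindings = {
    "wiki": {"auto_pull": True, "pull_interval": "daily", "last_pulled": 0},
    "agents": {"auto_pull": True, "pull_interval": "hourly", "last_pulled": 0},
    "manual": {"auto_pull": False},
}
pulled = []
results = tick(bindings, now=7200, pull=pulled.append)
```

Two hours after the last pull, only the hourly binding is dispatched; the daily one waits, and the binding without auto_pull is never touched by the tick.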
Security posture
Same constraints Workspace already enforces, generalized:
Path containment: local_path must be under ABSPATH (or an explicitly allowed root); realpath() + traversal check (.., .) rejects escapes.
Blocked paths: .env, credentials, keys, private SSH, anything matching existing Workspace block list.
Push enablement: defaults to false. push_enabled=true requires explicit admin action.
Branch pinning: branch is enforced; a pull that would fast-forward to a different branch fails.
Auth: remote credentials come from existing DM auth providers where applicable (GitHub handler already has this). HTTPS with token is the baseline; SSH keys out of scope for v1.
Conflict policy: upstream_wins (discard local, pull with force-reset), fail (abort pull on local diff), manual (pull but surface the conflict in logs; admin resolves). Default fail so nothing destructive happens unless opted into.
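Two of the rules above, path containment and the conflict policy, are easy to get subtly wrong, so here is a minimal Python sketch of the intended logic. It is not the plugin's PHP code: the function names and the returned action labels are hypothetical, and the containment check uses Python's os.path.realpath as a stand-in for PHP's realpath().

```python
import os


def is_contained(local_path, allowed_root):
    """True if local_path resolves to allowed_root or to a path beneath it."""
    root = os.path.realpath(allowed_root)
    target = os.path.realpath(local_path)  # collapses ".." and resolves symlinks
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:  # e.g. paths on different drives
        return False


def plan_pull(policy, has_local_changes):
    """Decide what a pull does under each conflict policy (labels are illustrative)."""
    if not has_local_changes:
        return "pull"
    if policy == "upstream_wins":
        return "reset_and_pull"         # discard local changes, force-reset to upstream
    if policy == "manual":
        return "pull_and_log_conflict"  # pull, surface the conflict for an admin
    if policy == "fail":
        return "abort"                  # default: nothing destructive happens
    raise ValueError(f"unknown conflict policy: {policy}")
```

Comparing resolved paths with commonpath rather than a string prefix avoids accepting a sibling such as /srv/site-evil when the allowed root is /srv/site.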
What it explicitly is not
Not the Workspace system. Workspace keeps its home under ~/.datamachine/workspace/ and is for agent-owned code editing checkouts (primary + worktrees). GitSync is for site-owned subtrees that a plugin wants mirrored. They share the underlying Workspace class's git methods but are configured and scoped differently.
Not a generic git hosting / registry. It's a sync primitive. PR review happens on GitHub; conflict resolution for tricky cases happens by admin intervention; webhook-driven real-time sync is out of scope for v1 (hourly poll is enough).
Not Intelligence-specific. Nothing about the primitive cares what the synced content is. Intelligence is the first consumer; others follow.
Consumers (filed or expected)
Automattic/intelligence#31 — Git-synced wiki content subtrees. Maps cleanly to a GitSync binding per wiki subtree (woocommerce-wiki, jetpack-wiki, etc.).
Automattic/intelligence#125 — Git-tracked wiki-generator agent definitions. Maps to a GitSync binding on uploads/datamachine-files/agents/wiki-generator/ pointing at github.com/Automattic/a8c-wiki-generator.
Both Intelligence issues will be updated to consume this primitive rather than invent their own sync code.
Relationship to existing DMC pieces
| Existing | Reused by GitSync | Notes |
| --- | --- | --- |
| Workspace\Workspace class | Yes (underlying git operations) | clone_repo, git_pull, git_push, git_add, git_commit are generic enough to target any path. May need a small refactor to accept the target directory as a parameter instead of resolving it from the workspace root. |
| WorkspaceAbilities | No (parallel abilities class) | Keep separate for a clear mental model: workspace = agent-owned code, gitsync = site-owned subtree. |
| datamachine_workspace_git_policies option | No (separate option) | Different concerns; keep storage separate. |
| GitHubAbilities | Optional consumer | Push-to-PR flow (v2) could use these for opening PRs from local changes. |
| DM scheduler | Yes | Scheduled pull task uses Action Scheduler via datamachine_gitsync_tick. |
Acceptance (v1)
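GitSync service class with bind, unbind, pull, push, status, list methods
Abilities in the datamachine-code category (bind, unbind, pull, push, status, list, policy-update)
wp datamachine-code gitsync CLI surface
Bindings stored in the datamachine_gitsync_bindings option
GitSyncPullTask scheduled task honoring auto_pull + pull_interval
Conflict policy defaulting to fail
Docs in README.md + a new docs/gitsync.md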
Acceptance (v2, follow-up)
Open questions
Option storage format. Single datamachine_gitsync_bindings array option vs a custom table? v1 option is fine; move to a table if the count grows past ~50 bindings.
Workspace refactor scope. How much of Workspace\Workspace can be reused as-is? Likely most methods are generic enough; minor refactor to accept a working-directory parameter.
Auth reuse. Does GitHubAuthProvider in DM cover enough, or do bindings need their own credentials (e.g. pulling from a GitLab mirror)? Start with DM's GitHub auth; generalize later.
Unbind semantics. Should unbind leave the .git/ directory in place (so the local path remains a valid working tree) or remove it (clean site directory)? Flag it: --keep-git defaults to true.
Bidirectional sync for mutable consumers. Intelligence wiki content can be both pulled from upstream AND edited locally. Conflict policy for that case needs real thought — probably manual with explicit admin resolution.