WP Origin: persist Git history in a wpdb-backed filesystem #28

Merged
adamziel merged 6 commits into trunk from adamziel/wpdb-filesystem
Apr 28, 2026

@adamziel
Contributor

Follow-up to #27. The custom post type and the per-revision Markdown blobs were doing the work of a small filesystem — one that just happened to live inside the post table. This swaps that for an actual filesystem.

WpdbFilesystem is a port of SQLiteFilesystem to wpdb. Same schema, same semantics, same transactions/savepoints, two MySQL tables instead of two SQLite ones. Hand it to GitRepository and the repo's .git/objects, refs, and config persist in WordPress with no further plumbing.
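
To make the two-table idea concrete, here is a minimal sketch in Python with the standard `sqlite3` module standing in for wpdb. The table names come from the test plan below; the column layout, the class name `TwoTableFilesystem`, and its methods are assumptions for illustration, not the actual WpdbFilesystem schema or API.

```python
import sqlite3


class TwoTableFilesystem:
    """Hypothetical sketch: file bytes in one table, directory listings in another."""

    def __init__(self, conn, prefix="wp_"):
        self.conn = conn
        self.files = f"{prefix}wp_origin_files"
        self.entries = f"{prefix}wp_origin_directory_entries"
        # Assumed columns; the real schema is not shown in this PR.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.files} "
            "(path TEXT PRIMARY KEY, contents BLOB NOT NULL)"
        )
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.entries} "
            "(parent TEXT NOT NULL, name TEXT NOT NULL, PRIMARY KEY (parent, name))"
        )

    def put_contents(self, path, data):
        """Write the file and its directory entry atomically via a savepoint."""
        parent, _, name = path.rpartition("/")
        self.conn.execute("SAVEPOINT fs_put")
        try:
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self.files} (path, contents) VALUES (?, ?)",
                (path, data),
            )
            # Only the immediate parent is recorded in this sketch.
            self.conn.execute(
                f"INSERT OR IGNORE INTO {self.entries} (parent, name) VALUES (?, ?)",
                (parent, name),
            )
        except Exception:
            self.conn.execute("ROLLBACK TO fs_put")
            self.conn.execute("RELEASE fs_put")
            raise
        self.conn.execute("RELEASE fs_put")

    def get_contents(self, path):
        row = self.conn.execute(
            f"SELECT contents FROM {self.files} WHERE path = ?", (path,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def list_dir(self, parent):
        rows = self.conn.execute(
            f"SELECT name FROM {self.entries} WHERE parent = ? ORDER BY name", (parent,)
        ).fetchall()
        return [name for (name,) in rows]


def open_memory_fs():
    # isolation_level=None leaves transaction control to the explicit savepoints.
    return TwoTableFilesystem(sqlite3.connect(":memory:", isolation_level=None))
```

Git objects are arbitrary bytes, so the contents column has to round-trip binary data unchanged; that is the property the binary round-trip unit test mentioned later is described as covering.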

The plugin then becomes much smaller. wp_origin_commit is gone. The _wp_origin_markdown revision metadata is gone. The manifest JSON in post_content is gone. The wp_origin_head_commit_id option is gone. open_repository() no longer replays history into an in-memory repo on every request — it just opens the persistent one.

Push conflict detection still works the same way: it compares each pushed commit's parent tree to current WordPress state via the Markdown front-matter, and that data already lives in the Git tree, not the manifest. The merge-commit and empty-commit guards are kept. The "skip modified checks for follow-on commits in a multi-commit push" rule is kept.
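
The rules in the paragraph above can be sketched roughly as follows. This is a Python illustration of the decision logic only; `PushedCommit`, `check_push`, the field names, and the error strings are hypothetical and do not mirror the plugin's PHP code.

```python
from dataclasses import dataclass


@dataclass
class PushedCommit:
    parent_count: int
    changes_tree: bool
    # path -> modified_gmt recorded in the parent tree's Markdown front-matter,
    # limited to the files this commit touches (assumed shape).
    parent_modified_gmt: dict


def check_push(commits, current_modified_gmt):
    """Return None when the push is acceptable, else a rejection message."""
    for index, commit in enumerate(commits):
        if commit.parent_count > 1:
            return "merge commits are not supported"
        if not commit.changes_tree:
            return "empty commits are not supported"
        if index > 0:
            # Follow-on commits build on the commit just checked,
            # so the modified check is skipped for them.
            continue
        for path, seen in commit.parent_modified_gmt.items():
            if current_modified_gmt.get(path) != seen:
                return f"remote changed: {path}"
    return None
```

Only the first commit in a push is compared against live WordPress state, which matches the multi-commit case in the test plan below.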

Existing wp-origin databases will keep their old wp_origin_commit posts and orphaned post meta; there is no migration. Since that storage was newly added in #27 and isn't load-bearing anywhere else, abandoning it on upgrade is safe; cleanup can happen separately.

Test plan

  • Run the existing Playground integration script (bin/test-wp-origin-git-actions.sh) end-to-end: clone, edit, commit, push, pull.
  • Verify a fresh install creates {$prefix}wp_origin_files and {$prefix}wp_origin_directory_entries automatically on first request.
  • Push a multi-commit branch and confirm only the first commit is gated by the per-file modified_gmt check.
  • Push two stale clients in a row and confirm the second one gets the "remote changed" error without corrupting state.

Replace the wp_origin_commit CPT, the per-revision Markdown blob
metadata, and the manifest stored on each commit post with a single
WpdbFilesystem instance backing the GitRepository.

The Filesystem component gains a WpdbFilesystem class that mirrors
SQLiteFilesystem, but writes to two MySQL tables managed through
$wpdb. Hand it to GitRepository as the object/ref store and Git's own
on-disk layout becomes the WordPress-side persistence — no parallel
manifest layer to keep in sync.

The plugin no longer registers a custom post type, no longer rebuilds
an in-memory repo on every request, and no longer round-trips exact
bytes through post meta. Push conflict detection still works because
it keys off the previous commit's Markdown front-matter, which is
already in the repo.

Add WpdbFilesystem unit tests via a small SQLite-backed wpdb shim,
extend the existing Smart-HTTP e2e to assert the old CPT model is
gone, and add a new Playground restart test that proves repository
history survives a server process restart.

The unit tests reuse FilesystemTestCase, so WpdbFilesystem now goes
through the same 23-test contract as SQLiteFilesystem, plus a binary
round-trip test that mirrors how Git objects are stored.

The persistence script mounts the WordPress SQLite DB on the host,
pushes a commit, kills the server, restarts a fresh process pointed
at the same DB, and re-clones to confirm the commit hash survives.
That is the test that would have caught the bug PR #27 fixed.

The unit tests for WpdbFilesystem already run through PHPUnit's
standard "Project Test Suite", which CI executes via vendor/bin/phpunit
across the full PHP matrix. No shell glue needed for them.

Revert the modifications to the existing wp-origin Smart-HTTP script
and remove the persistence script — both required Playground locally
and added no coverage that CI actually runs.

Spin up MySQL, WordPress, and the wp-origin plugin in a dedicated CI
job, then drive the running server with the real `git` CLI from
PHPUnit. The assertions live in `plugins/wp-origin/Tests/EndToEndTest.php`:
clone shows seeded content, push updates a WordPress post, push can
create new posts and trash existing ones in the same commit, a stale
push is rejected, a brand-new clone sees the full history we just
produced, and the old wp_origin_commit post type is no longer
registered.

The test skips when the WP_ORIGIN_E2E_* env vars are absent, so
`composer test` keeps working unchanged. The new workflow at
.github/workflows/wp-origin-e2e.yml is the only thing that wires
those vars up — it sets up WordPress with wp-cli, mounts the plugin,
creates an Application Password, starts `php -S` with a small router
that synthesises PHP_AUTH_USER from the Authorization header (so REST
auth works under the built-in webserver), then runs the suite.
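
For readers unfamiliar with what "synthesising PHP_AUTH_USER" involves: Application Passwords are sent as HTTP Basic credentials, so the router has to decode the Authorization header itself. A minimal sketch of that decoding step, written in Python for illustration (the real router is PHP; the function name is hypothetical, and it presumably fills in the password counterpart as well):

```python
import base64


def basic_auth_credentials(authorization_header):
    """Return (user, password) from a 'Basic <base64>' header, or None."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and invalid UTF-8 alike.
        return None
    user, separator, password = decoded.partition(":")
    if not separator:
        return None
    return user, password
```

The split happens on the first colon only, so passwords that contain colons survive intact.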

This gives the PR concrete, reproducible evidence that the
wpdb-backed Git filesystem actually does what the rewrite says it
does — across real Smart-HTTP, real Git, real WordPress.

adamziel merged commit 3107a9e into trunk on Apr 28, 2026
23 checks passed

adamziel added a commit that referenced this pull request on Apr 28, 2026

Stacks on #28. Once that lands, GitHub will retarget this PR to trunk.

The wpdb-filesystem PR makes the plugin's first request build a single
"Sync from WordPress" commit covering every post on the site. On a small
dev site that's instant; on a real site with thousands of posts it dies
on max_execution_time. This adds a resumable state-machine seeder that
runs from WP-Cron and only opens the repository for clone/push/pull once
the import has finished.

## How it works

State machine, persisted in `wp_options`:

- `pending` → activation queues a one-off cron event.
- `in_progress` → each tick converts a batch of posts to Markdown,
creates a "Seed batch" commit on `refs/heads/_wp_origin_seed`, updates
the progress option, and reschedules itself once 15 s have elapsed or
memory use reaches 70% of the limit.
- `finalizing` → the next tick reads the seed branch's tree, creates a
single parent-less "Initial import from WordPress" commit pointing at
it, sets `refs/heads/trunk` to that commit, and drops the seed branch.
Clones now see one clean root commit.
- `done` → repository is open for business.
- `failed` → admin can retry from the Tools → WP Origin page.

A transient lock prevents concurrent ticks. No Action Scheduler
dependency — plain WP-Cron only.
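
A rough sketch of one tick of that state machine, in Python for illustration. The dict `options` stands in for `wp_options`, and `run_tick`, `convert_batch`, `finalize`, `memory_ratio`, and the option keys are hypothetical names, not the plugin's API. The 15 s and 70% budgets are the ones stated above.

```python
import time

TIME_BUDGET_SECONDS = 15
MEMORY_BUDGET_RATIO = 0.70


def run_tick(options, post_ids, batch_size, convert_batch, finalize,
             memory_ratio=lambda: 0.0, clock=time.monotonic):
    """Run one tick; return True when another tick should be scheduled."""
    if options.get("lock"):
        return False  # another tick holds the lock
    options["lock"] = True
    try:
        state = options.get("state", "pending")
        if state == "pending":
            options.update(state="in_progress", processed=0, total=len(post_ids))
            state = "in_progress"
        if state == "in_progress":
            started = clock()
            while options["processed"] < options["total"]:
                start = options["processed"]
                batch = post_ids[start:start + batch_size]
                convert_batch(batch)  # would create one "Seed batch" commit
                options["processed"] = start + len(batch)
                out_of_time = clock() - started >= TIME_BUDGET_SECONDS
                if out_of_time or memory_ratio() >= MEMORY_BUDGET_RATIO:
                    return True  # resume from the saved offset next tick
            options["state"] = "finalizing"
            return True
        if state == "finalizing":
            finalize()  # would squash the seed branch into one root commit
            options["state"] = "done"
        return False  # done or failed: nothing left to schedule
    except Exception as error:
        options["state"] = "failed"
        options["message"] = str(error)
        return False
    finally:
        options["lock"] = False
```

Because progress is written after every batch, a tick that is killed mid-run loses at most one batch of work.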

## What clients see

While state is anything but `done`, every Smart-HTTP request returns
HTTP 503 + `Retry-After: 15` with a plaintext body:

> WP Origin is preparing the repository (40%, 200/500 posts). Please try
again shortly.

Better than a half-built history.
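
A small sketch of how that gate might be expressed, in Python for illustration. The status code, `Retry-After` value, and message text follow the description above; the function name and the `Content-Type` header are assumptions.

```python
def seeding_response(state, processed, total):
    """Return (status, headers, body) while seeding, or None once done."""
    if state == "done":
        return None  # let the Smart-HTTP request through
    percent = processed * 100 // total if total else 0
    body = (
        f"WP Origin is preparing the repository "
        f"({percent}%, {processed}/{total} posts). Please try again shortly."
    )
    headers = {"Retry-After": "15", "Content-Type": "text/plain; charset=utf-8"}
    return 503, headers, body
```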

## What admins see

A new **Tools → WP Origin** page shows a live progress bar (state,
percent, processed / total, last message) by polling
`/wp-json/wp-origin/v1/seed-status` every 2 s. A "Retry import" button
POSTs to `/wp-json/wp-origin/v1/seed-retry` for the failed-state case.

## Test plan

The existing wp-origin e2e job in CI is the proof:

- Workflow now updates the seeded posts **before** activating the
plugin, then drives `wp cron event run --due-now` in a 30-iteration loop
until `wp_origin_seed_state` reaches `done`. Failure of that loop fails
the job.
- Two new PHPUnit assertions in `EndToEndTest.php`:
  - `testSeedStatusReportsDone` — `/wp-json/wp-origin/v1/seed-status`
    returns `state=done, percent=100, total>0`.
  - `testInitialCommitIsParentless` — the first commit on `trunk` is
    "Initial import from WordPress" with zero parents and no "Seed batch"
    subjects leaked through.
- Existing round-trip test still passes (clone, push,
fresh-clone-sees-history, etc).
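
For context on what "zero parents" means at the object level: a Git commit object lists its parents as `parent <id>` header lines, and a root commit has none. The sketch below builds such an object in Python and inspects it. It assumes SHA-1 object IDs, and the helper names and placeholder identity are hypothetical; the actual test checks the real repository rather than hand-built objects.

```python
import hashlib

# Well-known ID of Git's empty tree, used here as a stand-in for the seed tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def build_commit(tree, parents, message,
                 ident="WP Origin <wp-origin@example.invalid> 0 +0000"):
    """Return the raw body of a commit object (without the object header)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message, ""]
    return "\n".join(lines).encode("utf-8")


def object_id(body):
    """Hash the body the way Git does: 'commit <size>\\0' followed by the body."""
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def parent_ids(body):
    """List the parent IDs named in the commit's header section."""
    header = body.split(b"\n\n", 1)[0].decode("utf-8")
    return [line.split(" ", 1)[1] for line in header.split("\n")
            if line.startswith("parent ")]


root = build_commit(EMPTY_TREE, [], "Initial import from WordPress")
child = build_commit(EMPTY_TREE, [object_id(root)], "Update post")
```

Here `parent_ids(root)` is empty while `parent_ids(child)` names the root commit, which is the distinction the parentless-commit assertion relies on.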