Persist WP Origin history in WordPress #27

Merged
adamziel merged 3 commits into trunk from codex/wp-origin-persistent-history
Apr 27, 2026

@artpi (Contributor) commented Apr 27, 2026

Summary

This PR fixes the core persistence bug in wp-origin.

Before this change, each Git HTTP request built a fresh temporary repository, projected the current WordPress state into it, served the request, and then discarded the repository. That meant a successful push updated WordPress content, but the remote Git history itself was not preserved. On the next pull, the server exported a new synthetic snapshot with unrelated ancestry, which is why the clone -> push -> pull flow could conflict even when the content changes themselves were valid.

Persistence model: CPT

The persistence model is intentionally WordPress-native:

  • wp_origin_commit posts represent synthetic Git commits
  • post_parent links commits into a linear parent chain
  • post_content stores the full snapshot manifest for that commit
  • revision post meta stores the exact Markdown bytes for each tracked snapshot
  • a single option stores the current HEAD commit post ID

At request time, wp-origin:

  1. loads the persisted commit chain from WordPress
  2. replays those commits into an in-memory Git repository
  3. exports current WordPress posts/pages into Markdown
  4. creates a new synthetic sync commit if WordPress has changed since the last persisted snapshot
  5. serves the Git Smart HTTP request from that reconstructed repository

That gives us persistent ancestry without needing a .git directory on disk.
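
The request-time steps above can be sketched roughly as follows. This is an illustrative Python model, not the plugin's PHP code: `StoredCommit`, `load_chain`, `materialize`, and `needs_sync_commit` are hypothetical names, and the manifest shape (path mapped to a revision ID) is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredCommit:
    post_id: int
    parent_id: Optional[int]  # mirrors post_parent; None for the root commit
    manifest: dict            # assumed shape: path -> revision ID
    message: str


def load_chain(commits: dict, head_id: Optional[int]) -> list:
    """Walk parent links from HEAD back to the root, then return oldest-first."""
    chain = []
    current = head_id
    while current is not None:
        commit = commits[current]
        chain.append(commit)
        current = commit.parent_id
    chain.reverse()
    return chain


def materialize(commit: StoredCommit, revision_bytes: dict) -> dict:
    """Resolve a manifest into path -> exact stored Markdown bytes."""
    return {path: revision_bytes[rev] for path, rev in commit.manifest.items()}


def needs_sync_commit(chain: list, revision_bytes: dict, current_export: dict) -> bool:
    """A sync commit is needed when the live export differs from the last snapshot."""
    if not chain:
        return True
    return materialize(chain[-1], revision_bytes) != current_export


# Example data: two persisted commits and the bytes stored for each revision.
commits = {
    10: StoredCommit(10, None, {"post/hello.md": 101}, "Initial sync"),
    11: StoredCommit(11, 10, {"post/hello.md": 102, "page/about.md": 103}, "Edit hello"),
}
revision_bytes = {101: b"# Hello\n", 102: b"# Hello!\n", 103: b"# About\n"}
chain = load_chain(commits, head_id=11)
```

In this model the replay order is simply the reversed parent walk, which only works because the stored history is linear.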

Why this approach

Why not rely on revisions alone?

Revisions help with per-post history, but Git needs repository history.

A push may change multiple files atomically, delete paths, or create new paths. Git also needs a stable parent chain and access to historical snapshots as Git saw them, not just "the latest revision for each post." Revisions by themselves do not provide repo-wide snapshots or stable commit ancestry.

Why store full snapshot manifests instead of only per-commit deltas?

A delta-only model would have reduced storage, but it would require replaying history backward to reconstruct unchanged files for older commits. That makes fetch/pull reconstruction more complex and more expensive.

The full-manifest approach is intentionally simpler:

  • each commit can be reconstructed directly
  • unchanged files are resolved immediately from the manifest
  • deletes are represented by omission from the next manifest

This trades storage efficiency for implementation simplicity and predictable read behavior.
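
To illustrate why full manifests keep reads simple, here is a minimal sketch of deriving a commit's changes by comparing two adjacent manifests. The function name `diff_manifests` and the path-to-revision-ID manifest shape are assumptions for illustration only.

```python
def diff_manifests(parent: dict, child: dict) -> dict:
    """Compare two full snapshot manifests (path -> revision ID).

    A path missing from the child manifest is a delete; no tombstone is needed.
    """
    added = sorted(p for p in child if p not in parent)
    deleted = sorted(p for p in parent if p not in child)
    modified = sorted(p for p in child if p in parent and parent[p] != child[p])
    return {"added": added, "deleted": deleted, "modified": modified}


parent_manifest = {"post/hello.md": 101, "page/old.md": 104}
child_manifest = {"post/hello.md": 102, "page/about.md": 103}
changes = diff_manifests(parent_manifest, child_manifest)
```

Each manifest can be read on its own, so reconstructing an older commit never requires walking through its neighbors; the diff is only needed when a change list is wanted.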

Why store exact Markdown bytes instead of only hashes?

A blob hash alone is not enough to serve historical objects during clone, fetch, or pull. If historical Markdown were regenerated later from current exporter code, even tiny formatting changes could rewrite blob hashes, tree hashes, and commit hashes retroactively.

Persisting the exact exported Markdown bytes on the revision snapshot keeps historical Git-visible content stable.
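
As a small demonstration of the hash sensitivity described above, the following Python sketch computes a Git blob ID the way Git does (SHA-1 over a `blob <size>\0` header plus the content). The sample Markdown strings are made up for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Bytes persisted at export time versus bytes regenerated later by a
# hypothetical exporter that now emits one extra trailing newline.
stored = b"---\ntitle: Hello\n---\n\nHello world\n"
regenerated = stored + b"\n"

stored_id = git_blob_hash(stored)
regenerated_id = git_blob_hash(regenerated)
```

The two IDs differ, and because tree and commit hashes are computed over the blob IDs they reference, the change would propagate all the way up the history.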

Tradeoffs and intentional limitations

This implementation is deliberately scoped.

  • The persisted history model is linear. We do not support branches in the stored WordPress-side history yet.
  • Merge commits are explicitly rejected for now.
  • Empty commits are rejected.
  • The commit manifest stores a full snapshot on every commit, which is simpler but duplicates path-to-revision mappings over time.
  • The repository is still rebuilt in memory on each request, so this favors storage simplicity over absolute compute efficiency.
  • The implementation only tracks the supported content types already exposed by wp-origin today: posts and pages.

Push behavior and conflict handling

For pushes, this PR keeps the existing WordPress-first behavior but persists the Git-side result as part of the same flow. Each pushed commit is:

  1. read from the incoming Git history
  2. validated
  3. applied to WordPress content
  4. captured as a new persisted snapshot manifest
  5. stored as a synthetic commit in WordPress history

Conflict checks still use the existing content-level stale-change protections based on the Markdown metadata and current WordPress modified timestamps.
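
A stale-change check of this kind might look roughly like the sketch below. It is an assumption-laden illustration: `PushedFile`, `base_modified_gmt`, and `find_stale_paths` are hypothetical names, and the exact comparison wp-origin performs is not spelled out in this description.

```python
from dataclasses import dataclass


@dataclass
class PushedFile:
    path: str
    # Timestamp from the front matter of the version the client based its
    # edit on (hypothetical field name).
    base_modified_gmt: str


def find_stale_paths(pushed: list, current_modified_gmt: dict) -> list:
    """Return paths whose WordPress content changed after the client's base version."""
    stale = []
    for pushed_file in pushed:
        current = current_modified_gmt.get(pushed_file.path)
        # A path WordPress does not know yet is a new file, not a conflict.
        if current is not None and current != pushed_file.base_modified_gmt:
            stale.append(pushed_file.path)
    return stale


pushed = [
    PushedFile("post/hello.md", "2026-04-27 07:00:00"),
    PushedFile("page/about.md", "2026-04-26 12:00:00"),
    PushedFile("post/new.md", ""),
]
wordpress_state = {
    "post/hello.md": "2026-04-27 07:00:00",
    "page/about.md": "2026-04-27 09:30:00",  # edited in WordPress since the clone
}
stale_paths = find_stale_paths(pushed, wordpress_state)
```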

Git layer fix included here

While building and testing this, one lower-level Git issue also surfaced.

The toolkit was encoding directory tree mode as 040000, while the Git CLI writes tree entries as 40000. That difference changes tree hashes, which meant a client-pushed commit could not be reconstructed byte-for-byte even when the file blobs were identical.

This PR normalizes directory tree mode handling to 40000 and updates the related Git tests.

Without that fix, the new persistence approach would still fail when rebuilding some client-created commits.
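
The effect of the mode difference can be reproduced with a few lines of Python. This sketch hashes a one-entry tree twice, once with the `40000` mode and once with `040000`; the directory name `post` and the use of the empty tree as the entry's target are arbitrary choices for the demonstration.

```python
import hashlib

# Well-known object ID of Git's empty tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def tree_entry(mode: bytes, name: bytes, object_id_hex: str) -> bytes:
    """Serialize one tree entry: '<mode> <name>\\0' followed by the raw 20-byte ID."""
    return mode + b" " + name + b"\0" + bytes.fromhex(object_id_hex)


def git_tree_hash(entries: list) -> str:
    """Return the SHA-1 object ID of a tree made of the given serialized entries."""
    body = b"".join(entries)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


canonical = git_tree_hash([tree_entry(b"40000", b"post", EMPTY_TREE)])
zero_padded = git_tree_hash([tree_entry(b"040000", b"post", EMPTY_TREE)])
```

The single extra byte changes the serialized tree, so `canonical` and `zero_padded` are different object IDs even though both trees describe the same directory.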

Follow-ups / future work

  • I think we need to simplify the front matter and remove the dates, because they change with each commit.
  • It would be ideal if valid Markdown, once pushed, didn't require a pull afterward.
  • We need solid testing with an agent.

Persist the md.git remote across requests by rebuilding an in-memory Git repository from WordPress data instead of throwing away history after each HTTP call.

Store synthetic commits as a private wp_origin_commit post type, keep a HEAD option, and persist exact exported Markdown bytes on revision records so clone/pull can reconstruct stable snapshots without touching the plugin filesystem.

On push, apply incoming commits to WordPress and persist each resulting snapshot manifest so later pulls have real ancestry. Also fix Git directory tree mode encoding to match the Git CLI, which keeps reconstructed tree and commit hashes stable after client pushes.

Add a small Playground runner script for local manual testing without the full e2e harness.
@artpi artpi requested a review from adamziel April 27, 2026 07:37
@artpi artpi self-assigned this Apr 27, 2026
artpi added 2 commits April 27, 2026 10:10
Build and validate a complete push plan before writing any WordPress content. This prevents rejected pushes from partially updating posts or advancing the persisted WP Origin HEAD.

Validate Markdown paths, post type and slug metadata, supported post statuses, permissions, stale modified timestamps, deletes, and unsupported non-Markdown files before applying the pushed commits.

Only after the full pushed range validates do we apply each commit to WordPress, capture revision snapshots, and persist the corresponding synthetic commit manifest.
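
The validate-first flow described in this commit can be sketched as a two-phase function. This is a simplified Python illustration, not the plugin's code: the commit dictionary shape, `validate_commit`, `apply_push`, and `PushRejected` are hypothetical, and only three of the listed checks are modeled.

```python
class PushRejected(Exception):
    """Raised when any commit in the pushed range fails validation."""


def validate_commit(commit: dict) -> list:
    """Return a list of problems for one commit; an empty list means it is valid."""
    errors = []
    if len(commit.get("parents", [])) > 1:
        errors.append("merge commits are not supported")
    if not commit.get("changes"):
        errors.append("empty commits are not supported")
    for path in commit.get("changes", {}):
        if not path.endswith(".md"):
            errors.append(f"unsupported non-Markdown file: {path}")
    return errors


def apply_push(commits: list, apply_commit) -> int:
    """Phase 1: validate every commit. Phase 2: apply them, only if all passed."""
    problems = []
    for index, commit in enumerate(commits):
        for error in validate_commit(commit):
            problems.append(f"commit {index}: {error}")
    if problems:
        raise PushRejected("; ".join(problems))

    applied = 0
    for commit in commits:
        apply_commit(commit)  # writes content and persists the snapshot
        applied += 1
    return applied
```

Because nothing is written until the whole range validates, a bad commit late in the push cannot leave earlier commits half-applied.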
Encode exact-history fields before they pass through WordPress metadata APIs so commit messages, identities, and stored Markdown snapshots round-trip byte-for-byte across history reconstruction.

Harden the display-only commit subject handling for long or malformed first lines, and extend the WP Origin e2e script to verify a slash-heavy pushed commit can be recloned with the original commit message and blob content intact.
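
To show why encoding matters here, the sketch below models a storage layer that unslashes its input and compares a raw write with an encoded one. Both parts are assumptions: `slash_stripping_store` is a simplified stand-in rather than WordPress's actual behavior, and Base64 is chosen only for illustration because the commit message does not name the encoding used.

```python
import base64
import re


def slash_stripping_store(value: str) -> str:
    """Simplified stand-in for a metadata API that strips backslashes on write."""
    return re.sub(r"\\(.?)", r"\1", value, flags=re.DOTALL)


def encode_exact(raw: bytes) -> str:
    """Encode bytes into an alphabet that contains no backslashes."""
    return base64.b64encode(raw).decode("ascii")


def decode_exact(stored: str) -> bytes:
    """Recover the original bytes from the encoded form."""
    return base64.b64decode(stored)


message = b"Document the \\n escape and the C:\\temp path\n"

# Writing the raw text loses the backslashes.
lossy = slash_stripping_store(message.decode("utf-8"))

# Writing the encoded form round-trips byte-for-byte.
round_tripped = decode_exact(slash_stripping_store(encode_exact(message)))
```

Any byte change in a commit message or blob would alter the reconstructed commit hash, which is why an exact round trip matters more here than it would for display-only data.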
@adamziel (Contributor) commented

We could greatly simplify it by using a database-backed Filesystem class, e.g. see https://github.com/WordPress/php-toolkit/blob/trunk/components/Filesystem/class-sqlitefilesystem.php. That being said, it solves a problem in a plugin so let's get it in.

@adamziel adamziel merged commit aaef4b7 into trunk Apr 27, 2026
22 checks passed
@adamziel adamziel deleted the codex/wp-origin-persistent-history branch April 27, 2026 15:40
adamziel added a commit that referenced this pull request Apr 28, 2026
Add WpdbFilesystem unit tests via a small SQLite-backed wpdb shim,
extend the existing Smart-HTTP e2e to assert the old CPT model is
gone, and add a new Playground restart test that proves repository
history survives a server process restart.

The unit tests reuse FilesystemTestCase, so WpdbFilesystem now goes
through the same 23-test contract as SQLiteFilesystem, plus a binary
round-trip test that mirrors how Git objects are stored.

The persistence script mounts the WordPress SQLite DB on the host,
pushes a commit, kills the server, restarts a fresh process pointed
at the same DB, and re-clones to confirm the commit hash survives.
That is the test that would have caught the bug PR #27 fixed.
adamziel added a commit that referenced this pull request Apr 28, 2026
Follow-up to #27. The custom post type and the per-revision Markdown
blobs were doing the work of a small filesystem — one that just happened
to live inside the post table. This swaps that for an actual filesystem.

`WpdbFilesystem` is a port of `SQLiteFilesystem` to `wpdb`. Same schema,
same semantics, same transactions/savepoints, two MySQL tables instead
of two SQLite ones. Hand it to `GitRepository` and the repo's
`.git/objects`, refs, and config persist in WordPress with no further
plumbing.
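
The general idea of a database-backed filesystem holding Git objects can be sketched in Python with `sqlite3`. This is not `WpdbFilesystem` or its schema: `TableFilesystem`, its methods, and the single `files` table are hypothetical simplifications (the real class is described as using two tables).

```python
import hashlib
import sqlite3
import zlib


class TableFilesystem:
    """Toy filesystem that keeps file contents in one database table."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, content BLOB NOT NULL)"
        )

    def put_contents(self, path: str, data: bytes) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
                (path, data),
            )

    def get_contents(self, path: str) -> bytes:
        row = self.conn.execute(
            "SELECT content FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return bytes(row[0])


def write_loose_blob(fs: TableFilesystem, content: bytes) -> str:
    """Store a blob the way Git lays out loose objects: zlib data under objects/ab/cdef..."""
    raw = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(raw).hexdigest()
    fs.put_contents(f".git/objects/{object_id[:2]}/{object_id[2:]}", zlib.compress(raw))
    return object_id


fs = TableFilesystem(sqlite3.connect(":memory:"))
blob_id = write_loose_blob(fs, b"# Hello\n")
stored = zlib.decompress(fs.get_contents(f".git/objects/{blob_id[:2]}/{blob_id[2:]}"))
```

With storage shaped like this, history lives in the Git objects themselves, so no separate commit manifest is needed.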

The plugin then becomes much smaller. `wp_origin_commit` is gone. The
`_wp_origin_markdown` revision metadata is gone. The manifest JSON in
`post_content` is gone. The `wp_origin_head_commit_id` option is gone.
`open_repository()` no longer replays history into an in-memory repo on
every request — it just opens the persistent one.

Push conflict detection still works the same way: it compares each
pushed commit's parent tree to current WordPress state via the Markdown
front-matter, and that data already lives in the Git tree, not the
manifest. The merge-commit and empty-commit guards are kept. The "skip
modified checks for follow-on commits in a multi-commit push" rule is
kept.

Existing wp-origin databases will keep their old `wp_origin_commit`
posts and orphan post meta; there's no migration. Since the storage was
newly added in #27 and isn't load-bearing anywhere else, abandoning it
on upgrade is safe; cleanup can happen separately.

## Test plan
- [ ] Run the existing Playground integration script
(`bin/test-wp-origin-git-actions.sh`) end-to-end: clone, edit, commit,
push, pull.
- [ ] Verify a fresh install creates `{$prefix}wp_origin_files` and
`{$prefix}wp_origin_directory_entries` automatically on first request.
- [ ] Push a multi-commit branch and confirm only the first commit is
gated by the per-file `modified_gmt` check.
- [ ] Push two stale clients in a row and confirm the second one gets
the "remote changed" error without corrupting state.