Drive pull-reprint with a single reprint pull#3768
Conversation
Re-running `studio pull-reprint` on a completed pull previously printed "pulled successfully" and exited without contacting the remote, so remote edits never reached the local site. A re-run now resets the stage machine and re-executes every phase; reprint makes each phase incremental (files delta sync, full DB re-download + re-apply with the idempotent dump). Details: - Persist a hasCompletedOnce flag in pull.json so the non-empty site-directory guard is skipped on re-pulls (the directory holds the previous pull's output) and drop the cached preflight so connectivity and the secret-rotation retry path are re-verified. - Re-apply the stored admin credentials over the running site's admin API after the database refresh: db-apply rebuilds the DB from the remote dump, wiping the local admin user and the studio_admin_username option that /studio-auto-login depends on. The server only re-applies credentials at startup, and a re-pull deliberately skips restarting a running server. A connection failure on that request doubles as a health check — if the daemon's "online" view is stale, fall through to a server start. - Reconcile pull.json with the live site record in ensurePort() before the DB stages: if the site record was re-created with a different id/port, db-apply would rewrite the database URLs to a port nothing serves. - Drop /tmp/reprint/* mounts with missing host paths when loading the persisted start-options.json: reprint state files mounted for the remote-uploads proxy are transient, and mounting a missing path crashed server starts with ENOENT. - Fix shouldRestartFilesSyncIndex() comparing against the legacy 'files-sync' command name; reprint canonicalizes to 'files-pull' when saving state, so the check was dead code. Requires the reprint-side refresh support (WordPress/reprint#257) in the bundled reprint.phar.
05d8026 to
19388cd
Compare
📊 Performance Test ResultsComparing dc16064 vs trunk app-size
site-editor
site-startup
Results are median values from multiple test runs. Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff) |
…ta-resync # Conflicts: # apps/cli/commands/pull-reprint.ts
…print-delta-resync
A delta re-pull re-runs files-sync/db-sync/db-apply against a state dir where the previous pull left them "complete". Re-running as-is either throws (files-sync rejects a --filter change while a sync still looks in progress) or silently no-ops, so the database never refreshes. Issue `<command> --abort` before each on a re-pull to reset the sub-command state so every phase re-runs: files-sync keeps the local index and deltas, db-sync re-downloads, db-apply re-applies. Gated on hasCompletedOnce (the persisted "this pull finished once" flag) rather than the local isRepull, because the first re-pull attempt already reset the stage to "initialized"; a retry must still abort. Each command is aborted only while its stage is still pending, so a mid-pull resume never wipes a phase that already re-ran. Runs after preflight, which files-sync/db-sync --abort require. This removes the dependency on reprint's sub-commands auto-refreshing, so the re-pull works against a stock reprint build.
There was a problem hiding this comment.
Pull request overview
This PR changes the studio pull-reprint orchestration so re-running a previously completed pull performs a real delta re-pull (re-entering reprint phases, clearing “complete” subcommand state via --abort, and re-running DB refresh), rather than exiting early without syncing remote changes.
Changes:
- Reset completed pull sessions back to
initializedon re-run, clear cached preflight, and abort completed reprint subcommands so files re-sync incrementally and DB refresh re-applies. - Improve resilience around reprint state/version differences and stale runtime mounts (preventing ENOENT crashes on server start).
- Add Vitest coverage for delta re-pull reset behavior, exporter re-enable on preflight retry, and admin-credentials re-apply behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| apps/cli/lib/wordpress-server-manager.ts | Filters out stale /tmp/reprint/* mounts from persisted start options to avoid ENOENT crashes. |
| apps/cli/lib/pull/reprint-state.ts | Accepts both files-sync and canonicalized files-pull when interpreting reprint state. |
| apps/cli/commands/pull-reprint.ts | Implements delta re-pull flow (stage reset, preflight refresh, --abort state clearing), updates port adoption logic, and re-applies admin credentials when reusing a running server. |
| apps/cli/commands/tests/pull-reprint.test.ts | Adds tests covering delta re-pull behavior and recovery paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
On a re-pull where the daemon process is still up, the site-started "already running" branch skipped updateSiteLatestCliPid and updateSiteAutoStart; both only ran on the start branch. The running-status check requires site.latestCliPid to equal the live process pid, so a stale or missing pid (e.g. after ensurePort adopts a re-created site record) made the site report as stopped. Mirror the start branch and `studio site start`'s already-running path: refresh latestCliPid when online and keep autoStart enabled.
| const isRepull = studioMetadata.stage === 'completed'; | ||
| if ( isRepull ) { | ||
| studioMetadata.stage = 'initialized'; | ||
| studioMetadata.hasCompletedOnce = true; |
There was a problem hiding this comment.
Would it be helpful if reprint exposed this information? Such as php reprint.php import-metadata | jq '.hasCompletedOnce'? It's already stored there so that would be less state and less opportunity for one of these values to go out of sync.
There was a problem hiding this comment.
Yes, I think that would be better as reprint "knows better" about the pull, and reducing the state to the minimum seems a good option.
|
|
||
| // Re-verify connectivity (and give the secret-rotation retry path | ||
| // a chance to run) instead of trusting the cached preflight from | ||
| // the original pull, which may be days old. |
There was a problem hiding this comment.
Did a cached preflight come up in your testing? In what way did it cause trouble?
There was a problem hiding this comment.
I had errors for sites that were pulled some time before that were "missing multipart boundary" due to HTML being returned. The way to reproduce that is to pull a site, then wait for 60 minutes (or reduce the sliding window), and then pull again
There was a problem hiding this comment.
Oh, so this is an issue with the rotating token we use on WP.com. And it seems like that token is saved in preflight.json, which is why deleting that file helps. 👍
| // Rotating the secret does not bump `reprint_exporter_enabled`, so | ||
| // re-open the gate explicitly; otherwise the retry hits the same | ||
| // closed window and ?reprint-api falls through to an HTML page. | ||
| await enableReprintExporter( |
|
The code change mostly makes sense, thank you @epeicher! I'm only confused by one thing: If we're moving to reliance on |
As part of this PR, I have not moved to relying on |
Replace the per-sub-command orchestration (downloadEssentialSiteFiles, refreshFlattenedSiteDirectory, downloadRemoteDatabase, applyDownloadedDatabase, generateRuntimeConfiguration) and the clearCompletedSubcommandState/--abort delta-reset with one `reprint pull` call. reprint owns the pipeline ordering (files-pull -> db-pull -> db-apply -> flat-docroot -> apply-runtime) and resets its own state for a delta re-pull via prepare_repull(), so the Studio-side --abort wiring and per-phase stage gating go away. - runFullPull() issues the single pull with the same sqlite geometry the old db-apply used (target sqlite under the raw content dir) plus --flatten-to, --runtime=playground-cli, --start-runtime=none and --output-dir, mounting the site + runtime dirs up front. - ensurePort moves before the pull so --new-site-url is available. - Collapse the stage machine from 9 stages to 5 (initialized -> pulled -> site-registered -> site-started -> completed). - Bump the PHP-WASM memory_limit to 1024M: the single long-lived fork holds the file-index high-water-mark across phases. Requires reprint's flatten_to->flat_document_root bridge and the re-pull filter-guard fix (WordPress/reprint#268).
Drop the historic 'replaces the former clearCompletedSubcommandState' framing from the pull comment and the runFullPull docblock; describe what the single-pull flow does now.
|
The latest changes 0b27bdf, use a composite The changes require WordPress/reprint#268 to be applied and generate a |
adamziel
left a comment
There was a problem hiding this comment.
I haven't tested it, but the code change looks sound 👍
|
After merging of WordPress/reprint#268, these changes have been tested and work as expected. I have been able to pull both initial and incremental changes from a remote. Merging this.
|

Related issues
Closes STU-1885
flatten_to->flat_document_rootbridge plus the re-pull filter-guard fix that this flow depends on).How AI was used in this PR
Claude Code measured the single-fork memory profile against a live WP.com Atomic site (26k files / ~570 MB), implemented the migration and tests, and verified the full flow end-to-end (fresh pull plus delta re-pull: remote edit -> re-pull -> local content / auto-login checks). Each change was reviewed and re-tested live.
Proposed Changes
Re-running
studio pull-reprinton a completed pull previously printed "pulled successfully" and exited without contacting the remote, so remote edits, inserts, and deletions never reached the local site. With this PR a re-run performs a delta re-pull: files re-sync incrementally and the database is refreshed so existing-row edits and deletions propagate.Studio now drives reprint with a single
reprint pullinstead of orchestrating the pipeline itself. One call runs preflight -> files-pull -> db-pull -> db-apply -> flat-docroot -> apply-runtime in one PHP-WASM fork, and reprint resets its own state for a delta re-pull viaprepare_repull(). This deletes the former Studio orchestration: the five per-sub-command functions, the nine-stage machine, and the--abortdelta-reset wiring (about 340 lines net). The single fork'smemory_limitis raised to 1024M because the file-index high-water-mark now persists across phases (measured peak ~510M on the test site).Fixes that remain part of making re-pulls real:
adminuser andstudio_admin_usernameoption that/studio-auto-logindepends on. The pull re-applies the stored credentials over the running site's admin API, and a connection failure on that request doubles as a health check that catches a stale daemon "online" state and falls back to starting the server.--new-site-url) targets the right port.start-options.json(previously ENOENT crashed the start).Bundle the reprint build via
REPRINT_VERSIONinscripts/download-wp-server-files.ts; it must include WordPress/reprint#268.Testing Instructions
composer build:pharon the reprint repo.reprint.pharatwp-files/reprint/reprint.pharand runnpm run cli:build.node apps/cli/dist/cli/main.mjs pull-reprint --verbose --url <site>and let it complete. It should invoke a singlereprint pull(visible with--verbose) and start a working site.studio site stop/site start). No ENOENT mount errors.Pre-merge Checklist