wpcomsh: Report recovery-mode state to wpcom#48213
Conversation
Capture WordPress recovery-mode option writes and POST a state snapshot
to /sites/{blog_id}/recovery-mode-status from a PHP shutdown function so
the signal reaches WPcom even on fatal-error requests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.
Interested in more tips and information?
|
|
Thank you for your PR! When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:
This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖 Follow this PR Review Process:
If you have questions about anything, reach out in #jetpack-developers for guidance! Wpcomsh plugin:
If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack. |
Code Coverage SummaryCoverage changed in 1 file.
1 file is newly checked for coverage.
Coverage check overridden by
I don't care about code coverage for this PR
|
WP_Paused_Extensions_Storage::delete_all() rewrites the session option via update_option() when only one extension type is being removed but entries of the other type remain. Treating that rewrite as a new entry was bumping entered_at to the exit time. Drop the updated_option listener; only added_option signals a new recovery session entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove dashboard-specific wording since how wpcom consumes the signal (dashboard, alerts, etc.) is out of scope for this module. Normalize "WPcom" to "wpcom" in docblocks and the changelog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Error paths (WP_Error response, non-2xx, thrown exception) log to WPCOMSH_Log/Kibana so operational issues surface site-wide. Success-path traces (hook captures and outbound POST snapshots) go through a filter-gated error_log helper, matching the migrate-guru-canary pattern. Default off; flip `wpcomsh_recovery_mode_sync_logging_enabled` to true on a specific site during verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add trace() calls at each early return inside send() so we can tell exactly why a shutdown callback bails without posting — useful for diagnosing environments where the Jetpack Connection Client isn't autoloaded, the _wpcom_get_current_blog_id() helper isn't available, or the blog id isn't resolvable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The option writes that trigger a capture (paused_extensions, recovery email timestamp) happen from inside WP's own fatal-handler shutdown callback. register_shutdown_function() called from within a shutdown callback is not reliably executed, which left send() never firing even when capture_session_start was observed. Register the shutdown function in init() so it's queued before WP's fatal handler runs. The callback already no-ops when $payload is null, so the cost on requests without a recovery event is negligible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Makes it possible to correlate logs across the capture point and the shutdown send() when they're interleaved with other requests' output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
register_shutdown_function — even when registered eagerly in init() — was not reliably invoking send() on the recovery-link request where the option writes actually happen. Per-request trace IDs confirmed the capture trace and the send()-entered trace came from different requests, so the capture request never ran its own send(). Drop the shutdown deferral. POST directly from each capture listener. Captures are rare (only on recovery-mode option writes), so the multiple-POST-per-request cost is negligible, and the flow is now independent of PHP shutdown behavior. Also unify all logging through the filter-gated trace() helper and drop the hot-path send()-entered / null-payload traces now that send() is only reachable from capture listeners. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Connection Client signs wpcom_json_api_request_as_blog() calls with wp_rand() / wp_generate_password(), which live in pluggable.php. That file is loaded late in WP bootstrap and is often not yet available when we're invoked from inside WP's fatal-handler shutdown path — which is the exact scenario this feature exists to handle. Pull it in ourselves, matching the migrate-guru-canary pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoids a secondary fatal if ABSPATH is defined but wp-includes/pluggable.php is missing from its expected location. Falls through to the existing wp_rand() availability trace-and-abort. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Declare $payload as a non-null array<string,int> defaulting to [] and gate send()/snapshot() on empty(). Drops the null sentinel, which Phan couldn't narrow through the snapshot() indirection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note: the wpcom-side endpoint |
| /** | ||
| * Listener for the recovery-mode email timestamp. | ||
| */ | ||
| public static function capture_email_last_sent() { |
There was a problem hiding this comment.
I think we can get latest value from the hook
There was a problem hiding this comment.
True! But I think relying on snapshot() provides better consistency across functions so I'd rather leaving it as is 🙂
snapshot() already reads the just-written option value via get_option() (WP fires the *_option_X hooks after the cache is updated), so the explicit reassignments after snapshot() were no-ops. Addresses review feedback on PR #48213. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* wpcomsh recovery-mode sync: include per-extension error info Follow-up to #48213. The state snapshot now also carries an extracted view of the live *_paused_extensions option, so wpcom-side consumers (Calypso) can surface what fataled instead of just that something fataled. Each record carries kind/slug/version + errno/message/file/line plus the transportable signature token from #48369, so a fatal seen via the recovery email and via the wpcomsh fatal-error screen can be joined on the same opaque token. file is reduced to its basename so server paths don't leak. Reading from the live option on every snapshot (instead of stashing errors in our own option, or threading them through one capture path) means every POST — email / session-start / session-end — emits a complete state, and session-end naturally shows errors=[] without any explicit clear step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Phan: widen \$payload type to array<string,mixed> The new recovery_session_errors field is an array of records, so the existing array<string,int> phpdoc no longer fits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Drop signature from recovery-mode-sync error records The flat fields (kind/slug/version/errno/message/file/line) cover the Calypso display use case. The signature was for cross-surface analytics joining (recovery email vs. fatal-error screen logstash), which has no consumer yet. We can re-add when one materializes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Drop signature mention from changelog entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Capture error_get_last() at email-send time So the fatal-request POST already carries the error info, instead of waiting for the admin to click the recovery email link. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Recovery sync: match core's slug shape in resolve_extension_for_file Use the first path segment under WP_PLUGIN_DIR as the plugin slug — the same value WP_Recovery_Mode::get_extension_for_error() produces and the key WP itself uses inside *_paused_extensions. Previously we returned the main-file path (e.g. akismet/akismet.php), which would not match the slug stored once a session is created for the same fatal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tic#48440) * wpcomsh recovery-mode sync: include per-extension error info Follow-up to Automattic#48213. The state snapshot now also carries an extracted view of the live *_paused_extensions option, so wpcom-side consumers (Calypso) can surface what fataled instead of just that something fataled. Each record carries kind/slug/version + errno/message/file/line plus the transportable signature token from Automattic#48369, so a fatal seen via the recovery email and via the wpcomsh fatal-error screen can be joined on the same opaque token. file is reduced to its basename so server paths don't leak. Reading from the live option on every snapshot (instead of stashing errors in our own option, or threading them through one capture path) means every POST — email / session-start / session-end — emits a complete state, and session-end naturally shows errors=[] without any explicit clear step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Phan: widen \$payload type to array<string,mixed> The new recovery_session_errors field is an array of records, so the existing array<string,int> phpdoc no longer fits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Drop signature from recovery-mode-sync error records The flat fields (kind/slug/version/errno/message/file/line) cover the Calypso display use case. The signature was for cross-surface analytics joining (recovery email vs. fatal-error screen logstash), which has no consumer yet. We can re-add when one materializes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Drop signature mention from changelog entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Capture error_get_last() at email-send time So the fatal-request POST already carries the error info, instead of waiting for the admin to click the recovery email link. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Recovery sync: match core's slug shape in resolve_extension_for_file Use the first path segment under WP_PLUGIN_DIR as the plugin slug — the same value WP_Recovery_Mode::get_extension_for_error() produces and the key WP itself uses inside *_paused_extensions. Previously we returned the main-file path (e.g. akismet/akismet.php), which would not match the slug stored once a session is created for the same fatal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: dognose24 <6869813+dognose24@users.noreply.github.com>
Fixes #
Proposed changes
class-wpcomsh-recovery-mode-sync.php) that captures WordPress recovery-mode option writes and POSTs a state snapshot to/sites/{blog_id}/recovery-mode-statuson wpcom (endpoint to be created on the wpcom side).recovery_mode_email_last_sent(written by WP core when a fatal triggers a recovery email),recovery_session_entered_at(set on first write to{session_id}_paused_extensions, i.e. admin clicked the recovery link),recovery_session_exited_at(set on deletion of that option, i.e. admin exited recovery).register_shutdown_functioncalled from that point is not reliably invoked — verified empirically via per-request-ID traces showing the capture and the send landed in different requests. Posting synchronously keeps the call inside a known-good stack frame.updated_optionlistener for*_paused_extensionsis intentionally not registered:WP_Paused_Extensions_Storage::delete_all()rewrites the option viaupdate_option()when removing one extension type while entries of the other type remain, which would otherwise clobberentered_atto the exit time.wp_rand()/wp_generate_password()) may not be available yet —pluggable.phpis loaded late in WP bootstrap. The class guardsfunction_exists( 'wp_rand' ), requiresABSPATH . 'wp-includes/pluggable.php'when needed (with afile_existsguard), and aborts cleanly if signing primitives are still unavailable.These three fields let wpcom-side consumers compute a full state machine (Needs recovery / In recovery / Recently recovered / Expired unresolved / Healthy) without requiring any history tracking on their side.
Observability
self::trace()), matching themigrate-guru-canary.phppattern. Off by default. Flip per-site via:WP_Errorresponses, exceptions, and each abort reason (missing class, missing helper, falsy blog id, missingwp_rand) are all traced.Related product discussion/links
Does this pull request change what data or activity we track or use?
Yes. Three integer timestamps describing the site's recovery-mode state are POSTed to wpcom:
recovery_mode_email_last_sent— Unix timestamp of WP's last recovery email send.recovery_session_entered_at— Unix timestamp of when an admin last entered recovery mode.recovery_session_exited_at— Unix timestamp of when an admin last exited recovery mode.No content, error messages, recovery tokens/keys, or user identifiers are sent. Payload is three
intfields. Data applies only to sites running wpcomsh (wpcom Atomic).Testing instructions
jetpack rsync plugins/wpcomsh <site>).trigger_error( 'boom', E_USER_ERROR );) and activate it.add_filter( 'wpcomsh_recovery_mode_sync_logging_enabled', '__return_true' );andtail -fthe PHP-FPM error log./sites/<blog_id>/recovery-mode-statuscontainingrecovery_mode_email_last_sentwith a fresh timestamp. If traces are on, expect lines likewpcomsh_recovery_mode_sync[<id>]: captured email_last_sent …immediately followed bywpcomsh_recovery_mode_sync[<id>]: posting state …with a matching request ID.recovery_session_entered_atpopulated andrecovery_mode_email_last_sentunchanged.recovery_session_exited_atpopulated andrecovery_session_entered_atunchanged (regression test for thedelete_all()clobber).Note: the wpcom-side endpoint
POST /sites/{blog_id}/recovery-mode-statusmust exist to accept the payload. Until it lands, requests will 404 server-side but wpcomsh behavior is still exercised and observable locally.