wpcomsh: Report recovery-mode state to wpcom #48213

Merged
taipeicoder merged 12 commits into trunk from add/wpcomsh-recovery-mode-sync
Apr 22, 2026

@taipeicoder (Contributor) commented Apr 21, 2026

Fixes #

Proposed changes

  • New wpcomsh feature-plugin (class-wpcomsh-recovery-mode-sync.php) that captures WordPress recovery-mode option writes and POSTs a state snapshot to /sites/{blog_id}/recovery-mode-status on wpcom (endpoint to be created on the wpcom side).
  • Three integer timestamps are reported: recovery_mode_email_last_sent (written by WP core when a fatal triggers a recovery email), recovery_session_entered_at (set on first write to {session_id}_paused_extensions, i.e. admin clicked the recovery link), recovery_session_exited_at (set on deletion of that option, i.e. admin exited recovery).
  • The POST runs synchronously from each option-change listener, not via a PHP shutdown function. Recovery option writes commonly happen from inside WP's own fatal-handler shutdown callback, and register_shutdown_function called from that point is not reliably invoked — verified empirically via per-request-ID traces showing the capture and the send landed in different requests. Posting synchronously keeps the call inside a known-good stack frame.
  • updated_option listener for *_paused_extensions is intentionally not registered: WP_Paused_Extensions_Storage::delete_all() rewrites the option via update_option() when removing one extension type while entries of the other type remain, which would otherwise clobber entered_at to the exit time.
  • Because the POST runs from inside the fatal-handler shutdown path, the Jetpack Connection Client's signing helpers (wp_rand() / wp_generate_password()) may not be available yet — pluggable.php is loaded late in WP bootstrap. The class guards function_exists( 'wp_rand' ), requires ABSPATH . 'wp-includes/pluggable.php' when needed (with a file_exists guard), and aborts cleanly if signing primitives are still unavailable.

These three fields let wpcom-side consumers compute a full state machine (Needs recovery / In recovery / Recently recovered / Expired unresolved / Healthy) without requiring any history tracking on their side.
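
As a rough illustration of how a consumer might derive those five states from the three timestamps, here is a minimal sketch. The state names come from the description above; the function name, the precedence of the rules, and both time windows (`$link_ttl`, `$recent_window`) are assumptions made for this example and are not specified anywhere in this PR.

```php
<?php
/**
 * Hypothetical wpcom-side classifier (not part of this PR).
 * A value of 0 means "never recorded". Rules and windows are assumptions.
 */
function classify_recovery_state( int $email_last_sent, int $entered_at, int $exited_at, int $now ): string {
    $link_ttl      = 86400;  // Assumed: seconds an emailed recovery link is treated as fresh.
    $recent_window = 604800; // Assumed: seconds an exit still counts as "recent".

    // A session was entered after the last exit, so it is still open.
    if ( $entered_at > 0 && $entered_at > $exited_at ) {
        return 'in_recovery';
    }

    // An email went out after the last exit and nobody has entered since.
    if ( $email_last_sent > 0 && $email_last_sent > $exited_at && $email_last_sent > $entered_at ) {
        return ( $now - $email_last_sent ) <= $link_ttl ? 'needs_recovery' : 'expired_unresolved';
    }

    // The last session was closed within the assumed "recent" window.
    if ( $exited_at > 0 && ( $now - $exited_at ) <= $recent_window ) {
        return 'recently_recovered';
    }

    return 'healthy';
}
```

Because every POST carries all three fields, a classifier of this shape needs only the latest snapshot and the current time, which is the "no history tracking" property the description refers to.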

Observability

  • Diagnostic traces are emitted through a filter-gated helper (self::trace()), matching the migrate-guru-canary.php pattern. Off by default. Flip per-site via:
    add_filter( 'wpcomsh_recovery_mode_sync_logging_enabled', '__return_true' );
  • Traces tag each line with an 8-hex-char per-request ID so the capture line and the corresponding POST line can be correlated across interleaved log output.
  • Captured events, POST payload, non-2xx responses, WP_Error responses, exceptions, and each abort reason (missing class, missing helper, falsy blog id, missing wp_rand) are all traced.
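
A filter-gated trace helper with a per-request ID could look roughly like the sketch below. The filter name and the `wpcomsh_recovery_mode_sync[<id>]:` line prefix are taken from this PR; the class name, the use of `random_bytes()` to build the ID, and the `function_exists()` guard (added so the snippet also loads outside WordPress) are assumptions for illustration.

```php
<?php
/**
 * Illustrative trace helper (class name is hypothetical).
 */
final class Recovery_Sync_Trace_Sketch {
    /** @var string|null Lazily generated ID shared by every trace in this request. */
    private static $request_id = null;

    public static function request_id(): string {
        if ( null === self::$request_id ) {
            self::$request_id = bin2hex( random_bytes( 4 ) ); // 4 random bytes give 8 hex characters.
        }
        return self::$request_id;
    }

    public static function enabled(): bool {
        // apply_filters() only exists inside WordPress; treat logging as off elsewhere.
        return function_exists( 'apply_filters' )
            && (bool) apply_filters( 'wpcomsh_recovery_mode_sync_logging_enabled', false );
    }

    public static function format( string $message ): string {
        return sprintf( 'wpcomsh_recovery_mode_sync[%s]: %s', self::request_id(), $message );
    }

    public static function trace( string $message ): void {
        if ( self::enabled() ) {
            error_log( self::format( $message ) );
        }
    }
}
```

Generating the ID once and reusing it is what lets a capture line and its POST line be matched when several requests write to the same log at once.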

Related product discussion/links

Does this pull request change what data or activity we track or use?

Yes. Three integer timestamps describing the site's recovery-mode state are POSTed to wpcom:

  • recovery_mode_email_last_sent — Unix timestamp of WP's last recovery email send.
  • recovery_session_entered_at — Unix timestamp of when an admin last entered recovery mode.
  • recovery_session_exited_at — Unix timestamp of when an admin last exited recovery mode.

No content, error messages, recovery tokens/keys, or user identifiers are sent. Payload is three int fields. Data applies only to sites running wpcomsh (wpcom Atomic).
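
For reference, a payload limited to those three integer fields might be assembled as in the sketch below. The field names are the ones listed above; the helper name and the choice to default a missing value to `0` are assumptions, since the PR does not say how unset timestamps are represented.

```php
<?php
/**
 * Illustrative payload builder (hypothetical helper, not the PR's code).
 *
 * @param array<string,mixed> $raw Raw values, e.g. as read from options.
 * @return array<string,int> Exactly three integer fields.
 */
function build_recovery_payload_sketch( array $raw ): array {
    $fields = array(
        'recovery_mode_email_last_sent',
        'recovery_session_entered_at',
        'recovery_session_exited_at',
    );

    $payload = array();
    foreach ( $fields as $field ) {
        // Assumption: anything missing is reported as 0; everything else is cast to int.
        $payload[ $field ] = isset( $raw[ $field ] ) ? (int) $raw[ $field ] : 0;
    }

    return $payload;
}

echo json_encode( build_recovery_payload_sketch( array( 'recovery_mode_email_last_sent' => '1776750000' ) ) ), "\n";
```

Whitelisting the field names keeps any other key in the input from leaking into the request body.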

Testing instructions

  1. Deploy this branch to an Atomic test site (e.g. jetpack rsync plugins/wpcomsh <site>).
  2. Install a plugin that fatals on load (e.g. a regular plugin that calls trigger_error( 'boom', E_USER_ERROR );) and activate it.
  3. (Optional, for verifying traces) add a must-use plugin with add_filter( 'wpcomsh_recovery_mode_sync_logging_enabled', '__return_true' ); and tail -f the PHP-FPM error log.
  4. Visit the front-end once to trigger WP's fatal-error handler. WP sends a recovery email.
  5. Confirm an outbound POST to /sites/<blog_id>/recovery-mode-status containing recovery_mode_email_last_sent with a fresh timestamp. If traces are on, expect lines like wpcomsh_recovery_mode_sync[<id>]: captured email_last_sent … immediately followed by wpcomsh_recovery_mode_sync[<id>]: posting state … with a matching request ID.
  6. Click the recovery link in the email. Confirm another POST fires, now with recovery_session_entered_at populated and recovery_mode_email_last_sent unchanged.
  7. Exit recovery mode via the admin bar button. Confirm a third POST fires, with recovery_session_exited_at populated and recovery_session_entered_at unchanged (regression test for the delete_all() clobber).
  8. Load a normal page (no fatal). Confirm no outbound POST — the listener only runs when a recovery option was written.

Note: the wpcom-side endpoint POST /sites/{blog_id}/recovery-mode-status must exist to accept the payload. Until it lands, requests will 404 server-side but wpcomsh behavior is still exercised and observable locally.

Capture WordPress recovery-mode option writes and POST a state snapshot
to /sites/{blog_id}/recovery-mode-status from a PHP shutdown function so
the signal reaches WPcom even on fatal-error requests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions Bot (Contributor) commented Apr 21, 2026

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WoA dev site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin (WordPress.com Site Helper), and enable the add/wpcomsh-recovery-mode-sync branch.

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

@github-actions Bot (Contributor) commented Apr 21, 2026

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for your cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!


Wpcomsh plugin:

  • Next scheduled release: Atomic deploys happen twice daily on weekdays (p9o2xV-2EN-p2)

If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack.

@github-actions github-actions Bot added the [Status] Needs Author Reply label Apr 21, 2026
@taipeicoder taipeicoder self-assigned this Apr 21, 2026
@taipeicoder taipeicoder marked this pull request as draft April 21, 2026 06:26
@jp-launch-control Bot commented Apr 21, 2026

Code Coverage Summary

Coverage changed in 1 file.

File | Coverage | Δ% | Δ Uncovered
projects/plugins/wpcomsh/wpcomsh.php | 112/363 (30.85%) | -0.09% | 1 ❤️‍🩹

1 file is newly checked for coverage.

File | Coverage
projects/plugins/wpcomsh/feature-plugins/class-wpcomsh-recovery-mode-sync.php | 5/100 (5.00%) 💔

Full summary · PHP report

Coverage check overridden by the "I don't care about code coverage for this PR" label.

WP_Paused_Extensions_Storage::delete_all() rewrites the session option
via update_option() when only one extension type is being removed but
entries of the other type remain. Treating that rewrite as a new entry
was bumping entered_at to the exit time.

Drop the updated_option listener; only added_option signals a new
recovery session entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
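
The effect of dropping the updated_option listener can be sketched with a small pure function. The hook names (`added_option`, `updated_option`, `deleted_option`) are WordPress core actions and the `_paused_extensions` suffix comes from this PR; the function names and the `$state` array are hypothetical stand-ins for however the class actually stores its timestamps.

```php
<?php
/**
 * Illustrative model of the capture rules (hypothetical helpers).
 */
function is_paused_extensions_option( string $option ): bool {
    $suffix = '_paused_extensions';
    // Require a non-empty session-id prefix in front of the suffix.
    return strlen( $option ) > strlen( $suffix )
        && substr( $option, -strlen( $suffix ) ) === $suffix;
}

/**
 * @param array<string,int> $state Current timestamps.
 * @return array<string,int> Timestamps after applying one option event.
 */
function apply_option_event( array $state, string $hook, string $option, int $now ): array {
    if ( ! is_paused_extensions_option( $option ) ) {
        return $state;
    }

    if ( 'added_option' === $hook ) {
        $state['recovery_session_entered_at'] = $now; // First write: session entered.
    } elseif ( 'deleted_option' === $hook ) {
        $state['recovery_session_exited_at'] = $now;  // Option removed: session exited.
    }
    // 'updated_option' is ignored on purpose: delete_all() may rewrite the option
    // on the way out, and treating that as an entry would overwrite entered_at.

    return $state;
}
```

With this rule, an update at exit time followed by a delete leaves `recovery_session_entered_at` at its original value, which is the behaviour step 7 of the testing instructions checks.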
@taipeicoder taipeicoder changed the title from "wpcomsh: Signal recovery-mode state to WPcom dashboard" to "wpcomsh: Report recovery-mode state to WPcom" Apr 21, 2026
@taipeicoder taipeicoder changed the title from "wpcomsh: Report recovery-mode state to WPcom" to "wpcomsh: Report recovery-mode state to wpcom" Apr 21, 2026
taipeicoder and others added 8 commits April 21, 2026 14:51
Remove dashboard-specific wording since how wpcom consumes the signal
(dashboard, alerts, etc.) is out of scope for this module. Normalize
"WPcom" to "wpcom" in docblocks and the changelog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Error paths (WP_Error response, non-2xx, thrown exception) log to
WPCOMSH_Log/Kibana so operational issues surface site-wide.

Success-path traces (hook captures and outbound POST snapshots) go
through a filter-gated error_log helper, matching the migrate-guru-canary
pattern. Default off; flip `wpcomsh_recovery_mode_sync_logging_enabled`
to true on a specific site during verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add trace() calls at each early return inside send() so we can tell
exactly why a shutdown callback bails without posting — useful for
diagnosing environments where the Jetpack Connection Client isn't
autoloaded, the _wpcom_get_current_blog_id() helper isn't available,
or the blog id isn't resolvable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The option writes that trigger a capture (paused_extensions, recovery
email timestamp) happen from inside WP's own fatal-handler shutdown
callback. register_shutdown_function() called from within a shutdown
callback is not reliably executed, which left send() never firing
even when capture_session_start was observed.

Register the shutdown function in init() so it's queued before WP's
fatal handler runs. The callback already no-ops when $payload is null,
so the cost on requests without a recovery event is negligible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Makes it possible to correlate logs across the capture point and the
shutdown send() when they're interleaved with other requests' output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
register_shutdown_function — even when registered eagerly in init() —
was not reliably invoking send() on the recovery-link request where
the option writes actually happen. Per-request trace IDs confirmed
the capture trace and the send()-entered trace came from different
requests, so the capture request never ran its own send().

Drop the shutdown deferral. POST directly from each capture listener.
Captures are rare (only on recovery-mode option writes), so the
multiple-POST-per-request cost is negligible, and the flow is now
independent of PHP shutdown behavior.

Also unify all logging through the filter-gated trace() helper and
drop the hot-path send()-entered / null-payload traces now that
send() is only reachable from capture listeners.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Connection Client signs wpcom_json_api_request_as_blog() calls with
wp_rand() / wp_generate_password(), which live in pluggable.php. That
file is loaded late in WP bootstrap and is often not yet available when
we're invoked from inside WP's fatal-handler shutdown path — which is
the exact scenario this feature exists to handle. Pull it in ourselves,
matching the migrate-guru-canary pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoids a secondary fatal if ABSPATH is defined but wp-includes/pluggable.php
is missing from its expected location. Falls through to the existing
wp_rand() availability trace-and-abort.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
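
The guard described in the two commits above could take roughly the following shape. `wp_rand()`, `ABSPATH`, and `wp-includes/pluggable.php` are named in this PR; the function name and the extra `defined( 'ABSPATH' )` check (added so the snippet is safe to load outside WordPress) are assumptions.

```php
<?php
/**
 * Illustrative guard (hypothetical function name).
 * Returns true only when wp_rand() is available, so the caller can abort otherwise.
 */
function ensure_signing_primitives_sketch(): bool {
    if ( function_exists( 'wp_rand' ) ) {
        return true; // pluggable.php is already loaded.
    }

    if ( defined( 'ABSPATH' ) ) {
        $pluggable = ABSPATH . 'wp-includes/pluggable.php';
        // file_exists() avoids a secondary fatal if the file is not where we expect it.
        if ( file_exists( $pluggable ) ) {
            require_once $pluggable;
        }
    }

    // Still false here means the caller should trace the reason and skip the POST.
    return function_exists( 'wp_rand' );
}
```

Returning a boolean rather than throwing keeps the failure path quiet, which matters when the caller is already running inside a fatal-error handler.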
@taipeicoder taipeicoder marked this pull request as ready for review April 21, 2026 08:14
@taipeicoder taipeicoder added the [Status] Needs Review and "I don't care about code coverage for this PR" labels and removed the [Status] Needs Author Reply and [Status] In Progress labels Apr 21, 2026
Declare $payload as a non-null array<string,int> defaulting to []
and gate send()/snapshot() on empty(). Drops the null sentinel, which
Phan couldn't narrow through the snapshot() indirection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
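
The empty-array default described in the commit above can be sketched as follows. Only the `array<string,int>` type, the `[]` default, and the `empty()` gate come from the commit message; the class and method names are hypothetical.

```php
<?php
/**
 * Illustrative payload holder (hypothetical names).
 */
final class Payload_Gate_Sketch {
    /** @var array<string,int> Never null, so static analysis needs no null narrowing. */
    private static $payload = array();

    public static function capture( string $field, int $timestamp ): void {
        self::$payload[ $field ] = $timestamp;
    }

    public static function should_send(): bool {
        // An empty array means nothing was captured in this request.
        return ! empty( self::$payload );
    }
}
```

Using `[]` instead of `null` as the "nothing captured" value gives the property a single type throughout.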

@taipeicoder (Contributor, Author) commented

Note: the wpcom-side endpoint POST /sites/{blog_id}/recovery-mode-status will be implemented as a follow-up. For this PR we can use logging to determine whether the behavior is correct or not.

/**
 * Listener for the recovery-mode email timestamp.
 */
public static function capture_email_last_sent() {

Review comment (Contributor):

I think we can get the latest value from the hook.

Reply (Contributor, Author):

True! But I think relying on snapshot() provides better consistency across functions, so I'd rather leave it as is 🙂

Comment thread projects/plugins/wpcomsh/feature-plugins/class-wpcomsh-recovery-mode-sync.php Outdated
Comment thread projects/plugins/wpcomsh/feature-plugins/class-wpcomsh-recovery-mode-sync.php Outdated
Comment thread projects/plugins/wpcomsh/feature-plugins/class-wpcomsh-recovery-mode-sync.php Outdated
snapshot() already reads the just-written option value via get_option()
(WP fires the *_option_X hooks after the cache is updated), so the
explicit reassignments after snapshot() were no-ops.

Addresses review feedback on PR #48213.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@taipeicoder taipeicoder merged commit 0bf181f into trunk Apr 22, 2026
73 checks passed
@taipeicoder taipeicoder deleted the add/wpcomsh-recovery-mode-sync branch April 22, 2026 04:25
@github-actions github-actions Bot removed the [Status] In Progress and [Status] Needs Review labels Apr 22, 2026
arthur791004 added a commit that referenced this pull request May 4, 2026
* wpcomsh recovery-mode sync: include per-extension error info

Follow-up to #48213. The state snapshot now also carries an extracted
view of the live *_paused_extensions option, so wpcom-side consumers
(Calypso) can surface what fataled instead of just that something
fataled.

Each record carries kind/slug/version + errno/message/file/line plus
the transportable signature token from #48369, so a fatal seen via
the recovery email and via the wpcomsh fatal-error screen can be
joined on the same opaque token. file is reduced to its basename so
server paths don't leak.

Reading from the live option on every snapshot (instead of stashing
errors in our own option, or threading them through one capture path)
means every POST — email / session-start / session-end — emits a
complete state, and session-end naturally shows errors=[] without any
explicit clear step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Phan: widen $payload type to array<string,mixed>

The new recovery_session_errors field is an array of records, so the
existing array<string,int> phpdoc no longer fits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop signature from recovery-mode-sync error records

The flat fields (kind/slug/version/errno/message/file/line) cover the
Calypso display use case. The signature was for cross-surface
analytics joining (recovery email vs. fatal-error screen logstash),
which has no consumer yet. We can re-add when one materializes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop signature mention from changelog entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Capture error_get_last() at email-send time

So the fatal-request POST already carries the error info, instead of
waiting for the admin to click the recovery email link.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Recovery sync: match core's slug shape in resolve_extension_for_file

Use the first path segment under WP_PLUGIN_DIR as the plugin slug — the
same value WP_Recovery_Mode::get_extension_for_error() produces and the
key WP itself uses inside *_paused_extensions. Previously we returned
the main-file path (e.g. akismet/akismet.php), which would not match the
slug stored once a session is created for the same fatal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
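
The slug rule from the last commit in that list ("first path segment under WP_PLUGIN_DIR") can be illustrated with a small standalone function. The function name and the explicit `$plugin_dir` parameter (standing in for `WP_PLUGIN_DIR`) are assumptions, and this is a sketch of the rule as described, not the actual `resolve_extension_for_file` implementation.

```php
<?php
/**
 * Illustrative slug resolver (hypothetical name and signature).
 *
 * @return string|null First path segment under $plugin_dir, or null if $file is elsewhere.
 */
function resolve_plugin_slug_sketch( string $file, string $plugin_dir ): ?string {
    // Normalise separators so Windows-style paths compare correctly.
    $file       = str_replace( '\\', '/', $file );
    $plugin_dir = rtrim( str_replace( '\\', '/', $plugin_dir ), '/' ) . '/';

    if ( 0 !== strpos( $file, $plugin_dir ) ) {
        return null; // The file is not inside the plugin directory.
    }

    $relative = substr( $file, strlen( $plugin_dir ) );
    $segments = explode( '/', $relative );

    return '' === $segments[0] ? null : $segments[0];
}
```

For `/srv/wp-content/plugins/akismet/akismet.php` this yields `akismet` rather than `akismet/akismet.php`, the shape the commit message says is needed to match the key stored in `*_paused_extensions`.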
Copilot AI pushed a commit to dognose24/jetpack that referenced this pull request May 4, 2026
…tic#48440)

Co-authored-by: dognose24 <6869813+dognose24@users.noreply.github.com>
Labels

"I don't care about code coverage for this PR", [Plugin] Wpcomsh
