Plugin Directory: Tokenise PHP source for block / dashboard widget detection #626
dd32 wants to merge 3 commits into WordPress:trunk
…tection. Replace the regex paths in find_blocks_in_file() and find_dashboard_widgets_in_file() with a single token-based extractor in a new Tools\Tokenisation_Helpers class. The helper walks tokens once per file, ignores matches in comments and string literals, and follows i18n wrappers like __(), _x(), and esc_html__() to the inner string value. Includes a 29-test PHPUnit class covering the supported cases plus the deliberate shortcuts taken to keep the helper small.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
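The commit message above describes following i18n wrappers to the inner string value. The PHP sketch below is a rough illustration of one way to do that for a single argument that has already been isolated as a token slice from `token_get_all()`. The names `unquote_literal()` and `literal_from_arg()`, and the wrapper list, are hypothetical and not taken from `Tools\Tokenisation_Helpers`. The check is deliberately loose and only inspects the first few tokens.

```php
<?php
/**
 * Strip the quotes from a T_CONSTANT_ENCAPSED_STRING token's text.
 * Single-quoted escapes are decoded exactly; double-quoted escapes are
 * only approximated with stripcslashes().
 */
function unquote_literal( string $literal ): string {
	$body = substr( $literal, 1, -1 );
	if ( "'" === $literal[0] ) {
		return strtr( $body, array( '\\\\' => '\\', "\\'" => "'" ) );
	}
	return stripcslashes( $body );
}

/**
 * Hypothetical helper: given one argument's tokens, return its string value,
 * unwrapping a translation call such as __( 'Text', 'domain' ) to 'Text'.
 * Returns '' for anything else (variables, constants, concatenation, ...).
 */
function literal_from_arg( array $arg_tokens, array $wrappers = array( '__', '_x', 'esc_html__', 'esc_attr__' ) ): string {
	// Keep only significant tokens: whitespace and comments carry no value.
	$sig = array_values(
		array_filter(
			$arg_tokens,
			function ( $t ) {
				return ! is_array( $t ) || ! in_array( $t[0], array( T_WHITESPACE, T_COMMENT, T_DOC_COMMENT ), true );
			}
		)
	);

	// A bare literal: 'My Widget'.
	if ( 1 === count( $sig ) && is_array( $sig[0] ) && T_CONSTANT_ENCAPSED_STRING === $sig[0][0] ) {
		return unquote_literal( $sig[0][1] );
	}

	// A known wrapper whose first argument is a literal: __( 'Text', ... ).
	if (
		count( $sig ) >= 4
		&& is_array( $sig[0] ) && T_STRING === $sig[0][0]
		&& in_array( strtolower( $sig[0][1] ), $wrappers, true )
		&& '(' === $sig[1]
		&& is_array( $sig[2] ) && T_CONSTANT_ENCAPSED_STRING === $sig[2][0]
		&& ( ',' === $sig[3] || ')' === $sig[3] )
	) {
		return unquote_literal( $sig[2][1] );
	}

	return '';
}
```

With this sketch, a bare `'My Widget'` and `__( 'Hello', 'td' )` both yield their text. By contrast, `$label` and `'prefix-' . $var` yield `''`.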
…oard widget calls.

- Add Tokenisation_Helpers::find_function_call_first_arg_and_array_value() for the registration pattern (first-arg identifier plus an inline options array). Uses it in find_blocks_in_file() to capture the optional title for register_block_type() and new WP_Block_Type().
- find_function_call_arg_strings() now returns one entry per matched call, yielding an empty string when the target arg has no literal. find_dashboard_widgets_in_file() therefore reports every detected call, allowing the section term to be applied even when the label is not parseable; import_from_svn skips empty values when storing dashboard_widget_name post meta.
- Tests: rename the dashboard-widget class-constant case to use generic identifiers, update the variable / class-constant assertions for the new contract, and add coverage for the new title-extraction method (long/short array, translation wrapper, missing key, no options, variable value, non-literal first arg).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
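The title capture described above needs to find a `'title' => 'Some Title'` entry in the inline options array passed as the second argument. Below is a minimal sketch of that lookup. It assumes the argument's tokens have already been sliced out of `token_get_all()` output. `array_arg_value()` is a hypothetical name, and unlike the commit's description, this sketch does not unwrap translation calls around the value.

```php
<?php
/** Strip quotes from a string literal token (double-quoted escapes approximated). */
function unquote_literal( string $literal ): string {
	$body = substr( $literal, 1, -1 );
	if ( "'" === $literal[0] ) {
		return strtr( $body, array( '\\\\' => '\\', "\\'" => "'" ) );
	}
	return stripcslashes( $body );
}

/**
 * Hypothetical helper: find 'key' => 'literal' at the top level of an inline
 * options array (long array( ... ) or short [ ... ] syntax). Returns '' when
 * the argument is not an inline array, the key is missing, or its value is
 * not a plain string literal.
 */
function array_arg_value( array $arg_tokens, string $key ): string {
	$sig = array_values(
		array_filter(
			$arg_tokens,
			function ( $t ) {
				return ! is_array( $t ) || ! in_array( $t[0], array( T_WHITESPACE, T_COMMENT, T_DOC_COMMENT ), true );
			}
		)
	);

	if ( isset( $sig[1] ) && is_array( $sig[0] ) && T_ARRAY === $sig[0][0] && '(' === $sig[1] ) {
		$i = 2; // Long syntax: array( ...
	} elseif ( isset( $sig[0] ) && '[' === $sig[0] ) {
		$i = 1; // Short syntax: [ ...
	} else {
		return ''; // e.g. a variable holding the options.
	}

	$depth = 0;
	for ( $n = count( $sig ); $i < $n; $i++ ) {
		$t = $sig[ $i ];
		if ( '(' === $t || '[' === $t ) {
			$depth++;
			continue;
		}
		if ( ')' === $t || ']' === $t ) {
			if ( 0 === $depth ) {
				break; // Closing bracket of the options array itself.
			}
			$depth--;
			continue;
		}
		// Only a top-level literal key followed by => is a candidate entry.
		$is_key = 0 === $depth
			&& is_array( $t ) && T_CONSTANT_ENCAPSED_STRING === $t[0]
			&& isset( $sig[ $i + 1 ] ) && is_array( $sig[ $i + 1 ] ) && T_DOUBLE_ARROW === $sig[ $i + 1 ][0];
		if ( ! $is_key || unquote_literal( $t[1] ) !== $key ) {
			continue;
		}
		$value = $sig[ $i + 2 ] ?? null;
		$after = $sig[ $i + 3 ] ?? null;
		if ( is_array( $value ) && T_CONSTANT_ENCAPSED_STRING === $value[0] && in_array( $after, array( ',', ')', ']' ), true ) ) {
			return unquote_literal( $value[1] );
		}
		return ''; // Key present, but the value is not a plain literal.
	}
	return '';
}
```

Tracking bracket depth keeps a `'title'` key inside a nested sub-array from being mistaken for the block's own title.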
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
This PR replaces regex-based PHP source scanning for block and dashboard widget detection with a shared token-based extractor to avoid false positives in comments/strings and improve argument/title extraction.
Changes:
- Added `Tools\Tokenisation_Helpers` to parse PHP tokens and extract string-literal arguments and array metadata.
- Updated importer logic to use tokenisation for `register_block_type` / `new WP_Block_Type` and `wp_add_dashboard_widget`, and to avoid storing empty widget-name meta.
- Added PHPUnit coverage for the new tokenisation behaviors and documented shortcuts.
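The "avoid storing empty widget-name meta" change separates two decisions: whether a widget call was detected at all, and whether it produced a usable label. The small sketch below shows that split. It assumes the detector returns one entry per call, with `''` for unparseable labels, as the second commit describes. `summarise_dashboard_widgets()` and its return keys are hypothetical, not the importer's code.

```php
<?php
/**
 * Hypothetical sketch of the import-side contract: $labels holds one entry
 * per detected wp_add_dashboard_widget() call ('' when the label was not a
 * literal). Any detected call earns the section term; only non-empty labels
 * become meta values.
 */
function summarise_dashboard_widgets( array $labels ): array {
	return array(
		'has_section' => count( $labels ) > 0,
		// strlen() rather than the default callback, so a label of '0' survives.
		'meta_values' => array_values( array_filter( $labels, 'strlen' ) ),
	);
}
```

For `array( '' )`, the section flag is set but no meta value is produced. This matches the PR's "variable as label still tags the section" behaviour.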
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| wordpress.org/public_html/wp-content/plugins/plugin-directory/tools/class-tokenisation-helpers.php | Introduces token-walker and string/array literal extraction utilities. |
| wordpress.org/public_html/wp-content/plugins/plugin-directory/tests/Tokenisation_Helpers_Test.php | Adds unit tests for token-based detection, including false-positive prevention and edge cases. |
| wordpress.org/public_html/wp-content/plugins/plugin-directory/cli/class-import.php | Switches block/widget detection to tokenisation helper and filters empty widget meta values. |
```php
	return array();
}

$is_new = str_starts_with( $function_name, 'new ' );
```

```php
	continue;
}
$matches_simple = ( T_STRING === $tok[0] && 0 === strcasecmp( $tok[1], $needle ) );
$matches_global = ( T_NAME_FULLY_QUALIFIED === $tok[0] && 0 === strcasecmp( $tok[1], $global_form ) );
```

```php
	$prev_id = is_array( $pt ) ? $pt[0] : null;
	break;
}
if ( in_array( $prev_id, array( T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION, T_NULLSAFE_OBJECT_OPERATOR ), true ) ) {
```

```php
if ( $contents ) {
	foreach ( array( 'register_block_type', 'new WP_Block_Type' ) as $needle ) {
		foreach ( Tokenisation_Helpers::find_function_call_first_arg_and_array_value( $contents, $needle, 1, 'title' ) as $name => $title ) {
```

```php
$tokens = @token_get_all( $contents );
if ( ! $tokens ) {
```

```php
/**
 * Walk PHP tokens for calls to `$function_name` and yield each call's
 * arg-list tokens, split into per-arg slices at top-level commas.
 *
 * @return array[] One entry per matched call: [ arg0_tokens, arg1_tokens, ... ].
 */
private static function walk_calls( $contents, $function_name ) {
```
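The `walk_calls()` docblock above describes splitting each call's argument list into per-argument slices at top-level commas. The sketch below shows one possible shape for that step, applied to `wp_add_dashboard_widget()` so that a doc comment containing quotes (the Jetpack case) cannot leak into the label. `significant()`, `split_call_args()`, and `dashboard_widget_labels()` are hypothetical names. For brevity, the sketch does not reject method calls or declarations, and it strips quotes without decoding escapes.

```php
<?php
/** Drop whitespace and comments, keeping only tokens that carry meaning. */
function significant( array $tokens ): array {
	return array_values(
		array_filter(
			$tokens,
			function ( $t ) {
				return ! is_array( $t ) || ! in_array( $t[0], array( T_WHITESPACE, T_COMMENT, T_DOC_COMMENT ), true );
			}
		)
	);
}

/**
 * Split a call's argument list into per-argument token slices. $open is the
 * index of the '(' that starts the list; only commas at bracket depth 0
 * separate arguments, so nested array( $a, $b ) or f( $x, $y ) stay intact.
 * Braces are not tracked (a limitation of this sketch).
 */
function split_call_args( array $tokens, int $open ): array {
	$args  = array();
	$cur   = array();
	$depth = 0;
	for ( $i = $open + 1, $n = count( $tokens ); $i < $n; $i++ ) {
		$t = $tokens[ $i ];
		if ( '(' === $t || '[' === $t ) {
			$depth++;
		} elseif ( ')' === $t || ']' === $t ) {
			if ( 0 === $depth ) {
				break; // The call's own closing ')'.
			}
			$depth--;
		} elseif ( ',' === $t && 0 === $depth ) {
			$args[] = significant( $cur );
			$cur    = array();
			continue;
		}
		$cur[] = $t;
	}
	$last = significant( $cur );
	if ( $last ) {
		$args[] = $last;
	}
	return $args;
}

/**
 * One entry per wp_add_dashboard_widget() call: the label literal, or ''
 * when the second argument is not a plain string literal.
 */
function dashboard_widget_labels( string $src ): array {
	$tokens = token_get_all( $src );
	$labels = array();
	foreach ( $tokens as $i => $t ) {
		if ( ! is_array( $t ) || T_STRING !== $t[0] || 0 !== strcasecmp( $t[1], 'wp_add_dashboard_widget' ) ) {
			continue;
		}
		// Skip whitespace/comments; the next real token must be '('.
		$j = $i + 1;
		while ( isset( $tokens[ $j ] ) && is_array( $tokens[ $j ] ) && in_array( $tokens[ $j ][0], array( T_WHITESPACE, T_COMMENT, T_DOC_COMMENT ), true ) ) {
			$j++;
		}
		if ( ! isset( $tokens[ $j ] ) || '(' !== $tokens[ $j ] ) {
			continue;
		}
		$args       = split_call_args( $tokens, $j );
		$label      = $args[1] ?? array();
		$is_literal = 1 === count( $label ) && is_array( $label[0] ) && T_CONSTANT_ENCAPSED_STRING === $label[0][0];
		// Sketch only: strips the quotes but does not decode escape sequences.
		$labels[] = $is_literal ? substr( $label[0][1], 1, -1 ) : '';
	}
	return $labels;
}
```

Consider a call such as `wp_add_dashboard_widget( Widget::ID, /** "Quoted" note, with a comma. */ 'My Widget', array( $this, 'render' ) );`. With this sketch it should yield `'My Widget'`, while a call whose label is `$label` yields `''`.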
Summary
Replaces the regex paths in `find_blocks_in_file()` (PHP branches) and `find_dashboard_widgets_in_file()` with a single token-based extractor in a new `Tools\Tokenisation_Helpers` class. Fixes false positives inside comments, strings, and method/static calls; restores title extraction for both `register_block_type()` and `new WP_Block_Type()`; and corrects the dashboard widget label capture for plugins that pass class constants as the widget ID (e.g. Jetpack, where the previous regex picked up the word `Stats` from an adjacent doc comment).

Builds on #625 (merged) and addresses the deferred Copilot review comments around tokenization and escaped quotes.
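Most of the false-positive fixes listed in the summary come from the tokenizer itself. The remainder come from one look at the previous significant token; the review excerpt that checks `T_OBJECT_OPERATOR`, `T_DOUBLE_COLON`, `T_FUNCTION`, and `T_NULLSAFE_OBJECT_OPERATOR` appears to do the same. The following is a stand-alone sketch under that assumption. It requires PHP 8.0+. `find_global_calls()` is a hypothetical name, and it returns line numbers only.

```php
<?php
// Sketch assumes PHP 8.0+, where T_NAME_FULLY_QUALIFIED and
// T_NULLSAFE_OBJECT_OPERATOR exist.

/**
 * Hypothetical helper: return the line numbers where $name is called as a
 * global function. Comments and string literals cannot match, because the
 * tokenizer emits them as single T_COMMENT / T_DOC_COMMENT /
 * T_CONSTANT_ENCAPSED_STRING tokens rather than T_STRING.
 */
function find_global_calls( string $src, string $name ): array {
	$tokens = token_get_all( $src );
	$skip   = array( T_WHITESPACE, T_COMMENT, T_DOC_COMMENT );
	$lines  = array();
	foreach ( $tokens as $i => $t ) {
		if ( ! is_array( $t ) ) {
			continue;
		}
		$simple = T_STRING === $t[0] && 0 === strcasecmp( $t[1], $name );
		$global = T_NAME_FULLY_QUALIFIED === $t[0] && 0 === strcasecmp( $t[1], '\\' . $name );
		if ( ! $simple && ! $global ) {
			continue; // Foo\name() arrives as T_NAME_QUALIFIED and is not matched.
		}
		// Previous significant token: ->, ?->, :: or `function` means a method
		// call, static call, or declaration rather than a global call.
		$prev = null;
		for ( $p = $i - 1; $p >= 0; $p-- ) {
			if ( is_array( $tokens[ $p ] ) && in_array( $tokens[ $p ][0], $skip, true ) ) {
				continue;
			}
			$prev = is_array( $tokens[ $p ] ) ? $tokens[ $p ][0] : $tokens[ $p ];
			break;
		}
		if ( in_array( $prev, array( T_OBJECT_OPERATOR, T_NULLSAFE_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION ), true ) ) {
			continue;
		}
		// The next significant token must open an argument list.
		$n = $i + 1;
		while ( isset( $tokens[ $n ] ) && is_array( $tokens[ $n ] ) && in_array( $tokens[ $n ][0], $skip, true ) ) {
			$n++;
		}
		if ( isset( $tokens[ $n ] ) && '(' === $tokens[ $n ] ) {
			$lines[] = $t[2];
		}
	}
	return $lines;
}
```

Consider a source containing `register_block_type` in each of these forms: a comment, a string literal, a method call, a static call, a declaration, a plain call, a `\register_block_type()` call, and a `Foo\register_block_type()` call. Only the plain and leading-backslash calls should be reported.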
Behavioural changes
- `__`, `_x`, `esc_html__`, etc.
- `$obj->name()`, `Class::name()`, `function name()` declaration
- `\register_block_type()` (leading backslash)
- `Foo\register_block_type()` (arbitrary namespace)
- `title` in second-arg options array
- `new WP_Block_Type` only
- `__()`-wrapped values
- `'prefix-' . $var`
- `\w+/\w+` shape check (same outcome)
- `wp_add_dashboard_widget`

Examples
Block titles via the options array (now also for `register_block_type`):

`Tokenisation_Helpers::find_function_call_first_arg_and_array_value( $src, 'register_block_type', 1, 'title' )` returns:

Class const + inline doc comment between args (the Jetpack pattern):
`find_function_call_arg_strings( $src, 'wp_add_dashboard_widget', 1 )` returns `array( 'My Widget' )`. The previous regex was sensitive to the contents of any inline comment between args. When the comment contained quote characters, the inner regex would match those quotes first and capture the wrong text. The original Jetpack source has `/** "Stats" is a product name. */` between the ID and the label, which is what triggered the regression.

Variable as label still tags the section:
returns `array( '' )`. The call is detected (so the `dashboard-widgets` plugin_section term is applied), but no `dashboard_widget_name` meta value is stored (the empty entry is filtered at the meta-storage point).

False positives are now ignored:
All return `array()`.

Documented shortcuts
Four test cases under `Shortcut (to reduce complexity):` doc comments record deliberate trade-offs:

- `\register_block_type` (`T_NAME_FULLY_QUALIFIED`) is treated as the global function; arbitrary `Foo\Bar\register_block_type` is not matched.
- … `$obj->method( 'Inner' )`) yields its inner literal.
- … `find_function_call_first_arg_and_array_value()` instead.

Test plan
- … `tests/Tokenisation_Helpers_Test.php` cover the cases above and the four documented shortcuts.
- … `phpcs-branch.php` exit 0).
- `find_blocks_in_file()` still returns the same set of blocks (no regression for plugins that use `register_block_type` or `new WP_Block_Type` with literal names).
- `dashboard_widget_name` post meta now contains `Jetpack Stats` (was `Stats`).
- `wp_add_dashboard_widget` with a variable label still gets the `dashboard-widgets` section term assigned, and no empty `dashboard_widget_name` rows are stored.

🤖 Generated with Claude Code