Comments: Allow WP_Comment_Query fields to select arbitrary columns#11934
Comments: Allow WP_Comment_Query fields to select arbitrary columns#11934dd32 wants to merge 1 commit into
Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
9115f83 to
5f33400
Compare
Extends the `fields` argument of `WP_Comment_Query` to accept:
- A column name from the `$wpdb->comments` table (e.g.
`'comment_post_ID'`). Returns a flat array of distinct values for
that column.
- A `'col_a=>col_b'` pair (e.g. `'comment_ID=>comment_post_ID'`).
Returns an associative array keyed by col_a's value, with col_b's
value as the entry — mirroring `WP_Query` / `WP_Term_Query`'s
`'id=>parent'` idiom.
- An array of column names. Returns an array of `stdClass` objects
with the requested columns as properties (one entry per matched
comment, no deduplication).
The motivation is to avoid hydrating full `WP_Comment` objects when the
caller only needs a subset of columns. In one reference case
(gp-translation-helpers), hydrating ~32K comments to extract distinct
post IDs took ~6.6s; the equivalent raw SQL took ~610ms (~11x), all of
it from skipping hydration.
Single-column results apply `DISTINCT` in SQL so callers can drop the
surrounding `array_unique( wp_list_pluck( ... ) )`. The array form does
not — it returns one row per matched comment, like a hand-written
projection.
Field-selection results are cached separately from the base
comment-ID query, sharing the comment `last_changed` salt so any
comment mutation invalidates both layers together. A repeat call with
identical args runs zero SQL queries.
Column names must be passed in their exact case; the only exception is
the `ID` suffix of `comment_ID` / `comment_post_ID`, which is accepted
in any case.
`'ids'` and `''` retain their existing meaning; the new behaviour is
purely additive.
Props dd32.
See #65313.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5f33400 to
908e631
Compare
westonruter
left a comment
There was a problem hiding this comment.
An initial review pass. I'll have some more suggestions, but I'm short on time for this week.
| * @param string $field | ||
| * @return string|null | ||
| */ | ||
| private function parse_field_column( $field ) { |
There was a problem hiding this comment.
| private function parse_field_column( $field ) { | |
| private function parse_field_column( string $field ): ?string { |
| * | ||
| * @since 7.1.0 | ||
| * | ||
| * @param string $field |
There was a problem hiding this comment.
| * @param string $field | |
| * @param string $field Field name. |
| * @since 7.1.0 | ||
| * | ||
| * @param string $field | ||
| * @return string|null |
There was a problem hiding this comment.
| * @return string|null | |
| * @return non-empty-string|null Canonical column name or null if unknown. |
| if ( preg_match( '/^(comment(?:_post)?_)([iI][dD])$/', $field, $m ) ) { | ||
| return $m[1] . 'ID'; |
There was a problem hiding this comment.
To avoid abbreviations:
| if ( preg_match( '/^(comment(?:_post)?_)([iI][dD])$/', $field, $m ) ) { | |
| return $m[1] . 'ID'; | |
| if ( preg_match( '/^(comment(?:_post)?_)([iI][dD])$/', $field, $matches ) ) { | |
| return $matches[1] . 'ID'; |
| * @param array $fields Tagged tuple from `parse_fields()`. | ||
| * @return array | ||
| */ | ||
| private function get_comment_field_values( $comment_ids, $fields ) { |
There was a problem hiding this comment.
| private function get_comment_field_values( $comment_ids, $fields ) { | |
| private function get_comment_field_values( array $comment_ids, array $fields ) { |
| * @param mixed $fields Raw `fields` query var. | ||
| * @return array|null | ||
| */ | ||
| private function parse_fields( $fields ) { |
There was a problem hiding this comment.
| private function parse_fields( $fields ) { | |
| private function parse_fields( $fields ): ?array { |
| * @since 7.1.0 | ||
| * | ||
| * @param mixed $fields Raw `fields` query var. | ||
| * @return array|null |
There was a problem hiding this comment.
| * @return array|null | |
| * @param string[]|string $fields Raw `fields` query var. | |
| * @return array{ 'list', non-empty-string[] } | |
| * |array{ 'map', non-empty-string, non-empty-string } | |
| * |array{ 'col', non-empty-string } | |
| * |null | |
| */ |
| * Return shapes: | ||
| * - `array( 'col', $column )` — single column, DISTINCT. | ||
| * - `array( 'map', $key_col, $val_col )` — `'col_a=>col_b'` associative map. | ||
| * - `array( 'list', $columns )` — array form, returns `stdClass[]`. |
There was a problem hiding this comment.
I was analyzing this method with PHPStan and this doesn't seem to be correct. It seems this is actually:
| * - `array( 'list', $columns )` — array form, returns `stdClass[]`. | |
| * - `array( 'list', $columns )` — array form, returns `string[]`. |
| * @return string|null | ||
| */ | ||
| private function parse_field_column( $field ) { | ||
| static $columns = array( |
There was a problem hiding this comment.
I suggest the static be removed. It doesn't really gain much if anything.
| static $columns = array( | |
| $columns = array( |
Also, PHPStan is tripped up by it below, for some reason:
Parameter #2 $haystack of function in_array expects array, mixed given.
But what about a private const instead?
This could be added to the class:
private const COLUMNS = array(
'comment_ID',
'comment_post_ID',
'comment_author',
'comment_author_email',
'comment_author_url',
'comment_author_IP',
'comment_date',
'comment_date_gmt',
'comment_content',
'comment_karma',
'comment_approved',
'comment_agent',
'comment_type',
'comment_parent',
'user_id',
);This would seem to be reusable in the ::parse_orderby() method as well.
Trac ticket: https://core.trac.wordpress.org/ticket/65313
Summary
Extends
WP_Comment_Query'sfieldsargument to accept:$wpdb->comments(e.g.'comment_post_ID') — returns a flat array of distinct values for that column.'col_a=>col_b'pair (e.g.'comment_ID=>comment_post_ID') — returns an associative array keyed by col_a's value with col_b's value as the entry, mirroring the'id=>parent'idiom inWP_QueryandWP_Term_Query.'ids'and''(default) are unchanged — the new behaviour is purely additive.Why
Plugins that want "the distinct
comment_post_IDs matching X" have three options today, none good:fields => ''+array_column( $q->comments, 'comment_post_ID' )— hydrates every matched comment just to discard all but one field.fields => 'ids'+get_comment( $id )->comment_post_IDin a loop — N+1 cache lookups.In the gp-translation-helpers reference case in the Trac ticket, hydrating ~32K comments to extract distinct post IDs took ~6.6s; the equivalent raw SQL returned the same 25,694 IDs in ~610ms (~11x speedup, all of it from skipping hydration).
API
Column names must be passed in their exact case. The only exception is the
IDsuffix ofcomment_ID/comment_post_ID, which is accepted in any case (comment_id,comment_Id,comment_ID). Unknown columns fall through to the default behaviour (WP_Commentobjects), keeping the path forward-compatible.Test plan
npm run test:php -- tests/phpunit/tests/comment/query.php— 166 tests / 489 assertions pass.npm run test:php -- --group comment— 540 tests / 1367 assertions pass (no regressions).'col_a=>col_b'map form returns associative array.IDsuffix on both sides of=>; strict case for non-IDcolumns.WP_Commentobjects.'ids'semantics unchanged.array()without a follow-up query.phpcs --standard=phpcs.xml.distclean on both changed files (only pre-existing warnings on untouched lines).🤖 Generated with Claude Code