Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add paragraph on the performance of meta_queries using EXISTS/NOT_EXISTS #355

Open
moraleida opened this issue Feb 22, 2021 · 2 comments
Open
Assignees

Comments

@moraleida
Copy link

moraleida commented Feb 22, 2021

Is your enhancement related to a problem? Please describe.

It is very common that developers introduce new meta_keys without setting default values for posts that already existed in the database, or that they delete meta_keys when the value is not needed instead of setting it to a default or false value.

This becomes an issue on larger database tables when users need to find some information by filtering not only by the meta_key and value but also sometimes by checking if that meta_key does not exist at all. This use case is covered in WP_Meta_Query, allowing for the use of compare => 'NOT EXISTS'

This solution should, however, be discouraged, as it generates very expensive queries that can really hammer a database performance.

Example:

  • Database with 253k rows in wp_posts and 5 million rows in wp_postmeta
  • There are 51,406 rows with the _publicize_pending meta_key in wp_postmeta
/**
 * Using NOT_EXISTS in a Meta Query generates a LEFT JOIN clause
 * with an IS NULL check in WHERE
 *
 * Rows searched: 268,358 (all rows in the join)
 */
$query = new \WP_Query(
	array(
		'suppress_filters' => true,
		'meta_query'       => array(
			array(
				'key'     => '_publicize_pending',
				'compare' => 'NOT EXISTS',
			),
		),
	) );

Generates:

MySQL [wordpress]> EXPLAIN SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  
    -> LEFT JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id AND wp_postmeta.meta_key = 'some_key' ) 
    -> WHERE 1=1  
    -> AND ( wp_postmeta.post_id IS NULL ) 
    -> GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 10;
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
| id | select_type | table       | partitions | type  | possible_keys                                                   | key     | key_len | ref                   | rows   | filtered | Extra                           |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
|  1 | SIMPLE      | wp_posts    | NULL       | index | PRIMARY,post_name,type_status_date,post_parent,post_author,guid | PRIMARY | 8       | NULL                  | 268339 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | wp_postmeta | NULL       | ref   | post_id,meta_key,meta_key_meta_value                            | post_id | 8       | wordpress.wp_posts.ID |     19 |   100.00 | Using where; Not exists         |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+

In contrast, this query checking for a specific value in the declared meta_key only needs to read 89,970 rows, or roughly 70% fewer rows.

/**
 * Querying only for the meta_key and value generates an INNER JOIN clause
 * with a key/value match in WHERE
 *
 * Rows searched: 89,970
 */
$query = new \WP_Query(
	array(
		'suppress_filters' => true,
		'meta_query'       => array(
			array(
				'key'     => '_publicize_pending',
				'value'     => '1',
				'compare' => '!=',
			),
		),
	) );

Generates:

MySQL [wordpress]> EXPLAIN SELECT wp_posts.ID
    -> FROM wp_posts
    -> INNER JOIN wp_postmeta
    -> ON ( wp_posts.ID = wp_postmeta.post_id )
    -> WHERE 1=1
    -> AND ( ( wp_postmeta.meta_key = '_publicize_pending'
    -> AND wp_postmeta.meta_value != '1' ) );
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
| id | select_type | table       | partitions | type   | possible_keys                        | key                 | key_len | ref                           | rows  | filtered | Extra       |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
|  1 | SIMPLE      | wp_postmeta | NULL       | range  | post_id,meta_key,meta_key_meta_value | meta_key_meta_value | 1534    | NULL                          | 89970 |   100.00 | Using where |
|  1 | SIMPLE      | wp_posts    | NULL       | eq_ref | PRIMARY                              | PRIMARY             | 8       | wordpress.wp_postmeta.post_id |     1 |   100.00 | Using index |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.001 sec)
@vishalkakadiya
Copy link

vishalkakadiya commented Aug 14, 2021

@moraleida Hi 👋, you guys are doing awesome work with Engineering Practices! 🙌 Thank you for making these awesome docs.

I was recently checking this issue and would like to share my thoughts, not sure if this is a good idea for bigger picture but let me share:

  • NOT EXISTS check: if the user makes a meta_key with appending _1 after the actual meta_key then it will be straight forward query to just check meta_key = _publicize_pending_1, so the end query will be as below:
$query = new \WP_Query(
	array(
		'suppress_filters' => true,
		'meta_key'         => '_publicize_pending_1',
	)
);

The above query will fetch all the posts which are pending for publicizing, and I guess will be somewhat faster than checking with the value. As we all know, in the normal WP environment index is available on meta_key only, not on the meta_value, however the WordPress VIP platform indexing meta_value in the combination of meta_key but in some limit of characters. So I guess we can think of this way as well for this issue, still correct me if I have interrupted in wrong discussions.

Thanks! 🙂

@moraleida
Copy link
Author

@vishalkakadiya I think your solution should work fine. It taps into the same idea that is, always look for something that exists, instead of for something that doesn't. I suggest adding different values for a meta key, but having different meta-keys will most likely work just fine too.

I don't think indexing the meta_value column makes a lot of difference in this case, because the first thing the query will do is join the tables to contain only that meta_key. But then again, if we step into the millions of records within the join, then that might be a good replacement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants