Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.4.3: prevent DB_Driver_WPDB::insert_record() from inserting empty meta values #1080

Closed
lkraav opened this issue Apr 16, 2020 · 8 comments · Fixed by #1307 · May be fixed by #1312
Closed

3.4.3: prevent DB_Driver_WPDB::insert_record() from inserting empty meta values #1080

lkraav opened this issue Apr 16, 2020 · 8 comments · Fixed by #1307 · May be fixed by #1312

Comments

@lkraav
Copy link
Contributor

lkraav commented Apr 16, 2020

My platform Stream database is growing large, wp_stream_meta contains 63M rows.

Bug Report

Analysis with

SELECT count(*) AS `count` FROM `wp_stream_meta` WHERE `wp_stream_meta`.`meta_value` = ''

shows about 20-25% are rows with empty meta_value, which to me looks like pure database bloat.

Expected Behavior

wp_stream_meta should only contain rows with non-empty meta_value.

On read, any such parameters can simply be displayed empty in code.

Actual Behavior

Noticeable amount of wp_stream_meta table is rows with empty meta_value.

Example data log:

[22-Apr-2020 11:27:04 UTC] insert_meta: empty value, $key user_meta                                                                                                                            
[22-Apr-2020 11:27:05 UTC] insert_meta: empty value, $key user_name                                                                                                                            
[22-Apr-2020 11:27:05 UTC] insert_meta: empty value, $key user_id                                                                                                                              
[22-Apr-2020 11:27:05 UTC] insert_meta: empty value, $key user_meta                                                                                                                            
[22-Apr-2020 11:27:05 UTC] insert_meta: empty value, $key user_name                                                                                                                            
[22-Apr-2020 11:27:05 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:05 UTC] insert_meta: empty value, $key user_meta                                                                                                                            
[22-Apr-2020 11:27:06 UTC] insert_meta: empty value, $key user_meta                                                                                                                            
[22-Apr-2020 11:27:08 UTC] insert_meta: empty value, $key user_name                                                                                                                            
[22-Apr-2020 11:27:08 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:08 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:08 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:08 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:08 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:09 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:09 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:09 UTC] insert_meta: empty value, $key is_spam
[22-Apr-2020 11:27:09 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:09 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:09 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:09 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:28 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:29 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:29 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:29 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:29 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:29 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:29 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:29 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:30 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:31 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:31 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:31 UTC] insert_meta: empty value, $key is_spam
[22-Apr-2020 11:27:31 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:31 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:31 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:31 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:39 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:41 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:41 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:41 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:41 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:42 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:42 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:42 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:42 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:42 UTC] insert_meta: empty value, $key user_id
[22-Apr-2020 11:27:42 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:44 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:44 UTC] insert_meta: empty value, $key user_name
[22-Apr-2020 11:27:44 UTC] insert_meta: empty value, $key is_spam
[22-Apr-2020 11:27:44 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:44 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:44 UTC] insert_meta: empty value, $key user_meta
[22-Apr-2020 11:27:44 UTC] insert_meta: empty value, $key user_meta

Empty meta value connector analysis:

MariaDB> SELECT `wp_stream`.`connector` AS `connector`, count(*) AS `count` FROM `wp_stream_meta` LEFT JOIN `wp_stream` `wp_stream` ON `wp_stream_meta`.`record_id` = `wp_st
ream`.`ID` WHERE `wp_stream_meta`.`meta_value` = '' GROUP BY `wp_stream`.`connector` ORDER BY `count` DESC;                                                                                    
+-----------------------------------+---------+
| connector                         | count   |
+-----------------------------------+---------+
| comments                          | 9474347 |
| posts-viewed                      | 3549141 |
| settings                          |  437761 |
| users                             |  421972 |
| posts                             |  180445 |
| gravityforms                      |  151214 |
| media                             |    6104 |
| wordpressseo                      |    5229 |
| userswitching                     |    4057 |
| woocommerce                       |    2979 |
| installer                         |    1219 |
| taxonomies                        |     938 |
| menus                             |     455 |
| widgets                           |     402 |
| woocommerce-memberships-for-teams |     175 |
| NULL                              |      98 |
| buddypress                        |      38 |
| slack-memberships-api             |       4 |
+-----------------------------------+---------+
18 rows in set (5 min 35.921 sec)

Steps to Reproduce the Problem

  1. Run query at the top of opening comment.

System Information

  • 3.4.3
  • 5.4
  • 7.3

Solution proposal

Perhaps this code block

// Insert record meta
foreach ( (array) $meta as $key => $vals ) {
foreach ( (array) $vals as $val ) {
$this->insert_meta( $record_id, $key, $val );
}
}

could have a simple if ( $val ) check before insert_meta() call?

@kasparsd
Copy link
Contributor

Thanks for reporting the issue @lkraav!

I wonder if this is somewhat related to #1056 -- would the regular cleanup get rid of those entries?

Skipping entries with '' === $val sounds like a good approach to me.

@lkraav
Copy link
Contributor Author

lkraav commented Apr 16, 2020

I wonder if this is somewhat related to #1056 -- would the regular cleanup get rid of those entries?

I have disabled cleanup - this platform's usage history is to be stamped for eternity.

@lkraav
Copy link
Contributor Author

lkraav commented Apr 18, 2020

I wonder if insert_meta() could or should also be batched? I think it's currently doing multiple separate INSERTs?

@lkraav
Copy link
Contributor Author

lkraav commented Apr 22, 2020

https://wordpress.stackexchange.com/questions/136613/wpdb-insert-multiple-record-at-once this seems relevant for batching INSERTs. It looks like $wpdb->insert() does not support batching natively, but we craft a multi-VALUE query instead:

$wpdb->query("INSERT INTO settings-table
            (option_name, option_value, option_created, option_edit, option_user)
            VALUES
            ('name-1', 'val-1', current_time('mysql'), current_time('mysql'), 'user-1'),
            ('name-2', 'val-2', current_time('mysql'), current_time('mysql'), 'user-2'),
            ('name-3', 'val-3',  current_time('mysql'), current_time('mysql'), 'user-3')")

@lkraav
Copy link
Contributor Author

lkraav commented Apr 22, 2020

Added example key list with empty values in opening comment.

@lkraav
Copy link
Contributor Author

lkraav commented May 3, 2020

I monkey-patched my live Stream copy to avoid inserting empty meta values something like 1 week ago.

Zero ill effects, everything still works.

I think it's probably safe to do.

@kasparsd
Copy link
Contributor

kasparsd commented May 4, 2020

Thank you for testing @lkraav!

What are your thoughts on #1038 -- it touches the same lines and I'm not sure why the meta naming logic is like it is right now. Can you please share your opinion on that changeset?

@lkraav
Copy link
Contributor Author

lkraav commented Feb 20, 2022

What are your thoughts on #1038 -- it touches the same lines and I'm not sure why the meta naming logic is like it is right now. Can you please share your opinion on that changeset?

I think these are 2 separate issues.

#1038 doesn't alleviate massively bloating the database with empty values in any way, but does seem to have located another bug that also needs fixing.

lkraav added a commit to lkraav/stream that referenced this issue Feb 20, 2022
lkraav added a commit to lkraav/stream that referenced this issue Feb 20, 2022
lkraav added a commit to lkraav/stream that referenced this issue Feb 22, 2022
kasparsd added a commit that referenced this issue Mar 8, 2022
Prevent inserting empty meta values, fix #1080
kasparsd added a commit that referenced this issue Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants