bulk indexing extremely slow #453

Closed
mustafauysal opened this Issue Feb 21, 2016 · 16 comments

5 participants
@mustafauysal
Contributor

mustafauysal commented Feb 21, 2016

We upgraded to version 1.8 and found that bulk indexing is extremely slow. The cause is the call to wp_cache_flush() in wp-cli.php.

Is anybody else having the same problem? (We are using memcached for object caching.)

Thanks.

@mustafauysal

Contributor

mustafauysal commented Feb 22, 2016

Thanks @cmmarslender for 8f5058f#diff-c537e2c20a88117b8e57e904f6e80417 (I hadn't noticed it before).
Hope the new version is coming soon!

@tlovett1 tlovett1 reopened this Mar 16, 2016

@tlovett1

Member

tlovett1 commented Mar 16, 2016

@mustafauysal was this never resolved?

@mustafauysal

Contributor

mustafauysal commented Mar 16, 2016

Nope, it isn't resolved, so we didn't update ElasticPress on production. We're still running 1.7.

@cmmarslender

Contributor

cmmarslender commented Mar 16, 2016

Just to be certain.. @mustafauysal Have you tested 1.9 to see if there were any improvements? It was only released a day or two ago.

@mustafauysal

Contributor

mustafauysal commented Mar 17, 2016

@cmmarslender Today I tested both versions on the same machine, at the same time.

Results

ElasticPress 1.7

Number of posts indexed on site 1: 170746
Total time elapsed: 5.972,593
ElasticPress is currently deactivated, activating...
Success: ElasticPress was activated!
Success: Done!

ElasticPress 1.9

Number of posts indexed on site 1: 170746
Total time elapsed: 15.200,681
ElasticPress is currently deactivated, activating...
Success: ElasticPress was activated!
Success: Done!


@cmmarslender

Contributor

cmmarslender commented Mar 17, 2016

@mustafauysal Running some tests now. Are you using an object caching drop-in (object-cache.php)? If so, can you link me to the particular one you are using?

@mustafauysal

Contributor

mustafauysal commented Mar 18, 2016

Yes, I'm using memcached object cache (Version 2.0.2) – https://wordpress.org/plugins/memcached/

@cmmarslender

Contributor

cmmarslender commented Mar 29, 2016

Hi @mustafauysal

I've done a bit more looking into this, and came up with the following results:

Indexing in general is slower with an object-cache.php file, unless all your posts are currently in cache. Most likely, most of your posts aren't in the object cache when you index, so things slow down a bit (at this point, I'm assuming WordPress is adding items that were missing from the cache as it goes). I'm not sure if there is anything to be done about this, but it's an observation that I was able to replicate consistently.

Even with this difference, though, my overall indexing average per post is quite a bit lower than yours seems to be. I have a few ideas on what might be causing this.

For me, about 15% of the indexing time is spent communicating with the Elasticsearch server. The rest is querying MySQL, running any filters on post content, etc. If you have any filters running on post_content, or any heavy shortcodes, you could try disabling those during indexing to see if that improves speed at all (as long as you don't need the processed shortcode output indexed).

Also, take a look at the "Segments and Merging" section of this page: https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing
One issue that I've run into with larger sites is merges falling behind. If you see INFO-level log messages saying "now throttling indexing" in the Elasticsearch server log, this could be an issue for you. If you have fast enough disks, or aren't worried about saturating disk I/O on your Elasticsearch server during indexing, you could consider setting index.store.throttle.type to none to prevent this throttling. In my experience, this sped things up significantly on larger datasets.
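
For illustration, the throttling change described above could be sent as an index settings update. This is only a sketch under assumptions: the host `http://localhost:9200`, the index name `my-site-post-1`, and the helper `ep453_throttle_request()` are placeholders that do not come from this thread, and the use of the `_settings` endpoint presumes an Elasticsearch version that still supports store throttling. The function only builds the request; sending it is left to whatever HTTP client you use.

```php
<?php
/**
 * Build the method, URL and JSON body for an index settings update that
 * sets index.store.throttle.type to "none".
 *
 * Hypothetical helper: the host and index name are supplied by the caller.
 */
function ep453_throttle_request( $host, $index ) {
    return array(
        'method' => 'PUT',
        // Index settings are updated at {host}/{index}/_settings.
        'url'    => rtrim( $host, '/' ) . '/' . rawurlencode( $index ) . '/_settings',
        'body'   => json_encode( array( 'index.store.throttle.type' => 'none' ) ),
    );
}

// Placeholder host and index name, for illustration only.
$request = ep453_throttle_request( 'http://localhost:9200', 'my-site-post-1' );
echo $request['method'], ' ', $request['url'], "\n", $request['body'], "\n";
```

Remember to restore the previous value after the bulk index finishes if the server also handles live traffic.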

@mustafauysal

Contributor

mustafauysal commented Apr 1, 2016

@cmmarslender Thanks for the detailed info.

We fixed this problem by removing the following parameters.

'cache_results '  => false,
'update_post_meta_cache' => false,
'update_post_term_cache' => false,

If these $args affect other ElasticPress users, we might consider removing them; otherwise I can use a filter.

Thanks.
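
The filter-based workaround mentioned above might look roughly like the sketch below. It assumes the indexing query arguments pass through a filter named `ep_index_posts_args`; that name and the helper `ep453_allow_query_caching()` are assumptions, not something confirmed in this thread. The keys are copied as quoted above, including the trailing space in `'cache_results '`, and the spelling without the space is removed as well.

```php
<?php
/**
 * Remove the cache-disabling arguments from the indexing query arguments
 * so WP_Query falls back to its default caching behaviour.
 *
 * Hypothetical helper name; the keys mirror the snippet quoted in the thread.
 */
function ep453_allow_query_caching( array $args ) {
    $keys = array(
        'cache_results ',          // as quoted in the thread (trailing space)
        'cache_results',           // same key without the trailing space
        'update_post_meta_cache',
        'update_post_term_cache',
    );

    foreach ( $keys as $key ) {
        unset( $args[ $key ] );
    }

    return $args;
}

// Assumption: 'ep_index_posts_args' is the filter applied to the indexing
// query arguments. The guard lets this file load outside WordPress too.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'ep_index_posts_args', 'ep453_allow_query_caching' );
}
```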

@cmmarslender cmmarslender self-assigned this Apr 1, 2016

@cmmarslender cmmarslender added this to the 2.0 milestone Apr 1, 2016

@cmmarslender

Contributor

cmmarslender commented Apr 4, 2016

Here is what the impact is, from a "time to index" point of view on 8,424 posts:

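| Run           | 1      | 2      | 3      | 4      | 5      | 6      | Average |
|---------------|--------|--------|--------|--------|--------|--------|---------|
| No Caching    | 65.736 | 63.182 | 67.756 | 67.75  | 71.885 | 63.785 | 66.68   |
| Allow Caching | 46.317 | 48.572 | 48.469 | 45.143 | 46.335 | 43.674 | 46.42   |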

Pretty big change in time to index, and if we're working on a slower ES backend or with more posts, the effect will just be magnified.
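
As a quick check on the figures above, the averages and the relative difference can be recomputed from the six runs. The helper name `ep453_average()` is made up for this sketch; the numbers are the ones reported in the table.

```php
<?php
// Timings for the six runs, copied from the table above.
$runs = array(
    'No Caching'    => array( 65.736, 63.182, 67.756, 67.75, 71.885, 63.785 ),
    'Allow Caching' => array( 46.317, 48.572, 48.469, 45.143, 46.335, 43.674 ),
);

/** Arithmetic mean of a non-empty list of numbers. */
function ep453_average( array $values ) {
    return array_sum( $values ) / count( $values );
}

$no_cache  = ep453_average( $runs['No Caching'] );
$cache     = ep453_average( $runs['Allow Caching'] );
// Share of the uncached indexing time that is saved by allowing caching.
$reduction = ( $no_cache - $cache ) / $no_cache * 100;

printf( "No Caching:    %.2f\n", $no_cache );   // 66.68
printf( "Allow Caching: %.2f\n", $cache );      // 46.42
printf( "Reduction:     %.1f%%\n", $reduction ); // 30.4%
```

In other words, allowing caching cut the time to index by roughly 30% in this test.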

I think these changes were added back when we were trying to stop PHP from running out of memory on larger datasets during indexing - the commit is here

@lukaspawlik Do you remember if this was specifically added to address memory issues?

@lukaspawlik

Collaborator

lukaspawlik commented Apr 5, 2016

@cmmarslender Yes, this was specifically added to address memory issues; however, the final solution was added here: 622dade

If we remove

'cache_results '  => false,
'update_post_meta_cache' => false,
'update_post_term_cache' => false,

we may see the memory consumption issue come back; however, it shouldn't be as significant as it was in the past (622dade fixes the biggest source of memory usage). Maybe we could add a wp-cli switch to enable / disable using the cache during indexing?
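
One way such a switch could be wired up is sketched below. Everything here is hypothetical: the flag name `use-cache`, the helper `ep453_index_query_args()`, and the default batch size are illustrative and are not part of ElasticPress. The `$assoc_args` array stands in for the associative flags that WP-CLI passes to a command, and the sketch uses WP_Query's documented key `cache_results`, without the trailing space that appears in the snippet quoted above.

```php
<?php
/**
 * Build the WP_Query arguments for one indexing batch.
 *
 * @param array $assoc_args Associative CLI flags; 'use-cache' is hypothetical.
 * @param int   $offset     Offset of the current batch.
 * @param int   $per_page   Batch size (illustrative default).
 */
function ep453_index_query_args( array $assoc_args, $offset = 0, $per_page = 350 ) {
    $args = array(
        'posts_per_page'      => $per_page,
        'offset'              => $offset,
        'ignore_sticky_posts' => true,
        'orderby'             => 'ID',
        'order'               => 'DESC',
    );

    // Without the flag, keep the memory-saving behaviour and skip the caches.
    if ( empty( $assoc_args['use-cache'] ) ) {
        $args['cache_results']          = false;
        $args['update_post_meta_cache'] = false;
        $args['update_post_term_cache'] = false;
    }

    return $args;
}

// With the flag set, the cache-related keys are simply left out.
print_r( ep453_index_query_args( array( 'use-cache' => true ) ) );
```

Keeping the decision in one small function would make it easy to flip the default later if caching turns out to be safe for most sites.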

@cmmarslender

Contributor

cmmarslender commented Apr 5, 2016

@tlovett1 @allan23 Thoughts on your preferred way to proceed here?

@tlovett1

Member

tlovett1 commented Apr 21, 2016

I'm fine with adding a command line parameter to disable caching.

@tuanmh

Contributor

tuanmh commented Apr 22, 2016

I'm fine with adding a command line parameter to disable caching.

It would also be great to have these options in the admin UI.

@tlovett1 tlovett1 removed this from the 2.0 milestone May 25, 2016

@tlovett1

Member

tlovett1 commented Jul 5, 2016

Would love to see this tested in 2.1 in the develop branch.

@cmmarslender

Contributor

cmmarslender commented Jul 14, 2016

Tested with and without cache_results, and the results were very similar to the ones above: indexing is significantly faster with it removed. I also tested memory usage with and without it. Overall we use about 2MB more memory with it removed, but memory usage doesn't grow with each iteration, so my recommendation is to remove it. Pushed f0ef475, which should help significantly with the slowness here.

Opened PR #532 for this
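
A check of the kind described above, whether memory grows from one batch to the next, could be sketched as follows. This is not the test that was actually run: `ep453_memory_samples()` is a hypothetical helper, and the closure is a stand-in workload rather than real indexing.

```php
<?php
/**
 * Run a batch callback several times and record memory usage after each run.
 *
 * @param callable $process_batch Receives the zero-based batch number.
 * @param int      $batches       Number of batches to run.
 * @return int[] Bytes in use after each batch.
 */
function ep453_memory_samples( callable $process_batch, $batches ) {
    $samples = array();
    for ( $i = 0; $i < $batches; $i++ ) {
        $process_batch( $i );
        $samples[] = memory_get_usage();
    }
    return $samples;
}

// Stand-in workload: builds a temporary array that is freed on return.
$samples = ep453_memory_samples(
    function ( $batch ) {
        $posts = range( 1, 1000 );
        return count( $posts );
    },
    5
);

foreach ( $samples as $i => $bytes ) {
    printf( "batch %d: %.2f MB\n", $i + 1, $bytes / 1048576 );
}
printf( "growth from first to last batch: %d bytes\n", end( $samples ) - reset( $samples ) );
```

If the last figure keeps climbing as the number of batches increases, something is being retained between iterations.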

@tlovett1 tlovett1 closed this in f0ef475 Jul 15, 2016

tlovett1 added a commit that referenced this issue Jul 15, 2016

Merge pull request #532 from 10up/fix/453
Allow caching results of WP_Query during index. Fixes #453