bulk indexing extremely slow #453
@cmmarslender Today I checked on the same machine, at the same time.
I've done a bit more looking into this, and came up with the following results:
Indexing in general is slower with an object-cache.php file in place, unless all your posts are already in the cache. When you are indexing, most of your posts likely aren't in the object cache, so things slow down a bit (at this point, I'm assuming WordPress is adding items that were missing from the cache as it goes). I'm not sure if there is anything to be done about this, but it's an observation that I was able to replicate consistently.
Even with this difference, though, my overall indexing average per post is quite a bit lower than yours seems to be. I have a few ideas about what might be causing this.
For me, about 15% of the indexing time is spent communicating with the Elasticsearch server; the rest is querying MySQL, running any filters on post content, etc. If you have any filters running on post_content, or any heavy shortcodes, you could try disabling them during indexing to see if that improves speed (as long as you don't need the processed shortcode output indexed).
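To get a breakdown like the 15% figure above for your own setup, one option is to time each phase of the indexing loop separately. The following is only a language-neutral sketch in Python, not ElasticPress code: `PhaseTimer` and the phase names "query", "filters" and "send" are hypothetical, and the `time.sleep` calls stand in for the real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        # Time the enclosed block and add it to the running total for `name`.
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: spent / total for name, spent in self.totals.items()}


if __name__ == "__main__":
    timer = PhaseTimer()
    for _ in range(3):  # stand-in for looping over posts
        with timer.phase("query"):    # e.g. loading the post from MySQL
            time.sleep(0.002)
        with timer.phase("filters"):  # e.g. running post_content filters
            time.sleep(0.003)
        with timer.phase("send"):     # e.g. the request to the search server
            time.sleep(0.001)
    for name, share in timer.shares().items():
        print(f"{name}: {share:.0%}")
```

If the "filters" share dominates, that would point at content filters or shortcodes rather than the search backend.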
Also, take a look at the "Segments and Merging" section of this page: https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing
@cmmarslender Thanks for the detailed info.
We fixed this problem by removing the following parameters.
If these $args affect other ElasticPress users, we might consider removing them; otherwise I can use a filter.
Here is what the impact is, from a "time to index" point of view on 8,424 posts:
Pretty big change in time to index, and if we're working on a slower ES backend or with more posts, the effect will just be magnified.
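To make the "magnified" point concrete, here is a rough sketch of the arithmetic in Python. It assumes indexing time scales linearly with post count, which is a simplification; the function names and the 842.4-second total are made up for illustration, and only the 8,424 post count comes from the test above.

```python
def per_post_seconds(total_seconds, post_count):
    """Average indexing time per post."""
    if post_count <= 0:
        raise ValueError("post_count must be positive")
    return total_seconds / post_count


def projected_seconds(total_seconds, post_count, target_post_count):
    """Estimate total time for a different post count, assuming linear scaling."""
    return per_post_seconds(total_seconds, post_count) * target_post_count


if __name__ == "__main__":
    # Hypothetical timing: 842.4 s for the 8,424-post run.
    avg = per_post_seconds(842.4, 8424)
    print(f"average per post: {avg:.3f} s")
    # Under the linear assumption, ten times the posts takes ten times as long.
    print(f"projected for 84,240 posts: {projected_seconds(842.4, 8424, 84240):.0f} s")
```

Any fixed per-post overhead is multiplied by the number of posts, so a small per-post difference becomes large on bigger sites.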
I think these changes were added back when we were trying to solve issues with PHP running out of memory on larger datasets during indexing - the commit is here
@lukaspawlik Do you remember if this was specifically added to address memory issues?
If we remove
we may see the memory consumption issue come back; however, it shouldn't be as significant as it was in the past (as 622dade solves the biggest memory usage problem). Maybe we could add a wp-cli switch to enable / disable using the cache during indexing?
Tested with and without
Opened PR #532 for this