I was able to make populate cmd 5-10 times faster. #1186

Closed
makasim opened this issue Jan 3, 2017 · 11 comments · Fixed by #1192

Comments

@makasim
Member

makasim commented Jan 3, 2017

The idea is to split the job into smaller chunks and process them in parallel. I used the enqueue bundle and a RabbitMQ broker to do the job. A 5x performance gain was achieved with 10 consumers and a single Elasticsearch node: 500k entities were indexed in 3 minutes instead of 15.
It looks the same as the current populate command, including the progress bar.

Here's the code: php-enqueue/enqueue-sandbox#1

The most important classes to look at are AppBundle\Async\ElasticaPopulateProcessor and AppBundle\Elasticsearch\AsyncProvider.
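For readers who do not want to open the sandbox, the chunk-and-parallelize idea can be sketched roughly as follows. This is not the code from the sandbox or the bundle: it is a minimal Python illustration in which a thread pool stands in for the message queue and its consumers, and every name (`split_into_chunks`, `index_chunk`, `populate`) and the chunk size of 1000 are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total, chunk_size):
    """Return (offset, limit) pairs that together cover `total` entities."""
    return [
        (offset, min(chunk_size, total - offset))
        for offset in range(0, total, chunk_size)
    ]


def index_chunk(chunk):
    """Stand-in for one consumer handling one message.

    A real consumer would load `limit` entities starting at `offset`
    and bulk-index them; here we only report how many were "indexed".
    """
    offset, limit = chunk
    return limit


def populate(total, chunk_size=1000, consumers=10):
    """Process all chunks with `consumers` parallel workers."""
    chunks = split_into_chunks(total, chunk_size)
    indexed = 0
    with ThreadPoolExecutor(max_workers=consumers) as pool:
        # Results arrive in submission order; a progress bar would advance here.
        for done in pool.map(index_chunk, chunks):
            indexed += done
    return indexed
```

Calling `populate(500_000)` splits the work into 500 chunks and returns the total number of entities processed. In the setup described above, each chunk would instead be a message sent to the broker, and the workers would be separate consumer processes.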

The question is: would this be something we could add to FOSElasticaBundle as a default or optional solution?

Update:

I put the code into a bundle: https://github.com/php-enqueue/enqueue-elastica-bundle. It extends FOSElasticaBundle with messaging features.

@makasim
Member Author

makasim commented Jan 5, 2017

I can do it as a standalone bundle, but maybe you would prefer to see it as part of this bundle?

@makasim
Member Author

makasim commented Jan 10, 2017

@lsmith77
Member

Cool. Maybe create a doc PR that references this bundle?

@lsmith77 lsmith77 changed the title I was able to make populate cmd 5-10 times faster. I was able to make populate cmd 5-10 times faster. Jan 10, 2017
@makasim
Member Author

makasim commented Jan 10, 2017

@lsmith77 will do!

@Koc

Koc commented Jan 10, 2017

@makasim does it work without slowdowns when the offset is big? Also, is it possible to run it without RabbitMQ, for example by having the populate command exec N commands using nohup &?

@makasim
Member Author

makasim commented Jan 10, 2017

@Koc

for example populate command exec N commands using nohup &?

No, that is not possible; the solution is built on top of a messaging concept. You can use any transport, though. Currently AMQP and STOMP are supported, which covers a lot of brokers out there, including RabbitMQ. In theory, something simple like a file transport could be added.

does it work without slowdowns when the offset is big?

I did not modify any queries, so query performance should be the same as it is now.

@makasim
Member Author

makasim commented Jan 10, 2017

Messaging allows the job to be distributed among different servers, whereas exec + nohup is tied to one server.

@guillaume-perreal

Is there any plan for a listener that delegates indexing to the workers?

@makasim
Member Author

makasim commented Jan 11, 2017

@guillaume-perreal what do you mean by "workers"?

@makasim
Member Author

makasim commented Jan 17, 2017

Tested FOSElasticaBundle + EnqueueElasticaBundle with the filesystem transport: php-enqueue/enqueue-dev#12

It took 5 minutes to index 400k entities (10 consumers). RabbitMQ took 3 minutes. Good results IMO.

Pros:

  1. You don't need any broker, such as RabbitMQ
  2. Simple to use

Cons:

  1. Only auto-ack mode is supported
  2. Local: the work cannot be distributed among several servers.
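
To put the timings quoted in this thread side by side, a quick throughput calculation may help. This is only arithmetic on the reported figures, with one assumption: that the 3-minute RabbitMQ result refers to the 500k-entity run from the first comment. The runs may not be strictly comparable, since the entity counts differ.

```python
def throughput(entities, minutes):
    """Entities indexed per second for a run of the given length."""
    return entities / (minutes * 60)


# Figures quoted in this thread. Assumption: the 3-minute RabbitMQ figure
# refers to the 500k-entity run from the first comment.
baseline = throughput(500_000, 15)   # original populate command
rabbitmq = throughput(500_000, 3)    # 10 consumers, RabbitMQ
filesystem = throughput(400_000, 5)  # 10 consumers, filesystem transport

for name, rate in [
    ("baseline", baseline),
    ("rabbitmq", rabbitmq),
    ("filesystem", filesystem),
]:
    print(f"{name}: {rate:.0f} entities/s, {rate / baseline:.1f}x baseline")
```

Under that assumption, the baseline works out to roughly 556 entities per second, RabbitMQ to roughly 2778 (5.0x), and the filesystem transport to roughly 1333 (2.4x).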

@XWB
Member

XWB commented Feb 27, 2017

We will continue in #1192
