Improved delete_by_query to not run the wait_task if not appropriate #209
Conversation
👍
Hey, in my case I was deleting a large number of objects (my records are paragraphs from large PDF files, and I don't really have an ID per paragraph stored anywhere), so I easily get more than 1000 objects. The delete by query literally took 15+ minutes. Is there any other way we can speed it up for that scenario?
@redox Do you have any ideas or suggestions on how to handle a scenario like that?
lib/algolia/index.rb
- res = delete_objects(res['hits'].map { |h| h['objectID'] })
+ ids = res['hits'].map { |h| h['objectID'] }
+ res = delete_objects(ids)
+ break if ids.size != 1000 # no need to wait_task if we had less than 1000 objects matching; we won't get more after anyway.
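For context, here is a minimal sketch of the loop this diff changes. The names and call signatures are illustrative stand-ins, not the actual client code: `search` returns at most 1000 hits per call, so once a page comes back with fewer than 1000 hits there are no further matches and the final `wait_task` can be skipped.

```ruby
# Illustrative sketch of the delete_by_query loop (not the actual
# Algolia client implementation). The `index` object is assumed to
# respond to `search`, `delete_objects`, and `wait_task`.
PAGE_SIZE = 1000

def delete_by_query_sketch(index, query)
  loop do
    res = index.search(query, hitsPerPage: PAGE_SIZE)
    ids = res['hits'].map { |h| h['objectID'] }
    res = index.delete_objects(ids)
    # No need to wait_task if we matched fewer than 1000 objects;
    # the next search won't return more anyway.
    break if ids.size != PAGE_SIZE
    index.wait_task(res['taskID'])
  end
end
```

The wait is only needed between full pages: deleting a full page and searching again immediately would return the same (not-yet-deleted) objects.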
Shouldn't this condition be continue if ids.size == 1000? If I get 1000 results, it most probably means that I'll get more results from another search, and there is no need to wait for the delete task. Am I missing something?
So in the end, the more results I get, the longer I wait.
I think the problem here is that if you don't wait, the next search() calls might return the same object IDs, thus flooding the API with duplicate delete calls.
One way to mitigate that would be to first scan the entire result set, and then call delete on all of them.
Yes, I didn't think of that. Changing the implementation to use browse would break compatibility, as the API key currently needs only the search and delete records ACLs; a browse-based implementation would also need the browse ACL.
@raphi By scanning the entire result set, do you mean treating it as a paginated search, acquiring all the IDs of the objects to delete, and then doing asynchronous deletes? I was about to try something like that.
@obahareth yes exactly. @redox what do you think?
As suggested by @raphi in algolia#209, add a new version of delete_by_query that scans the entire index for all IDs to delete without awaiting delete results like in the old `delete_by_query` (which is now `delete_by_query!`) and then delete all matching objects asynchronously in one call to delete_objects. Comment: algolia#209 (comment)
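The approach the commit message describes can be sketched as follows. This is a minimal illustration, not the actual implementation: `browse` and `delete_objects` are assumed to behave like the Algolia Ruby client's methods (browse iterates over every record without pagination races; delete_objects issues one asynchronous batch delete).

```ruby
# Illustrative sketch of the new delete_by_query: collect every
# matching objectID in a single browse pass, then delete them all
# asynchronously with one delete_objects call, without waiting on
# the resulting task (unlike the old delete_by_query, now
# delete_by_query!).
def delete_by_query_async(index, query)
  ids = []
  index.browse(query) { |hit| ids << hit['objectID'] }
  index.delete_objects(ids) # async: returns without wait_task
end
```

Because the IDs are gathered before any delete is issued, later lookups can never re-return objects that are already queued for deletion, which avoids the duplicate-delete flood discussed above.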
🙌
@obahareth I merged your PR here in order to trigger Travis's build.
Thanks @raphi, I'm sorry there were still issues here and there.
@raphi I didn't get counted as a contributor 😟
Fix #204