
Improved delete_by_query to not run the wait_task if not appropriate #209

Merged
merged 12 commits into master from fix/204 on Aug 24, 2017

Conversation

@redox (Contributor) commented Jun 23, 2017

Fix #204

@raphi (Contributor) commented Jun 23, 2017

👍

@coveralls commented Jun 23, 2017

Coverage decreased (-0.1%) to 93.241% when pulling bee3fc6 on fix/204 into a9b496f on master.


@obahareth

Hey, in my case I was deleting a large number of objects (my records are paragraphs from large PDF files, and I don't have an ID per paragraph stored anywhere), so I easily get more than 1000 matching objects. The delete-by-query call took more than 15 minutes.

Is there any other way we can speed it up for that scenario?

@obahareth

@redox Do you have any ideas or suggestions on how to handle a scenario like that?

On the changed lines in delete_by_query:

-    res = delete_objects(res['hits'].map { |h| h['objectID'] })
+    ids = res['hits'].map { |h| h['objectID'] }
+    res = delete_objects(ids)
+    break if ids.size != 1000 # no need to wait_task if we had less than 1000 objects matching; we won't get more after anyway.
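
For context, here is a minimal sketch of what the surrounding delete_by_query loop could look like with this change. The method and variable names (search, delete_objects, wait_task, res, ids) follow the snippet above, but the exact implementation in the client may differ.

```ruby
# Hypothetical sketch of the loop the diff above belongs to,
# not the exact client implementation.
def delete_by_query(query, params = {})
  loop do
    # Each search returns at most 1000 matching hits per page.
    res = search(query, params)
    ids = res['hits'].map { |h| h['objectID'] }
    break if ids.empty?

    res = delete_objects(ids)

    # The change under review: if fewer than 1000 objects matched,
    # the next search cannot return anything new, so skip wait_task.
    break if ids.size != 1000

    # Otherwise wait for the deletion to be applied before searching again,
    # so the same objectIDs are not returned and deleted twice.
    wait_task(res['taskID'])
  end
end
```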
A reviewer commented:

Shouldn't this condition be continue if ids.size == 1000?
If I get 1000 results, it most probably means the next search will return more results, so there is no need to wait for the delete task. Am I missing something?


So in the end the more results I get the longer I wait.

A contributor commented:

I think the problem here is that if you don't wait, the next search() calls might return the same object IDs, thus flooding the API with duplicate delete calls.

@raphi (Contributor) commented:

One way to mitigate that would be to first scan the entire result set, and then issue the calls to delete all of them.

A reviewer commented:

Yes, I didn't think of that.
Changing the implementation to use browse would break compatibility: the API key currently only needs the search and delete-records ACLs, while a browse-based implementation would also require the browse ACL.

@obahareth commented:

@raphi By scanning the entire result set, do you mean treating it as a paginated search, acquiring all the IDs of the objects to delete, and then doing the deletes asynchronously? I was about to try something like that.

@raphi (Contributor) commented:

@obahareth yes exactly. @redox what do you think?

@obahareth commented:

@raphi and @redox, I've played around with this in my own project and had very successful results: deleting ~3000 objects took milliseconds instead of more than 5 minutes. I've opened PR #211 with the solution that worked for me, based on @raphi's suggestion.
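
For illustration, here is a minimal sketch of the approach discussed in this thread: scan the whole result set first to collect every matching objectID, then issue a single asynchronous delete_objects call. It uses browse with a cursor (which, as noted above, would require the browse ACL); the parameter names are assumptions, and the code actually merged in PR #211 may differ.

```ruby
# Hypothetical sketch of the "collect all IDs first, delete once" idea,
# not the exact code from PR #211.
def delete_by_query_async(query, params = {})
  params = params.merge(
    'query'                => query,
    'attributesToRetrieve' => ['objectID'], # we only need the IDs
    'hitsPerPage'          => 1000
  )

  ids = []
  cursor = nil

  # Scan the entire result set without deleting anything yet,
  # so no wait_task is needed between pages.
  loop do
    res = cursor ? browse(params.merge('cursor' => cursor)) : browse(params)
    ids.concat(res['hits'].map { |h| h['objectID'] })
    cursor = res['cursor']
    break if cursor.nil?
  end

  # One asynchronous batch delete; callers that need to block on
  # completion can still wait_task on the returned taskID.
  delete_objects(ids) unless ids.empty?
end
```

Because the deletes are batched into one call and nothing waits on intermediate tasks, removing a few thousand objects returns in roughly one API round trip, which matches the milliseconds-versus-minutes difference reported above.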

obahareth added a commit to obahareth/algoliasearch-client-ruby that referenced this pull request Jul 24, 2017
As suggested by @raphi in algolia#209, add a new version of delete_by_query that scans the entire index for all IDs to delete without awaiting delete results like in the old `delete_by_query` (which is now `delete_by_query!`) and then delete all matching objects asynchronously in one call to delete_objects.

Comment:
algolia#209 (comment)
obahareth and others added 10 commits August 17, 2017 12:23
As suggested by @raphi in #209, add a new version of delete_by_query that scans the entire index for all IDs to delete without awaiting delete results like in the old `delete_by_query` (which is now `delete_by_query!`) and then delete all matching objects asynchronously in one call to delete_objects.

Comment:
#209 (comment)
@coveralls commented Aug 24, 2017

Coverage decreased (-0.5%) to 92.818% when pulling 535bfe4 on fix/204 into 7231ccb on master.


@coveralls commented Aug 24, 2017

Coverage decreased (-0.1%) to 92.507% when pulling 1c023af on fix/204 into d88d80d on master.


@coveralls commented Aug 24, 2017

Coverage increased (+0.9%) to 93.496% when pulling 1d4754c on fix/204 into d88d80d on master.


@raphi (Contributor) commented Aug 24, 2017

🙌

raphi merged commit 736a20e into master on Aug 24, 2017
@raphi (Contributor) commented Aug 24, 2017

@obahareth I merged your PR here in order to trigger Travis's build.

@obahareth

Thanks @raphi, I'm sorry there were still issues here and there.

raphi deleted the fix/204 branch on August 24, 2017 at 14:57
@obahareth commented Aug 24, 2017

@raphi I didn't get counted as a contributor 😟
