Deep paging #1623

Closed
wants to merge 10 commits into from

reese-allison

Implements deep paging using search_after and Point in Time (PIT). This can be used to page through all results much more cheaply than with the Elasticsearch scan method. PITs are cheaper to open, so this should be safe for user requests and can serve as a drop-in replacement for scan in many cases.

Closes #1329
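
For readers unfamiliar with the pattern, here is a minimal sketch of a search_after + PIT loop. It is not the code from this PR. `FakeClient` is a hypothetical in-memory stand-in so the example runs without a cluster; its method names and keyword arguments (`open_point_in_time`, `search`, `close_point_in_time`) are assumed to mirror the 8.x Python client, and `deep_page` is an illustrative name.

```python
from typing import Any, Dict, Iterator, List, Optional


class FakeClient:
    """Hypothetical in-memory stand-in for an Elasticsearch client."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        # The fake always orders by the "id" field, ascending.
        self._docs = sorted(docs, key=lambda d: d["id"])
        self.open_pits: set = set()

    def open_point_in_time(self, index: str, keep_alive: str) -> Dict[str, str]:
        pit_id = f"pit-{len(self.open_pits) + 1}"
        self.open_pits.add(pit_id)
        return {"id": pit_id}

    def close_point_in_time(self, id: str) -> None:
        self.open_pits.discard(id)

    def search(self, size: int, sort: list, pit: Dict[str, str],
               search_after: Optional[list] = None) -> Dict[str, Any]:
        assert pit["id"] in self.open_pits, "PIT expired or never opened"
        floor = search_after[0] if search_after else float("-inf")
        page = [d for d in self._docs if d["id"] > floor][:size]
        hits = [{"_source": d, "sort": [d["id"]]} for d in page]
        return {"pit_id": pit["id"], "hits": {"hits": hits}}


def deep_page(client: Any, index: str, page_size: int = 2,
              keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit, one page at a time, using a PIT plus search_after."""
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after = None
    try:
        while True:
            resp = client.search(
                size=page_size,
                sort=[{"id": "asc"}],
                pit={"id": pit_id, "keep_alive": keep_alive},
                search_after=search_after,
            )
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
            # The server may hand back a new PIT id; always use the latest one.
            pit_id = resp.get("pit_id", pit_id)
            # The sort values of the last hit mark where the next page starts.
            search_after = hits[-1]["sort"]
    finally:
        client.close_point_in_time(id=pit_id)
```

Against a real cluster the sort would normally include a tiebreaker field so that the ordering is total; the single `id` sort here only keeps the sketch short.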

@rkhudov rkhudov left a comment

It is good to merge this PR, since iteration with PIT and search_after is widely used.

@rkhudov rkhudov left a comment

While iterating through the generator, I get this error:

pit = search._using.open_point_in_time(
AttributeError: 'str' object has no attribute 'open_point_in_time'

@reese-allison
Author

While iterating through the generator, I get this error:

pit = search._using.open_point_in_time(
AttributeError: 'str' object has no attribute 'open_point_in_time'

Looks like _using defaults to the string 'default'. I'll update the PR.
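
To illustrate the failure mode: the attribute holds a connection alias (a string), not a client, so the alias has to be resolved before any client method is called. The sketch below is stdlib-only and every name in it (`add_connection`, `resolve_connection`, `StubClient`) is hypothetical; it is not the fix that landed in the PR. As far as I know, elasticsearch-dsl's `connections.get_connection` plays the analogous role in the real library.

```python
from typing import Any, Dict

# Hypothetical alias registry, standing in for the library's connection store.
_connections: Dict[str, Any] = {}


def add_connection(alias: str, client: Any) -> None:
    """Register a client object under an alias such as 'default'."""
    _connections[alias] = client


def resolve_connection(using: Any) -> Any:
    """Return a client: look up string aliases, pass client objects through."""
    if isinstance(using, str):
        try:
            return _connections[using]
        except KeyError:
            raise KeyError(f"No connection registered under alias {using!r}") from None
    return using


class StubClient:
    """Hypothetical client exposing only the method named in the traceback."""

    def open_point_in_time(self, index: str, keep_alive: str) -> Dict[str, str]:
        return {"id": "stub-pit"}
```

With this in place, `resolve_connection("default").open_point_in_time(...)` works once a client has been registered under `"default"`, whereas calling the method on the bare string raises the `AttributeError` shown above.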

reese-allison and others added 2 commits October 22, 2022 19:50
Co-authored-by: Rostyslav Khudov <59306666+rkhudov@users.noreply.github.com>
Co-authored-by: Rostyslav Khudov <59306666+rkhudov@users.noreply.github.com>
@reese-allison
Author

It is good to merge this PR, since iteration with PIT and search_after is widely used.

I merged in your suggestions.

@rkhudov rkhudov left a comment

It would also be interesting to add some sort of pagination with search_after and PIT.

@reese-allison
Author

Yeah, I suppose it would be good to use search_after to page forward and back rather than just scrolling through all results.

@reese-allison
Author

It would also be interesting to add some sort of pagination with search_after and PIT.

The only way I think this would be possible is if we save last_document["sort"] and use it to create a previous/next context. To get the previous page, you would use the previous last_document["sort"]; if you want the next page, you pass the current last_document["sort"]. The only issue with this is that your page context would only last for as long as your PIT is set to expire.
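
A rough sketch of that previous/next idea, under stated assumptions: `PageCursor` and `make_fetch` are hypothetical names, and the fetch function is an in-memory stand-in for a search request issued against an open PIT. The cursor keeps the search_after value that started each visited page, so going back means re-running the query with an earlier value.

```python
from typing import Any, Callable, Dict, List, Optional

Hit = Dict[str, Any]
Fetch = Callable[[Optional[list]], List[Hit]]


def make_fetch(values: List[int], size: int) -> Fetch:
    """Build a fake page fetcher over sorted integers (stand-in for a search)."""
    ordered = sorted(values)

    def fetch(search_after: Optional[list]) -> List[Hit]:
        floor = search_after[0] if search_after else float("-inf")
        page = [v for v in ordered if v > floor][:size]
        return [{"_source": {"id": v}, "sort": [v]} for v in page]

    return fetch


class PageCursor:
    """Remember the search_after value that starts each visited page."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._starts: List[Optional[list]] = [None]  # None starts the first page
        self._last_sort: Optional[list] = None

    def current(self) -> List[Hit]:
        hits = self._fetch(self._starts[-1])
        # Save the last document's sort values as the start of the next page.
        self._last_sort = hits[-1]["sort"] if hits else None
        return hits

    def next(self) -> List[Hit]:
        if self._last_sort is not None:
            self._starts.append(self._last_sort)
        return self.current()

    def previous(self) -> List[Hit]:
        if len(self._starts) > 1:
            self._starts.pop()
        return self.current()
```

As noted above, such a cursor is only usable while the PIT is alive; once it expires, the saved sort values no longer refer to a consistent view of the index.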

@lucasvc

lucasvc commented Jan 23, 2023

Hi @reese-allison, I asked a question on issue #1329.

@Sbacon017

Hi! This looks really great, and I was hoping to use it in a project I'm working on. Are there plans to merge this sometime soon? Anything I can do to help push it across the finish line? :)

@pquentin
Member

pquentin commented Mar 20, 2024

I've merged main into this pull request and fixed the conflicts. The only CI failure is due to the usage of the walrus operator. It will go away when #1717 is merged, after which we can review this. Thank you!

@reese-allison
Author

@pquentin, thanks for getting this updated for me! I haven't looked at it in a while.

@miguelgrinberg
Collaborator

I will be looking at this PR along with #806 to try to come up with a general approach to pagination. Thanks.

@miguelgrinberg
Collaborator

@reese-allison Thank you so much. Based on your work I have added iterate(), point_in_time() and search_after() methods. The first provides the same functionality as your page(). The other two are supporting methods that can be used directly for more specific needs beyond pagination. Thanks!

Successfully merging this pull request may close these issues.

how can I usge search_after in elasticsearch-dsl 7.1.0
6 participants