Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preliminary support for page_above in base entry collections #1560

Merged
merged 14 commits into from Mar 17, 2023

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Mar 14, 2023

This PR should make it easier to customize which pagination mechanism each backend supports, by adding a class attribute for it (that follows the names of the query parameters).

All supported pagination parameters are now passed down to the collection (with page_cursor and page_below still left as unimplemented for all collections).

Once this PR is merged, we can work to make page_above the default ES pagination mechanism (i.e., the one that is used in next links) and maybe even consider disabling offset-based pagination for ES. It might be cleaner to move the unsupported params attribute down to the collection itself too.

  • Pass page_above through query param handler
  • Customisable default pagination mechanism
  • Skeleton of ES implementation of page_above that maintains current behaviour for now.
  • Update test cases

@ml-evs ml-evs force-pushed the ml-evs/support_value_based_pagination branch 4 times, most recently from af6d4b6 to 90fcb3b Compare March 14, 2023 18:07
@codecov
Copy link

codecov bot commented Mar 14, 2023

Codecov Report

Merging #1560 (2700ee7) into master (d779725) will decrease coverage by 0.06%.
The diff coverage is 88.52%.

@@            Coverage Diff             @@
##           master    #1560      +/-   ##
==========================================
- Coverage   91.09%   91.04%   -0.06%     
==========================================
  Files          74       74              
  Lines        4538     4578      +40     
==========================================
+ Hits         4134     4168      +34     
- Misses        404      410       +6     
Flag Coverage Δ
project 91.04% <88.52%> (-0.06%) ⬇️
validator 90.93% <88.52%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
optimade/server/middleware.py 95.33% <ø> (+0.63%) ⬆️
optimade/server/query_params.py 98.27% <ø> (ø)
optimade/server/entry_collections/mongo.py 95.31% <50.00%> (-1.47%) ⬇️
optimade/server/entry_collections/elasticsearch.py 95.55% <85.71%> (-1.98%) ⬇️
...made/server/entry_collections/entry_collections.py 95.88% <90.00%> (-1.96%) ⬇️
optimade/__init__.py 100.00% <100.00%> (ø)
optimade/server/entry_collections/__init__.py 100.00% <100.00%> (ø)
optimade/server/routers/utils.py 95.93% <100.00%> (-0.04%) ⬇️
optimade/validator/utils.py 94.59% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@ml-evs ml-evs force-pushed the ml-evs/support_value_based_pagination branch from 72d535a to e424b05 Compare March 14, 2023 19:16
@ml-evs ml-evs marked this pull request as ready for review March 14, 2023 19:18
more_data_available = page_offset + limit < data_returned
else:
# Needs to be set by some custom elastic query: is the ID the last ID?
more_data_available = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More data is only available, if data_returned == len(results). There is the inevitable chance that you will do a request with empty response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand your comment.

I think data_returned is the total number of entries that match the filter, while len(results) is the number of returned entries. So if data_returned == len(results) all the matching entries have been returned and no more data is available.

Copy link
Member Author

@ml-evs ml-evs Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify here, I was working under a potentially wonky assumption that search_after would mangle the total hits for a filter, but perhaps this is not the case, in which case we can still directly use data_returned to set more_data_available (as suggested by @JPBergsma below).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the documentation of Elasticsearch, they recommand setting "track_total_hits": false to speed things up.
So I would expect that data_returned would give a sane result as we have set "track_total_hits": true.
We could perhaps include the value of data_returned in the next link, so we can set "track_total_hits": false when retrieving the rest of the data.

optimade/server/routers/utils.py Outdated Show resolved Hide resolved
optimade/server/routers/utils.py Outdated Show resolved Hide resolved
@ml-evs
Copy link
Member Author

ml-evs commented Mar 17, 2023

Hi @markus1978 and @JPBergsma, I think the bones of this are all in now, even if we are not testing page_above for ES.

I think I'd like to merge this PR and then continue working on the actual page_above implementation and tests in another PR.

@ml-evs ml-evs force-pushed the ml-evs/support_value_based_pagination branch from 654a304 to 2700ee7 Compare March 17, 2023 11:44
@ml-evs ml-evs added the server Issues pertaining to the example server implementation label Mar 17, 2023
Copy link
Contributor

@JPBergsma JPBergsma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I see so far looks good.
Good luck with creating the implementations.

@ml-evs ml-evs changed the title Support for page_above in base entry collections Preliminary support for page_above in base entry collections Mar 17, 2023
@ml-evs ml-evs merged commit d894c7a into master Mar 17, 2023
12 checks passed
@ml-evs ml-evs deleted the ml-evs/support_value_based_pagination branch March 17, 2023 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
server Issues pertaining to the example server implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants