Preliminary support for `page_above` in base entry collections #1560

ml-evs · 2023-03-14T14:58:52Z

This PR should make it easier to customize which pagination mechanism each backend supports, by adding a class attribute for it (that follows the names of the query parameters).

All supported pagination parameters are now passed down to the collection (with page_cursor and page_below still left as unimplemented for all collections).

Once this PR is merged, we can work to make page_above the default ES pagination mechanism (i.e., the one that is used in next links) and maybe even consider disabling offset-based pagination for ES. It might be cleaner to move the unsupported params attribute down to the collection itself too.

Pass page_above through query param handler
Customisable default pagination mechanism
Skeleton of ES implementation of page_above that maintains current behaviour for now.
Update test cases

codecov · 2023-03-14T19:12:54Z

Codecov Report

Merging #1560 (2700ee7) into master (d779725) will decrease coverage by 0.06%.
The diff coverage is 88.52%.

@@            Coverage Diff             @@
##           master    #1560      +/-   ##
==========================================
- Coverage   91.09%   91.04%   -0.06%     
==========================================
  Files          74       74              
  Lines        4538     4578      +40     
==========================================
+ Hits         4134     4168      +34     
- Misses        404      410       +6

Flag	Coverage Δ
project	`91.04% <88.52%> (-0.06%)`	⬇️
validator	`90.93% <88.52%> (-0.06%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
optimade/server/middleware.py	`95.33% <ø> (+0.63%)`	⬆️
optimade/server/query_params.py	`98.27% <ø> (ø)`
optimade/server/entry_collections/mongo.py	`95.31% <50.00%> (-1.47%)`	⬇️
optimade/server/entry_collections/elasticsearch.py	`95.55% <85.71%> (-1.98%)`	⬇️
...made/server/entry_collections/entry_collections.py	`95.88% <90.00%> (-1.96%)`	⬇️
optimade/__init__.py	`100.00% <100.00%> (ø)`
optimade/server/entry_collections/__init__.py	`100.00% <100.00%> (ø)`
optimade/server/routers/utils.py	`95.93% <100.00%> (-0.04%)`	⬇️
optimade/validator/utils.py	`94.59% <100.00%> (ø)`

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

markus1978 · 2023-03-15T07:40:46Z

optimade/server/entry_collections/elasticsearch.py

+                more_data_available = page_offset + limit < data_returned
+            else:
+                # Needs to be set by some custom elastic query: is the ID the last ID?
+                more_data_available = False


More data is only available, if data_returned == len(results). There is the inevitable chance that you will do a request with empty response.

I do not understand your comment.

I think data_returned is the total number of entries that match the filter, while len(results) is the number of returned entries. So if data_returned == len(results) all the matching entries have been returned and no more data is available.

Just to clarify here, I was working under a potentially wonky assumption that search_after would mangle the total hits for a filter, but perhaps this is not the case, in which case we can still directly use data_returned to set more_data_available (as suggested by @JPBergsma below).

In the documentation of Elasticsearch, they recommand setting "track_total_hits": false to speed things up.
So I would expect that data_returned would give a sane result as we have set "track_total_hits": true.
We could perhaps include the value of data_returned in the next link, so we can set "track_total_hits": false when retrieving the rest of the data.

optimade/server/routers/utils.py

optimade/server/entry_collections/elasticsearch.py

ml-evs · 2023-03-17T11:42:45Z

Hi @markus1978 and @JPBergsma, I think the bones of this are all in now, even if we are not testing page_above for ES.

I think I'd like to merge this PR and then continue working on the actual page_above implementation and tests in another PR.

- Fix potential for infinite loop in broken test

JPBergsma

What I see so far looks good.
Good luck with creating the implementations.

ml-evs force-pushed the ml-evs/support_value_based_pagination branch 4 times, most recently from af6d4b6 to 90fcb3b Compare March 14, 2023 18:07

ml-evs force-pushed the ml-evs/support_value_based_pagination branch from 72d535a to e424b05 Compare March 14, 2023 19:16

ml-evs marked this pull request as ready for review March 14, 2023 19:18

ml-evs requested review from CasperWA, JPBergsma and markus1978 as code owners March 14, 2023 19:18

markus1978 reviewed Mar 15, 2023

View reviewed changes

JPBergsma reviewed Mar 15, 2023

View reviewed changes

optimade/server/entry_collections/elasticsearch.py Outdated Show resolved Hide resolved

ml-evs added 14 commits March 17, 2023 11:42

Enable page_above query param and disable for mongo

4c495e4

Allow customisation of default pagination mechanism at collection level

e56c11d

Skeleton of elasticsearch page_above implementation

83f0534

Add simple test for default pagination

57ec46f

Linting

22b076e

Minor fix

e88c2d9

Move test

b513e85

Fix typo

43f1c35

Check response for pagination test

2f1f1ce

Fix default pagination for ES

3b5b33d

Set default pagination to for ES to allow tests to pass

314bb88

Try to match initial offset behaviour

0c39fbd

Integrate pagination mechanism with new collection method

724975e

- Fix potential for infinite loop in broken test

Rework more_data_available for page_above

2700ee7

ml-evs force-pushed the ml-evs/support_value_based_pagination branch from 654a304 to 2700ee7 Compare March 17, 2023 11:44

ml-evs added the server Issues pertaining to the example server implementation label Mar 17, 2023

JPBergsma approved these changes Mar 17, 2023

View reviewed changes

ml-evs changed the title ~~Support for page_above in base entry collections~~ Preliminary support for page_above in base entry collections Mar 17, 2023

ml-evs merged commit d894c7a into master Mar 17, 2023
12 checks passed

ml-evs deleted the ml-evs/support_value_based_pagination branch March 17, 2023 14:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preliminary support for `page_above` in base entry collections #1560

Preliminary support for `page_above` in base entry collections #1560

ml-evs commented Mar 14, 2023 •

edited

codecov bot commented Mar 14, 2023 •

edited

markus1978 Mar 15, 2023

JPBergsma Mar 15, 2023

ml-evs Mar 17, 2023 •

edited

JPBergsma Mar 17, 2023

ml-evs commented Mar 17, 2023

JPBergsma left a comment

Preliminary support for page_above in base entry collections #1560

Preliminary support for page_above in base entry collections #1560

Conversation

ml-evs commented Mar 14, 2023 • edited

codecov bot commented Mar 14, 2023 • edited

Codecov Report

markus1978 Mar 15, 2023

Choose a reason for hiding this comment

JPBergsma Mar 15, 2023

Choose a reason for hiding this comment

ml-evs Mar 17, 2023 • edited

Choose a reason for hiding this comment

JPBergsma Mar 17, 2023

Choose a reason for hiding this comment

ml-evs commented Mar 17, 2023

JPBergsma left a comment

Choose a reason for hiding this comment

Preliminary support for `page_above` in base entry collections #1560

Preliminary support for `page_above` in base entry collections #1560

ml-evs commented Mar 14, 2023 •

edited

codecov bot commented Mar 14, 2023 •

edited

ml-evs Mar 17, 2023 •

edited