Client: add ability to count the number of matches for a filter using binary search over pagination #1925

Merged
merged 4 commits into from Jan 10, 2024

Conversation

ml-evs
Member

@ml-evs ml-evs commented Jan 9, 2024

Closes #1924.

This PR implements a client-side workaround for counting the number of results matching a filter, using a binary search over pagination. When a database does not return meta->data_returned, for whatever reason, the client will now execute the query with a series of probe page_offset values to find the number of matching results. For example, the client starts by querying at an offset of 1,000,000; if it finds no results there, it then tries 1,000. If a result is found at that offset, the window has been narrowed to between 1,000 and 1,000,000. The window is then bisected on a logarithmic scale (by taking the midpoint of the two exponents) until both ends have approximately the same power of 10, after which the arithmetic mean of the two ends is used instead. E.g., if the query matched 1,001 entries, the trial values would be:

  • 10^3 -> 10^6 => int(10^4.5)
  • 10^3 -> 10^4.5 => int(10^3.75)
  • 10^3 -> 10^3.75 => int(10^3.375)
  • 10^3 (1000) -> 10^3.375 (2371) => 3371 // 2 = 1685
  • 1000 -> 1685 => 1342
  • 1000 -> 1342 => 1171
  • etc., until reaching 1001.
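The search above can be sketched roughly as follows. This is a minimal illustration and not the code added to optimade/client/client.py: `count_matches` and `has_entry` are hypothetical names, and `has_entry(n)` stands in for a single request (for example with `page_limit=1` and `page_offset=n-1`) that reports whether at least `n` entries match. Unlike the description above, the sketch first probes for a single entry and grows the window tenfold if the initial upper guess is too small.

```python
import math
from typing import Callable


def count_matches(has_entry: Callable[[int], bool], upper: int = 1_000_000) -> int:
    """Return the number of matching entries using only existence probes.

    `has_entry(n)` is a hypothetical probe that must return True if and only
    if the filter matches at least `n` entries (n >= 1).
    """
    if not has_entry(1):
        return 0

    # Invariant from here on: has_entry(lo) is True and has_entry(hi) is False.
    lo, hi = 1, upper
    while has_entry(hi):
        # More matches than the initial guess: grow the window tenfold.
        lo, hi = hi, hi * 10

    while hi - lo > 1:
        log_lo, log_hi = math.log10(lo), math.log10(hi)
        if round(log_lo) != round(log_hi):
            # Ends differ in order of magnitude: bisect the exponents.
            mid = int(10 ** ((log_lo + log_hi) / 2))
        else:
            # Ends are of similar magnitude: ordinary bisection.
            mid = (lo + hi) // 2
        # Keep the probe strictly inside the window so it always shrinks.
        mid = min(max(mid, lo + 1), hi - 1)
        if has_entry(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    total = 1001  # pretend the filter matches 1,001 entries
    probes = []

    def fake_probe(n: int) -> bool:
        probes.append(n)
        return n <= total

    print(count_matches(fake_probe), "matches found in", len(probes), "probes")
```

Comparing the rounded exponents is only one way to read "the same approximate power of 10"; any rule for switching between the two midpoints gives the same final count, since each probe is clamped strictly inside the window and only the number of requests changes.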

I think this scheme makes more sense than vanilla binary search or exponential search as OPTIMADE queries of this kind are probably either smallish or largeish, without much middle ground...

Caveats:

  • the server must currently implement page_offset; page_number is possible but not yet implemented
  • the code for one API must currently be run asynchronously

This PR also adds a verbosity flag, -vvv, that enables some debug printing on the client.

@ml-evs ml-evs added enhancement New feature or request client Issues/PRs relating to the OPTIMADE client. labels Jan 9, 2024
@ml-evs ml-evs force-pushed the ml-evs/client-binary-search branch from 858629b to 1cac30b Compare January 9, 2024 20:45

codecov bot commented Jan 9, 2024

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (ed27075) 90.81% compared to head (5805ef8) 90.77%.

| Files | Patch % | Lines |
|---|---|---|
| optimade/client/client.py | 79.22% | 16 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1925      +/-   ##
==========================================
- Coverage   90.81%   90.77%   -0.04%     
==========================================
  Files          75       75              
  Lines        4661     4728      +67     
==========================================
+ Hits         4233     4292      +59     
- Misses        428      436       +8     
| Flag | Coverage Δ |
|---|---|
| project | 90.77% <79.74%> (-0.04%) ⬇️ |
| validator | 90.69% <79.74%> (+<0.01%) ⬆️ |


@ml-evs ml-evs force-pushed the ml-evs/client-binary-search branch from 7f113b8 to df3bd85 Compare January 9, 2024 21:15
@ml-evs ml-evs merged commit 65bb1cb into master Jan 10, 2024
11 of 12 checks passed
@ml-evs ml-evs deleted the ml-evs/client-binary-search branch January 10, 2024 21:08
@ml-evs ml-evs changed the title Client: add ability to count the number of results using binary search over pagination Client: add ability to count the number of matches for a filter using binary search over pagination Jan 10, 2024
Labels
client Issues/PRs relating to the OPTIMADE client. enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Client: counting number of matching entries when data_returned is not available
1 participant