-
Notifications
You must be signed in to change notification settings - Fork 0
Improve 'all' tab pagination to handle edge cases #290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bea3fd9 to
7e80c7d
Compare
7e80c7d to
3584c1b
Compare
1e7cf2d to
91ad3c3
Compare
|
@JPrevost Let me know what you think of the refactor I just pushed up. I know we talked about just adding caching, but this felt like a could opportunity to move the 'all' tab logic to a service and slim down the controller. I find it a bit easier to follow, and the decoupling will become useful if we do move to an external orchestrator. I guess the downside is now we have a 200+ line service. 🙃 I left this a separate commit in case you wanted to see the differences between the first and second efforts. Happy to squash before review instead if that's easier. |
Why these changes are being introduced: The zipper merge we implemented naively queries n/2 results from each API and interleaves them, where n is the per-page value. This works if both APIs return many results, but it can cause problems in smaller, unbalanced result sets. For example, the query term `doc edgerton` returns 50 Primo results and 4 TIMDEX results. Page 1 only shows 14 results (4 TIMDEX and 10 Primo), and each subsequent page returns only 10 (all Primo). Relevant ticket(s): - [USE-179](https://mitlibraries.atlassian.net/browse/USE-179) How this addresses that need: This implements more sophisticated logic that first checks the number of hits returned by each API and passes that, along with the pagination information, to a Merged Search Paginator class. This service object develops a 'merge plan', calculates API offsets, and merges the results for each page. Queries on the 'all' tab now fetch twice from each API: once to determine the total number of hits for the Merged Search Paginator then again to fetch results at the appropriate offset. While hardly ideal, this was the only option I could figure to avoid losing results. I limited these extra calls to queries beyond page 1, which is the only case where they are needed. Side effects of this change: * We now clear cache before each search controller test. This was done to avoid odd test behavior, but I ran the suite 50 times without any issues, so it might be excessively cautious. * The search controller continues to grow with this new logic. I tried to split things into multiple helper methods, so if we want to move more things to service objects later, it might be easier to do so. * A failing cassette has been replaced with a mock.
Why these changes are being introduced: In discussions of the PR in review for USE-179, we determined that the proposed pagination improvements could be more efficient. We had also determined that the code changes were difficult to follow, and could use better documentation. Relevant ticket(s): - USE-179 How this addresses that need: This commit caches the page 1 'summary' API calls, which we use to gather the hit counts from each API to calculate pagination on deeper pages. It also abstracts the 'all' tab code to a 'Merged Search Service', mirroring the design pattern of the Merged Search Paginator, and adds docstrings to the methods in that service. Side effects of this change: We are still making two API calls on deeper pages when the hit totals are not cached. I could not find a workaround to this while still supporting nonlinear pagination. However, the vast majority of users (even bots, presumably) will begin their search at page 1, so hopefully this is a rare occurrence.
91ad3c3 to
0106e5f
Compare
Pull Request Test Coverage Report for Build 19934295824Details
💛 - Coveralls |
Why these changes are being introduced:
The zipper merge we implemented naively queries n/2 results from each API and interleaves them, where n is the per-page value. This works if both APIs return many results, but it can cause problems in smaller, unbalanced result sets.
For example, the query term
doc edgertonreturns 50 Primo results and 4 TIMDEX results. Page 1 only shows 14 results (4 TIMDEX and 10 Primo), and each subsequent page returns only 10 (all Primo).Relevant ticket(s):
How this addresses that need:
This implements more sophisticated logic that first checks the number of hits returned by each API and passes that, along with the pagination information, to a Merged Search Paginator class. This service object develops a 'merge plan', calculates API offsets, and merges the results for each page.
Queries on the 'all' tab now fetch twice from each API: once to determine the total number of hits for the Merged Search Paginator then again to fetch results at the appropriate offset. While hardly ideal, this was the only option I could figure to avoid losing results. I limited these extra calls to queries beyond page 1, which is the only case where they are needed.
Side effects of this change:
Developer
Accessibility
New ENV
Approval beyond code review
Additional context needed to review
This is a pretty unwieldy changeset, so please reach out if you have questions!
Code Reviewer
Code
added technical debt.
Documentation
(not just this pull request message).
Testing