
Commit bea3fd9

Improve 'all' tab pagination to handle edge cases
Why these changes are being introduced:

The zipper merge we implemented naively queries n/2 results from each API and interleaves them, where n is the per-page value. This works when both APIs return many results, but it causes problems for smaller, unbalanced result sets. For example, the query term `doc edgerton` returns 50 Primo results and 4 TIMDEX results. Page 1 shows only 14 results (4 TIMDEX and 10 Primo), and each subsequent page returns only 10 (all Primo).

Relevant ticket(s):

- [USE-179](https://mitlibraries.atlassian.net/browse/USE-179)

How this addresses that need:

This implements more sophisticated logic that first checks the number of hits returned by each API and passes that, along with the pagination information, to a Merged Search Paginator class. This service object develops a 'merge plan', calculates API offsets, and merges the results for each page.

Queries on the 'all' tab now fetch twice from each API: once to determine the total number of hits for the Merged Search Paginator, then again to fetch results at the appropriate offset. While hardly ideal, this was the only option I could find that avoids losing results. I limited these extra calls to queries beyond page 1, which is the only case where they are needed.

Side effects of this change:

* We now clear the cache before each search controller test. This was done to avoid odd test behavior, but I ran the suite 50 times without any issues, so it might be excessively cautious.
* The search controller continues to grow with this new logic. I tried to split the logic into multiple helper methods, so it should be easier to move more of it into service objects later if we want to.
* A failing cassette has been replaced with a mock.
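
For context, here is a rough worked example (not part of the commit) of what the new MergedSearchPaginator computes for page 2 of the `doc edgerton` query described above, assuming the default RESULTS_PER_PAGE of 20:

# Illustrative sketch only; the hit counts come from the example above.
paginator = MergedSearchPaginator.new(primo_total: 50, timdex_total: 4, current_page: 2, per_page: 20)

paginator.merge_plan.count(:primo)   # => 20 (TIMDEX is exhausted on page 1)
paginator.merge_plan.count(:timdex)  # => 0
paginator.api_offsets                # => [16, 4] -- fetch 20 Primo results starting at offset 16, skip TIMDEX

# Page 1 interleaves the two sources until TIMDEX runs out, so every page is now full:
# page 1 -> 4 TIMDEX + 16 Primo, page 2 -> 20 Primo, page 3 -> the remaining 14 Primo.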
1 parent f5168a2 commit bea3fd9

5 files changed (+483 -171 lines changed)

app/controllers/search_controller.rb

Lines changed: 105 additions & 43 deletions
@@ -88,48 +88,117 @@ def load_timdex_results
   end
 
   def load_all_results
-    # Fetch results from both APIs in parallel
-    primo_data, timdex_data = fetch_all_data
+    current_page = @enhanced_query[:page] || 1
+    per_page = ENV.fetch('RESULTS_PER_PAGE', '20').to_i
+    data = if current_page.to_i == 1
+             fetch_all_tab_first_page(current_page, per_page)
+           else
+             fetch_all_tab_deeper_pages(current_page, per_page)
+           end
 
-    # Combine errors from both APIs
-    @errors = combine_errors(primo_data[:errors], timdex_data[:errors])
+    @results = data[:results]
+    @errors = data[:errors]
+    @pagination = data[:pagination]
+    @show_primo_continuation = data[:show_primo_continuation]
+  end
 
-    # Zipper merge results from both APIs
-    @results = merge_results(primo_data[:results], timdex_data[:results])
+  def fetch_all_tab_first_page(current_page, per_page)
+    primo_data, timdex_data = parallel_fetch({ offset: 0, per_page: per_page }, { offset: 0, per_page: per_page })
 
-    # Use Analyzer for combined pagination calculation
-    @pagination = Analyzer.new(@enhanced_query, timdex_data[:hits], :all,
-                               primo_data[:hits]).pagination
+    paginator = build_paginator_from_data(primo_data, timdex_data, current_page, per_page)
 
-    # Handle primo continuation for high page numbers
-    @show_primo_continuation = primo_data[:show_continuation] || false
+    assemble_all_tab_result(paginator, primo_data, timdex_data, current_page, per_page)
   end
 
-  def fetch_all_data
-    # Parallel fetching from both APIs
-    primo_thread = Thread.new { fetch_primo_data }
-    timdex_thread = Thread.new { fetch_timdex_data }
+  def fetch_all_tab_deeper_pages(current_page, per_page)
+    primo_summary, timdex_summary = parallel_fetch({ offset: 0, per_page: 1 }, { offset: 0, per_page: 1 })
+
+    paginator = build_paginator_from_data(primo_summary, timdex_summary, current_page, per_page)
+
+    primo_data, timdex_data = fetch_all_tab_page_chunks(paginator)
+
+    assemble_all_tab_result(paginator, primo_data, timdex_data, current_page, per_page, deeper: true)
+  end
+
+  # Launch parallel fetch threads for Primo and Timdex and return their data
+  def parallel_fetch(primo_opts = {}, timdex_opts = {})
+    primo_thread = Thread.new { fetch_primo_data(**primo_opts) }
+    timdex_thread = Thread.new { fetch_timdex_data(**timdex_opts) }
 
     [primo_thread.value, timdex_thread.value]
   end
 
+  # Build a paginator from raw API response data
+  def build_paginator_from_data(primo_data, timdex_data, current_page, per_page)
+    primo_total = primo_data[:hits] || 0
+    timdex_total = timdex_data[:hits] || 0
+
+    MergedSearchPaginator.new(
+      primo_total: primo_total,
+      timdex_total: timdex_total,
+      current_page: current_page,
+      per_page: per_page
+    )
+  end
+
+  # For deeper pages, compute merge_plan and api_offsets, then conditionally fetch page chunks
+  def fetch_all_tab_page_chunks(paginator)
+    merge_plan = paginator.merge_plan
+    primo_count = merge_plan.count(:primo)
+    timdex_count = merge_plan.count(:timdex)
+    primo_offset, timdex_offset = paginator.api_offsets
+
+    primo_thread = primo_count > 0 ? Thread.new { fetch_primo_data(offset: primo_offset, per_page: primo_count) } : nil
+    timdex_thread = if timdex_count > 0
+                      Thread.new do
+                        fetch_timdex_data(offset: timdex_offset, per_page: timdex_count)
+                      end
+                    end
+
+    primo_data = if primo_thread
+                   primo_thread.value
+                 else
+                   { results: [], errors: nil, hits: paginator.primo_total, show_continuation: false }
+                 end
+
+    timdex_data = if timdex_thread
+                    timdex_thread.value
+                  else
+                    { results: [], errors: nil, hits: paginator.timdex_total }
+                  end
+
+    [primo_data, timdex_data]
+  end
+
+  # Assemble the final result hash from paginator and API data
+  def assemble_all_tab_result(paginator, primo_data, timdex_data, current_page, per_page, deeper: false)
+    primo_total = primo_data[:hits] || 0
+    timdex_total = timdex_data[:hits] || 0
+
+    merged = paginator.merge_results(primo_data[:results] || [], timdex_data[:results] || [])
+    errors = combine_errors(primo_data[:errors], timdex_data[:errors])
+    pagination = Analyzer.new(@enhanced_query, timdex_total, :all, primo_total).pagination
+
+    show_primo_continuation = if deeper
+                                page_offset = (current_page - 1) * per_page
+                                primo_data[:show_continuation] || (page_offset >= Analyzer::PRIMO_MAX_OFFSET)
+                              else
+                                primo_data[:show_continuation]
+                              end
+
+    { results: merged, errors: errors, pagination: pagination, show_primo_continuation: show_primo_continuation }
+  end
+
   def combine_errors(*error_arrays)
     all_errors = error_arrays.compact.flatten
     all_errors.any? ? all_errors : nil
   end
 
-  def merge_results(primo_results, timdex_results)
-    (primo_results || []).zip(timdex_results || []).flatten.compact
-  end
-
-  def fetch_primo_data
+  def fetch_primo_data(offset: nil, per_page: nil)
+    # Default to current page if not provided
     current_page = @enhanced_query[:page] || 1
-    per_page = if @active_tab == 'all'
-                 ENV.fetch('RESULTS_PER_PAGE', '20').to_i / 2
-               else
-                 ENV.fetch('RESULTS_PER_PAGE', '20').to_i
-               end
-    offset = (current_page - 1) * per_page
+    per_page ||= ENV.fetch('RESULTS_PER_PAGE', '20').to_i
+    offset ||= (current_page - 1) * per_page
 
     # Check if we're beyond Primo API limits before making the request.
     if offset >= Analyzer::PRIMO_MAX_OFFSET
@@ -139,7 +208,7 @@ def fetch_primo_data
     primo_response = query_primo(per_page, offset)
     hits = primo_response.dig('info', 'total') || 0
     results = NormalizePrimoResults.new(primo_response, @enhanced_query[:q]).normalize
-    pagination = Analyzer.new(@enhanced_query, hits , :primo).pagination
+    pagination = Analyzer.new(@enhanced_query, hits, :primo).pagination
 
     # Handle empty results from Primo API. Sometimes Primo will return no results at a given offset,
     # despite claiming in the initial query that more are available. This happens randomly and
@@ -151,8 +220,9 @@ def fetch_primo_data
     if results.empty?
       docs = primo_response['docs'] if primo_response.is_a?(Hash)
       if docs.nil? || docs.empty?
-        # Only show continuation for pagination scenarios (page > 1), not for searches with no results
-        show_continuation = true if current_page > 1
+        # Only show continuation for pagination scenarios (where offset is present), not for
+        # searches with no results
+        show_continuation = true if offset > 0
       else
         errors = [{ 'message' => 'No more results available at this page number.' }]
       end
@@ -164,19 +234,10 @@ def fetch_primo_data
     { results: [], pagination: {}, errors: handle_primo_errors(e), show_continuation: false, hits: 0 }
   end
 
-  def fetch_timdex_data
-    # For all tab, modify query to use half page size
-    if @active_tab == 'all'
-      per_page = ENV.fetch('RESULTS_PER_PAGE', '20').to_i / 2
-      page = @enhanced_query[:page] || 1
-      from_offset = ((page - 1) * per_page).to_s
-
-      query_builder = QueryBuilder.new(@enhanced_query)
-      query = query_builder.query
-      query['from'] = from_offset
-    else
-      query = QueryBuilder.new(@enhanced_query).query
-    end
+  def fetch_timdex_data(offset: nil, per_page: nil)
+    query = QueryBuilder.new(@enhanced_query).query
+    query['from'] = offset.to_s if offset
+    query['size'] = per_page.to_s if per_page
 
     response = query_timdex(query)
     errors = extract_errors(response)
@@ -223,7 +284,8 @@ def query_timdex(query)
 
   def query_primo(per_page, offset)
     # We generate unique cache keys to avoid naming collisions.
-    cache_key = generate_cache_key(@enhanced_query)
+    # Include per_page and offset in the cache key to ensure pagination works correctly.
+    cache_key = generate_cache_key(@enhanced_query.merge(per_page: per_page, offset: offset))
 
     Rails.cache.fetch("#{cache_key}/primo", expires_in: 12.hours) do
       primo_search = PrimoSearch.new
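
The cache-key change in the last hunk above matters because, on deeper 'all'-tab pages, query_primo is now called more than once for the same search with different per_page/offset values (a one-hit summary fetch, then the page chunk). A minimal sketch of the intent, assuming @enhanced_query is otherwise identical between calls (generate_cache_key's internals are not shown in this diff):

# Sketch only, not part of the commit. Without the merge, both calls below would hash
# the same @enhanced_query, and the chunk fetch could be served the cached summary response.
generate_cache_key(@enhanced_query.merge(per_page: 1, offset: 0))    # summary fetch (hit count only)
generate_cache_key(@enhanced_query.merge(per_page: 20, offset: 16))  # page chunk from the example above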
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# frozen_string_literal: true
+
+# MergedSearchPaginator encapsulates stateless merged pagination logic for combining two API result sets.
+# It calculates the merge plan, API offsets, and merges the results for a given page.
+class MergedSearchPaginator
+  attr_reader :primo_total, :timdex_total, :current_page, :per_page
+
+  def initialize(primo_total:, timdex_total:, current_page:, per_page:)
+    @primo_total = primo_total
+    @timdex_total = timdex_total
+    @current_page = current_page
+    @per_page = per_page
+  end
+
+  # Returns an array of :primo and :timdex symbols for the merged result order on this page
+  def merge_plan
+    total_results = primo_total + timdex_total
+    start_index = (current_page - 1) * per_page
+    end_index = [start_index + per_page, total_results].min
+    plan = []
+    primo_used = 0
+    timdex_used = 0
+    i = 0
+    while i < end_index
+      if primo_used < primo_total && (timdex_used >= timdex_total || primo_used <= timdex_used)
+        source = :primo
+        primo_used += 1
+      elsif timdex_used < timdex_total
+        source = :timdex
+        timdex_used += 1
+      end
+      plan << source if i >= start_index
+      i += 1
+    end
+    plan
+  end
+
+  # Returns [primo_offset, timdex_offset] for the start of this page
+  def api_offsets
+    start_index = (current_page - 1) * per_page
+    primo_offset = 0
+    timdex_offset = 0
+    i = 0
+    while i < start_index
+      if primo_offset < primo_total && (timdex_offset >= timdex_total || primo_offset <= timdex_offset)
+        primo_offset += 1
+      elsif timdex_offset < timdex_total
+        timdex_offset += 1
+      else
+        break
+      end
+      i += 1
+    end
+    [primo_offset, timdex_offset]
+  end
+
+  # Merges two result arrays according to the merge plan
+  def merge_results(primo_results, timdex_results)
+    merged = []
+    primo_idx = 0
+    timdex_idx = 0
+    merge_plan.each do |source|
+      if source == :primo
+        merged << primo_results[primo_idx] if primo_idx < primo_results.length
+        primo_idx += 1
+      else
+        merged << timdex_results[timdex_idx] if timdex_idx < timdex_results.length
+        timdex_idx += 1
+      end
+    end
+    merged
+  end
+end
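
A small usage sketch (values are assumed for illustration, not taken from the commit) showing how merge_plan alternates between sources until the smaller result set is exhausted and then falls back to the larger one, and how merge_results follows that plan:

paginator = MergedSearchPaginator.new(primo_total: 5, timdex_total: 2, current_page: 1, per_page: 10)
paginator.merge_plan
# => [:primo, :timdex, :primo, :timdex, :primo, :primo, :primo]
paginator.merge_results(%w[p1 p2 p3 p4 p5], %w[t1 t2])
# => ["p1", "t1", "p2", "t2", "p3", "p4", "p5"]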
