Fix: return next=None on last page of search results #377
jamditis wants to merge 2 commits into MuckRock:master
For page-number pagination, request per_page + 1 rows from Solr and use the extra row's presence to determine if more results exist. This is more reliable than using results.hits which can be approximate. The cursor pagination path is unchanged — Solr's nextCursorMark equality check already correctly detects the last page. Fixes MuckRock#372
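To make the failure mode concrete, here is a minimal sketch of how hit-count arithmetic can produce a `next` link to an empty page. The helper `next_page_by_hits` and the off-by-one count are assumptions made for illustration; they are not taken from the DocumentCloud code.

```python
import math


def next_page_by_hits(page, per_page, reported_hits):
    """Hypothetical hit-count check: trust the reported total to find the last page."""
    max_page = math.ceil(reported_hits / per_page)
    return page + 1 if page < max_page else None


# Assumption: 20 documents really match, but the reported count is 21.
actual_docs = list(range(20))
per_page = 10

# Page 2 is the true last page, yet the inflated count implies a page 3.
next_page = next_page_by_hits(page=2, per_page=per_page, reported_hits=21)

# Following that link returns nothing, because no document starts at offset 20.
start = (next_page - 1) * per_page
page_three = actual_docs[start:start + per_page]
```

With an exact count of 20 the same helper returns `None` for page 2, so the bug only shows up when the reported total overshoots.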
Pull request overview
This PR fixes an issue where v1.0 (page-number) search pagination could return a next URL that points to an empty page by avoiding reliance on Solr’s potentially inexact hit counts.
Changes:
- For v1.0 pagination, request `per_page + 1` rows from Solr and use the extra row to determine whether `next` should be `None`.
- Truncate the extra probe row before formatting/returning results.
- Add tests asserting `next=None` on the last page and when results exactly fill a page.
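The N+1 probe listed above can be sketched against an in-memory list standing in for Solr. The function `fetch_page` is a hypothetical name for illustration only; the real change lives in `_paginate_page()` and `_format_response()`, whose bodies are not shown here.

```python
def fetch_page(all_docs, page, per_page):
    """Return (docs, next_page) using one extra probe row to detect a next page."""
    start = (page - 1) * per_page
    # Stand-in for a Solr query with rows=per_page + 1 and the same start offset.
    rows = all_docs[start:start + per_page + 1]
    # The probe row's presence means at least one more result exists.
    has_next = len(rows) > per_page
    # Drop the probe row so the caller never sees more than per_page results.
    return rows[:per_page], (page + 1 if has_next else None)


docs = list(range(11))
first_docs, first_next = fetch_page(docs, page=1, per_page=10)
last_docs, last_next = fetch_page(docs, page=2, per_page=10)
exact_docs, exact_next = fetch_page(docs, page=1, per_page=11)
```

The three calls mirror the cases the tests cover: a full page followed by more results, a short last page, and a result set that exactly fills one page.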
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `documentcloud/documents/search.py` | Implements N+1 probe for page-number pagination and updates `next` computation logic. |
| `documentcloud/documents/tests/test_search.py` | Adds regression tests for `next=None` on last page and exact page boundary. |
documentcloud/documents/search.py
Outdated
    # Request one extra row to detect whether a next page exists
    return {"rows": rows + 1, "start": start}, {"page": page, "rows": rows}
_paginate_page now queries Solr with rows + 1, which can exceed the configured max_page_size / settings.SOLR_ANON_MAX_ROWS by 1 when the client requests the maximum per_page. That weakens the intended cap and may cause avoidable load or Solr-side errors if a hard row limit is enforced. Consider clamping the Solr rows parameter (e.g., only probe with +1 when rows < max_page_size, or otherwise fall back to a different last-page detection strategy for the max-size case) while still returning at most rows results to the client.
    - # Request one extra row to detect whether a next page exists
    - return {"rows": rows + 1, "start": start}, {"page": page, "rows": rows}
    + # Request one extra row to detect whether a next page exists, but never exceed max_page_size
    + solr_rows = rows + 1 if rows < max_page_size else rows
    + return {"rows": solr_rows, "start": start}, {"page": page, "rows": rows}
Copilot review: rows + 1 could exceed max_page_size when the client requests the maximum per_page. Now only probes with +1 when rows < max_page_size; falls back to hit-count check at the cap.
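A rough sketch of the clamped behavior described in this reply: probe with one extra row below the cap, and fall back to the hit count at the cap. The helper names `solr_rows_for` and `has_next_page` are hypothetical, and the fallback formula is an assumption based on the earlier `math.ceil(results.hits / per_page)` logic rather than the merged code.

```python
import math


def solr_rows_for(rows, max_page_size):
    """Rows to request from Solr: probe with +1 only when that stays within the cap."""
    return rows + 1 if rows < max_page_size else rows


def has_next_page(docs_returned, hits, page, rows, max_page_size):
    """Decide whether a next page exists under the clamped probe strategy."""
    if rows < max_page_size:
        # A probe row was requested, so its presence is the signal.
        return docs_returned > rows
    # At the cap no probe row was requested; rely on the (possibly approximate) hit count.
    return page < math.ceil(hits / rows)
```

The trade-off is that requests at the maximum `per_page` keep the old hit-count behavior, so a stale `next` link remains possible only in that case.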
Summary
Fixes #372. Search results no longer return a `next` URL pointing to an empty page when on the last page of results.

Root cause: The previous implementation used `math.ceil(results.hits / per_page)` to calculate the max page number, but Solr can return approximate hit counts, causing `next` to point to an empty page.

Fix: For page-number pagination (v1.0), request `per_page + 1` rows from Solr and use the extra row's presence to determine whether more results exist. The extra row is truncated before any response formatting. This is more reliable than hit count arithmetic.

Cursor pagination (v2.0) is unchanged: Solr's `nextCursorMark` equality check already correctly detects the last page. The N+1 approach doesn't work cleanly with Solr cursors because the cursor mark advances past all returned rows (including the probe row), which would skip a document between pages.

Changes
- `search.py`: `_paginate_page()` now requests `rows + 1` from Solr. `_format_response()` uses `len(results.docs) > per_page` instead of hit count math. Removed unused `math` import.
- `test_search.py`: added `test_search_last_page_next_is_none` (11 docs, per_page=10, verifies page 2 has `next=None`) and `test_search_exact_page_boundary_next_is_none` (per_page=11, all results fit on one page).

Test plan
https://api.www.documentcloud.org/api/documents/search/?project=224036
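For reviewers curious about the cursor caveat in the Summary, the toy model below shows why the N+1 probe would drop documents under cursor pagination. It is only an illustration: a real Solr cursor mark is an opaque string, while this sketch uses an integer offset, and all function names are hypothetical.

```python
def cursor_query(all_docs, cursor, rows):
    """Toy cursor query: return up to `rows` docs and a mark just past the last one returned."""
    batch = all_docs[cursor:cursor + rows]
    return batch, cursor + len(batch)


def paginate_plain(all_docs, per_page):
    """Cursor paging that stops when the next mark equals the current mark."""
    seen, cursor = [], 0
    while True:
        batch, next_cursor = cursor_query(all_docs, cursor, per_page)
        seen.extend(batch)
        if next_cursor == cursor:
            return seen
        cursor = next_cursor


def paginate_with_probe(all_docs, per_page):
    """Cursor paging with an N+1 probe: the mark has already moved past the probe row."""
    seen, cursor = [], 0
    while True:
        batch, next_cursor = cursor_query(all_docs, cursor, per_page + 1)
        seen.extend(batch[:per_page])  # the probe row is truncated from the page...
        if len(batch) <= per_page:
            return seen
        cursor = next_cursor  # ...but the next page starts after it, so it is never shown


docs = list(range(25))
plain = paginate_plain(docs, per_page=10)
probed = paginate_with_probe(docs, per_page=10)
```

In this model the plain loop visits all 25 documents, while the probed loop skips one document at each page boundary, which is the behavior the Summary cites as the reason for leaving v2.0 alone.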