Cursor-based pagination #21

Merged
merged 7 commits into from May 20, 2020

georgeclaghorn (Contributor) commented May 16, 2020

We’re trying out Vitess as a sharding proxy at Basecamp. It’s pretty sweet. Our Rails apps can query Vitess like it’s a monolithic MySQL database. To serve a cross-shard query, Vitess decomposes it into a query against each separate shard and stitches the results together.

Today, Geared Pagination uses an offset-based pagination strategy. It takes a page number and computes an offset into the paginated recordset. Then it issues a SELECT query with the computed OFFSET. Pretty simple.

One particular limitation of Vitess throws a wrench into the works. Vitess won’t query more than 10,000 rows from any single shard. When a cross-shard SELECT has an OFFSET, Vitess can’t propagate the OFFSET to each shard because it doesn’t know where the offset into the logical relation begins on any given shard. It instead has to:

  1. Query each shard for the first OFFSET + LIMIT matching rows
  2. Merge and sort the results in a temporary table
  3. Seek to the OFFSET in the temporary table
  4. Take and return LIMIT resulting rows

With an OFFSET greater than about 10,000, Vitess realizes it would have to select more than 10,000 rows from each tablet and rejects the query. That puts a hard cap on how far you can paginate into a large, cross-shard recordset.
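
To make the four steps concrete, here is a rough in-memory model of a cross-shard OFFSET query. It is only a sketch: `scatter_gather` and `ROW_CAP` are hypothetical names, each shard is modeled as an array already sorted in descending order, and none of this is Vitess code.

```ruby
# Hypothetical model of a cross-shard OFFSET query; not how Vitess is implemented.
ROW_CAP = 10_000 # the per-shard row cap described above

def scatter_gather(shards, offset:, limit:)
  per_shard = offset + limit
  if per_shard > ROW_CAP
    raise ArgumentError, "would read #{per_shard} rows per shard (cap is #{ROW_CAP})"
  end

  # 1. Ask every shard for its first OFFSET + LIMIT rows.
  partials = shards.map { |rows| rows.first(per_shard) }
  # 2. Merge and sort, 3. seek to OFFSET, 4. take LIMIT rows.
  partials.flatten.sort.reverse.drop(offset).first(limit)
end

shards = [[12, 9, 6, 3], [11, 8, 5, 2], [10, 7, 4, 1]]
page = scatter_gather(shards, offset: 1, limit: 2)
```

Every shard has to hand over OFFSET + LIMIT rows even though only LIMIT rows are returned, so the per-shard work grows with the page offset until it hits the cap.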

The alternative is cursor-based pagination, implemented here. Instead of taking a page number and computing an offset, we take a “cursor” containing the relevant column values from the last row in the previous page. We build a query that only matches rows after the cursor. Vitess can propagate the query predicates to shards, selecting no more than LIMIT rows from each.

Here’s an example. Let’s say we’re paginating Post.where(status: :published).order(published_at: :desc, id: :desc). Instead of a query like this:

SELECT *
FROM `posts`
WHERE `status` = 'published'
ORDER BY `published_at` DESC, `id` DESC
LIMIT 50
OFFSET 200

…we generate a query like this, assuming the last post on the previous page had a published_at value of 2019-08-23 04:52:21.617 and an ID of 96770:

SELECT *
FROM `posts`
WHERE `status` = 'published'
AND (
  (`published_at` = '2019-08-23 04:52:21.617' AND `id` < 96770)
  OR `published_at` < '2019-08-23 04:52:21.617'
)
ORDER BY `published_at` DESC, `id` DESC
LIMIT 50
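
The same predicate can be generated for any number of ordered columns. The following is a minimal sketch, not the code in this PR: `keyset_predicate` is a hypothetical helper, and it assumes the ordered columns are never NULL and that the last one (here `id`) is unique.

```ruby
# Hypothetical helper, not Geared Pagination's implementation.
# orders: { column => :asc or :desc }
# cursor: { column => value taken from the last row of the previous page }
def keyset_predicate(orders, cursor)
  columns = orders.keys
  binds = []

  clauses = columns.each_index.map do |i|
    # Every earlier column ties with the cursor...
    ties = columns.first(i).map do |col|
      binds << cursor.fetch(col)
      "`#{col}` = ?"
    end

    # ...and column i is strictly past it in the sort direction.
    col = columns[i]
    op = orders.fetch(col) == :desc ? "<" : ">"
    binds << cursor.fetch(col)

    "(" + (ties + ["`#{col}` #{op} ?"]).join(" AND ") + ")"
  end

  [clauses.join(" OR "), binds]
end

sql, binds = keyset_predicate(
  { published_at: :desc, id: :desc },
  { published_at: "2019-08-23 04:52:21.617", id: 96770 }
)
```

With Active Record, the fragment could be applied as `Post.where(status: :published).where(sql, *binds)`. If you splice it into hand-written SQL instead, wrap the whole fragment in parentheses, because AND binds tighter than OR.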

Cursor-based pagination can be beneficial even if you don’t shard. DBMSes commonly execute queries with OFFSETs by counting past OFFSET matching rows one at a time. Each page takes longer to find than the last. With cursor-based pagination and an index on the ORDER BY columns, the DB can seek directly to the beginning of each page and grab LIMIT matching rows. No need to scan to the beginning of the page.
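
As a rough illustration of seek versus scan, the sketch below uses a sorted Ruby array as a stand-in for an index on the ORDER BY columns. `INDEX`, `page_by_offset`, and `page_after` are hypothetical names, and a real database index is of course not an array.

```ruby
# In-memory stand-in for an index on (published_at, id), sorted descending.
# Rows are [published_at, id] pairs; ids are unique, so the ordering is total.
INDEX = (1..1_000).map { |i| [i / 3, i] }.sort.reverse

# Offset pagination: walk past OFFSET rows, then take LIMIT.
def page_by_offset(offset, limit)
  INDEX.drop(offset).first(limit)
end

# Cursor pagination: binary-search for the first row after the cursor, then take LIMIT.
def page_after(cursor, limit)
  start = cursor ? INDEX.bsearch_index { |row| (row <=> cursor) < 0 } : 0
  start ? INDEX[start, limit] : []
end

# Walk the whole recordset by following cursors from page to page.
cursor = nil
pages = []
loop do
  page = page_after(cursor, 50)
  break if page.empty?
  pages << page
  cursor = page.last
end
```

Both approaches return the same rows for a given page. The difference is that the cursor lookup costs about the same for every page, while the offset walk gets longer as the offset grows.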

You can use cursor-based pagination with Geared Pagination by passing the :ordered_by option to set_page_and_extract_portion_from in your controllers:

set_page_and_extract_portion_from Post.where(published: true),
  ordered_by: { published_at: :desc, id: :desc }, per_page: [ 15, 30, 50, 100 ]

Call next_param on @page to get the cursor for the next page:

<%= link_to "Next page", posts_path(page: @page.next_param) %>
<!-- <a href="/posts/?page=eyJwYWdlX251bWJlciI6MywidmFsdWVzIjp7InB1Ymxpc2hlZF9hdCI6IjIwMTktMDgtMjMgMDQ6NTI6MjEuNjE3IiwiaWQiOjk2NzcwfX0%3D">Next page</a> -->

A cursor is a URL-safe Base64-encoded JSON object containing the page number and relevant values from the last row of the previous page:

> Base64.urlsafe_decode64 "eyJwYWdlX251bWJlciI6MywidmFsdWVzIjp7InB1Ymxpc2hlZF9hdCI6IjIwMTktMDgtMjMgMDQ6NTI6MjEuNjE3IiwiaWQiOjk2NzcwfX0="
=> "{\"page_number\":3,\"values\":{\"published_at\":\"2019-08-23 04:52:21.617\",\"id\":96770}}"
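
For illustration, a cursor with this shape can be round-tripped with the standard `base64` and `json` libraries. This is a sketch only: `encode_cursor` and `decode_cursor` are hypothetical helpers that mirror the structure shown above, not the library's internals.

```ruby
require "base64"
require "json"

# Hypothetical helpers mirroring the structure shown above; the real format may differ.
def encode_cursor(page_number, values)
  Base64.urlsafe_encode64({ page_number: page_number, values: values }.to_json)
end

def decode_cursor(param)
  # JSON.parse returns string keys: "page_number" and "values".
  JSON.parse(Base64.urlsafe_decode64(param))
end

cursor = encode_cursor(3, { published_at: "2019-08-23 04:52:21.617", id: 96770 })
decoded = decode_cursor(cursor)
```

The encoded value uses only the URL-safe Base64 alphabet plus `=` padding, and the padding is what shows up as `%3D` in the link above.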

However, do not rely on cursor structure. It may change in any Geared Pagination release without warning. Treat cursors as opaque page identifiers.

The page number is used to determine the size of the page—this is Geared Pagination, after all! The values are used to build queries for records on the page.

sougou commented May 17, 2020

I'm the co-creator of Vitess. Someone linked me this PR.
The 10,000 row limit applies only to "oltp" workloads, which is the default. If you set workload='olap', then there's no limit.

Looks like you figured out the best solution. We wrote up a proposal to support this use case natively: vitessio/vitess#3351. But we felt that the application can do this for itself without much help from Vitess.

georgeclaghorn (Contributor, Author) commented

Thanks, @sougou! The limit’s a good constraint in this case. We don’t want to load and filter out an unbounded number of rows, so we’re happy to tweak our queries.

xdmx commented May 17, 2020

@georgeclaghorn nice to see this library supporting cursors. Do you mind adding some docs or a reference to the README so this doesn't get lost in the future? That way nobody has to dig into the code to discover that it supports them.

georgeclaghorn (Contributor, Author) commented

@xdmx: Yep, docs are coming.

SQL AND has higher precedence than OR.

jorgemanrubia (Member) left a comment

Terrific job @georgeclaghorn 🙌.

Added a pretty minor comment for your consideration.

lib/geared_pagination/portions/portion_at_cursor.rb (outdated, resolved)

georgeclaghorn merged commit 5df1888 into master May 20, 2020
georgeclaghorn deleted the cursor branch May 20, 2020 12:38