Cursor-based pagination #21
I'm the co-creator of Vitess. Someone linked me this PR. Looks like you figured out the best solution. We wrote up a proposal to support this use case natively (vitessio/vitess#3351), but we felt that the application can do this for itself without much help from Vitess.
Thanks, @sougou! The limit’s a good constraint in this case. We don’t want to load and filter out an unbounded number of rows, so we’re happy to tweak our queries.
@georgeclaghorn nice to see this library supporting cursors. Do you mind adding some docs or a reference to the README so the feature doesn't get lost in the future? That way nobody has to dig into the code to discover that it supports them.
@xdmx: Yep, docs are coming.
SQL AND has higher precedence than OR.
Terrific job @georgeclaghorn 🙌.
Added a pretty minor comment for your consideration
We’re trying out Vitess as a sharding proxy at Basecamp. It’s pretty sweet. Our Rails apps can query Vitess like it’s a monolithic MySQL database. To serve a cross-shard query, Vitess decomposes it into a query against each separate shard and stitches the results together.
Today, Geared Pagination uses an offset-based pagination strategy. It takes a page number and computes an offset into the paginated recordset. Then it issues a `SELECT` query with the computed `OFFSET`. Pretty simple.

One particular limitation of Vitess throws a wrench into the works. Vitess won’t query more than 10,000 rows from any single shard. When a cross-shard `SELECT` has an `OFFSET`, Vitess can’t propagate the `OFFSET` to each shard because it doesn’t know where the offset into the logical relation begins on any given shard. It instead has to:

1. Select `OFFSET` + `LIMIT` matching rows from each shard into a temporary table
2. Skip `OFFSET` rows in the temporary table
3. Return the `LIMIT` resulting rows

With an `OFFSET` greater than about 10,000, Vitess realizes it would have to select more than 10,000 rows from each tablet and rejects the query. That puts a hard cap on how far you can paginate into a large, cross-shard recordset.

The alternative is cursor-based pagination, implemented here. Instead of taking a page number and computing an offset, we take a “cursor” containing the relevant column values from the last row in the previous page. We build a query that only matches rows after the cursor. Vitess can propagate the query predicates to shards, selecting no more than `LIMIT` rows from each.

Here’s an example. Let’s say we’re paginating
`Post.where(status: :published).order(published_at: :desc, id: :desc)`. Instead of a query with a computed `OFFSET`, we generate a query that matches only the rows sorting after the last post on the previous page, assuming that post had a `published_at` value of `2019-08-23 04:52:21.617` and an ID of `96770`.

Cursor-based pagination can be beneficial even if you don’t shard. DBMSes commonly execute queries with `OFFSET`s by counting past `OFFSET` matching rows one at a time. Each page takes longer to find than the last. With cursor-based pagination and an index on the `ORDER BY` columns, the DB can seek directly to the beginning of each page and grab `LIMIT` matching rows. No need to scan to the beginning of the page.

You can use cursor-based pagination with Geared Pagination by passing the
`:ordered_by` option to `set_page_and_extract_portion_from` in your controllers. Call `next_param` on `@page` to get the cursor for the next page.

A cursor is a URL-safe Base64-encoded JSON object containing the page number and relevant values from the last row of the previous page.
However, do not rely on cursor structure. It may change in any Geared Pagination release without warning. Treat cursors as opaque page identifiers.
The page number is used to determine the size of the page—this is Geared Pagination, after all! The values are used to build queries for records on the page.
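To make the query-building step concrete, here is a minimal sketch of how cursor values could become a "rows after the cursor" predicate for the `published_at DESC, id DESC` example above. The helpers `quote` and `cursor_predicate`, the `posts` table name, and the page size of 15 are assumptions for illustration only, and the string quoting is deliberately simplistic. This is not the library's implementation.

```ruby
# Quote a value for display only. Real code should use bound parameters
# instead of interpolating values into SQL strings.
def quote(value)
  value.is_a?(Numeric) ? value.to_s : "'#{value.to_s.gsub("'", "''")}'"
end

# Builds a predicate matching rows that sort after the cursor.
# For (a DESC, b DESC) and cursor (x, y) this yields:
#   ((a < x) OR (a = x AND b < y))
def cursor_predicate(orders, cursor)
  columns = orders.keys
  clauses = columns.each_index.map do |i|
    # Earlier columns must tie with the cursor; the current one must pass it.
    equalities = columns[0...i].map { |c| "#{c} = #{quote(cursor.fetch(c))}" }
    column = columns[i]
    operator = orders.fetch(column) == :desc ? "<" : ">"
    (equalities + ["#{column} #{operator} #{quote(cursor.fetch(column))}"]).join(" AND ")
  end
  "(" + clauses.map { |c| "(#{c})" }.join(" OR ") + ")"
end

orders = { published_at: :desc, id: :desc }
cursor = { published_at: "2019-08-23 04:52:21.617", id: 96770 }
predicate = cursor_predicate(orders, cursor)

# Offset-based: the database has to count past the skipped rows.
offset_sql = "SELECT * FROM posts WHERE status = 'published' " \
             "ORDER BY published_at DESC, id DESC LIMIT 15 OFFSET 15"

# Cursor-based: the predicate can be pushed down to each shard.
cursor_sql = "SELECT * FROM posts WHERE status = 'published' AND #{predicate} " \
             "ORDER BY published_at DESC, id DESC LIMIT 15"
```

The outer parentheses around the predicate matter because SQL `AND` binds tighter than `OR`. Without them, `status = 'published'` would apply only to the first branch of the `OR`.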