Persistence implementation for pagination in some requests #1838

dimas-b · 2025-06-09T18:27:02Z

Following up on #1555

* Refactor pagination code to delineate API-level page tokens and internal "pointers to data".
* Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
* Concentrate the logic of combining page size requests and previous tokens in PageTokenUtil.
* PageToken subclasses are no longer necessary. EntityIdPaging handles pagination over ordered result sets with static helper methods.
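To make the page-size rule concrete, here is a minimal sketch of decoding and encoding living in one utility. `PageTokenSketch` and `PageTokenUtilSketch` are hypothetical stand-ins for `PageToken` and `PageTokenUtil`, and the Base64url `<pageSize>:<dataReference>` format is an assumption for illustration, not the format used by this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.OptionalInt;

/** Hypothetical stand-in for PageToken: an optional page size plus an opaque data reference. */
record PageTokenSketch(String encodedDataReference, OptionalInt pageSize) {
  boolean paginationRequested() {
    return pageSize.isPresent();
  }
}

/** Hypothetical stand-in for PageTokenUtil; the wire format is an assumption. */
final class PageTokenUtilSketch {
  private PageTokenUtilSketch() {}

  /** Combines the previous token (if any) with the newly requested page size (if any). */
  static PageTokenSketch decodePageRequest(String requestedPageToken, Integer requestedPageSize) {
    OptionalInt pageSize =
        requestedPageSize == null ? OptionalInt.empty() : OptionalInt.of(requestedPageSize);
    String dataRef = null;
    if (requestedPageToken != null) {
      String decoded =
          new String(Base64.getUrlDecoder().decode(requestedPageToken), StandardCharsets.UTF_8);
      int sep = decoded.indexOf(':');
      if (sep < 0) throw new IllegalArgumentException("Malformed page token");
      // An explicitly requested page size wins; otherwise reuse the previous page's size.
      if (pageSize.isEmpty()) pageSize = OptionalInt.of(Integer.parseInt(decoded.substring(0, sep)));
      dataRef = decoded.substring(sep + 1);
    }
    return new PageTokenSketch(dataRef, pageSize);
  }

  /** Returns the token for the next page, or null if there is no next page or no pagination. */
  static String encodePageToken(PageTokenSketch request, String nextDataReference) {
    if (nextDataReference == null || request.pageSize().isEmpty()) return null;
    String raw = request.pageSize().getAsInt() + ":" + nextDataReference;
    return Base64.getUrlEncoder().withoutPadding().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
  }
}
```

A request that names a new page size overrides the size carried in the previous token; otherwise the previous size is reused, and a request with neither is unpaginated.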

snazy

Overall the approach LGTM.

Some places, however, could be simplified and clarified.

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageTokenUtil.java

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/Page.java

snazy · 2025-06-10T11:04:46Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageRequest.java

+   * @param requestedPageSize optional page size for the next page. If not set, the page size of the
+   *     previous page (encoded in the previous page token) will be reused.
+   */
+  public static PageRequest nextPage(


The function name and the phrase "from the previous API-level..." are a bit confusing: what's "next"? What's "previous"? Why's there no "current"? All it does is construct a `PageRequest` from the two pagination parameters.

Why not call this function constructFromParameters?

Refactored to PageToken.decode()... Is it better?

snazy · 2025-06-10T11:06:34Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageRequest.java

+  }
+
+  /** Represents a request to start paginating with a particular page size. */
+  public static PageRequest firstPage(int limit) {


What's the general use case of this one? It's only used in some drop*() implementations.

This is now PageToken.fromLimit() (same name as before).

The use case is handling initial paged requests where a size is provided but a token is not. For API-based calls, that is taken care of by the decode() method. This method is a convenience entry point for internal calls.
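For the internal-call case, such an entry point might look roughly like this. `LimitTokenSketch` is a hypothetical stand-in; the method names mirror those mentioned in this review (`fromLimit`, `paginationRequested`, `hasDataReference`), but their signatures and the positive-limit check are assumptions.

```java
import java.util.OptionalInt;

/** Hypothetical PageToken-like value; signatures are assumptions, not the PR's API. */
record LimitTokenSketch(String encodedDataReference, OptionalInt pageSize) {

  /** Internal entry point: start paginating with the given page size and no data reference. */
  static LimitTokenSketch fromLimit(int limit) {
    if (limit <= 0) throw new IllegalArgumentException("limit must be positive: " + limit);
    return new LimitTokenSketch(null, OptionalInt.of(limit));
  }

  /** True when the caller asked for bounded pages. */
  boolean paginationRequested() {
    return pageSize.isPresent();
  }

  /** True when this token resumes from a previous page. */
  boolean hasDataReference() {
    return encodedDataReference != null;
  }
}
```

An internal caller such as a drop*() implementation could start with `fromLimit(n)` and never see an encoded token string at all.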

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageRequest.java

...ris-core/src/main/java/org/apache/polaris/core/persistence/pagination/EntityIdPageToken.java

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageTokenUtil.java

dimas-b · 2025-06-10T21:49:55Z

Squashed, rebased, resolved conflicts and addressed some feedback. Please review again.

Following up on apache#1555

* Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
* Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
* Concentrate the logic of combining page size requests and previous tokens in PageTokenUtil
* PageToken subclasses are no longer necessary. EntityIdPaging handles pagination over ordered result sets with static helper methods.

Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com>

eric-maynard · 2025-06-11T03:27:07Z

.../java/org/apache/polaris/extension/persistence/impl/eclipselink/PolarisEclipseLinkStore.java

+    if (pageToken.paginationRequested()) {
+      hql += " order by m.id asc";
+
+      if (pageToken.hasDataReference()) {


This is still polymorphism; we're just not taking advantage of the fact that it is already built into the language:

`if (pageToken instanceof EntityIdPageToken e) { tokenId = e.getEntityId(); }`

Polymorphism at the Java type level is not necessary in this case. Users of the "data pointer" know how to parse it based on the API method called. It is not a matter of providing a sub-type, but of understanding the format.

Currently parsing is achieved via static methods in EntityIdPaging.
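To contrast the two positions in this thread, this sketch derives the same keyset condition from either token representation. All type names (`OpaqueToken`, `EntityIdParsing`, `TypedToken`, `EntityIdToken`, `FirstPageToken`, `QuerySketch`) and the `:tokenId` parameter are hypothetical; only the `order by m.id asc` clause comes from the quoted diff. Sealed types need Java 17.

```java
import java.util.Optional;

/** Approach A: the token carries an opaque string that persistence code parses statically. */
record OpaqueToken(String encodedDataReference, int pageSize) {}

final class EntityIdParsing {
  private EntityIdParsing() {}

  /** Interprets the data reference as the last-seen entity ID; throws if it is malformed. */
  static Optional<Long> entityIdBoundary(OpaqueToken token) {
    if (token.encodedDataReference() == null) return Optional.empty();
    try {
      return Optional.of(Long.parseLong(token.encodedDataReference()));
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Malformed entity ID page token", e);
    }
  }
}

/** Approach B: a sealed hierarchy where the token type carries already-parsed data. */
sealed interface TypedToken permits EntityIdToken, FirstPageToken {}

record EntityIdToken(long entityId) implements TypedToken {}

record FirstPageToken() implements TypedToken {}

final class QuerySketch {
  private QuerySketch() {}

  static Optional<Long> boundaryOf(TypedToken token) {
    // Pattern matching for instanceof (Java 16+) replaces an explicit cast.
    return token instanceof EntityIdToken e ? Optional.of(e.entityId()) : Optional.empty();
  }

  /** Appends the keyset condition and ordering; the boundary value would be bound as :tokenId. */
  static String pagedQuery(String baseQuery, Optional<Long> boundary) {
    // Ordering by ID is what makes "id > last seen id" resume right after the previous page.
    return baseQuery + (boundary.isPresent() ? " and m.id > :tokenId" : "") + " order by m.id asc";
  }
}
```

Both variants end in the same query; the difference is whether knowledge of the format lives in the token's Java type or in a static parser owned by the persistence code.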

eric-maynard · 2025-06-11T03:39:33Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/Page.java

+   * @param request defines pagination parameters that were used to produce this page of data.
+   * @param items stream of source data
+   * @param mapper converter from source data types to response data types.
+   * @param dataPointer determines the internal pointer to the next page of data given the last item


I'm not sure what this is or why we're calling it a "data pointer". It looks like a function that converts a list of elements into a page token, except that for some reason we're not using the Java type PageToken and are going directly to the serialized representation of a PageToken.

The Java type PageToken is a container for a tuple of:

* page size
* the "data pointer"

The latter is an opaque piece of data as far as the java type PageToken is concerned. "Data pointers" are interpreted only by the code that actually produces them.
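A minimal sketch of how a page could be assembled from an ordered stream using a `mapper` and a `dataPointer` function like the parameters quoted above. `PageSketch` and `fromItems` are hypothetical, and reading one extra item to detect whether a next page exists is an assumed strategy, not necessarily what the PR does.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical page: mapped items plus an opaque pointer to the next page (null if none). */
record PageSketch<T>(List<T> items, String nextDataPointer) {

  static <S, R> PageSketch<R> fromItems(
      int pageSize, Stream<S> source, Function<S, R> mapper, Function<S, String> dataPointer) {
    if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
    // Read one extra item to learn whether a next page exists without a separate count query.
    List<S> window = source.limit(pageSize + 1L).collect(Collectors.toList());
    boolean hasMore = window.size() > pageSize;
    List<S> pageItems = hasMore ? window.subList(0, pageSize) : window;
    // The pointer is derived from the last item returned; only its producer interprets it.
    String pointer = hasMore ? dataPointer.apply(pageItems.get(pageItems.size() - 1)) : null;
    List<R> mapped = pageItems.stream().map(mapper).collect(Collectors.toList());
    return new PageSketch<>(mapped, pointer);
  }
}
```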

I think we need a Java type for what you're calling a "data pointer" -- for what the API calls page-token. In #1555, I called this PageToken. If we need to wrap this PageToken in a PageRequest, that's okay.

`public record PageToken() { public String getEncodedString() { ... } public static PageToken fromEncodedString(String encoded) { ... } }`

`public record PageRequest(Optional<PageToken> pageToken, Optional<Integer> pageSize) {}`
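A runnable sketch of this proposal, assuming for illustration that the token holds a single `lastEntityId` serialized as unpadded Base64url. The field, the format, the `Proposed*` names and the `of` factory are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

/** Typed page token that owns its own encoding (field and format are illustrative). */
record ProposedPageToken(long lastEntityId) {

  String getEncodedString() {
    byte[] raw = Long.toString(lastEntityId).getBytes(StandardCharsets.UTF_8);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
  }

  static ProposedPageToken fromEncodedString(String encoded) {
    String raw = new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    return new ProposedPageToken(Long.parseLong(raw));
  }
}

/** Pagination parameters as received from the API: both parts are optional. */
record ProposedPageRequest(Optional<ProposedPageToken> pageToken, Optional<Integer> pageSize) {

  static ProposedPageRequest of(String encodedToken, Integer pageSize) {
    return new ProposedPageRequest(
        Optional.ofNullable(encodedToken).map(ProposedPageToken::fromEncodedString),
        Optional.ofNullable(pageSize));
  }
}
```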

How about keeping the current PageToken class mostly as it is, but replacing its String encodedDataReference with a DataPointer dataPointer?

It does make the code nicer by having a specific type, but IMHO it adds two difficulties from the processing perspective:

* Parsing DataPointer into a specific sub-type when PageToken is constructed will require encoding type information in the token. That is redundant IMHO, but I'm OK with that change.
* Persistence methods that need to "understand" a DataPointer will have to do type casts, which is not elegant, IMHO. Currently they do not have to: persistence implementations just call a known parse method, which raises an exception if the data is malformed.

As a middle-ground option, WDYT about using DataPointer to wrap the encoded String (no sub-types), but still using static parsing methods called from Persistence code?
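The middle-ground option could look roughly like this: `DataPointer` wraps the encoded string without sub-types, and a static helper owned by the persistence code parses it. `EntityIdPointers` and its methods are hypothetical names.

```java
import java.util.Objects;

/** Dedicated type wrapping the encoded string; no sub-types. */
record DataPointer(String encoded) {
  DataPointer {
    Objects.requireNonNull(encoded, "encoded");
  }
}

/** Static helpers called by the persistence code that produced the pointer. */
final class EntityIdPointers {
  private EntityIdPointers() {}

  static DataPointer of(long entityId) {
    return new DataPointer(Long.toString(entityId));
  }

  /** Fails loudly on malformed input instead of relying on a type check or cast. */
  static long entityId(DataPointer pointer) {
    try {
      return Long.parseLong(pointer.encoded());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Not an entity ID pointer: " + pointer.encoded(), e);
    }
  }
}
```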

I don't think just wrapping the encoded string is an adequate Java representation of the page-token -- we should extract the information from that encoded string.

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageToken.java

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageTokenUtil.java

eric-maynard · 2025-06-11T03:45:03Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageTokenUtil.java

+  static PageToken decodePageRequest(
+      @Nullable String requestedPageToken, @Nullable Integer requestedPageSize) {
+    int pageSize;
+    String encodedDataReference = null;


If a token has multiple components, should that be modeled as one "encoded data reference"? Multiple?

The interpretation of encodedDataReference is up to the code that constructs it for paged responses. Let's deal with complex cases when we have them in practice.

Since we're defining the serialization and deserialization of the page token now, let's do it in a way that accommodates future page tokens we think are reasonable. I think that includes page requests with more than two discrete pieces of information.

The current approach does allow for future extensions. Different "data pointers" can be introduced when a Persistence impl. needs them. Since the "data pointer" always makes a round trip from Persistence code to the client and back to the same persistence code, that code is free to introduce new encoding/decoding methods without affecting anything else.

It is remotely possible that a page token ends up in an API method call different from the one that produced it. However, having "data pointer" sub-types (as opposed to just pairs of encode/decode methods) does not add any safety here. We could encode some "place" identifier into the token to bind it firmly to the API method, but I think that's overkill.

I think this thread is just about not hard-coding the number of components in a page token, right? Yes, we could add different encode/decode methods, but this one in PageTokenUtil looks like the canonical one in this design.
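One way to avoid hard-coding the number of components, sketched under the assumption that a data pointer is a non-empty list of strings: encode each component separately as Base64url and join them with `.`, which cannot occur inside an encoded component. `MultiPartPointer` is a hypothetical name and this format is not part of the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical encoding for pointers with any number of string components. */
final class MultiPartPointer {
  private MultiPartPointer() {}

  static String encode(List<String> components) {
    if (components.isEmpty()) throw new IllegalArgumentException("at least one component required");
    List<String> parts = new ArrayList<>();
    for (String c : components) {
      // The Base64url alphabet has no '.', so the separator stays unambiguous.
      parts.add(Base64.getUrlEncoder().withoutPadding().encodeToString(c.getBytes(StandardCharsets.UTF_8)));
    }
    return String.join(".", parts);
  }

  static List<String> decode(String encoded) {
    List<String> components = new ArrayList<>();
    // A limit of -1 keeps trailing empty components.
    for (String part : encoded.split("\\.", -1)) {
      components.add(new String(Base64.getUrlDecoder().decode(part), StandardCharsets.UTF_8));
    }
    return components;
  }
}
```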

eric-maynard · 2025-06-11T03:45:26Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageTokenUtil.java

+    return new PageToken(encodedDataReference, pageSize);
+  }
+
+  static @Nullable String encodePageToken(


Why is this not just a member of PageToken?

To co-locate encoding and decoding code in one java file.

You could co-locate them in PageToken, so I don't think that really answers the question.

decodePageRequest is called from PageToken when the token is received, but encodePageToken is called from Page when the token is returned to the client. This class is the place to share code between PageToken and Page.

I still don't understand why this encoding isn't an attribute of the PageToken. I think this may just be a side effect of us directly using the encoded string instead of a PageToken, as is discussed above.

eric-maynard · 2025-06-11T03:46:09Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/Page.java

+   * servicing the request for the next page of related data.
+   */
+  public @Nullable String encodedResponseToken() {
+    return PageTokenUtil.encodePageToken(request, encodedDataReference);


Also, this looks like quite a thin method

Yes, it does :)

Yeah I think it would be reasonable for callers to just call PageTokenUtil.encodePageToken directly.

Calling PageTokenUtil.encodePageToken(Page) from a place that has a reference to a Page is less elegant than calling Page.encodedResponseToken()... This is a matter of opinion, of course :)

I can make this change if you prefer. No worries.

eric-maynard · 2025-06-11T03:47:14Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageToken.java

+   * Reconstructs a page token from the API-level page token string (returned to the client in the
+   * response to a previous request for similar data) and an API-level new requested page size.


What does "API-level" mean here? When we get a table identifier in a method like IcebergCatalog.newTableOps, we don't say it's an API-level TableIdentifier just because it came in via the API.

"API-level" here means any round trip to a client (specifically the Iceberg REST Catalog API).

Do you have a suggestion for improving this javadoc text?

Isn't it just "a page token string"?

eric-maynard · 2025-06-11T03:49:27Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/EntityIdPaging.java

+
+  /**
+   * Produces the reference to the next page of data in the form of a numerical entity ID. Entities
+   * in the associated stream of data are assumed to be ordered by ID.


What stream? This is just taking one entity and returning Long.toString(entity.getId()).

This refers to how the documented method is used when handling paginated requests and responses (which deal with streams of data).

I found this a little confusing, as the doc doesn't seem to match the actual behavior of the method

eric-maynard · 2025-06-11T03:50:08Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/EntityIdPaging.java

+import org.apache.polaris.core.entity.PolarisBaseEntity;
+
+/** Utility class for processing pagination of streams of entities ordered by some ID property. */
+public final class EntityIdPaging {


I'm not sure how useful this is going to be: since its methods are static, there can't be another implementation. The methods both look very thin.

EntityIdPaging is a specific case of constructing and parsing PageToken.encodedDataReference() for Persistence implementations that rely on sorted streams of entities.

Having these static methods allows sharing the encoding/decoding code across several use cases in the current Polaris code.
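To illustrate the kind of helper being discussed, here is a sketch of ID-based (keyset) paging over an in-memory map sorted by ID. `SimpleEntity`, `EntityIdPagingSketch` and its method names are hypothetical; the data pointer is simply the last returned ID, as noted for `EntityIdPaging` in this review.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.stream.Collectors;

/** Hypothetical minimal entity: only the ID matters for this kind of pagination. */
record SimpleEntity(long id, String name) {}

/** Static helpers in the spirit of EntityIdPaging (names are assumptions). */
final class EntityIdPagingSketch {
  private EntityIdPagingSketch() {}

  /** The pointer to the next page is just the ID of the last entity returned. */
  static String encodeDataPointer(SimpleEntity lastReturned) {
    return Long.toString(lastReturned.id());
  }

  static long decodeDataPointer(String pointer) {
    return Long.parseLong(pointer);
  }

  /** Reads one page from entities keyed (and therefore sorted) by ID. */
  static List<SimpleEntity> readPage(
      NavigableMap<Long, SimpleEntity> byId, String pointer, int pageSize) {
    // Exclusive tail: resume strictly after the last ID returned on the previous page.
    NavigableMap<Long, SimpleEntity> rest =
        pointer == null ? byId : byId.tailMap(decodeDataPointer(pointer), false);
    return rest.values().stream().limit(pageSize).collect(Collectors.toList());
  }
}
```

Resuming from an ID boundary rather than an offset means that deleting entities from earlier pages between requests does not cause later entities to be skipped.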

I see, but I'm trying to understand what this will look like if we have SomeOtherEntityIdPaging. Will callers have some way of selecting a ...Paging based on the token structure or method?

If SomeOtherEntityIdPaging produces a single primitive value from a "data pointer" it would be very similar to EntityIdPaging, I guess.

If it produces a rich value, then it will need a companion type to hold the parsed data... but that is something to be done if and when we need it. It could be something like static NameAndIdPaging.entityBoundary(PageToken pageToken) returning a new NameAndIdPaging.Boundary(name, id).
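A sketch of that richer case: a hypothetical `NameAndIdPaging` with a companion `Boundary(name, id)` record. Unlike the signature floated in the comment, this version takes the encoded data reference string rather than a `PageToken`; the `base64(name):id` format and the `isAfter` helper are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for result sets ordered by (name, id). */
final class NameAndIdPaging {
  private NameAndIdPaging() {}

  /** Parsed form of the data pointer. */
  record Boundary(String name, long id) {}

  static String encode(Boundary boundary) {
    // Encode the name so arbitrary characters cannot collide with the ':' separator.
    String name = Base64.getUrlEncoder().withoutPadding()
        .encodeToString(boundary.name().getBytes(StandardCharsets.UTF_8));
    return name + ":" + boundary.id();
  }

  static Boundary entityBoundary(String encodedDataReference) {
    int sep = encodedDataReference.lastIndexOf(':');
    if (sep < 0) throw new IllegalArgumentException("Malformed name/id pointer");
    String name = new String(
        Base64.getUrlDecoder().decode(encodedDataReference.substring(0, sep)), StandardCharsets.UTF_8);
    return new Boundary(name, Long.parseLong(encodedDataReference.substring(sep + 1)));
  }

  /** True if (name, id) sorts strictly after the boundary, i.e. belongs to a later page. */
  static boolean isAfter(Boundary b, String name, long id) {
    int cmp = name.compareTo(b.name());
    return cmp > 0 || (cmp == 0 && id > b.id());
  }
}
```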

> but that is something to be done if and when we need it

We have an existing PR in #1555 that supports this use case, so I have some reservations about an alternative design that loses this functionality

Based on apache#1838, following up on apache#1555

* Allows multiple implementations of `Token` referencing the "next page", encapsulated in `PageToken`. No changes to `polaris-core` are needed to add custom `Token` implementations.
* Extensible to (later) support (cryptographic) signatures to prevent tampering with page tokens
* Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
* Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
* Concentrate the logic of combining page size requests and previous tokens in `PageTokenUtil`
* `PageToken` subclasses are no longer necessary.
* Serialization of `PageToken` uses Jackson serialization (Smile format)

Since no (metastore-level) implementation handling pagination existed before, no backwards compatibility is needed.

Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com>

dimas-b requested review from adutra, ashvina, dennishuo, eric-maynard, jackye1995, jbonofre, vvcephei, collado-mike, snazy and RussellSpitzer as code owners June 9, 2025 18:27

github-project-automation bot added this to Basic Kanban Board Jun 9, 2025

dimas-b requested review from takidau, MonkeyCanCode, flyrain, ebyhr, ajantha-bhat, HonahX, singhpk234 and pingtimeout as code owners June 9, 2025 18:27

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Jun 9, 2025

dimas-b changed the title ~~Pagination alt~~ Delineate pagination requests and tokens Jun 9, 2025

snazy reviewed Jun 10, 2025

dimas-b force-pushed the pagination-alt branch from b0b0543 to 22dacc4 Compare June 10, 2025 21:47

dimas-b changed the title ~~Delineate pagination requests and tokens~~ Persistence implementation for pagination in some requests Jun 10, 2025

dimas-b force-pushed the pagination-alt branch from 22dacc4 to 9b284f6 Compare June 10, 2025 21:49

dimas-b force-pushed the pagination-alt branch from 9b284f6 to c2fcf3b Compare June 10, 2025 21:53

dimas-b force-pushed the pagination-alt branch from c2fcf3b to 636ba44 Compare June 10, 2025 21:58

eric-maynard reviewed Jun 11, 2025

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageToken.java

eric-maynard reviewed Jun 11, 2025

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageToken.java

eric-maynard reviewed Jun 11, 2025

polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageTokenUtil.java

eric-maynard reviewed Jun 11, 2025

review: remove spurious javadoc change

14e562e

dimas-b force-pushed the pagination-alt branch from dfd679d to 14e562e Compare June 11, 2025 15:53

dimas-b added 4 commits June 11, 2025 12:36

review: Rename PageToken.decode() back to .build()

ed9b517

review: avoid magic numbers in PolarisEclipseLinkStore

ad1b0a6

review: restore List return type of ListEntitiesResult.getEntities()

45369dc

review: locally immutable tokenFilter in TreeMapTransactionalPersistenceImpl

cf88a5c

snazy mentioned this pull request Jun 25, 2025

Extensible pagination token implementation #1938

Open
