NoSQL: Node IDs - API, SPI + general implementation #2728

snazy · 2025-09-30T15:03:16Z

This PR provides a mechanism to assign a Polaris-cluster-wide unique node-ID to each Polaris instance, which is then used when generating Polaris-cluster-wide unique Snowflake-IDs.

The change is fundamental for the NoSQL work, but is also needed for the existing relational JDBC persistence.

This PR does not include any persistence-specific implementation.

...tence/nosql/nodes/impl/src/main/java/org/apache/polaris/nodeids/impl/NodeManagementImpl.java

...istence/nosql/nodes/impl/src/main/java/org/apache/polaris/nodes/impl/NodeManagementImpl.java

flyrain

Thanks for working on it! Left some comments. Given that this is a big change (23 new files and 3 new modules), is it worth having a dev list discussion? That way people are aware of the changes and can contribute their ideas.

flyrain · 2025-10-02T22:45:12Z

persistence/nosql/nodes/README.md

+Some ID generation mechanisms,
+like [Snowflake-IDs](https://medium.com/@jitenderkmr/demystifying-snowflake-ids-a-unique-identifier-in-distributed-computing-72796a827c9d),
+require unique integer IDs for each running node. This framework provides a mechanism to assign each running node a
+unique integer ID.


If the Snowflake ID generator requires such a complex node ID generator, maybe we should consider other options. Would it be possible to use other ID generators? Since we are in the persistence module already, why can't we use something like ObjectId in MongoDB, or a Java UUID?

The snowflake id generator is already used by the NoSQL persistence impl., of which this PR is just a sub-component.
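
To make the role of the node ID concrete, here is a minimal sketch of a Snowflake-style generator. The bit layout (10 node bits, 12 sequence bits), the class name `SnowflakeSketch`, and the injectable clock are illustrative assumptions, not the layout or API used by Polaris. It only shows why two processes that share a node ID could emit the same value.

```java
import java.util.function.LongSupplier;

/** Illustrative Snowflake-style ID generator. The bit layout is an assumption, not Polaris's. */
final class SnowflakeSketch {
  static final int NODE_BITS = 10; // assumed: up to 1024 nodes
  static final int SEQUENCE_BITS = 12; // assumed: 4096 IDs per node per millisecond
  static final long MAX_NODE_ID = (1L << NODE_BITS) - 1;
  static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

  private final long nodeId;
  private final long epochMillis;
  private final LongSupplier clockMillis;
  private long lastTimestamp = -1L;
  private long sequence;

  SnowflakeSketch(long nodeId, long epochMillis, LongSupplier clockMillis) {
    if (nodeId < 0 || nodeId > MAX_NODE_ID) {
      throw new IllegalArgumentException("node ID out of range: " + nodeId);
    }
    this.nodeId = nodeId;
    this.epochMillis = epochMillis;
    this.clockMillis = clockMillis;
  }

  /** Packs (timestamp, node ID, sequence) into one long; unique as long as node IDs are unique. */
  synchronized long nextId() {
    long now = clockMillis.getAsLong() - epochMillis;
    if (now < lastTimestamp) {
      throw new IllegalStateException("clock moved backwards");
    }
    if (now == lastTimestamp) {
      sequence = (sequence + 1) & MAX_SEQUENCE;
      if (sequence == 0) {
        // Sequence exhausted for this millisecond: wait for the clock to advance.
        while ((now = clockMillis.getAsLong() - epochMillis) <= lastTimestamp) {
          Thread.onSpinWait();
        }
      }
    } else {
      sequence = 0;
    }
    lastTimestamp = now;
    return (now << (NODE_BITS + SEQUENCE_BITS)) | (nodeId << SEQUENCE_BITS) | sequence;
  }

  /** Extracts the node ID part from a generated ID. */
  static long nodeIdOf(long id) {
    return (id >>> SEQUENCE_BITS) & MAX_NODE_ID;
  }
}
```

Two generators created with different node IDs produce different values even when their clocks and sequences agree, which is the property the node-ID assignment is meant to guarantee.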

flyrain · 2025-10-02T22:49:45Z

persistence/nosql/nodes/README.md

+* `polaris-nodes-api` provides the necessary Java interfaces and immutable types.
+* `polaris-nodes-impl` provides the storage agnostic implementation.
+* `polaris-nodes-spi` provides the necessary interfaces to provide a storage specific implementation.
+* `polaris-nodes-store-nosql` provides the storage implementation based on `polaris-persistence-nosql-api`.


Where is the module?

Currently it's in the end-to-end NoSQL PR: #1189 ... to be made available for review later (to allow for smaller, easier-to-review PRs, as discussed)
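
As a rough illustration of the api / spi / impl / store split listed in the README, the following sketch separates an immutable value type, a storage-facing interface, a stand-in store, and a storage-agnostic allocator. All type and method names here (`NodeLease`, `NodeLeaseStore`, `InMemoryNodeLeaseStore`, `NodeLeaser`) are hypothetical and are not the interfaces added by this PR; the in-memory store only stands in for a storage-specific module such as `polaris-nodes-store-nosql`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// "API" layer: immutable value type (hypothetical, not the actual Polaris Node type).
record NodeLease(int nodeId, Instant expiresAt) {
  boolean isActive(Instant now) {
    return now.isBefore(expiresAt);
  }
}

// "SPI" layer: what a storage backend would have to provide (hypothetical).
interface NodeLeaseStore {
  /** Atomically stores the lease if the ID is free or its previous lease has expired. */
  boolean tryAcquire(NodeLease lease, Instant now);
}

// Stand-in for a storage-specific module; a real one would rely on conditional writes.
final class InMemoryNodeLeaseStore implements NodeLeaseStore {
  private final Map<Integer, NodeLease> leases = new ConcurrentHashMap<>();

  @Override
  public boolean tryAcquire(NodeLease lease, Instant now) {
    NodeLease stored =
        leases.compute(
            lease.nodeId(),
            (id, existing) -> existing == null || !existing.isActive(now) ? lease : existing);
    // Reference comparison: true only if our own lease object was stored.
    return stored == lease;
  }
}

// Storage-agnostic "impl" layer: depends only on the SPI.
final class NodeLeaser {
  private final NodeLeaseStore store;
  private final int maxNodes;

  NodeLeaser(NodeLeaseStore store, int maxNodes) {
    this.store = store;
    this.maxNodes = maxNodes;
  }

  /** Tries each candidate ID in order and returns the first lease that could be acquired. */
  Optional<NodeLease> lease(Instant now, Duration ttl) {
    for (int id = 0; id < maxNodes; id++) {
      NodeLease candidate = new NodeLease(id, now.plus(ttl));
      if (store.tryAcquire(candidate, now)) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }
}
```

With this shape, only the store class would change per database, while the allocation loop stays shared.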

flyrain · 2025-10-02T22:52:38Z

persistence/nosql/nodes/api/src/main/java/org/apache/polaris/nodes/api/Node.java

+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.polaris.nodes.api;


I don't think anywhere else in Polaris needs this. Can we rename it to org.apache.polaris.nosql.nodes.api or org.apache.polaris.nosql.snowflakeid.nodes.api?

It is true that this PR adds code that is meant to support Snowflake ID generators.

Proposal: Align with existing ID Gen code in main.

package org.apache.polaris.ids.nodes.*

Location: persistence/nosql/idgen/nodes/...

@snazy @flyrain @dennishuo WDYT?

While the main API interface mentions "API to lease node IDs...", the use cases are not limited to that one. Another IMHO interesting use case is getting an overview of the active processes (= nodes) of a single Polaris cluster. Adding overly specific use cases, or even specific call sites, to the package name(s) feels like restricting the use cases.

I'd prefer to keep the current package names. I'm okay with renaming the packages to org.apache.polaris.nodeleases.* or org.apache.polaris.nodeids.* though. But the whole effort isn't user-facing at all, so later renames are possible without the risk of breaking anything.

+1 to nodeids

> to get an overview of the active processes (= nodes) of a single Polaris cluster.

Hi @snazy, could you elaborate on the use cases of the node ID generator beyond the Snowflake ID generator?

Hm, I'm not sure I understand your citation of the "get an overview of the active processes" use case and the question about another use case.

Hi @flyrain: the concrete use case ATM is feeding node IDs into the Snowflake ID generator. That is required for the NoSQL persistence to work end-to-end (#1189).

As a side benefit of maintaining a list of active node IDs, one can use that information to report the status of Polaris JVMs that allocate those node IDs. However, this is completely at the discretion of downstream projects that include Polaris libraries.
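
A minimal sketch of that side benefit, assuming the lease data can be read as a map from node ID to lease expiry. The map shape and the name `ActiveNodesSketch` are hypothetical and only illustrate the idea; they are not part of this PR.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class ActiveNodesSketch {
  /** Returns the node IDs whose lease expires after {@code now}, in ascending order. */
  static List<Integer> activeNodeIds(Map<Integer, Instant> leaseExpiryByNodeId, Instant now) {
    // TreeMap sorts by node ID so the report is stable regardless of the input map's order.
    return new TreeMap<>(leaseExpiryByNodeId)
        .entrySet().stream()
            .filter(entry -> entry.getValue().isAfter(now))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
  }
}
```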

> Hm, I'm not sure I understand your citation of the "get an overview of the active processes" use case and the question about another use case.

The citation probably doesn't quite matter. I was trying to understand the node id generator use cases beyond snowflake id.

> As a side benefit of maintaining a list of active node IDs, one can use that information to report the status of Polaris JVMs that allocate those node IDs. However, this is completely at the discretion of downstream projects that include Polaris libraries.

Thanks, Dmitri! This feels more like a K8s-level concern rather than something at the application level (referring to the Polaris service). Could you shed some light on how downstream projects make use of these node IDs?

The only use case that I know of is the Snowflake IDs. I mentioned downstream together with "can"; I did not mean to imply that such downstream projects already exist ATM :)

Generally I agree with @flyrain about starting with more constrained package names where possible, not because we're necessarily implying that the concepts within the package can't be useful in other use cases, but because it's best to be more "deliberate" when adopting the libraries into those other use cases, where we'll be able to better assess the suitability of which aspects constitute a stable SPI, whether there are pitfalls to document better, etc.

I do think the nodeids package name is at least an improvement over the more general nodes at this stage though, so maybe that's enough for now.

flyrain · 2025-10-02T22:55:37Z

persistence/nosql/nodes/README.md

+* `polaris-nodes-api` provides the necessary Java interfaces and immutable types.
+* `polaris-nodes-impl` provides the storage agnostic implementation.
+* `polaris-nodes-spi` provides the necessary interfaces to provide a storage specific implementation.


These modules are used only by the Snowflake ID generator. Can we merge them into the modules holding the Snowflake ID generators? That would make the Snowflake ID generator more consistent and self-contained.

That is a valid point 👍 I made a specific renaming proposal in the thread about the package name (above).

dennishuo

I agree nodeids is an improvement over nodes as a package name, and I'm okay with moving forward with this PR as-is for now to unblock further work, though my top preference would've still been to constrain to a nosql package name initially, then if there are non-nosql use cases we can always move into a more general package name along with discussion about deeper documentation preferences as it comes.

We can maybe better come up with standard guidance within our related "SPI" discussion - to me, package names constitute some degree of "prescriptive" scoping of shared code, in contrast to the separation of compilation modules being more "descriptive" in nature. So it's more about what we're communicating to (especially, new) developers trying to find their way around the codebase than any pure technical consideration.

And in that vein it's always easier to start more constrained and make it more open as needed rather than the other way around.

The two sides of the coin for commitment to SPIs are that we can provide better stability and broad generalization of usage of core SPI packages by being selective in avoiding premature generalization.

dimas-b · 2025-10-28T23:35:24Z

@dennishuo @flyrain: Follow-up package rename PR: #2931

Following up on apache#2728 this change moves "nodeids" code to the `org.apache.polaris.nosql.nodeids` package.

github-project-automation bot added this to Basic Kanban Board Sep 30, 2025

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Sep 30, 2025

dimas-b requested a review from dennishuo September 30, 2025 15:04

snazy force-pushed the nosql-nodes-1 branch from c06eb94 to 6e21df6 Compare October 1, 2025 09:45

dimas-b reviewed Oct 1, 2025

...tence/nosql/nodes/impl/src/main/java/org/apache/polaris/nodeids/impl/NodeManagementImpl.java

...istence/nosql/nodes/impl/src/main/java/org/apache/polaris/nodes/impl/NodeManagementImpl.java

snazy force-pushed the nosql-nodes-1 branch from 6e21df6 to 202c66c Compare October 2, 2025 12:50

dimas-b previously approved these changes Oct 2, 2025

github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Oct 2, 2025

flyrain reviewed Oct 2, 2025

snazy added 4 commits October 20, 2025 10:50

NoSQL: simplify node allocation

7381692

NoSQL: Fail node-management-impl init after timeout

b4107ff

Also move the expensive part to a `@PostConstruct` to not block CDI entirely from initializing.

rename

8ebdfd2
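
The commit "NoSQL: Fail node-management-impl init after timeout" above describes bounding the initialization time. A minimal, stdlib-only sketch of that idea follows, assuming node-ID acquisition is modeled as a retryable attempt; the class and method names are hypothetical and the CDI wiring is omitted. In a CDI bean, a method like this would presumably be invoked from the `@PostConstruct` callback mentioned in the commit message rather than from the constructor.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

final class TimedInitSketch {
  /**
   * Retries {@code attempt} until it yields a node ID or {@code timeout} has elapsed.
   * Fails with an IllegalStateException instead of blocking startup indefinitely.
   */
  static int acquireNodeId(Supplier<Optional<Integer>> attempt, Duration timeout, Duration pause) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      Optional<Integer> nodeId = attempt.get();
      if (nodeId.isPresent()) {
        return nodeId.get();
      }
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("no node ID acquired within " + timeout);
      }
      try {
        Thread.sleep(pause.toMillis());
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up, so shutdown is not delayed.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while acquiring a node ID", e);
      }
    }
  }
}
```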

snazy dismissed dimas-b’s stale review via 8ebdfd2 October 20, 2025 09:32

snazy force-pushed the nosql-nodes-1 branch from 05f32c7 to 8ebdfd2 Compare October 20, 2025 09:32

dimas-b approved these changes Oct 20, 2025

dennishuo approved these changes Oct 28, 2025

dimas-b merged commit 3dd46b9 into apache:main Oct 28, 2025
20 of 23 checks passed

github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Oct 28, 2025

dimas-b added a commit to dimas-b/polaris that referenced this pull request Oct 28, 2025

Move nodeids to nosql package parent

e67c82f

Following up on apache#2728 this change moves "nodeids" code to the `org.apache.polaris.nosql.nodeids` package.

dimas-b mentioned this pull request Oct 28, 2025

Move nodeids to nosql package parent #2931

Open

snazy deleted the nosql-nodes-1 branch October 29, 2025 08:18

snazy pushed a commit to snazy/polaris that referenced this pull request Oct 29, 2025

Move nodeids to nosql package parent

4e9e11f

Following up on apache#2728 this change moves "nodeids" code to the `org.apache.polaris.nosql.nodeids` package.
