Skip to content

[adapter] Batch-allocate user IDs to reduce persist writes during DDL#35460

Merged
mtabebe merged 2 commits intoMaterializeInc:mainfrom
mtabebe:ma/dd/batch-id-fix
Mar 13, 2026
Merged

[adapter] Batch-allocate user IDs to reduce persist writes during DDL#35460
mtabebe merged 2 commits intoMaterializeInc:mainfrom
mtabebe:ma/dd/batch-id-fix

Conversation

@mtabebe
Copy link
Contributor

@mtabebe mtabebe commented Mar 12, 2026

Two commits: the original from #35310

And a new commit that fixes the issue by finding the max used ID from the catalog.

Keeping commits separate to make it easier to re-review.

…MaterializeInc#35310)

# Problem:
Every DDL statement (CREATE TABLE, CREATE VIEW, etc.) performs a persist
write + timestamp oracle call to allocate a single user ID. This
represents one extra round trip to the timestamp oracle, in addition to
the necessary catalog operation for DDL.

# Solution:
- Add `IdPool` struct with `allocate()`, `allocate_many()`, and
`refill()` methods to pre-allocate user IDs in a contiguous range
- Add `USER_ID_POOL_BATCH_SIZE` dyncfg (default 512) to control batch
size
- Add `user_id_pool` field to `Coordinator` initialized as empty
- Add Coordinator wrapper methods `allocate_user_id()` and
`allocate_user_ids(count)` that draw from the pool and auto-refill from
catalog when exhausted
- Replace all call sites that used the two-line pattern of
`get_catalog_write_ts()` + `catalog().allocate_user_id(ts)` with the new
pooled methods
- This eliminates the timestamp oracle call, and catalog operation

# Testing:
- New `IdPool` unit tests covering empty pool, single/batch allocation,
refill, mixed allocation, and invalid range panic
- New SLT regression test (`test/sqllogictest/id_pool.slt`) exercising
tables, views, materialized views, indexes, types, secrets, and
drop/recreate cycles
- New MZCompose restart test to ensure IDs remain unique across the
restart

# Perf
On local machine: 10% saving: from 160ms to 145 ms on a empty catalog

Closes SQL-115
@mtabebe mtabebe requested review from aljoscha and teskje March 12, 2026 20:55
@mtabebe mtabebe requested review from a team as code owners March 12, 2026 20:55
@github-actions
Copy link

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@mtabebe
Copy link
Contributor Author

mtabebe commented Mar 12, 2026

Problem:
In the preflight phase of a zero down time upgrade, get_next_ids()
is used to determine if there has been ongoing DDL.

Since the IdPool batch-allocates IDs, the counter can advance far ahead
of actually-created items. New objects created from the pool then had
IDs below the captured baseline and were missed by DDL detection,
so the standby could fail to see new tables/views.

MaterializeInc/database-issues#11232

Solution:
- Compute next user item ID and next user replica ID from the max existing
  IDs in the catalog (transaction get_items / get_cluster_replicas) instead
  of the allocator counter.

Testing:
- New cargo test: mz-catalog test_persist_ddl_detection_with_batch_allocated_ids
- New MZ Compose test: workflow_ddl_detection_with_id_pool
@mtabebe mtabebe force-pushed the ma/dd/batch-id-fix branch from 1edf51f to 1913e45 Compare March 12, 2026 21:11
Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a 0dt test!

Copy link
Contributor

@aljoscha aljoscha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good, but I think there's a cargo test failure.

@mtabebe mtabebe merged commit 5b9fb22 into MaterializeInc:main Mar 13, 2026
126 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants