
adapter: share transient GlobalId generator with the compute controller#27558

Merged
teskje merged 2 commits into MaterializeInc:main from teskje:shareable-transient_id_gen on Jun 14, 2024

Conversation

@teskje
Contributor

@teskje teskje commented Jun 11, 2024

For Unified Compute Introspection (epic, design, poc) the compute controller needs access to the transient ID generator so that it can generate IDs for introspection subscribes. To this end, the coordinator's transient_id_gen is made shareable by wrapping it in an atomic, a reference to which is passed to the compute controller.
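The mechanism described above can be sketched as follows (the type and method names here are illustrative, not the PR's actual code): wrapping the counter in an `Arc<AtomicU64>` lets the coordinator and the compute controller allocate from the same ID sequence.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical sketch of a shareable transient-ID generator.
/// Cloning it yields a handle to the *same* underlying counter.
#[derive(Clone, Default)]
struct TransientIdGen {
    next: Arc<AtomicU64>,
}

impl TransientIdGen {
    /// Returns the next transient ID; safe to call from any thread.
    fn allocate(&self) -> u64 {
        // `fetch_add` returns the previous value, so IDs start at 0.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

fn main() {
    let id_gen = TransientIdGen::default();
    let shared = id_gen.clone(); // e.g. handed to the compute controller
    assert_eq!(id_gen.allocate(), 0);
    assert_eq!(shared.allocate(), 1); // both handles see the same counter
}
```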

Motivation

  • This PR adds a known-desirable feature.

Part of https://github.com/MaterializeInc/database-issues/issues/7898


@teskje teskje force-pushed the shareable-transient_id_gen branch 2 times, most recently from cc9a419 to 648ece6 Compare June 11, 2024 14:36
@teskje teskje marked this pull request as ready for review June 11, 2024 15:51
@teskje teskje requested review from a team and benesch as code owners June 11, 2024 15:51
@teskje teskje requested a review from jkosh44 June 11, 2024 15:51
impl<Id: From<u64> + Default> AtomicGen<Id> {
    /// Allocates a new identifier of type `Id` and advances the generator.
    pub fn allocate_id(&self) -> Id {
        let id = self.id.fetch_add(1, Ordering::Relaxed);
Contributor


Might be worth leaving a comment explaining why Ordering::Relaxed is correct (I haven't thought about it myself tbh).

Contributor Author


I'm also not 100% sure. The docs say "In its weakest Ordering::Relaxed, only the memory directly touched by the operation is synchronized." I think this is sufficient here because all we need is that each user of the atomic gets back a different value, the atomic doesn't protect any other state we require to be synchronized.

This random SO post supports my reasoning: https://stackoverflow.com/questions/30407121/which-stdsyncatomicordering-to-use#33293463

Relaxed Ordering
There are no constraints besides any modification to the memory location being atomic (so it either happens completely or not at all). This is fine for something like a counter if the values retrieved by/set by individual threads don't matter as long as they're atomic.

I'll add a comment to that effect. But lmk if you still have doubts! I think it would also be fine to just use the strongest ordering and be done with it.
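To illustrate that reasoning with a standalone sketch (not code from this PR): even under `Ordering::Relaxed`, `fetch_add` is a single atomic read-modify-write, so concurrent callers always receive distinct values. Relaxed only forgoes ordering guarantees for *other* memory, which a pure ID counter does not rely on.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicU64::new(0));

    // Four threads each allocate 1000 IDs with Relaxed ordering.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                (0..1000)
                    .map(|_| c.fetch_add(1, Ordering::Relaxed))
                    .collect::<Vec<u64>>()
            })
        })
        .collect();

    // No two threads ever observe the same value: the RMW itself is
    // atomic regardless of the memory ordering chosen.
    let mut seen = HashSet::new();
    for h in handles {
        for id in h.join().unwrap() {
            assert!(seen.insert(id), "duplicate ID!");
        }
    }
    assert_eq!(seen.len(), 4000);
}
```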

Contributor


That reasoning is sound to me.

Contributor


The best resource for understanding these is chapter 3 from Mara Bos' book on atomics and locks. Here is a link to the section for this specific question but I highly recommend reading the whole chapter https://marabos.nl/atomics/memory-ordering.html#relaxed

Relaxed sounds right for me too

Contributor Author

@teskje teskje Jun 12, 2024


Thanks! I have also enjoyed Herb Sutter's "atomic<> weapons" talk: https://www.youtube.com/watch?v=A8eCGOqgvH4

impl<Id: From<u64> + Default> AtomicGen<Id> {
    /// Allocates a new identifier of type `Id` and advances the generator.
    pub fn allocate_id(&self) -> Id {
        let id = self.id.fetch_add(1, Ordering::Relaxed);
Contributor


Also, do we no longer care about overflow? Seems pretty unlikely, but previously we explicitly were handling it.

Contributor Author


Other ID generators we have (using IdGen) also don't check for overflow, so I figured it's fine to skip it here as well. It's indeed extremely unlikely that we'd ever run out of transient IDs, especially considering our weekly maintenance window.

Contributor Author


I did the math: If we allocated 1000 IDs every second we'd need 585 million years to overflow. I think we're probably good :)
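For reference, the arithmetic behind that figure, assuming a `u64` counter:

```rust
fn main() {
    let ids_per_second: u64 = 1000;
    let seconds_per_year: u64 = 60 * 60 * 24 * 365; // 31,536,000

    // u64::MAX is 18_446_744_073_709_551_615, so at 1000 IDs/s the
    // counter overflows after roughly 585 million years.
    let years = u64::MAX / ids_per_second / seconds_per_year;
    assert_eq!(years, 584_942_417);
    println!("{years} years to overflow");
}
```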

Contributor

@jkosh44 jkosh44 left a comment


LGTM

@teskje teskje force-pushed the shareable-transient_id_gen branch from 648ece6 to ea9aadb Compare June 12, 2024 07:34
@shepherdlybot

shepherdlybot bot commented Jun 12, 2024

Risk Score: 80/100 · Bug Hotspots: 4 · Resilience Coverage: 33%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The risk score for the pull request is high at 80, indicating a significant likelihood of introducing bugs. This assessment is driven by predictors such as the sum of bug reports of files touched by the PR and the change in executable lines of code. Historically, pull requests with similar characteristics are 110% more likely to cause a bug compared to the repository's baseline. Additionally, there are 4 files modified in this PR that have recently seen a high number of bug fixes, which may contribute to the risk. While the repository's observed bug trend is currently decreasing, the predictors suggest caution for this pull request.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File | Percentile
../src/coord.rs | 97
../src/catalog.rs | 96
../src/controller.rs | 96
../controller/instance.rs | 99

@teskje teskje force-pushed the shareable-transient_id_gen branch from ea9aadb to 7741fc0 Compare June 12, 2024 07:59
Contributor

@def- def- left a comment


Coverage looks good: https://buildkite.com/materialize/coverage/builds/425
Nightly had two surprise timeouts, I'm retriggering them: https://buildkite.com/materialize/nightly/builds/8063 (not sure yet if related to the PR, probably not)

@teskje teskje force-pushed the shareable-transient_id_gen branch from 7741fc0 to f60bfd5 Compare June 13, 2024 13:18
@teskje
Contributor Author

teskje commented Jun 14, 2024

Those timeouts occurred on other branches as well, so I assume they were unrelated. I rebased and reran the nightlies, and now the timeouts are gone. There are other failures (a benchmark regression that also occurs on main and a PG output-consistency failure), but neither is likely to be caused by this PR.

@teskje teskje merged commit 0c838af into MaterializeInc:main Jun 14, 2024
@teskje teskje deleted the shareable-transient_id_gen branch June 14, 2024 08:50