adapter: share transient GlobalId generator with the compute controller #27558
teskje merged 2 commits into MaterializeInc:main
impl<Id: From<u64> + Default> AtomicGen<Id> {
    /// Allocates a new identifier of type `Id` and advances the generator.
    pub fn allocate_id(&self) -> Id {
        let id = self.id.fetch_add(1, Ordering::Relaxed);
Might be worth leaving a comment explaining why Ordering::Relaxed is correct (I haven't thought about it myself tbh).
I'm also not 100% sure. The docs say "In its weakest Ordering::Relaxed, only the memory directly touched by the operation is synchronized." I think this is sufficient here because all we need is that each user of the atomic gets back a different value; the atomic doesn't protect any other state that we require to be synchronized.
This random SO post supports my reasoning: https://stackoverflow.com/questions/30407121/which-stdsyncatomicordering-to-use#33293463
Relaxed Ordering
There are no constraints besides any modification to the memory location being atomic (so it either happens completely or not at all). This is fine for something like a counter if the values retrieved by/set by individual threads don't matter as long as they're atomic.
I'll add a comment to that effect. But lmk if you still have doubts! I think it would also be fine to just use the strongest ordering and be done with it.
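To make the argument above concrete, here is a minimal sketch (not code from this PR) that draws IDs from one shared counter on several threads using `Ordering::Relaxed`. The helper names `allocate_concurrently` and `all_unique` are made up for illustration. Because `fetch_add` is a single atomic read-modify-write, every call should get a distinct previous value even though no other memory is synchronized.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each draw `per_thread` IDs from one shared
/// counter, and returns every ID that was handed out (in no particular order).
/// Hypothetical helper for illustration only.
fn allocate_concurrently(threads: usize, per_thread: usize) -> Vec<u64> {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                (0..per_thread)
                    // `fetch_add` is one atomic read-modify-write, so no two
                    // calls can observe the same previous value, even with
                    // `Relaxed`.
                    .map(|_| counter.fetch_add(1, Ordering::Relaxed))
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

/// Returns true if no ID appears twice. Hypothetical helper.
fn all_unique(ids: &[u64]) -> bool {
    let set: HashSet<u64> = ids.iter().copied().collect();
    set.len() == ids.len()
}
```

Calling `allocate_concurrently(8, 1000)` and passing the result to `all_unique` is one way to check the claim. What `Relaxed` does not give is any ordering guarantee for other memory written by the allocating threads, which the generator does not rely on.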
That reasoning is sound to me.
The best resource for understanding these is chapter 3 of Mara Bos' book on atomics and locks. Here is a link to the section for this specific question, but I highly recommend reading the whole chapter: https://marabos.nl/atomics/memory-ordering.html#relaxed
Relaxed sounds right to me too.
Thanks! I have also enjoyed Herb Sutter's "atomic<> weapons" talk: https://www.youtube.com/watch?v=A8eCGOqgvH4
impl<Id: From<u64> + Default> AtomicGen<Id> {
    /// Allocates a new identifier of type `Id` and advances the generator.
    pub fn allocate_id(&self) -> Id {
        let id = self.id.fetch_add(1, Ordering::Relaxed);
Also, do we no longer care about overflow? Seems pretty unlikely, but previously we explicitly were handling it.
Other ID generators we have (using IdGen) also don't check for overflow, so I figured it's fine to skip it here as well. It's indeed extremely unlikely that we'd ever run out of transient IDs, especially considering our weekly maintenance window.
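For comparison, if explicit overflow handling were ever wanted again, one possible shape is sketched below. This is a hypothetical variant, not what the PR does: `allocate_checked` is an invented name, and it works on a bare `AtomicU64` rather than the real generator type. It uses `fetch_update` with `checked_add`, so the counter stops advancing instead of wrapping around.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical overflow-aware allocation (not part of this PR).
/// Returns `None` once the counter has reached `u64::MAX` instead of
/// wrapping back to 0 the way a plain `fetch_add` would.
fn allocate_checked(counter: &AtomicU64) -> Option<u64> {
    counter
        // The closure returns `None` when the increment would overflow, in
        // which case `fetch_update` leaves the value untouched and returns
        // `Err`. On success it returns the previous value.
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
            cur.checked_add(1)
        })
        .ok()
}
```

One consequence of this design is that `u64::MAX` itself is never handed out. `fetch_update` is a compare-and-swap loop, so it can be slightly more expensive under contention than `fetch_add`.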
I did the math: If we allocated 1000 IDs every second we'd need 585 million years to overflow. I think we're probably good :)
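The estimate can be reproduced with a few lines of arithmetic. The sketch below assumes a `u64` counter starting at 0 and a year of 365.25 days; `years_until_overflow` is an illustrative name, not something in the codebase.

```rust
/// Years until a `u64` counter starting at 0 wraps around when
/// `ids_per_second` IDs are allocated continuously.
/// Assumes a year of 365.25 days.
fn years_until_overflow(ids_per_second: f64) -> f64 {
    const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0;
    // Number of distinct u64 values: 2^64, roughly 1.8e19.
    let total_ids = 2f64.powi(64);
    total_ids / ids_per_second / SECONDS_PER_YEAR
}
```

With `ids_per_second = 1000.0` this comes out to roughly 5.8e8 years, consistent with the 585 million figure.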
Mitigations
Completing required mitigations increases Resilience Coverage.
Risk Summary: The risk score for the pull request is high at 80, indicating a significant likelihood of introducing bugs. This assessment is driven by predictors such as the sum of bug reports of files touched by the PR and the change in executable lines of code. Historically, pull requests with similar characteristics are 110% more likely to cause a bug compared to the repository's baseline. Additionally, there are 4 files modified in this PR that have recently seen a high number of bug fixes, which may contribute to the risk. While the repository's observed bug trend is currently decreasing, the predictors suggest caution for this pull request.
Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.
def- left a comment
Coverage looks good: https://buildkite.com/materialize/coverage/builds/425
Nightly had two surprise timeouts, I'm retriggering them: https://buildkite.com/materialize/nightly/builds/8063 (not sure yet if related to the PR, probably not)
Those timeouts occurred on other branches as well, so I assume they were unrelated. I rebased and reran the nightlies, and now the timeouts are gone. There are other failures (a benchmark regression that also occurs on main and a PG output consistency failure), but both are unlikely to be caused by this PR.
For Unified Compute Introspection (epic, design, poc), the compute controller needs access to the transient ID generator so that it can generate IDs for introspection subscribes. To this end, the coordinator's `transient_id_gen` is made shareable by wrapping it in an atomic, a reference to which is passed to the compute controller.
Motivation
Part of https://github.com/MaterializeInc/database-issues/issues/7898
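As a rough illustration of the sharing described in the PR summary, the sketch below reconstructs a generator along the lines of the reviewed `allocate_id` hunk and hands the same instance to two owners through an `Arc`. Everything beyond that hunk is an assumption made for the example: the struct fields, the `TransientId` newtype, the stripped-down `Coordinator` and `ComputeController` structs, and `build_pair` are all hypothetical and do not reflect the real types.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Sketch of an atomic ID generator; the field layout is an assumption.
#[derive(Debug, Default)]
struct AtomicGen<Id> {
    id: AtomicU64,
    phantom: PhantomData<Id>,
}

impl<Id: From<u64> + Default> AtomicGen<Id> {
    /// Allocates a new identifier of type `Id` and advances the generator.
    fn allocate_id(&self) -> Id {
        // Relaxed suffices: only uniqueness of the returned value matters,
        // and the counter guards no other state.
        let id = self.id.fetch_add(1, Ordering::Relaxed);
        Id::from(id)
    }
}

/// Hypothetical stand-in for a transient `GlobalId`.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct TransientId(u64);

impl From<u64> for TransientId {
    fn from(v: u64) -> Self {
        TransientId(v)
    }
}

/// Hypothetical, stripped-down owners of the shared generator.
struct Coordinator {
    transient_id_gen: Arc<AtomicGen<TransientId>>,
}

struct ComputeController {
    transient_id_gen: Arc<AtomicGen<TransientId>>,
}

/// Builds both components around one generator (illustrative helper).
fn build_pair() -> (Coordinator, ComputeController) {
    let id_gen = Arc::new(AtomicGen::default());
    let controller = ComputeController {
        transient_id_gen: Arc::clone(&id_gen),
    };
    let coordinator = Coordinator {
        transient_id_gen: id_gen,
    };
    (coordinator, controller)
}
```

Because both structs hold a handle to the same counter, IDs allocated by either side never collide, and `allocate_id` only needs `&self`, so no lock is required.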