
adapter: share transient GlobalId generator with the compute controller#27558

Merged
teskje merged 2 commits into MaterializeInc:main from teskje:shareable-transient_id_gen on Jun 14, 2024

Conversation

@teskje
Contributor

@teskje teskje commented Jun 11, 2024

For Unified Compute Introspection (epic, design, poc) the compute controller needs access to the transient ID generator so that it can generate IDs for introspection subscribes. To this end, the coordinator's transient_id_gen is made shareable by wrapping it in an atomic, a reference to which is passed to the compute controller.
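The mechanism described above can be sketched as follows (the type and method names here are illustrative, not the PR's actual code): wrapping the counter in an `Arc<AtomicU64>` lets the coordinator and the compute controller allocate from the same ID sequence.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical sketch of a shareable transient-ID generator.
/// Cloning it yields a handle to the *same* underlying counter.
#[derive(Clone, Default)]
struct TransientIdGen {
    next: Arc<AtomicU64>,
}

impl TransientIdGen {
    /// Returns the next transient ID; safe to call from any thread.
    fn allocate(&self) -> u64 {
        // `fetch_add` returns the previous value, so IDs start at 0.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

fn main() {
    let id_gen = TransientIdGen::default();
    let shared = id_gen.clone(); // e.g. handed to the compute controller
    assert_eq!(id_gen.allocate(), 0);
    assert_eq!(shared.allocate(), 1); // both handles see the same counter
}
```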

Motivation

  • This PR adds a known-desirable feature.

Part of https://github.com/MaterializeInc/database-issues/issues/7898


@teskje teskje force-pushed the shareable-transient_id_gen branch 2 times, most recently from cc9a419 to 648ece6 Compare June 11, 2024 14:36
@teskje teskje marked this pull request as ready for review June 11, 2024 15:51
@teskje teskje requested review from a team and benesch as code owners June 11, 2024 15:51
@teskje teskje requested a review from jkosh44 June 11, 2024 15:51
impl<Id: From<u64> + Default> AtomicGen<Id> {
    /// Allocates a new identifier of type `Id` and advances the generator.
    pub fn allocate_id(&self) -> Id {
        let id = self.id.fetch_add(1, Ordering::Relaxed);
Contributor


Might be worth leaving a comment explaining why Ordering::Relaxed is correct (I haven't thought about it myself tbh).

Contributor Author


I'm also not 100% sure. The docs say "In its weakest Ordering::Relaxed, only the memory directly touched by the operation is synchronized." I think this is sufficient here because all we need is that each user of the atomic gets back a different value, the atomic doesn't protect any other state we require to be synchronized.

This random SO post supports my reasoning: https://stackoverflow.com/questions/30407121/which-stdsyncatomicordering-to-use#33293463

Relaxed Ordering
There are no constraints besides any modification to the memory location being atomic (so it either happens completely or not at all). This is fine for something like a counter if the values retrieved by/set by individual threads don't matter as long as they're atomic.

I'll add a comment to that effect. But lmk if you still have doubts! I think it would also be fine to just use the strongest ordering and be done with it.
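To illustrate that reasoning with a standalone sketch (not code from this PR): even under `Ordering::Relaxed`, `fetch_add` is a single atomic read-modify-write, so concurrent callers always receive distinct values. Relaxed only forgoes ordering guarantees for *other* memory, which a pure ID counter does not rely on.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicU64::new(0));

    // Four threads each allocate 1000 IDs with Relaxed ordering.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                (0..1000)
                    .map(|_| c.fetch_add(1, Ordering::Relaxed))
                    .collect::<Vec<u64>>()
            })
        })
        .collect();

    // No two threads ever observe the same value: the RMW itself is
    // atomic regardless of the memory ordering chosen.
    let mut seen = HashSet::new();
    for h in handles {
        for id in h.join().unwrap() {
            assert!(seen.insert(id), "duplicate ID!");
        }
    }
    assert_eq!(seen.len(), 4000);
}
```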

Contributor


That reasoning is sound to me.

Contributor


The best resource for understanding these is chapter 3 from Mara Bos' book on atomics and locks. Here is a link to the section for this specific question but I highly recommend reading the whole chapter https://marabos.nl/atomics/memory-ordering.html#relaxed

Relaxed sounds right for me too

Contributor Author

@teskje teskje Jun 12, 2024


Thanks! I have also enjoyed Herb Sutter's "atomic<> weapons" talk: https://www.youtube.com/watch?v=A8eCGOqgvH4

impl<Id: From<u64> + Default> AtomicGen<Id> {
    /// Allocates a new identifier of type `Id` and advances the generator.
    pub fn allocate_id(&self) -> Id {
        let id = self.id.fetch_add(1, Ordering::Relaxed);
Contributor


Also, do we no longer care about overflow? Seems pretty unlikely, but previously we explicitly were handling it.

Contributor Author


Other ID generators we have (using IdGen) also don't check for overflow, so I figured it's fine to skip it here as well. It's indeed extremely unlikely that we'd ever run out of transient IDs, especially considering our weekly maintenance window.

Contributor Author


I did the math: If we allocated 1000 IDs every second we'd need 585 million years to overflow. I think we're probably good :)
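For reference, the arithmetic behind that figure, assuming a `u64` counter:

```rust
fn main() {
    let ids_per_second: u64 = 1000;
    let seconds_per_year: u64 = 60 * 60 * 24 * 365; // 31,536,000

    // u64::MAX is 18_446_744_073_709_551_615, so at 1000 IDs/s the
    // counter overflows after roughly 585 million years.
    let years = u64::MAX / ids_per_second / seconds_per_year;
    assert_eq!(years, 584_942_417);
    println!("{years} years to overflow");
}
```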

Contributor

@jkosh44 jkosh44 left a comment


LGTM

@teskje teskje force-pushed the shareable-transient_id_gen branch from 648ece6 to ea9aadb Compare June 12, 2024 07:34
@shepherdlybot

shepherdlybot bot commented Jun 12, 2024

Risk Score: 80/100 · Bug Hotspots: 4 · Resilience Coverage: 33%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The risk score for the pull request is high at 80, indicating a significant likelihood of introducing bugs. This assessment is driven by predictors such as the sum of bug reports of files touched by the PR and the change in executable lines of code. Historically, pull requests with similar characteristics are 110% more likely to cause a bug compared to the repository's baseline. Additionally, there are 4 files modified in this PR that have recently seen a high number of bug fixes, which may contribute to the risk. While the repository's observed bug trend is currently decreasing, the predictors suggest caution for this pull request.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File | Percentile
../src/coord.rs | 97
../src/catalog.rs | 96
../src/controller.rs | 96
../controller/instance.rs | 99

@teskje teskje force-pushed the shareable-transient_id_gen branch from ea9aadb to 7741fc0 Compare June 12, 2024 07:59
Contributor

@def- def- left a comment


Coverage looks good: https://buildkite.com/materialize/coverage/builds/425
Nightly had two surprise timeouts, I'm retriggering them: https://buildkite.com/materialize/nightly/builds/8063 (not sure yet if related to the PR, probably not)

@teskje teskje force-pushed the shareable-transient_id_gen branch from 7741fc0 to f60bfd5 Compare June 13, 2024 13:18
@teskje
Contributor Author

teskje commented Jun 14, 2024

Those timeouts occurred on other branches as well, so I assume they were unrelated. I rebased and reran the nightlies, and now the timeouts are gone. There are other failures (a benchmark regression that also occurs on main and a PG output-consistency failure), but neither is likely to be caused by this PR.

@teskje teskje merged commit 0c838af into MaterializeInc:main Jun 14, 2024
@teskje teskje deleted the shareable-transient_id_gen branch June 14, 2024 08:50