Drop/Create Index Timestamping Race condition #1907
… that already has a derived materialized view
.hist_date_offset_millis = zero_const,
.order_entry_date_offset_millis = zero_const,
.orderline_delivery_date_offset_millis = zero_const,
.hist_date_offset_millis = static_cast<inner_type>(std::uniform_int_distribution<int64_t>(946684800,1704067200)), // 2000-2024
Are changes like these also intended to produce dates with a greater range of values than just the day the demo is being run?
Yes, that is beneficial for the Metabase demo, which uses this.
I've configured it to generate dates from 2000 to 2024.
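For reference, a minimal sketch of drawing a random date in that range, assuming the two literals in the diff are Unix epoch seconds (946684800 is 2000-01-01 UTC and 1704067200 is 2024-01-01 UTC). The names `kRangeStart`, `kRangeEnd`, and `random_epoch_seconds` are hypothetical and are not taken from the load generator:

```cpp
#include <cstdint>
#include <random>

// Hypothetical bounds, copied from the literals in the diff above and
// assumed to be Unix epoch seconds: 2000-01-01 and 2024-01-01 (UTC).
constexpr std::int64_t kRangeStart = 946684800;
constexpr std::int64_t kRangeEnd = 1704067200;

// Draws one timestamp uniformly from the closed range [kRangeStart, kRangeEnd].
std::int64_t random_epoch_seconds(std::mt19937_64 &rng) {
    std::uniform_int_distribution<std::int64_t> dist(kRangeStart, kRangeEnd);
    return dist(rng);
}
```

Seeding the engine with a fixed value (for example `std::mt19937_64 rng(42);`) would make the generated dates reproducible between demo runs, if that is desirable.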
Looks good to me! My understanding is limited, but the in-person explanation made things clearer.
Could you include what the conversation with Andi was about? That commit message won't help us remember the rationale later.
What this PR doesn't solve: Right now we lack a way to "identify" a source instantiation in a way that is
1) unique across dataflows
2) persistent across crashes
We can achieve 1) by identifying a source instance with the pair (source_id, id of the first index that that view exports). Unfortunately, we cannot necessarily achieve 2): if we have two indices on a view, i and ii (say on column a and column b respectively), and only i is dropped, we will recover the view with a different id than the one we created it with.
This "bug" will have no effect on sources with BYO consistency. For "real-time" consistency, the second "iteration" of the view will be treated as a "fresh" iteration with no prior timestamping information. This means that record 1 can be timestamped with t before the crash and retimestamped with t' after it. It will not cause a crash or any data loss.
The discussion with Andi was about finding a tuple that refers uniquely to a given source instantiation. The pair (src_id.sid, first_export_id) is unique within a given run of Materialize, but not across runs if we create two indices and drop only the first one (see the general comment in the main PR).
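To make the limitation concrete, here is a rough sketch of that pair in C++. It is a toy model under stated assumptions, not Materialize's actual code: the types `SourceId`, `IndexId`, and `InstanceKey` and the function `instance_key` are hypothetical, and "first export" is assumed to mean the smallest id among the indices that are still live.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <utility>

// Hypothetical model types; these are not Materialize's real definitions.
using SourceId = std::uint64_t;
using IndexId = std::uint64_t;
using InstanceKey = std::pair<SourceId, IndexId>;

// Builds the (source id, first export id) pair for a source instance.
// Assumption: the "first" export is the smallest id among live indices.
// Returns an empty optional when no index exports the view.
std::optional<InstanceKey> instance_key(SourceId source,
                                        const std::set<IndexId> &live_indexes) {
    if (live_indexes.empty()) {
        return std::nullopt;
    }
    // std::set iterates in ascending order, so begin() is the smallest id.
    return InstanceKey{source, *live_indexes.begin()};
}
```

With live indices `{10, 11}` the key for source 1 is `(1, 10)`. If index 10 is dropped and the key is recomputed after a restart, it becomes `(1, 11)`, so timestamping state stored under the old key would not be found, which is the "fresh iteration" behavior described above.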
1) Race condition
There was a race condition where dropping an index and recreating an index in close succession would cause timestamping to stop on a source.
2) Naming collision
This PR also fixes a case of duplicate naming if
3) CHBench Load Generator