New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapter: correct initial timestamp determination in transactions #20267
adapter: correct initial timestamp determination in transactions #20267
Conversation
There's no tests for this yet because there weren't before and writing a test scenario for exactly this is not currently within the realm of what our test code can do deterministically. We should add tests for this (both randomized and deterministic), but those may take some days to write and I think it may be better to get this into the release before those because we're already in a known broken state. The by-hand tests I did worked, so some tests have been done, just none automated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on Philip's local repro the issue seems fixed. A deterministic test would be even better of course.
I understand how the problem described leads to the errors in #19743, but I'm having trouble figuring out how they lead to the identifier missing errors in the other issues. Would you be able to include a description of that in your explanation? |
source_bundle: &CollectionIdBundle, | ||
source_ids: &BTreeSet<GlobalId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an existing issue, but I'm finding it very difficult to tell the difference between source_ids
and source_bundle
. source_ids
seems to be IDs directly referenced in the query, while source_bundle
seems to be the result of a call to sufficient_collections
on an index_oracle
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all important for this PR, I just wanted to call it out.
You had marked this as closing #20197 but I'm not seeing how that issue is connected to this PR, so I removed that from the PR description. Let me know if I've erred. |
44a6fbd
to
bf093bf
Compare
In MaterializeInc#19026 various timestamp functions were refactored to allow `EXPLAIN TIMESTAMP` to reflect (and effect) the current transaction. This is tricky code and the refactor incorrectly changed the behavior of transactions. In a transaction our intention is to find a timestamp during the first statement that will be valid for any other object a user might query later in the transaction, then aquire read holds on all those referenced objects. In that change, the first statement only used the referenced IDs when determining a timestamp, not all nearby (objects in the same schema) IDs. Read holds were still correctly aquired on all nearby objects. This could cause the transaction's timestamp to not be in advance of the since of other nearby objects. For example, take two objects in the table. At the first query, object A has a since of 10, object B has a since of 20. A first query in the transaction `SELECT * FROM A` would choose 10 as its timestamp, then aquire read holds on A and B at 10. A later query in the transaction `SELECT * FROM B` would choose 10 as its timestamp (which it fetched from the in-progress transaction state), and panic because B's since is in advance of 10. One of the reasons this happened is the transaction metadata only held on to the TimestampContext. `EXPLAIN TIMESTAMP` needs a larger struct, a TimestampDetermination. Teach the session metadata to track this outer struct so EXPLAIN can use it. This allows us to now only call determine_timestamp once during transaction start, instead of during each query in a transaction. Refactor some of the other read handling code to look more like it did pre-19026.
bf093bf
to
0b176dd
Compare
Just wanted to bump this comment. I'm still a bit confused about this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I don't have an explanation about why this error happened or why this PR no longer is producing that error. My best guess is that those objects were dropped in other connections, and because the read hold was made for a time in the future, the controller thought it was safe to remove them. Then later when a query comes along for them, they don't exist. |
In #19026 various timestamp functions were refactored to allow
EXPLAIN TIMESTAMP
to reflect (and effect) the current transaction. This is tricky code and the refactor incorrectly changed the behavior of transactions. In a transaction our intention is to find a timestamp during the first statement that will be valid for any other object a user might query later in the transaction, then aquire read holds on all those referenced objects. In that change, the first statement only used the referenced IDs when determining a timestamp, not all nearby (objects in the same schema) IDs. Read holds were still correctly aquired on all nearby objects.This could cause the transaction's timestamp to not be in advance of the since of other nearby objects. For example, take two objects in the table. At the first query, object A has a since of 10, object B has a since of 20. A first query in the transaction
SELECT * FROM A
would choose 10 as its timestamp, then aquire read holds on A and B at 10. A later query in the transactionSELECT * FROM B
would choose 10 as its timestamp (which it fetched from the in-progress transaction state), and panic because B's since is in advance of 10.One of the reasons this happened is the transaction metadata only held on to the TimestampContext.
EXPLAIN TIMESTAMP
needs a larger struct, a TimestampDetermination. Teach the session metadata to track this outer struct so EXPLAIN can use it. This allows us to now only call determine_timestamp once during transaction start, instead of during each query in a transaction. Refactor some of the other read handling code to look more like it did pre-19026.After this code, the test repro from 19743 no longer causes crashes.
Fixes #19743
Fixes #19897
Fixes #19398
Motivation
Tips for reviewer
materialize/src/adapter/src/coord/sequencer/inner.rs
Line 2242 in 0e5efd4
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.