sql: DistSQL processors can use a Leaf txn without collecting its metadata #41222
@andreimatei can you explain more about what the effects of "not collecting the metadata" can be? Does this mean that we can "lose a read" and fail to populate the ts cache, or something similar? Is it possible to observe transaction anomalies? For example, could a query to system.descriptors via a built-in function fail to mark the txn so that it serializes properly with concurrent DDL? Are there other risks? (I am trying to assess severity for the release.) cc @jordanlewis
Technically, the consequence of not collecting that metadata is that the transaction can succeed in refreshing without actually refreshing the reads performed through the leaf transaction whose metadata was not collected. This can result in write skew, i.e. a failure of serializability (albeit not the worst one conceivable). Now, I'm not completely sure that the problem can manifest itself today. For badness to occur, you need to a) run one of these builtin functions that use the txn through a leaf txn and b) not collect the metadata for that leaf txn. It might be the case that today we plan everything that uses such builtins entirely on the gateway (I'm not sure). Or even if we don't plan them on the gateway, as long as there is a …

I've filed the issue so I can reference it from new code in #41102. That PR might have been making things worse by introducing more leaves on the gateway. But now the future of that PR seems uncertain.
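To make that failure mode concrete, here is a toy model in Python; it is not CockroachDB code. All names (`KV`, `RootTxn`, `merge_leaf_meta`, `refresh`, `scenario`) are hypothetical, and a "span" is simplified to a single key. The point it illustrates, assuming a refresh only re-validates the spans the root knows about, is that a read done through a leaf whose metadata was never merged is silently skipped during the refresh.

```python
from dataclasses import dataclass, field


@dataclass
class KV:
    """Toy MVCC store (hypothetical): key -> commit timestamps of writes."""
    writes: dict = field(default_factory=dict)

    def write(self, key: str, ts: int) -> None:
        self.writes.setdefault(key, []).append(ts)

    def written_in(self, key: str, lo: int, hi: int) -> bool:
        """True if `key` has a write with lo < ts <= hi."""
        return any(lo < ts <= hi for ts in self.writes.get(key, []))


@dataclass
class RootTxn:
    read_ts: int
    refresh_spans: set = field(default_factory=set)  # keys this txn will re-validate

    def merge_leaf_meta(self, leaf_spans: set) -> None:
        """Stand-in for the receiver merging a leaf's collected metadata."""
        self.refresh_spans |= leaf_spans

    def refresh(self, kv: KV, new_ts: int) -> bool:
        """Bump read_ts to new_ts only if none of the known reads changed."""
        if any(kv.written_in(k, self.read_ts, new_ts) for k in self.refresh_spans):
            return False
        self.read_ts = new_ts
        return True


def scenario(collect_leaf_meta: bool) -> bool:
    kv = KV()
    root = RootTxn(read_ts=10)
    root.refresh_spans.add("a")  # read performed through the Root txn
    leaf_spans = {"b"}           # read performed through a Leaf txn (e.g. by a builtin)
    if collect_leaf_meta:
        root.merge_leaf_meta(leaf_spans)
    kv.write("b", 15)            # a concurrent writer invalidates the leaf's read
    return root.refresh(kv, new_ts=20)


print(scenario(collect_leaf_meta=True))   # False: the refresh correctly fails
print(scenario(collect_leaf_meta=False))  # True: the refresh wrongly succeeds
```

In the second run the refresh from 10 to 20 never checks key "b", so the transaction may proceed at 20 even though the value it read for "b" was overwritten at 15; in a real system that lost validation is what opens the door to the write skew described above.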
The main thing I'm worried about is things like …
We have marked this issue as stale because it has been inactive for …
Still current.
We have marked this issue as stale because it has been inactive for …
@yuzefovich @rharding6373 @DrewKimball I think we fixed this, right?
No, I think it's still current. In particular, this part […] requires further investigation to see whether there are cases like this. I believe that […] saves us in the vast majority of cases. Namely, we generally use a single …
When using a Leaf txn, someone needs to collect its metadata and pass it through to the DistSQL receiver, which merges it with the Root's metadata. A handful of processors (e.g. the `TableReader`) do this by collecting the leaf metadata when draining. However, this is only done by processors that use the transaction directly. But I believe any processor can use the transaction through its "render expressions" and such by invoking built-in functions. These functions can use the transaction through the `EvalCtx`. And so I think it might be possible for a txn to be used without anyone collecting its metadata.

Perhaps at the moment that's not actually happening because processors share their transaction objects (and also share them with the `EvalCtx`), but I'm moving towards less sharing. So this mechanism, if not already broken, is very fragile.
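A rough Python model of the flow described above, under loud assumptions: the processor and method names (`drain_meta`, `get_meta`, `receiver_merge`, `Renderer`) and the key `desc/52` are hypothetical stand-ins, not the DistSQL API, and metadata is reduced to a list of read keys. It contrasts a builtin that reads through its own leaf with one whose `EvalCtx` shares the `TableReader`'s leaf.

```python
from dataclasses import dataclass, field


@dataclass
class LeafTxn:
    """Toy leaf txn that accumulates the keys it reads (its 'metadata')."""
    read_spans: list = field(default_factory=list)

    def get(self, key: str) -> str:
        self.read_spans.append(key)
        return f"val({key})"

    def get_meta(self) -> list:
        """Hand over and reset the accumulated metadata."""
        meta, self.read_spans = self.read_spans, []
        return meta


@dataclass
class EvalCtx:
    txn: LeafTxn  # the txn that builtins read through


class TableReader:
    """Uses its txn directly and collects the leaf metadata when draining."""

    def __init__(self, txn: LeafTxn, keys: list):
        self.txn, self.keys = txn, keys

    def run(self) -> list:
        return [self.txn.get(k) for k in self.keys]

    def drain_meta(self) -> list:
        return self.txn.get_meta()


class Renderer:
    """Evaluates a builtin that reads through EvalCtx.txn, but never collects
    that txn's metadata on drain."""

    def __init__(self, eval_ctx: EvalCtx):
        self.eval_ctx = eval_ctx

    def run(self, key: str) -> str:
        return self.eval_ctx.txn.get(key)  # e.g. a builtin looking up a descriptor

    def drain_meta(self) -> list:
        return []


def receiver_merge(processors: list) -> list:
    """Stand-in for the DistSQL receiver: gather drained metadata for the Root."""
    root_spans = []
    for p in processors:
        root_spans.extend(p.drain_meta())
    return root_spans


# Less sharing: the builtin gets its own leaf, and its read never reaches the Root.
tr = TableReader(LeafTxn(), ["t/1", "t/2"])
rd = Renderer(EvalCtx(LeafTxn()))
tr.run()
rd.run("desc/52")
separate = receiver_merge([tr, rd])
print(separate)  # ['t/1', 't/2']

# Sharing: the TableReader happens to drain the builtin's read as well.
shared = LeafTxn()
tr2 = TableReader(shared, ["t/1"])
rd2 = Renderer(EvalCtx(shared))
tr2.run()
rd2.run("desc/52")
together = receiver_merge([tr2, rd2])
print(together)  # ['t/1', 'desc/52']
```

In the shared case the builtin's read is rescued only because the `TableReader` drains after the builtin ran; if it drained first, `desc/52` would be lost again, which is the fragility described above.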
Separately, our collection of txn metadata is very haphazard. It's possible for multiple processors to collect the same piece of metadata from a shared txn. We should figure out a more principled story.
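One possible shape for a more principled story, offered only as a hypothetical sketch in Python: instead of each processor collecting from whatever txn it holds, the flow could record every leaf it hands out, including the one placed in the `EvalCtx`, and drain each leaf exactly once. `FlowLeafRegistry`, `register`, `drain`, and `haphazard_collect` are invented names, and the toy `get_meta` here returns a snapshot without clearing, which is what makes duplicate collection visible.

```python
class LeafTxn:
    """Toy leaf txn whose metadata getter returns a snapshot without clearing it."""

    def __init__(self):
        self.read_spans = []

    def get(self, key: str) -> None:
        self.read_spans.append(key)

    def get_meta(self) -> list:
        return list(self.read_spans)


def haphazard_collect(txns_held_by_processors: list) -> list:
    """Every processor collects from whatever txn it holds, so a shared leaf
    is reported once per processor."""
    out = []
    for txn in txns_held_by_processors:
        out.extend(txn.get_meta())
    return out


class FlowLeafRegistry:
    """Hypothetical: the flow records every leaf it hands out (to processors
    and to the EvalCtx) and drains each one exactly once."""

    def __init__(self):
        self._leaves = {}  # id(leaf) -> leaf; a leaf registered twice appears once

    def register(self, leaf: LeafTxn) -> LeafTxn:
        self._leaves[id(leaf)] = leaf
        return leaf

    def drain(self) -> list:
        out = []
        for leaf in self._leaves.values():
            out.extend(leaf.get_meta())
        self._leaves.clear()
        return out


shared = LeafTxn()
shared.get("t/1")
dupes = haphazard_collect([shared, shared])  # two processors share one leaf
print(dupes)  # ['t/1', 't/1']

reg = FlowLeafRegistry()
reg.register(shared)  # handed to a TableReader
reg.register(shared)  # handed to the same flow's EvalCtx
collected = reg.drain()
print(collected)  # ['t/1']
```

In this toy model, keying collection by leaf identity addresses both concerns in the issue at once: a shared leaf is reported once, and a leaf used only by a builtin is still drained as long as it was registered when it was created.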
Jira issue: CRDB-5476