-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: hazardous use of descriptors in BIND outside of explicit transaction #64140
Comments
@ajwerner I'm assigning you since it looks like you have a PR in progress -- feel free to let us know if you'd like someone else to take a look |
@rytaft I took a stab on Friday. I don't think that approach has legs. It's a bit of a pickle I don't see an easy way out of. Fortunately it seems to mostly cause problems in edge cases involving regclass or user-defined types concurrent with schema changes. I'll place it back in a triage state if my approach doesn't pan out. |
There are scenarios where the transaction can be nil, which is also problematic @ajwerner. I'm wondering if we should even hold onto leases in that case? It can lead to problems with the read timestamp related failures inside UpdateDeadline |
for the case of resolving a name into an OID/regclass, perhaps we could special-case the placeholder typing logic as a follow-up to this PR #60949 to guide our solution for that and the other issues, i will throw in a plug for our datadriven pgwire tests. you can write the exact sequence of pgwire commands you want and run them against either cockroachdb or postgres. e.g. look at |
[written in parallel with the above comment] Discussed offline with @rafiss. In the short term, the main thing we want to fix is the panic when you PREPARE/BIND something referencing UDTs or regclasses as the first thing you do on a connection. We can hack that by literally just putting a committed txn in place if txn is nil. The next step is likely to revert part of #60949 to not resolve regclass or regtype values to their numeric value. If we did that, then I think we'd fix nearly all of the correctness issues. The remaining correctness issue would be to not convert the enum values from their logical to their physical representation until execution. I think that'd solve ~all of the correctness problems. It still leaves us in a weird place where we can hit odd type mismatch errors when swapping type names (I don't care all that much about that case). The more glaring issue it leaves us with is a bizarre use of the descs.Collection while there is no open transaction. As we move forward, the descs.Collection is gaining a tighter coupling to the transaction. We'll need to negotiate that somehow. I'll pick up the panic fix and leave the rest for follow-up. |
i was curious myself about what postgres did for regclass placeholders so i tried it out. it turns out that postgres will handle
so i think reverting that part of #60949 could just bring back the incompatibility that caused us to write it in the first place -- a client might get confused that the placeholder is a different type than it expects. the other thing i noticed is that we should try to do better error handling for cases where the regclass resolution fails. e.g. if we try to bind a non-existent table name for the placeholder, cockroachdb returns anyway, for the fixes for regclass placeholders, feel free to hand off to sql-experience. |
Heh this is exactly the issue! If the descriptor doesn't exist for leasing we attempt to read it from the store using the bogus finalized transaction :). If we fixed the root cause here, we'd return the right error. |
I think what I'd want is to change is that where we call The above approach doesn't solve the type problems. |
That makes a lot of sense in retrospect! |
This further paper's over the rough edges of not resetting the planner state to keep it in sync with the connExecutor state. The test used to panic before this change. Touches cockroachdb#64140. Release note (bug fix): Fixed a bug which could cause a panic when running a EXECUTE of a previously PREPARE'd statement with a REGCLASS, REGTYPE parameter or a user-defined type argument after running BEGIN AS OF SYSTEM TIME with an invalid timestamp.
65083: kvserver: speed up TestRollbackSyncRangedIntentResolution r=tbg a=erikgrinaker Release note: None 65103: schemaexpr: remove expr_filter.go r=mgartner a=mgartner This commit removes the `pkg/sql/catalog/schemaexpr/expr_filter.go` file. The `schemaexpr` package is intended to contain logic for dealing with expressions defined in a schema, such as check constraints, computed columns, and partial index predicates. The two exported functions in the file, `RemapIVarsInTypedExpr` and `RunFilter` did not fit this theme. They have been moved elsewhere. Release note: None 65108: sql: minor fixes to fix panics related to type resolution and bad planner usage r=otan a=ajwerner See individual commits. The first commit matters but does not leave one with a good feeling. The second one is more copacetic. Relates to #64140. Fixes #64975. Release note (bug fix): Fixed a bug which could cause a panic when running a EXECUTE of a previously PREPARE'd statement with a REGCLASS, REGTYPE parameter or a user-defined type argument after running BEGIN AS OF SYSTEM TIME with an invalid timestamp. Release note (bug fix): Fixed a bug which could cause a panic when issuing a query referencing a user-defined type as a placeholder as the first operation on a new connection. 65130: cli/flags.go: fix typo in comment r=rauchenstein a=knz Thanks to @stevendanna for spotting this. Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Andrew Werner <awerner32@gmail.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
This further paper's over the rough edges of not resetting the planner state to keep it in sync with the connExecutor state. The test used to panic before this change. Touches cockroachdb#64140. Release note (bug fix): Fixed a bug which could cause a panic when running a EXECUTE of a previously PREPARE'd statement with a REGCLASS, REGTYPE parameter or a user-defined type argument after running BEGIN AS OF SYSTEM TIME with an invalid timestamp.
This further paper's over the rough edges of not resetting the planner state to keep it in sync with the connExecutor state. The test used to panic before this change. Touches cockroachdb#64140. Release note (bug fix): Fixed a bug which could cause a panic when running a EXECUTE of a previously PREPARE'd statement with a REGCLASS, REGTYPE parameter or a user-defined type argument after running BEGIN AS OF SYSTEM TIME with an invalid timestamp.
The code which deals with resolving placeholder types and OIDs is totally bogus when it comes to the transaction and leasing life-cycle. This has been discussed at some length in cockroachdb#64140. The enum test in cockroachdb#64725 surfaced another problematic case whereby the bad bind is combined with an AS OF SYSTEM TIME. We deal with this case by dropping any leases which might have been acquired by the bind operation. There are some edge cases where the types may mismatch, the values may be bogus, or an OID may be resolved to the wrong value. However, the first few of these *should* be detected and *should* be hard to hit. We're going to need to do something about all of this in 21.2 anyway, howver, this fix here is good for backport to 21.1 and 20.2. Also, this change converted the fatal to an assertion failure error. Fixes cockroachdb#65136. Relates to cockroachdb#64140. Release note (bug fix): Fixed a bug whereby using an enum value as a placeholder in an AS OF SYSTEM TIME query which precedes a recent change to that enum could result in a fatal error.
The code which deals with resolving placeholder types and OIDs is totally bogus when it comes to the transaction and leasing life-cycle. This has been discussed at some length in cockroachdb#64140. The enum test in cockroachdb#64725 surfaced another problematic case whereby the bad bind is combined with an AS OF SYSTEM TIME. We deal with this case by dropping any leases which might have been acquired by the bind operation. There are some edge cases where the types may mismatch, the values may be bogus, or an OID may be resolved to the wrong value. However, the first few of these *should* be detected and *should* be hard to hit. We're going to need to do something about all of this in 21.2 anyway, howver, this fix here is good for backport to 21.1 and 20.2. Also, this change converted the fatal to an assertion failure error. Fixes cockroachdb#65136. Relates to cockroachdb#64140. Release note (bug fix): Fixed a bug whereby using an enum value as a placeholder in an AS OF SYSTEM TIME query which precedes a recent change to that enum could result in a fatal error.
65620: sql: prevent assertion due to bad use of leases in bind r=andreimatei a=ajwerner The code which deals with resolving placeholder types and OIDs is totally bogus when it comes to the transaction and leasing life-cycle. This has been discussed at some length in #64140. The enum test in #64725 surfaced another problematic case whereby the bad bind is combined with an AS OF SYSTEM TIME. We deal with this case by dropping any leases which might have been acquired by the bind operation. There are some edge cases where the types may mismatch, the values may be bogus, or an OID may be resolved to the wrong value. However, the first few of these *should* be detected and *should* be hard to hit. We're going to need to do something about all of this in 21.2 anyway, howver, this fix here is good for backport to 21.1 and 20.2. Also, this change converted the fatal to an assertion failure error. Fixes #65136. Relates to #64140. Release note (bug fix): Fixed a bug whereby using an enum value as a placeholder in an AS OF SYSTEM TIME query which precedes a recent change to that enum could result in a fatal error. Co-authored-by: Andrew Werner <awerner32@gmail.com>
Moving this over to SQL Experience since it seems like the remaining issues will fall to that team. Send it back if you need any help from SQL Queries - thanks! |
@jameswsj10 your work to avoid the fallback reading from store by using export requests to find historical versions should be able to address the weirdness in #64140 (comment). It's just another band-aid over a bigger problem, but it's a good one. |
71632: sql: make sure pgwire bind always happens in a transaction r=otan a=rafiss fixes #70378 and maybe #64140 The approach of using a transaction matches what we do for pgwire Parse messages already. This is important to make sure that user-defined types are leased correctly. This also updated the SQL EXECUTE command to resolve user-defined types so that it gets the latest changes. Release note (bug fix): Adding new values to a user-defined enum type will previously would cause a prepared statement using that type to not work. This now works as expected. 71836: clusterversion: introduce 22.1 development versions r=celiala a=celiala Fix: #69828 Looking at past Version values for `Start{XX_X}`, I think the `Version` value for `Key: Start22_1` should instead be `Major: 21, Minor: 2, ...`. Start21_1 (from #70268): ``` // v21.1 versions. Internal versions defined here-on-forth must be even. { Key: Start21_1, Version: roachpb.Version{Major: 20, Minor: 2, Internal: 2}, }, ``` Start 21_2: ``` { Key: Start21_2, Version: roachpb.Version{Major: 21, Minor: 1, Internal: 1102}, }, ``` Release justification: Non-production code change. Release note: None 71856: bazel: skip recompilation when using dev+bazel r=irfansharif a=irfansharif Fixes #71835. When switching between using `dev` and `bazel` raw, I kept seeing our C++ protobuf dependency getting recompiled (slowly). It appears that the bazel build for protobuf has a dependency on $PATH (see bazelbuild/intellij#1169 and bazelbuild/bazel#7095). Specifying [`--incompatible_strict_action_env`](https://docs.bazel.build/versions/main/command-line-reference.html#flag--incompatible_strict_action_env) pins PATH and avoids the build cache thrashing we were seeing before. Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Celia La <celiala456@gmail.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
I think #76792 completely resolves this, but please reopen if there are still concerns. |
Describe the problem
In SQL you can bind arguments to a prepared statement outside of a transaction. This generally happens in the binary wire protocol if I understand correctly. The client will prepare a statement and then it will bind the arguments to an unnamed portal and then it will execute the portal. The prepare is safe and fine because we'll re-acquire leases inside the implicit transaction when checking if the plan is stale. The bind, however, has a hazardous and unsafe properties.
When the type of argument to a bind is either a user-defined type or refers to an OID type by name, we resolve the type and value during while doing the
BIND
. This is problematic because it will use the transaction currently sitting on theconnExecutor
. This is generally the previously committed transaction (I suspect there may be cases where this is nil but I don't know). This happens here (with the regclass/regtype resolution happening underneathtree.ParseDOid
):cockroach/pkg/sql/conn_executor_prepare.go
Lines 368 to 381 in d91205f
At that point we'll attach a deadline due to leasing to an already committed transaction. If we attach the following sane invariant to the deadline code we get failures right and left:
Worse yet are cases where we cannot use a cached, leased descriptor, like a number of system tables. For example, the following test fails rather surprisingly with this error:
Possible Solution
It seems to me like we should move the conn executor into an open state when binding (or when creating a portal?). It feels weird to me to have a bound portal outside of a transaction. I may not know enough to know though. We do this already for example when processing an
ExecCmd
.Additional context
There are some edge cases where I think we may miss renames or interpret arguments as the wrong type. It's more just a general soundness problem.
Jira issue: CRDB-6916
The text was updated successfully, but these errors were encountered: