Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: correctly type UNION and CASE expressions based on implicit casts #75103

Open
Tracked by #75101
mgartner opened this issue Jan 18, 2022 · 2 comments
Open
Tracked by #75101
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Jan 18, 2022

We currently allow branches of UNION and CASE expressions to have different types as long as they are in the same family. This behavior is inconsistent with Postgres. Instead, we should only allow branches of UNION and CASE to have different types if implicit casts are allowed to the type of the first non-NULL branch from all other types.

See the Postgres docs on type conversion for UNION and CASE.

Union

The logic for resolving the types of UNION expressions is in determineUnionType. This logic attempts to mimic Postgres's logic for some important types (see #60560). It should be updated to only allow UNIONs of with different column types if a cast from one type to the other is allowed in an implicit context.

Case

The type checking logic for CASE expressions is here.

Epic CRDB-2474
Jira issue: CRDB-12463

@mgartner mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 18, 2022
@mgartner
Copy link
Collaborator Author

Related: #74784 (comment)

mgartner added a commit to mgartner/cockroach that referenced this issue Jan 19, 2022
The newly introduced `castMap` does not contain entries for casts
between all combinations of REG* types, which is consistent with
Postgres, but inconsistent with behavior in versions up to 21.2 where
these casts are allowed.

The `castMap` changes result in more than just backward incompatibility.
We allow branches of CASE statements to be equivalent types (i.e., types
in the same family), like `REGCLASS` and `REGTYPE`, and we automatically
add casts to a query plan to support this. However, because these casts
don't exist in the `castMap`, internal errors are raised when we try to
fetch the volatility of the cast while building logical properties.

According to Postgres's type conversion rules for CASE, we should only
allow branches to be different types if they can be implicitly cast to
the first non-NULL branch. Implicit casts between REG* types are not
allowed, so CASE expressions with branches of different REG* types
should result in a user error like `CASE/WHEN could not convert type
regclass to regtype`. However, this is a much larger project and the
change will not be fully backward compatible. This work is tracked by
issue cockroachdb#75103.

For now, this commit adds casts between REG* types to the `castMap` to
maintain backward compatibility and prevent an internal error.

There is no release note because this bug does not exist in any
releases.

Fixes cockroachdb#74784

Release note: None
craig bot pushed a commit that referenced this issue Jan 19, 2022
74863: import: check readability earlier r=benbardin a=benbardin

Release note (sql change): Import now checks readability earlier for multiple files, to fail sooner if e.g. permissions are invalid.

74914: opt,tree: fix bugs with Next(), Prev(), and histogram calculation for DTimeTZ r=rytaft a=rytaft

**sql/sem/tree: fix Next() and Prev() for DTimeTZ**

Prior to this commit, the `DTimeTZ` functions `Next()` and `Prev()`
could skip over valid values according to the ordering of `DTimeTZ`
values in an index (which matches the ordering defined by the
`TimeTZ` functions `After()` and `Before()`).

This commit fixes these functions so that `Next()` now returns the smallest
valid `DTimeTZ` that is greater than the receiver, and `Prev()` returns
the largest valid `DTimeTZ` that is less than the receiver. This is
an important invariant that the optimizer relies on when building index
constraints.

Fixes #74912

Release note (bug fix): Fixed a bug that could occur when a `TIMETZ`
column was indexed, and a query predicate constrained that column using
a `<` or `>` operator with a `timetz` constant. If the column contained values
with time zones that did not match the time zone of the `timetz` constant,
it was possible that not all matching values could be returned by the
query. Specifically, the results may not have included values within one
microsecond of the predicate's absolute time. This bug was introduced
when the timetz datatype was first added in 20.1. It exists on all
versions of 20.1, 20.2, 21.1, and 21.2 prior to this patch.

**opt: fix bug in histogram calculation for TimeTZ**

This commit fixes a bug in the histogram estimation code for `TimeTZ`
that made the faulty assumption that `TimeTZ` values are ordered by
`TimeOfDay`. This is incorrect since it does not take the `OffsetSecs`
into account. As a result, it was possible to estimate that the size
of a histogram bucket was negative, which caused problems in the
statistics estimation code. This commit fixes the problem by taking
into account both `TimeOfDay` and `OffsetSecs` when estimating the size of
a bucket in a `TimeTZ` histogram.

Fixes #74667

Release note (bug fix): Fixed an internal error, "estimated row count must
be non-zero", that could occur during planning for queries over a table
with a `TimeTZ` column. This error was due to a faulty assumption in the
statistics estimation code about ordering of `TimeTZ` values, which has now
been fixed. The error could occur when `TimeTZ` values used in the query had
a different time zone offset than the `TimeTZ` values stored in the table.

75112: sql: fix casts between REG* types r=mgartner a=mgartner

The newly introduced `castMap` does not contain entries for casts
between all combinations of REG* types, which is consistent with
Postgres, but inconsistent with behavior in versions up to 21.2 where
these casts are allowed.

The `castMap` changes result in more than just backward incompatibility.
We allow branches of CASE statements to be equivalent types (i.e., types
in the same family), like `REGCLASS` and `REGTYPE`, and we automatically
add casts to a query plan to support this. However, because these casts
don't exist in the `castMap`, internal errors are raised when we try to
fetch the volatility of the cast while building logical properties.

According to Postgres's type conversion rules for CASE, we should only
allow branches to be different types if they can be implicitly cast to
the first non-NULL branch. Implicit casts between REG* types are not
allowed, so CASE expressions with branches of different REG* types
should result in a user error like `CASE/WHEN could not convert type
regclass to regtype`. However, this is a much larger project and the
change will not be fully backward compatible. This work is tracked by
issue #75103.

For now, this commit adds casts between REG* types to the `castMap` to
maintain backward compatibility and prevent an internal error.

There is no release note because this bug does not exist in any
releases.

Fixes #74784

Release note: None

75119: sql: deflake TestPerfLogging r=rytaft a=rytaft

This commit deflakes `TestPerfLogging` by ensuring that test cases
that should not produce log entries do not match with unrelated log
entries and thus cause the test to fail. This is ensured by making
the regex more precise for the specific test case.

Fixes #74811

Release note: None

75146: backupccl: "skip" TestChangefeedRestartDuringBackfill.. r=irfansharif a=irfansharif

under span configs. This test flakes pretty reliably after span configs
were enabled (#73876). Investigating this further is being tracked in
\#75080; lets have this test use the old subsystem for now (only down in
KV; we've narrowed down the failure to having something to do with
concurrent range splits, within the tenant keyspace, while a changefeed
is declared).

Release note: None

Co-authored-by: Ben Bardin <bardin@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@mgartner mgartner added this to Triage in SQL Sessions - Deprecated via automation Jan 20, 2022
@mgartner mgartner added this to Triage in SQL Queries via automation Jan 20, 2022
@blathers-crl blathers-crl bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team labels Jan 20, 2022
@mgartner mgartner moved this from Triage to Backlog in SQL Queries Jan 20, 2022
gtr pushed a commit to gtr/cockroach that referenced this issue Jan 24, 2022
The newly introduced `castMap` does not contain entries for casts
between all combinations of REG* types, which is consistent with
Postgres, but inconsistent with behavior in versions up to 21.2 where
these casts are allowed.

The `castMap` changes result in more than just backward incompatibility.
We allow branches of CASE statements to be equivalent types (i.e., types
in the same family), like `REGCLASS` and `REGTYPE`, and we automatically
add casts to a query plan to support this. However, because these casts
don't exist in the `castMap`, internal errors are raised when we try to
fetch the volatility of the cast while building logical properties.

According to Postgres's type conversion rules for CASE, we should only
allow branches to be different types if they can be implicitly cast to
the first non-NULL branch. Implicit casts between REG* types are not
allowed, so CASE expressions with branches of different REG* types
should result in a user error like `CASE/WHEN could not convert type
regclass to regtype`. However, this is a much larger project and the
change will not be fully backward compatible. This work is tracked by
issue cockroachdb#75103.

For now, this commit adds casts between REG* types to the `castMap` to
maintain backward compatibility and prevent an internal error.

There is no release note because this bug does not exist in any
releases.

Fixes cockroachdb#74784

Release note: None
@misiek08
Copy link

misiek08 commented Feb 8, 2022

If anyone can prepare some passing and failing test cases I can work on implementation :)

@rafiss rafiss moved this from Triage to Longer term backlog in SQL Sessions - Deprecated Feb 14, 2022
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Aug 25, 2022
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Jul 20, 2023
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 8, 2023
Fixes cockroachdb#102110
Informs cockroachdb#75103
Informs cockroachdb#108360

Release note (bug fix): Fixed the type resolution logic for CASE statements
to more closely match Postgres' logic. In particular, we now adhere to rule
5 listed in https://www.postgresql.org/docs/current/typeconv-union-case.html,
which requires that we select the first non-unknown input type as the candidate
type, then consider each other non-unknown input type, left to right (CASE
treats its ELSE clause (if any) as the "first" input, with the THEN clauses(s)
considered after that). If the candidate type can be implicitly converted to
the other type, but not vice-versa, select the other type as the new candidate
type. Then continue considering the remaining inputs. If, at any stage of this
process, a preferred type is selected, stop considering additional inputs (note
that CockroachDB does not yet support the concept of a "preferred type").
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 24, 2023
Fixes cockroachdb#102110
Informs cockroachdb#75103
Informs cockroachdb#108360

Release note (bug fix): Fixed the type resolution logic for CASE statements
to more closely match Postgres' logic. In particular, we now adhere to rule
5 listed in https://www.postgresql.org/docs/current/typeconv-union-case.html,
which requires that we select the first non-unknown input type as the candidate
type, then consider each other non-unknown input type, left to right (CASE
treats its ELSE clause (if any) as the "first" input, with the THEN clauses(s)
considered after that). If the candidate type can be implicitly converted to
the other type, but not vice-versa, select the other type as the new candidate
type. Then continue considering the remaining inputs. If, at any stage of this
process, a preferred type is selected, stop considering additional inputs (note
that CockroachDB does not yet support the concept of a "preferred type").
craig bot pushed a commit that referenced this issue Aug 24, 2023
108387: sql: fix typing of CASE statements to match Postgres r=rytaft a=rytaft

Fixes #102110
Informs #75103
Informs #108360

Release note (bug fix): Fixed the type resolution logic for `CASE` statements to more closely match Postgres' logic. In particular, we now adhere to rule 5 listed in https://www.postgresql.org/docs/current/typeconv-union-case.html, which requires that we select the first non-unknown input type as the candidate type, then consider each other non-unknown input type, left to right (`CASE` treats its `ELSE` clause (if any) as the "first" input, with the `THEN` clauses(s) considered after that). If the candidate type can be implicitly converted to the other type, but not vice-versa, select the other type as the new candidate type. Then continue considering the remaining inputs. If, at any stage of this process, a preferred type is selected, stop considering additional inputs (note that CockroachDB does not yet support the concept of a "preferred type").

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@yuzefovich yuzefovich added the T-sql-queries SQL Queries Team label May 2, 2024
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label May 2, 2024
@yuzefovich yuzefovich added the T-sql-queries SQL Queries Team label May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team
Projects
Status: Backlog
SQL Queries
New Backlog
SQL Sessions - Deprecated
Longer term backlog
Development

No branches or pull requests

3 participants