Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: propagate result types to inputs during set ops with nulls and tuples #60827

Merged
merged 2 commits into from Feb 23, 2021

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Feb 20, 2021

sql: move MergeResultTypes from physicalplan package and unexport

Release note: None

sql: propagate result types to inputs in set ops with nulls and tuples

The physical planning for set operations is special because it allows
for the result types from its inputs to be of different type family in
some cases. Namely, it is ok if one input has types.Unknown whereas
the other input is of some "known" type. In order to handle such case
the execbuilder plans casts; however, currently it is not possible to
plan casts to Tuple type. As a result, the vectorized engine is not
able to execute the query because it expects the data coming from both
inputs to be of the same type.

This commit modifies the logic of reconciling the result types from both
inputs to additionally update unknown types on the inputs if the other
input is of a Tuple type. Such behavior is acceptable because in the
original plan we only expected to have NULL values, and the Tuple type
is able to handle such case too.

The problem can also be reproduced in a logic test with the row-by-row
engine in the fakedist setting. This bug was introduced in
a76ee31, and before that change we had
the following behavior: overwrite plan.ResultTypes to the merged types
and then plan a stage of distinct processors if we have UNION. This
resulted in the input spec for the input stream to distinct having
correctly set types. After the change, that was no longer the case.

Before that change the vectorized engine couldn't handle such queries
due to type mismatch during SupportsVectorized check, so we fell back
to the row-by-row engine. However, this commit makes it so that the
vectorized engine is able to execute such queries.

Fixes: #59611.

Release note (bug fix): CockroachDB previously could encounter an
internal error when performing UNION operation when the first input
resulted only in NULL values and consequent inputs produce tuples, and
this is now fixed. Only 21.1 alpha versions are affected.

@yuzefovich yuzefovich requested review from RaduBerinde and a team February 20, 2021 02:29
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the set-op-unknown branch 2 times, most recently from de86750 to cfd7881 Compare February 20, 2021 03:39
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the optimizer add casts when one side is Unknown and one side isn't? At the very least, the quoted example SELECT NULL UNION SELECT 1 doesn't have a problem:

demo@127.0.0.1:26257/defaultdb> explain (types) select a from (select null) as t(a) union select 1;
                            info
-------------------------------------------------------------
  distribution: local
  vectorized: false

  • union
  │ columns: (a int)
  │ estimated row count: 2
  │
  ├── • values
  │     columns: (a int)
  │     size: 1 column, 1 row
  │     row 0, expr 0: (CAST((NULL)[unknown] AS INT8))[int]
  │
  └── • values
        columns: ("?column?" int)
        size: 1 column, 1 row
        row 0, expr 0: (1)[int]

The problem in the example is that we can't plan casts to tuple:

// TODO(radu): casts to Tuple are not supported (they can't be serialized

If we want to make this fix until that problem is solved, that's ok, but it should be clearly documented that it only addresses this narrow case.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@yuzefovich yuzefovich force-pushed the set-op-unknown branch 2 times, most recently from f7abd39 to 7b72e9a Compare February 22, 2021 18:19
The physical planning for set operations is special because it allows
for the result types from its inputs to be of different type family in
some cases. Namely, it is ok if one input has `types.Unknown` whereas
the other input is of some "known" type. In order to handle such case
the execbuilder plans casts; however, currently it is not possible to
plan casts to `Tuple` type. As a result, the vectorized engine is not
able to execute the query because it expects the data coming from both
inputs to be of the same type.

This commit modifies the logic of reconciling the result types from both
inputs to additionally update unknown types on the inputs if the other
input is of a Tuple type. Such behavior is acceptable because in the
original plan we only expected to have NULL values, and the Tuple type
is able to handle such case too.

The problem can also be reproduced in a logic test with the row-by-row
engine in the `fakedist` setting. This bug was introduced in
a76ee31, and before that change we had
the following behavior: overwrite `plan.ResultTypes` to the merged types
and then plan a stage of distinct processors if we have UNION. This
resulted in the input spec for the input stream to distinct having
correctly set types. After the change, that was no longer the case.

Before that change the vectorized engine couldn't handle such queries
due to type mismatch during `SupportsVectorized` check, so we fell back
to the row-by-row engine. However, this commit makes it so that the
vectorized engine is able to execute such queries.

Release note (bug fix): CockroachDB previously could encounter an
internal error when performing UNION operation when the first input
resulted only in NULL values and consequent inputs produce tuples, and
this is now fixed. Only 21.1 alpha versions are affected.
@yuzefovich yuzefovich changed the title sql: propagate result types to inputs during set operations sql: propagate result types to inputs during set ops with nulls and tuples Feb 22, 2021
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I think you're right, my initial explanation was too wide, updated.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 23, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 23, 2021

Build succeeded:

@craig craig bot merged commit da99e33 into cockroachdb:master Feb 23, 2021
@yuzefovich yuzefovich deleted the set-op-unknown branch February 23, 2021 03:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: sqlsmith cannot use tuple on datumVec of type unknown
3 participants