[SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs #46451

Closed
wants to merge 4 commits into from

@dtenedor dtenedor commented May 7, 2024

What changes were proposed in this pull request?

This PR improves the error message shown when a table-valued function call passes a TABLE argument whose PARTITION BY or ORDER BY clause lists more than one expression without enclosing those expressions in parentheses.

For example:

SELECT * FROM testUDTF(
  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
  WITH SINGLE PARTITION
  ORDER BY device_id, data_ds)

This query previously returned an obscure, unrelated error:

[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] Unsupported subquery expression: Table arguments are used in a function where they are not supported:
'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], false
   +- Project [1 AS device_id#336, 2 AS data_ds#337]
      +- OneRowRelation

Now it returns a reasonable error:

The table function call includes a table argument with an invalid partitioning/ordering specification: the ORDER BY clause included multiple expressions without parentheses surrounding them; please add parentheses around these expressions and then retry the query again. (line 4, pos 2)

== SQL ==

SELECT * FROM testUDTF(
  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
  WITH SINGLE PARTITION
--^^^
  ORDER BY device_id, data_ds)
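
For reference, here is a sketch of the corrected call, using the parenthesized form that the new message asks for. It assumes `testUDTF` (the test function from the example above) is already registered; the second query is an illustrative assumption showing the same parenthesization applied to a multi-expression PARTITION BY clause, which this PR also covers.

```sql
-- ORDER BY with multiple expressions: wrap them in parentheses.
SELECT * FROM testUDTF(
  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
  WITH SINGLE PARTITION
  ORDER BY (device_id, data_ds));

-- The same rule applied to PARTITION BY with multiple expressions.
SELECT * FROM testUDTF(
  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
  PARTITION BY (device_id, data_ds)
  ORDER BY (device_id, data_ds));
```

A clause with a single expression, such as `ORDER BY device_id`, should not need the parentheses.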

Why are the changes needed?

Forgetting these parentheses is a common SQL syntax mistake, and the previous error gave no hint about the actual cause or how to fix it.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds test coverage.

Was this patch authored or co-authored using generative AI tooling?

No

dtenedor commented May 7, 2024

cc @ueshin @allisonwang-db

dtenedor commented May 8, 2024

@HyukjinKwon I fixed the test failures; it should work now :)

@allisonwang-db allisonwang-db left a comment

Thanks for the fix!

|SELECT * FROM testUDTF(
| TABLE(SELECT 1 AS device_id, 2 AS data_ds)
| WITH SINGLE PARTITION
| ORDER BY device_id, data_ds)

Contributor

Should the correct syntax be ORDER BY (device_id, data_ds)? Do we want to add an example in the error message?

Contributor Author

Yes, that should be the correct syntax.
Interesting idea. After trying this out with some simple and complex cases, I am wary of copying the entire provided ORDER BY clause into the error message, since it could be very long if there are many columns or complex expressions. The new error message already says specifically to add parentheses around the expressions, so it should be pretty clear (see L381-L384 below).

Contributor

SGTM!

@HyukjinKwon
Member

Merged to master.

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…s parentheses around multiple PARTITION/ORDER BY exprs

Closes apache#46451 from dtenedor/udtf-analyzer-bug.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>