Skip to content

[SPARK-47251][PYTHON] Block invalid types from the args argument for sql command#45361

Closed
ueshin wants to merge 1 commit into
apache:masterfrom
ueshin:issues/SPARK-47251/args
Closed

[SPARK-47251][PYTHON] Block invalid types from the args argument for sql command#45361
ueshin wants to merge 1 commit into
apache:masterfrom
ueshin:issues/SPARK-47251/args

Conversation

@ueshin
Copy link
Copy Markdown
Member

@ueshin ueshin commented Mar 1, 2024

What changes were proposed in this pull request?

Blocks invalid types from the args argument for sql command.

Additionally, uses a PySparkValueError instead of assertions in Spark Connect.

Why are the changes needed?

Currently the vanilla PySpark accepts any value, even set, which could cause unexpected results because set won't guarantee the item order.

>>> spark.sql("select ?, ?, ?", args={2,1,3}).show()
+---+---+---+
|  1|  2|  3|
+---+---+---+
|  1|  2|  3|
+---+---+---+

whereas with list:

>>> spark.sql("select ?, ?, ?", args=[2,1,3]).show()
+---+---+---+
|  2|  1|  3|
+---+---+---+
|  2|  1|  3|
+---+---+---+

Does this PR introduce any user-facing change?

Yes, the args argument for sql command will only accept None, dict, or list.

How was this patch tested?

Added the related tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon
Copy link
Copy Markdown
Member

Merged to master.

TakawaAkirayo pushed a commit to TakawaAkirayo/spark that referenced this pull request Mar 4, 2024
…r `sql` command

### What changes were proposed in this pull request?

Blocks invalid types from the `args` argument for `sql` command.

Additionally, uses a `PySparkValueError` instead of assertions in Spark Connect.

### Why are the changes needed?

Currently the vanilla PySpark accepts any value, even `set`, which could cause unexpected results because `set` won't guarantee the item order.

```py
>>> spark.sql("select ?, ?, ?", args={2,1,3}).show()
+---+---+---+
|  1|  2|  3|
+---+---+---+
|  1|  2|  3|
+---+---+---+
```

whereas with list:

```py
>>> spark.sql("select ?, ?, ?", args=[2,1,3]).show()
+---+---+---+
|  2|  1|  3|
+---+---+---+
|  2|  1|  3|
+---+---+---+
```

### Does this PR introduce _any_ user-facing change?

Yes, the `args` argument for `sql` command will only accept `None`, `dict`, or `list`.

### How was this patch tested?

Added the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45361 from ueshin/issues/SPARK-47251/args.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
…r `sql` command

### What changes were proposed in this pull request?

Blocks invalid types from the `args` argument for `sql` command.

Additionally, uses a `PySparkValueError` instead of assertions in Spark Connect.

### Why are the changes needed?

Currently the vanilla PySpark accepts any value, even `set`, which could cause unexpected results because `set` won't guarantee the item order.

```py
>>> spark.sql("select ?, ?, ?", args={2,1,3}).show()
+---+---+---+
|  1|  2|  3|
+---+---+---+
|  1|  2|  3|
+---+---+---+
```

whereas with list:

```py
>>> spark.sql("select ?, ?, ?", args=[2,1,3]).show()
+---+---+---+
|  2|  1|  3|
+---+---+---+
|  2|  1|  3|
+---+---+---+
```

### Does this PR introduce _any_ user-facing change?

Yes, the `args` argument for `sql` command will only accept `None`, `dict`, or `list`.

### How was this patch tested?

Added the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45361 from ueshin/issues/SPARK-47251/args.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Mar 5, 2024
…entation

### What changes were proposed in this pull request?

This is a follow-up of #45361.

Use `__name__` instead of string representation for the error parameter.

### Why are the changes needed?

To be consistent with other places.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45393 from ueshin/issues/SPARK-47251/fup.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants