[SPARK-44822][PYTHON][SQL] Make Python UDTFs by default non-deterministic #42519


allisonwang-db

What changes were proposed in this pull request?

This PR changes the default determinism of Python UDTFs from True to False.

Why are the changes needed?

To prevent potential regressions: many Python UDTFs are used non-deterministically in practice, so non-deterministic is the safer default. Users can still explicitly mark a UDTF as deterministic.

Does this PR introduce any user-facing change?

No. Python UDTF is a new feature that is not yet released.

How was this patch tested?

Existing and new tests


allisonwang-db commented Aug 16, 2023

cc @ueshin @cloud-fan


cloud-fan commented Aug 17, 2023

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in fce83d4 Aug 17, 2023
cloud-fan pushed a commit that referenced this pull request Aug 17, 2023
…stic

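### What changes were proposed in this pull request?

This PR changes the default determinism of Python UDTFs from `True` to `False`.

### Why are the changes needed?

To prevent potential regressions: many Python UDTFs are used non-deterministically in practice, so non-deterministic is the safer default. Users can still explicitly mark a UDTF as deterministic.

### Does this PR introduce _any_ user-facing change?

No. Python UDTF is a new feature that is not yet released.

### How was this patch tested?

Existing and new tests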

Closes #42519 from allisonwang-db/spark-44822-non-det-by-default.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit fce83d4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
wangyum pushed a commit that referenced this pull request Aug 18, 2023
…lts of the udtf tests

### What changes were proposed in this pull request?

This is a follow-up to #42517.
The analyzer results for the UDTF tests need to be regenerated now that #42519 is merged. This change also updates PythonUDTFSuite now that #42520 is merged.

### Why are the changes needed?

To fix test failures

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test-only change

Closes #42543 from allisonwang-db/spark-44834-fix.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
wangyum pushed a commit that referenced this pull request Aug 18, 2023
…lts of the udtf tests

(cherry picked from commit bb41cd8)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
…stic

valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
…lts of the udtf tests

ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
…stic

ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
…lts of the udtf tests
