[SPARK-44648][SQL][PYTHON] Add memory limits for Python UDTF analyzer #42328

Closed
wants to merge 4 commits into from

ueshin
Member

@ueshin ueshin commented Aug 3, 2023

What changes were proposed in this pull request?

Adds a memory limit for the Python UDTF analyzer, controlled by a new configuration:

  • spark.sql.analyzer.pythonUDTF.analyzeInPython.memory (None by default)

The amount of memory to allocate to PySpark for the Python UDTF analyzer, in MiB unless otherwise specified. If set, PySpark memory for the Python UDTF analyzer is limited to this amount. If not set, Spark does not limit Python's memory use, and it is up to the application to avoid exceeding the overhead memory space shared with other non-JVM processes.
Note: Windows does not support resource limiting, and the limit is not actually enforced on macOS.
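
For illustration only, here is a minimal sketch of how a Python process could turn a setting like the one above into an OS-level limit. It is not the code from this PR: it assumes the limit is enforced through the standard library `resource` module with `RLIMIT_AS`, and the helper names (`parse_memory_mib`, `choose_address_space_limit`, `apply_memory_limit`) are hypothetical. The parser is also a simplification that only handles bare numbers (read as MiB) and `k`/`m`/`g`/`t` suffixes.

```python
import re

# Multipliers from each supported suffix to KiB; a bare number means MiB.
_UNIT_TO_KIB = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def parse_memory_mib(value):
    """Parse strings such as '512', '512m' or '2g' into whole MiB (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*(?:([kmgt])b?)?\s*", value.lower())
    if match is None:
        raise ValueError(f"unsupported memory string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_KIB[unit or "m"] // 1024


def choose_address_space_limit(memory_mib, current_soft, infinity):
    """Return the new limit in bytes, or None if the current limit should be kept."""
    if memory_mib is None or memory_mib <= 0:
        return None  # config not set: leave Python's memory use unlimited
    new_limit = memory_mib * 1024 * 1024
    # Only ever tighten the limit; never raise one that is already stricter.
    if current_soft == infinity or new_limit < current_soft:
        return new_limit
    return None


def apply_memory_limit(memory_mib):
    """Try to cap this process's address space; return True if a limit was set."""
    try:
        import resource  # not available on Windows
    except ImportError:
        return False
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    new_limit = choose_address_space_limit(memory_mib, soft, resource.RLIM_INFINITY)
    if new_limit is None:
        return False
    try:
        resource.setrlimit(resource.RLIMIT_AS, (new_limit, new_limit))
    except (ValueError, OSError):
        # Some platforms reject or ignore the request.
        return False
    return True
```

A caller would run something like `apply_memory_limit(parse_memory_mib("512m"))` once at process start. Keeping the decision logic in a pure function makes the "only tighten, never loosen" rule easy to check without changing the limits of the process under test.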

Why are the changes needed?

It should be possible to limit the memory used by the Python UDTF analyzer.

Does this PR introduce any user-facing change?

Yes. Users can set a memory limit for the Python UDTF analyzer through the new configuration.

How was this patch tested?

Existing tests.

@ueshin
Member Author

ueshin commented Aug 3, 2023

cc @allisonwang-db @HyukjinKwon

@zhengruifeng
Contributor

merged to master

There is a merge conflict; it seems a separate PR is needed for 3.5 (if needed) @ueshin

@zhengruifeng
Contributor

oh, this config is added since version("4.0.0"), ignore ^

Contributor

@allisonwang-db allisonwang-db left a comment

Thanks for adding this!

ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024

Closes apache#42328 from ueshin/issues/SPARK-44648/memory_limits.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>