Skip to content

[SPARK-51688][PYTHON][FOLLOW-UP] Implement UDS in Accumulators#50587

Closed
HyukjinKwon wants to merge 2 commits intoapache:masterfrom
HyukjinKwon:SPARK-51688-followup
Closed

[SPARK-51688][PYTHON][FOLLOW-UP] Implement UDS in Accumulators#50587
HyukjinKwon wants to merge 2 commits intoapache:masterfrom
HyukjinKwon:SPARK-51688-followup

Conversation

@HyukjinKwon
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

This PR is a followup of #50466 that enables Unix Domain Sockets in Accumulators as well. This will be the last PR to complete UDS in PySpark.

Why are the changes needed?

This was not handled in the original PR as it was a bit complicated. Was separated out. See the original PR.

Does this PR introduce any user-facing change?

See #50466

How was this patch tested?

CI in this PR with enabling it by default. Also will set up a scheduled build through #50585

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon HyukjinKwon changed the title Spark 51688 followup [SPARK-51688][PYTHON][FOLLOW-UP] Implement UDS in Accumulators Apr 15, 2025
@HyukjinKwon
Copy link
Copy Markdown
Member Author

cc @ueshin

@HyukjinKwon HyukjinKwon force-pushed the SPARK-51688-followup branch from f906f50 to 384a3ae Compare April 15, 2025 04:21
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will disable this back when I merge. It will be disabled by default.

@HyukjinKwon HyukjinKwon force-pushed the SPARK-51688-followup branch from 384a3ae to 2817ba4 Compare April 15, 2025 04:34
@HyukjinKwon HyukjinKwon force-pushed the SPARK-51688-followup branch from 2817ba4 to 489e564 Compare April 15, 2025 04:36
@HyukjinKwon
Copy link
Copy Markdown
Member Author

HyukjinKwon commented Apr 15, 2025

Tests with the configuration enabled: https://github.com/HyukjinKwon/spark/actions/runs/14461375675

Copy link
Copy Markdown
Contributor

@zhengruifeng zhengruifeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to also add jobs core/sql/etc in the new scheduled job https://github.com/apache/spark/pull/50585/files ?

seems core is already tested https://github.com/apache/spark/actions/runs/14462775938/job/40558388704

@HyukjinKwon
Copy link
Copy Markdown
Member Author

Merged to master.

Copy link
Copy Markdown
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late LGTM.

HyukjinKwon added a commit that referenced this pull request Apr 17, 2025
…variable in SparkContext initialization

### What changes were proposed in this pull request?

This PR is a followup of #50587 that makes the SparkContext initialization to respect `PYSPARK_UDS_MODE` environment variable being used in the scheduled build.

### Why are the changes needed?

To make the scheduled build passing (https://github.com/apache/spark/actions/workflows/build_uds.yml)

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested via:

```bash
PYSPARK_UDS_MODE=true ./python/run-tests --python-executables=python3 --testnames "pyspark.sql.tests.test_udf UDFTests.test_same_accumulator_in_udfs"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50616 from HyukjinKwon/SPARK-51800-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
if (socket == null || !socket.isOpen) {
if (isUnixDomainSock) {
socket = SocketChannel.open(UnixDomainSocketAddress.of(socketPath.get))
logInfo(log"Connected to AccumulatorServer at socket: ${MDC(SOCKET_ADDRESS, serverHost)}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serverHost.get

logInfo(log"Connected to AccumulatorServer at socket: ${MDC(SOCKET_ADDRESS, serverHost)}")
} else {
socket = SocketChannel.open(new InetSocketAddress(serverHost.get, serverPort.get))
logInfo(log"Connected to AccumulatorServer at host: ${MDC(HOST, serverHost)}" +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serverHost.get

} else {
socket = SocketChannel.open(new InetSocketAddress(serverHost.get, serverPort.get))
logInfo(log"Connected to AccumulatorServer at host: ${MDC(HOST, serverHost)}" +
log" port: ${MDC(PORT, serverPort)}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serverPort.get

@HyukjinKwon
Copy link
Copy Markdown
Member Author

Thanks, made a Pr at #50648

HyukjinKwon added a commit that referenced this pull request Apr 21, 2025
### What changes were proposed in this pull request?

This PR is a followup of #50587 that logs strings properly instead of `Option`s

### Why are the changes needed?

To log strings properly instead of option instances.

### Does this PR introduce _any_ user-facing change?

No the main change has not been released yet.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50648 from HyukjinKwon/SPARK-51688-followup3.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants