
[SPARK-50718][PYTHON] Support addArtifact(s) for PySpark #49572

Closed

itholic wants to merge 5 commits into apache:master from itholic:add_artifacts

Conversation


@itholic itholic commented Jan 20, 2025

What changes were proposed in this pull request?

This PR proposes to support addArtifact(s) for PySpark

Why are the changes needed?

For feature parity with Spark Connect

Does this PR introduce any user-facing change?

No changes to existing APIs, but this adds the new API addArtifact(s)

How was this patch tested?

Added corresponding UTs with Spark Connect

Was this patch authored or co-authored using generative AI tooling?

No
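The effect of the new API can be illustrated without a running Spark cluster. The real call would be something like spark.addArtifacts(path, pyfile=True); the stdlib-only sketch below (the file name my_artifact.py is made up for illustration) mimics what that achieves, namely making a .py file importable where the code runs:

```python
import os
import sys
import tempfile

# Illustrative stand-in for what addArtifacts(path, pyfile=True) achieves:
# the .py file becomes importable by Python workers. Here we mimic that
# locally by writing the file and putting its directory on sys.path.
artifact_dir = tempfile.mkdtemp()
pyfile_path = os.path.join(artifact_dir, "my_artifact.py")
with open(pyfile_path, "w") as f:
    f.write("my_func = lambda: 11")

sys.path.insert(0, artifact_dir)
import my_artifact  # now importable, as it would be on an executor

print(my_artifact.my_func())  # prints 11
```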

@github-actions github-actions bot added DOCS and removed CONNECT labels Jan 20, 2025
import os
import tempfile

from pyspark.sql.tests.connect.client.test_artifact import ArtifactTestsMixin
@itholic (Contributor Author)
Use pyspark.sql.tests.connect.client.test_artifact.ArtifactTestsMixin here to ensure the same testing as Spark Connect


@HyukjinKwon HyukjinKwon Jan 20, 2025


I think ArtifactTestsMixin should be located here.


@itholic itholic Jan 21, 2025


Could I address the relocation as a follow-up? I plan to restructure test_artifact.py overall across Classic and Connect.
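The test-sharing pattern being discussed can be sketched with stdlib unittest alone. The class and method names below are illustrative, not the real PySpark ones: a mixin carries the shared test bodies, and each backend's concrete class supplies its own session so Classic and Connect run identical assertions.

```python
import unittest

class ArtifactTestsMixin:
    # Shared test body; each concrete class provides make_session().
    def test_artifact_backend(self):
        session = self.make_session()
        self.assertIn(session["backend"], {"classic", "connect"})

class ClassicArtifactTests(ArtifactTestsMixin, unittest.TestCase):
    def make_session(self):
        return {"backend": "classic"}

class ConnectArtifactTests(ArtifactTestsMixin, unittest.TestCase):
    def make_session(self):
        return {"backend": "connect"}

suite = unittest.TestSuite(
    [
        ClassicArtifactTests("test_artifact_backend"),
        ConnectArtifactTests("test_artifact_backend"),
    ]
)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun)  # prints 2 — the same test ran once per backend
```

The mixin deliberately does not inherit unittest.TestCase itself, so it is never collected and run on its own.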

with open(pyfile_path, "w+") as f:
f.write("my_func = lambda: 11")

with self.assertRaises(PySparkRuntimeError) as pe:

@itholic itholic Jan 20, 2025


FYI: Spark Connect raises SparkConnectGrpcException instead of PySparkRuntimeError here.

I think it would be good to find more cases like this and capture them as the same base error, to keep consistency between Classic and Connect.
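One way to get that consistency, sketched below with made-up class names (the real PySpark error hierarchy differs), is to have each backend-specific error share a common base class that callers can catch uniformly:

```python
class ArtifactError(Exception):
    """Hypothetical common base for both backends' failures."""

class ClassicRuntimeError(ArtifactError):
    pass

class ConnectGrpcError(ArtifactError):
    pass

def add_artifact(path, backend):
    # Simulate each backend rejecting a duplicate artifact with its own
    # exception type.
    if backend == "classic":
        raise ClassicRuntimeError(f"duplicate artifact: {path}")
    raise ConnectGrpcError(f"duplicate artifact: {path}")

caught = []
for backend in ("classic", "connect"):
    try:
        add_artifact("udf.py", backend)
    except ArtifactError as e:  # one except clause covers both backends
        caught.append(type(e).__name__)

print(caught)  # prints ['ClassicRuntimeError', 'ConnectGrpcError']
```

Tests (and user code) can then assert on the base class and stay agnostic about which backend raised.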

if os.path.exists(target_dir):
# Compare the contents of the files. If identical, skip adding this file.
# If different, raise an exception.
if filecmp.cmp(normalized_path, target_dir, shallow=False):
Member

To be clear, this is the same behaviour as Spark Classic, right?

@itholic (Contributor Author)

Yes, this matches the behavior of the Spark Connect Python client and Spark Core.
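The skip-or-raise logic in the excerpt above can be sketched as a self-contained helper; the function name and error message here are illustrative, not the real implementation:

```python
import filecmp
import os
import shutil
import tempfile

def add_file(src, target_dir):
    # If a file of the same name was already added: skip when the
    # contents are byte-identical, raise when they differ.
    dest = os.path.join(target_dir, os.path.basename(src))
    if os.path.exists(dest):
        if filecmp.cmp(src, dest, shallow=False):
            return "skipped"
        raise RuntimeError(
            f"{os.path.basename(src)} already added with different contents"
        )
    shutil.copy(src, dest)
    return "added"

work, target = tempfile.mkdtemp(), tempfile.mkdtemp()
src = os.path.join(work, "dep.py")
with open(src, "w") as f:
    f.write("x = 1")

print(add_file(src, target))  # prints added
print(add_file(src, target))  # prints skipped
```

Note shallow=False forces a byte-by-byte comparison rather than trusting os.stat() signatures, which matters because a freshly copied file can have differing metadata but identical contents.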

@itholic itholic closed this in 48acded Jan 21, 2025
itholic added a commit to itholic/spark that referenced this pull request Jan 21, 2025
### What changes were proposed in this pull request?

This PR proposes to support `addArtifact(s)` for PySpark

### Why are the changes needed?

For feature parity with Spark Connect

### Does this PR introduce _any_ user-facing change?

No API changes, but adding new API `addArtifact(s)`

### How was this patch tested?

Added corresponding UTs with Spark Connect

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#49572 from itholic/add_artifacts.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>

itholic commented Jan 21, 2025

Merged to master and created separate PR for branch-4.0: #49583.

Thanks @HyukjinKwon for the review.

itholic added a commit that referenced this pull request Jan 21, 2025
### What changes were proposed in this pull request?

This PR proposes to support `addArtifact(s)` for PySpark

Cherry-pick #49572 for 4.0

### Why are the changes needed?

For feature parity with Spark Connect

### Does this PR introduce _any_ user-facing change?

No API changes, but adding new API `addArtifact(s)`

### How was this patch tested?

Added corresponding UTs with Spark Connect

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #49583 from itholic/add_artifacts_4.0.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>
HyukjinKwon pushed a commit that referenced this pull request Apr 16, 2025
…t_artifact`

### What changes were proposed in this pull request?
this test was added in #49572, but never enabled in ci

### Why are the changes needed?
test coverage

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #50601 from zhengruifeng/add_missing_test_artifact.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
### What changes were proposed in this pull request?

This PR proposes to support `addArtifact(s)` for PySpark

Cherry-pick apache#49572 for 4.0

### Why are the changes needed?

For feature parity with Spark Connect

### Does this PR introduce _any_ user-facing change?

No API changes, but adding new API `addArtifact(s)`

### How was this patch tested?

Added corresponding UTs with Spark Connect

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#49583 from itholic/add_artifacts_4.0.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>


2 participants