[SPARK-50718][PYTHON] Support addArtifact(s) for PySpark #49572
itholic wants to merge 5 commits into apache:master
import os
import tempfile

from pyspark.sql.tests.connect.client.test_artifact import ArtifactTestsMixin
Use pyspark.sql.tests.connect.client.test_artifact.ArtifactTestsMixin here to ensure the same testing as Spark Connect
I think ArtifactTestsMixin should be relocated here.
Could I address the relocation as a follow-up? I have a plan to restructure test_artifact.py as a whole across Classic and Connect.
with open(pyfile_path, "w+") as f:
    f.write("my_func = lambda: 11")

with self.assertRaises(PySparkRuntimeError) as pe:
FYI: Spark Connect raises SparkConnectGrpcException instead of PySparkRuntimeError here.
I think it would be good to find more cases like this and catch them under a common base error, to keep Classic and Connect consistent.
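The suggestion above could be sketched as a shared test that asserts on a common base class rather than on a mode-specific error. This is only an illustration: `BaseSparkError`, `ClassicError`, `ConnectError`, and `add_conflicting_artifact` are hypothetical stand-ins defined locally, not PySpark classes or APIs.

```python
import unittest


class BaseSparkError(Exception):
    """Hypothetical stand-in for a base error shared by both modes."""


class ClassicError(BaseSparkError):
    """Hypothetical stand-in for the error raised in Spark Classic."""


class ConnectError(BaseSparkError):
    """Hypothetical stand-in for the error raised in Spark Connect."""


def add_conflicting_artifact(mode: str) -> None:
    """Hypothetical helper that fails the way each mode would."""
    if mode == "classic":
        raise ClassicError("duplicate artifact")
    raise ConnectError("duplicate artifact")


class SharedArtifactTests(unittest.TestCase):
    def test_duplicate_artifact_raises_base_error(self):
        # Asserting on the shared base class lets one test body cover both modes.
        for mode in ("classic", "connect"):
            with self.subTest(mode=mode):
                with self.assertRaises(BaseSparkError):
                    add_conflicting_artifact(mode)
```

With a shared base class, a mixin such as ArtifactTestsMixin would not need a per-mode branch for the expected exception type.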
if os.path.exists(target_dir):
    # Compare the contents of the files. If identical, skip adding this file.
    # If different, raise an exception.
    if filecmp.cmp(normalized_path, target_dir, shallow=False):
To be clear, this is the same behaviour as in Spark Classic, right?
Yes, this matches the behavior of the Spark Connect Python client and Spark Core.
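The duplicate check discussed in this thread can be sketched in isolation as follows. This is a minimal standalone illustration using only the standard library, not the code from the PR: `copy_artifact`, its parameters, and the `RuntimeError` are assumptions made for the example.

```python
import filecmp
import os
import shutil


def copy_artifact(src_path: str, target_path: str) -> bool:
    """Copy src_path to target_path.

    Returns True if the file was copied and False if an identical file
    already existed. Raises RuntimeError if a file with different
    contents is already present at target_path.
    """
    if os.path.exists(target_path):
        # shallow=False compares file contents, not just os.stat() metadata.
        if filecmp.cmp(src_path, target_path, shallow=False):
            return False  # identical contents: skip adding this file
        raise RuntimeError(
            f"Artifact already exists with different contents: {target_path}"
        )
    # Create the parent directory if needed, then copy the file.
    os.makedirs(os.path.dirname(target_path) or ".", exist_ok=True)
    shutil.copyfile(src_path, target_path)
    return True
```

Adding the same file twice is then a no-op, while adding a different file under the same name fails loudly instead of silently overwriting it.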
### What changes were proposed in this pull request?
This PR proposes to support `addArtifact(s)` for PySpark

### Why are the changes needed?
For feature parity with Spark Connect

### Does this PR introduce _any_ user-facing change?
No API changes, but adding new API `addArtifact(s)`

### How was this patch tested?
Added corresponding UTs with Spark Connect

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#49572 from itholic/add_artifacts.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>
Merged to master and created a separate PR for branch-4.0: #49583. Thanks @HyukjinKwon for the review.
### What changes were proposed in this pull request?
This PR proposes to support `addArtifact(s)` for PySpark

Cherry-pick #49572 for 4.0

### Why are the changes needed?
For feature parity with Spark Connect

### Does this PR introduce _any_ user-facing change?
No API changes, but adding new API `addArtifact(s)`

### How was this patch tested?
Added corresponding UTs with Spark Connect

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49583 from itholic/add_artifacts_4.0.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>
…t_artifact`

### What changes were proposed in this pull request?
This test was added in #49572, but never enabled in CI

### Why are the changes needed?
Test coverage

### Does this PR introduce _any_ user-facing change?
No, test-only

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50601 from zhengruifeng/add_missing_test_artifact.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>