
[SPARK-57192][PYSPARK] Fix classic addArtifacts with multiple paths#56245

Closed
wbo4958 wants to merge 1 commit into apache:master from wbo4958:fix-addartifacts

wbo4958 (Contributor) commented Jun 1, 2026

### What changes were proposed in this pull request?

This PR fixes classic PySpark `SparkSession.addArtifacts` to handle multiple paths in one call.

The classic implementation previously forwarded all paths as positional arguments to `SparkContext.addPyFile`, `SparkContext.addArchive`, or `SparkContext.addFile`. Those APIs accept one path at a time, so this PR updates classic `addArtifacts` to call the underlying `SparkContext` API once per path.
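The per-path dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `FakeSparkContext` and `add_artifacts` are hypothetical stand-ins that only record calls, and the check that exactly one of `pyfile`, `archive`, or `file` is set is an assumption made for the sketch.

```python
from typing import Callable, List, Tuple


class FakeSparkContext:
    """Hypothetical stand-in that records calls instead of distributing files."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    # Each method takes exactly one path, like the SparkContext APIs.
    def addPyFile(self, path: str) -> None:
        self.calls.append(("addPyFile", path))

    def addArchive(self, path: str) -> None:
        self.calls.append(("addArchive", path))

    def addFile(self, path: str) -> None:
        self.calls.append(("addFile", path))


def add_artifacts(
    sc: FakeSparkContext,
    *path: str,
    pyfile: bool = False,
    archive: bool = False,
    file: bool = False,
) -> None:
    """Hypothetical helper: dispatch each path to a one-path-at-a-time API."""
    # Assumption for this sketch: exactly one artifact kind must be selected.
    if sum([pyfile, archive, file]) != 1:
        raise ValueError("set exactly one of pyfile, archive, or file")
    add_one: Callable[[str], None]
    if pyfile:
        add_one = sc.addPyFile
    elif archive:
        add_one = sc.addArchive
    else:
        add_one = sc.addFile
    # The fix: one call per path instead of add_one(*path).
    for p in path:
        add_one(p)


sc = FakeSparkContext()
add_artifacts(sc, "a.py", "b.py", "c.py", pyfile=True)
print(sc.calls)
# [('addPyFile', 'a.py'), ('addPyFile', 'b.py'), ('addPyFile', 'c.py')]
```

Looping over `path` keeps the single-path `SparkContext` methods unchanged while letting the session-level API accept any number of paths.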

This PR also adds regression coverage for adding multiple Python files in a single `addArtifacts(..., pyfile=True)` call.

### Why are the changes needed?

Classic PySpark currently fails on otherwise valid multi-path calls such as:

```python
spark.addArtifacts("a.py", "b.py", "c.py", pyfile=True)
```

with:

```
TypeError: SparkContext.addPyFile() takes 2 positional arguments but 4 were given
```
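The error arises because unpacking several paths into a method that declares a single `path` parameter passes `self` plus three paths, i.e. four positional arguments. A minimal reproduction with a hypothetical one-path class (`OnePathContext` is a stand-in, not PySpark itself):

```python
class OnePathContext:
    """Hypothetical stand-in whose method accepts exactly one path,
    like SparkContext.addPyFile."""

    def addPyFile(self, path: str) -> None:
        pass


paths = ("a.py", "b.py", "c.py")
ctx = OnePathContext()

message = ""
try:
    # Old behaviour: forward every path as positional arguments at once.
    ctx.addPyFile(*paths)
except TypeError as exc:
    # The message ends with "takes 2 positional arguments but 4 were given".
    message = str(exc)
    print(message)

# Fixed behaviour: one call per path succeeds.
for p in paths:
    ctx.addPyFile(p)
```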

The public API accepts `*path`, so classic Spark should support multiple artifacts consistently.

### Does this PR introduce any user-facing change?

Yes. Classic PySpark users can now add multiple artifacts in one `SparkSession.addArtifacts` call when using `pyfile=True`, `archive=True`, or `file=True`.

### How was this patch tested?

Added a regression test for adding multiple Python files in one call.

Also manually ran:

```
env -u SPARK_CONF_DIR PYTHONPATH="python:python/lib/py4j-0.10.9.9-src.zip" \
SPARK_HOME="$PWD" \
    python3 -m unittest pyspark.sql.tests.test_artifact.ArtifactTests.test_add_multiple_pyfiles
```

and a standalone local Spark repro covering `pyfile=True`, `file=True`, and `archive=True`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex (GPT-5.5)

wbo4958 (Contributor, Author) commented Jun 1, 2026

Hi @HyukjinKwon, @zhengruifeng, @cloud-fan, could you help review this PR? Thx

HyukjinKwon (Member)

Merged to master and branch-4.x.

HyukjinKwon pushed a commit that referenced this pull request Jun 1, 2026
Closes #56245 from wbo4958/fix-addartifacts.

Authored-by: Bobby Wang <wbo4958@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 2242318)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
wbo4958 deleted the fix-addartifacts branch June 2, 2026 02:31