[SPARK-55203][PYTHON] Support PathLike in readwriter paths #55660

Open
201573 wants to merge 1 commit into apache:master from 201573:codex/pathlike-readwriter-55203

@201573 201573 commented May 3, 2026

What changes were proposed in this pull request?

This PR allows os.PathLike path objects, such as pathlib.Path, to be passed to PySpark readwriter path APIs.

The change normalizes path-like objects with os.fsdecode before sending paths to the JVM or Spark Connect plans.
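As a rough illustration of the normalization described above, here is a minimal sketch. It is not the code in this PR: the helper name `normalize_paths` is hypothetical, and the only real API it relies on is the standard library's `os.fsdecode`, which accepts `str`, `bytes`, or `os.PathLike` and returns a `str`.

```python
import os
from pathlib import PurePosixPath
from typing import Iterable, List, Union

# Hypothetical alias for illustration; not a name taken from PySpark.
PathArg = Union[str, "os.PathLike[str]"]


def normalize_paths(paths: Union[PathArg, Iterable[PathArg]]) -> List[str]:
    """Return plain strings for a single path or an iterable of paths.

    Hypothetical helper: os.fsdecode calls os.fspath() on path-like
    objects and leaves str values unchanged.
    """
    # A lone str / PathLike is treated as one path, not iterated over.
    if isinstance(paths, (str, os.PathLike)):
        return [os.fsdecode(paths)]
    return [os.fsdecode(p) for p in paths]


# PurePosixPath keeps the example independent of the host OS.
single = normalize_paths(PurePosixPath("/tmp/data.parquet"))
mixed = normalize_paths(["/tmp/a.csv", PurePosixPath("/tmp/b.csv")])
```

With strings produced this way, the values handed to the JVM or placed in a Spark Connect plan are ordinary `str` objects, regardless of whether the caller passed `str` or `pathlib.Path`.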

Why are the changes needed?

Currently, several PySpark readwriter methods accept only str or list[str] paths. Python users commonly use pathlib.Path, and these objects should work for file-system-backed data sources.

Closes #55203.

Does this PR introduce any user-facing change?

Yes. Users can pass pathlib.Path / os.PathLike objects to supported readwriter APIs.

How was this patch tested?

  • ./dev/lint-python --compile
  • git diff --check
  • Added PySpark readwriter tests for pathlib.Path
  • Added Spark Connect plan coverage for path-like path lists

Full PySpark runtime tests were not run locally because this machine does not have a Java Runtime installed.

Was this patch authored or co-authored using generative AI tooling?

Yes. I used OpenAI Codex to help implement and test this change. I have reviewed the changes and take responsibility for them. This contribution is my original work and I license the work to the project under the project's open source license.

201573 commented May 3, 2026

Additional local verification:

  • Built Spark with Hive profile: ./build/mvn -DskipTests -Phive package
  • Passed: pyspark.sql.tests.test_readwriter ReadwriterTests.test_pathlike_paths

Note: I ran the PySpark test through a no-space local symlink for SPARK_HOME because my local checkout path contains spaces.

@201573 201573 force-pushed the codex/pathlike-readwriter-55203 branch 4 times, most recently from 9c2e3a5 to 5fbe99d Compare May 3, 2026 16:35
@201573 201573 force-pushed the codex/pathlike-readwriter-55203 branch from 5fbe99d to fa3943c Compare May 3, 2026 17:07

Development

Successfully merging this pull request may close these issues.

[PYTHON] Allow PathLike path objects as input to readwriter
