[Python] ParquetWriter doesn't support all valid s3 paths #46924

Open
@graydenshand

Description

Describe the bug, including details regarding any error messages, version, and platform.

It appears pyarrow.parquet.ParquetWriter doesn't accept S3 paths that contain `#`, despite `#` being a valid character in S3 object keys.

>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> schema = pa.schema([pa.field("foobar", pa.string())])
>>> writer = pq.ParquetWriter('s3://my-bucket/a#b#c.parquet', schema=schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/project/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1014, in __init__
    sink = self.file_handle = filesystem.open_output_stream(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_fs.pyx", line 885, in pyarrow._fs.FileSystem.open_output_stream
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected a local filesystem path, got a URI: 's3://my-bucket/a#b#c.parquet'
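The error message ("Expected a local filesystem path, got a URI") suggests the string is being parsed as a URI, where `#` starts a fragment. A quick sketch with Python's standard `urllib.parse` (used here only to illustrate generic URI semantics, not pyarrow's internal parser) shows how the object key would get truncated at the first `#`:

```python
from urllib.parse import urlparse

parts = urlparse("s3://my-bucket/a#b#c.parquet")
# Under generic URI rules, everything after the first '#' is a
# fragment, so the object key is cut off at 'a'.
print(parts.path)      # '/a'
print(parts.fragment)  # 'b#c.parquet'
```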

Other special characters (`[$*&@!]`) seem to be acceptable, so I'm classifying this as a bug rather than a feature request.

via pyarrow v20.0.0

Possibly related to #41365.

Component(s)

Python
