[Python] S3FileSystem ignores AWS_ENDPOINT_URL / AWS_ENDPOINT_URL_S3 environment variables #49643
Description
Describe the bug, including details regarding any error messages, version, and platform.
I am trying to use PyArrow S3FileSystem with a custom S3-compatible endpoint (MinIO).
When using endpoint_override explicitly, everything works:
fs.S3FileSystem(endpoint_override="http://10.148.0.2:9000")
However, when relying on environment variables:
AWS_ENDPOINT_URL_S3=http://10.148.0.2:9000
and initializing:
fs.S3FileSystem()
the request is still sent to AWS S3 instead of the custom endpoint, resulting in:
AWS Error ACCESS_DENIED during HeadObject operation
This suggests that environment-based endpoint configuration is not being honored.
Reproducible Example:
import pyarrow.fs as fs
import os
os.environ["AWS_ACCESS_KEY_ID"] = "xxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxx"
os.environ["AWS_ENDPOINT_URL_S3"] = "http://10.148.0.2:9000"
os.environ["AWS_S3_ADDRESSING_STYLE"] = "path"
s3 = fs.S3FileSystem()
print(s3.get_file_info("bucket/key"))
It works if I pass the endpoint explicitly:
s3 = fs.S3FileSystem(endpoint_override="http://10.148.0.2:9000")
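As a workaround until the environment variables are honored, the endpoint can be resolved in user code and passed through endpoint_override. A minimal sketch; the helper name and the precedence order (service-specific variable wins over the generic one, following the AWS SDK convention) are my assumptions:

```python
import os

def resolve_s3_endpoint(env=None):
    """Pick the S3 endpoint from the environment.

    Assumption: mirror the AWS SDK convention that the service-specific
    AWS_ENDPOINT_URL_S3 takes precedence over the generic AWS_ENDPOINT_URL;
    return None when neither is set, so the caller falls back to AWS S3.
    """
    env = os.environ if env is None else env
    return env.get("AWS_ENDPOINT_URL_S3") or env.get("AWS_ENDPOINT_URL")
```

With this helper, fs.S3FileSystem(endpoint_override=resolve_s3_endpoint()) reaches the MinIO endpoint whether the value was set via AWS_ENDPOINT_URL_S3 or AWS_ENDPOINT_URL.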
Expected Behavior:
S3FileSystem should connect to the endpoint configured through the environment variables (http://10.148.0.2:9000).
Actual Behavior:
Requests are sent to AWS S3 (s3.amazonaws.com), ignoring the endpoint environment variables.
Environment:
PyArrow version: 23.0.0
Python version: 3.11.11
Deployment: Kubernetes / Ray worker
S3 backend: MinIO
I understand the AWS SDK may not officially support AWS_ENDPOINT_URL everywhere, but service-specific variables such as AWS_ENDPOINT_URL_S3 exist. It is unclear which variables S3FileSystem actually honors, and how endpoint resolution is intended to work without passing endpoint_override.
Component(s)
Python