ARROW-18238: [Docs][Python] Improve docs for S3FileSystem (#14599)
Will fix [ARROW-18238](https://issues.apache.org/jira/browse/ARROW-18238)

Authored-by: Miles Granger <miles59923@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
milesgranger committed Nov 9, 2022
1 parent 92c9f94 commit 94cf74f
Showing 2 changed files with 29 additions and 2 deletions.
23 changes: 23 additions & 0 deletions docs/source/python/filesystems.rst
@@ -156,6 +156,7 @@ addition, the constructor will also inspect configured S3 credentials as
supported by AWS (for example the ``AWS_ACCESS_KEY_ID`` and
``AWS_SECRET_ACCESS_KEY`` environment variables).


Example of how you can read contents from an S3 bucket::

>>> from pyarrow import fs
@@ -178,11 +179,33 @@ Example how you can read contents from a S3 bucket::
>>> f.readall()
b'some data'


Note that it is important to configure :class:`S3FileSystem` with the correct
region for the bucket being used. If ``region`` is not set, the AWS SDK will
choose a value: with SDK versions older than 1.8 it defaults to 'us-east-1';
otherwise it tries a variety of heuristics (environment variables,
configuration profile, EC2 metadata server) to resolve the region.

It is also possible to resolve the region from the bucket name for
:class:`S3FileSystem` by using :func:`pyarrow.fs.resolve_s3_region` or
:func:`pyarrow.fs.S3FileSystem.from_uri`.

Here are a couple of examples in code::

>>> from pyarrow import fs
>>> s3 = fs.S3FileSystem(region=fs.resolve_s3_region('my-test-bucket'))

# Or via URI:
>>> s3, path = fs.S3FileSystem.from_uri('s3://[access_key:secret_key@]bucket/path')

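As a minimal sketch of the region choice described above, the helper below prefers an explicitly supplied region and only falls back to a lookup by bucket name when none is given. The names ``pick_region``, ``explicit_region`` and ``resolver`` are hypothetical and not part of pyarrow; the only pyarrow call assumed is :func:`pyarrow.fs.resolve_s3_region`, which needs network access when it is actually invoked.

```python
from typing import Callable, Optional


def pick_region(
    bucket: str,
    explicit_region: Optional[str] = None,
    resolver: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the region to pass to S3FileSystem (hypothetical helper).

    An explicitly configured region always wins. Otherwise the region is
    looked up from the bucket name, by default with
    pyarrow.fs.resolve_s3_region.
    """
    if explicit_region is not None:
        return explicit_region
    if resolver is None:
        # Imported lazily so the helper can be exercised without pyarrow
        # when a custom resolver is supplied.
        from pyarrow import fs

        resolver = fs.resolve_s3_region
    return resolver(bucket)


# Possible usage (requires pyarrow with S3 support and network access):
#
#     from pyarrow import fs
#     s3 = fs.S3FileSystem(region=pick_region('my-test-bucket'))
```

Passing the resolver as a parameter is a design choice made only for this sketch: it keeps the lookup swappable, for example with a stub in tests that must not touch the network.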

.. seealso::

See the `AWS docs <https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html>`__
for the different ways to configure the AWS credentials.

:func:`pyarrow.fs.resolve_s3_region` for resolving the region from a bucket name.


.. _filesystem-gcs:

8 changes: 6 additions & 2 deletions python/pyarrow/_s3fs.pyx
@@ -173,8 +173,12 @@ cdef class S3FileSystem(FileSystem):
load_frequency : int, default 900
The frequency (in seconds) with which temporary credentials from an
assumed role session will be refreshed.
region : str, default 'us-east-1'
AWS region to connect to.
region : str, default None
AWS region to connect to. If not set, the AWS SDK will attempt to
determine the region using heuristics such as environment variables,
configuration profile, and EC2 metadata, or will default to 'us-east-1'
when the SDK version is <1.8. You can also use
:func:`pyarrow.fs.resolve_s3_region` to automatically resolve the region
from a bucket name.
request_timeout : double, default None
Socket read timeouts on Windows and macOS, in seconds.
If omitted, the AWS SDK default value is used (typically 3 seconds).