-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Add translation of IOConfig to PyArrow filesystem arguments #1592
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #1592 +/- ##
==========================================
- Coverage 85.29% 83.17% -2.12%
==========================================
Files 54 54
Lines 5161 5189 +28
==========================================
- Hits 4402 4316 -86
- Misses 759 873 +114
|
…ethods in table_io
836e043
to
4e54944
Compare
…t to use_native_downloader=False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, LGTM!
_set_if_not_none(translated_kwargs, "access_key", s3_config.key_id) | ||
_set_if_not_none(translated_kwargs, "secret_key", s3_config.access_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This naming asymmetry is a bit spooky to me, any particular reason for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just have it named differently :(
PyArrow:
$AWS_ACCESS_KEY_ID
:access_key
$AWS_SECRET_ACCESS_KEY
:secret_key
Daft:
$AWS_ACCESS_KEY_ID
:key_id
$AWS_SECRET_ACCESS_KEY
:access_key
Maybe @samster25 knows where we inherited our naming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaychia Ah yeah, I based it off the aws sdk and boto3 suffix rather than what pyarrow did. What do you think we should use instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I wasn't proposing a change, just wondering where we got it from 😛
The change in #1592 introduced a bug where the inferred PyArrow S3FileSystem doesn't correctly infer the region of the bucket To work-around this, we can manually specify the `region_name` in our IOConfig. This problem should be less common once we move towards all-native reads. Co-authored-by: Jay Chia <jaychia94@gmail.com@users.noreply.github.com>
Closes: #1589
fsspec
filesystems inPythonStorageConfig
. This was actually never used already since we deprecated the user-facing fsspec APIs in Daft 0.2.0filesystem.py
:_resolve_paths_and_filesystem
utility now takes in an optionalIOConfig
_infer_filesystem
) we now inmplement our own logic to translate IOConfig information into pyarrow filesystem kwargsio_config
into_resolve_paths_and_filesystem
for all cases where we are still usingPythonStorageConfig
in our read APIs (i.e. when users opt to useuse_native_downloader=False
IOConfig
in ourwrite_*
APIs as well