ARROW-8220: [Python] Make dataset FileFormat objects serializable #6720
I also don't like the I think we should rather give a better API in a parquet / reading specific API like
@jorisvandenbossche Agreed. May we defer your suggestion to a follow-up?
Well, my comment is kind of: we need to keep
I can wire these options, but it's not entirely clear because we don't have a read() method on the datasets. Once we add support for writing we can refine the API.
Yes, it is still bound to the format, but it splits its keywords in two groups:
I think
Yeah, I fully agree much of this discussion is a bit "up in the air", since we don't yet have writing, so don't yet know how we would want to make the API for writing.
@jorisvandenbossche updated as you requested
python/pyarrow/_dataset.pyx
uint32_t buffer_size
set dictionary_columns

def __init__(self, bint use_buffered_stream=False, buffer_size=8192,
Hmm, see my earlier comment about this default. But when setting it like this on the options class, it becomes more difficult to do that (unless the attribute is not typed as a uint)
I'm not sure what you mean.
+1
Build failure is unrelated.
Also did some refactoring for a more pleasant user API.
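
For readers unfamiliar with how an object of this kind is typically made serializable, here is a minimal pure-Python sketch of the general technique: implement `__reduce__` so that pickle rebuilds the object from its constructor arguments. The class names `ReaderOptionsSketch` and `FileFormatSketch` are hypothetical stand-ins, not the classes in `python/pyarrow/_dataset.pyx`, and the parameter list is only assumed from the truncated `__init__` signature quoted in the review. Plain Python classes pickle without this hook; it matters mainly for extension types that wrap native state, which is assumed to be the situation here.

```python
import pickle


class ReaderOptionsSketch:
    """Hypothetical stand-in for a format's reader options."""

    def __init__(self, use_buffered_stream=False, buffer_size=8192,
                 dictionary_columns=None):
        self.use_buffered_stream = bool(use_buffered_stream)
        self.buffer_size = int(buffer_size)
        self.dictionary_columns = set(dictionary_columns or ())

    def _state(self):
        # Constructor arguments, in positional order.
        return (self.use_buffered_stream, self.buffer_size,
                self.dictionary_columns)

    def __eq__(self, other):
        if not isinstance(other, ReaderOptionsSketch):
            return NotImplemented
        return self._state() == other._state()

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling the constructor.
        return ReaderOptionsSketch, self._state()


class FileFormatSketch:
    """Hypothetical stand-in for a file format holding reader options."""

    def __init__(self, reader_options=None):
        self.reader_options = reader_options or ReaderOptionsSketch()

    def __eq__(self, other):
        if not isinstance(other, FileFormatSketch):
            return NotImplemented
        return self.reader_options == other.reader_options

    def __reduce__(self):
        # The nested options object is pickled through its own __reduce__.
        return FileFormatSketch, (self.reader_options,)


if __name__ == "__main__":
    fmt = FileFormatSketch(ReaderOptionsSketch(buffer_size=4096,
                                               dictionary_columns={"a"}))
    restored = pickle.loads(pickle.dumps(fmt))
    assert restored == fmt
```

Defining `__eq__` alongside `__reduce__` makes the round trip easy to check in a test: the restored object should compare equal to the original.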