-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writing Partitioned Datasets recipe for Python #47
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that you are using the dummy "chunk" column to be able to have a partitioned dataset without having a natural column to partition on, but my feeling is that this is a bit of a workaround because we don't give a better alternative (yet).
Personally, I would maybe use a more useful partitioning column as the main example, and then mention the "chunk" column as a way to do it if there is no column already present that can be used for partitioning (and then we can easily update this when a better API becomes available)
Switched to a real column ( |
@jorisvandenbossche can you re-review? Should have addressed your concerns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! One small language remark
python/source/io.rst
Outdated
|
||
Arrow will partition datasets in subdirectories by default, which will | ||
result in 10 different directories named with the value of the partitioning | ||
column and with file containing the data partition inside: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "and with file containing the data partition inside" reads a bit strange. Maybe something like "each with a file containing the subset of the data for that partition"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 reworded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready for merge?
Fixes #46