As an Arrow developer/user, I'd like to be able to read and write Feather files.
The current I/O story in Rust isn't great: we don't yet fully support reading and writing between Arrow and Parquet, and we can read CSV but not yet write it. This is an inconvenience (at least for me).
I propose supporting the Feather format in Rust, initially with the following limitations:
No date/time support until ARROW-4386 (and potentially more work) lands
Reading categorical data (from other languages) but not writing it
Reading and writing only single record batches, because we don't yet support slicing of arrays (ARROW-3954)
If the above are accept(ed|able), we can enhance the Feather support as the dependencies on the above limitations are lifted.
We can also refactor the Feather code as we work on more IPC in Rust.
It's a bit confusing because when I looked at wesm/feather, the position seems to be that once R bindings for Arrow are created, Feather-based IPC will live in Arrow, and perhaps be improved or superseded by the Arrow File format. At the same time, from looking at pyarrow, it doesn't seem like Arrow File is used, because 'Tables' are saved as Parquet files.
I need(ed) a way to read my existing Feather files and to write back whatever changes I make. I think there's still quite a bit outstanding re Arrow + Parquet before one could use them together in Rust. So I was stuck and decided to go down the Feather route in the interim.
I'm also interested in Arrow IPC, and will contribute to making it happen once we've added some currently blocking functionality. If Feather in Rust doesn't make sense, I can keep the code out of tree, and continue using it for my needs.
I'm waiting for the R community to sort out the packaging issues so that users can install an Arrow-based Feather package instead of the current small prototype that we built in 2016. Once that is possible, then we can look at replacing the internal detail of Feather files with the Arrow IPC binary protocol.
Please note that Feather files should not be used for long-term data storage. So once a transition to "Feather v2" happens (i.e. the "feather" name will live on but be based on the standard IPC protocol) then what you can do is:
Read Feather file with pyarrow X.Y.Z
Write data to Arrow IPC file
Read IPC file, write Feather v2 format (if that's what you want)
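Before running the steps above it can help to know which files on disk are still in the original Feather layout. The following is a minimal sketch, not an official tool: the function names `detect_format` and `feather_v1_to_ipc` are hypothetical, and the magic strings (`FEA1` for the original Feather layout, `ARROW1` for the Arrow IPC file format) come from my reading of the format layouts rather than from this thread. The conversion helper assumes a pyarrow release that can still read the original Feather layout and that exposes `pyarrow.feather.read_table` and `pyarrow.ipc.new_file`.

```python
# Assumed magic strings: both formats repeat the marker at the start and
# at the end of the file.
FEATHER_V1_MAGIC = b"FEA1"
ARROW_FILE_MAGIC = b"ARROW1"


def detect_format(path):
    """Guess the container format of `path` from its leading/trailing bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        f.seek(max(size - 6, 0))
        tail = f.read(6)

    # Require room for two separate markers so a tiny file cannot match
    # by having its single marker counted as both head and tail.
    if (size >= 2 * len(FEATHER_V1_MAGIC)
            and head.startswith(FEATHER_V1_MAGIC)
            and tail.endswith(FEATHER_V1_MAGIC)):
        return "feather-v1"
    if (size >= 2 * len(ARROW_FILE_MAGIC)
            and head == ARROW_FILE_MAGIC
            and tail == ARROW_FILE_MAGIC):
        return "arrow-ipc-file"
    return "unknown"


def feather_v1_to_ipc(src, dst):
    """Steps 1 and 2 from the list above: Feather in, Arrow IPC file out.

    Assumes pyarrow is installed; it is imported lazily so the detection
    helper above works without it.
    """
    import pyarrow as pa
    import pyarrow.feather as feather

    table = feather.read_table(src)
    with pa.OSFile(dst, "wb") as sink:
        with pa.ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)
```

A possible workflow would be to call `detect_format` on each file and pass only those reported as `"feather-v1"` to `feather_v1_to_ipc`. The detection is a heuristic based on the markers alone, so it does not validate the file contents.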
I do not think it is a great use of time for Rust or any other language to support Feather before first supporting the IPC protocol. The latter is more general than the former.
If you have any uncertainty or questions about Feather, please don't hesitate to ask.
Reporter: Neville Dipale / @nevi-me
Assignee: Neville Dipale / @nevi-me
PRs and other links:
Note: This issue was originally created as ARROW-4463. Please see the migration documentation for further details.