[Rust] Support read:write of Feather files #21020

Closed
asfimport opened this issue Feb 3, 2019 · 3 comments

Comments

@asfimport

As an Arrow developer/user, I'd like to be able to read and write Feather files.

The current I/O story in Rust isn't great: we don't yet fully support reading and writing Parquet, and we can read CSV but not yet write it. This is an inconvenience (at least for me).

I propose supporting the Feather format in Rust, initially with the following limitations:

  • No date/time support until ARROW-4386 (and potentially more work) lands

  • Reading categorical data (from other languages), but not writing it

  • Reading from and writing to single record batches only, since we don't yet support slicing of arrays (ARROW-3954)

If the above are accept(ed|able), we can enhance the Feather support as the dependencies behind these limitations are resolved.

We can also refactor the Feather code as we work on more IPC in Rust.
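
To make the proposed scope concrete, here is a minimal sketch of the limitations as capability checks. The names `ColumnKind`, `can_read`, `can_write` and `check_batch_count` are hypothetical and are not part of the `arrow` crate; the sketch also assumes that primitive and string columns are supported in both directions, which the proposal implies but does not state.

```rust
/// Hypothetical column kinds, named only for illustration;
/// these are not types from the `arrow` crate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKind {
    Primitive,   // integers, floats, booleans (assumed supported)
    Utf8,        // strings (assumed supported)
    Categorical, // dictionary-encoded values
    DateTime,    // dates, times, timestamps
}

/// Read support under the proposed initial limitations.
fn can_read(kind: ColumnKind) -> bool {
    // Date/time columns wait on ARROW-4386; everything else,
    // including categoricals written by other languages, is readable.
    kind != ColumnKind::DateTime
}

/// Write support under the proposed initial limitations.
fn can_write(kind: ColumnKind) -> bool {
    // Categorical data is read-only at first, and date/time is unsupported.
    !matches!(kind, ColumnKind::DateTime | ColumnKind::Categorical)
}

/// Only a single record batch per file is handled until
/// array slicing (ARROW-3954) is available.
fn check_batch_count(batches: usize) -> Result<(), String> {
    if batches == 1 {
        Ok(())
    } else {
        Err(format!("expected exactly 1 record batch, got {}", batches))
    }
}
```

As each dependency is resolved, the corresponding check would simply be relaxed.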

Reporter: Neville Dipale / @nevi-me
Assignee: Neville Dipale / @nevi-me

PRs and other links:

Note: This issue was originally created as ARROW-4463. Please see the migration documentation for further details.

@asfimport

Andy Grove / @andygrove:
Hi @nevi-me, I went down this same path a while back. Feather is an old format and not really supported now. It would be better to implement the IPC file format instead, which is defined in https://github.com/apache/arrow/blob/master/format/File.fbs

Here is a PR where I had started work on this: #2986

@asfimport

Neville Dipale / @nevi-me:
Thanks Andy, I nearly missed your reply.

It's a bit confusing, because when I looked at wesm/feather, the position seemed to be that once R bindings for Arrow are created, Feather-based IPC will live in Arrow, and perhaps be improved or superseded by the Arrow File format. At the same time, from looking at pyarrow, it doesn't seem like the Arrow File format is used, because 'Tables' are saved as Parquet files.

I need(ed) a way to read my existing Feather files and to write back whatever changes I make. I think there's still quite a bit outstanding on Arrow + Parquet before one could use them together in Rust. So I was stuck and decided to go down the Feather route in the interim.

I'm also interested in Arrow IPC, and will contribute to making it happen once we've added some currently blocking functionality. If Feather in Rust doesn't make sense, I can keep the code out of tree, and continue using it for my needs.

@asfimport

Wes McKinney / @wesm:
See http://wesmckinney.com/blog/feather-arrow-future/

I'm waiting for the R community to sort out the packaging issues so that users can install an Arrow-based Feather package instead of the current small prototype that we built in 2016. Once that is possible, then we can look at replacing the internal detail of Feather files with the Arrow IPC binary protocol.

Please note that Feather files should not be used for long-term data storage. So once a transition to "Feather v2" happens (i.e. the "feather" name will live on but be based on the standard IPC protocol), what you can do is:

  • Read the Feather file with pyarrow X.Y.Z

  • Write the data to an Arrow IPC file

  • Read the IPC file and write Feather v2 format (if that's what you want)

I do not think it is a great use of time for Rust or any other language to support Feather until after first supporting the IPC protocol. The latter is more general than the former.

If you have any uncertainty or questions about Feather, please don't hesitate to ask.
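
As a rough illustration of how a tool could tell which files still need the migration described above, the sketch below classifies a file by its leading magic bytes. It assumes the original Feather format begins with `FEA1` and the Arrow IPC file format begins with `ARROW1`; the names `FileKind`, `classify` and `classify_path` are hypothetical, and this is only a header check, not a validating reader.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical classification of an on-disk file by its magic bytes.
#[derive(Debug, PartialEq)]
enum FileKind {
    FeatherV1,    // original Feather format (assumed magic: "FEA1")
    ArrowIpcFile, // Arrow IPC file format (assumed magic: "ARROW1")
    Unknown,
}

/// Classify a file from its leading bytes only.
fn classify(header: &[u8]) -> FileKind {
    if header.starts_with(b"ARROW1") {
        FileKind::ArrowIpcFile
    } else if header.starts_with(b"FEA1") {
        FileKind::FeatherV1
    } else {
        FileKind::Unknown
    }
}

/// Read at most the first 8 bytes of `path` and classify them.
fn classify_path(path: &Path) -> io::Result<FileKind> {
    let mut header = Vec::with_capacity(8);
    File::open(path)?.take(8).read_to_end(&mut header)?;
    Ok(classify(&header))
}
```

A file reported as `FeatherV1` would be a candidate for the read-and-rewrite steps listed above, while an `ArrowIpcFile` result suggests the data is already in the IPC-based layout.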
