Arrow Support #78
Hi, Lundez! Until now we were mostly focused on the frontend part: typesafe Kotlin API, code generation, schema inference and other tricks that provide a great experience when you work with data in Kotlin. But now the API and overall model are getting stable, so it's time to do more performance tuning and scalability work, including Arrow support as a backend. Currently the project has only two active contributors, so any help will be very much appreciated! |
Hi, do you have any pointers on how to start? Do you think the Java Arrow API can work with your "typing" (or whatever the typing used in data frames is called)? 😊 I think adding Arrow would give this project a big boost. |
I have some experience with Arrow (as an Arrow committer), so let me try to set this up. The current plan is to split the work into two parts:
Subsequent features can take a more tangible form once reading is done, e.g. Arrow file writing, streaming, predicate pushdown, etc. |
@jimexist incredibly excited to hear this! |
Hello @nikitinas, what do you think about my latest PRs? I have also written some code for writing to Arrow, but it does not cover all DataFrame-supported column types (it was originally made for Krangl). |
Hello again. |
@koperagen, @nikitinas, I would like your opinion on the following detail. In Arrow schema we have So, we can:
Which behavior is best, and should we support several of them, in your view? |
Could we support different read modes? Defaulting to the first or third makes sense, but a strict mode (the second) would be great, enabled through a flag/read mode IMO. |
Hm, I would prefer 1 as the default, because in the REPL it can help avoid unnecessary null handling when there are no nulls. But we also need 3 for the Gradle plugin, which generates a schema declaration from a data sample. Do I understand the second option right? Would something like this be possible?
I think we shouldn't have this mode unless there is very strong evidence that it is very useful for someone :) Or do you mean this?
All that reminds me of "Infer", which is used as a flag for some operations. |
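The three options themselves are not preserved in this thread, so the following is only a sketch of one possible reading of the replies above: a default that infers nullability from the data, a strict mode that trusts the Arrow schema and fails on a mismatch, and a mode that also honours the schema flag (as the Gradle plugin use case seems to need). Every name here (`NullabilityMode`, `resolveNullable`) is hypothetical and is not taken from the DataFrame or Arrow API, and the numbering may not match the options originally listed.

```kotlin
// Hypothetical names throughout; this is not the DataFrame or Arrow API.
enum class NullabilityMode { INFER, STRICT, WIDEN }

/**
 * Decide whether a column read from Arrow should be typed as nullable.
 *
 * @param schemaNullable the nullable flag declared for the field in the Arrow schema
 * @param values the values actually read for that column
 */
fun resolveNullable(
    schemaNullable: Boolean,
    values: List<Any?>,
    mode: NullabilityMode,
): Boolean {
    val hasNulls = values.any { it == null }
    return when (mode) {
        // Ignore the schema flag: nullable only if the data contains nulls.
        NullabilityMode.INFER -> hasNulls
        // Trust the schema flag and fail if the data contradicts it.
        NullabilityMode.STRICT -> {
            require(schemaNullable || !hasNulls) {
                "Column is declared non-nullable but contains nulls"
            }
            schemaNullable
        }
        // Nullable if the schema says so or if the data contains nulls.
        NullabilityMode.WIDEN -> schemaNullable || hasNulls
    }
}

fun main() {
    // A column declared nullable in the schema that happens to hold no nulls.
    val sample = listOf(1, 2, 3)
    for (mode in NullabilityMode.values()) {
        println("$mode -> nullable = ${resolveNullable(true, sample, mode)}")
    }
}
```

With a nullable-in-schema column that contains no nulls, `INFER` yields a non-nullable column (convenient in a REPL), while `STRICT` and `WIDEN` keep it nullable, which is the stable answer a schema generated from a data sample would want.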
Thank you for highlighting
OK, thanks for sharing.
when calling
but actually we have
now. I will fix that. Where can I read more about the Gradle plugin? How do you use it? |
I suggest the following mapping if we use
|
https://kotlin.github.io/dataframe/gradle.html
I'm not sure about it anymore. Because edit. Colleagues suggested |
Implemented in #129 |
Hi, I can't find that dataframe supports Arrow as internal serialization / backend. Is this something which you're working on?