[Docs] List the logical types in Columnar.rst for searchability #14752
Comments
Thanks for opening this! In particular David's comment:
was particularly helpful. The other types that I could not find in the columnar spec when I was looking for them were the interval types, which might be worth mentioning.
Interval types are also logical types, so yeah, a listing of all the logical types might be useful.
Arrow has no notion of logical types. But, yes, making the Columnar format spec more readable would be useful.
Ok, unfortunately, Columnar.rst does use the wording "logical type". Which is contradicted by the fact that there's no separate set of "physical types" (only layouts). The whole thing has always been confusing to me.
I think it would be nice if all the types were in Columnar.rst (with the corresponding layouts and any parameters). These don't change frequently, so I don't think that keeping the .fbs file and the documentation in sync will be prohibitively complicated.
I agree it would be nice, at least as a synthetic table.
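For illustration only, here is a rough sketch of what such a synthetic table could look like as data: each type name mapped to its parameters and its layout. The entries are a small hand-written subset, and the names `TYPE_TABLE` and `types_by_layout` are hypothetical; Schema.fbs remains the authoritative listing.

```python
from collections import defaultdict

# Hypothetical, hand-written subset of a "types -> parameters -> layout" table.
# This is an illustration, not a copy of the spec; Schema.fbs is authoritative.
TYPE_TABLE = {
    "Null": ((), "Null"),
    "Boolean": ((), "Fixed-size Primitive"),
    "Int": (("bit width", "signedness"), "Fixed-size Primitive"),
    "Floating Point": (("precision",), "Fixed-size Primitive"),
    "Timestamp": (("unit", "timezone"), "Fixed-size Primitive"),
    "Interval": (("unit",), "Fixed-size Primitive"),
    "Utf8": ((), "Variable-size Binary"),
    "List": (("value type",), "Variable-size List"),
    "Struct": (("children",), "Struct"),
}


def types_by_layout(table):
    """Group type names by the layout they use, preserving table order."""
    grouped = defaultdict(list)
    for name, (_params, layout) in table.items():
        grouped[layout].append(name)
    return dict(grouped)


if __name__ == "__main__":
    # Print one line per layout with the types that share it.
    for layout, names in types_by_layout(TYPE_TABLE).items():
        print(f"{layout}: {', '.join(names)}")
```

Grouping by layout makes the point from the comments above visible: many types (including the interval types) share a single layout, so the types and the layouts are separate lists rather than a "logical" and "physical" pairing.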
Semi-related issue: #33958
I'd like to see this part of the format docs improved and would be happy to submit a PR for review. I read the comments above and in other issues and it seems like there's:
I could start with a PR for (2). Looking at the high-level sections in https://arrow.apache.org/docs/format/Columnar.html, I think a comprehensive table of types would be best placed right after Terminology and before Physical Memory Layout, as people would generally want to know the types before their physical layouts. Does this sound reasonable?
I would be excited to see that PR! I think that a type listing at the start ("this is what Arrow can do") followed by layouts ("this is what it looks like in memory") makes a lot of sense and would have helped me a lot when I was trying to implement them. I like types + layouts rather than logical + physical but I don't have strong feelings about it as long as it's consistent.
The "logical" vs. "physical" distinction is actually extra confusing, because nowadays we do have logical types (aka semantic variations of existing types), but they are called... extension types.
Gotcha. That brings up another point... should the newly-added Tensor types be in the aforementioned table of types? I'd think yes.
I don't think so. They're extension types, not part of the columnar spec itself. You may instead add a
Okay, that makes sense.
I opened a dedicated issue for this, as it requires a more comprehensive update of the docs than just adding a table of all (logical) types: #41691
Also address GH-14752 by adding a table of data types with their respective parameters and the corresponding layouts.
In several places in the Arrow specification and documentation we use the term "logical types", but we don't use it consistently and we don't actually have physical types (only physical layouts) to contrast it with. This creates confusion for readers as it is not immediately clear whether all data types are "logical" and if there is a meaningful distinction behind our usage of this term.

Also address GH-14752 by adding a table of data types with their respective parameters and the corresponding layouts.

* GitHub Issue: #41691

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Describe the enhancement requested
See apache/arrow-nanoarrow#74 (review)
Columnar.rst doesn't describe the logical types, since Schema.fbs is considered authoritative. But it is probably worth at least listing the types in this document so that they can be easily searched for (and possibly even summarize those types).
Component(s)
Documentation