[Format][Docs] Clarify (remove?) usage of the term "logical types" #41691
Comments
I was definitely confused by the term "logical type" when I became involved with Arrow several years ago, and only slowly came to understand that every system uses the terms "logical" and "physical" in a slightly different way, and that it is more of a spectrum than a dichotomy. We also use the term "encoding" in two different ways: "run-end encoded" is a "type", but "dictionary" is essentially a special case in Schema.fbs and in the Arrow C Data Interface. Implementations typically present both as "types" (e.g., …). I agree that it is confusing, but I am not sure what it should be replaced with. I suppose they could just be called "types" and "layouts", perhaps with dictionary encoding being a layout rather than a type?
I would generally talk about "data types" rather than just "types". For the layouts I think the adjective "physical" is still useful; alternatively, we could use "memory" more consistently as the adjective.
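To make the asymmetry between the two encodings concrete, here is a simplified, hypothetical Python model of how Schema.fbs represents them: run-end encoding is a member of the `Type` union (with its run ends and values carried as child fields), while dictionary encoding is an optional `dictionary` attribute on `Field`. The class names mirror Schema.fbs, but this is only a sketch for illustration, not the generated FlatBuffers API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Int:
    # Simplified stand-in for the Schema.fbs `Int` type
    bit_width: int
    is_signed: bool


@dataclass
class Utf8:
    pass


@dataclass
class RunEndEncoded:
    # Has no parameters of its own: run ends and values are child fields
    pass


@dataclass
class DictionaryEncoding:
    # Dictionary encoding hangs off the Field, not off the Type union
    id: int
    index_type: Int
    is_ordered: bool = False


@dataclass
class Field:
    name: str
    nullable: bool
    type: object
    dictionary: Optional[DictionaryEncoding] = None
    children: List["Field"] = field(default_factory=list)


def describe(f: Field) -> str:
    """Report where a field's encoding is expressed in this simplified model."""
    if f.dictionary is not None:
        return f"{f.name}: {type(f.type).__name__}, dictionary-encoded via Field.dictionary"
    if isinstance(f.type, RunEndEncoded):
        child_names = [c.name for c in f.children]
        return f"{f.name}: RunEndEncoded type with children {child_names}"
    return f"{f.name}: {type(f.type).__name__}"


# Dictionary-encoded strings: the data type is still Utf8
dict_col = Field("city", True, Utf8(), dictionary=DictionaryEncoding(0, Int(32, True)))

# Run-end-encoded strings: the data type itself is RunEndEncoded
ree_col = Field(
    "city",
    True,
    RunEndEncoded(),
    children=[Field("run_ends", False, Int(32, True)), Field("values", True, Utf8())],
)

print(describe(dict_col))
print(describe(ree_col))
```

In this model the dictionary-encoded column still reports `Utf8` as its type, while the run-end-encoded column reports `RunEndEncoded`, which is the inconsistency the comment above points out.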
In several places in the Arrow specification and documentation we use the term "logical types", but we don't use it consistently and we don't actually have physical types (only physical layouts) to contrast it with. This creates confusion for readers, as it is not immediately clear whether all data types are "logical" and whether there is a meaningful distinction behind our usage of this term.

Also address GH-14752 by adding a table of data types with their respective parameters and the corresponding layouts.

* GitHub Issue: #41691

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
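A table of that kind can be sketched as a simple lookup from data type to its parameters and physical layout. The entries below are a partial, illustrative selection based on the Columnar Format specification; the structure and the `types_with_layout` helper are assumptions for this example, not the table that was actually added to the docs.

```python
# Partial, illustrative mapping: data type -> (parameters, physical layout).
# Layout names follow the Columnar Format spec; the list is not exhaustive.
DATA_TYPES = {
    "Null": ((), "Null"),
    "Boolean": ((), "Fixed-size Primitive"),
    "Int": (("bit width", "signedness"), "Fixed-size Primitive"),
    "Floating Point": (("precision",), "Fixed-size Primitive"),
    "Decimal": (("bit width", "precision", "scale"), "Fixed-size Primitive"),
    "Timestamp": (("unit", "timezone"), "Fixed-size Primitive"),
    "Fixed-Size Binary": (("byte width",), "Fixed-size Primitive"),
    "Binary": ((), "Variable-size Binary"),
    "Utf8": ((), "Variable-size Binary"),
    "Utf8 View": ((), "Variable-size Binary View"),
    "List": (("value type",), "Variable-size List"),
    "Fixed-Size List": (("value type", "list size"), "Fixed-size List"),
    "Struct": (("children",), "Struct"),
    "Union": (("children", "mode", "type ids"), "Dense or Sparse Union"),
    "Run-End Encoded": (("run end type", "value type"), "Run-End Encoded"),
}


def types_with_layout(layout):
    """Return the data types that share a given physical layout, sorted by name."""
    return sorted(name for name, (_, lay) in DATA_TYPES.items() if lay == layout)


print(types_with_layout("Variable-size Binary"))
print(types_with_layout("Fixed-size Primitive"))
```

Several data types share one layout (for example, `Binary` and `Utf8` both use the variable-size binary layout), which is why "data type" and "physical layout" are not two sides of a one-to-one mapping. Dictionary encoding is deliberately absent: in Schema.fbs it is a property of the field rather than a member of the `Type` union.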
Issue resolved by pull request 41958.
In several places in the Arrow specification and documentation we use the term "logical types", although we don't use it consistently and we don't actually have physical types (only physical layouts) to contrast it with.
Current usage
The Columnar Format doc page has a section called "Logical Types" (https://arrow.apache.org/docs/15.0/format/Columnar.html#logical-types) to contrast those types with the physical layouts:
It describes an Array as having a logical data type, where "Each logical data type has a well-defined physical layout."
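As an example of a data type with its own well-defined layout, the sketch below builds the buffers for the values `["ab", None, "c"]` as a Utf8 array (variable-size binary layout) and, for contrast, as a `List<UInt8>` array (variable-size list layout), following the buffer descriptions in the Columnar Format spec. Both start with a validity bitmap and an offsets buffer, but the list layout keeps its bytes in a separate child array with its own buffers, which is also why describing strings as `List<1-byte>` is imprecise. The helper functions are hypothetical; padding and alignment are ignored, and the child array's validity bitmap is shown as absent (`None`) since none of its values are null.

```python
import struct


def validity_bitmap(values):
    """LSB-numbered validity bitmap: bit i is set when values[i] is not None."""
    bits = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


def offsets_and_data(values):
    """Little-endian int32 offsets (length n + 1) and the concatenated UTF-8 bytes."""
    offsets, data = [0], bytearray()
    for v in values:
        if v is not None:
            data += v.encode("utf-8")
        offsets.append(len(data))  # a null slot repeats the previous offset
    return struct.pack(f"<{len(offsets)}i", *offsets), bytes(data)


def string_layout(values):
    # Variable-size binary layout: validity, offsets, data; no child arrays
    offsets, data = offsets_and_data(values)
    return {"buffers": [validity_bitmap(values), offsets, data], "children": []}


def list_of_uint8_layout(values):
    # Variable-size list layout: validity and offsets; bytes live in a UInt8 child
    offsets, data = offsets_and_data(values)
    child = {"buffers": [None, data], "children": []}  # child: [validity, values]
    return {"buffers": [validity_bitmap(values), offsets], "children": [child]}


vals = ["ab", None, "c"]
s = string_layout(vals)
l = list_of_uint8_layout(vals)
print(len(s["buffers"]), len(s["children"]))
print(len(l["buffers"]), len(l["children"]))
```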
The authoritative Schema.fbs also uses the term:
arrow/format/Schema.fbs, line 18 (at commit 07a30d9)
although it also uses the term in a way that is arguably "correct" (but inconsistent with how we currently define the term):
arrow/format/Schema.fbs, lines 101 to 105 (at commit 07a30d9)
The Python docs (https://arrow.apache.org/docs/15.0/python/data.html#type-metadata):
Furthermore, the term is of course also used in various implementations.
In the Terminology section of the Columnar Format docs (https://arrow.apache.org/docs/15.0/format/Columnar.html#terminology), we define it as:
which is mostly consistent with our current usage ("using some physical layout"), but it is also confusing that it describes strings as `List<1-byte>`, since strings actually use a different physical layout.
Previous discussion
Generally we use the term fairly consistently to contrast "logical types" with "physical layouts", but confusion about the terminology has come up regularly (what, then, are "physical types"? And extension types are essentially "logical types" too, but ones that annotate our own logical types). This was specifically discussed in #14752.
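In Arrow, an extension type is carried as a storage data type plus field-level custom metadata under the `ARROW:extension:name` and `ARROW:extension:metadata` keys, and its physical layout is simply that of its storage type. The sketch below models this with a hypothetical `ExtensionType` class and a small, illustrative layout lookup; the extension name `example.uuid` is made up for the example.

```python
from dataclasses import dataclass

# Illustrative subset: storage data type -> physical layout
LAYOUT_OF = {
    "Fixed-Size Binary": "Fixed-size Primitive",
    "Utf8": "Variable-size Binary",
    "Struct": "Struct",
}


@dataclass(frozen=True)
class ExtensionType:
    # Hypothetical model: an extension type annotates a storage data type
    name: str
    storage_type: str
    metadata: str = ""

    def field_metadata(self):
        """Custom metadata entries that carry the extension type on a field."""
        return {
            "ARROW:extension:name": self.name,
            "ARROW:extension:metadata": self.metadata,
        }

    def physical_layout(self):
        # The layout comes entirely from the storage type
        return LAYOUT_OF[self.storage_type]


uuid_type = ExtensionType("example.uuid", "Fixed-Size Binary")
print(uuid_type.field_metadata())
print(uuid_type.physical_layout())
```

Seen this way, an extension type adds meaning on top of a data type without introducing any new layout, which is the layering the paragraph above finds hard to name with "logical" alone.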
@amoeba proposed (#14752 (comment)):