[Format] Physical representation of columnar format not well documented #39569

Open
rongcuid opened this issue Jan 11, 2024 · 1 comment

@rongcuid

Describe the enhancement requested

Currently, the columnar format is documented only on this page: https://arrow.apache.org/docs/format/Columnar.html. However, when I try to actually implement the format, I find the physical representation underdocumented.

In particular, the encoding of primitive types is unclear. The only information given is an example int32 layout; no other layouts are shown, and the other types are left unclear. How are booleans represented, for example? Do implementations choose which representation they use? I suppose that's not the case, as it would defeat Arrow's goal.
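
As a rough illustration of the level of detail in question, here is a sketch in plain Python (no Arrow library) of how the buffers for a boolean array and an int32 array could be built. It assumes the layout the Columnar page describes: fixed-width values stored contiguously in little-endian order, booleans bit-packed least-significant bit first, and a validity bitmap using the same bit order. The helper names `pack_bits` and `int32_values_buffer` are made up for this example, and buffer padding and alignment are left out.

```python
import struct

def pack_bits(flags):
    """Pack booleans into bytes, least-significant bit first (hypothetical helper)."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def int32_values_buffer(values):
    """Encode int32 values as little-endian bytes (hypothetical helper).

    Null slots are written as 0 here; their content is not meaningful.
    """
    return struct.pack(f"<{len(values)}i", *(0 if v is None else v for v in values))

# Boolean array [True, False, None, True]
bools = [True, False, None, True]
bool_validity = pack_bits([v is not None for v in bools])  # bits 0, 1 and 3 set
bool_values = pack_bits([bool(v) for v in bools])          # bits 0 and 3 set

# Int32 array [1, None, 3]
ints = [1, None, 3]
int_validity = pack_bits([v is not None for v in ints])    # bits 0 and 2 set
int_values = int32_values_buffer(ints)                     # 12 bytes, 4 per slot
```

Under these assumptions the boolean array needs one byte for validity and one byte for values, while the int32 array needs one validity byte plus four bytes per slot, including the null slot.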

I was pointed to https://github.com/apache/arrow/blob/main/format/Schema.fbs for reference. However, as far as I understand, that specification covers only the IPC schema. It specifies type information, but when it comes to physical representation, there is only struct Buffer with a length and an offset.
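
To make the gap concrete, the sketch below shows roughly everything a `Buffer` entry provides: a byte range inside a message body. The `Buffer` dataclass is a Python stand-in for the Flatbuffers struct, and the `body` contents and `read_buffer` helper are hypothetical. Decoding the extracted bytes still relies on layout rules (bit order, value width, endianness) that have to come from the columnar documentation.

```python
import struct
from dataclasses import dataclass

@dataclass
class Buffer:
    """Stand-in for `struct Buffer` in Schema.fbs: a byte range, nothing more."""
    offset: int
    length: int

def read_buffer(body: bytes, buf: Buffer) -> bytes:
    """Return the bytes that a Buffer entry points at (hypothetical helper)."""
    return body[buf.offset : buf.offset + buf.length]

# Hypothetical body: a 1-byte validity bitmap padded to 8 bytes,
# followed by three little-endian int32 values.
body = bytes([0b00000101]) + bytes(7) + struct.pack("<3i", 1, 0, 3)

validity = read_buffer(body, Buffer(offset=0, length=1))
values = read_buffer(body, Buffer(offset=8, length=12))

# Interpreting `values` as int32 is an assumption taken from the schema's type
# information plus the layout rules; the Buffer struct itself does not say so.
decoded = struct.unpack("<3i", values)
```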

I would like clear documentation of the memory layout of every type supported by Arrow. One example of such a specification is CTF, which provides not only the layouts of all types but also side-by-side examples of schema, layout, and values. Similar documentation would be immensely helpful for Arrow, especially if it showed the layouts of the various array types.

Component(s)

Format

@pitrou pitrou changed the title from "Physical representation of columnar format not well documented" to "[Format] Physical representation of columnar format not well documented" Jan 11, 2024
@AlenkaF AlenkaF self-assigned this May 8, 2024
@AlenkaF
Member

AlenkaF commented May 8, 2024

Thank you for creating the issue @rongcuid!

I am attempting to add a general introductory page to the documentation that would list all the physical layouts with diagrams and basic explanations here: #41593. Reviews welcome!

@AlenkaF AlenkaF removed their assignment May 21, 2024