New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-5342: [Format] Formalize "extension types" in Arrow protocol metadata #4332
Conversation
docs/source/format/Metadata.rst
Outdated
``custom_metadata`` field of the ``Field`` metadata structure. These | ||
special extension keys are: | ||
|
||
* ``'ARROW:extension_name'`` for the string name identifying the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we formalize that "ARROW:" is a reserved key prefix for metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Codecov Report
@@ Coverage Diff @@
## master #4332 +/- ##
==========================================
+ Coverage 87.78% 89.47% +1.69%
==========================================
Files 747 648 -99
Lines 85081 91701 +6620
Branches 1253 0 -1253
==========================================
+ Hits 74690 82052 +7362
+ Misses 10127 9649 -478
+ Partials 264 0 -264
Continue to review full report at Codecov.
|
docs/source/format/Metadata.rst
Outdated
|
||
* ``'ARROW:extension_name'`` for the string name identifying the | ||
custom data type | ||
* ``'ARROW:extension_data'`` for a serialized representation of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be called metadata, to be clear it should be small? I asked this on the ML but could you provide an example for how this would be used for Tensors? (and clarify UUIDs wouldn't use it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding some examples
docs/source/format/Metadata.rst
Outdated
``KeyValue`` pairs in ``custom_metadata` in the ``Field`` metadata | ||
structure. These extension keys are: | ||
|
||
* ``'ARROW:extension_name'`` for the string name identifying the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also commented on the ML, but I think we should recommend this is namespaced in some way so different projects can plugin to the extension system without conflcting. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
LGTM, but I guess we wait for a vote? |
cpp/src/arrow/extension_type-test.cc
Outdated
key_value_metadata({{"arrow_extension_name", "uuid"}, | ||
{"arrow_extension_data", "uuid-type-unique-code"}}); | ||
key_value_metadata({{"ARROW:extension_name", "uuid"}, | ||
{"ARROW:extension_data", "uuid-type-unique-code"}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be extension_metadata
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a nit.
e33240c
to
10b66bd
Compare
@@ -192,13 +189,74 @@ categories: | |||
|
|||
Refer to `Schema.fbs`_ for up-to-date descriptions of each built-in | |||
logical type. | |||
Custom Application Metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty new line is missing before this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
docs/source/format/Metadata.rst
Outdated
--------------- | ||
|
||
User-defined "extension" types can be defined setting certain | ||
``KeyValue`` pairs in ``custom_metadata` in the ``Field`` metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"`" is missing after "custom_metadata`".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
…ta key names to use ARROW: prefix Add section about application metadata and state that ARROW: is a reserved prefix for internal Arrow use Code review comments, add more extension type examples Use ARROW:extension namespace for extension type metadata remove '(colon included)' that is not needed anymore since the colon namespace separator is implicit
+1. Merging on account of mailing list vote passing |
This patch proposes a language-independent scheme for annotating built-in Arrow types with a custom type name and serialized representation, per previous discussions on the mailing list.
I am starting a mailing list discussion to hold a vote about this and see if there are other ideas about how to proceed.