ARROW-10514: [C++][Parquet] Make the column name the same for both output formats of parquet reader #9649
Closed
This looks reasonable to me. @emkornfield what do you think?
Yes
LGTM to me as well.
Ok, I'll merge this PR then. Thank you @FawnD2!
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…tput formats of parquet reader

In parquet-reader there are two ways to output the schema for a Parquet file: DebugPrint and JSONPrint. When output in JSON format, the Column name is short name instead of full-qualified name. For example, for schema (1), there will be 2 Columns with `"Name": "key"`. That's very confusing. In this PR we start using full-qualified name for Column in JSONPrint instead of short name, similar to DebugPrint.

(1):
```
required group field_id=0 spark_schema {
  optional group field_id=1 a (Map) {
    repeated group field_id=2 key_value {
      required binary field_id=3 key (String);
      optional group field_id=4 value (Map) {
        repeated group field_id=5 key_value {
          required int32 field_id=6 key;
          required boolean field_id=7 value;
        }
      }
    }
  }
}
```

Closes apache#9649 from FawnD2/patch-1

Authored-by: FawnD2 <zzosimova@ya.ru>
Signed-off-by: Antoine Pitrou <antoine@python.org>
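To make the name collision concrete, here is a minimal sketch that models schema (1) as a plain tree and lists each leaf column with both its short name and its dot-joined path. It does not use the Parquet library: `Node`, `Column`, `CollectLeaves`, `LeafColumns`, and `ExampleSchema` are hypothetical names introduced only for illustration, and the sketch assumes the root group name (`spark_schema`) is not part of a column's path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema node; this is NOT the parquet-cpp API.
struct Node {
  std::string name;
  std::vector<Node> children;  // empty for leaf (primitive) columns
};

// One leaf column, described by both naming schemes.
struct Column {
  std::string short_name;  // name of the leaf node only
  std::string full_name;   // dot-joined path from below the root to the leaf
};

// Depth-first walk that records every leaf under `node`.
void CollectLeaves(const Node& node, const std::string& prefix,
                   std::vector<Column>* out) {
  for (const Node& child : node.children) {
    const std::string path =
        prefix.empty() ? child.name : prefix + "." + child.name;
    if (child.children.empty()) {
      out->push_back({child.name, path});
    } else {
      CollectLeaves(child, path, out);
    }
  }
}

// Assumption: the root group's own name is left out of the path.
std::vector<Column> LeafColumns(const Node& root) {
  std::vector<Column> out;
  CollectLeaves(root, "", &out);
  return out;
}

// Schema (1) from the description, with types and field ids omitted.
Node ExampleSchema() {
  return Node{"spark_schema", {
    Node{"a", {
      Node{"key_value", {
        Node{"key", {}},
        Node{"value", {
          Node{"key_value", {
            Node{"key", {}},
            Node{"value", {}}
          }}
        }}
      }}
    }}
  }};
}
```

Calling `LeafColumns(ExampleSchema())` yields three leaf columns. Two of them share the short name `key`, while their full names, `a.key_value.key` and `a.key_value.value.key_value.key`, are distinct, which is the ambiguity the change addresses.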
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
In parquet-reader there are two ways to output the schema for a Parquet file: DebugPrint and JSONPrint. In the JSON output, the Column name is the short name instead of the fully qualified name. For example, for schema (1), there will be 2 Columns with `"Name": "key"`, which is very confusing. In this PR we start using the fully qualified name for Column in JSONPrint instead of the short name, similar to DebugPrint.
(1):