HIVE-25553: Support Map data-type natively in Arrow format #2751

Merged
merged 1 commit into from Oct 27, 2021

@warriersruthi warriersruthi commented Oct 26, 2021

This covers the following sub-tasks as well:
HIVE-25554: Upgrade arrow version to 0.15
HIVE-25555: ArrowColumnarBatchSerDe should store map natively instead of converting to list

What changes were proposed in this pull request?
a. Upgrade Arrow to version 0.15.0, which supports the map data type.
b. Modify ArrowColumnarBatchSerDe and the corresponding Serializer/Deserializer to use the Arrow map data type instead of the list workaround.
c. Create a non-nullable struct and a non-nullable key type for the map data type in ArrowColumnarBatchSerDe.
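
For reviewers unfamiliar with the layout in item (c), here is a rough sketch of the field tree an Arrow map column is expected to have: one struct child that must not be null, holding a key that must not be null and a value that may be. `SketchField` and `MapLayoutSketch` are hypothetical stand-ins written for this illustration; they are not the Arrow API and not the code in this patch. The child names "entries", "key" and "value" follow Arrow's conventional naming.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a schema field; this is NOT the Arrow API. */
final class SketchField {
    final String name;
    final String type;       // e.g. "map", "struct", "string", "int"
    final boolean nullable;
    final List<SketchField> children;

    SketchField(String name, String type, boolean nullable, SketchField... children) {
        this.name = name;
        this.type = type;
        this.nullable = nullable;
        this.children = Collections.unmodifiableList(Arrays.asList(children));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(name).append(": ").append(type);
        if (!nullable) {
            sb.append(" not null");
        }
        if (!children.isEmpty()) {
            sb.append('<');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(children.get(i));
            }
            sb.append('>');
        }
        return sb.toString();
    }
}

final class MapLayoutSketch {
    /** Builds the map layout: map -> non-nullable struct -> (non-nullable key, nullable value). */
    static SketchField mapField(String name, String keyType, String valueType) {
        SketchField key = new SketchField("key", keyType, false);      // keys may not be null
        SketchField value = new SketchField("value", valueType, true); // values may be null
        SketchField entries = new SketchField("entries", "struct", false, key, value);
        return new SketchField(name, "map", true, entries);
    }

    public static void main(String[] args) {
        // Prints the nested layout of a hypothetical map<string,int> column.
        System.out.println(mapField("props", "string", "int"));
    }
}
```

Only the key and the enclosing struct are forced to be non-nullable here; the map column itself and its values stay nullable, which is the distinction item (c) is about.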

Why are the changes needed?
Currently, ArrowColumnarBatchSerDe converts the map data type to a list of structs, where each struct holds one key-value pair of the map.
This causes issues when reading a Map column through llap-ext-client, because it reads a list of structs instead.
HiveWarehouseConnector, which uses llap-ext-client, throws an exception when the schema (containing the Map data type) differs from the actual data (a list of structs).
This change fixes that issue.
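
To illustrate the mismatch described above, here is a minimal sketch of a reader that compares the declared column type against the type of the data it actually receives. `SchemaCheckSketch`, its methods, and the type strings are assumptions made up for this example; the real check in HiveWarehouseConnector is not shown in this PR and may look quite different.

```java
final class SchemaCheckSketch {
    /** Top-level kind of a Hive-style type string, e.g. "map<string,int>" -> "map". */
    static String kind(String typeString) {
        int lt = typeString.indexOf('<');
        String head = lt < 0 ? typeString : typeString.substring(0, lt);
        return head.trim().toLowerCase();
    }

    /** Hypothetical reader-side check: declared and actual top-level kinds must agree. */
    static void check(String declared, String actual) {
        if (!kind(declared).equals(kind(actual))) {
            throw new IllegalStateException(
                "schema says " + declared + " but data is " + actual);
        }
    }

    public static void main(String[] args) {
        String declared = "map<string,int>";

        // Native map encoding: declared type and data agree.
        check(declared, "map<string,int>");

        // List workaround: the same column arrives as a list of structs
        // (struct field names here are illustrative only).
        try {
            check(declared, "array<struct<key:string,value:int>>");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the list workaround the second check fails; once the map is stored natively, the declared type and the data line up again.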

Does this PR introduce any user-facing change?
No

How was this patch tested?
Re-enabled the Arrow-specific tests in the Hive code.

@sankarh sankarh merged commit 9ed1d1e into apache:master Oct 27, 2021
HarshitGupta11 pushed a commit to HarshitGupta11/hive that referenced this pull request Dec 12, 2021
…oriyathvariam, reviewed by Sankar Hariappan)

This covers the following sub-tasks:
HIVE-25554: Upgrade arrow version to 0.15
HIVE-25555: ArrowColumnarBatchSerDe should store map natively instead of converting to list

a. Upgrading arrow version to version 0.15.0 (where map data-type is supported)
b. Modifying ArrowColumnarBatchSerDe and corresponding Serializer/Deserializer to not use list as a workaround for map and use the arrow map data-type instead
c. Taking care of creating non-nullable struct and non-nullable key type for the map data-type in ArrowColumnarBatchSerDe

Signed-off-by: Sankar Hariappan <sankarh@apache.org>
Closes (apache#2751)
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Dec 15, 2022