HIVE-24883 : Add support for complex types columns in Hive Joins #2071

maheshk114 · 2021-03-14T02:21:07Z

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

zabetak

Thanks for pushing this forward @maheshk114.

First, I have some high-level questions regarding the scope of this work:

1. Which join operators are we targeting? Checking the CommonJoinOperator hierarchy I see a few classes that were not affected by your changes (e.g., MapJoinOperator, JoinOperator, and VectorXXX) and I am wondering if that is normal. Do they already support complex types? Should they support complex types in the future?
2. Which kind of joins are we tackling? Apart from equality joins (=) there are more operators that can appear such as (<>,<,>,<=,>=, etc), what happens with them?
3. What are the semantics of the comparisons? Are we following the SQL standard?

For the above it may be worth enriching/modifying the JIRA case to tighten the scope.

Next in terms of testing, I think we should have a few cases covering:

comparisons with null values;
comparisons of collections with different sizes;
comparisons with different types (negative?);
more operators & predicates (depends on the answers to the questions above)

zabetak · 2021-04-08T12:04:33Z

ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java

+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+
+class HiveListComparator extends HiveWritableComparator {


Having multiple top-level classes in a single source file does not provide any significant advantage and may, on the contrary, cause problems (see Effective Java, Item 25: Limit source files to a single top-level class).

zabetak · 2021-04-08T12:12:23Z

ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java

+    }
+}
+
+public class HiveWritableComparator extends WritableComparator {


Why is it necessary to introduce a new API? As far as I can see, HiveWritableComparator does not add any new behavior to WritableComparator. It only contains some factory methods, and these would fit much better in a ComplexWritableComparatorFactory class that is final and immutable.

The non-public top-level classes above could become private static member classes of the factory class.
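
To make the suggested shape concrete, here is a minimal sketch of such a factory. It is an illustration only, not the code in this patch: `ComplexComparatorFactory`, `forList`, and `ListComparator` are hypothetical names, and `java.util.Comparator` stands in for Hadoop's `WritableComparator` so that the example compiles without Hive on the classpath. Ordering shorter lists first is an assumption, and null handling is left out for brevity.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: a final, non-instantiable factory. The concrete
// comparator is a private static member class instead of a second
// top-level class in the same source file.
final class ComplexComparatorFactory {

    private ComplexComparatorFactory() {
        // Static factory methods only; no instances.
    }

    /** Returns a comparator for lists whose elements are ordered by elementComparator. */
    static Comparator<List<?>> forList(Comparator<Object> elementComparator) {
        return new ListComparator(elementComparator);
    }

    private static final class ListComparator implements Comparator<List<?>> {
        private final Comparator<Object> elementComparator;

        ListComparator(Comparator<Object> elementComparator) {
            this.elementComparator = elementComparator;
        }

        @Override
        public int compare(List<?> a, List<?> b) {
            // Assumption: lists of different sizes are ordered by size.
            if (a.size() != b.size()) {
                return Integer.compare(a.size(), b.size());
            }
            // Same size: the first differing element decides.
            for (int i = 0; i < a.size(); i++) {
                int c = elementComparator.compare(a.get(i), b.get(i));
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        }
    }
}
```

Callers only ever see the factory method and the comparator interface, so the nested classes can change without touching any public API.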

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java

zabetak · 2021-04-08T12:16:56Z

ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java

+    public int compare(Object key1, Object key2) {
+        ArrayList a1 = (ArrayList) key1;
+        ArrayList a2 = (ArrayList) key2;
+        if (a1.size() != a2.size()) {


Is it possible to get an NPE?

Yes, I added null checks for all of them.
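
For illustration, a null-safe list comparison could look roughly like the sketch below. This is not the code from the patch: `NullSafeListCompare` and its methods are hypothetical, and the choices that null sorts before non-null and that shorter lists sort first are assumptions. Whether two null keys should actually match in a join is a separate decision that this sketch does not address; it only guarantees that the comparison itself never throws an NPE.

```java
import java.util.List;

// Hypothetical sketch of a null-safe, element-wise list comparison.
final class NullSafeListCompare {

    private NullSafeListCompare() {
    }

    static <T extends Comparable<T>> int compareLists(List<T> a, List<T> b) {
        // Guard the list references before calling size() on them.
        if (a == null || b == null) {
            return a == b ? 0 : (a == null ? -1 : 1);
        }
        // Assumption: lists of different sizes are ordered by size.
        if (a.size() != b.size()) {
            return Integer.compare(a.size(), b.size());
        }
        for (int i = 0; i < a.size(); i++) {
            int c = compareElements(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Elements can be null as well, so they get the same guard.
    private static <T extends Comparable<T>> int compareElements(T x, T y) {
        if (x == null || y == null) {
            return x == y ? 0 : (x == null ? -1 : 1);
        }
        return x.compareTo(y);
    }
}
```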

maheshk114 · 2021-04-09T06:04:27Z

3. What are the semantics of the comparisons? Are we following the SQL standard?

I am not aware of any SQL standard for complex type comparison. The ordering used by the join follows the normal comparison: fields are checked for equality from left to right.
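
As a rough illustration of the left-to-right rule, the sketch below compares two structs field by field and lets the first non-equal field decide. It is not taken from the patch: `StructCompareSketch` and `compareStructs` are hypothetical names, a struct is modelled as a plain list of field values, and both inputs are assumed to share the same schema.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: left-to-right comparison of struct fields.
final class StructCompareSketch {

    private StructCompareSketch() {
    }

    /**
     * Compares s1 and s2 field by field, in declaration order.
     * fieldComparators holds one comparator per field.
     */
    static int compareStructs(List<?> s1, List<?> s2,
                              List<Comparator<Object>> fieldComparators) {
        for (int i = 0; i < fieldComparators.size(); i++) {
            int c = fieldComparators.get(i).compare(s1.get(i), s2.get(i));
            if (c != 0) {
                // The first differing field decides; later fields are not inspected.
                return c;
            }
        }
        // Every field compared equal.
        return 0;
    }
}
```

With an (int, string) struct, for example, (1, "b") sorts before (2, "a") because the first field already differs.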

maheshk114 · 2021-04-09T06:05:50Z

2. Which kind of joins are we tackling? Apart from equality joins (=) there are more operators that can appear such as (<>,<,>,<=,>=, etc), what happens with them?

Thanks for pointing this out. I will create a separate Jira, since currently only the equality operator is supported.

maheshk114 · 2021-04-09T06:06:56Z

Which join operators are we targeting ? Checking the CommonJoinOperator hierarchy I see a few classes that were not affected by your changes (e.g., MapJoinOperator, JoinOperator, and VectorXXX) and I am wondering if that is normal. Do they already support complex types? Should they support complex types in the future?

As of now, hash-based joins are working fine. This patch fixes the issue with SMB and common merge joins.
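
One way to see why merge joins are the ones that need a comparator: a hash join only asks whether two keys are equal, whereas a merge join walks two sorted inputs and has to know which side is behind. The sketch below is a generic inner merge join, not Hive's operator; `MergeJoinSketch` and `join` are hypothetical names, and both inputs are assumed to be already sorted by the same comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of an inner equi-join over two inputs sorted by cmp.
final class MergeJoinSketch {

    private MergeJoinSketch() {
    }

    /** Returns matched pairs as two-element lists [leftKey, rightKey]. */
    static <K> List<List<K>> join(List<K> left, List<K> right,
                                  Comparator<? super K> cmp) {
        List<List<K>> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int c = cmp.compare(left.get(i), right.get(j));
            if (c < 0) {
                i++; // left key is smaller: advance the left side
            } else if (c > 0) {
                j++; // right key is smaller: advance the right side
            } else {
                // Equal keys: emit the cross product of both runs of this key.
                K key = left.get(i);
                int runStart = j;
                while (i < left.size() && cmp.compare(left.get(i), key) == 0) {
                    for (j = runStart;
                         j < right.size() && cmp.compare(right.get(j), key) == 0;
                         j++) {
                        out.add(Arrays.asList(left.get(i), right.get(j)));
                    }
                    i++;
                }
            }
        }
        return out;
    }
}
```

The `c < 0` and `c > 0` branches are where a total order is required; with ARRAY or STRUCT keys, that order has to come from a comparator for the complex type.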

zabetak

Left a minor comment in the PR, and one in the JIRA about supporting UNION types in comparisons. Apart from that the PR is in very good shape and can be merged as soon as we agree on the support of UNION type.

Suggestion for squash commit msg:
HIVE-24883: Support ARRAY/STRUCT types in equality sort-merge joins
Also include UNION if we decide to go this way.

zabetak · 2021-04-28T09:02:59Z

ql/src/test/queries/clientnegative/test_merge_join_map_type.q

+create table table_map_types (id int, c1 map<int,int>, c2 map<int,int>);
+insert into table_map_types VALUES (1, map(1,1), map(2,1));
+insert into table_map_types VALUES (2, map(1,2), map(2,2));
+insert into table_map_types VALUES (3, map(1,3), map(2,3));
+insert into table_map_types VALUES (4, map(1,4), map(1,4));
+insert into table_map_types VALUES (1, map(1,1,2,2,3,3,4,4), map(2,1,1,4));
+select * from table_map_types;
+
+create table table_map_types1 (id int, c1 map<int,int>, c2 map<int,int>);
+insert into table_map_types1 VALUES (1, map(1,1), map(2,1));
+insert into table_map_types1 VALUES (2, map(1,2), map(2,2));
+insert into table_map_types1 VALUES (3, map(1,4), map(1,3));
+insert into table_map_types1 VALUES (1, map(1,1,2,2,3,3,4,4), map(2,1,1,5));
+insert into table_map_types1 VALUES (1, map(1,1,2,2,3,3,4,5), map(2,1,1,4));
+select * from table_map_types1;
+
+set hive.cbo.enable=false;
+set hive.auto.convert.join=false;
+set hive.optimize.ppd=false;
+
+explain select * from table_map_types t1 inner join table_map_types1 t2 on t1.c1 = t2.c1;
+select * from table_map_types t1 inner join table_map_types1 t2 on t1.c1 = t2.c1;
+
+explain select * from table_map_types t1 inner join table_map_types1 t2 on t1.c2 = t2.c2;
+select * from table_map_types t1 inner join table_map_types1 t2 on t1.c2 = t2.c2;


Since this is a negative test I guess you only need the following lines to make sure that the exception is raised:

create table table_map_types (id int, c1 map<int,int>, c2 map<int,int>);
set hive.cbo.enable=false;
set hive.auto.convert.join=false;
set hive.optimize.ppd=false;
select * from table_map_types t1 inner join table_map_types t2 on t1.c1 = t2.c1;

It is better to keep test cases minimal.

zabetak

Thanks for pushing this forward @maheshk114
I have no more comments, LGTM!

kgyrtkirk added tests pending tests unstable and removed tests pending labels Mar 14, 2021

maheshk114 force-pushed the HIVE-24883 branch from f966e6c to 9733e06 April 8, 2021 04:35

github-actions bot requested a review from jcamachor April 8, 2021 04:36

kgyrtkirk added tests pending tests failed and removed tests unstable tests pending labels Apr 8, 2021

maheshk114 force-pushed the HIVE-24883 branch from 9733e06 to 25fada1 April 8, 2021 05:46

kgyrtkirk added tests pending tests unstable and removed tests failed tests pending labels Apr 8, 2021

zabetak reviewed Apr 8, 2021

kgyrtkirk added tests pending and removed tests unstable labels Apr 9, 2021

kgyrtkirk added tests unstable tests pending tests failed and removed tests pending tests unstable labels Apr 9, 2021

maheshk114 force-pushed the HIVE-24883 branch from d44ff33 to 1ece1d5 April 9, 2021 12:26

kgyrtkirk added tests pending and removed tests failed tests pending labels Apr 9, 2021

kgyrtkirk added the tests unstable label Apr 22, 2021

maheshk114 force-pushed the HIVE-24883 branch from 2d8c75e to b7d056f April 22, 2021 03:03

kgyrtkirk added tests pending tests failed and removed tests unstable tests pending labels Apr 22, 2021

maheshk114 and others added 5 commits April 22, 2021 20:08

HIVE-24883 : Add support for complex types columns in Hive Joins

efad4a7

HIVE-24883 : Add support for complex types columns in Hive Joins : Fi…

3f50379

…xed review commnets

HIVE-24883 : Add support for complex types columns in Hive Joins : Fi…

c199ab4

…xed review commnets1

HIVE-24883 : Add support for complex types columns in Hive Joins : Fi…

314c91c

…xed review commnets3

HIVE-24883 : Add support for complex types columns in Hive SMB and Co…

22521ee

…mmon merge joins 1. Support added only for equal operator. 2. Not supported for map type.

maheshk114 force-pushed the HIVE-24883 branch from b7d056f to 22521ee April 22, 2021 14:38

kgyrtkirk added tests pending tests passed and removed tests failed tests pending labels Apr 22, 2021

zabetak reviewed Apr 28, 2021

HIVE-24883 : Support ARRAY/STRUCT/UNION types in equality SMB and Com…

34f970e

…mon merge join

kgyrtkirk added tests pending tests passed and removed tests passed tests pending labels May 3, 2021

HIVE-24883 : Support ARRAY/STRUCT types in equality SMB and Common me…

9a6e6ca

…rge join

kgyrtkirk added tests pending tests passed and removed tests passed tests pending labels May 4, 2021

zabetak approved these changes May 7, 2021

jcamachor approved these changes May 10, 2021

maheshk114 merged commit 37f13b0 into apache:master May 11, 2021


4 participants
