
[CARBONDATA-3649] Hive expression is pushed down to carbon #3557

Closed
wants to merge 2 commits into from

Conversation

xiaohui0318
Contributor

Why is this PR needed?

With more and more scenarios requiring Hive to read the Carbon format, improvements to data filtering are needed.

What changes were proposed in this PR?

When hive.optimize.index.filter=true is set, Hive expressions can be pushed down to Carbon to filter the data.
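A minimal sketch of how such a flag-gated pushdown decision can look. The property name comes from Hive, but the `PushdownGate` class and the map standing in for the Hadoop `Configuration` are purely illustrative, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in: check hive.optimize.index.filter before
// attempting to convert and push down the filter expression.
public class PushdownGate {
    static boolean shouldPushDown(Map<String, String> conf) {
        // Pushdown is attempted only when the flag is explicitly enabled.
        return Boolean.parseBoolean(
            conf.getOrDefault("hive.optimize.index.filter", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.optimize.index.filter", "true");
        System.out.println(shouldPushDown(conf)); // prints "true"
    }
}
```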

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1405/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1414/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1427/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1415/

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1406/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1428/

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1407/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1416/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1429/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1423/

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1416/

@zzcclp
Contributor

zzcclp commented Jan 3, 2020

retest this please

@CarbonDataQA1

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1424/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1444/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1432/

@xiaohui0318
Contributor Author

retest this please

hadoop/pom.xml Outdated
@@ -30,6 +30,7 @@
  <name>Apache CarbonData :: Hadoop</name>

  <properties>
    <hive.version>1.2.1</hive.version>
Contributor

@jackylk jackylk Jan 3, 2020

Add this in parent pom.xml

} else if (type.toUpperCase().endsWith("VARCHAR")) {
  return DataTypes.VARCHAR;
}
return null;
Contributor

Is hive pushdown unsupported for complex data types?

Contributor Author

Not supported.
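The snippet under review maps Hive type-name strings to Carbon data types and returns null for everything else, which is how complex types end up unsupported. A self-contained sketch of that mapping, with an illustrative enum standing in for CarbonData's real `DataTypes` constants:

```java
// Illustrative stand-in for the type mapping in Hive2CarbonExpression;
// the nested DataType enum replaces CarbonData's real DataTypes constants.
public class TypeMapper {
    enum DataType { STRING, INT, VARCHAR }

    static DataType getDataType(String type) {
        String t = type.toUpperCase();
        if (t.equals("STRING")) {
            return DataType.STRING;
        } else if (t.equals("INT")) {
            return DataType.INT;
        } else if (t.endsWith("VARCHAR")) {
            return DataType.VARCHAR;
        }
        // Complex types (ARRAY, MAP, STRUCT) fall through: not supported.
        return null;
    }
}
```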

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1434/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1446/

@QiangCai
Contributor

QiangCai commented Jan 4, 2020

retest this please

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1458/

@xiaohui0318
Contributor Author

retest this please

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1460/

LOG.debug("hive expression:" + exprNodeGenericFuncDesc.getGenericUDF());
LOG.debug("hive expression string:" + exprNodeGenericFuncDesc.getExprString());
Expression expression =
    Hive2CarbonExpression.convertExprHive2Carbon(exprNodeGenericFuncDesc);
Contributor

@jackylk jackylk Jan 5, 2020

Can we do this filter expression conversion in MapredCarbonInputFormat.java (in the carbondata-hive module) and set the FILTER_PREDICATE in the Hadoop configuration, so that we do not need to add a hive-exec dependency in the carbondata-hadoop/flink/presto modules?

Contributor Author

ok
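The suggestion above is to convert the expression once on the Hive side and hand the result to the reader through the job configuration. A hedged sketch of that handoff pattern, with a plain map standing in for the Hadoop `Configuration` and a string standing in for the serialized Carbon expression; the key name and types here are illustrative, not the real API:

```java
import java.util.Map;

// Illustrative pattern: produce the converted filter on the Hive side and
// pass it to the reader via configuration, so downstream modules need no
// hive-exec dependency. Key name and value type are stand-ins.
public class FilterHandoff {
    static final String FILTER_PREDICATE = "carbon.filter.predicate";

    // The "conversion" result is faked as a string here; the real code
    // would serialize a Carbon Expression built from the Hive expression.
    static void setFilter(Map<String, String> conf, String serializedFilter) {
        conf.put(FILTER_PREDICATE, serializedFilter);
    }

    static String getFilter(Map<String, String> conf) {
        return conf.get(FILTER_PREDICATE);
    }
}
```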

return null;
}

public static DataType getDateType(String type) {
Contributor

Can you use DataTypeUtil.valueOf instead of creating this function?

/**
 * @description: hive expression to carbon expression
 */
public class Hive2CarbonExpression {
Contributor

Can we add this in the carbondata-hive module?

} else if (udf instanceof GenericUDFOPEqual) {
  ColumnExpression columnExpression = null;
  if (ll.get(left) instanceof ExprNodeFieldDesc) {
    LOG.debug("Complex types are not supported");
Contributor

Better to throw an exception to indicate it is not supported.
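One way to act on this comment, sketched with a hypothetical helper; the real code would throw at the point where it currently only logs the debug message:

```java
// Illustrative: fail fast instead of silently logging when a filter
// touches a complex-typed column. Class and method names are hypothetical.
public class ComplexTypeCheck {
    static void rejectComplexType(String columnName, boolean isComplex) {
        if (isComplex) {
            throw new UnsupportedOperationException(
                "Filter pushdown is not supported for complex type column: "
                    + columnName);
        }
    }
}
```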

import org.apache.carbondata.core.util.CarbonProperties;
import org.apache.carbondata.hadoop.api.CarbonFileInputFormat;
import org.apache.carbondata.hadoop.testutil.StoreCreator;
import org.apache.carbondata.processing.loading.model.CarbonLoadModel;
Contributor

Can you please move this test class and the utility Hive2CarbonExpression to the Hive module?

}
ExprNodeGenericFuncDesc exprNodeGenericFuncDesc =
    Utilities.deserializeObject(expr, ExprNodeGenericFuncDesc.class);
LOG.debug("hive expression:" + exprNodeGenericFuncDesc.getGenericUDF());
Contributor

Please add an isDebugEnabled() check, here and in all other places.
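The guard the reviewer asks for, sketched against a minimal logger interface (SLF4J and Log4j expose the same `isDebugEnabled()` idiom); the interface here is a stand-in so the sketch is self-contained:

```java
// Illustrative: skip building the log message entirely when debug
// logging is disabled, instead of concatenating unconditionally.
public class GuardedLogging {
    interface Logger {
        boolean isDebugEnabled();
        void debug(String msg);
    }

    static void logExpression(Logger log, Object expr) {
        if (log.isDebugEnabled()) {
            // String concatenation only happens on the debug path.
            log.debug("hive expression:" + expr);
        }
    }
}
```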

if (exprNodeDesc instanceof ExprNodeGenericFuncDesc) {
  ExprNodeGenericFuncDesc exprNodeGenericFuncDesc = (ExprNodeGenericFuncDesc) exprNodeDesc;
  GenericUDF udf = exprNodeGenericFuncDesc.getGenericUDF();
  List<ExprNodeDesc> ll = exprNodeGenericFuncDesc.getChildren();
Contributor

Please rename ll and add comments for better readability.
Add comments to the method as well.

private static StoreCreator creator;
private static CarbonLoadModel loadModel;
private static CarbonTable table;
static {
Contributor

Can you please document the usage of the property "hive.optimize.index.filter" in the Hive documentation?

CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, "Hive2CarbonExpressionTest");
try {
  creator = new StoreCreator(new File("target/store").getAbsolutePath(),
Contributor

Where exactly is the hive.optimize.index.filter property used while testing?


} else if (udf instanceof GenericUDFOPEqual) {
  ColumnExpression columnExpression = null;
  if (ll.get(left) instanceof ExprNodeFieldDesc) {
Contributor

Will this handle all the complex data types like STRUCT, ARRAY and MAP?
Please add test cases where it tries to create a filter expression for each of the complex data types.
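A sketch of the kind of test being requested, using plain asserts and an illustrative converter stub; the real tests would build Hive ExprNodeDescs over STRUCT/ARRAY/MAP columns and expect the conversion to be rejected. All names here are hypothetical stand-ins for the real Hive2CarbonExpression API:

```java
// Illustrative: a stub converter that rejects complex type names, plus
// helpers shaped like the requested tests. Names are hypothetical.
public class ComplexTypePushdownTest {
    static boolean convert(String hiveTypeName) {
        String t = hiveTypeName.toUpperCase();
        if (t.startsWith("STRUCT") || t.startsWith("ARRAY") || t.startsWith("MAP")) {
            throw new UnsupportedOperationException(
                "Pushdown not supported for: " + hiveTypeName);
        }
        return true; // primitive types convert fine in this sketch
    }

    // Returns true if conversion of the given type is rejected.
    static boolean rejects(String type) {
        try {
            convert(type);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```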

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1509/

@xiaohui0318
Contributor Author

retest this please

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1514/

@jackylk
Contributor

jackylk commented Jan 8, 2020

LGTM

@asfgit asfgit closed this in b992571 Jan 8, 2020

7 participants