
[CARBONDATA-3649] Hive expression is pushed down to carbon #3557

Closed
wants to merge 2 commits into from

Conversation

xiaohui0318
Contributor

Why is this PR needed?

With more and more scenarios requiring Hive to read the Carbon format, improvements to data filtering are needed.

What changes were proposed in this PR?

When hive.optimize.index.filter=true is set, Hive expressions can be pushed down to Carbon to filter the data.
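A minimal sketch of how such a flag-gated pushdown decision can look. The property name comes from Hive, but the `PushdownGate` class and the map standing in for the Hadoop `Configuration` are purely illustrative, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in: check hive.optimize.index.filter before
// attempting to convert and push down the filter expression.
public class PushdownGate {
    static boolean shouldPushDown(Map<String, String> conf) {
        // Pushdown is attempted only when the flag is explicitly enabled.
        return Boolean.parseBoolean(
            conf.getOrDefault("hive.optimize.index.filter", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.optimize.index.filter", "true");
        System.out.println(shouldPushDown(conf)); // prints "true"
    }
}
```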

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1405/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1414/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1427/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1415/

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1406/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1428/

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1407/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1416/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1429/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1423/

@CarbonDataQA1

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1416/

@zzcclp
Contributor

zzcclp commented Jan 3, 2020

retest this please

@CarbonDataQA1

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1424/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1444/

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1432/

@xiaohui0318
Contributor Author

retest this please

hadoop/pom.xml Outdated
@@ -30,6 +30,7 @@
  <name>Apache CarbonData :: Hadoop</name>

  <properties>
    <hive.version>1.2.1</hive.version>
Contributor

@jackylk jackylk Jan 3, 2020

Add this in parent pom.xml

} else if (type.toUpperCase().endsWith("VARCHAR")) {
  return DataTypes.VARCHAR;
}
return null;
Contributor

Is hive pushdown unsupported for complex data types?

Contributor Author

Not supported.
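The snippet under review maps Hive type-name strings to Carbon data types and returns null for everything else, which is how complex types end up unsupported. A self-contained sketch of that mapping, with an illustrative enum standing in for CarbonData's real `DataTypes` constants:

```java
// Illustrative stand-in for the type mapping in Hive2CarbonExpression;
// the nested DataType enum replaces CarbonData's real DataTypes constants.
public class TypeMapper {
    enum DataType { STRING, INT, VARCHAR }

    static DataType getDataType(String type) {
        String t = type.toUpperCase();
        if (t.equals("STRING")) {
            return DataType.STRING;
        } else if (t.equals("INT")) {
            return DataType.INT;
        } else if (t.endsWith("VARCHAR")) {
            return DataType.VARCHAR;
        }
        // Complex types (ARRAY, MAP, STRUCT) fall through: not supported.
        return null;
    }
}
```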

@CarbonDataQA1

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1434/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1446/

@QiangCai
Contributor

QiangCai commented Jan 4, 2020

retest this please

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1458/

@xiaohui0318
Contributor Author

retest this please

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1460/

LOG.debug("hive expression:" + exprNodeGenericFuncDesc.getGenericUDF());
LOG.debug("hive expression string:" + exprNodeGenericFuncDesc.getExprString());
Expression expression =
    Hive2CarbonExpression.convertExprHive2Carbon(exprNodeGenericFuncDesc);
Contributor

@jackylk jackylk Jan 5, 2020

Can we do this filter expression conversion in MapredCarbonInputFormat.java (in the carbondata-hive module) and set the FILTER_PREDICATE in the Hadoop configuration, so that we do not need to add a hive-exec dependency in the carbondata-hadoop/flink/presto modules?

Contributor Author

ok
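The suggestion above is to convert the expression once on the Hive side and hand the result to the reader through the job configuration. A hedged sketch of that handoff pattern, with a plain map standing in for the Hadoop `Configuration` and a string standing in for the serialized Carbon expression; the key name and types here are illustrative, not the real API:

```java
import java.util.Map;

// Illustrative pattern: produce the converted filter on the Hive side and
// pass it to the reader via configuration, so downstream modules need no
// hive-exec dependency. Key name and value type are stand-ins.
public class FilterHandoff {
    static final String FILTER_PREDICATE = "carbon.filter.predicate";

    // The "conversion" result is faked as a string here; the real code
    // would serialize a Carbon Expression built from the Hive expression.
    static void setFilter(Map<String, String> conf, String serializedFilter) {
        conf.put(FILTER_PREDICATE, serializedFilter);
    }

    static String getFilter(Map<String, String> conf) {
        return conf.get(FILTER_PREDICATE);
    }
}
```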

return null;
}

public static DataType getDateType(String type) {
Contributor

Can you use DataTypeUtil.valueOf instead of creating this function?

/**
 * @description: hive expression to carbon expression
 */
public class Hive2CarbonExpression {
Contributor

Can we add this in the carbondata-hive module?

} else if (udf instanceof GenericUDFOPEqual) {
  ColumnExpression columnExpression = null;
  if (ll.get(left) instanceof ExprNodeFieldDesc) {
    LOG.debug("Complex types are not supported");
Contributor

Better to throw an exception to indicate it is not supported.
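One way to act on this comment, sketched with a hypothetical helper; the real code would throw at the point where it currently only logs the debug message:

```java
// Illustrative: fail fast instead of silently logging when a filter
// touches a complex-typed column. Class and method names are hypothetical.
public class ComplexTypeCheck {
    static void rejectComplexType(String columnName, boolean isComplex) {
        if (isComplex) {
            throw new UnsupportedOperationException(
                "Filter pushdown is not supported for complex type column: "
                    + columnName);
        }
    }
}
```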

import org.apache.carbondata.core.util.CarbonProperties;
import org.apache.carbondata.hadoop.api.CarbonFileInputFormat;
import org.apache.carbondata.hadoop.testutil.StoreCreator;
import org.apache.carbondata.processing.loading.model.CarbonLoadModel;
Contributor

Can you please move this test class and the utility Hive2CarbonExpression to the Hive module?

}
ExprNodeGenericFuncDesc exprNodeGenericFuncDesc =
    Utilities.deserializeObject(expr, ExprNodeGenericFuncDesc.class);
LOG.debug("hive expression:" + exprNodeGenericFuncDesc.getGenericUDF());
Contributor

Please add an isDebugEnabled() check, here and in all other places.
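The guard the reviewer asks for, sketched against a minimal logger interface (SLF4J and Log4j expose the same `isDebugEnabled()` idiom); the interface here is a stand-in so the sketch is self-contained:

```java
// Illustrative: skip building the log message entirely when debug
// logging is disabled, instead of concatenating unconditionally.
public class GuardedLogging {
    interface Logger {
        boolean isDebugEnabled();
        void debug(String msg);
    }

    static void logExpression(Logger log, Object expr) {
        if (log.isDebugEnabled()) {
            // String concatenation only happens on the debug path.
            log.debug("hive expression:" + expr);
        }
    }
}
```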

if (exprNodeDesc instanceof ExprNodeGenericFuncDesc) {
  ExprNodeGenericFuncDesc exprNodeGenericFuncDesc = (ExprNodeGenericFuncDesc) exprNodeDesc;
  GenericUDF udf = exprNodeGenericFuncDesc.getGenericUDF();
  List<ExprNodeDesc> ll = exprNodeGenericFuncDesc.getChildren();
Contributor

Please rename ll and add comments for better readability.
Add comments to the method as well.

private static StoreCreator creator;
private static CarbonLoadModel loadModel;
private static CarbonTable table;
static {
Contributor

Can you please document the usage of the property "hive.optimize.index.filter" in the Hive documentation?

CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, "Hive2CarbonExpressionTest");
try {
  creator = new StoreCreator(new File("target/store").getAbsolutePath(),
Contributor

Where exactly is the hive.optimize.index.filter property used while testing?


} else if (udf instanceof GenericUDFOPEqual) {
  ColumnExpression columnExpression = null;
  if (ll.get(left) instanceof ExprNodeFieldDesc) {
Contributor

Will this handle all the complex data types like STRUCT, ARRAY and MAP?
Please add test cases where it tries to create a filter expression for each of the complex data types.
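A sketch of the kind of test being requested, using plain asserts and an illustrative converter stub; the real tests would build Hive ExprNodeDescs over STRUCT/ARRAY/MAP columns and expect the conversion to be rejected. All names here are hypothetical stand-ins for the real Hive2CarbonExpression API:

```java
// Illustrative: a stub converter that rejects complex type names, plus
// helpers shaped like the requested tests. Names are hypothetical.
public class ComplexTypePushdownTest {
    static boolean convert(String hiveTypeName) {
        String t = hiveTypeName.toUpperCase();
        if (t.startsWith("STRUCT") || t.startsWith("ARRAY") || t.startsWith("MAP")) {
            throw new UnsupportedOperationException(
                "Pushdown not supported for: " + hiveTypeName);
        }
        return true; // primitive types convert fine in this sketch
    }

    // Returns true if conversion of the given type is rejected.
    static boolean rejects(String type) {
        try {
            convert(type);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```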

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1509/

@xiaohui0318
Contributor Author

retest this please

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1514/

@jackylk
Contributor

jackylk commented Jan 8, 2020

LGTM

@asfgit asfgit closed this in b992571 Jan 8, 2020

7 participants