[CARBONDATA-3649] Hive expression is pushed down to carbon #3557
Conversation
Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1405/
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1414/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1427/
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1415/
Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1406/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1428/
Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1407/
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1416/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1429/
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1423/
Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1416/
retest this please
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1424/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1444/
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1432/
retest this please
hadoop/pom.xml (Outdated)
@@ -30,6 +30,7 @@
  <name>Apache CarbonData :: Hadoop</name>

  <properties>
    <hive.version>1.2.1</hive.version>
Add this in parent pom.xml
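The suggestion above could look roughly like the following pom fragment. This is a sketch only: it assumes the parent pom.xml centralizes version properties in a <properties> block, with the property name taken from the diff above.

```xml
<!-- Sketch: in the parent pom.xml, so every child module inherits it -->
<properties>
  <hive.version>1.2.1</hive.version>
</properties>

<!-- Child modules such as hadoop/pom.xml can then reference it as
     ${hive.version} in their dependency declarations instead of
     redeclaring the property locally. -->
```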
} else if (type.toUpperCase().endsWith("VARCHAR")) {
  return DataTypes.VARCHAR;
}
return null;
Is hive pushdown unsupported for complex data types?
No, it is not supported.
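The behavior discussed here can be illustrated with a minimal, self-contained sketch of a type-name mapping. The names (CarbonType, TypeMapper, toCarbonType) are hypothetical stand-ins for the PR's actual DataTypes-based code; the point is that complex types such as ARRAY, MAP and STRUCT are not mapped and fall through to null, which is why pushdown is skipped for them.

```java
// Hypothetical stand-in for CarbonData's DataTypes constants.
enum CarbonType { STRING, INT, VARCHAR }

class TypeMapper {
  // Maps a Hive type name to a Carbon type; returns null for unmapped
  // (e.g. complex) types, mirroring the PR's skip-pushdown behavior.
  static CarbonType toCarbonType(String hiveType) {
    String t = hiveType.toUpperCase();
    if (t.equals("STRING")) {
      return CarbonType.STRING;
    } else if (t.equals("INT")) {
      return CarbonType.INT;
    } else if (t.endsWith("VARCHAR")) {
      return CarbonType.VARCHAR;
    }
    // Complex types (ARRAY<...>, MAP<...>, STRUCT<...>) are not mapped.
    return null;
  }
}
```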
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1434/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1446/
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1458/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1460/
LOG.debug("hive expression:" + exprNodeGenericFuncDesc.getGenericUDF());
LOG.debug("hive expression string:" + exprNodeGenericFuncDesc.getExprString());
Expression expression =
    Hive2CarbonExpression.convertExprHive2Carbon(exprNodeGenericFuncDesc);
Can we do this filter expression conversion in MapredCarbonInputFormat.java (in the carbondata-hive module) and set the FILTER_PREDICATE in the Hadoop configuration, so that we do not need to add the hive-exec dependency in the carbondata-hadoop/flink/presto modules?
ok
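The design suggested in this thread, converting the filter on the Hive side and passing it through the job configuration, can be sketched as follows. This is a simplified illustration: java.util.Properties stands in for Hadoop's Configuration, Java serialization plus Base64 stands in for whatever serialization the real FILTER_PREDICATE mechanism uses, and the key name here is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.Properties;

class FilterPredicatePushdown {
  // Hypothetical key name; the real constant would live in the carbon
  // input-format API.
  static final String FILTER_PREDICATE = "carbon.filter.predicate";

  // Producer side (hive module): serialize the converted expression and
  // store it in the job configuration.
  static void setFilter(Properties conf, Serializable carbonExpression) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(carbonExpression);
      }
      conf.setProperty(FILTER_PREDICATE,
          Base64.getEncoder().encodeToString(bos.toByteArray()));
    } catch (java.io.IOException e) {
      throw new RuntimeException(e);
    }
  }

  // Consumer side (input format): read the expression back; null means
  // no filter was pushed down and a full scan is performed.
  static Object getFilter(Properties conf) {
    String encoded = conf.getProperty(FILTER_PREDICATE);
    if (encoded == null) {
      return null;
    }
    try (ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(Base64.getDecoder().decode(encoded)))) {
      return ois.readObject();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```

With this shape, only the hive module needs hive-exec on its classpath; downstream modules see an already-converted expression in the configuration.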
return null;
}

public static DataType getDateType(String type) {
Can you use DataTypeUtil.valueOf instead of creating this function?
/**
 * @description: hive expression to carbon expression
 */
public class Hive2CarbonExpression {
Can we add this in the carbondata-hive module?
} else if (udf instanceof GenericUDFOPEqual) {
  ColumnExpression columnExpression = null;
  if (ll.get(left) instanceof ExprNodeFieldDesc) {
    LOG.debug("Complex types are not supported");
Better to throw an exception to indicate it is not supported.
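The reviewer's suggestion, failing fast instead of silently logging when the filter touches a complex-type field, could look like this sketch. The descriptor classes below are minimal local stand-ins for the Hive classes of the same name, and the converter method is hypothetical.

```java
// Minimal stand-ins for the Hive descriptor classes referenced in the PR.
class ExprNodeDesc { }
class ExprNodeFieldDesc extends ExprNodeDesc { } // field access into a complex type

class ExpressionConverter {
  // Sketch: throw instead of LOG.debug when the left child references a
  // complex type, so the caller knows pushdown did not happen.
  static String convertEquals(ExprNodeDesc left, String value) {
    if (left instanceof ExprNodeFieldDesc) {
      throw new UnsupportedOperationException(
          "Hive filter pushdown is not supported for complex data types");
    }
    return "EqualToExpression(col, " + value + ")"; // placeholder result
  }
}
```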
import org.apache.carbondata.core.util.CarbonProperties;
import org.apache.carbondata.hadoop.api.CarbonFileInputFormat;
import org.apache.carbondata.hadoop.testutil.StoreCreator;
import org.apache.carbondata.processing.loading.model.CarbonLoadModel;
Can you please move this test class and the utility Hive2CarbonExpression to the Hive module?
}
ExprNodeGenericFuncDesc exprNodeGenericFuncDesc =
    Utilities.deserializeObject(expr, ExprNodeGenericFuncDesc.class);
LOG.debug("hive expression:" + exprNodeGenericFuncDesc.getGenericUDF());
Please add an isDebugEnabled() check, here and in all other places.
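The point of the requested guard is that the message string (and calls like getExprString()) would otherwise be built even when debug logging is off. A self-contained sketch with a stub logger (the real code would use the project's LOG instance):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stub logger for illustration only.
class StubLogger {
  boolean debugEnabled;
  List<String> messages = new ArrayList<>();
  boolean isDebugEnabled() { return debugEnabled; }
  void debug(String msg) { messages.add(msg); }
}

class DebugGuardExample {
  // The guard skips string concatenation (and any expensive toString()
  // work on the expression) when debug logging is disabled.
  static void logExpression(StubLogger log, Object expr) {
    if (log.isDebugEnabled()) {
      log.debug("hive expression:" + expr);
    }
  }
}
```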
if (exprNodeDesc instanceof ExprNodeGenericFuncDesc) {
  ExprNodeGenericFuncDesc exprNodeGenericFuncDesc = (ExprNodeGenericFuncDesc) exprNodeDesc;
  GenericUDF udf = exprNodeGenericFuncDesc.getGenericUDF();
  List<ExprNodeDesc> ll = exprNodeGenericFuncDesc.getChildren();
Please rename ll and add comments for better readability. Add comments to the method as well.
private static StoreCreator creator;
private static CarbonLoadModel loadModel;
private static CarbonTable table;
static {
Can you please add the usage of the property "hive.optimize.index.filter" to the Hive documentation?
CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, "Hive2CarbonExpressionTest");
try {
  creator = new StoreCreator(new File("target/store").getAbsolutePath(),
Where exactly have we used the hive.optimize.index.filter property while testing?
} else if (udf instanceof GenericUDFOPEqual) {
  ColumnExpression columnExpression = null;
  if (ll.get(left) instanceof ExprNodeFieldDesc) {
Will this handle all the complex data types like STRUCT, ARRAY and MAP?
Please add test cases where it tries to create a filter expression for each of the complex data types.
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1509/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1514/
LGTM
Why is this PR needed?
With more and more scenarios requiring Hive to read the Carbon format, data filtering improvements are needed.
What changes were proposed in this PR?
When hive.optimize.index.filter is set to true, Hive expressions can be pushed down to Carbon to filter the data.
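A typical Hive session using this pushdown might look like the following. The table and query are illustrative only; hive.optimize.index.filter is the real Hive property discussed in this PR.

```sql
-- Enable filter pushdown in the Hive session
SET hive.optimize.index.filter=true;

-- The WHERE predicate can now be converted to a Carbon expression and
-- applied during the CarbonData scan instead of after a full read.
SELECT name FROM carbon_table WHERE id = 1;
```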
Does this PR introduce any user interface change?
Is any new testcase added?