[FLINK-31239][hive] Fix native sum function can't get the corrected value when the argument type is string #22031
…nt type is string
Thanks for the contribution. I left some comments.
@@ -100,6 +100,7 @@ the option `table.exec.hive.native-agg-function.enabled`, which brings significa
</table>

<span class="label label-danger">Attention</span> The ability of the native aggregation functions doesn't fully align with Hive built-in aggregation functions now, for example, some data types are not supported. If performance is not a bottleneck, you don't need to turn on this option.
+In addition, `table.exec.hive.native-agg-function.enabled` option can't be turned on per job when using it via SqlClient, currently, only the module level is supported. Users should turn on this option first and then load HiveModule. This issue will be fixed in [FLINK-31193](https://issues.apache.org/jira/browse/FLINK-31193).
I think we do not need to add the issue number to the doc.
    };
}

@Override
public Expression getValueExpression() {
-    return sum;
+    return ifThenElse(isTrue(isEmpty), nullOf(getResultType()), sum);
What's the Hive behavior when the input is empty?
By the way, please add a test to cover this case.
The Hive code is as follows:
@AggregationType(estimable = true)
static class SumLongAgg extends SumAgg<Long> {
  @Override
  public int estimate() { return JavaDataModel.PRIMITIVES1 + JavaDataModel.PRIMITIVES2; }
}

@Override
public AggregationBuffer getNewAggregationBuffer() throws HiveException {
  SumLongAgg result = new SumLongAgg();
  reset(result);
  return result;
}

@Override
public void reset(AggregationBuffer agg) throws HiveException {
  SumLongAgg myagg = (SumLongAgg) agg;
  myagg.empty = true;
  myagg.sum = 0L;
  myagg.uniqueObjects = new HashSet<ObjectInspectorObject>();
}

private boolean warned = false;

@Override
public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
  assert (parameters.length == 1);
  try {
    if (isEligibleValue((SumLongAgg) agg, parameters[0])) {
      ((SumLongAgg)agg).empty = false;
      ((SumLongAgg)agg).sum += PrimitiveObjectInspectorUtils.getLong(parameters[0], inputOI);
    }
  } catch (NumberFormatException e) {
    if (!warned) {
      warned = true;
      LOG.warn(getClass().getSimpleName() + " "
          + StringUtils.stringifyException(e));
    }
  }
}

@Override
public void merge(AggregationBuffer agg, Object partial) throws HiveException {
  if (partial != null) {
    SumLongAgg myagg = (SumLongAgg) agg;
    myagg.empty = false;
    if (isWindowingDistinct()) {
      throw new HiveException("Distinct windowing UDAF doesn't support merge and terminatePartial");
    } else {
      myagg.sum += PrimitiveObjectInspectorUtils.getLong(partial, inputOI);
    }
  }
}

@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
  SumLongAgg myagg = (SumLongAgg) agg;
  if (myagg.empty) {
    return null;
  }
  result.set(myagg.sum);
  return result;
}
It returns a null value if all elements are null.
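To make the empty-flag behavior concrete, here is a minimal standalone sketch of the same pattern. It is not Hive's or Flink's code: `EmptyFlagSumSketch`, `SumBuffer`, and `sumOf` are hypothetical names, `Double.parseDouble` stands in for Hive's object-inspector conversion, and it assumes that null inputs are simply skipped. It only mirrors the order of operations in the `iterate()` and `terminate()` methods quoted above, where the flag is cleared before the value is parsed and a `NumberFormatException` is swallowed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptyFlagSumSketch {

    // Hypothetical stand-in for an aggregation buffer that tracks whether
    // any eligible value has been seen.
    static final class SumBuffer {
        boolean empty = true;
        double sum = 0.0;
    }

    // Mirrors the quoted iterate(): clear the flag first, then parse.
    // Assumption: a null input is not eligible and leaves the buffer untouched.
    static void iterate(SumBuffer buf, String value) {
        if (value == null) {
            return;
        }
        try {
            buf.empty = false;
            buf.sum += Double.parseDouble(value);
        } catch (NumberFormatException e) {
            // Swallowed, as in the quoted code: the sum is unchanged,
            // but the buffer is no longer considered empty.
        }
    }

    // Mirrors the quoted terminate(): null when nothing eligible was seen.
    static Double terminate(SumBuffer buf) {
        return buf.empty ? null : Double.valueOf(buf.sum);
    }

    static Double sumOf(List<String> values) {
        SumBuffer buf = new SumBuffer();
        for (String v : values) {
            iterate(buf, v);
        }
        return terminate(buf);
    }

    public static void main(String[] args) {
        System.out.println(sumOf(Collections.<String>emptyList()));    // no rows
        System.out.println(sumOf(Arrays.<String>asList(null, null)));  // only nulls
        System.out.println(sumOf(Arrays.asList("b")));                 // unparseable string
        System.out.println(sumOf(Arrays.asList("1")));                 // numeric string
    }
}
```

Under these assumptions, an empty or all-null input yields null, while a non-null string that cannot be parsed still clears the flag and yields a sum of 0.0 rather than null.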
…native-agg-function.enabled option can't turned on per job when using it via SqlClient
@godfreyhe Thanks for your review. I've addressed your comments. Could you keep the two commits separate when you merge?
LGTM
…native-agg-function.enabled option can't turned on per job when using it via SqlClient This closes #22031
…alue when the argument type is string This closes apache#22031 (cherry picked from commit 263555c)
…native-agg-function.enabled option can't turned on per job when using it via SqlClient This closes apache#22031 (cherry picked from commit 62a3b99)
What is the purpose of the change
Currently, for the following case:
The native sum function returns `[+I[1,null], +I[2, null], +I[4, 1.0]]`, but the Hive sum function returns `[+I[1,0.0], +I[2,0.0], +I[4, 1.0]]`. The native function's result is not consistent with Hive's. This is a bug, so we should fix it.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation