AggregatorFactory: Use guessAggregatorHeapFootprint when factorizeWithSize is not implemented. #14567
…hSize is not implemented.
There are two ways of estimating the heap footprint of an Aggregator:
1) AggregatorFactory#guessAggregatorHeapFootprint
2) AggregatorFactory#factorizeWithSize + Aggregator#aggregateWithSize
When the second path is used, the default implementation of factorizeWithSize is now updated to delegate to guessAggregatorHeapFootprint, making these equivalent. The old logic used getMaxIntermediateSize, which is less accurate.
Also fixes a bug where, when using the second path, calling factorizeWithSize on PassthroughAggregatorFactory would fail because getMaxIntermediateSize was not implemented. (There is no buffer aggregator, so there would be no need.)
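The change in the default can be sketched as follows. This is a minimal illustration, not Druid's code: every type here (`SketchAggregator`, `SketchAggregatorAndSize`, `SketchAggregatorFactory`, `PassthroughLikeFactory`) is a simplified, hypothetical stand-in, the real methods take arguments that are omitted here, and the `ASSUMED_ROWS` argument and the 64-byte estimate are assumptions chosen only for the example.

```java
// Simplified stand-ins for illustration only; these are NOT the real Druid classes.
interface SketchAggregator {
    void aggregate(Object value);
    Object get();
}

// Pairs an aggregator with its estimated initial on-heap size in bytes.
final class SketchAggregatorAndSize {
    final SketchAggregator aggregator;
    final long initialSizeBytes;

    SketchAggregatorAndSize(SketchAggregator aggregator, long initialSizeBytes) {
        this.aggregator = aggregator;
        this.initialSizeBytes = initialSizeBytes;
    }
}

abstract class SketchAggregatorFactory {
    // Assumed placeholder for the row-count hint; not taken from the PR.
    static final long ASSUMED_ROWS = 1L;

    abstract SketchAggregator factorize();

    // Buffer-oriented size bound. A factory with no buffer aggregator may leave it unimplemented.
    int getMaxIntermediateSize() {
        throw new UnsupportedOperationException("getMaxIntermediateSize");
    }

    // Heap estimate (path 1). In this sketch the default falls back to the buffer bound.
    int guessAggregatorHeapFootprint(long rows) {
        return getMaxIntermediateSize();
    }

    // Old default for path 2: sized directly from getMaxIntermediateSize.
    SketchAggregatorAndSize factorizeWithSizeOld() {
        return new SketchAggregatorAndSize(factorize(), getMaxIntermediateSize());
    }

    // New default for path 2: delegates to the heap estimate, so both paths agree.
    SketchAggregatorAndSize factorizeWithSize() {
        return new SketchAggregatorAndSize(factorize(), guessAggregatorHeapFootprint(ASSUMED_ROWS));
    }
}

// Hypothetical factory that, like the one described above, has no buffer aggregator
// and therefore never implements getMaxIntermediateSize.
final class PassthroughLikeFactory extends SketchAggregatorFactory {
    @Override
    SketchAggregator factorize() {
        return new SketchAggregator() {
            private Object last;

            @Override
            public void aggregate(Object value) {
                last = value; // keeps the most recent value unchanged
            }

            @Override
            public Object get() {
                return last;
            }
        };
    }

    @Override
    int guessAggregatorHeapFootprint(long rows) {
        return 64; // assumed estimate, for illustration only
    }
}
```

In this sketch, `new PassthroughLikeFactory().factorizeWithSizeOld()` throws UnsupportedOperationException, while `factorizeWithSize()` succeeds and reports the same size as `guessAggregatorHeapFootprint`.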
LakshSingla
left a comment
LGTM!
Unrelated to the PR itself, but related to the factory that sees this regression: in PassthroughAggregatorFactory we throw UOEs (UnsupportedOperationException), which can bubble up to the user.
If we know the user shouldn't encounter them during normal operations, then we should improve the error message, for example:
private static DruidException generateUnsupportedMethodException(final String methodName)
{
  return DruidException.defensive(
      "PassthroughAggregatorFactory does not support the method [%s]. PassthroughAggregatorFactory is Druid's "
      + "way of storing complex types into segments without finalizing them. Treat this exception as a bug if it is "
      + "encountered while using Druid",
      methodName
  );
}

throw generateUnsupportedMethodException("finalizeComputation");

Or, if the user can encounter them, then we should provide an appropriate user-facing message explaining why this can happen.
@LakshSingla I think you're right that the exception should ideally be a …