Skip to content

AggregatorFactory: Use guessAggregatorHeapFootprint when factorizeWithSize is not implemented.#14567

Merged
gianm merged 1 commit intoapache:masterfrom
gianm:fix-passthrough-fws
Jul 12, 2023
Merged

AggregatorFactory: Use guessAggregatorHeapFootprint when factorizeWithSize is not implemented.#14567
gianm merged 1 commit intoapache:masterfrom
gianm:fix-passthrough-fws

Conversation

@gianm
Copy link
Contributor

@gianm gianm commented Jul 11, 2023

There are two ways of estimating heap footprint of an Aggregator:

  1. AggregatorFactory#guessAggregatorHeapFootprint
  2. AggregatorFactory#factorizeWithSize + Aggregator#aggregateWithSize

When the second path is used, the default implementation of factorizeWithSize is now updated to delegate to guessAggregatorHeapFootprint, making these equivalent. The old logic used getMaxIntermediateSize, which is less accurate.

Also fixes a bug where, when using the second path, calling factorizeWithSize on PassthroughAggregatorFactory would fail because getMaxIntermediateSize was not implemented. (There is no buffer aggregator, so there would be no need.)

…hSize is not implemented.

There are two ways of estimating heap footprint of an Aggregator:

1) AggregatorFactory#guessAggregatorHeapFootprint
2) AggregatorFactory#factorizeWithSize + Aggregator#aggregateWithSize

When the second path is used, the default implementation of factorizeWithSize
is now updated to delegate to guessAggregatorHeapFootprint, making these equivalent.
The old logic used getMaxIntermediateSize, which is less accurate.

Also fixes a bug where, when using the second path, calling factorizeWithSize
on PassthroughAggregatorFactory would fail because getMaxIntermediateSize was
not implemented. (There is no buffer aggregator, so there would be no need.)
@gianm gianm added the Bug label Jul 11, 2023
Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Unrelated to the PR, however, related to the factory which sees this regression, in the PassthroughAggregatorFactorywe through UOEs, which can bubble up to the user.

If we know those shouldn't be encountered by the user during normal operations, then we should improve the error message like:

  private static DruidException generateUnsupportedMethodException(final String methodName)
  {
    return DruidException.defensive(
        "PassthroughAggregatorFactory does not support the method [%s]. PassthroughAggregatorFactory is Druid's "
        + "way of storing complex types into segments without finalizing them. Treat this exception as a bug if it is "
        + "encountered while using Druid",
        methodName
    );
  }

throw generateUnsupportedMethodException("finalizeComputation");

Or if they can be encountered by the user, then an appropriate user-facing message, as to why this can happen.

@gianm
Copy link
Contributor Author

gianm commented Jul 12, 2023

@LakshSingla I think you're right that the exception should ideally be a DEFENSIVE. We expect this never to happen, which is classic defensive. The fact that it was seen at one point is a bug that this PR fixes. My only note really would be that, IMO, the "Treat this exception as a bug…" is unnecessary. I'm not sure if we want the message, but if we do, it would actually make sense on every DEFENSIVE exception so would make more sense to bake into the DEFENSIVE exception generator or mapper itself.

@gianm gianm merged commit cc8b210 into apache:master Jul 12, 2023
@gianm gianm deleted the fix-passthrough-fws branch July 12, 2023 14:33
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
…hSize is not implemented. (apache#14567)

There are two ways of estimating heap footprint of an Aggregator:

1) AggregatorFactory#guessAggregatorHeapFootprint
2) AggregatorFactory#factorizeWithSize + Aggregator#aggregateWithSize

When the second path is used, the default implementation of factorizeWithSize
is now updated to delegate to guessAggregatorHeapFootprint, making these equivalent.
The old logic used getMaxIntermediateSize, which is less accurate.

Also fixes a bug where, when using the second path, calling factorizeWithSize
on PassthroughAggregatorFactory would fail because getMaxIntermediateSize was
not implemented. (There is no buffer aggregator, so there would be no need.)
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants