make double sum/min/max agg work on string columns #8243
@clintropolis @jon-wei @jihoonson can one of you please review this PR?
👍, I should be able to have a look sometime today.
I will take a look at this PR as well as the discussion issue this week.
Looks good to me overall. Left some trivial comments.
import javax.annotation.Nullable;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

- public abstract class SimpleDoubleAggregatorFactory extends NullableAggregatorFactory<BaseDoubleColumnValueSelector>
+ public abstract class SimpleDoubleAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector>
Would you please add a Javadoc explaining why the type for `NullableAggregatorFactory` is `ColumnValueSelector` instead of `BaseDoubleColumnValueSelector`?
added
protected Aggregator factorize(ColumnSelectorFactory metricFactory, ColumnValueSelector selector)
{
  if (shouldUseStringColumnAggregatorWrapper(metricFactory)) {
    return new StringColumnDoubleAggregatorWrapper(selector, SimpleDoubleAggregatorFactory.this::buildAggregator, nullValue());
This line exceeds 120 characters.
changed
)
{
  if (shouldUseStringColumnAggregatorWrapper(metricFactory)) {
    return new StringColumnDoubleBufferAggregatorWrapper(selector, SimpleDoubleAggregatorFactory.this::buildBufferAggregator, nullValue());
This line exceeds 120 characters.
changed
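For readers following this thread, here is a minimal sketch of the wrapping idea visible in the two constructor calls above: the wrapper receives the raw selector, a function that builds the ordinary double aggregator, and a null value. Every type and name below (`DoubleSelector`, `DoubleAggregator`, `SettableDoubleSelector`, `DoubleSum`, `Wrapper`, `sumOf`) is a hypothetical stand-in and not the Druid API. How the real wrappers treat nulls and unparseable strings is not shown in this conversation, so substituting `nullValue` is an assumption.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative sketch only: all types here are hypothetical stand-ins, not Druid classes.
public final class StringColumnWrapperSketch
{
  interface DoubleSelector { double getDouble(); }

  interface DoubleAggregator
  {
    void aggregate();
    double get();
  }

  // Selector whose value the wrapper overwrites before each delegate call.
  static final class SettableDoubleSelector implements DoubleSelector
  {
    private double value;

    void setValue(double value) { this.value = value; }

    @Override
    public double getDouble() { return value; }
  }

  // Plain double sum that knows nothing about strings.
  static final class DoubleSum implements DoubleAggregator
  {
    private final DoubleSelector selector;
    private double sum;

    DoubleSum(DoubleSelector selector) { this.selector = selector; }

    @Override
    public void aggregate() { sum += selector.getDouble(); }

    @Override
    public double get() { return sum; }
  }

  // Wraps a double aggregator so it can consume single- or multi-value string rows.
  static final class Wrapper
  {
    private final Supplier<Object> rawSelector;
    private final SettableDoubleSelector settable = new SettableDoubleSelector();
    private final DoubleAggregator delegate;
    private final double nullValue;

    Wrapper(Supplier<Object> rawSelector, Function<DoubleSelector, DoubleAggregator> builder, double nullValue)
    {
      this.rawSelector = rawSelector;
      this.delegate = builder.apply(settable);
      this.nullValue = nullValue;
    }

    void aggregate()
    {
      Object raw = rawSelector.get();
      if (raw instanceof List) {
        // Multi-value row: feed every element to the delegate.
        for (Object element : (List<?>) raw) {
          feed(element);
        }
      } else {
        feed(raw);
      }
    }

    // Assumption: null or unparseable input contributes nullValue.
    private void feed(Object o)
    {
      double parsed = nullValue;
      if (o != null) {
        try {
          parsed = Double.parseDouble(o.toString());
        }
        catch (NumberFormatException e) {
          // Best effort: keep nullValue.
        }
      }
      settable.setValue(parsed);
      delegate.aggregate();
    }

    double get() { return delegate.get(); }
  }

  // Convenience driver: sums the given rows with nullValue = 0.0.
  public static double sumOf(Object... rows)
  {
    Object[] current = new Object[1];
    Wrapper wrapper = new Wrapper(() -> current[0], DoubleSum::new, 0.0);
    for (Object row : rows) {
      current[0] = row;
      wrapper.aggregate();
    }
    return wrapper.get();
  }
}
```

In this sketch the delegate aggregator stays unchanged and only ever sees doubles. All string handling lives in the wrapper, which is why the same wrapper shape can serve sum, min and max.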
  }
}

public static double tryParseDouble(Object val, double nullValue)
Could be moved to `Numbers`.
Please annotate `val` with `@Nullable`.
Changed, and moved to `Numbers`, which is a better home for this.
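As a rough illustration of what a best-effort `tryParseDouble(Object val, double nullValue)` could look like, here is a sketch under stated assumptions. The class name `TryParseDoubleSketch` is hypothetical, and the branch behavior (numbers pass through, strings are parsed, anything else falls back to `nullValue`) is assumed rather than taken from the merged code. The `@Nullable` annotation requested above is only mentioned in a comment so the block compiles with the JDK alone.

```java
// Illustrative sketch only: not the method that was merged into Numbers.
public final class TryParseDoubleSketch
{
  private TryParseDoubleSketch() {}

  // `val` may be null; the real method would annotate it with @Nullable.
  public static double tryParseDouble(Object val, double nullValue)
  {
    if (val == null) {
      return nullValue;
    }
    if (val instanceof Number) {
      // Already numeric: no parsing needed.
      return ((Number) val).doubleValue();
    }
    if (val instanceof String) {
      try {
        return Double.parseDouble((String) val);
      }
      catch (NumberFormatException e) {
        // Assumption: unparseable strings fall back to nullValue.
        return nullValue;
      }
    }
    // Assumption: any other type is treated like a missing value.
    return nullValue;
  }
}
```

For example, `TryParseDoubleSketch.tryParseDouble("abc", -1.0)` falls back to the supplied null value instead of throwing.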
import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
import org.apache.druid.segment.BaseDoubleColumnValueSelector;

public class SettableValueDoubleColumnValueSelector implements BaseDoubleColumnValueSelector
Hmm, is it possible to use the existing `SettableDoubleColumnValueSelector`? Looks feasible if the value type conversion is done in some other selector. Not sure this way is better though.
They are somewhat different, and the existing one is harder to use in this context. That said, I have added comments on this class to explain its use.
@jihoonson thanks for the quick review.
The latest change looks good to me. +1 after CI. Left a trivial comment which doesn't block this PR.
import java.util.Collections;
import java.util.List;

public class StringColumnAggregationTest
nit: maybe it would be clearer if the class name indicated that it's testing double aggregation on string columns, such as `DoubleAggregationOnStringColumnTest`?
There will be a follow-up PR to aggregate string columns correctly in the long/float versions as well, and I plan to extend this test with those additional aggregators, so later it won't be specific to double aggregation.
👍
Partial work towards #8148. There will be follow-up PRs to do the same for the other core aggregators once this is merged.
Description
This patch makes the double sum/min/max aggregators handle single- and multi-value string columns by making a best-effort attempt to parse each string as a double.
`StringColumnDoubleAggregatorWrapper` and `StringColumnDoubleBufferAggregatorWrapper` classes are introduced that can wrap existing double aggregators to handle string columns. Both classes are used by `SimpleDoubleAggregatorFactory` when the input column is known to be of String type.
Currently it does not work if `fieldExpression` is provided instead of `fieldName`. (Related #8242)
This PR has: