Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make double sum/min/max agg work on string columns #8243

Merged
merged 7 commits into from
Aug 13, 2019

Conversation

himanshug
Copy link
Contributor

@himanshug himanshug commented Aug 5, 2019

Partial work towards #8148 . There would be follow up PR to do same for other core aggregators once this is merged.

Description

This patch adds handling of single/multi value column handling by double sum/min/max aggregators to do a best effort parsing string as double.

StringColumnDoubleAggregatorWrapper and StringColumnDoubleBufferAggregatorWrapper classes are introduced that can wrap existing double aggregators to handle string columns. Both of the classes are used by SimpleDoubleAggregatorFactory to be used when input column is known to be of String type.

Currently it does not work if fieldExpression instead of fieldName is provided. (Related #8242 )


This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths.

@himanshug
Copy link
Contributor Author

@clintropolis @jon-wei @jihoonson can one of you please review this PR ?

@himanshug himanshug added this to the 0.16.0 milestone Aug 12, 2019
@clintropolis
Copy link
Member

@clintropolis @jon-wei @jihoonson can one of you please review this PR ?

👍, I should be able to have a look sometime today

@jihoonson
Copy link
Contributor

I will take a look this PR as well as the discussion issue this week.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. Left some trivial comments.


import javax.annotation.Nullable;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public abstract class SimpleDoubleAggregatorFactory extends NullableAggregatorFactory<BaseDoubleColumnValueSelector>
public abstract class SimpleDoubleAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add a javadoc explaining why the type for NullableAggregatorFactory is ColumnValueSelector instead of BaseDoubleColumnValueSelector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

protected Aggregator factorize(ColumnSelectorFactory metricFactory, ColumnValueSelector selector)
{
if (shouldUseStringColumnAggregatorWrapper(metricFactory)) {
return new StringColumnDoubleAggregatorWrapper(selector, SimpleDoubleAggregatorFactory.this::buildAggregator, nullValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line exceeds 120 characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

)
{
if (shouldUseStringColumnAggregatorWrapper(metricFactory)) {
return new StringColumnDoubleBufferAggregatorWrapper(selector, SimpleDoubleAggregatorFactory.this::buildBufferAggregator, nullValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line exceeds 120 characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

}
}

public static double tryParseDouble(Object val, double nullValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be moved to Numbers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please annotate val with @Nullable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed and moved to Numbers which is a better home for this.

import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
import org.apache.druid.segment.BaseDoubleColumnValueSelector;

public class SettableValueDoubleColumnValueSelector implements BaseDoubleColumnValueSelector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is it possible to use existing SettableDoubleColumnValueSelector? Looks feasible if the value type conversion is done in some other selector. Not sure this way is better though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are somewhat different , and that is harder to use in this context. that said, I have added comments on this class to explain its use.

@himanshug
Copy link
Contributor Author

@jihoonson thanks for quick review

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest change looks good to me. +1 after CI. Left a trivial comment which doesn't block this PR.

import java.util.Collections;
import java.util.List;

public class StringColumnAggregationTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe it would be more clear if the class name indicates that it's testing double aggregation on string columns, such as DoubleAggregationOnStringColumnTest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there would be followup PR to aggregate string columns correctly in long/float versions as well and I plan to modify the test here with those additional aggregators . so , later it wouldn't be so double aggregation specific.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants