Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string column handling for long/float min/max/sum aggregators #8319

Merged
merged 6 commits into from
Aug 27, 2019

Conversation

himanshug
Copy link
Contributor

Fixes #8148

Same as #8243 but for long/float min/max/sum aggregators

Description

This patch adds handling of single/multi value column handling by long/float sum/min/max aggregators to do a best effort parsing string as long/float.

StringColumn[Long/Float]AggregatorWrapper and StringColumn[Long/Float]BufferAggregatorWrapper classes are introduced that can wrap existing long/float aggregators to handle string columns. Both of the classes are used by Simple[Long/Float]AggregatorFactory to be used when input column is known to be of String type.


This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths.

@jihoonson
Copy link
Contributor

@himanshug I will take a look probably sometime this week. In the meantime, would you check the CI failure? It looks legit.

[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.029 s <<< FAILURE! - in org.apache.druid.query.groupby.epinephelinae.ParallelCombinerTest
[ERROR] testCombine(org.apache.druid.query.groupby.epinephelinae.ParallelCombinerTest)  Time elapsed: 0.02 s  <<< ERROR!
java.lang.UnsupportedOperationException
	at org.apache.druid.query.groupby.epinephelinae.ParallelCombiner$SettableColumnSelectorFactory.getColumnCapabilities(ParallelCombiner.java:502)
	at org.apache.druid.query.aggregation.SimpleLongAggregatorFactory.shouldUseStringColumnAggregatorWrapper(SimpleLongAggregatorFactory.java:222)
	at org.apache.druid.query.aggregation.SimpleLongAggregatorFactory.factorizeBuffered(SimpleLongAggregatorFactory.java:97)
	at org.apache.druid.query.aggregation.SimpleLongAggregatorFactory.factorizeBuffered(SimpleLongAggregatorFactory.java:48)
	at org.apache.druid.query.aggregation.NullableAggregatorFactory.factorizeBuffered(NullableAggregatorFactory.java:52)
	at org.apache.druid.query.groupby.epinephelinae.StreamingMergeSortedGrouper.<init>(StreamingMergeSortedGrouper.java:145)
	at org.apache.druid.query.groupby.epinephelinae.ParallelCombiner.runCombiner(ParallelCombiner.java:392)
	at org.apache.druid.query.groupby.epinephelinae.ParallelCombiner.buildCombineTree(ParallelCombiner.java:348)
	at org.apache.druid.query.groupby.epinephelinae.ParallelCombiner.combine(ParallelCombiner.java:164)
	at org.apache.druid.query.groupby.epinephelinae.ParallelCombinerTest.testCombine(ParallelCombinerTest.java:117)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
	at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

@himanshug
Copy link
Contributor Author

@jihoonson thanks, I will fix the build tomorrow.

@himanshug himanshug closed this Aug 27, 2019
@himanshug himanshug reopened this Aug 27, 2019
@himanshug
Copy link
Contributor Author

@jihoonson fyi: build is actually fixed , coveralls failure is unrelated.

@jihoonson
Copy link
Contributor

Oh sorry. Forgot about this PR. Will take a look this week.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

@himanshug
Copy link
Contributor Author

@clintropolis thanks for the review

@jihoonson you can skip this one as the changes here are simple and don't need multiple reviews. there will be more PRs soon, so save yourself some time :)

@himanshug himanshug added this to the 0.16.0 milestone Aug 27, 2019
@himanshug himanshug merged commit 5c3db41 into apache:master Aug 27, 2019
@himanshug himanshug deleted the agg_on_string branch August 27, 2019 23:11
@jihoonson
Copy link
Contributor

@himanshug cool, I will try to review your follow-up prs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[double/long/float][sum/min/max] aggregator behavior on multi-value string columns
3 participants