use datasketches-java 4.2.0 by AlexanderSaydakov · Pull Request #15257 · apache/druid

AlexanderSaydakov · 2023-10-25T17:16:58Z

This is to use the latest datasketches-java version 4.2.0.
This was supposed to be a minor version change, but some API changes were inadvertently introduced. Therefore, I had to implement a few newly required methods in the custom ArrayOfStringTuplesSerDe. They will need a careful review, since I am not entirely sure I understood the serial format correctly.
Also, one test is currently failing. I don't understand the purpose of this test. It is called preservesMinAndMaxWhenAssumeGroupedFalse, and I have no idea what this means. However, it asks a quantile sketch to partition 66 items into 66 partitions and expects exactly one item in each. If we allow even the slightest error (and sketches are approximate), we can get some partitions with 2 items and some empty ones. After deduplication, that leads to fewer partitions.
This change in behavior from 4.1.0 to 4.2.0 is unfortunate, but not incorrect. This is a degenerate use case. I would think that a better test could generate, say, 1000 items, ask for 10 partitions, and assert that each partition has 100 ± 2 items or something like that. Perhaps the behavior with very small partitions can be improved in the next version, but for now I would suggest using 4.2.0 and changing this test somehow.
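A minimal sketch of the kind of tolerance-based check suggested above, not the actual Druid test. The class and method names are hypothetical, and exact ranks over a sorted array stand in for the quantiles sketch; a real test would take its split points from the sketch instead. It also assumes distinct item values.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Druid or datasketches-java.
class PartitionToleranceSketch {

    // Stand-in for a quantile query: picks numPartitions - 1 split points
    // at evenly spaced exact ranks of an already sorted array.
    static long[] splitPoints(long[] sorted, int numPartitions) {
        long[] splits = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) i * sorted.length / numPartitions);
            splits[i - 1] = sorted[idx];
        }
        return splits;
    }

    // Counts items per partition. Partition p holds the values v with
    // splits[p - 1] <= v < splits[p]; split points are assumed distinct.
    static int[] partitionSizes(long[] items, long[] splits) {
        int[] sizes = new int[splits.length + 1];
        for (long v : items) {
            int pos = Arrays.binarySearch(splits, v);
            int p = pos >= 0 ? pos + 1 : -pos - 1;
            sizes[p]++;
        }
        return sizes;
    }

    // True when every partition size is within +/- tolerance of the expected size.
    static boolean withinTolerance(int[] sizes, int expected, int tolerance) {
        for (int s : sizes) {
            if (Math.abs(s - expected) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] items = new long[1000];
        for (int i = 0; i < items.length; i++) {
            items[i] = i;
        }
        int[] sizes = partitionSizes(items, splitPoints(items, 10));
        System.out.println(Arrays.toString(sizes)
            + " withinTolerance=" + withinTolerance(sizes, 100, 2));
    }
}
```

Asserting a tolerance rather than an exact count per partition keeps the test meaningful even when the underlying quantile answers are approximate.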

...-query/src/main/java/org/apache/druid/msq/statistics/QuantilesSketchKeyCollectorFactory.java

cryptoe

Changes LGTM.
Thanks @AlexanderSaydakov @gianm @adarshsanjeev for pitching in for this druid 28 blocker.

* use datasketches-java 4.2.0
* use exclusive mode
* fixed issues raised by CodeQL
* fixed issue raised by spotbugs
* fixed issues raised by intellij
* added missing import
* Update QuantilesSketchKeyCollector search mode and adjust tests.
* Update sizeOf functions and add unit tests
* Add unit tests

---------

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>

Backport of : #15267

---------

Co-authored-by: Alexander Saydakov <13126686+AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>


use datasketches-java 4.2.0

fa5daf5

github-actions bot added the labels Area - Batch Ingestion, Area - Dependencies, Area - Ingestion, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) on Oct 25, 2023

abhishekagarwal87 added this to the 28.0 milestone Oct 25, 2023

github-advanced-security bot found potential problems Oct 25, 2023


...-query/src/main/java/org/apache/druid/msq/statistics/QuantilesSketchKeyCollectorFactory.java (Fixed)

...-query/src/main/java/org/apache/druid/msq/statistics/QuantilesSketchKeyCollectorFactory.java (Fixed)

AlexanderSaydakov and others added 8 commits October 25, 2023 13:47

use exclusive mode

b423257

fixed issues raised by CodeQL

f51b7dc

fixed issue raised by spotbugs

982a35c

fixed issues raised by intellij

fc223cf

added missing import

25234ab

Update QuantilesSketchKeyCollector search mode and adjust tests.

83ae7ba

Update sizeOf functions and add unit tests

f1ebb07

Add unit tests

35c2b35

cryptoe approved these changes Oct 26, 2023


AlexanderSaydakov merged commit f1132d2 into master Oct 26, 2023

AlexanderSaydakov deleted the datasketches-4.2.0 branch October 26, 2023 23:28

LakshSingla mentioned this pull request Oct 27, 2023

[Backport] use datasketches-java 4.2.0 #15267

Merged


AlexanderSaydakov merged 9 commits into master from datasketches-4.2.0
