use datasketches-java 4.2.0 #15257

Merged
AlexanderSaydakov merged 9 commits into master from datasketches-4.2.0 on Oct 26, 2023

@AlexanderSaydakov
Contributor

This PR switches to the latest datasketches-java version, 4.2.0.
This was supposed to be a minor version change, but some API changes were inadvertently introduced, so I had to implement a few newly required methods in the custom ArrayOfStringTuplesSerDe. They need a careful review, since I am not entirely sure I understood the serial format correctly.
Also, one test is currently failing: preservesMinAndMaxWhenAssumeGroupedFalse. I don’t understand the purpose of this test, and I have no idea what its name means. It asks a quantile sketch to partition 66 items into 66 partitions and expects exactly one item in each. If we allow even the slightest error (and sketches are approximate), we can get some partitions with 2 items and some empty ones, so after deduplication there are fewer partitions.
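To illustrate why this case is degenerate, here is a toy model in plain Java. It is not the datasketches-java algorithm or the Druid key collector: the class and method names are hypothetical, the items are assumed to be the integers 0..n-1, and the approximation error is imitated by shifting a boundary's index by a fixed number of positions.

```java
import java.util.TreeSet;

final class DegeneratePartitionDemo {
    // Toy model (assumption, not the real sketch): items are the integers 0..n-1,
    // so the exact boundary of partition p is the item at index p * n / partitions.
    // rankError displaces that index to imitate an approximate quantile answer.
    static int approxBoundary(int n, int partitions, int p, int rankError) {
        int idx = p * n / partitions + rankError;
        return Math.max(0, Math.min(n - 1, idx)); // clamp to the valid item range
    }

    // Collects one boundary per partition and deduplicates them. The errors array
    // is cycled through so that the outcome is deterministic.
    static int distinctBoundaries(int n, int partitions, int[] errors) {
        TreeSet<Integer> boundaries = new TreeSet<>();
        for (int p = 0; p < partitions; p++) {
            boundaries.add(approxBoundary(n, partitions, p, errors[p % errors.length]));
        }
        return boundaries.size();
    }

    public static void main(String[] args) {
        // 66 items into 66 partitions, exact boundaries.
        System.out.println(distinctBoundaries(66, 66, new int[] {0}));
        // Same request, but every second boundary is off by one position.
        System.out.println(distinctBoundaries(66, 66, new int[] {0, 1}));
        // 1000 items into 10 partitions with the same one-position error.
        System.out.println(distinctBoundaries(1000, 10, new int[] {0, 1}));
    }
}
```

In this toy model a one-position error collapses the 66 boundaries to 34 after deduplication, while the same error leaves all 10 boundaries distinct when 1000 items are split into 10 partitions.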
This change in behavior from 4.1.0 to 4.2.0 is unfortunate, but not incorrect. This is a degenerate use case. I would think that a better test could generate, say, 1000 items, ask for 10 partitions, and assert that each partition has 100 ± 2 items or something like that. Perhaps the behavior with very small partitions can be improved in the next version, but for now I would suggest using 4.2.0 and changing this test somehow.
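A minimal sketch of the shape such a tolerance-based check could take, in plain Java. Everything here is an assumption for illustration: `PartitionToleranceCheck` and its methods are hypothetical, a sorted array with randomly displaced split indices stands in for the quantile sketch, and the items are assumed to be distinct. A real test would obtain the split points from the sketch instead.

```java
import java.util.Random;

final class PartitionToleranceCheck {
    // Hypothetical stand-in for a quantile sketch: picks split points from a sorted
    // array of distinct items, each displaced by up to maxRankError positions to
    // imitate approximation error.
    static long[] splitPoints(long[] sorted, int partitions, int maxRankError, Random rnd) {
        long[] splits = new long[partitions - 1];
        for (int i = 1; i < partitions; i++) {
            int exact = i * sorted.length / partitions;
            int shift = rnd.nextInt(2 * maxRankError + 1) - maxRankError;
            splits[i - 1] = sorted[exact + shift];
        }
        return splits;
    }

    // Counts items per partition; partition p holds items in [splits[p-1], splits[p]).
    static int[] partitionSizes(long[] sorted, long[] splits) {
        int[] sizes = new int[splits.length + 1];
        for (long item : sorted) {
            int p = 0;
            while (p < splits.length && item >= splits[p]) {
                p++;
            }
            sizes[p]++;
        }
        return sizes;
    }

    // True if every partition size is within tolerance of the expected size.
    static boolean allWithin(int[] sizes, int expected, int tolerance) {
        for (int s : sizes) {
            if (Math.abs(s - expected) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] items = new long[1000];
        for (int i = 0; i < items.length; i++) {
            items[i] = i; // distinct, already sorted
        }
        long[] splits = splitPoints(items, 10, 1, new Random(42));
        int[] sizes = partitionSizes(items, splits);
        System.out.println(allWithin(sizes, 100, 2));
    }
}
```

With a displacement of at most one position per split point, each partition size can differ from 100 by at most 2, so the assertion tolerates small approximation error instead of requiring an exact one-item-per-partition layout.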

@github-actions github-actions bot added the labels Area - Batch Ingestion, Area - Dependencies, Area - Ingestion, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) on Oct 25, 2023
@abhishekagarwal87 abhishekagarwal87 added this to the 28.0 milestone Oct 25, 2023
Contributor

@cryptoe cryptoe left a comment

Changes LGTM.
Thanks @AlexanderSaydakov @gianm @adarshsanjeev for pitching in on this Druid 28 blocker.

@AlexanderSaydakov AlexanderSaydakov merged commit f1132d2 into master Oct 26, 2023
@AlexanderSaydakov AlexanderSaydakov deleted the datasketches-4.2.0 branch October 26, 2023 23:28
LakshSingla pushed a commit to LakshSingla/druid that referenced this pull request Oct 27, 2023
* use datasketches-java 4.2.0

* use exclusive mode

* fixed issues raised by CodeQL

* fixed issue raised by spotbugs

* fixed issues raised by intellij

* added missing import

* Update QuantilesSketchKeyCollector search mode and adjust tests.

* Update sizeOf functions and add unit tests

* Add unit tests

---------

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>
cryptoe pushed a commit that referenced this pull request Oct 27, 2023
Backport of : #15267
---------

Co-authored-by: Alexander Saydakov <13126686+AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023