Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dependency com.tdunning:t-digest to v3.3 #2136

Merged
merged 2 commits into from
Mar 20, 2024

Conversation

solrbot
Copy link
Collaborator

@solrbot solrbot commented Dec 8, 2023

This PR contains the following updates:

Package Type Update Change
com.tdunning:t-digest dependencies minor 3.1 -> 3.3

Release Notes

tdunning/t-digest (com.tdunning:t-digest)

v3.2

===========
In release 3.2, the goal is to produce an update to the code given the large number of improvements since the previous release.

There are a few bugs that will survive this release, most notably in the AVLTreeDigest. These have to do with large numbers of repeated data points and are not new bugs.

There is also a lot of work going on with serialization. I need to hear from people about what they are doing with serialization so that we can build some test cases to allow an appropriate migration strategy to future serialization.

The paper continues to be updated. The algorithmic descriptions are getting reasonably clear, but the speed and accuracy sections need a complete revamp with current implementations.

Bugs, fixed and known

Fixed

The following important issues are fixed in this release

Issue #​90 Serialization for MergingDigest

Issue #​92 Serialization for AVLTreeDigest

Maybe fixed

This issue has substantial progress, but lacks a definitive test to determine whether it should be closed.

Issue 78 Stability under merging.

Pushed

The following issues are pushed beyond this release

Issue #​87 Future proof and extensible serialization

Issue #​89 Bad handling for duplicate values in AVLTreeDigest

All fixed issues

Here is a complete list of issues resolved in this release:

Issue #​55 Add time
decay to t-digest

Issue #​52 General
factory method for "fromBytes"

Issue #​90
Deserialization of MergingDigest BufferUnderflowException in 3.1

Issue #​92 Error in
AVLTreeDigest.fromBytes

Issue #​93 high
centroid frequency causes overflow - giving incorrect results

Issue #​67 Release of
version 3.2

Issue #​81
AVLTreeDigest with a lot of datas : integer overflow

Issue #​75 Adjusting
the centroid threshold values to obtain better accuracy at interesting
values

Issue #​74 underlying
distribution : powerlaw

Issue #​72 Inverse
quantile algorithm is non-contiguous

Issue #​65
totalDigest add spark dataframe column / array

Issue #​60 Getting
IllegalArgumentException when adding digests

Issue #​53
smallByteSize methods are very trappy in many classes -- should be
changed or have warnings in javadocs

Issue #​82 TDigest
class does not implement Serializable interface in last release.

Issue #​42 Histogram

Issue #​40 Improved
constraint on centroid sizes

Issue #​37 Allow
arbitrary scaling laws for centroid sizes

Issue #​29 Test
method testScaling() always adds values in ascending order

Issue #​84 Remove
deprecated kinds of t-digest

Issue #​76 Add
serializability

Issue #​77 Question:
Proof of bounds on merging digest size

Issue #​71 Simple
alternate algorithm using maxima, ranks and fixed cumulative weighting

Issue #​61 Possible
improvement to the speed of the algorithm

Issue #​58 jdk8
doclint incompatibility

Issue #​48 Build is
unstable under some circumstances

Issue #​63 Which
TDigest do you recommend?

Issue #​62 Very slow
performance; what am I missing?

Issue #​47 Make
TDigest serializable

Issue #​49
MergingDigest.centroids is wrong on an empty digest


Configuration

📅 Schedule: Branch creation - "* * * * *" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot

@solrbot solrbot force-pushed the renovate/com.tdunning-t-digest-3.x branch 3 times, most recently from 38913d0 to ddb0858 Compare December 13, 2023 22:25
@solrbot solrbot force-pushed the renovate/com.tdunning-t-digest-3.x branch from ddb0858 to 3e17b87 Compare January 4, 2024 12:36
Copy link

github-actions bot commented Mar 7, 2024

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Mar 7, 2024
@janhoy janhoy force-pushed the renovate/com.tdunning-t-digest-3.x branch from 3e17b87 to eb2cd9b Compare March 7, 2024 09:13
@github-actions github-actions bot added the dependencies Dependency upgrades label Mar 7, 2024
@janhoy janhoy removed the stale PR not updated in 60 days label Mar 7, 2024
@janhoy
Copy link
Contributor

janhoy commented Mar 7, 2024

So turns out the Streaming tests fail consistently with this upgrade. Have not dived into it, but there must be some incompatible behavior between these versions causing this. Anyone knows why?

@janhoy janhoy requested a review from risdenk March 7, 2024 11:31
@risdenk
Copy link
Contributor

risdenk commented Mar 8, 2024

t-digest 3.2 looks fine but 3.3 seems to have different stats assertions? The changes look to be around percentile.

@risdenk
Copy link
Contributor

risdenk commented Mar 8, 2024

The nitty gritty details look to be here:

tdunning/t-digest#171

and

opensearch-project/OpenSearch#3634

basically for small sample sizes there used to be a bug and now its fixed. That seems to be my rough interpretation so its not surprising that our tests are failing.

@risdenk
Copy link
Contributor

risdenk commented Mar 8, 2024

Pushed e0d3696 updating the assertions in the percentiles for tests.

@solrbot
Copy link
Collaborator Author

solrbot commented Mar 8, 2024

Edited/Blocked Notification

Renovate will not automatically rebase this PR, because it does not recognize the last commit author and assumes somebody else may have edited the PR.

You can manually request rebase by checking the rebase/retry box above.

Warning: custom changes will be lost.

Copy link
Contributor

@janhoy janhoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @risdenk

@risdenk risdenk merged commit faa1001 into apache:main Mar 20, 2024
3 checks passed
risdenk added a commit that referenced this pull request Mar 20, 2024
* Update dependency com.tdunning:t-digest to v3.3

* Update percentile assertions based on bug fixes in t-digest

---------

Co-authored-by: Kevin Risden <krisden@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants