Add functions to generate random values according to the distribution #42411

Merged
merged 13 commits into from Oct 20, 2022


nikitamikhaylov
Member

@nikitamikhaylov nikitamikhaylov commented Oct 17, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added functions (randUniform, randNormal, randLogNormal, randExponential, randChiSquared, randStudentT, randFisherF, randBernoulli, randBinomial, randNegativeBinomial, randPoisson) to generate random values according to the specified distributions. This closes #21834.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-feature Pull request with new product feature label Oct 17, 2022
@nikitamikhaylov
Member Author

These functions could be slow because the standard library is used and a distribution object is initialized for each block. But they are really useful for testing other statistical functions.
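To illustrate the per-block cost mentioned above, here is a minimal sketch, not the code from this PR. The helper name `fillNormalBlock`, the explicit seed parameter, and the choice of `std::mt19937_64` as the generator are all assumptions made for the example. It only shows the general pattern: one standard-library distribution object is constructed per block and then reused for every row in that block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not taken from the PR): fills one "block" of rows
// with normally distributed values using the C++ standard library.
std::vector<double> fillNormalBlock(std::size_t rows, double mean, double stddev, std::uint64_t seed)
{
    // std::normal_distribution requires a strictly positive standard deviation.
    assert(stddev > 0.0);

    // Assumption: a seeded Mersenne Twister stands in for whatever generator
    // the real implementation uses.
    std::mt19937_64 rng(seed);

    // The distribution object is constructed once per block and reused for every row.
    std::normal_distribution<double> distribution(mean, stddev);

    std::vector<double> block;
    block.reserve(rows);
    for (std::size_t i = 0; i < rows; ++i)
        block.push_back(distribution(rng));
    return block;
}
```

Calling `fillNormalBlock(10000, 5.0, 2.0, 42)` returns 10000 values whose sample mean should be close to 5. The other distributions in the list could presumably follow the same shape with a different `std::` distribution type, but that mapping is a guess rather than something stated in this thread.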

@antonio2368 antonio2368 self-assigned this Oct 18, 2022
@alexey-milovidov
Member

The names are slightly strange.
They should be similar to rand, rand64, or generateUUIDv4.

Maybe name them like randNormal, randStudentT, etc.?

@nikitamikhaylov
Member Author

No problem, will do

@alexey-milovidov
Member

Can we add support for an additional "tag" argument for disambiguation, similar to the existing rand and rand64?

@nikitamikhaylov
Member Author

nikitamikhaylov commented Oct 18, 2022

Yes, we can. I initially thought about it, but since these functions will be used only for testing (from my perspective), the typical use case won't involve multiple functions in the same query.

And it could be hard, since these functions already accept some arguments. We could add a String argument as the first one if we want to distinguish between function calls.

@nikitamikhaylov
Member Author

I'm having problems reproducing the failures found by the fuzzer.

@nikitamikhaylov nikitamikhaylov merged commit 9a73eb2 into master Oct 20, 2022
@nikitamikhaylov nikitamikhaylov deleted the function-distribution branch October 20, 2022 15:25
mrcrypster added a commit to mrcrypster/ClickHouse that referenced this pull request Dec 23, 2022
Labels
pr-feature Pull request with new product feature
Development

Successfully merging this pull request may close these issues.

Generate random data with boundaries
4 participants