chore: add balls and bins simulator #2001

Merged
merged 2 commits into from Oct 10, 2023

Conversation

romange
Collaborator

@romange romange commented Oct 8, 2023

No description provided.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
kostasrim
kostasrim previously approved these changes Oct 10, 2023
import matplotlib.pyplot as plt


def simulate_balls_into_bins(balls: int, N, threshold: int, exact, trials=10000):
Contributor

Tiny nit: if you plan to expand this in the future, replace `N` with `bins`, since you already use `balls` instead of `M`.

Collaborator Author

fixed
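
For readers without the diff at hand, here is a minimal sketch of a balls-into-bins trial loop that uses the `balls`/`bins` naming suggested above. It is not the code merged in this PR: the function name `simulate_max_min_delta`, the `seed` parameter, and the choice of a multinomial draw are assumptions made for illustration, and the `threshold` and `exact` parameters from the real signature are left out because their meaning is not shown in this conversation.

```python
import numpy as np


def simulate_max_min_delta(balls: int, bins: int, trials: int = 1000, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: throw `balls` uniformly at random into `bins`, `trials` times.

    Returns one value per trial: the gap between the fullest and the emptiest bin.
    """
    rng = np.random.default_rng(seed)
    # One row per trial; each row holds the number of balls that landed in each bin.
    counts = rng.multinomial(balls, np.full(bins, 1.0 / bins), size=trials)
    return counts.max(axis=1) - counts.min(axis=1)


if __name__ == "__main__":
    deltas = simulate_max_min_delta(balls=1000, bins=16, trials=200)
    print("mean delta:", deltas.mean())
```

Naming the second parameter `bins` keeps it symmetric with `balls`, which is the point of the nit.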

tools/balls_bins.py
print(
f"Histogram of the difference between the most and least populated bins for {args.trials} trials"
)
plt.hist(deltas, bins=30, color="steelblue", edgecolor="none")
Contributor

Why limit this to 30 bins? I guess if we experiment with a large `args.bins` we will get large deltas, and a lot of the data points will end up in the last bucket (bin 30). Is this intended?

Collaborator Author

`bins=30` means 30 buckets, which I assume are defined according to the data distribution (auto bins).
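
A quick way to check this assumption: as far as I know, `plt.hist` delegates binning to `numpy.histogram`, and an integer `bins` there produces that many equal-width buckets spanning the minimum to the maximum of the data. The sketch below uses made-up deltas (not output from the simulator) and a hypothetical helper name, `bucket_edges`, to show that a large outlier stretches the range instead of piling up in a capped last bucket.

```python
import numpy as np


def bucket_edges(deltas, bins=30):
    """Hypothetical helper: return (counts, edges) as numpy.histogram computes them."""
    return np.histogram(np.asarray(deltas), bins=bins)


if __name__ == "__main__":
    # Made-up deltas with one large outlier, for illustration only.
    sample = [3, 5, 8, 13, 21, 400]
    counts, edges = bucket_edges(sample, bins=30)
    print("number of buckets:", len(counts))
    print("first edge:", edges[0], "last edge:", edges[-1])
```

So with large deltas the buckets get wider, but no data point is clipped into bucket 30.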

@kostasrim
Contributor

Pre-approved so I don't block you. If you think some of my comments are applicable, let me know :)

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Contributor

@dranikpg dranikpg left a comment

👍🏻 like for using numpy

@romange romange merged commit c6f8f38 into main Oct 10, 2023
10 checks passed
@romange romange deleted the BallsBins branch October 10, 2023 22:18