-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-6693] replace mmh3 with default hash function #8799
Conversation
R: @pabloem |
label='assert:global_by_error_with_samll_population') | ||
pipeline.run() | ||
|
||
def test_approximate_unique_global_by_error_with_big_population(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you removing this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is more for estimation algorithm performance testing, rather than for testing functionality. With py3 default hash algorithm, it would fail sometimes (2%) because it is out of estimation error range, so I decided to remove it.
Thanks Hannah. This LGTM. I just left one question. Thanks! |
@@ -214,7 +212,7 @@ def create_accumulator(self, *args, **kwargs): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion (Maybe add a JIRA (does not have to be fixed in this PR) for future):
- Pass a hash_fn argument to ApproximateUniqueCombineFn, that can be passed by users to customize the hash_fn they would like to use. It can default to
hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ticket is created: https://issues.apache.org/jira/browse/BEAM-7525
Replace mmh3 with default python hash function, because it's not reasonable to bring this inconvenience to production with only a little bit improvement with estimation for a loose estimation algorithm.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.