New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-5995] Add hot key to Python Synthetic Sources and use it in Load Tests #8664
Conversation
1e3b437
to
a7a8c49
Compare
SUCCESS --none-- |
ec0aee2
to
d18e77f
Compare
Run seed job |
Run Load Tests Python GBK reiterate Dataflow Batch |
R: @pabloem would you find time to take a look at it? |
@@ -238,13 +241,25 @@ def get_range_tracker(self, start_position, stop_position): | |||
tracker = range_trackers.UnsplittableRangeTracker(tracker) | |||
return tracker | |||
|
|||
def _gen_kv_pair(self, index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pabloem In Java SDK there is position (somewhere else called offset
). I assumed I can use index here in the same way, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kasia. I just have a question / suggestion, but looks generally fine.
@@ -238,13 +241,25 @@ def get_range_tracker(self, start_position, stop_position): | |||
tracker = range_trackers.UnsplittableRangeTracker(tracker) | |||
return tracker | |||
|
|||
def _gen_kv_pair(self, index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds good to me.
# Generate hot key. | ||
# An integer is randomly selected from the range [0, numHotKeys-1] | ||
# with equal probability. | ||
r_hot = np.random.RandomState(self._num_hot_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this generate the key from the same seed always? So it's always a single key? Maybe using something like index % num_hot_keys
? Or something like that..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Or maybe just index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use index
, then we will have many different seeds, so we'll jave many different 'hot keys' instead of restricting to a total of num_hot_keys
- I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That totally makes sense, thanks a lot! And it also answers my second comment here, I guess :) I'll change it this way.
# An integer is randomly selected from the range [0, numHotKeys-1] | ||
# with equal probability. | ||
r_hot = np.random.RandomState(self._num_hot_keys) | ||
return r_hot.bytes(self._key_size), r.bytes(self._value_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok,, I think this is ready to merge then. Thanks Kasia! |
Added two new parameters to Synthetic Sources in Python: hotkey number and hotkey fraction. They are used to produce hot keys in key-value production.
Added Python GBK load tests in Jenkins to use hot key in practice.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.