-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-7550] Missing pipeline parameters in ParDo Load Test #8847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
R: @kkucharc |
kkucharc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Kamil for this PR, I added my comments
| """Prevents metrics from namespace other than specified in pipeline | ||
| options from being published.""" | ||
| if self.metrics_monitor is not None: | ||
| self.metrics_monitor.filters = MetricsFilter().with_namespace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little a bit afraid of using this mainly because of this PR. We aren't sure if we won't cut off some useful metrics with this.
An idea that comes to my mind is to create global variable METRICS_NAMESPACES as it is in Java SDK. Then specify namespaces to be saved here and use with_namespaces. Also it will require to change variable saved as pipeline option. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the _apply_filters() method to the base class. It takes a list of namespaces as a parameter. The author of a derived class will decide whether to call it or not.
|
|
||
| def _get_option_or_default(self, opt_name, default=1): | ||
| option = self.pipeline.get_option(opt_name) | ||
| return int(option) if option is not None else default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to have this in load_test_metrics_utils, WDYT? There may be also exception thrown in casting to int.
| self.nb_of_operations = nb_of_operations | ||
| self.counters = [] | ||
| for i in range(nb_of_counters): | ||
| self.counters.append(Metrics.counter('do-not-publish-me', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe do-not-publish sounds better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright.
| super(ParDoTest, self).setUp() | ||
| self._apply_filter() | ||
| self.iterations = self._get_option_or_default('iterations') | ||
| self.nb_of_counter = self._get_option_or_default('number_of_counters') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go with full name of number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nb is a quite popular abbreviation of number. What's more, shorter variable names are better, so I'd rather keep nb in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem to be popular abbreviation across Beam (sometimes new_block was called nb as well). But ok until it's just local variable for this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Kasia - number_of_counters is easier to digest. As a non python dev and a person that didn't know this code, I had to think for a little while what does nb mean. numbersays it explicitly and costs nothing
| num_runs = 1 | ||
| else: | ||
| num_runs = int(self.iterations) | ||
| class CounterOperation(beam.DoFn): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to have CountBytes metrics in load_test_metrics_utils. WDYT about making this class parametrised also by name and namespace so it can be reused in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this class can be reusable. Take a note that we don't count bytes here, we just increment counters by 1 in order to simulate stressful conditions (pipeline with a lot of metric counters)
| self.metrics_monitor.filters = MetricsFilter().with_namespace( | ||
| self.metrics_namespace) | ||
|
|
||
| def _get_option_or_default(self, opt_name, default=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the 1 actually a default value? Maybe it would be better to have 0 and then omit loops as default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All options will be overridden in a Jenkins job anyway. But I think it makes sense to skip loops as default.
|
Thanks @kkucharc for review! Fixes are ready |
kkucharc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @lgajowy can you double check if it's similar to Java test case?
lgajowy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. Thanks!
Side note: could you squash commits and provide a more descriptive name and description, eg:
Title: [BEAM-7550] Reimplement Python ParDo load test according to the proposal`
Description: The proposal can be found here: https://s.apache.org/load-test-basic-operations
| This is ParDo load test with Synthetic Source. Besides of the standard | ||
| input options there are additional options: | ||
| * number_of_counter_operations - number of pardo operations | ||
| * iterations - number of ParDo operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be more descriptive, eg. Number of subsequent ParDo operations to be performed
| input options there are additional options: | ||
| * number_of_counter_operations - number of pardo operations | ||
| * iterations - number of ParDo operations | ||
| * number_of_counters - number of counter metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. I suggest: Number of counter metrics to be created for one ParDo operation
| * number_of_counter_operations - number of pardo operations | ||
| * iterations - number of ParDo operations | ||
| * number_of_counters - number of counter metrics | ||
| * number_of_counter_operations - number of times all counters are incremented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of operations on counters to be performed in one ParDo - wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
| super(ParDoTest, self).setUp() | ||
| self._apply_filter() | ||
| self.iterations = self._get_option_or_default('iterations') | ||
| self.nb_of_counter = self._get_option_or_default('number_of_counters') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Kasia - number_of_counters is easier to digest. As a non python dev and a person that didn't know this code, I had to think for a little while what does nb mean. numbersays it explicitly and costs nothing
| num_runs = 1 | ||
| else: | ||
| num_runs = int(self.iterations) | ||
| class CounterOperation(beam.DoFn): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall algorithm looks good to me
|
Thanks @lgajowy. I squahed commits as you suggested. |
4449767 to
245d59f
Compare
The proposal can be found here: https://s.apache.org/load-test-basic-operations
245d59f to
92eab27
Compare
|
Run Python PreCommit |
lgajowy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Without some pipeline parameters in ParDo Load Test in Python, it is impossible to create all required test cases (see proposal: https://s.apache.org/load-test-basic-operations).
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.