[BEAM-7550] Missing pipeline parameters in ParDo Load Test #8847

kamilwu · 2019-06-13T12:22:26Z

Without some pipeline parameters in ParDo Load Test in Python, it is impossible to create all required test cases (see proposal: https://s.apache.org/load-test-basic-operations).

kamilwu · 2019-06-13T12:23:10Z

R: @kkucharc

kkucharc

Thank you Kamil for this PR, I added my comments

kkucharc · 2019-06-24T14:48:57Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

+    """Prevents metrics from namespace other than specified in pipeline
+    options from being published."""
+    if self.metrics_monitor is not None:
+      self.metrics_monitor.filters = MetricsFilter().with_namespace(


I am a little afraid of using this, mainly because of this PR. We can't be sure that we won't cut off some useful metrics with this.

One idea that comes to mind is to create a global variable METRICS_NAMESPACES, as in the Java SDK, list the namespaces to be saved in it, and use with_namespaces. This would also require changing the variable saved as a pipeline option. WDYT?

I moved the _apply_filters() method to the base class. It takes a list of namespaces as a parameter. The author of a derived class will decide whether to call it or not.
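
A minimal sketch of that arrangement, written without Beam so that it runs standalone. `NamespaceFilter` is a hypothetical stand-in for Beam's `MetricsFilter`, and `MetricsMonitor`, `LoadTestBase`, and the sample namespaces are illustrative names rather than the PR's actual code:

```python
class NamespaceFilter(object):
  """Hypothetical stand-in for a metrics filter that matches namespaces."""

  def __init__(self, namespaces):
    self.namespaces = set(namespaces)

  def matches(self, namespace):
    return namespace in self.namespaces


class MetricsMonitor(object):
  """Illustrative monitor that publishes only metrics passing its filter."""

  def __init__(self):
    self.filters = None  # None means: publish everything

  def publishable(self, metrics):
    # metrics is a list of (namespace, name, value) tuples.
    if self.filters is None:
      return list(metrics)
    return [m for m in metrics if self.filters.matches(m[0])]


class LoadTestBase(object):
  """Illustrative base class; derived tests choose whether to filter."""

  def __init__(self, metrics_monitor=None):
    self.metrics_monitor = metrics_monitor

  def _apply_filter(self, allowed_namespaces):
    """Restricts published metrics to the given list of namespaces."""
    if self.metrics_monitor is not None:
      self.metrics_monitor.filters = NamespaceFilter(allowed_namespaces)


test = LoadTestBase(MetricsMonitor())
test._apply_filter(['pardo'])
published = test.metrics_monitor.publishable([
    ('pardo', 'runtime', 12),
    ('do-not-publish', 'counter-0', 5),
])
```

Because the filter is only installed when a derived class calls `_apply_filter`, a test that never calls it keeps publishing every namespace.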

kkucharc · 2019-06-24T14:58:34Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

+
+  def _get_option_or_default(self, opt_name, default=1):
+    option = self.pipeline.get_option(opt_name)
+    return int(option) if option is not None else default


It might be useful to have this in load_test_metrics_utils, WDYT? Casting to int may also throw an exception.
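
One way the casting concern could be handled, as a sketch only. Here `options` is a plain dict standing in for the pipeline options, and raising `ValueError` with a descriptive message is an assumed design choice, not something the PR specifies:

```python
def get_option_or_default(options, opt_name, default=1):
  """Returns options[opt_name] as an int, or `default` if it is not set.

  `options` is a plain dict in this sketch; the PR reads the value from the
  pipeline options instead.
  """
  option = options.get(opt_name)
  if option is None:
    return default
  try:
    return int(option)
  except (TypeError, ValueError):
    # Surface a clearer message than the bare int() failure.
    raise ValueError(
        "Option '%s' must be an integer, got %r" % (opt_name, option))
```

A missing option falls back to the default, while a malformed one fails loudly instead of silently running with the wrong value.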

kkucharc · 2019-06-25T08:43:47Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

+        self.nb_of_operations = nb_of_operations
+        self.counters = []
+        for i in range(nb_of_counters):
+          self.counters.append(Metrics.counter('do-not-publish-me',


Maybe do-not-publish sounds better?

kkucharc · 2019-06-25T09:45:51Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

+    super(ParDoTest, self).setUp()
+    self._apply_filter()
+    self.iterations = self._get_option_or_default('iterations')
+    self.nb_of_counter = self._get_option_or_default('number_of_counters')


I would go with the full name, number.

nb is quite a popular abbreviation for number. What's more, shorter variable names are better, so I'd rather keep nb in this case.

It doesn't seem to be a popular abbreviation across Beam (sometimes new_block was called nb as well). But it's OK as long as it's just a local variable for this test.

I agree with Kasia - number_of_counters is easier to digest. As a non-Python dev and a person who didn't know this code, I had to think for a little while about what nb means. number says it explicitly and costs nothing.

kkucharc · 2019-06-25T09:56:59Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

-      num_runs = 1
-    else:
-      num_runs = int(self.iterations)
+    class CounterOperation(beam.DoFn):


I think it would be good to have CountBytes metrics in load_test_metrics_utils. WDYT about making this class parametrised also by name and namespace so it can be reused in future?

I don't think this class can be reusable. Note that we don't count bytes here; we just increment counters by 1 in order to simulate stressful conditions (a pipeline with a lot of metric counters).
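
To make the behaviour described above concrete, here is a Beam-free sketch. `FakeCounter` is a hypothetical stand-in for a Beam counter metric, the class below only mimics the shape of a `DoFn`, and the counter names are made up for illustration:

```python
class FakeCounter(object):
  """Hypothetical stand-in for a Beam counter metric."""

  def __init__(self, namespace, name):
    self.namespace = namespace
    self.name = name
    self.value = 0

  def inc(self, n=1):
    self.value += n


class CounterOperation(object):
  """Mimics the DoFn: passes elements through while bumping counters."""

  def __init__(self, number_of_counters, number_of_operations):
    self.number_of_operations = number_of_operations
    self.counters = [
        FakeCounter('do-not-publish', 'name-%d' % i)
        for i in range(number_of_counters)]

  def process(self, element):
    # Increment every counter by 1, number_of_operations times per element.
    for _ in range(self.number_of_operations):
      for counter in self.counters:
        counter.inc()
    yield element  # the element itself is not modified


op = CounterOperation(number_of_counters=3, number_of_operations=2)
outputs = [out for element in ['a', 'b'] for out in op.process(element)]
```

The counters carry no information about the data; their only purpose is to generate metric updates, which is why the class is tied to this test.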

kkucharc · 2019-06-25T11:18:24Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

+      self.metrics_monitor.filters = MetricsFilter().with_namespace(
+          self.metrics_namespace)
+
+  def _get_option_or_default(self, opt_name, default=1):


Should 1 actually be the default value? Maybe it would be better to default to 0 and then skip the loops by default?

All options will be overridden in a Jenkins job anyway. But I think it makes sense to skip loops as default.

kamilwu · 2019-06-25T13:23:48Z

Thanks @kkucharc for review! Fixes are ready

kkucharc

LGTM. @lgajowy, can you double-check that it's similar to the Java test case?

lgajowy

Left some comments. Thanks!

Side note: could you squash commits and provide a more descriptive name and description, eg:

Title: [BEAM-7550] Reimplement Python ParDo load test according to the proposal
Description: The proposal can be found here: https://s.apache.org/load-test-basic-operations

lgajowy · 2019-07-03T13:55:29Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

 This is ParDo load test with Synthetic Source. Besides of the standard
 input options there are additional options:
-* number_of_counter_operations - number of pardo operations
+* iterations - number of ParDo operations


I think this could be more descriptive, eg. Number of subsequent ParDo operations to be performed

lgajowy · 2019-07-03T13:56:21Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

 input options there are additional options:
-* number_of_counter_operations - number of pardo operations
+* iterations - number of ParDo operations
+* number_of_counters - number of counter metrics


Same here. I suggest: Number of counter metrics to be created for one ParDo operation

lgajowy · 2019-07-03T13:56:43Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

-* number_of_counter_operations - number of pardo operations
+* iterations - number of ParDo operations
+* number_of_counters - number of counter metrics
+* number_of_counter_operations - number of times all counters are incremented


Number of operations on counters to be performed in one ParDo - wdyt?

Sounds good to me.
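
Read together, the three option descriptions imply that the number of counter increments grows as the product of the three values and the element count. The following sketch works through that arithmetic; `run_pardo_chain` is a hypothetical helper for illustration, not code from the PR:

```python
def run_pardo_chain(elements, iterations, number_of_counters,
                    number_of_counter_operations):
  """Simulates `iterations` subsequent ParDo steps over `elements`.

  Returns the unchanged elements and the total number of counter increments.
  """
  elements = list(elements)
  total_increments = 0
  for _ in range(iterations):  # one pass per subsequent ParDo operation
    for _element in elements:
      # Each ParDo increments all of its counters this many times per element.
      total_increments += number_of_counters * number_of_counter_operations
  return elements, total_increments


_, increments = run_pardo_chain(
    range(10), iterations=2, number_of_counters=3,
    number_of_counter_operations=4)
_, no_increments = run_pardo_chain(
    range(10), iterations=0, number_of_counters=3,
    number_of_counter_operations=4)
```

With `iterations=0` the outer loop never runs, so the elements pass through untouched and no counters are incremented.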

lgajowy · 2019-07-03T14:01:07Z

sdks/python/apache_beam/testing/load_tests/pardo_test.py

-      num_runs = 1
-    else:
-      num_runs = int(self.iterations)
+    class CounterOperation(beam.DoFn):


Overall algorithm looks good to me

kamilwu · 2019-07-04T07:40:08Z

Thanks @lgajowy. I squashed commits as you suggested.
Moreover, I extracted get_option_or_default method to the LoadTest class, because I'm going to reuse it in other load tests.

The proposal can be found here: https://s.apache.org/load-test-basic-operations

kamilwu · 2019-07-04T09:43:23Z

Run Python PreCommit

lgajowy

lgtm

kkucharc requested changes Jun 25, 2019

kkucharc reviewed Jun 25, 2019

kkucharc approved these changes Jul 2, 2019

lgajowy requested changes Jul 3, 2019

kamilwu force-pushed the python-pardo-refactor branch from 4449767 to 245d59f on July 4, 2019 07:40

[BEAM-7550] Reimplement Python ParDo load test according to the proposal

92eab27

The proposal can be found here: https://s.apache.org/load-test-basic-operations

kamilwu force-pushed the python-pardo-refactor branch from 245d59f to 92eab27 on July 4, 2019 08:35

lgajowy approved these changes Jul 4, 2019

lgajowy merged commit 2d5e493 into apache:master Jul 4, 2019

kamilwu deleted the python-pardo-refactor branch July 4, 2019 11:45
