[BEAM-8258] basic metric feature for nexmark by leiyiz · Pull Request #12674 · apache/beam

leiyiz · 2020-08-24T19:27:07Z

Added Beam-metric-based performance monitoring.
Performance results are logged to the console after the query finishes or is canceled by the Nexmark suite.
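To illustrate how an element counter plus the earliest and latest wall-clock readings (the "start and end time" discussed in review below) could be turned into a console summary, here is a framework-free Python 3 sketch. The function `summarize_performance`, the `PerfSummary` fields, and the assumption that start and end are the min and max of millisecond timestamps are hypothetical; this is not the PR's actual monitor code.

```python
from dataclasses import dataclass


@dataclass
class PerfSummary:
    """Hypothetical summary of one query run."""
    element_count: int
    runtime_sec: float
    events_per_sec: float


def summarize_performance(element_count, start_ms, end_ms):
    """Derive runtime and throughput from a counter and two wall-clock readings.

    start_ms/end_ms are assumed to be the min and max of a distribution of
    millisecond timestamps, e.g. values recorded as int(time() * 1000).
    """
    if end_ms < start_ms:
        raise ValueError("end_ms must not precede start_ms")
    runtime_sec = (end_ms - start_ms) / 1000.0
    # A zero-length run (all readings in the same millisecond) has no
    # meaningful throughput; report infinity rather than dividing by zero.
    events_per_sec = element_count / runtime_sec if runtime_sec > 0 else float("inf")
    return PerfSummary(element_count, runtime_sec, events_per_sec)


if __name__ == "__main__":
    summary = summarize_performance(10_000, 1_598_300_000_000, 1_598_300_004_000)
    # prints: count=10000 runtime=4.00s throughput=2500.0 events/s
    print(f"count={summary.element_count} runtime={summary.runtime_sec:.2f}s "
          f"throughput={summary.events_per_sec:.1f} events/s")
```

Reporting throughput as count divided by the span between the first and last wall-clock reading is one simple choice; it ignores idle time before the first bundle starts.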

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Choose reviewer(s) and mention them in a comment (R: @username).
Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)


Pre-Commit Tests Status (on master branch)


See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.

leiyiz · 2020-08-24T19:27:42Z

R: @y1chi
R: @pabloem

codecov · 2020-08-24T19:57:47Z

Codecov Report

Merging #12674 into master will decrease coverage by 0.18%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #12674      +/-   ##
==========================================
- Coverage   34.48%   34.29%   -0.19%     
==========================================
  Files         685      696      +11     
  Lines       81519    82307     +788     
  Branches     9185     9300     +115     
==========================================
+ Hits        28109    28228     +119     
- Misses      52987    53656     +669     
  Partials      423      423

Impacted Files	Coverage Δ
typehints/typecheck_test_py3.py	`31.54% <0.00%> (-16.00%)`	⬇️
typehints/typecheck.py	`29.44% <0.00%> (-6.18%)`	⬇️
testing/load_tests/load_test_metrics_utils.py	`34.98% <0.00%> (-1.39%)`	⬇️
runners/worker/opcounters.py	`33.81% <0.00%> (-0.87%)`	⬇️
pipeline.py	`22.04% <0.00%> (-0.28%)`	⬇️
dataframe/transforms_test.py	`25.00% <0.00%> (-0.21%)`	⬇️
io/gcp/bigquery_test.py	`27.39% <0.00%> (-0.18%)`	⬇️
options/pipeline_options.py	`52.99% <0.00%> (-0.16%)`	⬇️
transforms/ptransform_test.py	`18.37% <0.00%> (-0.09%)`	⬇️
transforms/core.py	`36.79% <0.00%> (-0.05%)`	⬇️
... and 30 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 66055db...f7a7ca7.

sdks/python/apache_beam/testing/benchmarks/nexmark/monitor.py

sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py

leiyiz · 2020-08-24T23:52:40Z

Run Portable_Python PreCommit

pabloem

LGTM. Just one comment about the monitoring DoFn. LMK what you think about that.

pabloem · 2020-08-25T21:23:45Z

sdks/python/apache_beam/testing/benchmarks/nexmark/monitor.py

+  def process(self, element):
+    self.element_count.inc()
+    self.event_time.update(int(time() * 1000))
+    yield element


The reason we collect the event_time metric is to know the start and end times of certain processing, right? If so, we only care about the beginning and the end, right?
Updating metrics is a somewhat slow operation (not incredibly slow, but this DoFn does nothing else), so I think it may be a good idea to perform these updates in start_bundle and finish_bundle: for event_time, update only when the bundle starts and ends; for element_count, keep a member variable that counts the number of elements per bundle.

e.g.:

def start_bundle(self):
  self.element_counter = 0
  self.event_time.update(int(time() * 1000))

def process(self, elm):
  self.element_counter += 1
  yield elm

def finish_bundle(self):
  self.event_time.update(int(time() * 1000))
  self.element_count.inc(self.element_counter)

wdyt?
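The batching suggested above can be exercised without a Beam runtime. The sketch below mirrors the DoFn lifecycle (start_bundle, process, finish_bundle) but uses hypothetical in-memory stand-ins for Beam's counter and distribution metrics; in a real pipeline these would be the objects returned by `Metrics.counter(...)` and `Metrics.distribution(...)` from `apache_beam.metrics`. The class names `CounterStandIn`, `DistributionStandIn`, and `BundleBatchedMonitor` are illustrative assumptions, not code from this PR.

```python
import time


class CounterStandIn:
    """Hypothetical stand-in for a Beam counter: supports inc(n)."""

    def __init__(self):
        self.value = 0
        self.update_calls = 0  # how many times the metric itself was touched

    def inc(self, n=1):
        self.value += n
        self.update_calls += 1


class DistributionStandIn:
    """Hypothetical stand-in for a Beam distribution: tracks min/max/count/sum."""

    def __init__(self):
        self.min = None
        self.max = None
        self.count = 0
        self.sum = 0

    def update(self, value):
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.count += 1
        self.sum += value


def now_ms():
    # Wall-clock time in milliseconds, like int(time() * 1000) in the diff.
    return int(time.time() * 1000)


class BundleBatchedMonitor:
    """DoFn-shaped sketch: metrics are updated per bundle, not per element."""

    def __init__(self):
        self.element_count = CounterStandIn()
        self.event_time = DistributionStandIn()
        self._elements_in_bundle = 0

    def start_bundle(self):
        self._elements_in_bundle = 0
        self.event_time.update(now_ms())  # one wall-clock reading at the start

    def process(self, element):
        # Only a cheap local increment happens per element.
        self._elements_in_bundle += 1
        yield element

    def finish_bundle(self):
        self.event_time.update(now_ms())  # one wall-clock reading at the end
        self.element_count.inc(self._elements_in_bundle)  # one counter update


if __name__ == "__main__":
    monitor = BundleBatchedMonitor()
    for bundle in (range(3), range(5)):
        monitor.start_bundle()
        for x in bundle:
            list(monitor.process(x))
        monitor.finish_bundle()
    print(monitor.element_count.value, monitor.element_count.update_calls)  # prints: 8 2
```

Eight elements across two bundles produce only two counter updates and four wall-clock readings. The trade-off is that a metric which must observe each element's own value still needs a per-element update in process.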

leiyiz · 2020-08-26

Yeah, I think this makes sense, but I would still need to update a metric in the process method, because I added a new metric that logs the timestamps of the events themselves, separate from the now() wall-clock metric.
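The distinction leiyiz draws is between processing time (wall-clock, which two bundle-boundary readings can bracket) and each event's own timestamp (which differs per element). A minimal sketch of a monitor holding both, assuming elements arrive as hypothetical `(event_timestamp_ms, payload)` pairs rather than the real Nexmark event model, and again using a hypothetical distribution stand-in. In an actual Beam DoFn the element timestamp can be requested with `timestamp=beam.DoFn.TimestampParam`; whether the Nexmark monitor reads it that way or from the event payload is not stated in this thread.

```python
import time


class DistributionStandIn:
    """Hypothetical stand-in for a Beam distribution (min/max/count/sum)."""

    def __init__(self):
        self.min = None
        self.max = None
        self.count = 0
        self.sum = 0

    def update(self, value):
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.count += 1
        self.sum += value


class EventTimestampMonitor:
    """Hypothetical DoFn-shaped monitor with two separate time metrics.

    wall_clock: processing time, sampled only at bundle boundaries.
    event_timestamp: each element's own event time, recorded in process().
    """

    def __init__(self):
        self.wall_clock = DistributionStandIn()
        self.event_timestamp = DistributionStandIn()

    def start_bundle(self):
        self.wall_clock.update(int(time.time() * 1000))

    def process(self, element):
        # Assumed element shape: (event_timestamp_ms, payload).
        event_ts_ms, _payload = element
        self.event_timestamp.update(event_ts_ms)
        yield element

    def finish_bundle(self):
        self.wall_clock.update(int(time.time() * 1000))
```

Because a distribution's update records one value at a time, folding a bundle's event timestamps into a local min and max and flushing only those in finish_bundle would keep min and max correct but make count and sum reflect two values per bundle instead of one per event, which is one reason the per-element update stays in process.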

pabloem · 2020-08-26

I see. In that case, there isn't much gain from using start_bundle and finish_bundle. I'll approve it for now.

y1chi · 2020-08-26

The distinction between self.event_time and self.event_timestamp is a bit confusing. I thought the timestamp was for debugging purposes and would be removed once it's no longer needed?

pabloem · 2020-08-27

@leiyiz thoughts?

y1chi

LGTM, thanks!

add metric feature to output the performance of pipeline

bb9f2aa

probot-autolabeler bot added the python label Aug 24, 2020

changes for formatting and 2.7 compatibility

8c7eeb5

y1chi reviewed Aug 24, 2020

sdks/python/apache_beam/testing/benchmarks/nexmark/monitor.py

sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py (Outdated)

sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py (Outdated)

code review issue resolve

0668d46

pabloem reviewed Aug 25, 2020


put wall-clock recording into start and finish of bundle

f7a7ca7

y1chi approved these changes Aug 26, 2020


pabloem merged commit f3a8b5c into apache:master Aug 26, 2020

leiyiz deleted the nexmark_metric_py branch August 27, 2020 22:29


pabloem merged 4 commits into apache:master from leiyiz:nexmark_metric_py
