[BEAM-8258] basic metric feature for nexmark#12674
[BEAM-8258] basic metric feature for nexmark#12674pabloem merged 4 commits intoapache:masterfrom leiyiz:nexmark_metric_py
Conversation
Codecov Report
@@ Coverage Diff @@
## master #12674 +/- ##
==========================================
- Coverage 34.48% 34.29% -0.19%
==========================================
Files 685 696 +11
Lines 81519 82307 +788
Branches 9185 9300 +115
==========================================
+ Hits 28109 28228 +119
- Misses 52987 53656 +669
Partials 423 423
Continue to review full report at Codecov.
|
sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py
Outdated
Show resolved
Hide resolved
sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_util.py
Outdated
Show resolved
Hide resolved
|
Run Portable_Python PreCommit |
pabloem
left a comment
There was a problem hiding this comment.
LGTM. Just one comment about the monitoring DoFn. LMK what you thinka bout that
| def process(self, element): | ||
| self.element_count.inc() | ||
| self.event_time.update(int(time() * 1000)) | ||
| yield element |
There was a problem hiding this comment.
The reason we collect the event_time metric is to know the start and end time of certain processing, right? If so, we only care about the beginning, and the end, right?
Updating metrics are a bit of a slow operation to perform (not incredibly slow, but since this DoFn does nothing else), I think it may be a good idea to perform these updates in finish_bundle and start_bundle (for event_time, update only when the bundle started and ended, and for event_count, you can keep a member variable that counts the number of elements per bundle
e.g.:
start_bundle(self):
self.element_counter = 0
self.event_time.update(now)
process(self, elm):
self.element_counter += 1
yield elm
finish_bundle(self):
self.event_time.update(now)
self.element_count.inc(self.element_counter)
wdyt?
There was a problem hiding this comment.
Yeah I think this makes sense, but I think I would need to keep updating some metric in the process method because I made a new metric for logging the timestamp of the events other than the now() metric
There was a problem hiding this comment.
I see. In that case, I think then there's not a big gain from using start_bundle and finish_bundle. I'll just approve it for now.
There was a problem hiding this comment.
It is a bit confusing between self.event_time and self.event_timestamp, I thought the timestamp was for debugging purpose that should be removed once it's not needed?
added beam-metric based performance monitoring
performance are logged to console after the query is done or canceled by nexmark suite
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.