
[BEAM-6374] Emit PCollection metrics from GoSDK #10942

Merged 4 commits into apache:master on Mar 4, 2020

Conversation

@lostluck (Contributor) commented Feb 23, 2020

This adds PCollection metrics to the Go SDK, in particular Element Count and Sampled Size.

New exec.PCollection nodes are added between every processing node in the bundle execution graph.

  • The new metrics are only added as MonitoringInfos, not the legacy protos.
  • There's roughly 10ns added per element per PCollection node due to the atomic addition for every element (a rough sketch of this hot path follows the list).
  • Elements for sizes are selected randomly, then encoded to count their bytes (w/o window headers).
    • An initial index is selected from the first three elements ([0, 1, 2]) at bundle start-up, and then the next index is pre-selected from somewhere further along, proportional to the number of elements seen in the bundle so far.
    • As currently set up, it will take around 200-300 samples for the first 1M elements, so encoding overhead is limited.
  • PCollections from a DataSource do 100% "sampling", since they're reading the bytes directly anyway. The PCollection node that would have been added after the DataSource is elided from the graph during construction, but re-used to avoid duplicating the logic for concurrently manipulating the size distribution.
    • DataSources can properly handle CoGBKs as well, counting non-header bytes for iterables and state-backed iterables.
    • This still involves a mutex Lock for every update, so we may want to find a lighter-weight mechanism to handle the distribution samples from DataSources, or simply opt for the same random sampling (a sketch of such a guarded update follows below).
    • A similar method could be used for DataSinks as well, but not handled in this PR.
    • It's important to note that the runner is already aware of the number of bytes sent and received from the SDK side, so we may opt to remove this entirely.
  • Counts and Samples are not yet made for SideInputs, which would better account for data consumed by DoFns.
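
To make the per-element bookkeeping above concrete, here is a minimal, self-contained Go sketch of the counting/sampling hot path. It is not the SDK's actual exec.PCollection node; the names (pcolCounter, observe, recordSize) and the exact sample spacing are illustrative assumptions, but the shape is the same: one atomic add per element, plus an occasional re-encode when a pre-selected index comes up, with indices spaced proportionally to the elements seen so far.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// pcolCounter sketches the per-element bookkeeping described above: an atomic
// element count on every element, plus a size sample taken only when a
// pre-selected element index comes up. Names and the exact sample spacing are
// illustrative; this is not the SDK's exec.PCollection implementation.
type pcolCounter struct {
	count      int64 // total elements seen; updated atomically on the hot path
	nextSample int64 // 1-based index of the next element to size-sample
	rng        *rand.Rand
	recordSize func(bytes int64) // e.g. feeds a size distribution metric
}

func newPColCounter(rng *rand.Rand, recordSize func(int64)) *pcolCounter {
	return &pcolCounter{
		nextSample: int64(rng.Intn(3)) + 1, // start by sampling one of the first three elements
		rng:        rng,
		recordSize: recordSize,
	}
}

// observe is called once per element; encodedLen re-encodes the element and
// returns its byte length, and is only invoked for sampled elements.
func (p *pcolCounter) observe(encodedLen func() int64) {
	c := atomic.AddInt64(&p.count, 1)
	if c != atomic.LoadInt64(&p.nextSample) {
		return // common case: just the atomic add
	}
	p.recordSize(encodedLen())
	// Pre-select the next index further along, proportional to the elements
	// seen so far, so sampling (and re-encoding) thins out as the bundle grows.
	atomic.StoreInt64(&p.nextSample, c+1+p.rng.Int63n(c/10+1))
}

func main() {
	rng := rand.New(rand.NewSource(1))
	samples := 0
	p := newPColCounter(rng, func(int64) { samples++ })
	for i := 0; i < 1000000; i++ {
		p.observe(func() int64 { return 8 }) // pretend every element encodes to 8 bytes
	}
	fmt.Printf("elements=%d size samples=%d\n", atomic.LoadInt64(&p.count), samples)
}
```

With the spacing used here (next index within roughly 10% of the current count), the sample indices grow geometrically, which should yield on the order of a couple hundred size samples per million elements, consistent with the 200-300 figure above; the spacing constant is chosen to reproduce that behavior and is not taken from the SDK.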

Thank you @ajamato for reminding me of the pre-select method for sampling, and @lukecwik for pointing out the DataSource can avoid separate additional encoding costs when measuring elements.
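
On the DataSource path described above, every element's byte size feeds the distribution, so the update sits behind a lock rather than behind a sampling check. Below is a minimal sketch of such a guarded min/sum/count/max accumulator; the type and field names are illustrative, not the SDK's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// sizeDistribution is an illustrative min/sum/count/max accumulator for byte
// sizes. On a DataSource, which already reads the encoded bytes, update would
// run for every element, hence the per-update mutex cost mentioned above.
type sizeDistribution struct {
	mu                   sync.Mutex
	count, sum, min, max int64
}

func (d *sizeDistribution) update(size int64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.count == 0 || size < d.min {
		d.min = size
	}
	if size > d.max {
		d.max = size
	}
	d.count++
	d.sum += size
}

func main() {
	var d sizeDistribution
	for _, s := range []int64{120, 80, 300} { // e.g. byte lengths read for three elements
		d.update(s)
	}
	fmt.Printf("count=%d sum=%d min=%d max=%d\n", d.count, d.sum, d.min, d.max)
}
```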

Performance impact:
I have two jobs I use for benchmarking this: Pipeline A uses int64s as elements and does simple passthroughs and sums, and Pipeline B uses large protocol buffers as elements and spends a fair amount of CPU time decoding them.

For small "fast" elements, the overhead is about ~19.5% of the Go side processing (which makes sense if elements are just being passed around or incremented).
For large "heavy" elements, the overhead is about ~0.125% of the Go side of processing.

Specifically, this is only taking into account the Go SDK worker, and not any runner side costs. This feels acceptable for the time being, though it's possible we can improve this later, especially for "lighter" jobs.
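
The numbers above come from the Pipeline A/B jobs, which aren't reproduced here. As a hedged sketch only, the micro-benchmark below shows one way to sanity-check the per-element atomic-add cost (the ~10ns class figure quoted above) on a given machine using the standard testing package; it is not the actual benchmark used for this PR.

```go
package main // place in a *_test.go file and run: go test -bench=.

import (
	"sync/atomic"
	"testing"
)

var sink int64 // prevents the compiler from optimizing the loops away

// BenchmarkPassthrough models the bare per-element work of a trivial stage.
func BenchmarkPassthrough(b *testing.B) {
	var v int64
	for i := 0; i < b.N; i++ {
		v++
	}
	sink = v
}

// BenchmarkPassthroughWithCount adds the one atomic increment a PCollection
// node performs per element; the delta approximates the counting overhead.
func BenchmarkPassthroughWithCount(b *testing.B) {
	var v, count int64
	for i := 0; i < b.N; i++ {
		v++
		atomic.AddInt64(&count, 1)
	}
	sink = v + count
}
```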


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.


@lostluck (Contributor Author):

R: @youngoli
cc: @ajamato @lukecwik

@youngoli (Contributor) left a comment:


Looks good. I have a clarification question and a comment nit, but nothing worth blocking approval over.

Review comment on sdks/go/pkg/beam/core/runtime/exec/plan.go (outdated, resolved).
@lostluck lostluck merged commit ded686a into apache:master Mar 4, 2020
lostluck added a commit that referenced this pull request Mar 6, 2020
lostluck added a commit that referenced this pull request Mar 6, 2020
lostluck added a commit that referenced this pull request Aug 12, 2021
Restoring #10942 to narrow down where the post-submits failed previously. Expands should also not have a PCollection node after them, since the CoGBK coder is handled by the DataSource.

calvinleungyk pushed a commit to calvinleungyk/beam that referenced this pull request Sep 22, 2021