[BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. #9494

Merged: 2 commits, Sep 6, 2019

@Ardagan Ardagan commented Sep 6, 2019

For PTransform IOs that cross the SDK harness boundary, we create two PCollections: one lives outside the SDK harness and one inside it. As a result, counters for these PCollections are counted in the Java DF worker as well as in the SDK harness, which causes them to be double counted in streaming jobs. This worked fine in batch jobs because DFE deduplicates the counters for us.




@Ardagan Ardagan changed the title Python FnAPI metrix fix. [BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. Sep 6, 2019

Ardagan commented Sep 6, 2019

@lukecwik @y1chi @angoenka


Ardagan commented Sep 6, 2019

@ajamato

List<MonitoringInfo> monitoringInfosCopy = new ArrayList<>(monitoringInfos);

List<MonitoringInfo> misToFilter =
    bundleProcessOperation.findIOPCollectionMonitoringInfos(monitoringInfos);

findIOPCollectionMonitoringInfos

Could we use a better name here? I'm not sure what this refers to. I assume IO refers to sources/sinks, but I don't think that is the case here.

List<MonitoringInfo> monitoringInfosCopy = new ArrayList<>(monitoringInfos);

List<MonitoringInfo> misToFilter =
    bundleProcessOperation.findIOPCollectionMonitoringInfos(monitoringInfos);

Is there a reason we need to collect counters for these at all? Our UI doesn't display the gRPC steps; they are an implementation detail.

The alternative design I had in mind was to try to transform each monitoring info and drop the ones that do not have a step in the original graph. Though maybe there is no way to detect this? If there is, they could be dropped in the transformer, since there is no need to send them to DFE.

If I understand correctly, you are just avoiding a double count now? I think we don't need to report these at all.
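For illustration, the alternative described in this comment (transform each monitoring info and drop the ones without a step in the original graph) could be sketched roughly as below. This is not code from the PR: `Info`, `CounterUpdate`, `transform`, and the PCollection-to-step map are hypothetical stand-ins, and whether the worker can actually tell which PCollections lack an original step is exactly the open question raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types are Beam classes.
final class DropUnmappedInfos {
  // Stand-in for a monitoring info reported by the SDK harness.
  static final class Info {
    final String pcollection;
    final long value;

    Info(String pcollection, long value) {
      this.pcollection = pcollection;
      this.value = value;
    }
  }

  // Stand-in for the counter update that would be sent to the service.
  static final class CounterUpdate {
    final String stepName;
    final long value;

    CounterUpdate(String stepName, long value) {
      this.stepName = stepName;
      this.value = value;
    }
  }

  // Returns empty when the PCollection has no step in the original graph
  // (assumed here to be the case for the synthetic gRPC PCollections).
  static Optional<CounterUpdate> transform(Info info, Map<String, String> pcollectionToStep) {
    String step = pcollectionToStep.get(info.pcollection);
    if (step == null) {
      return Optional.empty();
    }
    return Optional.of(new CounterUpdate(step, info.value));
  }

  // Transforms every info and silently drops the ones that map to no step.
  static List<CounterUpdate> transformAll(List<Info> infos, Map<String, String> pcollectionToStep) {
    List<CounterUpdate> updates = new ArrayList<>();
    for (Info info : infos) {
      transform(info, pcollectionToStep).ifPresent(updates::add);
    }
    return updates;
  }
}
```

With this shape, the gRPC PCollections would never be reported at all, rather than being reported once.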


Synced offline. I'll try to elaborate more on this.

When we modify the graph for cross-boundary gRPC operations, we give two different PCollections the same metadata. As a result, metrics are reported correctly from the SDK harness, but both of these PCollections generate metrics that show up as the ElementCount metric in the UI.

At this point, the shortest way to fix the issue is to dedupe on the runner.
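As a rough illustration of the runner-side deduping discussed here, the sketch below mirrors the shape of the diff context quoted above: copy the reported list, find the entries that belong to the cross-boundary gRPC PCollections, and remove them from the copy. The `Info` class, the method names, and the set of gRPC PCollection ids are assumptions made for this example; the real code uses `MonitoringInfo` and `findIOPCollectionMonitoringInfos`, whose internals are not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Info stands in for MonitoringInfo.
final class GrpcCounterDedup {
  static final class Info {
    final String urn;
    final String pcollection;
    final long value;

    Info(String urn, String pcollection, long value) {
      this.urn = urn;
      this.pcollection = pcollection;
      this.value = value;
    }
  }

  // Collects the infos whose PCollection is one of the gRPC boundary PCollections.
  static List<Info> findGrpcPCollectionInfos(List<Info> infos, Set<String> grpcPCollections) {
    List<Info> matches = new ArrayList<>();
    for (Info info : infos) {
      if (grpcPCollections.contains(info.pcollection)) {
        matches.add(info);
      }
    }
    return matches;
  }

  // Returns a filtered copy; the input list is left untouched.
  static List<Info> dropGrpcPCollectionInfos(List<Info> infos, Set<String> grpcPCollections) {
    List<Info> copy = new ArrayList<>(infos);
    List<Info> toFilter = findGrpcPCollectionInfos(infos, grpcPCollections);
    // Info does not override equals, so removal is by object identity.
    copy.removeAll(toFilter);
    return copy;
  }

  public static void main(String[] args) {
    List<Info> infos = Arrays.asList(
        new Info("element_count", "pc_grpc_read", 5L),
        new Info("element_count", "pc_user", 7L));
    List<Info> kept = dropGrpcPCollectionInfos(infos, Collections.singleton("pc_grpc_read"));
    for (Info info : kept) {
      System.out.println(info.pcollection + " = " + info.value);
    }
  }
}
```

Filtering a copy rather than the original list keeps the unfiltered infos available to any other consumer.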

@Ardagan Ardagan merged commit 4a6dc28 into apache:master Sep 6, 2019
@Ardagan Ardagan deleted the StreamMetricsV2 branch September 13, 2019 17:22
soyrice pushed a commit to soyrice/beam that referenced this pull request Sep 19, 2019
…n streaming jobs. (apache#9494)

For PTransform GRPC IOs that cross SDK Harness border we create two PCollections: one lives outside SDK harness, one inside. As a result counters for these PCollections are counted inside Java DF worker as well as in SDK Harness. This causes us to double count them in Streaming jobs. This worked fine in Batch jobs since DFE will do deduplication work for us.