-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-7969] Fix doublecount on GRPC PCollections in streaming jobs. #9494
Conversation
This comment has been minimized.
This comment has been minimized.
97f10aa
to
8f1938f
Compare
8f1938f
to
a95a3b7
Compare
a95a3b7
to
fbd4853
Compare
List<MonitoringInfo> monitoringInfosCopy = new ArrayList<>(monitoringInfos); | ||
|
||
List<MonitoringInfo> misToFilter = | ||
bundleProcessOperation.findIOPCollectionMonitoringInfos(monitoringInfos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findIOPCollectionMonitoringInfos
Could we use a better name here? Not sure what this is referring to. I assume IO referrs to sources/sinks, but I don't think this is the case
List<MonitoringInfo> monitoringInfosCopy = new ArrayList<>(monitoringInfos); | ||
|
||
List<MonitoringInfo> misToFilter = | ||
bundleProcessOperation.findIOPCollectionMonitoringInfos(monitoringInfos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we need to collect counters for these ones at all? Our UI doesn't display the grpc steps, they are an implementation detail.
The alternative design I was thinking of here was to try and transform each monitoring info, and drop the ones that do not have a step in the original graph. Though, maybe there is no way to detect this? So they could be dropped int the transformer, as there is no need to send them to DFE.
If I understand correctly, you are just avoiding a double count now? Though I think we don't need to report these at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced offline. I'll try to elaborate more on this.
When we modify graph for cross-boundary grpc operations, we give two different PCollections same metadata. As a result we report metrics correctly from SDK harness. Both of these PCollections generate metrics that show as ElementCount metric on UI.
At this point, the shortest way to fix the issue is to utilize deduping on runner.
…n streaming jobs. (apache#9494) For PTransform GRPC IOs that cross SDK Harness border we create two PCollections: one lives outside SDK harness, one inside. As a result counters for these PCollections are counted inside Java DF worker as well as in SDK Harness. This causes us to double count them in Streaming jobs. This worked fine in Batch jobs since DFE will do deduplication work for us.
For PTransform IOs that cross SDK Harness border we create two PCollections: one lives outside SDK harness, one inside. As a result counters for these PCollections are counted inside Java DF worker as well as in SDK Harness. This causes us to double count them in Streaming jobs. This worked fine in Batch jobs since DFE will do deduplication work for us.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.