[BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field #13533

y1chi · 2020-12-11T22:29:45Z

y1chi · 2020-12-15T00:37:29Z

R: @boyuanzz

boyuanzz

Currently you record the threadId-transformId mapping in:

startBundle
finishBundle
processElementForWindowObservingParDo
processElementForParDo

There are some other places where we also invoke user code:

processElementForWindowObservingPairWithRestriction
processElementForPairWithRestriction
processElementForWindowObservingSplitRestriction
processElementForSplitRestriction
processElementForWindowObservingTruncateRestriction
processElementForTruncateRestriction
processElementForWindowObservingSizedElementAndRestriction
tearDown
invoke timer callback
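The call sites listed above share one pattern: tag the thread before user code runs and untag it afterwards. The following is a minimal sketch, assuming a hypothetical `TrackedInvocation.invoke` wrapper and a plain `ConcurrentHashMap` keyed by thread id. Neither is the PR's actual API; they only illustrate how every callback could be covered by one helper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper: tags the current thread with a transform id for the duration of one
 * user-code call (startBundle, processElement, SDF methods, timer callbacks, tearDown).
 */
final class TrackedInvocation {
  // threadId -> transformId; a stand-in for the PR's TransformProcessingThreadTracker.
  static final Map<Long, String> THREAD_TO_TRANSFORM = new ConcurrentHashMap<>();

  static <T> T invoke(String transformId, Supplier<T> userCode) {
    long threadId = Thread.currentThread().getId();
    String outer = THREAD_TO_TRANSFORM.put(threadId, transformId);
    try {
      return userCode.get();
    } finally {
      // Restore the enclosing transform (if any): in a fused stage, a downstream DoFn can run
      // on the same thread inside the upstream DoFn's output() call.
      if (outer == null) {
        THREAD_TO_TRANSFORM.remove(threadId);
      } else {
        THREAD_TO_TRANSFORM.put(threadId, outer);
      }
    }
  }

  /** Transform id the current thread is running, or null outside any tracked call. */
  static String current() {
    return THREAD_TO_TRANSFORM.get(Thread.currentThread().getId());
  }

  public static void main(String[] args) {
    String seen =
        invoke(
            "ParDo(A)",
            () -> {
              // ParDo(A) outputs an element; the fused ParDo(B) runs nested on this thread.
              String inner = invoke("ParDo(B)", TrackedInvocation::current);
              return current() + " after " + inner;
            });
    System.out.println(seen); // ParDo(A) after ParDo(B)
    System.out.println(current()); // null
  }
}
```

Restoring the outer mapping in `finally`, rather than simply clearing it, keeps log lines that ParDo(A) emits after its nested output attributed to ParDo(A). Whether the harness needs this depends on how it nests calls, so treat it as a design option, not a description of the merged code.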

boyuanzz · 2020-12-15T19:24:07Z

.../java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java

+ * TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
+ * processed in the SDK harness.
+ */
+public class TransformProcessingThreadTracker {


Sorry, I'm not familiar with how the logging service works. I'm wondering whether this will have multi-threading concurrency issues.

I think there is a very slight chance that the processing thread moves on to another transform before the LogHandler has finished transforming the log entries from the previous one. But I think it should be very rare (the log transformation should be almost instant), and I would argue that it's probably better to keep the logging best effort than to introduce locks to guarantee 100% metadata correctness.

The argument for "it's probably better to keep the logging just best effort" is that it's OK for the step name to mismatch the log message itself. Do you think that is acceptable when it happens?

I'm not sure it will ever happen; we'll see whether the integration test is flaky (in my 30+ IT runs, a mismatch never happened). If the occurrence rate is below 0.01%, I don't think it will have an actual impact on usability. The current empty step in SDK logs has no value to users and can be considered an almost 100% mismatch, so I think this PR is at least an improvement on that.

I agree that test signals can give us more confidence.

I would say that "we provide some information, but it can be wrong" is not better than "we don't provide more information".

I agree. I don't think the concurrency risk will ever be an issue; we can tell by enabling the test and tracking its history (I also manually ran the test another 50 times, and all runs passed with the matching step). I believe that providing some information that can be wrong in very rare cases is still more valuable than not providing it, and it probably won't cause much trouble for users. The element count in streaming pipelines falls into this best-effort category as well.
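To illustrate the best-effort, lock-free lookup discussed above, here is a sketch of a `java.util.logging.Handler` that resolves the transform id inside `publish()`. `Logger.log` invokes handlers synchronously on the thread that logged, so reading that thread's mapping there, rather than later on a background thread, narrows the window in which the thread could have moved on to another transform. `TransformTaggingHandler` and its static map are hypothetical; the harness's real log handler may convert and ship records differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical handler that stamps each record with the transform its thread is running. */
final class TransformTaggingHandler extends Handler {
  // threadId -> transformId, maintained by the harness around user-code calls (hypothetical).
  static final Map<Long, String> THREAD_TO_TRANSFORM = new ConcurrentHashMap<>();

  // Formatted entries; a real harness would send these over its logging stream instead.
  final List<String> entries = new ArrayList<>();

  @Override
  public void publish(LogRecord record) {
    // Logger.log calls publish() on the logging thread, so this lock-free lookup sees the
    // mapping that is current at the moment of the log call. A missing mapping yields "".
    String transformId =
        THREAD_TO_TRANSFORM.getOrDefault(Thread.currentThread().getId(), "");
    synchronized (entries) {
      entries.add("[" + transformId + "] " + record.getMessage());
    }
  }

  @Override
  public void flush() {}

  @Override
  public void close() {}

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("tagging.sketch");
    logger.setUseParentHandlers(false); // send records only to this handler
    TransformTaggingHandler handler = new TransformTaggingHandler();
    logger.addHandler(handler);

    long threadId = Thread.currentThread().getId();
    THREAD_TO_TRANSFORM.put(threadId, "ParDo(Parse)");
    logger.info("parsed element");
    THREAD_TO_TRANSFORM.remove(threadId);
    logger.info("outside any transform");

    System.out.println(handler.entries); // [[ParDo(Parse)] parsed element, [] outside any transform]
  }
}
```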

y1chi · 2020-12-15T20:06:36Z

Do you think the step info will be useful for the logs in SDF-related callbacks? I haven't added tracking for them since I'm not familiar with how users would use logging in SDFs, but we can add tracking for all of them if necessary.

boyuanzz · 2020-12-15T20:14:04Z

An SDF is indeed a DoFn, not a callback. The SDF author could add logging just like in a normal DoFn. If the purpose of this PR is to add step info to the logs that users write in their code, then I think we should cover all of the places where we invoke user code.

y1chi · 2020-12-15T21:05:03Z

Got it, I'll add it.

boyuanzz · 2020-12-15T22:54:27Z

Run Java PreCommit

boyuanzz · 2020-12-16T00:15:10Z

.../java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java

+public class TransformProcessingThreadTracker {
+  private static final TransformProcessingThreadTracker INSTANCE =
+      new TransformProcessingThreadTracker();
+  private final ConcurrentHashMap<Long, String> threadIdToTransformIdMappings;


Another question: will this map grow without bound? I'm somewhat concerned that it could consume too much memory in a long-running instance (if threads are not reused).

Hmm, yeah, I think you are right; it's potentially an issue. I've switched to a LoadingCache with expiration.
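The fix replaces the unbounded map with a `LoadingCache` that expires entries (with Guava, the usual way to build one is `CacheBuilder` with `expireAfterAccess` or `expireAfterWrite`; the exact policy used here isn't stated). Because Guava isn't part of the JDK, the following stdlib-only sketch shows the same idea with a hypothetical `ExpiringThreadTracker`: an entry not touched within a TTL is evicted, so mappings for threads that are no longer used cannot pile up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stdlib stand-in for an access-expiring cache of threadId -> transformId. */
final class ExpiringThreadTracker {
  private static final class Entry {
    final String transformId;
    volatile long lastAccessNanos;

    Entry(String transformId, long now) {
      this.transformId = transformId;
      this.lastAccessNanos = now;
    }
  }

  private final Map<Long, Entry> entries = new ConcurrentHashMap<>();
  private final long ttlNanos;
  private final LongSupplier clock; // System::nanoTime in practice; injectable for tests

  ExpiringThreadTracker(long ttlNanos, LongSupplier clock) {
    this.ttlNanos = ttlNanos;
    this.clock = clock;
  }

  void set(long threadId, String transformId) {
    entries.put(threadId, new Entry(transformId, clock.getAsLong()));
    evictExpired(); // O(n) sweep: a sketch-level simplification of a cache's amortized cleanup
  }

  /** Returns the transform id, or null if unknown or expired. Best effort, no locking. */
  String get(long threadId) {
    long now = clock.getAsLong();
    Entry e = entries.get(threadId);
    if (e == null) {
      return null;
    }
    if (now - e.lastAccessNanos > ttlNanos) {
      entries.remove(threadId, e); // only removes if no newer entry replaced it
      return null;
    }
    e.lastAccessNanos = now; // refresh on access, like expire-after-access
    return e.transformId;
  }

  int size() {
    return entries.size();
  }

  private void evictExpired() {
    long now = clock.getAsLong();
    entries.values().removeIf(e -> now - e.lastAccessNanos > ttlNanos);
  }

  public static void main(String[] args) {
    long[] now = {0L};
    ExpiringThreadTracker tracker = new ExpiringThreadTracker(100, () -> now[0]);
    tracker.set(1L, "ParDo(A)");
    now[0] = 50;
    System.out.println(tracker.get(1L)); // ParDo(A): accessed within the TTL
    now[0] = 200;
    tracker.set(2L, "ParDo(B)"); // sweeps thread 1, idle for 150 > 100
    System.out.println(tracker.size()); // 1
  }
}
```

One trade-off: if a thread runs a single call silently for longer than the TTL, its mapping expires and a later log line from that call gets no step. That fits the best-effort stance taken earlier in the review.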

.../java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java

boyuanzz · 2020-12-16T02:30:15Z

Run Java PreCommit

boyuanzz

Thanks! I'll merge this PR when all tests pass.

boyuanzz · 2020-12-16T19:31:01Z

Task :sdks:java:harness:checkstyleMain is failing.

probot-autolabeler bot added the java label Dec 11, 2020

y1chi force-pushed the test_logging branch from c77984f to 61bb8a8 December 11, 2020 22:30

Track transform processing thread in Java SDK harness and set log entry transform id field. (554d254)

y1chi force-pushed the test_logging branch from 61bb8a8 to 554d254 December 14, 2020 23:34

y1chi changed the title from "Set ptransform id for log entries" to "Track transform processing thread in Java SDK harness and set log entry field" Dec 14, 2020

y1chi marked this pull request as ready for review December 14, 2020 23:35

y1chi force-pushed the test_logging branch from 8b7afea to 8ed7fcd December 15, 2020 00:41

Add javadoc (a6ba301)

y1chi force-pushed the test_logging branch from 8ed7fcd to a6ba301 December 15, 2020 00:46

y1chi changed the title from "Track transform processing thread in Java SDK harness and set log entry field" to "[BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field" Dec 15, 2020

boyuanzz reviewed Dec 15, 2020

Add tracking to other ProcessElement methods (23b2007)

boyuanzz reviewed Dec 16, 2020

Use LoadingCache instead of ConcurrentHashMap to limit the size for thread tracker (945e794)

y1chi force-pushed the test_logging branch from 5118755 to 945e794 December 16, 2020 01:57

boyuanzz reviewed Dec 16, 2020

.../java/harness/src/main/java/org/apache/beam/fn/harness/TransformProcessingThreadTracker.java (outdated, resolved)

boyuanzz approved these changes Dec 16, 2020

Address comment (857d41f)

y1chi force-pushed the test_logging branch from d0820fe to 857d41f December 16, 2020 18:41

Fix checkstyle (c8d9d1d)

boyuanzz merged commit 6869dfa into apache:master Dec 16, 2020

y1chi mentioned this pull request Jan 7, 2021

Revert "[BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field" #13696 (merged)