Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field #13533

Merged
merged 6 commits into from Dec 16, 2020

Conversation

y1chi
Copy link
Contributor

@y1chi y1chi commented Dec 11, 2020

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@y1chi y1chi changed the title Set ptransform id for log entries Track transform processing thread in Java SDK harness and set log entry field Dec 14, 2020
@y1chi y1chi marked this pull request as ready for review December 14, 2020 23:35
@y1chi
Copy link
Contributor Author

y1chi commented Dec 15, 2020

R: @boyuanzz

@y1chi y1chi changed the title Track transform processing thread in Java SDK harness and set log entry field [BEAM-11474] Track transform processing thread in Java SDK harness and set log entry field Dec 15, 2020
Copy link
Contributor

@boyuanzz boyuanzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently you have recorded the threadId-transformId mapping on

  • startBundle
  • finishBundle
  • processElementForWindowObservingParDo
  • processElementForParDo

There are some other places we also invoke user code:

  • processElementForWindowObservingPairWithRestriction
  • processElementForPairWithRestriction
  • processElementForWindowObservingSplitRestriction
  • processElementForSplitRestriction
  • processElementForWindowObservingTruncateRestriction
  • processElementForTruncateRestriction
  • processElementForWindowObservingSizedElementAndRestriction
  • tearDown
  • invoke timer callback

* TransformProcessingThreadTracker tracks the thread ids for the transforms that are being
* processed in the SDK harness.
*/
public class TransformProcessingThreadTracker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that I'm not familiar with how logging service works, I'm wondering whether this will have multi-threading concurrency issue.

Copy link
Contributor Author

@y1chi y1chi Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a very slight chance the processing thread moved onto another transform when the LogHandler haven't done transforming the log entries in previous one. But I think the it should be very rare(log transform should be almost instant) and I would argue that it's probably better to keep the logging just best effort instead of introducing locks to guarantee 100% metadata correctness?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument for it's probably better to keep the logging just best effort is that it's ok to have step name mismatched with log message itself. Do you think it's acceptable when it happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it is ever gonna happen, we get to see if the integration test is flaky(in my 30+ IT test runs mismatch never happens). If the occurrence is less than 0.01% I don't think it'll have actual impact on usability. Current empty step in sdk logs have no values to users and can be considered almost 100% mismatch, so I think this PR should be at least an improvement to that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that test signals can give us more confidence.

Current empty step in sdk logs have no values to users and can be considered almost 100% mismatch, so I think this PR should be at least an improvement to that.

I would say, We provide some information but they can be wrong is not better than We don't provide more information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I don't think the concurrency risk is ever gonna be an issue, we can tell from enabling the test and track the history(I also manually tested another 50 times and tests all passed with matching step). I believe we provide some information and they can be wrong in very rare cases would still be more valuable than don't provide the information and probably won't cause too much trouble for users, the element count in streaming pipeline falls into this best effort category as well.

@y1chi
Copy link
Contributor Author

y1chi commented Dec 15, 2020

Currently you have recorded the threadId-transformId mapping on

  • startBundle
  • finishBundle
  • processElementForWindowObservingParDo
  • processElementForParDo

There are some other places we also invoke user code:

  • processElementForWindowObservingPairWithRestriction
  • processElementForPairWithRestriction
  • processElementForWindowObservingSplitRestriction
  • processElementForSplitRestriction
  • processElementForWindowObservingTruncateRestriction
  • processElementForTruncateRestriction
  • processElementForWindowObservingSizedElementAndRestriction
  • tearDown
  • invoke timer callback

Do you think the step info will be useful for the logs in SDF related callbacks? I haven't added for them since I'm not familiar how user would use logging in SDFs but sure we can add tracking for all of them if necessary.

@boyuanzz
Copy link
Contributor

boyuanzz commented Dec 15, 2020

Currently you have recorded the threadId-transformId mapping on

  • startBundle
  • finishBundle
  • processElementForWindowObservingParDo
  • processElementForParDo

There are some other places we also invoke user code:

  • processElementForWindowObservingPairWithRestriction
  • processElementForPairWithRestriction
  • processElementForWindowObservingSplitRestriction
  • processElementForSplitRestriction
  • processElementForWindowObservingTruncateRestriction
  • processElementForTruncateRestriction
  • processElementForWindowObservingSizedElementAndRestriction
  • tearDown
  • invoke timer callback

Do you think the step info will be useful for the logs in SDF related callbacks? I haven't added for them since I'm not familiar how user would use logging in SDFs but sure we can add tracking for all of them if necessary.

SDF indeed is a DoFn, not a callback. The SDF author could add additional logging just like in a normal DoFn. If the purpose of this PR is to add step info for the log that users add in their code, then I think we should consider all these places where we will invoke user code.

@y1chi
Copy link
Contributor Author

y1chi commented Dec 15, 2020

Currently you have recorded the threadId-transformId mapping on

  • startBundle
  • finishBundle
  • processElementForWindowObservingParDo
  • processElementForParDo

There are some other places we also invoke user code:

  • processElementForWindowObservingPairWithRestriction
  • processElementForPairWithRestriction
  • processElementForWindowObservingSplitRestriction
  • processElementForSplitRestriction
  • processElementForWindowObservingTruncateRestriction
  • processElementForTruncateRestriction
  • processElementForWindowObservingSizedElementAndRestriction
  • tearDown
  • invoke timer callback

Do you think the step info will be useful for the logs in SDF related callbacks? I haven't added for them since I'm not familiar how user would use logging in SDFs but sure we can add tracking for all of them if necessary.

SDF indeed is a DoFn, not a callback. The SDF author could add additional logging just like in a normal DoFn. If the purpose of this PR is to add step info for the log that users add in their code, then I think we should consider all these places where we will invoke user code.

got it, I'll add it.

@boyuanzz
Copy link
Contributor

Run Java PreCommit

public class TransformProcessingThreadTracker {
private static final TransformProcessingThreadTracker INSTANCE =
new TransformProcessingThreadTracker();
private final ConcurrentHashMap<Long, String> threadIdToTransformIdMappings;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question is that will this map grow unlimitedly? I'm kind of concerning that it consumes too much memory with a long run instance(and the thread is not reused).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah I think you are right, it's potentially an issue and I've changed to use a LoadingCache with expiration.

@boyuanzz
Copy link
Contributor

Run Java PreCommit

Copy link
Contributor

@boyuanzz boyuanzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll merge this PR when all tests pass.

@boyuanzz
Copy link
Contributor

Task :sdks:java:harness:checkstyleMain is failing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants