Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-13829] Expose status API from Go SDK Harness #16957

Merged
merged 19 commits into from Apr 5, 2022

Conversation

riteshghorse
Copy link
Contributor

@riteshghorse riteshghorse commented Feb 26, 2022

Expose SDK harness status from Go SDK to the runner. The attached doc contains the sample goroutine stack dump result from the log when run against dataflow runner and how to replicate it.
Doc Link: https://docs.google.com/document/d/1dMTD5_sKdzLcnoe0ZsQU5Wf9q11uliyYgFnnOZQDzuI/


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot removed the runners label Feb 28, 2022
@codecov
Copy link

codecov bot commented Feb 28, 2022

Codecov Report

Merging #16957 (0010a67) into master (3cd1f7f) will increase coverage by 0.14%.
The diff coverage is 37.28%.

@@            Coverage Diff             @@
##           master   #16957      +/-   ##
==========================================
+ Coverage   73.80%   73.94%   +0.14%     
==========================================
  Files         663      674      +11     
  Lines       87056    87823     +767     
==========================================
+ Hits        64251    64941     +690     
- Misses      21721    21731      +10     
- Partials     1084     1151      +67     
Flag Coverage Δ
go 50.08% <37.28%> (+1.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/go/pkg/beam/core/runtime/harness/harness.go 10.96% <0.00%> (-1.73%) ⬇️
.../go/pkg/beam/core/runtime/harness/worker_status.go 51.16% <51.16%> (ø)
sdks/go/pkg/beam/core/runtime/exec/plan.go 48.61% <0.00%> (-8.80%) ⬇️
.../python/apache_beam/testing/test_stream_service.py 88.37% <0.00%> (-4.66%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.97% <0.00%> (-2.26%) ⬇️
...ks/python/apache_beam/runners/worker/data_plane.py 87.50% <0.00%> (-1.71%) ⬇️
.../python/apache_beam/transforms/periodicsequence.py 96.72% <0.00%> (-1.64%) ⬇️
sdks/go/pkg/beam/core/runtime/graphx/dataflow.go 53.93% <0.00%> (-1.39%) ⬇️
sdks/go/pkg/beam/core/runtime/exec/fn_arity.go 8.00% <0.00%> (-0.89%) ⬇️
... and 38 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3cd1f7f...0010a67. Read the comment docs.

@riteshghorse
Copy link
Contributor Author

R: @lostluck @damccorm @jrmccluskey

Copy link
Contributor

@jrmccluskey jrmccluskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. Have you validated that the runner-side is able to access the status API? And is there a way to integration test this functionality?

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good - just had a couple small changes and a question

sdks/go/pkg/beam/core/runtime/harness/harness.go Outdated Show resolved Hide resolved
sdks/go.mod Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/harness.go Outdated Show resolved Hide resolved
Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! I have a few Go style bit,s and implementation question. I think ultimately, it comes down to: Do we need two goroutines and a channel for this?

sdks/go.mod Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
@riteshghorse
Copy link
Contributor Author

R: @youngoli
yet to add lull logger.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2022

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Contributor

@youngoli youngoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did my best and left some comments, but I'm not really familiar enough with using grpc in Go to be confident in this. I think you should also get a review from Robert.

Edit: Disregard that, I forgot he already reviewed this. In that case aside from my comments it's all good.

@@ -0,0 +1,91 @@
// Licensed to the Apache Software Foundation (ASF) under one or more
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You could probably shorten the filename to status.go (and status_test.go). Not something I'll block on though.

sdks/go/pkg/beam/core/runtime/harness/harness.go Outdated Show resolved Hide resolved
@youngoli
Copy link
Contributor

youngoli commented Mar 18, 2022

R: @lostluck

Edit: Disregard that, I forgot he already reviewed this.

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once Daniel's comments have been addressed I can do a new pass.

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits and cleanups! Thanks for consolidating the loops, it's a much cleaner approach than before.

sdks/go/pkg/beam/core/runtime/harness/harness.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/harness.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/harness.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
@riteshghorse
Copy link
Contributor Author

PTAL. I guess things left to do are more debug information instead of just stacktrace, lull logger, bundle progress.

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last small changes requested, but otherwise this LGTM.
Thanks!

sdks/go/pkg/beam/core/runtime/harness/harness.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/harness/worker_status.go Outdated Show resolved Hide resolved
Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lostluck lostluck merged commit 9317462 into apache:master Apr 5, 2022
ryanthompson591 pushed a commit to ryanthompson591/beam that referenced this pull request Apr 7, 2022
TheNeuralBit added a commit that referenced this pull request Apr 12, 2022
* added initial commit

* removed modified file

* removed params that dont exist

* added clock, removed generics that were causing pickle error, fixed metrics name

* fixed class names removed class that goes in apis

* added base test file

* Added unit tests

* reordered imports

* replied to comments

* apis to api

* added license

* added mock clock test for metrics, realized our metric wouldn't work right with a generator

* Minor changes from Andys comments. Push metric namespace decision to modleLoader class

* Update sdks/python/apache_beam/ml/inference/base.py

typo fix valentyn's suggestion

Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>

* updated with changes from valentyns comments

* merged from tfx version

* added comment

* linted

* changed import order for jenkins linter

* added a bug to track a todo

* fixed for Roberts comments

* make clock and metrics collector private

* make shared second parameter

* mark RunInferenceDoFn private

* moved initialization of shared.Shared into constructor

* added todo

* Update sdks/python/apache_beam/ml/inference/base.py

Co-authored-by: Brian Hulette <hulettbh@gmail.com>

* Update sdks/python/apache_beam/ml/inference/base.py

Co-authored-by: Brian Hulette <hulettbh@gmail.com>

* Update sdks/python/apache_beam/ml/inference/base.py

Co-authored-by: Brian Hulette <hulettbh@gmail.com>

* updated to correct variable names

* udpated variable names

* added typevar

* remove unbatch

* added note that users should expect changes

* Update python container version

* Add --dataflowServiceOptions=enable_prime to useUnifiedWorker conditions (#17213)

* Add self-descriptive message for expected errors.

Ideally we would not log these in the first place, but this is an easy hack.

* [BEAM-10529] nullable xlang coder (#16923)

* [BEAM-10529] add java and generic components of nullable xlang tests

* [BEAM-10529] fix test case

* [BEAM-10529] add coders and typehints to support nullable xlang coders

* [BEAM-10529] update external builder to support nullable coder

* [BEAM-10529] clean up coders.py

* [BEAM-10529] add coder translation test

* [BEAM-10529] add additional check to typecoder to not accidentally misidentify coders as nullable

* [BEAM-10529] add test to retrieve nullable coder from typehint

* [BEAM-10529] run spotless

* [BEAM-10529] add go nullable coder

* [BEAM-10529] cleanup extra println

* [BEAM-10529] improve comments, clean up python

* [BEAM-10529] remove changes to kafkaIO to simplify pr

* [BEAM-10529] add coders to go exec, add asf license text

* [BEAM-10529] clean up error handlign

* [BEAM-10529] update go fromyaml to handle nullable cases

* [BEAM-10529] add unit test, register nullable coder in dataflow.go

* [BEAM-10529] remove mistaken commit

* [BEAM-10529] add argument check to CoderTranslators

* [BEAM-10529] Address python comments & cleanup

* [BEAM-10529] address go comments

* [BEAM-10529] remove extra check that was added in error

* [BEAM-10529] fix typo

* [BEAM-10529] re-order check for nonetype to prevent attribute errors

* [BEAM-10529] change isinstance to ==

* Fix go fmt break in core/typex/special.go (#17266)

* [BEAM-8970] Add docs to run wordcount example on portable Spark Runner

* [BEAM-8970] Add period to end of sentence

* [BEAM-5436] Add doc page on Go cross compilation. (#17256)

* Pr-bot Don't count all reviews as approvals (#17269)

* Fix postcommits (#17263)

* [BEAM-14241] Address staticcheck warnings in boot.go (#17264)

* [BEAM-14157] GrpcWindmillServer: Use stream specific boolean to do client closed check (#17191)

* [BEAM-14157] GrpcWindmillServer: Use stream specific boolean to do client closed check

This is a follow up to #17162. An AbstractWindmillStream can have more than one grpc stream during its lifetime, new streams can be created after client closed for sending pending requests. So it is not correct to check `if(clientClosed)` in `send()`, this PR adds a new grpc stream level boolean to do the closed check in `send()`.

* [BEAM-14157] Add unit test testing CommitWorkStream retries around stream closing

* [BEAM-14157] review comments

* [BEAM-14157] review comments

* [BEAM-14157] review comments

* [BEAM-14157] fix test

* [BEAM-14157] fix test

Co-authored-by: Arun Pandian <pandiana@google.com>

* [BEAM-10582] Allow (and test) pyarrow 7 (#17229)

* [BEAM-13519] Solve race issues when the server responds with an error before the GrpcStateClient finishes being constructed. (#17240)

* [BEAM-13519] Solve race issues when the server responds with an error before the GrpcStateClient finishes.

The issue was that the InboundObserver can be invoked before outboundObserverFactory#outboundObserverFor returns meaning that
the server is waiting for a response for cache.remove but cache.computeIfAbsent is being invoked at the same time.

Another issue was that the outstandingRequests map could be updated with another request within GrpcStateClient during closeAndCleanup meaning that the CompleteableFuture would never be completed exceptionally.

Passes 1000 times locally now without getting stuck or failing.

* [BEAM-14256] update SpEL dependency to 5.3.18.RELEASE

* [BEAM-14256] remove .RELEASE

* [BEAM-13015] Disable retries for fnapi grpc channels which otherwise defaults on. (#17243)

* [BEAM-13015] Disable retries for grpc channels which otherwise default to true.

Since the channel is to the local runner process, retries are not expected to
help. This simplifies the grpc stream stack to not involve a RetryStream object.

* fixup comment

* Update sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactory.java

* Update sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/channel/ManagedChannelFactory.java

Co-authored-by: Lukasz Cwik <lcwik@google.com>

* [BEAM-9649] Add region option to Mongo Dataflow test.

* Fix dependency issue causing failures

* [BEAM-13952] Sickbay testAfterProcessingTimeContinuationTriggerUsingState (#17214)

* BEAM-14235 parquetio module does not parse PEP-440 compliant Pyarrow version (#17275)

* Update parquetio.py

* Update CHANGES.md

* Fix import order

* [BEAM-14250] Fix coder registration for types defined in __main__.

Until all runners are portable and we can get rid of all round trips
between Pipeline and proto representatons, register types in __main__
according to their string representations as pickling does not
preserve identity.

* Allow get_coder(None).

Co-authored-by: Andy Ye <andyye333@gmail.com>

* [Website] Contribution guide page indent bug fix (#17287)

* Fix markdown indent issue in Development Setup section

* update query

* [BEAM-10976] Document go sdk bundle finalization (#17048)

* [BEAM-13829] Expose status API from Go SDK Harness (#16957)

* Avoid pr-bot state desync (#17299)

* [BEAM-14259] Clean up staticcheck warnings in the exec package (#17285)

* Minor: Prefer registered schema in SQL docs (#17298)

* Prefer registered schema in SQL docs

* address review comments

* [Playground] add meta tags (#17207)

* playground add meta tags

* playground fix meta tags

* fixes golint and deprecated issues in recent Go SDK import (#17304)

* [BEAM-14262] Update plugins for Dockerized Jenkins.

I copied the list from the cwiki and removed all of the ones that failed to install. https://cwiki.apache.org/confluence/display/INFRA/ci-beam.apache.org

* Add ansicolor and ws-cleanup plugins.

Without them, the seed job prints warnings:

Warning: (CommonJobProperties.groovy, line 107) plugin 'ansicolor' needs to be installed
Warning: (CommonJobProperties.groovy, line 113) plugin 'ws-cleanup' needs to be installed

* [BEAM-14266] Replace deprecated ptypes package uses (#17302)

* [BEAM-11936] Fix rawtypes warnings in SnowflakeIO (#17257)

* [BEAM-10556] Fix rawtypes warnings in SnowflakeIO

* fixup! [BEAM-10556] Fix rawtypes warnings in SnowflakeIO

* Merge pull request #17262: [BEAM-14244] Use the supplied output timestamp for processing time timers rather than the input watermark

* removed unused typing

* added list typing

* linted

Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
Co-authored-by: Brian Hulette <hulettbh@gmail.com>
Co-authored-by: kileys <kileysok@google.com>
Co-authored-by: Yichi Zhang <zyichi@google.com>
Co-authored-by: Kyle Weaver <kcweaver@google.com>
Co-authored-by: johnjcasey <95318300+johnjcasey@users.noreply.github.com>
Co-authored-by: Jack McCluskey <34928439+jrmccluskey@users.noreply.github.com>
Co-authored-by: Benjamin Gonzalez <benjamin.gonzalez@wizeline.com>
Co-authored-by: Robert Burke <lostluck@users.noreply.github.com>
Co-authored-by: Danny McCormick <dannymccormick@google.com>
Co-authored-by: Arun Pandian <arunpandianp@gmail.com>
Co-authored-by: Arun Pandian <pandiana@google.com>
Co-authored-by: Brian Hulette <bhulette@google.com>
Co-authored-by: Lukasz Cwik <lcwik@google.com>
Co-authored-by: johnjcasey <johnjcasey@google.com>
Co-authored-by: scwhittle <scwhittle@users.noreply.github.com>
Co-authored-by: Arwin Tio <arwin.tio@adroll.com>
Co-authored-by: Robert Bradshaw <robertwb@gmail.com>
Co-authored-by: Andy Ye <andyye333@gmail.com>
Co-authored-by: Yi Hu <yathu@google.com>
Co-authored-by: Michael Li <bingyeli@google.com>
Co-authored-by: Ritesh Ghorse <riteshghorse@gmail.com>
Co-authored-by: Aydar Farrakhov <stranniknm@gmail.com>
Co-authored-by: Kamil Breguła <kamil.bregula@snowflake.com>
Co-authored-by: Steven Niemitz <steveniemitz@gmail.com>
@riteshghorse riteshghorse deleted the debug branch June 27, 2022 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants