[BEAM-11104] Add code snippet for Go SDK Self-Checkpointing #17956

Merged 4 commits on Jun 10, 2022

Conversation

Contributor

@jrmccluskey jrmccluskey commented Jun 3, 2022

Adds small code snippet example to the Beam Programming Guide that demonstrates self-checkpointing behavior in Beam Go.

Rendering



asf-ci commented Jun 3, 2022

Can one of the admins verify this patch?

@jrmccluskey
Contributor Author

R: @riteshghorse @damccorm

@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}

 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {
Contributor

Can we put this in the snippets folder (example below in Watermark estimation section)? I know we haven't been clean on that before, but it:

(a) makes sure that the code actually compiles
(b) makes it easier to reuse (e.g. I know Dataflow has docs that use snippets from Beam)

Contributor Author

This is a completely fictional IO since we don't actually have a robust native streaming IO, so there's nothing to compile. It's just modeled after the Python and Java versions.

Contributor

@damccorm damccorm Jun 3, 2022

Yeah - I don't care if we make up empty functions for that, we already do that for a number of the existing snippets. The process continuation stuff should compile though, and that's the important bit anyways.

@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}

 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {
Contributor

Could you return an err parameter as well (it can just return nil)? Something I realized w/ Bundle Finalization is that it's much more helpful if we provide the parameters that surround the one we are demonstrating, because it lets users see the ordering we require.

Side note unrelated to this PR: We probably need better ordering error messages; they are pretty confusing right now.

Contributor

I'm also curious which process continuation we should return alongside a non-nil err. Is it nil? It might be worth including that as an option if, for example, records, err := fn.ExternalService.readNextRecords(position) returns a non-nil, non-throttling error response.

Contributor Author

IIRC if an SDF signature has a ProcessContinuation return we always expect either a Resume() or Stop() continuation and never a nil.

Contributor Author

Adding an error parameter is reasonable, so that's added. It did lead to some extra error-checking overhead, since we don't have the try-catch mechanism the other SDKs leverage, but that's not a huge problem.

		return sdf.ResumeProcessingIn(60 * time.Second)
	}
	if len(records) == 0 {
		return sdf.ResumeProcessingIn(10 * time.Second)
Contributor

Maybe add a comment along the lines of // Wait for data to be available? Might be nice to have a similar comment for the throttling case and the finish execution case as well.

Contributor Author

That's a fair note. Adding clarifying comments is always good for a documentation snippet.

@github-actions
Contributor

github-actions bot commented Jun 3, 2022

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @riteshghorse for label go.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Contributor

@damccorm damccorm left a comment

LGTM - thanks!

@jrmccluskey
Contributor Author

R: @lostluck

@github-actions
Contributor

github-actions bot commented Jun 3, 2022

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@lostluck lostluck merged commit 408664b into apache:master Jun 10, 2022
@jrmccluskey jrmccluskey deleted the patch-1 branch June 15, 2022 18:33
4 participants