[BEAM-3221] Improve documentation around split request and response #17726
Give an example and provide the properties that must hold within a split and across multiple splits within the same bundle.
Following the recent issue with Dataflow rejecting a valid split, I documented the splitting protocol in the protos, including the properties that must hold within a split and across splits.
Codecov Report
@@ Coverage Diff @@
## master #17726 +/- ##
==========================================
- Coverage 73.99% 73.91% -0.09%
==========================================
Files 696 697 +1
Lines 91851 92064 +213
==========================================
+ Hits 67964 68046 +82
- Misses 22638 22769 +131
Partials 1249 1249
Run Java PreCommit
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto
// - last_primary_element < first_residual_element
// - primary roots and residual roots can only be specified if the
// last_primary_element + 1 < first_residual_element
// (typically there is one primary and residual root per element in the
Maybe say something about the primary roots and residual roots being a disjoint but full coverage of the work represented by the elements between last_primary_element and first_residual_element?
done
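For readers following this thread, the within-split properties quoted above can be sketched as a small check. This is only an illustration: `ChannelSplitSketch` and `check_channel_split` are hypothetical names that mirror the fields mentioned in the proto comments, they are not part of the Beam SDK, and the root counts are an assumed stand-in for the actual primary and residual roots.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChannelSplitSketch:
    """Hypothetical stand-in for the fields discussed in the proto comments."""
    last_primary_element: int
    first_residual_element: int
    num_primary_roots: int = 0   # assumption: roots are only counted here
    num_residual_roots: int = 0  # assumption: roots are only counted here


def check_channel_split(split: ChannelSplitSketch) -> List[str]:
    """Returns the list of violated properties (empty if the split looks valid)."""
    errors = []
    if not split.last_primary_element < split.first_residual_element:
        errors.append("last_primary_element must be < first_residual_element")
    has_roots = split.num_primary_roots > 0 or split.num_residual_roots > 0
    has_gap = split.last_primary_element + 1 < split.first_residual_element
    if has_roots and not has_gap:
        errors.append(
            "roots require last_primary_element + 1 < first_residual_element")
    return errors
```

With this sketch, a split at (3, 4) without roots passes, while the same split with roots is flagged because no element lies strictly between the two indices.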
// - The work in primary and residual doesn't overlap, and combined, adds up
// to the work in the current bundle if the split hadn't happened.
// - The current bundle, if it keeps executing, will have done none of the
// work under residual_roots.
// work under residual_roots and none of the elements in the
complete sentence
done
// the work under primary_roots.
// the work under primary_roots and all elements up to and including the
// channel splits last_primary_element.
//
It might be valuable to have a conceptual summary here, e.g. "This allows the SDK to relinquish ownership of and commit to not process some of the elements that it may have been sent (the residual) while retaining ownership and commitment to finish the other portion (the primary)."
done
// range (last_primary_element, first_residual_element))
//
// Note that subsequent splits of the same bundle must ensure that:
// - the last_primary_element does not increase
This is not a requirement. The underlying requirement is that primary_n + residual_n = primary_{n-1}.
In practice part or all of last_primary_element + 1 is often part of the previous residual.
Removed.
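The underlying requirement mentioned in this thread, primary_n + residual_n = primary_{n-1}, can be illustrated with a minimal sketch. It assumes that work can be modelled as a set of opaque, indivisible units (plain integers here), which ignores elements that are themselves split into primary and residual roots; `split_is_consistent` is a hypothetical helper, not a Beam API.

```python
def split_is_consistent(prev_primary: set, primary: set, residual: set) -> bool:
    """Checks primary_n + residual_n == primary_{n-1}, with no overlap.

    Assumption: each work unit is indivisible and identified by a hashable id.
    """
    return primary.isdisjoint(residual) and (primary | residual) == prev_primary


# The whole bundle counts as the "previous primary" before the first split.
bundle = set(range(10))
first_primary, first_residual = set(range(6)), set(range(6, 10))
second_primary, second_residual = {0, 1, 2}, {3, 4, 5}

first_ok = split_is_consistent(bundle, first_primary, first_residual)
second_ok = split_is_consistent(first_primary, second_primary, second_residual)
```

Each split is checked only against the primary left by the previous split, so work already handed back as residual is never counted again.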
// part of the primary, identified by its absolute index in the (ordered)
// channel.
// (Required) The last element of the input channel that should be entirely
// considered part of the primary, identified by its absolute index in the
zero-based index? (similarly below)
done
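As an illustration of the zero-based indexing discussed here, the sketch below partitions the element indices of a channel into the part that stays in the primary, the open range (last_primary_element, first_residual_element) that primary and residual roots would describe, and the part handed back as residual. `partition_indices` and the known `num_elements` are assumptions made for the example; a real channel need not know its length up front.

```python
from typing import List, Tuple


def partition_indices(
    num_elements: int,
    last_primary_element: int,
    first_residual_element: int,
) -> Tuple[List[int], List[int], List[int]]:
    """Splits zero-based element indices into (primary, in_between, residual)."""
    # Elements 0..last_primary_element (inclusive) stay with the primary.
    primary = list(range(0, last_primary_element + 1))
    # Elements strictly between the two indices are described by roots.
    in_between = list(range(last_primary_element + 1, first_residual_element))
    # Elements from first_residual_element onward belong to the residual.
    residual = list(range(first_residual_element, num_elements))
    return primary, in_between, residual


primary, in_between, residual = partition_indices(10, 3, 6)
# primary == [0, 1, 2, 3], in_between == [4, 5], residual == [6, 7, 8, 9]
```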
Run Java PreCommit passed; the GH UI has yet to update.
@robertwb Any additional comments or good to merge?
…pache#17726) * [BEAM-3221] Improve documentation around split request and response Give an example and provide properties that must be held within a split and across multiple splits within the same bundle.