Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-3221] Improve documentation around split request and response #17726

Merged
merged 3 commits into from
Jul 6, 2022

Conversation

lukecwik
Copy link
Member

Give an example and provide properties that must be held within a split and across multiple splits within the same bundle.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

Give an example and provide properties that must be held within a split and across multiple splits within the same bundle.
@github-actions github-actions bot added the model label May 20, 2022
@lukecwik
Copy link
Member Author

R: @robertwb @scwhittle

I documented the splitting protocol in the protos and the properties that must hold within a split and across splits due to the recent issue with Dataflow rejecting a valid split.

@codecov
Copy link

codecov bot commented May 20, 2022

Codecov Report

Merging #17726 (5484663) into master (64c61e9) will decrease coverage by 0.08%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17726      +/-   ##
==========================================
- Coverage   73.99%   73.91%   -0.09%     
==========================================
  Files         696      697       +1     
  Lines       91851    92064     +213     
==========================================
+ Hits        67964    68046      +82     
- Misses      22638    22769     +131     
  Partials     1249     1249              
Flag Coverage Δ
python 83.59% <ø> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ache_beam/runners/dataflow/dataflow_job_service.py 50.00% <0.00%> (-12.17%) ⬇️
...s/interactive/dataproc/dataproc_cluster_manager.py 77.41% <0.00%> (-6.80%) ⬇️
.../python/apache_beam/testing/test_stream_service.py 88.09% <0.00%> (-4.77%) ⬇️
sdks/python/apache_beam/dataframe/io.py 88.78% <0.00%> (-3.26%) ⬇️
...eam/runners/portability/fn_api_runner/fn_runner.py 87.51% <0.00%> (-2.50%) ⬇️
...nners/portability/fn_api_runner/worker_handlers.py 77.89% <0.00%> (-1.45%) ⬇️
...hon/apache_beam/runners/direct/test_stream_impl.py 93.28% <0.00%> (-0.75%) ⬇️
sdks/python/apache_beam/internal/dill_pickler.py 85.61% <0.00%> (-0.72%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.06% <0.00%> (-0.59%) ⬇️
sdks/python/apache_beam/runners/direct/executor.py 96.46% <0.00%> (-0.55%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64c61e9...5484663. Read the comment docs.

@lukecwik
Copy link
Member Author

Run Java PreCommit

5 similar comments
@lukecwik
Copy link
Member Author

Run Java PreCommit

@lukecwik
Copy link
Member Author

Run Java PreCommit

@lukecwik
Copy link
Member Author

Run Java PreCommit

@lukecwik
Copy link
Member Author

Run Java PreCommit

@lukecwik
Copy link
Member Author

Run Java PreCommit

@aaltay aaltay requested a review from robertwb June 2, 2022 19:06
// - last_primary_element < first_residual_element
// - primary roots and residual roots can only be specified if the
// last_primary_element + 1 < first_residual_element
// (typically there is one primary and residual root per element in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe say something about primary root and residual roots being a disjoint but full coverage of the work represented by the elements between last_primary_element and first_residual_element?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// - The work in primary and residual doesn't overlap, and combined, adds up
// to the work in the current bundle if the split hadn't happened.
// - The current bundle, if it keeps executing, will have done none of the
// work under residual_roots.
// work under residual_roots and none of the elements in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complete sentence

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// the work under primary_roots.
// the work under primary_roots and all elements up to and including the
// channel splits last_primary_element.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be valuable to have a conceptual summary here, e.g. "This allows the SDK to relinquish ownership of and commit to not process some of the elements that it may have been sent (the residual) while retaining ownership and commitment to finish the other portion (the primary)."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// range (last_primary_element, first_residual_element))
//
// Note that subsequent splits of the same bundle must ensure that:
// - the last_primary_element does not increase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a requirement. The underlying requirement is that primary_n + residual_n = primary_{n-1}.

In practice part or all of last_primary_element + 1 is often part of the previous residual.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

// part of the primary, identified by its absolute index in the (ordered)
// channel.
// (Required) The last element of the input channel that should be entirely
// considered part of the primary, identified by its absolute index in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zero-based index? (similarly below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@lukecwik
Copy link
Member Author

Run Java PreCommit passed, the GH UI has yet to update.

@lukecwik
Copy link
Member Author

@robertwb Any additional comments or good to merge?

@lukecwik
Copy link
Member Author

lukecwik commented Jul 6, 2022

Run Java PreCommit

@lukecwik lukecwik merged commit f893f6e into apache:master Jul 6, 2022
sorokin-andrey pushed a commit to akvelon/beam that referenced this pull request Jul 7, 2022
…pache#17726)

* [BEAM-3221] Improve documentation around split request and response

Give an example and provide properties that must be held within a split and across multiple splits within the same bundle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants