
Document and test overriding batch type inference #21844

Merged: 3 commits into apache:master on Jun 14, 2022

Conversation

TheNeuralBit
Member

Fixes #21652

Some Batched DoFns (e.g. RunInference) will need to declare their input/output batch types dynamically based on some configuration. Technically a DoFn implementation should already be able to do this, but it's untested and undocumented. This PR simply documents the functions that need to be overridden (get_input_batch_type, get_output_batch_type), and adds tests verifying it's possible.

We also add new _normalized versions of these functions, which convert the returned typehints to Beam typehints. This allows users to return native typehints from their implementations if they prefer.
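
For illustration, here is a minimal sketch of what overriding these hooks might look like, assuming a hypothetical Batched DoFn that takes its batch typehint from constructor configuration. Only the get_input_batch_type/get_output_batch_type names and signatures come from this PR; the class name, constructor, and process_batch body are made up.

import apache_beam as beam


class ConfiguredBatchDoFn(beam.DoFn):
  """Hypothetical Batched DoFn whose batch types depend on configuration."""

  def __init__(self, batch_typehint):
    self._batch_typehint = batch_typehint

  def process_batch(self, batch, *args, **kwargs):
    # Pass batches through unchanged, so input and output batch types match.
    yield batch

  def get_input_batch_type(self, input_element_type):
    # May return either a Beam typehint or a native typehint; the
    # _normalized wrappers convert native typehints to Beam typehints.
    return self._batch_typehint

  def get_output_batch_type(self, input_element_type):
    return self._batch_typehint

A pipeline could then, for example, construct ConfiguredBatchDoFn(np.ndarray) to declare NumPy-array batches without relying on process_batch annotations.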

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@TheNeuralBit
Member Author

R: @yeandy

@codecov

codecov bot commented Jun 14, 2022

Codecov Report

Merging #21844 (0d70a37) into master (87a7dcc) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #21844      +/-   ##
==========================================
- Coverage   74.15%   74.13%   -0.03%     
==========================================
  Files         698      698              
  Lines       92411    92433      +22     
==========================================
- Hits        68530    68524       -6     
- Misses      22630    22658      +28     
  Partials     1251     1251              
Flag Coverage Δ
python 83.73% <100.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
sdks/python/apache_beam/transforms/core.py 92.42% <100.00%> (+0.03%) ⬆️
.../python/apache_beam/testing/test_stream_service.py 88.09% <0.00%> (-4.77%) ⬇️
sdks/python/apache_beam/utils/interactive_utils.py 95.12% <0.00%> (-2.44%) ⬇️
...n/apache_beam/ml/gcp/recommendations_ai_test_it.py 73.46% <0.00%> (-2.05%) ⬇️
sdks/python/apache_beam/io/source_test_utils.py 88.01% <0.00%> (-1.39%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.06% <0.00%> (-1.33%) ⬇️
...eam/runners/portability/fn_api_runner/execution.py 92.44% <0.00%> (-0.65%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 88.94% <0.00%> (-0.16%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.54% <0.00%> (-0.13%) ⬇️
sdks/python/apache_beam/io/hadoopfilesystem.py 97.28% <0.00%> (ø)
... and 11 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87a7dcc...0d70a37. Read the comment docs.

@yeandy (Contributor) left a comment

LGTM

sdks/python/apache_beam/transforms/core.py (Outdated)
DoFn is being applied to.

Returns:
``None`` if this DoFn cannot accept batches, a Beam typehint or a native
Contributor

Suggested change
``None`` if this DoFn cannot accept batches, a Beam typehint or a native
``None`` if this DoFn cannot accept batches, a Beam typehint, or a native

Member Author

Same here

DoFn is being applied to.

Returns:
``None`` if this DoFn will never yield batches, a Beam typehint or
Contributor

Suggested change
``None`` if this DoFn will never yield batches, a Beam typehint or
``None`` if this DoFn will never yield batches, a Beam typehint, or

Member Author

I want "Beam typehint or native typehint" as a unit to be the "else" clause. I updated the language to make that explicit instead of applying this. Thanks for pointing it out

Comment on lines +770 to +776
def _get_input_batch_type_normalized(self, input_element_type):
return typehints.native_type_compatibility.convert_to_beam_type(
self.get_input_batch_type(input_element_type))

def _get_output_batch_type_normalized(self, input_element_type):
return typehints.native_type_compatibility.convert_to_beam_type(
self.get_output_batch_type(input_element_type))
Contributor

Why are these private functions? Is it because normalizing to Beam types isn't going to be a common op?

Member Author

These are convenience functions I provided for our internal use; users shouldn't call them. Users shouldn't call the others (get_{input,output}_batch_type) either, but those are part of the public API since users can override them if they need to.

Come to think of it, I should probably mark some other convenience functions we added as protected. I'll follow up with a PR for that.
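
For reference, the normalization in those wrappers goes through convert_to_beam_type (visible in the diff above), which maps a native Python typehint to the corresponding Beam typehint. A small illustrative sketch of that conversion, assuming typehints.List[int] is the Beam-side equivalent of typing.List[int]:

from typing import List

from apache_beam import typehints
from apache_beam.typehints import native_type_compatibility

# A native typehint goes in, and the corresponding Beam typehint is expected out.
beam_type = native_type_compatibility.convert_to_beam_type(List[int])
assert beam_type == typehints.List[int]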

@TheNeuralBit
Member Author

Thanks @yeandy!

@TheNeuralBit TheNeuralBit merged commit 5f04b97 into apache:master Jun 14, 2022
bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
* Document and test overriding batch type inference

* address review comments

* Update sdks/python/apache_beam/transforms/core.py

Co-authored-by: Andy Ye <andyye333@gmail.com>

Co-authored-by: Andy Ye <andyye333@gmail.com>

Successfully merging this pull request may close these issues.

Consider providing a dynamic API for declaring batch input type