
Document and test overriding batch type inference #21844

Merged: 3 commits into apache:master on Jun 14, 2022

Conversation

TheNeuralBit
Member

Fixes #21652

Some Batched DoFns (e.g. RunInference) will need to declare their input/output batch types dynamically based on some configuration. Technically a DoFn implementation should already be able to do this, but it's untested and undocumented. This PR simply documents the functions that need to be overridden (get_input_batch_type, get_output_batch_type), and adds tests verifying it's possible.

We also add new _normalized versions of these functions, which convert the returned typehints to Beam typehints. This allows users to return native typehints from their implementations if they prefer.
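
For illustration, here is a minimal sketch of what overriding these hooks might look like, assuming a hypothetical Batched DoFn that takes its batch typehint from constructor configuration. Only the get_input_batch_type/get_output_batch_type names and signatures come from this PR; the class name, constructor, and process_batch body are made up.

import apache_beam as beam


class ConfiguredBatchDoFn(beam.DoFn):
  """Hypothetical Batched DoFn whose batch types depend on configuration."""

  def __init__(self, batch_typehint):
    self._batch_typehint = batch_typehint

  def process_batch(self, batch, *args, **kwargs):
    # Pass batches through unchanged, so input and output batch types match.
    yield batch

  def get_input_batch_type(self, input_element_type):
    # May return either a Beam typehint or a native typehint; the
    # _normalized wrappers convert native typehints to Beam typehints.
    return self._batch_typehint

  def get_output_batch_type(self, input_element_type):
    return self._batch_typehint

A pipeline could then, for example, construct ConfiguredBatchDoFn(np.ndarray) to declare NumPy-array batches without relying on process_batch annotations.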

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@TheNeuralBit
Member Author

R: @yeandy

@codecov

codecov bot commented Jun 14, 2022

Codecov Report

Merging #21844 (0d70a37) into master (87a7dcc) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #21844      +/-   ##
==========================================
- Coverage   74.15%   74.13%   -0.03%     
==========================================
  Files         698      698              
  Lines       92411    92433      +22     
==========================================
- Hits        68530    68524       -6     
- Misses      22630    22658      +28     
  Partials     1251     1251              
Flag Coverage Δ
python 83.73% <100.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
sdks/python/apache_beam/transforms/core.py 92.42% <100.00%> (+0.03%) ⬆️
.../python/apache_beam/testing/test_stream_service.py 88.09% <0.00%> (-4.77%) ⬇️
sdks/python/apache_beam/utils/interactive_utils.py 95.12% <0.00%> (-2.44%) ⬇️
...n/apache_beam/ml/gcp/recommendations_ai_test_it.py 73.46% <0.00%> (-2.05%) ⬇️
sdks/python/apache_beam/io/source_test_utils.py 88.01% <0.00%> (-1.39%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.06% <0.00%> (-1.33%) ⬇️
...eam/runners/portability/fn_api_runner/execution.py 92.44% <0.00%> (-0.65%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 88.94% <0.00%> (-0.16%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.54% <0.00%> (-0.13%) ⬇️
sdks/python/apache_beam/io/hadoopfilesystem.py 97.28% <0.00%> (ø)
... and 11 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87a7dcc...0d70a37. Read the comment docs.

@yeandy (Contributor) left a comment

LGTM

sdks/python/apache_beam/transforms/core.py (Outdated)
DoFn is being applied to.

Returns:
``None`` if this DoFn cannot accept batches, a Beam typehint or a native
Contributor

Suggested change
``None`` if this DoFn cannot accept batches, a Beam typehint or a native
``None`` if this DoFn cannot accept batches, a Beam typehint, or a native

Member Author

Same here

DoFn is being applied to.

Returns:
``None`` if this DoFn will never yield batches, a Beam typehint or
Contributor

Suggested change
``None`` if this DoFn will never yield batches, a Beam typehint or
``None`` if this DoFn will never yield batches, a Beam typehint, or

Member Author

I want "Beam typehint or native typehint" as a unit to be the "else" clause. I updated the language to make that explicit instead of applying this. Thanks for pointing it out

Comment on lines +770 to +776
def _get_input_batch_type_normalized(self, input_element_type):
return typehints.native_type_compatibility.convert_to_beam_type(
self.get_input_batch_type(input_element_type))

def _get_output_batch_type_normalized(self, input_element_type):
return typehints.native_type_compatibility.convert_to_beam_type(
self.get_output_batch_type(input_element_type))
Contributor

Why are these private functions? Is it because normalizing to Beam types isn't going to be a common op?

Member Author

These are convenience functions I provided for our internal use; users shouldn't call them. Users shouldn't call the others (get_{input,output}_batch_type) either, but those are part of the public API since users can override them if they need to.

Come to think of it, I should probably mark some other convenience functions we added as protected. I'll follow up with a PR for that.
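
For reference, the normalization in those wrappers goes through convert_to_beam_type (visible in the diff above), which maps a native Python typehint to the corresponding Beam typehint. A small illustrative sketch of that conversion, assuming typehints.List[int] is the Beam-side equivalent of typing.List[int]:

from typing import List

from apache_beam import typehints
from apache_beam.typehints import native_type_compatibility

# A native typehint goes in, and the corresponding Beam typehint is expected out.
beam_type = native_type_compatibility.convert_to_beam_type(List[int])
assert beam_type == typehints.List[int]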

@TheNeuralBit
Member Author

Thanks @yeandy!

@TheNeuralBit TheNeuralBit merged commit 5f04b97 into apache:master Jun 14, 2022
bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
* Document and test overriding batch type inference

* address review comments

* Update sdks/python/apache_beam/transforms/core.py

Co-authored-by: Andy Ye <andyye333@gmail.com>

Co-authored-by: Andy Ye <andyye333@gmail.com>

Successfully merging this pull request may close these issues.

Consider providing a dynamic API for declaring batch input type