
Remove kwargs and add explicit runinference_args #21806

Merged · 14 commits merged into apache:master on Jun 15, 2022

Conversation

@yeandy (Contributor) commented Jun 10, 2022

Removes **kwargs from RunInference and adds an explicit extra_runinference_args argument. Only frameworks, like PyTorch, that need the extra parameters will use it; other frameworks won't need to pass these extra args at all.
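A minimal sketch of the before/after shape of the handler interface (simplified from the real Beam ModelHandler; the argument started as extra_runinference_args and was renamed inference_args later in review, so the final name is used here):

```python
from typing import Any, Dict, Iterable, Optional, Sequence


class ModelHandler:
    """Sketch of the changed interface (illustrative, not the full Beam API)."""

    # Before this PR, extra framework parameters arrived as opaque **kwargs:
    #   def run_inference(self, batch, model, **kwargs): ...

    # After, they travel in one explicit, optional dictionary:
    def run_inference(
        self,
        batch: Sequence[Any],
        model: Any,
        inference_args: Optional[Dict[str, Any]] = None,
    ) -> Iterable[Any]:
        raise NotImplementedError
```

Frameworks that need per-call parameters (e.g. PyTorch forward-pass keyword arguments) read them from the dictionary; others can ignore or reject it.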


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md


@yeandy (Contributor, Author) commented Jun 10, 2022

@asf-ci

asf-ci commented Jun 10, 2022

Can one of the admins verify this patch?

3 similar comments

@TheNeuralBit (Member) left a comment

Looks good overall, just some minor points

sdks/python/apache_beam/ml/inference/base.py (outdated thread, resolved)
examples, self._model, extra_kwargs)
else:
result_generator = self._model_handler.run_inference(
examples, self._model)
Member

Can't we just always pass extra_kwargs through?

Contributor Author

Yeah, I originally had it this way because I didn't replace **kwargs with extra_kwargs in base or sklearn, only in pytorch. Changed.

@yeandy commented Jun 13, 2022

PTAL @TheNeuralBit

@codecov bot commented Jun 13, 2022

Codecov Report

Merging #21806 (fe84987) into master (8c8c431) will decrease coverage by 0.05%.
The diff coverage is 72.22%.

@@            Coverage Diff             @@
##           master   #21806      +/-   ##
==========================================
- Coverage   74.06%   74.01%   -0.06%     
==========================================
  Files         698      699       +1     
  Lines       92600    92675      +75     
==========================================
+ Hits        68585    68592       +7     
- Misses      22760    22828      +68     
  Partials     1255     1255              
Flag Coverage Δ
python 83.65% <72.22%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...thon/apache_beam/ml/inference/pytorch_inference.py 0.00% <0.00%> (ø)
sdks/python/apache_beam/ml/inference/base.py 95.29% <100.00%> (-0.54%) ⬇️
...thon/apache_beam/ml/inference/sklearn_inference.py 95.23% <100.00%> (+0.69%) ⬆️
...eam/runners/portability/fn_api_runner/execution.py 92.44% <0.00%> (-0.65%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.17% <0.00%> (-0.38%) ⬇️
...am/examples/inference/pytorch_language_modeling.py 0.00% <0.00%> (ø)
sdks/python/apache_beam/runners/common.py 88.92% <0.00%> (+0.24%) ⬆️
sdks/python/apache_beam/transforms/combiners.py 93.43% <0.00%> (+0.38%) ⬆️
sdks/python/apache_beam/utils/interactive_utils.py 97.56% <0.00%> (+2.43%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8c8c431...fe84987.

@TheNeuralBit

Run PythonDocs PreCommit

@TheNeuralBit (Member) commented Jun 14, 2022

It seems the actual error in PythonDocs is hidden by all the dataframe cruft, sorry about that. I filed #21845 to fix that.

15:21:58 /home/jenkins/jenkins-slave/workspace/beam_PreCommit_PythonDocs_Phrase/src/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py38-docs/py38-docs/lib/python3.8/site-packages/apache_beam/ml/inference/base.py:docstring of apache_beam.ml.inference.base.RunInference:5: WARNING: Unexpected indentation.

  keys, unkeyed_batch = zip(*batch)
  return zip(
-     keys, self._unkeyed.run_inference(unkeyed_batch, model, **kwargs))
+     keys, self._unkeyed.run_inference(unkeyed_batch, model, extra_kwargs))
Contributor

These arguments should be passed through if they exist and not otherwise.

Member

why?

Contributor

I'm worried about growing the interface in a lot of places to handle arguments that are only applicable to a single framework.

If an argument shouldn't be in a framework, ideally it should fail and give an error.

Member

But the fact is the interface has grown, extra_kwargs is an argument on ModelHandler.run_inference.

Member

I don't mean to slow things down here, I'll defer to what you all want to do. But I just want to play devil's advocate a bit: If it's a part of the interface it should be a part of the interface. Making it into a dynamic, optional parameter just makes the carve out worse (IMO).

Contributor Author

Done.

Contributor Author

Sorry, I didn't refresh this page before posting so I didn't see the new comments. I'm leaning towards having extra_kwargs b/c it's part of the interface, even if it isn't used in sklearn.

Contributor Author

Caught up with @ryanthompson591. We're going to stick with passing (renamed) inference_args for all frameworks, but for sklearn, raise an exception if a non-empty value is passed.

self,
batch: Sequence[numpy.ndarray],
model: BaseEstimator,
extra_kwargs: Optional[Dict[str,
Contributor

No. Let's not put these extra args into sklearn. The model doesn't want them, so they don't need to be there.

These would be extra unused parameters; they don't need to be here, and they don't need to be in tensorflow implementations either.

Don't put them here. If anything, the only thing this PR should do is remove **kwargs from this interface.

If someone puts extra args into this interface that it isn't expecting, it should fail instead of working silently.

Member

If someone puts extra args into this interface that it isn't expecting it should fail instead of work silently.

A ModelHandler that doesn't support extra_kwargs could raise an error. We don't need to rely on Python argument matching to raise the error.

Contributor Author

Good point. Removed.

model=model,
prediction_params=prediction_params)
for actual, expected in zip(predictions, KWARGS_TORCH_PREDICTIONS):
batch=KEYED_TORCH_EXAMPLES, model=model, extra_kwargs=extra_kwargs)
Contributor

I think we should keep these args as anonymous

Contributor Author

Can you clarify what you mean by anonymous args?

Contributor

ok I've been thinking about this:

I was playing around with ways to make arguments specific to run_inference, and I think there are only three: what you have done, anonymous args, or an if statement:

if inference_args:
model_handler.run_inference(model, batch, inference_args)
else:
model_handler.run_inference(model, batch)

I'm not sure what I prefer now that I'm looking at it.

The if statement has the advantage of allowing clients that don't expect this argument to fail or pass without modifications.
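That advantage can be seen with a handler written before the new parameter existed (a toy sketch; LegacyHandler and dispatch are illustrative names, not Beam APIs):

```python
from typing import Any, Sequence


class LegacyHandler:
    # Written before the change: run_inference has no inference_args parameter.
    def run_inference(self, batch: Sequence[int], model: Any):
        return [x * 2 for x in batch]


def dispatch(handler, batch, model, inference_args=None):
    # Forward the extra args only when present, so legacy handlers keep
    # working, and fail loudly if args are supplied to one anyway.
    if inference_args:
        return handler.run_inference(batch, model, inference_args)
    return handler.run_inference(batch, model)


print(dispatch(LegacyHandler(), [1, 2], model=None))  # -> [2, 4]
```

Supplying inference_args to LegacyHandler raises a TypeError from Python's own argument matching, which is exactly the "fail instead of work silently" behavior discussed above.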

sdks/python/apache_beam/ml/inference/base.py (outdated thread, resolved)
@TheNeuralBit

Run Python PreCommit

@tvalentyn tvalentyn merged commit 2de4c2c into apache:master Jun 15, 2022
model: BaseEstimator,
inference_args: Optional[Dict[str, Any]] = None
) -> Iterable[PredictionResult]:
_validate_inference_args(inference_args)
Contributor

(not urgent). This validation will happen at pipeline execution, right? Can it be done at construction time to reduce the feedback loop?
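One way to move the check earlier, sketched with a hypothetical handler (the method name and wiring are illustrative; the idea is that the transform would call it while the pipeline graph is built, not on workers):

```python
from typing import Any, Dict, Optional


class SklearnStyleHandler:
    """Hypothetical handler that rejects inference_args up front."""

    def validate_inference_args(
            self, inference_args: Optional[Dict[str, Any]]) -> None:
        # Invoked at pipeline construction, so a bad argument surfaces
        # immediately instead of at execution time on a worker.
        if inference_args:
            raise ValueError('this framework does not accept inference_args.')
```

This shortens the feedback loop: the user sees the error when building the pipeline rather than minutes later when a remote worker runs the first bundle.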

Contributor Author

Yes. I've filed this issue #21894 as something to do later.

bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
prodriguezdefino pushed a commit to prodriguezdefino/beam-pabs that referenced this pull request Jun 21, 2022
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
5 participants