
[BEAM-10427] Benchmark runtime typechecking for the Python SDK #12242

Conversation

saavan-google (Contributor) commented Jul 13, 2020

This PR adds a load test for evaluating the performance of the runtime typechecking system in the Python SDK. It works by comparing the performance of a pipeline with runtime typechecking enabled against the same pipeline with it disabled. The load test also lets the user specify whether the pipeline should run with simple typehints (e.g. str) or complex, nested typehints (e.g. Tuple[str, int, Iterable[bool]]).
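For context, a minimal stdlib sketch (not Beam's actual typechecking code; the check functions below are hypothetical) of why a nested hint like Tuple[str, int, Iterable[bool]] is more expensive to verify at runtime than a simple str hint:

```python
import time

def check_simple(value):
    # Simple hint (str): a single isinstance check.
    return isinstance(value, str)

def check_nested(value):
    # Nested hint (Tuple[str, int, Iterable[bool]]): check the container,
    # then every component, including each element of the Iterable.
    if not (isinstance(value, tuple) and len(value) == 3):
        return False
    s, n, flags = value
    return (isinstance(s, str)
            and isinstance(n, int)
            and all(isinstance(b, bool) for b in flags))

element = ("key", 42, [True, False, True])

start = time.perf_counter()
for _ in range(100_000):
    check_simple(element[0])
simple_t = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100_000):
    check_nested(element)
nested_t = time.perf_counter() - start

print(f"simple: {simple_t:.4f}s  nested: {nested_t:.4f}s")
```

The nested check does strictly more work per element, which is the performance difference the load test's simple/nested switch is designed to surface.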

This PR is ready for review.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

Post-Commit Tests Status (on master branch): badge table for the Go, Java, Python, and XLang SDKs across the Dataflow, Flink, Samza, Spark, and Twister2 runners (build-status badges omitted).

Pre-Commit Tests Status (on master branch): badge table for the Java, Python, Go, and Website jobs, non-portable and portable (build-status badges omitted).

See .test-infra/jenkins/README for the trigger phrase, status, and link of every Jenkins job.

saavan-google force-pushed the BEAM-10427-benchmark-runtime-typechecking branch from 1cc5ca7 to 91c3afd on July 29, 2020 at 18:07
saavan-google (Contributor, Author) commented:

R: @udim
R: @robertwb

udim (Member) commented Aug 1, 2020:

Please update the PR description to state whether it's ready for review.

saavan-google (Contributor, Author) commented:

R: @udim
R: @robertwb

PTAL - this is ready for review

robertwb (Contributor) left a review comment:

Sorry I didn't have a chance to look at this sooner.

Since this is a purely local change, rather than adding all this infrastructure I would suggest creating a variant of (or even adding an option to)

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/map_fn_microbenchmark.py

that enables runtime type checking. This will give much cleaner numbers and a much faster turnaround time. (It would probably also be valuable to run this before and after your changes with typechecking turned off, to make sure you're not adding overhead in that case too.)
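In that spirit, a hedged stdlib sketch of what such a micro-benchmark could look like (the type_checked wrapper below is a hypothetical stand-in for the SDK's runtime type-check decorator, not its actual API):

```python
import timeit

def fn(x):
    return x * 2

def type_checked(f, in_type, out_type):
    # Hypothetical stand-in for a runtime type-check wrapper: validate the
    # input and output of every call against the declared types.
    def wrapper(x):
        assert isinstance(x, in_type), f"bad input: {x!r}"
        result = f(x)
        assert isinstance(result, out_type), f"bad output: {result!r}"
        return result
    return wrapper

checked_fn = type_checked(fn, int, int)

n = 200_000
plain = timeit.timeit(lambda: fn(7), number=n)
checked = timeit.timeit(lambda: checked_fn(7), number=n)
print(f"plain: {plain:.4f}s  checked: {checked:.4f}s  "
      f"overhead: {checked / plain:.2f}x")
```

Because everything runs in-process with no runner or I/O in the loop, run-to-run noise is far lower than in a full pipeline, which is the point of the micro-benchmark suggestion.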

udim (Member) commented Aug 6, 2020:

> Since this is a purely local change, I would suggest rather than adding all this infrastructure simply creating a variant of (or even adding an option to) https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/map_fn_microbenchmark.py that enables runtime type checking. This will give much cleaner numbers and a much faster turnaround time.

My goal in suggesting this framework was to gather historical numbers, and to test for performance regressions in future PRs.

robertwb (Contributor) left a review comment:

What do the numbers look like now? I'm still concerned a full integration test may be rather noisy to capture this data well and a micro-benchmark would be better suited for this.


```python
class RunTimeTypeCheckOffTest(BaseRunTimeTypeCheckTest):
  def __init__(self):
    self.runtime_type_check = False
```
robertwb (Contributor) commented on this diff:

Rather than passing this as an attribute of self, can't we pass it as an additional flag to the test? That would let us get rid of much, if not all, of this duplication.
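A sketch of the flag-based approach, assuming a hypothetical --runtime_type_check flag parsed with parse_known_args before the test object is constructed (the flag name and class here are illustrative, not the test's actual CLI):

```python
import argparse

# Parse our one extra flag first; everything argparse doesn't recognize is
# returned untouched and can be handed on to the pipeline/load-test options.
parser = argparse.ArgumentParser()
parser.add_argument('--runtime_type_check', action='store_true')
known, pipeline_args = parser.parse_known_args(
    ['--runtime_type_check', '--other_load_test_arg=1'])

class RunTimeTypeCheckTest:
    def __init__(self, runtime_type_check):
        # One class parameterized by a boolean, instead of an On/Off
        # subclass pair that differ only in this attribute.
        self.runtime_type_check = runtime_type_check

test = RunTimeTypeCheckTest(known.runtime_type_check)
print(test.runtime_type_check)
```

With one parameterized class, the On/Off duplication collapses into a single flag value.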

saavan-google (Contributor, Author) replied:

We looked into this, but the CLI flags are only parsed after the LoadTest has been initialized, and by that point it's too late to modify the pipeline options, because that happens in the LoadTest constructor. Or can I still modify the options?

It happens here

```groovy
]
],
[
title : 'Runtime Type Checking Python Load Test: On | Nested Type Hints',
```
robertwb (Contributor) commented on this diff:

Load-test capacity is a limited resource; could you test both in the same pipeline? That should be sufficient for the purpose of noticing regressions.

saavan-google (Contributor, Author) replied:

Yeah sure

I think the original goal of having nested type hints and simple type hints as separate tests was to see whether there was any performance difference between the two when runtime type checking was on. That would help us narrow down whether the performance drop comes from the overhead of the decorator or from the actual type check itself, and it also lets us test for regressions separately.

I can merge them if it's okay with @udim
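A minimal sketch of how that decomposition could be measured (the wrappers below are hypothetical, not the SDK's decorator): timing a no-op wrapper isolates the call-frame overhead the decorator adds, while the checking wrapper adds the per-element check on top of it.

```python
import timeit

def fn(x):
    return len(x)

def noop_wrapper(f):
    # Isolates decorator overhead: one extra call frame, no type check.
    def wrapper(x):
        return f(x)
    return wrapper

def checking_wrapper(f):
    # Decorator overhead plus an actual nested check on every element.
    def wrapper(x):
        assert isinstance(x, tuple) and all(isinstance(i, int) for i in x)
        return f(x)
    return wrapper

element = (1, 2, 3)
wrapped = noop_wrapper(fn)
checked = checking_wrapper(fn)

n = 100_000
base = timeit.timeit(lambda: fn(element), number=n)
frame = timeit.timeit(lambda: wrapped(element), number=n)
check = timeit.timeit(lambda: checked(element), number=n)
print(f"base: {base:.4f}s  +frame: {frame:.4f}s  +check: {check:.4f}s")
```

Comparing frame against base attributes cost to the wrapper itself; comparing check against frame attributes the remainder to the type check.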

saavan-google (Contributor, Author) commented:

Closing this PR because we've collectively reached a consensus that micro-benchmarking will provide better precision and consume fewer resources than load testing. This PR can serve as a historical marker in case we decide to add a load test in the future to check for regressions. The micro-benchmark will therefore be added in a different PR.
