[BEAM-10427] Benchmark runtime typechecking for the Python SDK #12242
Conversation
1cc5ca7 to 91c3afd (Compare)
Please update the PR description to note whether it's ready for review.
Sorry I didn't have a chance to look at this sooner.
Since this is a purely local change, rather than adding all this infrastructure I would suggest creating a variant of (or even adding an option to)
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/map_fn_microbenchmark.py
which enables runtime type checking. This will give much cleaner numbers and a much faster turn-around time. (It would probably also be valuable to run this before and after your changes with type checking turned off, to ensure you're not adding overhead in that case too.)
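The micro-benchmark idea above could look roughly like the following. This is an illustrative sketch only, not Beam's actual `map_fn_microbenchmark.py`: `with_runtime_check` is a hypothetical stand-in for the SDK's type-check wrapping, which the real benchmark would enable via the `runtime_type_check` pipeline option instead.

```python
import timeit

# Hypothetical stand-in for Beam's runtime type-check wrapper; the real
# micro-benchmark would toggle the SDK's runtime_type_check option instead.
def with_runtime_check(fn, expected_type):
    def wrapper(x):
        if not isinstance(x, expected_type):
            raise TypeError(f'expected {expected_type}, got {type(x)}')
        return fn(x)
    return wrapper

def map_fn(x):
    return x * 2

checked_map_fn = with_runtime_check(map_fn, int)

# Time the same map function with and without the check, on the same input.
n = 100_000
plain = timeit.timeit(lambda: map_fn(21), number=n)
checked = timeit.timeit(lambda: checked_map_fn(21), number=n)
print(f'plain:   {plain:.3f}s')
print(f'checked: {checked:.3f}s')
```

Because both timings run in a single local process, run-to-run noise is far lower than in a full pipeline, which is the point of the suggestion.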
My goal in suggesting this framework was to gather historical numbers and to test for performance regressions in future PRs.
What do the numbers look like now? I'm still concerned a full integration test may be too noisy to capture this data well, and that a micro-benchmark would be better suited.
class RunTimeTypeCheckOffTest(BaseRunTimeTypeCheckTest):
  def __init__(self):
    self.runtime_type_check = False
Rather than pass this as an attribute of self, can't we pass it as an additional flag in the test? This would allow us to get rid of much if not all of this duplication.
We looked into this, but the CLI flags are only parsed after the LoadTest has been initialized, and by that point it's too late to modify the pipeline options because that happens in the LoadTest constructor... or can I still modify the options?
It happens here
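One way around the ordering problem described above is to parse the extra flag before the test object is constructed. This is a stdlib `argparse` sketch with hypothetical names; Beam's actual load-test framework wires its options differently inside the LoadTest constructor.

```python
import argparse

# Hypothetical sketch: consume our flag first, then hand the remaining
# arguments to whatever builds the pipeline options. `parse_known_args`
# leaves unrecognized flags untouched for the downstream parser.
def build_test_options(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('--runtime_type_check', action='store_true')
    known, remaining = parser.parse_known_args(argv)
    options = {'runtime_type_check': known.runtime_type_check}
    return options, remaining

opts, rest = build_test_options(['--runtime_type_check', '--job_name=tc-on'])
print(opts)   # {'runtime_type_check': True}
print(rest)   # ['--job_name=tc-on']
```

Parsing the flag up front would let a single test class receive `runtime_type_check` as a parameter instead of duplicating subclasses that only differ in that attribute.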
title : 'Runtime Type Checking Python Load Test: On | Nested Type Hints',
Load-test capacity is a limited resource; could you test both configurations in the same pipeline? That should be sufficient for the purpose of noticing regressions.
Yeah sure
I think the original goal of having nested type hints and simple type hints as separate tests was to see whether there was any performance difference between the two when runtime type checking was on. That would help us narrow down whether the performance drop comes from the overhead of the decorator versus the actual type check itself, and it also lets us test for regressions separately.
I can merge them if it's okay with @udim
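The decorator-versus-check distinction discussed above can be probed directly in a micro-benchmark. This is an illustrative sketch, not Beam code: a no-op wrapper isolates the per-call overhead that any type-checking decorator adds, while a checking wrapper measures wrapper plus check.

```python
import timeit

def fn(x):
    return x + 1

# No-op wrapper: measures only the cost of the extra function call that
# any type-checking decorator introduces, independent of the check itself.
def passthrough(f):
    def wrapper(x):
        return f(x)
    return wrapper

# Wrapper that also performs a simple check, for comparison.
def simple_checked(f):
    def wrapper(x):
        if not isinstance(x, int):
            raise TypeError('expected int')
        return f(x)
    return wrapper

noop_fn = passthrough(fn)
check_fn = simple_checked(fn)

n = 200_000
t_plain = timeit.timeit(lambda: fn(1), number=n)
t_noop = timeit.timeit(lambda: noop_fn(1), number=n)
t_check = timeit.timeit(lambda: check_fn(1), number=n)
print(f'plain:           {t_plain:.3f}s')
print(f'no-op wrapper:   {t_noop:.3f}s')
print(f'wrapper + check: {t_check:.3f}s')
```

If the no-op wrapper accounts for most of the slowdown, the decorator overhead dominates; if the checking wrapper is much slower still, the check itself does.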
Closing this PR because we've collectively come to a consensus that micro-benchmarking will provide better precision and consume fewer resources than load testing. This PR can serve as a historical marker in case we decide to add a load test in the future to check for regressions. The micro-benchmark will be added in a different PR.
This PR adds a load test for evaluating the performance of the runtime typechecking system for the Python SDK. It works by comparing the performance of a pipeline with the runtime typechecking feature on versus the same pipeline with it off. This load test also allows the user to specify whether the pipeline should be run with simple typehints (e.g. str) or complex, nested typehints (e.g. Tuple[str, int, Iterable[bool]]).
This PR is ready for review.
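To illustrate why the two hint styles in the description differ in runtime cost, here is a naive conformance checker (illustrative only, not Beam's actual typecheck implementation): a simple hint like str costs a single isinstance call, while a nested hint like Tuple[str, int, Iterable[bool]] must recurse into every element.

```python
import collections.abc
from typing import Iterable, Tuple, get_args, get_origin

# Naive illustrative checker. Simple hints need one isinstance call;
# nested hints recurse into every slot and element of the value.
def conforms(value, hint):
    origin = get_origin(hint)
    if origin is None:                        # simple hint, e.g. str
        return isinstance(value, hint)
    if origin is tuple:                       # Tuple[...]: check each slot
        args = get_args(hint)
        return (isinstance(value, tuple) and len(value) == len(args)
                and all(conforms(v, a) for v, a in zip(value, args)))
    if origin is collections.abc.Iterable:    # Iterable[...]: walk elements
        (elem_hint,) = get_args(hint)
        return (isinstance(value, collections.abc.Iterable)
                and all(conforms(v, elem_hint) for v in value))
    return isinstance(value, origin)          # fall back to the bare origin

print(conforms('a', str))                                                  # True
print(conforms(('a', 1, [True, False]), Tuple[str, int, Iterable[bool]]))  # True
print(conforms(('a', 1, [True, 'no']), Tuple[str, int, Iterable[bool]]))   # False
```

The element-by-element walk is why nested hints were benchmarked separately: their per-element cost grows with the size of the value, whereas a simple hint's cost is constant.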