-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-5878] support DoFns with Keyword-only arguments #9237
[BEAM-5878] support DoFns with Keyword-only arguments #9237
Conversation
R: @tvalentyn |
60caf81
to
26d4a04
Compare
26d4a04
to
0e5bd93
Compare
Hi @lazylynx , is this change ready for review? |
@tvalentyn sorry for no notification, please start reviewing |
sdks/python/apache_beam/transforms/transforms_keyword_only_args_py3_only_test.py
Outdated
Show resolved
Hide resolved
Run Python 3.5 PostCommit |
Run Python 2 PostCommit |
db1c343
to
56fa1b6
Compare
Run Python PreCommit |
… in Python 2 and avoid using exec
56fa1b6
to
d7e1212
Compare
@tvalentyn PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, @lazylynx ! Left a few small comments.
Have you considered making a change in Dill (uqfoundation/dill#313)? Your patch can translate to dill codebase, and it would be a cleaner change there. We can still keep the patch in Beam until Dill comes up with a release. Dill maintainers can provide you with additional feedback.
Also, I will not be unavailable for next 2 weeks. If you would like the PR review to finish before then, you can reach out to https://github.com/udim or https://github.com/aaltay.
Thank you.
sdks/python/apache_beam/transforms/transforms_keyword_only_args_test_py3.py
Show resolved
Hide resolved
def test_combine_keyword_only_args(self): | ||
pipeline = TestPipeline() | ||
|
||
def bounded_sum(values, *s, bound=500): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason you had to change test_combine_keyword_only_args
is that this scenario is actually nondeterministic.
From https://beam.apache.org/documentation/programming-guide/#combine
Because the input data (including the value collection) may be distributed across multiple workers, the combining function might be called multiple times to perform partial combining on subsets of the value collection.
By asserting that result2
is [49]
we imply that a runner will compute partial sum 3 times, which will add 13 three times. So we rely on an implementation detail of the Direct Runner (which executes this test).
The test would be deterministic if we add zeros, e.g. beam.CombineGlobally(bounded_sum, 0, 0)
. This would make the test more stable towards future changes in direct runner, and fit the purpose of this PR to exercise keyword-only arguments. We can also come up with a better example.
And, thanks a lot for calling out changes to the tests!
…yle and None check
c32aa86
to
7721417
Compare
Run Portable_Python PreCommit |
@tvalentyn PTAL |
state=state, listitems=listitems, dictitems=dictitems, | ||
obj=obj) | ||
else: | ||
pickler.save_reduce(func, args, state, listitems, dictitems, obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would not this call back into new_save_reduce
since you patched it in L164?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call back calls save_reduce
function of pickle.Pickler
, super class of dill.dill.Pickler
.
So new_save_reduce
would not be called.
Run Portable_Python PreCommit |
Run Python PreCommit |
@aaltay @tvalentyn PTAL |
func.__kwdefaults__ = fkwdefaults | ||
return func | ||
|
||
def new_save_reduce(self, func, args, state=None, listitems=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a more generic signature of new_save_reduce, for example def new_save_reduce(self, func, args, **kwargs)
? The problem is that we assume a particular version of the API for pickle.save_reduce here, and we can see that it will change in Python 3.8, see https://github.com/python/cpython/blob/c75f0e5bdee3cfaba9fd5b3a8549dec0aba01ebe/Lib/pickle.py#L619.
I think with a generic definition of new_save_reduce
we can still update args
list , and pass **kwargs
to pickler.save_reduce
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite so. I'll work on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @lazylynx . It would be nice to add this to next Beam release that is not be cut in ~2 weeks. Thanks for your help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tvalentyn Sorry for late. PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retest this please |
Run Python 2 PostCommit |
Run Python 3.5 PostCommit |
Run Portable_Python PreCommit |
Merged, thanks again. As discussed before, consider making the change to Dill and reverting the monkeypatching here after Dill picks up the changes. @mmckerns is Dill maintainer and can probably help with the review. |
One Beam SDK users who use master branch reported that monkey-patch caused their pipeline to fail with what appears to be an infinite recursion from new_save_reduce. If line 163 is removed, the problem disappears. The problem persists with the monkey patch, even if the body of new_save_reduce is changed to
@lazylynx I have not investigated that report yet but I'd suggest to revert this change in the mean time until we understand a better way to patch, which may not be necessary soon given your outstanding changes to dill. |
@tvalentyn: I agree, the monkey-patch is better pushed into |
@tvalentyn Understood. Sorry for bothering. |
the relevant changes are merged into |
FYI, I have checked that apache_beam.transforms.transforms_keyword_only_args_test_py3 that we added in this PR passes with dill==0.3.1.dev0. |
@tvalentyn: should be before the end of the month. Trying to get in some more 3.8/3.7 compat, and currently it's slated for release on Sept 22. |
@mmckerns Thanks for the update. Looking forward to that release. By the way, it would be helpful for Dill users to have a summary of release notes associated with releases. |
I know (uqfoundation/dill#223) I attach a milestone to each issue, but that doesn't give a full summary as some features are not reflected in the issues. It's on the todo. |
* Also add back the unit tests introduced in apache#8505 with minor changes.
Hi @mmckerns do you have by chance an updated timeline for next release? I don't know how complicated dill releases are, but would be happy to know if you have any suggestions for how other dill users can help with releases. Thank you! |
The |
@mmckerns Thanks for the update. |
support DoFns with Keyword-only arguments in Python 3 and add test reverted in #8750 with no syntax error in Python 2
tests condition are fixed as follows due to errors:
test_side_input_keyword_only_args
to
test_combine_keyword_only_args
to
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.