[BEAM-5878] support DoFns with Keyword-only arguments #9237

Merged 6 commits on Sep 3, 2019

Contributor

@lazylynx lazylynx commented Aug 3, 2019

Support DoFns with keyword-only arguments in Python 3, and re-add the tests reverted in #8750 in a form that does not cause a syntax error in Python 2.

The test conditions were changed as follows because the original versions produced errors:

In test_side_input_keyword_only_args:

    result2 = pcol | 'compute2' >> beam.FlatMap(
      sort_with_side_inputs,
      beam.pvalue.AsIter(side))

to

    result2 = pcol | 'compute2' >> beam.FlatMap(
      sort_with_side_inputs,
      beam.pvalue.AsList(side))

In test_combine_keyword_only_args:

    assert_that(result2, equal_to([23]), label='assert2')

to

    assert_that(result2, equal_to([49]), label='assert2')
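
For context, here is a minimal sketch of why keyword-only arguments can need extra care when a function is serialized and rebuilt. It assumes, purely for illustration, that a pickler recreates a function from its code object and positional defaults via types.FunctionType. The helper rebuild_without_kwdefaults is hypothetical and the body of bounded_sum is an assumption; neither is taken from Beam or dill.

```python
import types


def bounded_sum(values, *s, bound=500):
    # Assumed body: sum the values plus any extra terms, capped at bound.
    return min(sum(values) + sum(s), bound)


def rebuild_without_kwdefaults(func):
    # Hypothetical helper: recreate a function from its code object,
    # copying positional defaults and closure but not __kwdefaults__.
    return types.FunctionType(
        func.__code__, func.__globals__, func.__name__,
        func.__defaults__, func.__closure__)


rebuilt = rebuild_without_kwdefaults(bounded_sum)

# Keyword-only defaults live in __kwdefaults__, separate from __defaults__.
original_kwdefaults = bounded_sum.__kwdefaults__
rebuilt_kwdefaults = rebuilt.__kwdefaults__

try:
    rebuilt([6, 3, 1])
    call_failed = False
except TypeError:
    # 'bound' has no default on the rebuilt function, so the call fails.
    call_failed = True

# Restoring the attribute makes the rebuilt function usable again.
rebuilt.__kwdefaults__ = dict(bounded_sum.__kwdefaults__)
restored_result = rebuilt([6, 3, 1])
```

The last two lines mirror the kind of fix-up that restores keyword-only defaults after reconstruction.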


@aaltay
Member

aaltay commented Aug 3, 2019

R: @tvalentyn

@lazylynx lazylynx force-pushed the support_keyword-only-arguments branch from 60caf81 to 26d4a04 Compare August 3, 2019 09:51
@lazylynx lazylynx force-pushed the support_keyword-only-arguments branch from 26d4a04 to 0e5bd93 Compare August 3, 2019 11:41
@tvalentyn
Contributor

Hi @lazylynx , is this change ready for review?

@lazylynx
Contributor Author

lazylynx commented Aug 5, 2019

@tvalentyn sorry for not notifying you earlier. Please start reviewing.

@tvalentyn
Contributor

Run Python 3.5 PostCommit

@tvalentyn
Contributor

Run Python 2 PostCommit

@lazylynx lazylynx force-pushed the support_keyword-only-arguments branch from db1c343 to 56fa1b6 Compare August 7, 2019 16:35
@lazylynx
Contributor Author

lazylynx commented Aug 7, 2019

Run Python PreCommit

@lazylynx lazylynx force-pushed the support_keyword-only-arguments branch from 56fa1b6 to d7e1212 Compare August 8, 2019 14:32
@lazylynx
Contributor Author

lazylynx commented Aug 8, 2019

@tvalentyn PTAL

Contributor

@tvalentyn tvalentyn left a comment

Thanks a lot, @lazylynx ! Left a few small comments.

Have you considered making this change in Dill (uqfoundation/dill#313)? Your patch could be translated to the Dill codebase, and it would be a cleaner change there. We can still keep the patch in Beam until Dill publishes a release. The Dill maintainers can also provide you with additional feedback.

Also, I will be unavailable for the next 2 weeks. If you would like the PR review to finish before then, you can reach out to https://github.com/udim or https://github.com/aaltay.

Thank you.

def test_combine_keyword_only_args(self):
  pipeline = TestPipeline()

  def bounded_sum(values, *s, bound=500):
Contributor

@tvalentyn tvalentyn Aug 10, 2019

The reason you had to change test_combine_keyword_only_args is that this scenario is actually nondeterministic.
From https://beam.apache.org/documentation/programming-guide/#combine

Because the input data (including the value collection) may be distributed across multiple workers, the combining function might be called multiple times to perform partial combining on subsets of the value collection.

By asserting that result2 is [49], we imply that a runner will compute a partial sum 3 times, adding 13 each time. So we rely on an implementation detail of the Direct Runner (which executes this test).
The test would be deterministic if we added zeros instead, e.g. beam.CombineGlobally(bounded_sum, 0, 0). This would make the test more robust to future changes in the Direct Runner, and it would still fit the purpose of this PR, which is to exercise keyword-only arguments. We can also come up with a better example.
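
The arithmetic behind 23 and 49 can be sketched in plain Python, without Beam. The body of bounded_sum and the input [6, 3, 1] are assumptions chosen to be consistent with the numbers in this thread. apply_in_stages is a hypothetical stand-in for a runner that re-applies the combining function to partial results; it is not how the Direct Runner is implemented.

```python
def bounded_sum(values, *s, bound=500):
    # Assumed body: sum the values plus the extra positional terms, capped at bound.
    return min(sum(values) + sum(s), bound)


def apply_in_stages(fn, values, stages, *extra):
    # Hypothetical model of partial combining: each stage combines the
    # previous stage's single output again, passing the same extra args.
    current = list(values)
    for _ in range(stages):
        current = [fn(current, *extra)]
    return current[0]


elements = [6, 3, 1]  # assumed input

# With extras 5 and 8, the result depends on how many times fn is applied.
one_stage = apply_in_stages(bounded_sum, elements, 1, 5, 8)
three_stages = apply_in_stages(bounded_sum, elements, 3, 5, 8)

# With extras 0 and 0, the result is the same for any number of stages.
zeros_one_stage = apply_in_stages(bounded_sum, elements, 1, 0, 0)
zeros_three_stages = apply_in_stages(bounded_sum, elements, 3, 0, 0)
```

Under these assumptions, one stage yields 23 and three stages yield 49, while the zero-extras variant yields 10 either way, which is why it does not depend on the runner's combining strategy.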

And, thanks a lot for calling out changes to the tests!

@lazylynx lazylynx force-pushed the support_keyword-only-arguments branch from c32aa86 to 7721417 Compare August 12, 2019 13:42
@lazylynx
Contributor Author

Run Portable_Python PreCommit

@lazylynx
Contributor Author

@tvalentyn PTAL
Thank you for the comments, especially the one about the combine function!
I have also started working on uqfoundation/dill#313 and will create a PR before long.

      state=state, listitems=listitems, dictitems=dictitems,
      obj=obj)
else:
  pickler.save_reduce(func, args, state, listitems, dictitems, obj)
Member

Would not this call back into new_save_reduce since you patched it in L164?

Contributor Author

This call invokes the save_reduce method of pickle.Pickler, the superclass of dill.dill.Pickler, so new_save_reduce would not be called.
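
The dispatch argument above can be illustrated with toy classes. StockLike and DillLike are hypothetical stand-ins for pickle.Pickler and dill.dill.Pickler. The sketch only shows that a patch installed on a subclass is not re-entered when it calls the superclass method explicitly; it makes no claim about dill's actual call graph.

```python
class StockLike:
    """Hypothetical stand-in for the stock pickler class."""

    def save_reduce(self, func, args):
        return ("stock", func, args)


class DillLike(StockLike):
    """Hypothetical stand-in for the subclass that gets patched."""


patch_calls = []


def new_save_reduce(self, func, args):
    patch_calls.append(func)
    # Explicit superclass call: the method is looked up on StockLike,
    # so the patch installed on DillLike is bypassed here.
    return StockLike.save_reduce(self, func, args)


# Install the patch on the subclass only.
DillLike.save_reduce = new_save_reduce

result = DillLike().save_reduce("f", (1,))
```

Note the limit of this illustration: if a superclass method were to call self.save_reduce again while handling nested objects, normal attribute lookup would route that call back through the patch.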

Two resolved review threads on sdks/python/apache_beam/internal/pickler.py (one outdated).
@lazylynx
Contributor Author

Run Portable_Python PreCommit

@lazylynx
Contributor Author

Run Python PreCommit

@lazylynx
Contributor Author

@aaltay @tvalentyn PTAL

  func.__kwdefaults__ = fkwdefaults
  return func

def new_save_reduce(self, func, args, state=None, listitems=None,
Contributor

@tvalentyn tvalentyn Aug 26, 2019

Can we use a more generic signature of new_save_reduce, for example def new_save_reduce(self, func, args, **kwargs)? The problem is that we assume a particular version of the API for pickle.save_reduce here, and we can see that it will change in Python 3.8, see https://github.com/python/cpython/blob/c75f0e5bdee3cfaba9fd5b3a8549dec0aba01ebe/Lib/pickle.py#L619.

I think with a generic definition of new_save_reduce we can still update the args list and pass **kwargs through to pickler.save_reduce.
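
A minimal sketch of the forwarding idea, under stated assumptions: old_api and new_api are hypothetical stand-ins for two versions of a save_reduce-like function that differ by one extra parameter, and make_forwarding_wrapper is an illustrative helper, not code from this PR. The wrapper names only the arguments it needs to modify and forwards everything else untouched.

```python
def old_api(func, args, state=None, listitems=None, dictitems=None, obj=None):
    # Hypothetical older signature.
    return ("old", func, args, state, obj)


def new_api(func, args, state=None, listitems=None, dictitems=None,
            state_setter=None, obj=None):
    # Hypothetical newer signature with one additional parameter.
    return ("new", func, args, state, state_setter, obj)


def make_forwarding_wrapper(original):
    def wrapper(func, args, *other_args, **kwargs):
        # Modify only what the patch cares about (here: append a marker),
        # then forward the remaining arguments without naming them.
        patched_args = tuple(args) + ("extra",)
        return original(func, patched_args, *other_args, **kwargs)
    return wrapper


wrap_old = make_forwarding_wrapper(old_api)
wrap_new = make_forwarding_wrapper(new_api)

result_old = wrap_old("f", (1,), obj="o")
result_new = wrap_new("f", (1,), state_setter="s", obj="o")
```

Because the wrapper never spells out the optional parameters, the same definition works against both stand-in signatures.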

Contributor Author

Quite so. I'll work on it.

Contributor

Thanks, @lazylynx. It would be nice to include this in the next Beam release, which is to be cut in ~2 weeks. Thanks for your help.

Contributor Author

@tvalentyn Sorry for the delay. PTAL

Contributor

No worries, and thanks a lot, @lazylynx ! The change looks good to me. I'm going to re-run a few tests to make sure that they still pass with this change combined with a recent upgrade of dill version (#9419).

@tvalentyn
Contributor

retest this please

@tvalentyn
Contributor

Run Python 2 PostCommit

@tvalentyn
Contributor

Run Python 3.5 PostCommit

@tvalentyn
Contributor

Run Portable_Python PreCommit

@tvalentyn tvalentyn merged commit 0c2d8be into apache:master Sep 3, 2019
@tvalentyn
Contributor

Merged, thanks again. As discussed before, consider making the change to Dill and reverting the monkeypatching here after Dill picks up the changes. @mmckerns is the Dill maintainer and can probably help with the review.

@tvalentyn
Contributor

A Beam SDK user who uses the master branch reported that the monkey-patch caused their pipeline to fail with what appears to be an infinite recursion from new_save_reduce.

If line 163 is removed, the problem disappears. The problem persists with the monkey patch, even if the body of new_save_reduce is changed to

StockPickler.save_reduce(self, func, args, *other_args, **kwargs)

@lazylynx I have not investigated that report yet, but I'd suggest reverting this change in the meantime until we find a better way to patch, which may soon be unnecessary given your outstanding changes to dill.

@mmckerns

mmckerns commented Sep 8, 2019

@tvalentyn: I agree, the monkey-patch is better pushed into dill. I have several 3.7/3.8 specific changes in the pipeline now, and this was one I hadn't gotten to yet. Will review the PR and move forward to get it into dill, which is a cleaner solution. Thanks @lazylynx

@lazylynx
Contributor Author

lazylynx commented Sep 9, 2019

@tvalentyn Understood. Sorry for bothering.
@mmckerns Thanks. I'm waiting for your review.

@mmckerns

mmckerns commented Sep 9, 2019

the relevant changes are merged into dill.

@tvalentyn
Contributor

thanks, @lazylynx and @mmckerns .

@tvalentyn
Contributor

FYI, I have checked that apache_beam.transforms.transforms_keyword_only_args_test_py3 that we added in this PR passes with dill==0.3.1.dev0.
@mmckerns do you by chance have an idea when dill 0.3.1 may be released? Thank you.

@mmckerns

@tvalentyn: should be before the end of the month. Trying to get in some more 3.8/3.7 compat, and currently it's slated for release on Sept 22.

@tvalentyn
Contributor

@mmckerns Thanks for the update. Looking forward to that release. By the way, it would be helpful for Dill users to have a summary of release notes associated with releases.

@mmckerns

I know (uqfoundation/dill#223). I attach a milestone to each issue, but that doesn't give a full summary, as some features are not reflected in the issues. It's on the to-do list.

soyrice pushed a commit to soyrice/beam that referenced this pull request Sep 19, 2019
* Also add back the unit tests introduced in apache#8505 with minor changes.
soyrice pushed a commit to soyrice/beam that referenced this pull request Sep 19, 2019
@tvalentyn
Contributor

@tvalentyn: should be before the end of the month. Trying to get in some more 3.8/3.7 compat, and currently it's slated for release on Sept 22.

Hi @mmckerns, do you by chance have an updated timeline for the next release? I don't know how complicated dill releases are, but I would be happy to hear any suggestions for how other dill users can help with releases. Thank you!

@mmckerns

mmckerns commented Sep 25, 2019

The dill release was pushed back a few days to deal with some 3.8 issues, and will now happen on the 27th. I try not to let release dates slide, but we didn't allocate sufficient time for the extent of the changes that came with pickling in 3.8.

@tvalentyn
Contributor

@mmckerns Thanks for the update.
