Skip to content

[BEAM-9547] Add basic support for DataFrame.{eval,query}#13264

Merged
TheNeuralBit merged 6 commits intoapache:masterfrom
TheNeuralBit:eval-query
Nov 9, 2020
Merged

[BEAM-9547] Add basic support for DataFrame.{eval,query}#13264
TheNeuralBit merged 6 commits intoapache:masterfrom
TheNeuralBit:eval-query

Conversation

@TheNeuralBit
Copy link
Member

Adds support for executing eval and query as element-wise methods. Restricts most kwargs to their defaults for simplicity, only parser, engine and truediv are still allowed to be changed.

This PR also includes a commit to pass *args through maybe_inplace as not having it breaks eval.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently if the user specifies a local variable in expr with @<local> this will fail at construction time (when generating the proxy), but the error message isn't helpful, it just indicates <local> doesn't exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll update the XXX with a jira number before merging if this gets approved, I just didn't want to create it prematurely)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and create the jira. Can we give a better message if @ is in the expression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll add a regex to check for @<py identifier> and point to the jira.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@TheNeuralBit
Copy link
Member Author

R: @robertwb

@TheNeuralBit TheNeuralBit requested a review from robertwb November 4, 2020 23:34
@functools.wraps(func)
def wrapper(self, inplace=False, **kwargs):
result = func(self, **kwargs)
def wrapper(self, *args, inplace=False, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous, as it won't catch if inplace was passed as a non-keyword argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could happen only with methods where the first kwarg is inplace right? I wouldn't think there's many of those but I suppose there could be.

So what's the right thing to do here? We could use inspect to find inplace in the args/kwargs of func and handle it correctly. Or more simply just never use this in methods where we need to pass through args.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't use this in places where we may need to pass args. (Actually, if we're just passing a*args, we can't generally detect inplace safely at all).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we passed the base type as in populate_defaults we could inspect it's method. Seems like a lot of trouble.. but I'm just worried that there's some edge case where we're dropping non-keyword args, and it would be great to catch that possibility at import time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this change and instead handle inplace in the new methods, PTAL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and create the jira. Can we give a better message if @ is in the expression?

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@TheNeuralBit TheNeuralBit merged commit 96610c9 into apache:master Nov 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants