Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names #40287

Closed
wants to merge 2 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Mar 6, 2023

What changes were proposed in this pull request?

UnresolvedNamedLambdaVariable do not need unique names in python. We already did this for the scala client, and it is good to have parity between the two implementations.

Why are the changes needed?

Try to avoid unique names for UnresolvedNamedLambdaVariable.

Does this PR introduce any user-facing change?

'No'.
New feature

How was this patch tested?

N/A

@hvanhovell
Copy link
Contributor

@HyukjinKwon @zhengruifeng the rationale for this change is that analyzer takes care of making lambda variables unique.

@beliefer
Copy link
Contributor Author

beliefer commented Mar 6, 2023

@hvanhovell After my test, python/run-tests --testnames 'pyspark.sql.tests.test_functions' will not passed.

@zhengruifeng
Copy link
Contributor

I guess we will need to rewrite the lamda function in spark connect planner.

cc @ueshin as well, since existing implementation follows the fix in #32523

@beliefer
Copy link
Contributor Author

beliefer commented Mar 6, 2023

image

@beliefer
Copy link
Contributor Author

beliefer commented Mar 6, 2023

It seems pyspark supports the nested lambda variables and two PR fix the issue.

@hvanhovell
Copy link
Contributor

@beliefer scala does support nested lambda variables as well, and they actually work. So either (my) testing on the scala side is incomplete (which might well be the case), or something weird is going on here.

@beliefer
Copy link
Contributor Author

beliefer commented Mar 7, 2023

@hvanhovell Scala also uses UnresolvedNamedLambdaVariable.freshVarName("x") to get the unique names. see:

private def createLambda(f: Column => Column) = {

@hvanhovell
Copy link
Contributor

@beliefer here is the thing. When this was designed it was mainly aimed at sql, and there we definitely do not generate unique names in lambda functions either. This is all done in the analyzer. We should be able to follow the same path.

Do you happen to know if test failing for python also fail for scala?

@beliefer
Copy link
Contributor Author

beliefer commented Mar 7, 2023

@beliefer here is the thing. When this was designed it was mainly aimed at sql, and there we definitely do not generate unique names in lambda functions either. This is all done in the analyzer. We should be able to follow the same path.

It seems only the lambda functions in SQL will be transformed with analyzer. But the scala, pyspark API will not through analyzer.

@hvanhovell
Copy link
Contributor

Ehhhh... SQL/scala/Python all use the analyzer; they are all just frontends to the same thing.

@beliefer
Copy link
Contributor Author

beliefer commented Mar 7, 2023

Ehhhh... SQL/scala/Python all use the analyzer; they are all just frontends to the same thing.

I found the reason. Although the scala API use analyzer too. object ResolveLambdaVariables extends Rule[LogicalPlan] can't fix the issue.

If I removed the UnresolvedNamedLambdaVariable.freshVarName(...)

image

and test the case, see at:

test("SPARK-34794: lambda variable name issues in nested functions") {

image

The related PR.
#32424

@beliefer
Copy link
Contributor Author

beliefer commented Mar 8, 2023

@hvanhovell Do we still need this change ?

@zhengruifeng
Copy link
Contributor

If the nested lambda issue also exists in the Scala Client, do we need to fix it in the same way?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 17, 2023
@github-actions github-actions bot closed this Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants