[SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names #40287
@HyukjinKwon @zhengruifeng the rationale for this change is that the analyzer takes care of making lambda variables unique.
@hvanhovell After my test,
It seems PySpark supports nested lambda variables, and two PRs fixed the issue.
@beliefer Scala does support nested lambda variables as well, and they actually work. So either (my) testing on the Scala side is incomplete (which might well be the case), or something weird is going on here.
@hvanhovell Scala also uses
@beliefer here is the thing. When this was designed it was mainly aimed at SQL, and there we definitely do not generate unique names in lambda functions either. This is all done in the analyzer. We should be able to follow the same path. Do you happen to know whether the tests failing for Python also fail for Scala?
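To make the point about the analyzer concrete, here is a minimal toy sketch in plain Python. It is not Spark's code: the classes `Var`, `Lambda`, and `Call` and the function `resolve` are hypothetical stand-ins invented for this illustration. It only shows the general idea that a resolution pass can bind each lambda parameter to a fresh identifier by scope, so a frontend could reuse the same variable name in nested lambdas.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """Toy stand-in for an unresolved named lambda variable (or a column)."""
    name: str


@dataclass(frozen=True)
class Lambda:
    """Toy stand-in for a lambda function: parameter names plus a body."""
    params: Tuple[str, ...]
    body: "Expr"


@dataclass(frozen=True)
class Call:
    """Toy stand-in for a function call such as transform(xs, lambda)."""
    fn: str
    args: Tuple["Expr", ...]


Expr = Union[Var, Lambda, Call]


def resolve(expr, scope=None, ids=None):
    """Rename lambda parameters to unique names, innermost binding wins."""
    scope = {} if scope is None else scope
    ids = count() if ids is None else ids
    if isinstance(expr, Var):
        # Names not bound by an enclosing lambda are left untouched.
        return Var(scope.get(expr.name, expr.name))
    if isinstance(expr, Lambda):
        # Each parameter gets a fresh suffix; the new scope shadows outer names.
        fresh = {p: f"{p}#{next(ids)}" for p in expr.params}
        inner_scope = {**scope, **fresh}
        new_params = tuple(fresh[p] for p in expr.params)
        return Lambda(new_params, resolve(expr.body, inner_scope, ids))
    return Call(expr.fn, tuple(resolve(a, scope, ids) for a in expr.args))


# Both lambdas deliberately use the same parameter name "x".
nested = Lambda(
    ("x",),
    Call("transform", (Var("x"), Lambda(("x",), Call("add", (Var("x"), Var("col")))))),
)
resolved = resolve(nested)
```

In this toy model the outer parameter and the inner parameter end up with different names even though the input used `x` for both. Whether Spark's analyzer handles every nested case for the Scala and Python clients is exactly what the comments here are debating, so the sketch should not be read as evidence either way.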
It seems only the lambda functions in SQL are transformed by the analyzer, while the Scala and PySpark APIs do not go through the analyzer.
Ehhhh... SQL/Scala/Python all use the analyzer; they are all just frontends to the same thing.
I found the reason, even though the Scala API uses the analyzer too. If I removed the and test the case, see: spark/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala Line 5250 in 2e7207f
The related PR.
@hvanhovell Do we still need this change?
If the nested lambda issue also exists in the Scala client, do we need to fix it in the same way?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
UnresolvedNamedLambdaVariable does not need unique names in Python. We already did this for the Scala client, and it is good to have parity between the two implementations.
Why are the changes needed?
To avoid generating unique names for UnresolvedNamedLambdaVariable.
Does this PR introduce any user-facing change?
'No'.
New feature
How was this patch tested?
N/A