[SPARK-37202][SQL] Use AnalysisContext to track referred temp functions #34546
Conversation
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #145062 has finished for PR 34546 at commit
```scala
// Collect the referred temporary functions from AnalysisContext
copy(referredTempFunctions = analysisContext.referredTempFunctionNames.toSeq)
}
```

```scala
def markAsAnalyzed(): LogicalPlan = copy(isAnalyzed = true)
```
I'm thinking about combining them into:

```scala
def markAsAnalyzed(analysisContext: AnalysisContext): LogicalPlan = {
  copy(
    isAnalyzed = true,
    referredTempFunctions = analysisContext.referredTempFunctionNames.toSeq)
}
```

Cannot decide which is better.
I think combining is better.
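To make the trade-off concrete, here is a minimal, self-contained sketch of the combined form. The types (`AnalysisContext`, `ViewPlan`) are hypothetical stand-ins, not Spark's actual `AnalysisContext` or `LogicalPlan` classes; this only illustrates the shape of the suggested API.

```scala
// Hypothetical simplified stand-ins for Spark's AnalysisContext / LogicalPlan.
case class AnalysisContext(referredTempFunctionNames: Set[String])

case class ViewPlan(
    isAnalyzed: Boolean = false,
    referredTempFunctions: Seq[String] = Seq.empty) {
  // Combined form: mark the plan analyzed AND copy the referred temp
  // function names from the analysis context in a single step.
  def markAsAnalyzed(ctx: AnalysisContext): ViewPlan =
    copy(
      isAnalyzed = true,
      referredTempFunctions = ctx.referredTempFunctionNames.toSeq)
}

val ctx = AnalysisContext(Set("tempFunc"))
val analyzed = ViewPlan().markAsAnalyzed(ctx)
assert(analyzed.isAnalyzed)
assert(analyzed.referredTempFunctions == Seq("tempFunc"))
```

The combined form keeps callers from forgetting the second `copy`, at the cost of every call site now needing an `AnalysisContext` in scope.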
```scala
  executeSameContext(plan)
} finally {
  AnalysisContext.reset()
```
The rule `ResolveEncodersInUDF` will call `analyzer.execute()` again, and the AnalysisContext will be messed up after that.
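The re-entrancy hazard described here can be sketched in a few lines. Everything below is hypothetical and simplified (a plain mutable `Ctx` object instead of Spark's thread-local `AnalysisContext`); it only demonstrates why an unconditional `reset()` in `finally` breaks the outer context when a rule triggers a nested `execute()`.

```scala
// Hypothetical stand-in for a thread-local analysis context.
object Ctx {
  var current: Set[String] = Set.empty
  def reset(): Unit = current = Set.empty
}

// Simplified execute(): merges names into the context, runs the body,
// then unconditionally resets -- which is exactly the bug.
def execute(names: Set[String])(body: => Unit): Unit = {
  Ctx.current = Ctx.current ++ names
  try body finally Ctx.reset()
}

execute(Set("outerFunc")) {
  // A rule (like ResolveEncodersInUDF) calls execute() again...
  execute(Set.empty) { () }
  // ...and when the nested call returns, the outer context is already gone.
  assert(Ctx.current.isEmpty)
}
```

A fix along the lines of saving the previous context before the nested run and restoring it afterwards (rather than resetting to empty) would avoid clobbering the outer state.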
This looks like a bug. Can we open a separate PR for it? Then we can backport it.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #145080 has finished for PR 34546 at commit
cc @cloud-fan
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #145096 has finished for PR 34546 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #145145 has finished for PR 34546 at commit
Test build #145154 has finished for PR 34546 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test starting
Kubernetes integration test status failure
thanks, merging to master/3.2!
This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp function resolution issue when it's registered with a `FunctionBuilder` and referred by a temp view. During temporary view creation, we need to collect all the temp functions and save them to the metadata, so that next time when resolving the view SQL text, the functions can be resolved correctly. But if the temp function is registered with a `FunctionBuilder`, it's not a `UserDefinedExpression`, so it cannot be collected as a temp function. As a result, the next time when the analyzer resolves a temp view, the registered function couldn't be found.

Example:

```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, b)")
sql("select * from tv").collect()
```

bug-fix

no

newly added test cases.

Closes #34546 from linhongliu-db/SPARK-37702-ver3.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 68a0ab5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Test build #145158 has finished for PR 34546 at commit
…ableAsSelect

### What changes were proposed in this pull request?

This is a follow-up of PR #34546, which uses the `AnalysisContext` to store referred temporary functions when creating temp views. But `CacheTableAsSelect` also needs to update the temporary functions because it's a temp view under the hood.

### Why are the changes needed?

Followup of #34546

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

newly added ut

Closes #34603 from linhongliu-db/SPARK-37702-followup.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp function resolution issue when it's registered with a `FunctionBuilder` and referred by a temp view. During temporary view creation, we need to collect all the temp functions and save them to the metadata, so that next time when resolving the view SQL text, the functions can be resolved correctly. But if the temp function is registered with a `FunctionBuilder`, it's not a `UserDefinedExpression`, so it cannot be collected as a temp function. As a result, the next time when the analyzer resolves a temp view, the registered function couldn't be found.

Example:

```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, b)")
sql("select * from tv").collect()
```
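The failure mode above can be sketched without any Spark dependencies. The types below (`Expression`, `UserDefinedExpression`, `PlainExpression`, `collectTempFunctions`) are hypothetical simplifications, not Spark's actual classes; they only model why expression-based collection misses builder-registered functions while name tracking in an analysis context catches them.

```scala
// Hypothetical simplified model of the analyzer's expression tree.
sealed trait Expression
case class UserDefinedExpression(name: String) extends Expression
case class PlainExpression(name: String) extends Expression

// Old strategy (simplified): collect temp functions by pattern-matching
// on UserDefinedExpression nodes in the resolved plan.
def collectTempFunctions(exprs: Seq[Expression]): Seq[String] =
  exprs.collect { case UserDefinedExpression(n) => n }

// A function registered via a FunctionBuilder can resolve to any
// expression -- here, a plain one -- so collection finds nothing:
val resolved: Seq[Expression] = Seq(PlainExpression("tempFunc"))
assert(collectTempFunctions(resolved).isEmpty)

// New strategy (simplified): record the name at function-lookup time
// in the analysis context, independent of the resolved expression type.
val referredTempFunctionNames = scala.collection.mutable.Set.empty[String]
referredTempFunctionNames += "tempFunc"
assert(referredTempFunctionNames.toSeq == Seq("tempFunc"))
```

This is why tracking names at lookup time is robust where inspecting the resolved expressions is not: the builder's output carries no marker of its temp-function origin.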
Why are the changes needed?
bug-fix
Does this PR introduce any user-facing change?
no
How was this patch tested?
newly added test cases.