Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37202][SQL] Use AnalysisContext to track referred temp functions #34546

Closed
wants to merge 8 commits into from

Conversation

linhongliu-db
Copy link
Contributor

@linhongliu-db linhongliu-db commented Nov 10, 2021

What changes were proposed in this pull request?

This PR uses AnalysisContext to track the referred temp functions in order to fix a temp
function resolution issue when it's registered with a FunctionBuilder and referred by a temp view.

During temporary view creation, we need to collect all the temp functions and save them
to the metadata. So that next time when resolving the view SQL text, the functions can be
resolved correctly. But if the temp function is registered with a FunctionBuilder, it's not a
UserDefinedExpression so it cannot be collected as a temp function. As a result, the next time
when the analyzer resolves a temp view, the registered function couldn't be found.

Example:

val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, b)")
sql("select * from tv").collect()

Why are the changes needed?

bug-fix

Does this PR introduce any user-facing change?

no

How was this patch tested?

newly added test cases.

@linhongliu-db linhongliu-db changed the title [SPARK-37702] Use AnalysisContext to track referred temp functions [SPARK-37702][SQL] Use AnalysisContext to track referred temp functions Nov 10, 2021
@github-actions github-actions bot added the SQL label Nov 10, 2021
@SparkQA
Copy link

SparkQA commented Nov 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49532/

@SparkQA
Copy link

SparkQA commented Nov 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49532/

@SparkQA
Copy link

SparkQA commented Nov 10, 2021

Test build #145062 has finished for PR 34546 at commit 6f2e9a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Collect the referred temporary functions from AnalysisContext
copy(referredTempFunctions = analysisContext.referredTempFunctionNames.toSeq)
}

def markAsAnalyzed(): LogicalPlan = copy(isAnalyzed = true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm thinking about combining them to

def markAsAnalyzed(analysisContext: AnalysisContext): LogicalPlan = {
  copy(
    isAnalyzed = true,
    referredTempFunctions = analysisContext.referredTempFunctionNames.toSeq)
}

cannot decide which is better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think combining is better.

executeSameContext(plan)
} finally {
AnalysisContext.reset()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule ResolveEncodersInUDF will call analyzer. execute() again, the AnalysisContext will be messed up after that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a bug, can we open a separate PR for it? then we can backport it.

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49549/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49549/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Test build #145080 has finished for PR 34546 at commit aa9ef80.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@linhongliu-db
Copy link
Contributor Author

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49566/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49566/

@SparkQA
Copy link

SparkQA commented Nov 11, 2021

Test build #145096 has finished for PR 34546 at commit 76153a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49616/

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Test build #145145 has finished for PR 34546 at commit 44ce71c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Test build #145154 has finished for PR 34546 at commit f59a709.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49626/

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49629/

@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49629/

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.2!

@cloud-fan cloud-fan closed this in 68a0ab5 Nov 12, 2021
cloud-fan pushed a commit that referenced this pull request Nov 12, 2021
This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp
function resolution issue when it's registered with a `FunctionBuilder` and referred by a temp view.

During temporary view creation, we need to collect all the temp functions and save them
to the metadata. So that next time when resolving the view SQL text, the functions can be
resolved correctly. But if the temp function is registered with a `FunctionBuilder`, it's not a
`UserDefinedExpression` so it cannot be collected as a temp function. As a result, the next time
when the analyzer resolves a temp view, the registered function couldn't be found.

Example:
```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, b)")
sql("select * from tv").collect()
```

bug-fix

no

newly added test cases.

Closes #34546 from linhongliu-db/SPARK-37702-ver3.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 68a0ab5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan changed the title [SPARK-37702][SQL] Use AnalysisContext to track referred temp functions [SPARK-37202][SQL] Use AnalysisContext to track referred temp functions Nov 12, 2021
@SparkQA
Copy link

SparkQA commented Nov 12, 2021

Test build #145158 has finished for PR 34546 at commit 86ab71b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan pushed a commit that referenced this pull request Nov 17, 2021
…ableAsSelect

### What changes were proposed in this pull request?
This is a follow-up of PR #34546 which uses the `AnalysisContext` to store referred
temporary functions when creating temp views. But the `CacheTableAsSelect` also
needs to update the temporary functions because it's a temp view under the hood.

### Why are the changes needed?
Followup of #34546

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
newly added ut

Closes #34603 from linhongliu-db/SPARK-37702-followup.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Nov 17, 2021
…ableAsSelect

### What changes were proposed in this pull request?
This is a follow-up of PR #34546 which uses the `AnalysisContext` to store referred
temporary functions when creating temp views. But the `CacheTableAsSelect` also
needs to update the temporary functions because it's a temp view under the hood.

### Why are the changes needed?
Followup of #34546

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
newly added ut

Closes #34603 from linhongliu-db/SPARK-37702-followup.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b6c472e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp
function resolution issue when it's registered with a `FunctionBuilder` and referred by a temp view.

During temporary view creation, we need to collect all the temp functions and save them
to the metadata. So that next time when resolving the view SQL text, the functions can be
resolved correctly. But if the temp function is registered with a `FunctionBuilder`, it's not a
`UserDefinedExpression` so it cannot be collected as a temp function. As a result, the next time
when the analyzer resolves a temp view, the registered function couldn't be found.

Example:
```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, b)")
sql("select * from tv").collect()
```

bug-fix

no

newly added test cases.

Closes apache#34546 from linhongliu-db/SPARK-37702-ver3.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 68a0ab5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
…ableAsSelect

### What changes were proposed in this pull request?
This is a follow-up of PR apache#34546 which uses the `AnalysisContext` to store referred
temporary functions when creating temp views. But the `CacheTableAsSelect` also
needs to update the temporary functions because it's a temp view under the hood.

### Why are the changes needed?
Followup of apache#34546

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
newly added ut

Closes apache#34603 from linhongliu-db/SPARK-37702-followup.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b6c472e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp
function resolution issue when it's registered with a `FunctionBuilder` and referred by a temp view.

During temporary view creation, we need to collect all the temp functions and save them
to the metadata. So that next time when resolving the view SQL text, the functions can be
resolved correctly. But if the temp function is registered with a `FunctionBuilder`, it's not a
`UserDefinedExpression` so it cannot be collected as a temp function. As a result, the next time
when the analyzer resolves a temp view, the registered function couldn't be found.

Example:
```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, b)")
sql("select * from tv").collect()
```

bug-fix

no

newly added test cases.

Closes apache#34546 from linhongliu-db/SPARK-37702-ver3.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 68a0ab5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…ableAsSelect

### What changes were proposed in this pull request?
This is a follow-up of PR apache#34546 which uses the `AnalysisContext` to store referred
temporary functions when creating temp views. But the `CacheTableAsSelect` also
needs to update the temporary functions because it's a temp view under the hood.

### Why are the changes needed?
Followup of apache#34546

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
newly added ut

Closes apache#34603 from linhongliu-db/SPARK-37702-followup.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b6c472e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp
function resolution issue when it's registered with a `FunctionBuilder` and referred by a temp view.

During temporary view creation, we need to collect all the temp functions and save them
to the metadata. So that next time when resolving the view SQL text, the functions can be
resolved correctly. But if the temp function is registered with a `FunctionBuilder`, it's not a
`UserDefinedExpression` so it cannot be collected as a temp function. As a result, the next time
when the analyzer resolves a temp view, the registered function couldn't be found.

Example:
```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, b)")
sql("select * from tv").collect()
```

bug-fix

no

newly added test cases.

Closes apache#34546 from linhongliu-db/SPARK-37702-ver3.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 68a0ab5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…ableAsSelect

### What changes were proposed in this pull request?
This is a follow-up of PR apache#34546 which uses the `AnalysisContext` to store referred
temporary functions when creating temp views. But the `CacheTableAsSelect` also
needs to update the temporary functions because it's a temp view under the hood.

### Why are the changes needed?
Followup of apache#34546

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
newly added ut

Closes apache#34603 from linhongliu-db/SPARK-37702-followup.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b6c472e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants