Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33653][SQL] DSv2: REFRESH TABLE should recache the table itself #30742

Closed
wants to merge 3 commits into from

Conversation

sunchao
Copy link
Member

@sunchao sunchao commented Dec 11, 2020

What changes were proposed in this pull request?

This changes DSv2 refresh table semantics to also recache the target table itself.

Why are the changes needed?

Currently "REFRESH TABLE" in DSv2 only invalidate all caches referencing the table. With #30403 merged which adds support for caching a DSv2 table, we should also recache the target table itself to make the behavior consistent with DSv1.

Does this PR introduce any user-facing change?

Yes, now refreshing table in DSv2 also recache the target table itself.

How was this patch tested?

Added coverage of this new behavior in the existing UT for v2 refresh table command

@github-actions github-actions bot added the SQL label Dec 11, 2020
@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37283/

@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37283/

@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37288/

@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37288/

@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Test build #132680 has finished for PR 30742 at commit 8f8482a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Test build #132685 has finished for PR 30742 at commit caa99fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val v2Relation = DataSourceV2Relation.create(r.table, Some(r.catalog), Some(r.identifier))
val cache = session.sharedState.cacheManager.lookupCachedData(v2Relation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cache is only necessary in if block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad - it should be checked in the if condition

@SparkQA
Copy link

SparkQA commented Dec 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37381/

@SparkQA
Copy link

SparkQA commented Dec 14, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37381/

@SparkQA
Copy link

SparkQA commented Dec 14, 2020

Test build #132779 has finished for PR 30742 at commit 2975e56.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but could you rebase to the master branch once more, @sunchao ?

@sunchao
Copy link
Member Author

sunchao commented Dec 14, 2020

@dongjoon-hyun sure - it's done

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master for Apache Spark 3.2.0.

@dongjoon-hyun
Copy link
Member

Could you make a backporting PR to branch-3.1 please, @sunchao ?

sunchao added a commit to sunchao/spark that referenced this pull request Dec 15, 2020
### What changes were proposed in this pull request?

This changes DSv2 refresh table semantics to also recache the target table itself.

### Why are the changes needed?

Currently "REFRESH TABLE" in DSv2 only invalidate all caches referencing the table. With apache#30403 merged which adds support for caching a DSv2 table, we should also recache the target table itself to make the behavior consistent with DSv1.

### Does this PR introduce _any_ user-facing change?

Yes, now refreshing table in DSv2 also recache the target table itself.
### How was this patch tested?

Added coverage of this new behavior in the existing UT for v2 refresh table command

Closes apache#30742 from sunchao/SPARK-33653.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@sunchao
Copy link
Member Author

sunchao commented Dec 15, 2020

@dongjoon-hyun sure - opened #30769

@sunchao sunchao deleted the SPARK-33653 branch December 15, 2020 00:16
@SparkQA
Copy link

SparkQA commented Dec 15, 2020

Test build #132782 has finished for PR 30742 at commit 3d8d353.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Dec 15, 2020
…itself

This is a backport of #30742 for branch-3.1

### What changes were proposed in this pull request?

This changes DSv2 refresh table semantics to also recache the target table itself.

### Why are the changes needed?

Currently "REFRESH TABLE" in DSv2 only invalidate all caches referencing the table. With #30403 merged which adds support for caching a DSv2 table, we should also recache the target table itself to make the behavior consistent with DSv1.

### Does this PR introduce _any_ user-facing change?

Yes, now refreshing table in DSv2 also recache the target table itself.
### How was this patch tested?

Added coverage of this new behavior in the existing UT for v2 refresh table command.

Closes #30769 from sunchao/SPARK-33653-branch-3.1.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants