[SPARK-30470][SQL] Uncache cached temp tables on session closed #27149

Closed
liupc wants to merge 2 commits into apache:master from liupc:SPARK-30470

liupc commented Jan 9, 2020

What changes were proposed in this pull request?

Currently, Spark does not clean up the cached tables behind temp views created by SQL such as the following:
CACHE TABLE table1 as SELECT ....
There is a risk that the table is never uncached, either because the session closes unexpectedly or because the user closes it manually. The temp views are then lost and cannot be accessed from any other session, but the cached plan still exists in the CacheManager.
Moreover, the leak may cause subsequent queries to fail:

Caused by: java.io.FileNotFoundException: File does not exist: /userxxx/xxx/dt=20200107/data__db60e76d_91b8_42f3_909d_5c68692ecdd4
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.

This PR will fix it.

Why are the changes needed?

This PR fixes the issues above by uncaching cached temp tables when the session is closed.
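
The leak and the proposed fix can be illustrated with a small toy model. This is only a sketch, not Spark code: `CacheManager`, `Session`, `cache_table_as_select`, `close`, and `leaked_entries` are hypothetical names invented for illustration, and the only assumption carried over from the description is that cached plans live in a manager that outlives the session while temp views are session-scoped.

```python
# Toy model only: these classes are hypothetical and are NOT Spark's API.


class CacheManager:
    """Outlives any single session; maps a plan key to cached data."""

    def __init__(self):
        self.cached = {}

    def cache(self, plan_key, data):
        self.cached[plan_key] = data

    def uncache(self, plan_key):
        # Removing a missing key is a no-op.
        self.cached.pop(plan_key, None)


class Session:
    """Owns session-scoped temp views that point at shared cache entries."""

    def __init__(self, session_id, cache_manager):
        self.session_id = session_id
        self.cache_manager = cache_manager
        self.temp_views = {}  # view name -> plan key

    def cache_table_as_select(self, name, data):
        # Models `CACHE TABLE name AS SELECT ...`:
        # register a temp view and cache its plan in the shared manager.
        plan_key = (self.session_id, name)
        self.temp_views[name] = plan_key
        self.cache_manager.cache(plan_key, data)

    def close(self, uncache_temp_views):
        # With the flag off, the views vanish but their cache entries stay.
        if uncache_temp_views:
            for plan_key in self.temp_views.values():
                self.cache_manager.uncache(plan_key)
        self.temp_views.clear()


def leaked_entries(uncache_on_close):
    """Return how many cache entries survive after the session is closed."""
    manager = CacheManager()
    session = Session("s1", manager)
    session.cache_table_as_select("table1", [1, 2, 3])
    session.close(uncache_temp_views=uncache_on_close)
    return len(manager.cached)


print(leaked_entries(False), leaked_entries(True))  # prints: 1 0
```

Without cleanup on close, one entry remains in the manager even though no session can reach the view any more; with cleanup, nothing is left behind.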

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests (UT).

AmplabJenkins commented

Can one of the admins verify this patch?

maropu (Member) commented Jan 10, 2020

Why did you add a public interface for that? I think this issue should be handled implicitly for users.

liupc (Author) commented Jan 10, 2020

Thanks for the reply, @maropu. Yes, this should be handled implicitly for users; I'll change the PR to handle it implicitly.

dongjoon-hyun (Member) left a comment

Hi, @liupc. Could you provide a reproducible procedure on the master branch, or at least on Apache Spark 3.0.0-preview2?

Also, cc @LantaoJin because this might be a duplicate of #26543.

dongjoon-hyun (Member) commented

Gentle ping, @liupc .

liupc (Author) commented Jan 20, 2020

@dongjoon-hyun Thanks, I think this is a duplicate of #26543. I'll close it.

liupc closed this Jan 20, 2020

maropu (Member) commented Jan 20, 2020

I've closed the corresponding JIRA, too.

dongjoon-hyun (Member) commented

Thank you for confirming and closing the PRs, @liupc and @maropu.
