[SPARK-29550][SQL] - enhance session catalog locking #26213
Closed
What changes were proposed in this pull request?
In my streaming application, `spark.streaming.concurrentJobs` is set to 50, which is used as the size of the underlying thread pool. I perform several SQL operations on DataFrames and automatically create/alter tables and views at runtime. To do that, I invoke `create ... if not exists` operations on the driver on each batch invocation. I noticed that most of the batch time was spent on the driver rather than on the executors. I took a thread dump and found that most of the threads were blocked on SessionCatalog operations, waiting for a lock.

The existing implementation of SessionCatalog uses a single lock, which almost all of its methods take to guard the `currentDb` and `tempViews` variables. I propose to enhance the locking behaviour of SessionCatalog by using a ReadWriteLock, which allows read operations to execute concurrently.

It's also possible to go even further and stripe the locks for `currentDb` and `tempViews` (one lock per variable), but I'm not sure whether that's possible from the implementation point of view. Perhaps someone can help me with this?
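
To make the proposed idea concrete, here is a minimal sketch of guarding catalog-like state with a read-write lock instead of a single exclusive lock. It is not the patch itself and not Spark's actual SessionCatalog: the `ToyCatalog` class, its method names, and the `String` stand-in for a view's plan are all hypothetical, chosen only for illustration. The only real API used is `java.util.concurrent.locks.ReentrantReadWriteLock`.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable

// Hypothetical stand-in for a catalog; NOT Spark's SessionCatalog.
class ToyCatalog {
  private val lock = new ReentrantReadWriteLock()

  // State guarded by `lock` (names mirror the variables discussed above).
  private var currentDb: String = "default"
  private val tempViews = mutable.HashMap.empty[String, String]

  // Many threads may hold the read lock at the same time.
  private def withReadLock[A](body: => A): A = {
    val l = lock.readLock()
    l.lock()
    try body finally l.unlock()
  }

  // The write lock is exclusive: it blocks readers and other writers.
  private def withWriteLock[A](body: => A): A = {
    val l = lock.writeLock()
    l.lock()
    try body finally l.unlock()
  }

  // Read-only operations take the shared lock.
  def getCurrentDatabase: String = withReadLock(currentDb)

  def getTempView(name: String): Option[String] =
    withReadLock(tempViews.get(name))

  // Mutating operations take the exclusive lock.
  def setCurrentDatabase(db: String): Unit =
    withWriteLock { currentDb = db }

  // Returns true if the view was created, false if it already existed.
  def createTempViewIfNotExists(name: String, plan: String): Boolean =
    withWriteLock {
      if (tempViews.contains(name)) false
      else {
        tempViews.put(name, plan)
        true
      }
    }
}
```

One design point to keep in mind with this approach: `ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, so a method that holds the read lock must not call a method that acquires the write lock on the same thread, or it will block forever. Any real change would need to check the call paths between catalog methods for that pattern.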
How was this patch tested?
Only via the existing test suites.