Skip to content

[SPARK-23563] config codegen compile cache size#20712

Closed
passionke wants to merge 6 commits intoapache:masterfrom
passionke:feature/codegenerator_cache_size
Closed

[SPARK-23563] config codegen compile cache size#20712
passionke wants to merge 6 commits intoapache:masterfrom
passionke:feature/codegenerator_cache_size

Conversation

@passionke
Copy link

What changes were proposed in this pull request?

config codegen compile class cache size

How was this patch tested?

spark sql integration tests

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@mgaido91
Copy link
Contributor

mgaido91 commented Mar 2, 2018

@cloud-fan @gatorsmile if I am not wrong, this is one of the place which is used not only by the driver but also by the executors, so the introduced parameter won't be evaluated properly, am I right?

@hvanhovell
Copy link
Contributor

@mgaido91 Yeah, you are right. This is pretty pointless unless you either set this as a static spark conf or pass it to the executor using local properties. Both aren't very attractive IMO. @passionke why do we need this?

@kiszk
Copy link
Member

kiszk commented Mar 2, 2018

+1, I would like to know the case where we increased cache hit ratio.
Do we reuse the same plan beyond more than 100 plans?

@hvanhovell
Copy link
Contributor

@passionke can you elaborate on why we need this?

A problem with increasing the cache is that we keep more classes around and those classes can blow out the JVM's internal code caches, which will have a far more negative influence on performance than any increase in the number of cached plans will.

@passionke
Copy link
Author

@mgaido91 @hvanhovell @kiszk

Thank's your attention, My application is a long term spark sql engine server, hundreds of spark sql run on it in spark local model. And these sql will not change until biz logic update.

The sql depends on some tables which will be update before compute.

So, In this special case, if I can make a larger CodeGenerator.cache size, I can reuse GeneratedClass instance, and In my view it will reduce meta space increase and decrease the full gc times

@cloud-fan
Copy link
Contributor

it sounds reasonable to make the cache configurable, but we should figure out the best way to propagate codegen configs to the executor side.

@mgaido91
Copy link
Contributor

mgaido91 commented Mar 6, 2018

@passionke what you did doesn't solve the issue, I think. Your UTs are working because you are checking the value only on the driver.

@passionke
Copy link
Author

oops, so there has no good solution right now? shall I close this PR?

@mgaido91
Copy link
Contributor

mgaido91 commented Mar 6, 2018

@passionke yes, currently there is no good solution. But if you can find one, you're welcome

@passionke passionke closed this Mar 6, 2018
@passionke passionke deleted the feature/codegenerator_cache_size branch March 9, 2018 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants