[SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option #12353
Test build #55701 has finished for PR 12353 at commit
@@ -52,6 +52,8 @@ case class ExprCode(var code: String, var isNull: String, var value: String)
 */
class CodegenContext {

  var conf: CatalystConf = null
this is really hacky -- i'd put this in the constructor and make it a val rather than a var. and maybe we can create a CodegenConf instead of reusing CatalystConf?
Oh, Sure. I'll change those things.
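The constructor suggestion can be sketched in isolation. Everything below is a hypothetical stand-in rather than the actual Spark code: `CodegenConf` is reduced to a single field, and `MutableContext` exists only to show the `var` style being objected to.

```scala
// Hypothetical stand-ins for illustration; these are not the Spark classes.
object ConstructorConfSketch {
  // A small, immutable bag of codegen options (assumed shape).
  final class CodegenConf(val maxCaseBranches: Int)

  // The "hacky" style: the conf starts out null and every caller
  // has to remember to assign it before the context is used.
  class MutableContext {
    var conf: CodegenConf = null
  }

  // The suggested style: the conf is required at construction time
  // and, being a val, cannot be reassigned afterwards.
  class CodegenContext(val conf: CodegenConf)
}
```

With the constructor form, a context without a conf cannot be built at all, so a forgotten assignment becomes a compile error rather than a null at runtime.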
Test build #55702 has finished for PR 12353 at commit
Test build #55705 has finished for PR 12353 at commit
@@ -620,7 +622,7 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
   * expressions that don't support codegen
   */
  def newCodeGenContext(): CodegenContext = {
-    new CodegenContext
+    new CodegenContext(new SimpleCodegenConf)
Hmm. CodegenContext had better be a parameter. I will fix this.
I overlooked the purpose of this function. It's for expressions that don't support codegen. I replaced SimpleCodegenConf with EmptyCodegenConf.
/**
* Create a new codegen context for expression evaluator, used to store those
* expressions that don't support codegen
*/
package org.apache.spark.sql.catalyst.expressions.codegen

private[spark] trait CodegenConf {
maybe this can just be a class with a list of options, e.g.
class CodegenConf(val maxCaseBranches: Int)
The current relationship between SQLConf and the catalyst module confs (CatalystConf, CodegenConf) uses traits. Should we change that consistently (including CatalystConf)? After this PR, we might have more conf classes in the catalyst module in the future.
-private[sql] class SQLConf extends Serializable with CatalystConf with Logging {
+private[sql] class SQLConf extends Serializable with CatalystConf with CodegenConf with Logging {
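The two styles under discussion can be compared side by side in a minimal sketch. All names, keys, and defaults here are assumptions made for illustration: `SqlConfLike` only plays the role of SQLConf, and `CodegenOptions` stands in for the reviewer's plain-class proposal.

```scala
// Hypothetical stand-ins for illustration; none of these are the Spark classes.
object ConfStyleSketch {
  // Style 1: each module declares the settings it needs as a trait...
  trait CatalystConf { def optimizerMaxIterations: Int }
  trait CodegenConf { def maxCaseBranches: Int }

  // ...and one concrete conf (playing the role of SQLConf) mixes them all in.
  class SqlConfLike(settings: Map[String, String]) extends CatalystConf with CodegenConf {
    // The keys and defaults below are made up for this sketch.
    def optimizerMaxIterations: Int =
      settings.getOrElse("sketch.optimizer.maxIterations", "100").toInt
    def maxCaseBranches: Int =
      settings.getOrElse("sketch.codegen.maxCaseBranches", "20").toInt
  }

  // Style 2: a plain class holding a fixed list of options.
  final class CodegenOptions(val maxCaseBranches: Int)

  // A trait-based conf can still be snapshotted into the plain class.
  def snapshot(conf: CodegenConf): CodegenOptions =
    new CodegenOptions(conf.maxCaseBranches)
}
```

The trait style reads live values from the owning conf on every call, while the plain class freezes them at creation time; which behavior is wanted is the design question raised in this thread.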
Test build #55740 has finished for PR 12353 at commit
Hi, @rxin.
Rebased to resolve conflicts.
Test build #55808 has finished for PR 12353 at commit
Rebased to resolve conflicts.
Test build #55943 has finished for PR 12353 at commit
@dongjoon-hyun what's your idea on how to update this?
Oh, sorry. I didn't say explicitly. I thought updating |
Actually, it's not needed.
@@ -305,7 +307,7 @@ case class WholeStageCodegen(child: SparkPlan) extends UnaryNode with CodegenSup
   * @return the tuple of the codegen context and the actual generated source.
   */
  def doCodeGen(): (CodegenContext, String) = {
-    val ctx = new CodegenContext
+    val ctx = new CodegenContext(SQLContext.getActive().get.conf)
Is this a problem on non-local mode? SQLContext.getActive is not available on the executors.
Oh, I haven't tested it in non-local mode. If so, is there any way to access the configuration of SQLContext on the executors?
OK, I thought about this more. Actually, to make this really work, we should just create two expressions: one for the codegen version and the other for the interpreted version (the default). Then, in the optimizer, we switch to the codegen version if the number of branches is less than x. cc @davies
I see. Thank you for the solution!
For optimizer, may I implement this in |
Might be best to create an OptimizeCodegen rule as the very last batch. We can add other things to that rule in the future.
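The two-expression idea can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Expr`, `Lit`, and `optimize` are hypothetical, the two case classes merely borrow the names used in this PR, and none of it uses Catalyst's `Rule` or `transformAllExpressions`.

```scala
// Toy model for illustration; these are not the Catalyst classes.
object OptimizeCodegenSketch {
  sealed trait Expr
  case class Lit(value: Int) extends Expr

  // Interpreted (default) variant.
  case class CaseWhen(
      branches: Seq[(Expr, Expr)],
      elseValue: Option[Expr] = None) extends Expr

  // Variant that would emit generated code.
  case class CaseWhenCodegen(
      branches: Seq[(Expr, Expr)],
      elseValue: Option[Expr] = None) extends Expr

  // Rewrite bottom-up: children first, then swap a CaseWhen for its
  // codegen twin when it has fewer than maxBranches branches.
  def optimize(e: Expr, maxBranches: Int): Expr = e match {
    case CaseWhen(branches, elseValue) =>
      val newBranches = branches.map { case (cond, value) =>
        (optimize(cond, maxBranches), optimize(value, maxBranches))
      }
      val newElse = elseValue.map(optimize(_, maxBranches))
      if (newBranches.size < maxBranches) CaseWhenCodegen(newBranches, newElse)
      else CaseWhen(newBranches, newElse)
    case other => other // literals and already-rewritten nodes are left alone
  }
}
```

Because the threshold is consulted once while the plan is being optimized, the chosen variant travels with the plan and nothing needs to look up the configuration later during execution.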
Thanks. I will create |
Hi, @rxin. Now, the PR is updated in the following ways.
What do you think about item 3?
Test build #56191 has finished for PR 12353 at commit
@@ -29,6 +29,7 @@ trait CatalystConf {
  def groupByOrdinal: Boolean

  def optimizerMaxIterations: Int
  def maxCaseBranches: Int
maxCaseBranchesForCodegen?
Thank you for the quick review. Sure. And also maxCaseBranchesForCodegen in SQLConf.scala.
case class CaseWhen(
    val branches: Seq[(Expression, Expression)],
    val elseValue: Option[Expression] = None)
  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
maybe just have a toCodegen function that creates CaseWhenCodegen?
We can then remove object CaseWhenCodegen
That would be right. CaseWhenCodegen is always generated from CaseWhen.
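A minimal sketch of the toCodegen suggestion, using a toy `Expr` hierarchy that is assumed here and is not Catalyst's `Expression`. The conversion lives on the interpreted class, so no separate factory object is needed.

```scala
// Toy model for illustration; these are not the Catalyst classes.
object ToCodegenSketch {
  sealed trait Expr
  case class Lit(value: Int) extends Expr

  // Codegen variant; in this sketch it is only ever built via toCodegen().
  case class CaseWhenCodegen(
      branches: Seq[(Expr, Expr)],
      elseValue: Option[Expr]) extends Expr

  // Interpreted variant that knows how to produce its codegen twin.
  case class CaseWhen(
      branches: Seq[(Expr, Expr)],
      elseValue: Option[Expr] = None) extends Expr {
    // Carries the branches and else value over unchanged.
    def toCodegen(): CaseWhenCodegen = CaseWhenCodegen(branches, elseValue)
  }
}
```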
cc @cloud-fan this change actually makes your other thing easier i think.
@@ -242,6 +261,12 @@ object CaseWhen {
  }
}

/** Factory methods for CaseWhenCodegen. */
object CaseWhenCodegen {
we can remove this given the above comment
can we mark |
I don't think we can do that unless we "fix" Literal.
Now, the following are updated.
 */
case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case e @ CaseWhen(branches, elseCase) if branches.size < conf.maxCaseBranchesForCodegen =>
nit: the elseCase is not used
Thank you, @cloud-fan !
LGTM
Thank you for the review, @cloud-fan. It's fixed.
Test build #56210 has finished for PR 12353 at commit
Test build #56212 has finished for PR 12353 at commit
thanks! merging to master!
Thank you for merging, @cloud-fan! :)
Also, thank you so much for your direct guidance, @rxin.
What changes were proposed in this pull request?
We currently disable codegen for CaseWhen if the number of branches is greater than 20 (in CaseWhen.MAX_NUM_CASES_FOR_CODEGEN). It would be better if this value were a non-public config defined in SQLConf.
How was this patch tested?
Pass the Jenkins tests (including a new testcase, "Support spark.sql.codegen.maxCaseBranches option").