Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option #12353

Closed
wants to merge 5 commits into from
Closed

[SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option #12353

wants to merge 5 commits into from

Conversation

dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

We currently disable codegen for CaseWhen if the number of branches is greater than 20 (in CaseWhen.MAX_NUM_CASES_FOR_CODEGEN). It would be better if this value is a non-public config defined in SQLConf.

How was this patch tested?

Pass the Jenkins tests (including a new testcase Support spark.sql.codegen.maxCaseBranches option)

@SparkQA
Copy link

SparkQA commented Apr 13, 2016

Test build #55701 has finished for PR 12353 at commit 9a83c7f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -52,6 +52,8 @@ case class ExprCode(var code: String, var isNull: String, var value: String)
*/
class CodegenContext {

var conf: CatalystConf = null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really hacky -- i'd put this in the constructor and make it a val rather than a var. and maybe we can create a CodegenConf instead of reusing CatalystConf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, Sure. I'll change those things.

@SparkQA
Copy link

SparkQA commented Apr 13, 2016

Test build #55702 has finished for PR 12353 at commit 6295de5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 13, 2016

Test build #55705 has finished for PR 12353 at commit 61b2dc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SimpleCodegenConf(maxCaseBranches: Int = 20) extends CodegenConf
    • class CodegenContext(codegenConf: CodegenConf)

@@ -620,7 +622,7 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
* expressions that don't support codegen
*/
def newCodeGenContext(): CodegenContext = {
new CodegenContext
new CodegenContext(new SimpleCodegenConf)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.
CodegenContext had better be a parameter.
I will fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked the purpose of this function. It's for expressions that don't support codegen. I replaced SimpleCodegenConf with EmptyCodegenConf.

/**
   * Create a new codegen context for expression evaluator, used to store those
   * expressions that don't support codegen
   */

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option [WIP][SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option Apr 13, 2016

package org.apache.spark.sql.catalyst.expressions.codegen

private[spark] trait CodegenConf {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this can just be a class with a list of options, e.g.

class CodegenConf(val maxCaseBranches: Int)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current relationship between SQLConf and catalyst module confs (CatalystConf, CodegenConf) uses trait. Should we change that manner consistently (including CatalystConf)? After this PR, we might have more conf classes in catalyst module in the future.

-private[sql] class SQLConf extends Serializable with CatalystConf with Logging {
+private[sql] class SQLConf extends Serializable with CatalystConf with CodegenConf with Logging {

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option [SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option Apr 13, 2016
@SparkQA
Copy link

SparkQA commented Apr 13, 2016

Test build #55740 has finished for PR 12353 at commit 2d7a509.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @rxin .
Now, this PR is ready to be reviewed.
Could you review this please?

@dongjoon-hyun
Copy link
Member Author

Rebased to resolve conflicts.

@SparkQA
Copy link

SparkQA commented Apr 14, 2016

Test build #55808 has finished for PR 12353 at commit fe8510a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Rebased to resolve conflicts.

@SparkQA
Copy link

SparkQA commented Apr 15, 2016

Test build #55943 has finished for PR 12353 at commit 6842e68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Apr 16, 2016

@dongjoon-hyun what's your idea on how to update this?

@dongjoon-hyun
Copy link
Member Author

Oh, sorry. I didn't say explicitly. I thought updating private[spark] trait CodegenConf into trait CodegenConf because CatalystConf changed like that.

@dongjoon-hyun
Copy link
Member Author

Actually, it's not needed.
If there are some missing things to do, could you give me some advice?

@@ -305,7 +307,7 @@ case class WholeStageCodegen(child: SparkPlan) extends UnaryNode with CodegenSup
* @return the tuple of the codegen context and the actual generated source.
*/
def doCodeGen(): (CodegenContext, String) = {
val ctx = new CodegenContext
val ctx = new CodegenContext(SQLContext.getActive().get.conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a problem on non-local mode? SQLContext.getActive is not available on the executors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I haven't test it non-local mode. If then, is there any way to access the configuration of SQLContext in executors?

@rxin
Copy link
Contributor

rxin commented Apr 18, 2016

OK I think about this more. Actually to make this really work, we should just create two expressions, one for codegen version and the other for interpreted (default). And in the optimizer we switch to the codegen version if the number of branches is less than x.

cc @davies

@dongjoon-hyun
Copy link
Member Author

I see. Thank you for the solution!

@dongjoon-hyun
Copy link
Member Author

For optimizer, may I implement this in SimplifyConditionals?
Or, should I create another one like CaseWhenCodegen?

@rxin
Copy link
Contributor

rxin commented Apr 18, 2016

Might be best to create a OptimizeCodegen rule as the very last batch. We can add other things to that rule in the future.

@dongjoon-hyun
Copy link
Member Author

Thanks. I will create OptimizeCodegen then.

@dongjoon-hyun
Copy link
Member Author

Hi, @rxin . Now, the PR is updated in the following ways.

  1. CaseWhen split into three classes:
    • CaseWhenBase: abstract base class
    • CaseWhen: codegen fallback
    • CaseWhenCodegen: codegen
  2. OptimizeCodegen optimizer added:
    • Replace CaseWhen into CaseWhenCodegen if branches.size < conf.maxCasesBranches.
  3. CodegenConf is removed and CatalystConf has maxCasesBranches. (SQLConf does not changed.)
    • Since Optimizer uses that configuration, I think CatalystConf becomes more proper place.

How do you think about item 3?

@SparkQA
Copy link

SparkQA commented Apr 19, 2016

Test build #56191 has finished for PR 12353 at commit a9294bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -29,6 +29,7 @@ trait CatalystConf {
def groupByOrdinal: Boolean

def optimizerMaxIterations: Int
def maxCaseBranches: Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxCaseBranchesForCodegen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for quick review. Sure. And also maxCaseBranchesForCodegen in SQLConf.scala.

case class CaseWhen(
val branches: Seq[(Expression, Expression)],
val elseValue: Option[Expression] = None)
extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just have a toCodegen function that creates CaseWhenCodegen?

We can then remove object CaseWhenCodegen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be right. CaseWhenCodegen is always generated from CaseWhen.

@rxin
Copy link
Contributor

rxin commented Apr 19, 2016

cc @cloud-fan this change actually makes your other thing easier i think.

@@ -242,6 +261,12 @@ object CaseWhen {
}
}

/** Factory methods for CaseWhenCodegen. */
object CaseWhenCodegen {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this given the above comment

@cloud-fan
Copy link
Contributor

can we mark CodegenFallback.doCodeGen as final? so that we can guarantee that expressions implement CodegenFallback will always fallback.

@rxin
Copy link
Contributor

rxin commented Apr 19, 2016

I don't think we can do that unless we "fix" Literal.

@dongjoon-hyun
Copy link
Member Author

Now, the followings are updated.

  • maxCaseBranches is renamed to maxCaseBranchesForCodegen.
  • object CaseWhenCodegen is removed.
  • CaseWhen has 'toCodegen` function.
  • 3 testcases added: nested CaseWhen, multiple CaseWhen in one operator, multiple CaseWhen in multiple operators.

*/
case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
case e @ CaseWhen(branches, elseCase) if branches.size < conf.maxCaseBranchesForCodegen =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the elseCase is not used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @cloud-fan !

@cloud-fan
Copy link
Contributor

LGTM

@dongjoon-hyun
Copy link
Member Author

Thank you for review, @cloud-fan . It's fixed.

@SparkQA
Copy link

SparkQA commented Apr 19, 2016

Test build #56210 has finished for PR 12353 at commit 502bc61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2016

Test build #56212 has finished for PR 12353 at commit 9a2340c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks! merging to master!

@asfgit asfgit closed this in 3d46d79 Apr 19, 2016
@dongjoon-hyun
Copy link
Member Author

Thank you for merging, @cloud-fan ! :)

@dongjoon-hyun
Copy link
Member Author

Also, thank you so much for your direct guidance, @rxin .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants