[SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or() #26712

Closed
wants to merge 12 commits into from


@amanomer
Contributor

amanomer commented Nov 29, 2019

What changes were proposed in this pull request?

This PR introduces a method, expressionWithAlias, in FunctionRegistry that registers a function's constructor together with the name it is registered under. Currently, expressionWithAlias is used to register BoolAnd and BoolOr.

Why are the changes needed?

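The error message names the wrong function when BoolAnd or BoolOr is called through an alias: it reports the canonical name (for example, bool_and) instead of the alias used in the query (for example, every).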

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested manually.

For query,
select every('true');

Output before this PR,

Error in query: cannot resolve 'bool_and('true')' due to data type mismatch: Input to function 'bool_and' should have been boolean, but it's [string].; line 1 pos 7;

After this PR,

Error in query: cannot resolve 'every('true')' due to data type mismatch: Input to function 'every' should have been boolean, but it's [string].; line 1 pos 7;

@amanomer

Contributor Author

amanomer commented Nov 29, 2019

@gatorsmile I have now handled BoolAnd and BoolOr. Kindly review the changes.

@amanomer

Contributor Author

amanomer commented Nov 29, 2019

@@ -118,7 +118,7 @@ class SimpleFunctionRegistry extends FunctionRegistry with Logging {
 throw new AnalysisException(s"undefined function $name")
 }
 }
-func(children)
+func(children).setFuncName(name.funcName)

@srowen

srowen Nov 29, 2019

Member

Hm, do functions not already have names somewhere to use, that can already be set differently per alias? it looks like that's what nodeName is for, and it's already overridden in the aliases, so I'm missing why this is different.

@amanomer

amanomer Nov 29, 2019

Author Contributor

do functions not already have names somewhere to use, that can already be set differently per alias?

I think not. Currently, when we use an alias of bool_and (i.e., every), it is resolved to the BoolAnd constructor through FunctionRegistry#expressions, and BoolAnd sets `bool_and` as its nodeName.

case class BoolAnd(arg: Expression) extends UnevaluableBooleanAggBase(arg) {
override def nodeName: String = "bool_and"
}
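
To illustrate the problem described above, here is a minimal, self-contained sketch. The `Expression`, `Literal`, `lookup` and `typeError` definitions are simplified stand-ins invented for this example; they are not Spark's real classes. The sketch only assumes what the thread states: several SQL names resolve to the same class, and that class hard-codes its nodeName.

```scala
// Simplified stand-ins for illustration only; NOT Spark's real classes.
object HardCodedNameSketch {
  trait Expression { def nodeName: String }

  case class Literal(value: Any) extends Expression {
    def nodeName: String = "literal"
  }

  // The name is fixed in the class, so every alias reports "bool_and".
  case class BoolAnd(arg: Expression) extends Expression {
    override def nodeName: String = "bool_and"
  }

  // Hypothetical lookup: both SQL names resolve to the same constructor,
  // and the name used in the query is thrown away.
  def lookup(name: String, arg: Expression): Expression = name match {
    case "every" | "bool_and" => BoolAnd(arg)
    case other => throw new IllegalArgumentException(s"undefined function $other")
  }

  // Mimics the shape of the type-mismatch message quoted in the PR description.
  def typeError(e: Expression): String =
    s"Input to function '${e.nodeName}' should have been boolean"
}
```

Calling `typeError(lookup("every", Literal("true")))` in this sketch yields a message that mentions bool_and, which mirrors the "before" output in the PR description.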

@srowen

srowen Nov 29, 2019

Member

This may not make sense, I don't know this code well, but: is it not that "Exists" needs to customize its nodeName, for example? or does it never exist as such a node.

@amanomer

amanomer Nov 30, 2019

Author Contributor

every here does not have any node. It will be resolved as a BoolAnd.

expression[BoolAnd]("every"),
expression[BoolAnd]("bool_and"),
expression[BoolOr]("any"),
expression[BoolOr]("some"),
expression[BoolOr]("bool_or"),

@srowen

srowen Nov 30, 2019

Member

I see. I don't know enough to evaluate the effect of changing nodeName for all implementations, which seems like a broader change than required, but it does make some sense. Maybe @liancheng or @yhuai has an opinion.

@amanomer

Contributor Author

amanomer commented Dec 2, 2019

@gatorsmile Kindly review this PR.

@@ -286,6 +286,15 @@ abstract class Expression extends TreeNode[Expression] {
override def simpleStringWithNodeId(): String = {
throw new UnsupportedOperationException(s"$nodeName does not implement simpleStringWithNodeId")
}

var functionAlias: String = ""

@cloud-fan

cloud-fan Dec 2, 2019

Contributor

It's a bad practice to make the Expression mutable for trivial things like this. How about using RuntimeReplaceable?

@amanomer

amanomer Dec 2, 2019

Author Contributor

Arguments' data types are checked in the analysis phase of planning, and it's the optimizer's task to replace RuntimeReplaceable expressions; correct me if I'm wrong.
So the optimization rule (ReplaceExpressions) will never be applied to queries like select every('true'), because they already fail during analysis.

@@ -53,7 +53,6 @@ abstract class UnevaluableBooleanAggBase(arg: Expression)
""",
since = "3.0.0")
case class BoolAnd(arg: Expression) extends UnevaluableBooleanAggBase(arg) {

@cloud-fan

cloud-fan Dec 2, 2019

Contributor

how about

case class BoolAnd(functionName: String, arg: Expression) ... with MultiNamesFunction {
  def nodeName = functionName
}

We can update FunctionRegistry.expression to detect MultiNamesFunction and pass name to the constructor.
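
A minimal sketch of the idea in this suggestion, assuming a simplified `Expression` trait and a hypothetical `Literal` class (neither is Spark's real API, and the MultiNamesFunction marker trait is left out). The point is only that the registered name becomes an immutable constructor argument instead of a mutable field on Expression.

```scala
// Simplified stand-ins for illustration only; NOT Spark's real classes.
object NamedConstructorSketch {
  trait Expression { def nodeName: String }

  case class Literal(value: Any) extends Expression {
    def nodeName: String = "literal"
  }

  // The SQL name travels with the instance, so no `var` is needed and
  // copies made by tree transformations keep the same name.
  case class BoolAnd(funcName: String, arg: Expression) extends Expression {
    override def nodeName: String = funcName
  }
}
```

With this shape, `BoolAnd("every", arg)` and `BoolAnd("bool_and", arg)` are the same kind of node but report different names.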

@amanomer

amanomer Dec 2, 2019

Author Contributor

Updated PR. cc @cloud-fan

@amanomer amanomer force-pushed the amanomer:29883 branch from bc02d23 to dbf6b4f Dec 2, 2019
@amanomer amanomer requested review from srowen and cloud-fan Dec 2, 2019
@cloud-fan

Contributor

cloud-fan commented Dec 3, 2019

ok to test

@@ -600,7 +600,8 @@ object FunctionRegistry {
 } else {
 // Otherwise, find a constructor method that matches the number of arguments, and use that.
 val params = Seq.fill(expressions.size)(classOf[Expression])
-val f = constructors.find(_.getParameterTypes.toSeq == params).getOrElse {
+val f = constructors.find(e => e.getParameterTypes.toSeq == params
+|| e.getParameterTypes.head == classOf[String]).getOrElse {

@cloud-fan

cloud-fan Dec 3, 2019

Contributor

Seems like it's less hacky to create a new expressionWithAlias method, with only the necessary logic

def expressionWithAlias ... = {
  val constructors = tag.runtimeClass.getConstructors
    .filter(c => c.getParameterTypes.head == classOf[String])
  assert(constructors.length == 1)
  try {
    constructors.head.newInstance(name, expressions : _*).asInstanceOf[Expression]
  } ...
}

@cloud-fan

cloud-fan Dec 3, 2019

Contributor

then we don't even need the MultiNamedExpression trait. We just need to register bool_and, bool_or with expressionWithAlias
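
The approach discussed here can be sketched end to end without Spark. Everything below is an assumption made for illustration: `Expression`, `Literal` and `FunctionBuilder` are simplified stand-ins, and the signature of `expressionWithAlias` is a guess at its general shape rather than the code that was merged. The constructor is found by reflection as the one whose first parameter is a String, and the registered name is prepended to the argument list.

```scala
import scala.reflect.ClassTag

// Simplified stand-ins for illustration only; NOT Spark's real classes.
object ExpressionWithAliasSketch {
  trait Expression { def nodeName: String }

  case class Literal(value: Any) extends Expression {
    def nodeName: String = "literal"
  }

  case class BoolAnd(funcName: String, arg: Expression) extends Expression {
    override def nodeName: String = funcName
  }

  type FunctionBuilder = Seq[Expression] => Expression

  def expressionWithAlias[T <: Expression](name: String)(
      implicit tag: ClassTag[T]): (String, FunctionBuilder) = {
    // Keep only constructors whose first parameter is the function name.
    val constructors = tag.runtimeClass.getConstructors
      .filter(c => c.getParameterTypes.headOption.exists(_ == classOf[String]))
    assert(constructors.length == 1, s"expected exactly one named constructor for $name")
    val builder: FunctionBuilder = (expressions: Seq[Expression]) => {
      // Prepend the registered name, then pass everything as one varargs sequence.
      val args: Seq[AnyRef] = name +: expressions
      constructors.head.newInstance(args: _*).asInstanceOf[Expression]
    }
    (name, builder)
  }
}
```

Registering the same class under two names then yields two builders, each producing nodes that remember the name they were registered with. The sketch leaves out argument validation and exception handling.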

@amanomer

amanomer Dec 3, 2019

Author Contributor

@cloud-fan updated as per your suggestions.

@SparkQA


SparkQA commented Dec 3, 2019

Test build #114761 has finished for PR 26712 at commit 3387eef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
.filter(_.getParameterTypes.head == classOf[String])
assert(constructors.length == 1)
val builder = (expressions: Seq[Expression]) => {
Try(constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression])

@cloud-fan

cloud-fan Dec 3, 2019

Contributor

seems better to use the normal try catch?

try {
  constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]
} catch {
  // the original comment ...
  case e => throw new AnalysisException(e.getCause.getMessage)
}

We can update def expression as well.
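
As a rough illustration of why the cause is unwrapped, the sketch below uses a hypothetical `Picky` class and an `AnalysisError` stand-in for Spark's AnalysisException (both invented for this example). When a constructor invoked through Java reflection throws, the exception arrives wrapped in an InvocationTargetException, so the useful message sits on `getCause`.

```scala
import java.lang.reflect.InvocationTargetException

object UnwrapCauseSketch {
  // Hypothetical stand-in for Spark's AnalysisException.
  class AnalysisError(msg: String) extends Exception(msg)

  // Hypothetical class whose constructor rejects some arguments.
  class Picky(arg: String) {
    require(arg.nonEmpty, "argument must not be empty")
  }

  def build(arg: String): Picky = {
    val ctor = classOf[Picky].getConstructor(classOf[String])
    try {
      ctor.newInstance(arg)
    } catch {
      // Reflection wraps whatever the constructor threw; rethrow with the
      // message of the underlying cause so the user sees the real problem.
      case e: InvocationTargetException =>
        throw new AnalysisError(e.getCause.getMessage)
    }
  }
}
```

This sketch matches only InvocationTargetException, which is narrower than the `case e =>` in the comment above.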

@amanomer

amanomer Dec 3, 2019

Author Contributor

Updated in latest commit.

@SparkQA


SparkQA commented Dec 3, 2019

Test build #114764 has finished for PR 26712 at commit 4476e7e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class BoolAnd(funcName: String, arg: Expression) extends UnevaluableBooleanAggBase(arg)
  • case class BoolOr(funcName: String, arg: Expression) extends UnevaluableBooleanAggBase(arg)
amanomer added 2 commits Dec 3, 2019
@SparkQA


SparkQA commented Dec 3, 2019

Test build #114786 has finished for PR 26712 at commit 77ad53f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA


SparkQA commented Dec 3, 2019

Test build #114787 has finished for PR 26712 at commit 404d829.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA


SparkQA commented Dec 3, 2019

Test build #114797 has finished for PR 26712 at commit 27f1a7f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
Member

HyukjinKwon left a comment

Looks good to me too if tests pass

@SparkQA


SparkQA commented Dec 4, 2019

Test build #114828 has finished for PR 26712 at commit 2952898.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan

Contributor

cloud-fan commented Dec 4, 2019

retest this please

@SparkQA


SparkQA commented Dec 4, 2019

Test build #114842 has finished for PR 26712 at commit 2952898.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA


SparkQA commented Dec 4, 2019

Test build #114859 has finished for PR 26712 at commit ab2e422.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA


SparkQA commented Dec 4, 2019

Test build #114875 has finished for PR 26712 at commit bb665e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
assert(constructors.length == 1)
val builder = (expressions: Seq[Expression]) => {
try {
constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]

@amanomer

amanomer Dec 5, 2019

Author Contributor

Since we are not validating the arguments, queries like
SELECT EVERY(true, false); will return true.

@amanomer

amanomer Dec 5, 2019

Author Contributor

Should we validate the arguments with an assert, or the way expression does?
cc @cloud-fan @HyukjinKwon

@cloud-fan

cloud-fan Dec 5, 2019

Contributor

how is it done in def expression?

@amanomer

amanomer Dec 5, 2019

Author Contributor

val params = Seq.fill(expressions.size)(classOf[Expression])
val f = constructors.find(_.getParameterTypes.toSeq == params).getOrElse {
val validParametersCount = constructors
.filter(_.getParameterTypes.forall(_ == classOf[Expression]))
.map(_.getParameterCount).distinct.sorted
val invalidArgumentsMsg = if (validParametersCount.length == 0) {
s"Invalid arguments for function $name"
} else {
val expectedNumberOfParameters = if (validParametersCount.length == 1) {
validParametersCount.head.toString
} else {
validParametersCount.init.mkString("one of ", ", ", " and ") +
validParametersCount.last
}
s"Invalid number of arguments for function $name. " +
s"Expected: $expectedNumberOfParameters; Found: ${params.length}"
}
throw new AnalysisException(invalidArgumentsMsg)
}
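
The "Expected: ..." part of the message built above can be isolated into a small function to see what it prints. The name `expected` is invented for this sketch, and the input is assumed to be non-empty and sorted, as it is in the quoted code, where the empty case is handled separately.

```scala
object ArityMessageSketch {
  // Formats the list of valid parameter counts the same way as the quoted code.
  // Precondition (assumed): validCounts is non-empty, distinct and sorted.
  def expected(validCounts: Seq[Int]): String =
    if (validCounts.length == 1) {
      validCounts.head.toString
    } else {
      validCounts.init.mkString("one of ", ", ", " and ") + validCounts.last
    }
}
```

For example, `expected(Seq(1))` gives `1`, `expected(Seq(1, 2))` gives `one of 1 and 2`, and `expected(Seq(1, 2, 3))` gives `one of 1, 2 and 3`.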

@amanomer

amanomer Dec 5, 2019

Author Contributor

Since expressionWithAlias always passes expressions.head to the function's constructor, we can use assert statements:

...
val builder = (expressions: Seq[Expression]) => {
      assert(expressions.size == 1,
        s"Invalid number of arguments for function $name. " +
          s"Expected: 1; Found: ${expressions.size}")
      assert(expressions.head.isInstanceOf[Expression],
        s"Invalid arguments for function $name")
      try {
        constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]
      }
...
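
For comparison, here is a sketch of the same argument-count check written with an explicit exception instead of assert. It is not the code from the PR: `Expression`, `Literal` and `buildUnary` are hypothetical stand-ins, and IllegalArgumentException stands in for Spark's AnalysisException. One reason to consider this form is that a failed Scala `assert` throws an AssertionError and assertions can be elided at compile time, while an explicit check always runs.

```scala
// Simplified stand-ins for illustration only; NOT Spark's real classes.
object ArityCheckSketch {
  trait Expression
  case class Literal(value: Any) extends Expression
  case class BoolAnd(funcName: String, arg: Expression) extends Expression

  // Without this check, taking expressions.head would silently drop any
  // extra arguments, e.g. the `false` in EVERY(true, false).
  def buildUnary(name: String, expressions: Seq[Expression]): Expression = {
    if (expressions.size != 1) {
      throw new IllegalArgumentException(
        s"Invalid number of arguments for function $name. " +
          s"Expected: 1; Found: ${expressions.size}")
    }
    BoolAnd(name, expressions.head)
  }
}
```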

@cloud-fan

cloud-fan Dec 5, 2019

Contributor

SGTM

@cloud-fan

cloud-fan Dec 5, 2019

Contributor

BTW is it possible to do newInstance(name.toString, expressions: _*)? Then it can work for other expressions that take more than 1 parameter.

@amanomer

amanomer Dec 5, 2019

Author Contributor

is it possible to do newInstance(name.toString, expressions: _*)?

No, compilation error.

@cloud-fan

cloud-fan Dec 5, 2019

Contributor

how about newInstance((name.toString +: expressions): _*)?

@amanomer

amanomer Dec 5, 2019

Author Contributor

It works. Updated in latest commit. cc @cloud-fan
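
The difference between the two call shapes in this exchange can be reproduced with any varargs method. In the sketch below, `describe` is a made-up stand-in for a Java-style varargs method such as Constructor.newInstance(Object...). As reported in the thread, mixing a positional argument with a `: _*` sequence for the same repeated parameter does not compile, whereas building one sequence first and expanding it does.

```scala
object VarargsSplatSketch {
  // Hypothetical stand-in for a varargs method like newInstance(Object...).
  def describe(args: AnyRef*): String = args.mkString("(", ", ", ")")

  def call(name: String, rest: Seq[AnyRef]): String = {
    // describe(name, rest: _*) is rejected by the compiler: a `: _*` argument
    // must supply the whole repeated parameter on its own.
    // Prepending the name gives a single Seq[AnyRef] that can be expanded.
    describe((name +: rest): _*)
  }
}
```

Because the whole argument list is expanded, the same call works for constructors that take more than one expression after the name.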

@amanomer

Contributor Author

amanomer commented Dec 5, 2019

There are other functions with alias names, for example VarianceSamp and StddevSamp. Should we change them in this PR?

@cloud-fan

Contributor

cloud-fan commented Dec 5, 2019

Should we change them in this PR?

I'm fine either way.

@amanomer

Contributor Author

amanomer commented Dec 5, 2019

Ok, I will handle them in a different PR.

@SparkQA


SparkQA commented Dec 5, 2019

Test build #114899 has finished for PR 26712 at commit 3c87222.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA


SparkQA commented Dec 5, 2019

Test build #114903 has finished for PR 26712 at commit 17c91f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@dongjoon-hyun dongjoon-hyun added the SQL label Dec 5, 2019
@amanomer

Contributor Author

amanomer commented Dec 6, 2019

@amanomer

Contributor Author

amanomer commented Dec 6, 2019

@SparkQA


SparkQA commented Dec 6, 2019

Test build #114940 has finished for PR 26712 at commit d07d261.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@amanomer amanomer changed the title [SPARK-29883][SQL] Improve error messages when function name is an alias [SPARK-29883][SQL] Improve error messages when bool_and() and bool_or() is called using alias Dec 7, 2019
@srowen

Member

srowen commented Dec 8, 2019

I'm OK with it if @cloud-fan is.

case Failure(e) =>
// the exception is an invocation exception. To get a meaningful message, we need the
// cause.
throw new AnalysisException(e.getCause.getMessage)

@dongjoon-hyun

dongjoon-hyun Dec 8, 2019

Member

Hi, @amanomer . I'm wondering if this change is required in this PR.

@amanomer

amanomer Dec 9, 2019

Author Contributor

This reformatting of the try-catch block can be raised in a different PR.

@srowen

srowen Dec 9, 2019

Member

FWIW I think this is fine and cleaner, so think it's OK to change here.

@amanomer

amanomer Dec 9, 2019

Author Contributor

I think there are similarly formatted try-catch blocks in other files too, which can be reformatted like this.

case Failure(e) =>
// the exception is an invocation exception. To get a meaningful message, we need the
// cause.
throw new AnalysisException(e.getCause.getMessage)

@dongjoon-hyun
@maropu
maropu approved these changes Dec 9, 2019
Member

maropu left a comment

Can you make the title clearer like Implement a helper method for aliasing registered functions?

@amanomer amanomer changed the title [SPARK-29883][SQL] Improve error messages when bool_and() and bool_or() is called using alias [SPARK-29883][SQL] IImplement a helper method for aliasing bool_and() and bool_or() Dec 9, 2019
@amanomer amanomer changed the title [SPARK-29883][SQL] IImplement a helper method for aliasing bool_and() and bool_or() [SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or() Dec 9, 2019
@cloud-fan cloud-fan closed this in dcea7a4 Dec 9, 2019
@cloud-fan

Contributor

cloud-fan commented Dec 9, 2019

thanks, merging to master!

@amanomer

Contributor Author

amanomer commented Dec 9, 2019

Thanks all for reviewing and merging

cloud-fan added a commit that referenced this pull request Dec 20, 2019
### What changes were proposed in this pull request?
This PR uses `expressionWithAlias` for the remaining functions that can be called by an alias name. The remaining functions are:
`Average, First, Last, ApproximatePercentile, StddevSamp, VarianceSamp`

PR #26712 introduced `expressionWithAlias`
### Why are the changes needed?
The error message is wrong when an alias name is used for the above-mentioned functions.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually

Closes #26808 from amanomer/fncAlias.

Lead-authored-by: Aman Omer <amanomer1996@gmail.com>
Co-authored-by: Aman Omer <40591404+amanomer@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
7 participants