[SPARK-38855][SQL] DS V2 supports push down math functions #36140

Closed (wants to merge 3 commits)
GeneralScalarExpression classdoc
@@ -94,6 +94,60 @@
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>ABS</code>
* <ul>
* <li>SQL semantic: <code>ABS(expr)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>COALESCE</code>
* <ul>
* <li>SQL semantic: <code>COALESCE(expr1, expr2)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>LN</code>
* <ul>
* <li>SQL semantic: <code>LN(expr)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>EXP</code>
* <ul>
* <li>SQL semantic: <code>EXP(expr)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>POWER</code>
* <ul>
* <li>SQL semantic: <code>POWER(expr, number)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>SQRT</code>
* <ul>
* <li>SQL semantic: <code>SQRT(expr)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>FLOOR</code>
* <ul>
* <li>SQL semantic: <code>FLOOR(expr)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>CEIL</code>
* <ul>
* <li>SQL semantic: <code>CEIL(expr)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* <li>Name: <code>WIDTH_BUCKET</code>
* <ul>
* <li>SQL semantic: <code>WIDTH_BUCKET(expr, min, max, numBuckets)</code></li>
* <li>Since version: 3.3.0</li>
* </ul>
* </li>
* </ol>
* Note: SQL semantic conforms to the ANSI standard, so some expressions are not supported when ANSI mode is off,
* including: add, subtract, multiply, divide, remainder, pmod.
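
For reference, a minimal sketch (not part of this diff) of how one of the newly documented functions looks as a connector-side expression tree; the column name salary and the literal are illustrative only:

import org.apache.spark.sql.connector.expressions.{Expression, FieldReference, GeneralScalarExpression, LiteralValue}
import org.apache.spark.sql.types.DoubleType

// POWER(salary, 2.0) as a V2 expression: the name must match one of the names
// documented in the classdoc above, and the children carry the arguments in order.
val power = new GeneralScalarExpression("POWER",
  Array[Expression](FieldReference("salary"), LiteralValue(2.0d, DoubleType)))
// power.name() returns "POWER"; power.children() has two elements.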
@@ -95,6 +95,13 @@ public String build(Expression expr) {
return visitUnaryArithmetic(name, inputToSQL(e.children()[0]));
case "ABS":
case "COALESCE":
case "LN":
case "EXP":
case "POWER":
case "SQRT":
case "FLOOR":
case "CEIL":
case "WIDTH_BUCKET":
Review comment from @cloud-fan (Contributor), Apr 12, 2022:
Are they in the SQL standard or widely supported by many databases?

Reply from the PR author (Contributor):
Yes, they are in the SQL standard and widely supported by many databases.
I updated the PR description to list these mainstream databases.

return visitSQLFunction(name,
Arrays.stream(e.children()).map(c -> build(c)).toArray(String[]::new));
case "CASE_WHEN": {
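To make the new cases above concrete: they all fall through to visitSQLFunction, which in its default form is expected to emit the function name verbatim and join the already-compiled children with commas. A minimal sketch of that contract, written here as a standalone helper rather than the actual implementation:

// Sketch only: mirrors the expected output shape of the default visitSQLFunction.
def visitSQLFunctionSketch(funcName: String, inputs: Array[String]): String =
  inputs.mkString(s"$funcName(", ", ", ")")

// visitSQLFunctionSketch("POWER", Array("SALARY", "2.0")) == "POWER(SALARY, 2.0)"
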
@@ -2380,4 +2380,8 @@ object QueryCompilationErrors {
new AnalysisException(
"Sinks cannot request distribution and ordering in continuous execution mode")
}

def noSuchFunctionError(database: String, funcInfo: String): Throwable = {
new AnalysisException(s"$database does not support function: $funcInfo")
}
}
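
A usage sketch for the new helper (the function string is illustrative, and the call is assumed to happen from code inside Spark's own sql modules, as H2Dialect does below):

import org.apache.spark.sql.errors.QueryCompilationErrors

// The resulting AnalysisException carries the message:
//   "H2 does not support function: WIDTH_BUCKET(DEPT, 1, 6, 3)"
val err = QueryCompilationErrors.noSuchFunctionError("H2", "WIDTH_BUCKET(DEPT, 1, 6, 3)")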
@@ -17,7 +17,7 @@

package org.apache.spark.sql.catalyst.util

import org.apache.spark.sql.catalyst.expressions.{Abs, Add, And, BinaryComparison, BinaryOperator, BitwiseAnd, BitwiseNot, BitwiseOr, BitwiseXor, CaseWhen, Cast, Coalesce, Contains, Divide, EndsWith, EqualTo, Expression, In, InSet, IsNotNull, IsNull, Literal, Multiply, Not, Or, Predicate, Remainder, StartsWith, StringPredicate, Subtract, UnaryMinus}
import org.apache.spark.sql.catalyst.expressions.{Abs, Add, And, BinaryComparison, BinaryOperator, BitwiseAnd, BitwiseNot, BitwiseOr, BitwiseXor, CaseWhen, Cast, Ceil, Coalesce, Contains, Divide, EndsWith, EqualTo, Exp, Expression, Floor, In, InSet, IsNotNull, IsNull, Literal, Log, Multiply, Not, Or, Pow, Predicate, Remainder, Sqrt, StartsWith, StringPredicate, Subtract, UnaryMinus, WidthBucket}
import org.apache.spark.sql.connector.expressions.{Cast => V2Cast, Expression => V2Expression, FieldReference, GeneralScalarExpression, LiteralValue}
import org.apache.spark.sql.connector.expressions.filter.{AlwaysFalse, AlwaysTrue, And => V2And, Not => V2Not, Or => V2Or, Predicate => V2Predicate}
import org.apache.spark.sql.execution.datasources.PushableColumn
@@ -104,6 +104,32 @@ class V2ExpressionBuilder(
} else {
None
}
case Log(child) => generateExpression(child)
.map(v => new GeneralScalarExpression("LN", Array[V2Expression](v)))
Review comment from @cloud-fan (Contributor), Apr 12, 2022:
Let's document the newly supported functions in the classdoc of GeneralScalarExpression.

Reply from the PR author (Contributor):
OK

case Exp(child) => generateExpression(child)
.map(v => new GeneralScalarExpression("EXP", Array[V2Expression](v)))
case Pow(left, right) =>
val l = generateExpression(left)
val r = generateExpression(right)
if (l.isDefined && r.isDefined) {
Some(new GeneralScalarExpression("POWER", Array[V2Expression](l.get, r.get)))
} else {
None
}
case Sqrt(child) => generateExpression(child)
.map(v => new GeneralScalarExpression("SQRT", Array[V2Expression](v)))
case Floor(child) => generateExpression(child)
.map(v => new GeneralScalarExpression("FLOOR", Array[V2Expression](v)))
case Ceil(child) => generateExpression(child)
.map(v => new GeneralScalarExpression("CEIL", Array[V2Expression](v)))
case wb: WidthBucket =>
val childrenExpressions = wb.children.flatMap(generateExpression(_))
if (childrenExpressions.length == wb.children.length) {
Some(new GeneralScalarExpression("WIDTH_BUCKET",
childrenExpressions.toArray[V2Expression]))
} else {
None
}
case and: And =>
// AND expects predicate
val l = generateExpression(and.left, true)
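The Pow and WidthBucket cases above share the same guard: the expression is pushed down only if every child can be translated, otherwise the builder returns None and the expression stays in Spark. A standalone sketch of that pattern (hypothetical helper, not part of the diff):

import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, GeneralScalarExpression}

// Build a V2 function call only when all children translated successfully;
// otherwise return None so the whole expression is evaluated by Spark.
def buildIfAllDefined(
    name: String,
    children: Seq[Option[V2Expression]]): Option[GeneralScalarExpression] = {
  if (children.forall(_.isDefined)) {
    Some(new GeneralScalarExpression(name, children.map(_.get).toArray[V2Expression]))
  } else {
    None
  }
}
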
26 changes: 26 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
@@ -20,14 +20,40 @@ package org.apache.spark.sql.jdbc
import java.sql.SQLException
import java.util.Locale

import scala.util.control.NonFatal

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.analysis.{NoSuchNamespaceException, NoSuchTableException, TableAlreadyExistsException}
import org.apache.spark.sql.connector.expressions.Expression
import org.apache.spark.sql.connector.expressions.aggregate.{AggregateFunc, GeneralAggregateFunc}
import org.apache.spark.sql.errors.QueryCompilationErrors

private object H2Dialect extends JdbcDialect {
override def canHandle(url: String): Boolean =
url.toLowerCase(Locale.ROOT).startsWith("jdbc:h2")

class H2SQLBuilder extends JDBCSQLBuilder {
override def visitSQLFunction(funcName: String, inputs: Array[String]): String = {
funcName match {
case "WIDTH_BUCKET" =>
val functionInfo = super.visitSQLFunction(funcName, inputs)
throw QueryCompilationErrors.noSuchFunctionError("H2", functionInfo)
case _ => super.visitSQLFunction(funcName, inputs)
}
}
}

override def compileExpression(expr: Expression): Option[String] = {
val h2SQLBuilder = new H2SQLBuilder()
try {
Some(h2SQLBuilder.build(expr))
} catch {
case NonFatal(e) =>
logWarning("Error occurs while compiling V2 expression", e)
None
}
}

override def compileAggregate(aggFunction: AggregateFunc): Option[String] = {
super.compileAggregate(aggFunction).orElse(
aggFunction match {
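The H2SQLBuilder hook above rejects WIDTH_BUCKET outright; the same hook can also rename a function for an engine that spells it differently. A hypothetical sketch (the dialect, URL prefix, and the CEIL-to-CEILING mapping are all invented for illustration):

import java.util.Locale

import scala.util.control.NonFatal

import org.apache.spark.sql.connector.expressions.Expression
import org.apache.spark.sql.jdbc.JdbcDialect

private object FictionalDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:fictional")

  class FictionalSQLBuilder extends JDBCSQLBuilder {
    override def visitSQLFunction(funcName: String, inputs: Array[String]): String =
      funcName match {
        case "CEIL" => super.visitSQLFunction("CEILING", inputs)
        case _ => super.visitSQLFunction(funcName, inputs)
      }
  }

  override def compileExpression(expr: Expression): Option[String] = {
    try {
      Some(new FictionalSQLBuilder().build(expr))
    } catch {
      // Compilation failure means "do not push down"; Spark evaluates the expression itself.
      case NonFatal(_) => None
    }
  }
}
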
@@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.analysis.CannotReplaceMissingTableException
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, GlobalLimit, LocalLimit, Sort}
import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2ScanRelation, V1ScanWrapper}
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
import org.apache.spark.sql.functions.{abs, avg, coalesce, count, count_distinct, lit, not, sum, udf, when}
import org.apache.spark.sql.functions.{abs, avg, ceil, coalesce, count, count_distinct, exp, floor, lit, log => ln, not, pow, sqrt, sum, udf, when}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.util.Utils
@@ -464,6 +464,32 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHelper
checkPushedInfo(df5, expectedPlanFragment5)
checkAnswer(df5, Seq(Row(1, "amy", 10000, 1000, true),
Row(1, "cathy", 9000, 1200, false), Row(6, "jen", 12000, 1200, true)))

val df6 = spark.table("h2.test.employee")
.filter(ln($"dept") > 1)
.filter(exp($"salary") > 2000)
.filter(pow($"dept", 2) > 4)
.filter(sqrt($"salary") > 100)
.filter(floor($"dept") > 1)
.filter(ceil($"dept") > 1)
checkFiltersRemoved(df6, ansiMode)
val expectedPlanFragment6 = if (ansiMode) {
"PushedFilters: [DEPT IS NOT NULL, SALARY IS NOT NULL, " +
"LN(CAST(DEPT AS double)) > 1.0, EXP(CAST(SALARY AS double)...,"
} else {
"PushedFilters: [DEPT IS NOT NULL, SALARY IS NOT NULL]"
}
checkPushedInfo(df6, expectedPlanFragment6)
checkAnswer(df6, Seq(Row(6, "jen", 12000, 1200, true)))

// H2 does not support width_bucket
val df7 = sql("""
|SELECT * FROM h2.test.employee
|WHERE width_bucket(dept, 1, 6, 3) > 1
|""".stripMargin)
checkFiltersRemoved(df7, false)
checkPushedInfo(df7, "PushedFilters: [DEPT IS NOT NULL]")
checkAnswer(df7, Seq(Row(6, "jen", 12000, 1200, true)))
}
}
}
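Outside the suite's helpers, the same push-down can be observed directly; a usage sketch (assumes a SparkSession named spark with the H2 catalog registered as in this suite):

import org.apache.spark.sql.functions.{col, sqrt}

val df = spark.table("h2.test.employee").filter(sqrt(col("salary")) > 100)
df.explain(true)
// With ANSI mode enabled, the scan node reports the translated filter, roughly:
//   PushedFilters: [SALARY IS NOT NULL, SQRT(CAST(SALARY AS double)) > 100.0]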