
[SPARK-38997][SPARK-39037][SQL][FOLLOWUP] PushableColumnWithoutNestedColumn need be translated to predicate too #36776

Closed

Conversation


@beliefer beliefer commented Jun 6, 2022

What changes were proposed in this pull request?

#35768 assumed that the expressions inside And, Or and Not must be predicates.
#36370 and #36325 added support for pushing down expressions in GROUP BY and ORDER BY, so a child of And, Or or Not can now be a FieldReference.column(name).
FieldReference.column(name) is not a predicate, so the assert may fail.
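The failure mode can be sketched with a tiny standalone model (all types here, `FieldRef`, `V2Pred`, `ColumnAsPredicate` and `toPredicate`, are illustrative stand-ins, not Spark's actual connector API):

```scala
// Illustrative model only: a predicate node such as Not needs predicate
// children, but the extended push-down can hand it a bare column reference.
sealed trait V2Expr
case class FieldRef(name: String) extends V2Expr  // a column reference, NOT a predicate
sealed trait V2Pred extends V2Expr
case class NotPred(child: V2Pred) extends V2Pred
// Hypothetical wrapper standing in for "translate the column into a predicate".
case class ColumnAsPredicate(ref: FieldRef) extends V2Pred

// Asserting the child is already a predicate fails for a bare FieldRef;
// translating it first, as this followup does, avoids the failed assert.
def toPredicate(e: V2Expr): V2Pred = e match {
  case p: V2Pred   => p
  case r: FieldRef => ColumnAsPredicate(r)
}
```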

Why are the changes needed?

This PR fixes the bug for PushableColumnWithoutNestedColumn.

Does this PR introduce any user-facing change?

'Yes'.
It makes the push-down framework behave more correctly.

How was this patch tested?

New tests

@github-actions github-actions bot added the SQL label Jun 6, 2022
@beliefer beliefer changed the title [SPARK-38997][SPARK-39037][FOLLOWUP] PushableColumnWithoutNestedColumn need be translated to predicate too [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] PushableColumnWithoutNestedColumn need be translated to predicate too Jun 6, 2022

beliefer commented Jun 7, 2022

ping @huaxingao cc @cloud-fan

```diff
@@ -55,8 +55,13 @@ class V2ExpressionBuilder(
       } else {
         Some(FieldReference(name))
       }
-    case pushableColumn(name) if !nestedPredicatePushdownEnabled =>
-      Some(FieldReference.column(name))
+    case col @ pushableColumn(name) if !nestedPredicatePushdownEnabled =>
```

can we merge the code a bit more?

```scala
case col @ pushableColumn(name) =>
  val ref = if (nestedPredicatePushdownEnabled) ... else ...
  if (predicate) ... else ...
```
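Filled in, the merged shape might look like the following self-contained sketch; the extractor, the `predicate` flag and the `AsPredicate` wrapper are assumptions for illustration, not the real `V2ExpressionBuilder` code:

```scala
// Hedged sketch of the merged match, NOT Spark's actual V2ExpressionBuilder.
case class FieldReference(name: String)
object FieldReference {
  def column(name: String): FieldReference = FieldReference(name)
}
// Stand-in for Spark's pushable-column extractor.
object pushableColumn {
  def unapply(name: String): Option[String] = Some(name)
}
// Hypothetical wrapper turning a column reference into a predicate.
case class AsPredicate(ref: FieldReference)

def build(expr: String, nestedPredicatePushdownEnabled: Boolean,
          predicate: Boolean): Option[Any] = expr match {
  case pushableColumn(name) =>
    // Choose the reference form once, instead of two near-identical cases...
    val ref =
      if (nestedPredicatePushdownEnabled) FieldReference(name)
      else FieldReference.column(name)
    // ...then wrap it as a predicate only where a predicate is required.
    if (predicate) Some(AsPredicate(ref)) else Some(ref)
}
```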


+1


OK

@cloud-fan

thanks, merging to master/3.3!

@cloud-fan cloud-fan closed this in 125555c Jun 9, 2022
cloud-fan pushed a commit that referenced this pull request Jun 9, 2022
…Column` need be translated to predicate too

### What changes were proposed in this pull request?
#35768 assumed that the expressions inside `And`, `Or` and `Not` must be predicates.
#36370 and #36325 added support for pushing down expressions in `GROUP BY` and `ORDER BY`, so a child of `And`, `Or` or `Not` can now be a `FieldReference.column(name)`.
`FieldReference.column(name)` is not a predicate, so the assert may fail.

### Why are the changes needed?
This PR fixes the bug for `PushableColumnWithoutNestedColumn`.

### Does this PR introduce _any_ user-facing change?
'Yes'.
It makes the push-down framework behave more correctly.

### How was this patch tested?
New tests

Closes #36776 from beliefer/SPARK-38997_SPARK-39037_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 125555c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

beliefer commented Jun 9, 2022

@cloud-fan @huaxingao Thank you for the review.

chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 13, 2022
chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 15, 2022
chenzhx added a commit to Kyligence/spark that referenced this pull request Jun 15, 2022
…mal binary arithmetic (#481)

* [SPARK-39270][SQL] JDBC dialect supports registering dialect specific functions

### What changes were proposed in this pull request?
The built-in functions in Spark are not the same as those in JDBC databases.
We can give users the chance to register dialect-specific functions.

### Why are the changes needed?
JDBC dialect supports registering dialect specific functions

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests.

Closes apache#36649 from beliefer/SPARK-39270.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

### What changes were proposed in this pull request?
`JDBCV2Suite` contains some test cases in which SQL keywords are not capitalized.
This PR capitalizes the SQL keywords in `JDBCV2Suite`.

### Why are the changes needed?
Capitalize SQL keywords in `JDBCV2Suite`.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update test cases.

### How was this patch tested?
N/A.

Closes apache#36805 from beliefer/SPARK-39413.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] PushableColumnWithoutNestedColumn` need be translated to predicate too

### What changes were proposed in this pull request?
apache#35768 assumed that the expressions inside `And`, `Or` and `Not` must be predicates.
apache#36370 and apache#36325 added support for pushing down expressions in `GROUP BY` and `ORDER BY`, so a child of `And`, `Or` or `Not` can now be a `FieldReference.column(name)`.
`FieldReference.column(name)` is not a predicate, so the assert may fail.

### Why are the changes needed?
This PR fixes the bug for `PushableColumnWithoutNestedColumn`.

### Does this PR introduce _any_ user-facing change?
'Yes'.
It makes the push-down framework behave more correctly.

### How was this patch tested?
New tests

Closes apache#36776 from beliefer/SPARK-38997_SPARK-39037_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

### What changes were proposed in this pull request?

The main change:
- Add a new method `resultDecimalType` in `BinaryArithmetic`
- Add a new expression `DecimalAddNoOverflowCheck` for the internal decimal add used by e.g. `Sum`/`Average`; the differences from `Add` are:
  - `DecimalAddNoOverflowCheck` does not check overflow
  - `DecimalAddNoOverflowCheck` takes `dataType` as an input parameter
- Merge the decimal precision code of `DecimalPrecision` into each arithmetic data type, so every arithmetic expression reports the accurate decimal type, and remove the now-unused expression `PromotePrecision` and related code
- Merge `CheckOverflow` into the arithmetic eval and code-gen code paths, so every arithmetic expression can handle the overflow case at runtime

Merge `PromotePrecision` into `dataType`, for example, `Add`:
```scala
override def resultDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
  val resultScale = max(s1, s2)
  if (allowPrecisionLoss) {
    DecimalType.adjustPrecisionScale(max(p1 - s1, p2 - s2) + resultScale + 1,
      resultScale)
  } else {
    DecimalType.bounded(max(p1 - s1, p2 - s2) + resultScale + 1, resultScale)
  }
}
```
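As a worked example, the formula above can be re-computed standalone; this is a simplified sketch, since the real `adjustPrecisionScale` may also reduce the scale when the 38-digit maximum is exceeded:

```scala
// Simplified re-computation of Add's result decimal type per the formula above.
val MaxPrecision = 38  // Spark's DecimalType maximum precision

def addResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val resultScale = math.max(s1, s2)
  // one extra integral digit to absorb a possible carry
  val precision = math.max(p1 - s1, p2 - s2) + resultScale + 1
  (math.min(precision, MaxPrecision), resultScale)
}
```

For example, adding DECIMAL(28, 2) and DECIMAL(18, 2) yields DECIMAL(29, 2).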

Merge `CheckOverflow`, for example, `Add` eval:
```scala
dataType match {
  case decimalType: DecimalType =>
    val value = numeric.plus(input1, input2)
    checkOverflow(value.asInstanceOf[Decimal], decimalType)
  ...
}
```
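The runtime check can be illustrated with plain `BigDecimal`; this is a hedged sketch, as Spark's actual behavior also depends on the ANSI flag, which chooses between returning null and throwing:

```scala
// Illustrative overflow check: round to the target scale, then verify the
// value still fits in `precision` total digits; None models the overflow case.
def checkOverflow(value: BigDecimal, precision: Int, scale: Int): Option[BigDecimal] = {
  val rounded = value.setScale(scale, BigDecimal.RoundingMode.HALF_UP)
  if (rounded.precision > precision) None else Some(rounded)
}
```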

Note that `CheckOverflow` is still useful after this PR, e.g. in `RowEncoder`. We can do further cleanup in a separate PR.

### Why are the changes needed?

Fix the bug of `TypeCoercion`, for example:
```sql
SELECT CAST(1 AS DECIMAL(28, 2))
UNION ALL
SELECT CAST(1 AS DECIMAL(18, 2)) / CAST(1 AS DECIMAL(18, 2));
```

Relax the decimal precision at runtime, so we do not need a redundant Cast.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

Pass existing tests and add some bug-fix tests in `decimalArithmeticOperations.sql`

Closes apache#36698 from ulysses-you/decimal.

Lead-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* fix ut

Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
yabola pushed a commit to Kyligence/spark that referenced this pull request Jun 21, 2022
leejaywei pushed a commit to Kyligence/spark that referenced this pull request Jul 14, 2022
zheniantoushipashi pushed a commit to Kyligence/spark that referenced this pull request Aug 8, 2022
RolatZhang pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2023