
[SPARK-38997][SQL] DS V2 aggregate push-down supports group by expressions #36325

Closed
wants to merge 9 commits

Conversation

Contributor

@beliefer beliefer commented Apr 22, 2022

What changes were proposed in this pull request?

Currently, Spark DS V2 aggregate push-down only supports grouping by columns,
but SQL like the statement below is very useful and common.

```
SELECT
  CASE
    WHEN "SALARY" > 8000.00
      AND "SALARY" < 10000.00
    THEN "SALARY"
    ELSE 0.00
  END AS key,
  SUM("SALARY")
FROM "test"."employee"
GROUP BY key
```

Why are the changes needed?

Let DS V2 aggregate push-down support group by expressions.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New tests

@github-actions github-actions bot added the SQL label Apr 22, 2022
@beliefer
Contributor Author

ping @huaxingao cc @cloud-fan

@@ -106,7 +107,9 @@ object AggregatePushDownUtils {
// aggregate push down simple and don't handle this complicate case for now.
return None
}
- aggregation.groupByColumns.foreach { col =>
+ aggregation.groupByExpressions.foreach { expr =>
+   assert(expr.isInstanceOf[FieldReference])
Contributor

How can we make this assertion? Do you mean the v2 pushdown framework still only supports group by attrs?

@@ -206,14 +213,15 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper wit
DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
if (r.supportCompletePushDown(pushedAggregates.get)) {
val projectExpressions = finalResultExpressions.map { expr =>
// TODO At present, only push down group by attribute is supported.
// In future, more attribute conversion is extended here. e.g. GetStructField
expr.transform {
Contributor

Let's explicitly use transformDown, as we need to match an expression tree for the group by expression.

Contributor Author

OK

"""
|PushedAggregates: [SUM(SALARY)],
|PushedFilters: [],
|PushedGroupByColumns:
Contributor

Suggested change
|PushedGroupByColumns:
|PushedGroupByExpressions:

@@ -106,7 +107,9 @@ object AggregatePushDownUtils {
// aggregate push down simple and don't handle this complicate case for now.
return None
}
- aggregation.groupByColumns.foreach { col =>
+ aggregation.groupByExpressions.foreach { expr =>
+   if (!expr.isInstanceOf[FieldReference]) return None
Contributor
@cloud-fan cloud-fan Apr 26, 2022

nit:

val colName = expr match {
  case f: FieldReference if f.fieldNames.length == 1 => Some(f.fieldNames.head)
  case _ => None
}
if (colName.isEmpty || !isPartitionCol(colName.get)) return None
finalSchema = ...

- val groupByColNames = aggregation.groupByColumns.map(_.fieldNames.head)
+ val groupByColNames =
+   aggregation.groupByExpressions.map { expr =>
+     assert(expr.isInstanceOf[FieldReference])
Contributor

Ditto; we can even have a common function to extract the column name:

private def extractColName(v2Expr: V2Expression): Option[String] = ...
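
A minimal sketch of such a helper, combining the single-part-name check from the nit above with the signature suggested here (illustrative, assuming the file's existing imports with `Expression` aliased as `V2Expression`; the merged code may differ):

```scala
// Sketch: map a V2 expression back to a plain column name, if possible.
// Only a top-level FieldReference with a single name part qualifies.
private def extractColName(v2Expr: V2Expression): Option[String] = v2Expr match {
  case f: FieldReference if f.fieldNames.length == 1 => Some(f.fieldNames.head)
  case _ => None
}
```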

- case PushableColumnWithoutNestedColumn(name) =>
-   Some(FieldReference.column(name).asInstanceOf[FieldReference])
+ def translateGroupBy(e: Expression): Option[V2Expression] = e match {
+   case PushableExpression(expr) => Some(expr)
Contributor

To confirm: this supports nested columns, right?

Contributor Author

PushableExpression uses PushableColumnWithoutNestedColumn by default.

case (expr, b) =>
if (!groupByExprToOutputOrdinal.contains(expr.canonicalized)) {
groupByExprToOutputOrdinal(expr.canonicalized) = ordinal
ordinal += 1
Contributor

This looks incorrect. We use the ordinal to access groupAttrs, but here we skip incrementing the ordinal if the group by expression is an attribute. I think we should use normalizedGroupingExpressions.zip(newOutput).zipWithIndex.

BTW, do we have a test for mixed group by attr and group by expr?

Contributor Author

Thank you. I forgot that case.
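
One way to realize the suggestion above, deriving the ordinal from the zipped position instead of a hand-maintained counter (a sketch based on the names in the diff and comment, not necessarily the merged code):

```scala
// Every grouping expression gets an ordinal equal to its position in the
// scan output, so attribute-only grouping keys cannot desynchronize the map.
normalizedGroupingExpressions.zip(newOutput).zipWithIndex.foreach {
  case ((expr, _), ordinal) =>
    if (!groupByExprToOutputOrdinal.contains(expr.canonicalized)) {
      groupByExprToOutputOrdinal(expr.canonicalized) = ordinal
    }
}
```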

@@ -106,11 +107,11 @@ object AggregatePushDownUtils {
// aggregate push down simple and don't handle this complicate case for now.
return None
}
- aggregation.groupByColumns.foreach { col =>
+ aggregation.groupByExpressions.flatMap(extractColName).foreach { fieldName =>
Contributor

We should return None earlier if any of the group by expressions is not a column.

Contributor
@cloud-fan cloud-fan left a comment

LGTM except for one issue

@cloud-fan
Contributor
Contributor

thanks, merging to master/3.3! (I believe this is the last API-breaking change for the DS v2 pushdown API)

@cloud-fan cloud-fan closed this in ee6ea3c Apr 27, 2022
cloud-fan pushed a commit that referenced this pull request Apr 27, 2022
[SPARK-38997][SQL] DS V2 aggregate push-down supports group by expressions

### What changes were proposed in this pull request?
Currently, Spark DS V2 aggregate push-down only supports grouping by columns,
but SQL like the statement below is very useful and common.
```
SELECT
  CASE
    WHEN "SALARY" > 8000.00
      AND "SALARY" < 10000.00
    THEN "SALARY"
    ELSE 0.00
  END AS key,
  SUM("SALARY")
FROM "test"."employee"
GROUP BY key
```

### Why are the changes needed?
Let DS V2 aggregate push-down support group by expressions.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests

Closes #36325 from beliefer/SPARK-38997.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ee6ea3c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@beliefer
Contributor Author
Contributor Author

@cloud-fan Thank you for the help!

chenzhx pushed a commit to chenzhx/spark that referenced this pull request May 17, 2022
chenzhx pushed a commit to chenzhx/spark that referenced this pull request May 27, 2022
chenzhx added a commit to Kyligence/spark that referenced this pull request May 31, 2022
* [SPARK-38997][SQL] DS V2 aggregate push-down supports group by expressions

### What changes were proposed in this pull request?
Currently, Spark DS V2 aggregate push-down only supports grouping by columns,
but SQL like the statement below is very useful and common.
```
SELECT
  CASE
    WHEN "SALARY" > 8000.00
      AND "SALARY" < 10000.00
    THEN "SALARY"
    ELSE 0.00
  END AS key,
  SUM("SALARY")
FROM "test"."employee"
GROUP BY key
```

### Why are the changes needed?
Let DS V2 aggregate push-down support group by expressions.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests

Closes apache#36325 from beliefer/SPARK-38997.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

fix ut

* [SPARK-39135][SQL] DS V2 aggregate partial push-down should supports group by without aggregate functions

### What changes were proposed in this pull request?
Currently, the SQL shown below is not supported by DS V2 aggregate partial push-down:
`select key from tab group by key`

### Why are the changes needed?
Make DS V2 aggregate partial push-down support group by without aggregate functions.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests

Closes apache#36492 from beliefer/SPARK-39135.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
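
As a user-visible illustration of this change, a grouped query with no aggregate functions can now be partially pushed to the source (the `h2` catalog and `test.employee` table mirror the test fixtures quoted elsewhere in this thread; the explain output is indicative only):

```scala
// Assuming a session with a JDBC catalog named "h2", as in JDBCV2Suite.
val df = spark.sql("SELECT DEPT FROM h2.test.employee GROUP BY DEPT")
df.explain()
// The scan should now report the grouping push-down, e.g.
// PushedAggregates: [], PushedGroupByExpressions: [DEPT]
```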

* [SPARK-39157][SQL] H2Dialect should override getJDBCType so as make the data type is correct

### What changes were proposed in this pull request?
Currently, `H2Dialect` does not implement `getJDBCType` of `JdbcDialect`, so DS V2 push-down throws the exception shown below:
```
Job aborted due to stage failure: Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 13) (jiaan-gengdembp executor driver):
 org.h2.jdbc.JdbcSQLNonTransientException: Unknown data type: "STRING"; SQL statement:
SELECT "DEPT","NAME","SALARY","BONUS","IS_MANAGER" FROM "test"."employee"  WHERE ("BONUS" IS NOT NULL) AND ("DEPT" IS NOT NULL) AND (CAST("BONUS" AS string) LIKE '%30%') AND (CAST("DEPT" AS byte) > 1) AND (CAST("DEPT" AS short) > 1) AND (CAST("BONUS" AS decimal(20,2)) > 1200.00)    [50004-210]
```
H2Dialect should implement `getJDBCType` of `JdbcDialect`.

### Why are the changes needed?
Make the H2 data types correct.

### Does this PR introduce _any_ user-facing change?
'Yes'.
Fix a bug for `H2Dialect`.

### How was this patch tested?
New tests.

Closes apache#36516 from beliefer/SPARK-39157.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
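
A sketch of the kind of override the fix calls for, mapping Catalyst types to explicit H2 type names inside the dialect (the concrete mappings are illustrative assumptions, not necessarily the merged `H2Dialect` code):

```scala
import java.sql.Types
import org.apache.spark.sql.jdbc.JdbcType
import org.apache.spark.sql.types._

// Inside an H2-style JdbcDialect implementation:
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case StringType           => Some(JdbcType("CLOB", Types.CLOB))
  case BooleanType          => Some(JdbcType("BOOLEAN", Types.BOOLEAN))
  case ShortType | ByteType => Some(JdbcType("SMALLINT", Types.SMALLINT))
  case t: DecimalType =>
    Some(JdbcType(s"NUMERIC(${t.precision},${t.scale})", Types.NUMERIC))
  case _ => None // fall back to the common JDBC type mapping
}
```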

* [SPARK-39162][SQL] Jdbc dialect should decide which function could be pushed down

### What changes were proposed in this pull request?
Regardless of whether the functions are ANSI or not, support varies across databases.
So we should add a new API to `JdbcDialect` so that each JDBC dialect decides which functions can be pushed down.

### Why are the changes needed?
Make function push-down more flexible.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
Existing tests.

Closes apache#36521 from beliefer/SPARK-39162.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>
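
A sketch of how a dialect might opt in under such an API (the `isSupportedFunction` hook and its signature are assumptions based on this description; check the merged patch for the exact name):

```scala
import org.apache.spark.sql.jdbc.JdbcDialect

// Hypothetical dialect that only allows a known-safe set of functions
// to be compiled into its SQL; everything else stays in Spark.
case object MyDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

  private val supportedFunctions = Set("ABS", "COALESCE", "UPPER", "LOWER")

  // Assumed hook: the push-down framework asks the dialect before
  // translating a function call into dialect SQL.
  override def isSupportedFunction(funcName: String): Boolean =
    supportedFunctions.contains(funcName)
}
```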

* [SPARK-38897][SQL] DS V2 supports push down string functions

### What changes were proposed in this pull request?

Currently, Spark has some string functions from the ANSI standard. Please refer to

https://github.com/apache/spark/blob/2f8613f22c0750c00cf1dcfb2f31c431d8dc1be7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L503

These functions are shown below:
`SUBSTRING,`
`UPPER`,
`LOWER`,
`TRANSLATE`,
`TRIM`,
`OVERLAY`

Mainstream database support for these functions is shown below.
Function | PostgreSQL | ClickHouse | H2 | MySQL | Oracle | Redshift | Presto | Teradata | Snowflake | DB2 | Vertica | Exasol | SqlServer | Yellowbrick | Impala | Mariadb | Druid | Pig | SQLite | Influxdata | Singlestore | ElasticSearch
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
`SUBSTRING` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
`UPPER` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
`LOWER` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
`TRIM` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
`TRANSLATE` | Yes | No | Yes | No | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | No | No | No | No | No
`OVERLAY` | Yes | No | No | No | Yes | No | No | No | No | Yes | Yes | No | No | No | No | No | No | No | No | No | No | No

DS V2 should support pushing down these string functions.

### Why are the changes needed?

DS V2 should support pushing down string functions.

### Does this PR introduce _any_ user-facing change?

'No'.
New feature.

### How was this patch tested?

New tests.

Closes apache#36330 from chenzhx/spark-master.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
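
A hedged usage sketch: with these translations in place, a filter built from one of the listed functions can travel into the JDBC query instead of being evaluated in Spark (the table name mirrors the test fixtures above; the exact `PushedFilters` text is illustrative):

```scala
import org.apache.spark.sql.functions.upper
import spark.implicits._

val df = spark.table("h2.test.employee").filter(upper($"NAME") === "AMY")
df.explain()
// Expect something like: PushedFilters: [NAME IS NOT NULL, UPPER(NAME) = 'AMY']
```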

* [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression

### What changes were proposed in this pull request?
This is ANSI SQL; the feature ID is `F861`:
```
<query expression> ::=
[ <with clause> ] <query expression body>
[ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]

<result offset clause> ::=
OFFSET <offset row count> { ROW | ROWS }
```
For example:
```
SELECT customer_name, customer_gender FROM customer_dimension
   WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name;
    customer_name     | customer_gender
----------------------+-----------------
 Amy X. Lang          | Female
 Anna H. Li           | Female
 Brian O. Weaver      | Male
 Craig O. Pavlov      | Male
 Doug Z. Goldberg     | Male
 Harold S. Jones      | Male
 Jack E. Perkins      | Male
 Joseph W. Overstreet | Male
 Kevin . Campbell     | Male
 Raja Y. Wilson       | Male
 Samantha O. Brown    | Female
 Steve H. Gauthier    | Male
 William . Nielson    | Male
 William Z. Roy       | Male
(14 rows)

SELECT customer_name, customer_gender FROM customer_dimension
   WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name OFFSET 8;
   customer_name   | customer_gender
-------------------+-----------------
 Kevin . Campbell  | Male
 Raja Y. Wilson    | Male
 Samantha O. Brown | Female
 Steve H. Gauthier | Male
 William . Nielson | Male
 William Z. Roy    | Male
(6 rows)
```
Several mainstream databases support this syntax.

**Druid**
https://druid.apache.org/docs/latest/querying/sql.html#offset

**Kylin**
http://kylin.apache.org/docs/tutorial/sql_reference.html#QUERYSYNTAX

**Exasol**
https://docs.exasol.com/sql/select.htm

**Greenplum**
http://docs.greenplum.org/6-8/ref_guide/sql_commands/SELECT.html

**MySQL**
https://dev.mysql.com/doc/refman/5.6/en/select.html

**Monetdb**
https://www.monetdb.org/Documentation/SQLreference/SQLSyntaxOverview#SELECT

**PostgreSQL**
https://www.postgresql.org/docs/11/queries-limit.html

**Sqlite**
https://www.sqlite.org/lang_select.html

**Vertica**
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/SELECT/OFFSETClause.htm?zoom_highlight=offset

The design is described below:
**1**. Consider `OFFSET` as a special case of `LIMIT`. For example:
`SELECT * FROM a limit 10;` similar to `SELECT * FROM a limit 10 offset 0;`
`SELECT * FROM a offset 10;` similar to `SELECT * FROM a limit -1 offset 10;`
**2**. The current implementation of `LIMIT` has good performance. For example,
`SELECT * FROM a limit 10;` is parsed into the logical plan below:
```
GlobalLimit (limit = 10)
|--LocalLimit (limit = 10)
```
and then the physical plan as below:
```
GlobalLimitExec (limit = 10) // Take the first 10 rows globally
|--LocalLimitExec (limit = 10) // Take the first 10 rows locally
```
This operator avoids massive shuffles and has good performance.
Sometimes the logical plan is transformed into a physical plan such as:
```
CollectLimitExec (limit = 10) // Take the first 10 rows globally
```
If the SQL contains an order by, such as `SELECT * FROM a order by c limit 10;`,
it will be transformed into the physical plan below:
```
TakeOrderedAndProjectExec (limit = 10) // Take the first 10 rows after sort globally
```

Based on this situation, this PR produces the following operations. For example,
`SELECT * FROM a limit 10 offset 10;` is parsed into the logical plan below:
```
GlobalLimit (limit = 10)
|--LocalLimit (limit = 10)
   |--Offset (offset = 10)
```
After optimization, the above logic plan will be transformed to:
```
GlobalLimitAndOffset (limit = 10, offset = 10) // Limit clause accompanied by offset clause
|--LocalLimit (limit = 20)   // 10 + offset = 20
```

and then the physical plan as below:
```
GlobalLimitAndOffsetExec (limit = 10, offset = 10) // Skip the first 10 rows and take the next 10 rows globally
|--LocalLimitExec (limit = 20) // Take the first 20(limit + offset) rows locally
```
Sometimes the logical plan is transformed into a physical plan such as:
```
CollectLimitExec (limit = 10, offset = 10) // Skip the first 10 rows and take the next 10 rows globally
```
If the SQL contains an order by, such as `SELECT * FROM a order by c limit 10 offset 10;`,
it will be transformed into the physical plan below:
```
TakeOrderedAndProjectExec (limit = 10, offset 10) // Skip the first 10 rows and take the next 10 rows after sort globally
```
**3**. In addition to the above, there is a special case with only an offset and no limit. For example,
`SELECT * FROM a offset 10;` is parsed into the logical plan below:
```
Offset (offset = 10) // Only offset clause
```
If the offset is very large, it will generate a lot of overhead. So this PR refuses the offset clause without a limit clause, although we can parse, transform and execute it.

A balanced idea is to add a configuration item `spark.sql.forceUsingOffsetWithoutLimit` to force running the query when the user knows the offset is small enough. The default value of `spark.sql.forceUsingOffsetWithoutLimit` is false. This PR just brings up the idea so that it can be implemented at a better time in the future.

Note: the original PR to support this feature is apache#25416.
Because that PR is too old and has massive conflicts that are hard to resolve, I opened this new PR to support the feature.

### Why are the changes needed?
new feature

### Does this PR introduce any user-facing change?
'No'

### How was this patch tested?
Existing and new UTs

Closes apache#35975 from beliefer/SPARK-28330.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39057][SQL] Offset could work without Limit

### What changes were proposed in this pull request?
Currently, `Offset` must work with `Limit`; the current behavior does not allow using offset alone or adding an offset API to `DataFrame`.

If we use `Offset` alone, there are two situations:
1. If `Offset` is the last operator, collect the result to the driver and then drop/skip the first n (offset value) rows. Users can test or debug `Offset` this way.
2. If `Offset` is an intermediate operator, shuffle all the results to one task, drop/skip the first n (offset value) rows, and pass the result to the downstream operator.

For example, `SELECT * FROM a offset 10;` is parsed into the logical plan below:
```
Offset (offset = 10) // Only offset clause
|--Relation
```

and then the physical plan as below:
```
CollectLimitExec(limit = -1, offset = 10) // Collect the result to the driver and skip the first 10 rows
|--JDBCRelation
```
or
```
GlobalLimitAndOffsetExec(limit = -1, offset = 10) // Collect the result and skip the first 10 rows
|--JDBCRelation
```

After this PR is merged, users can run the SQL shown below:
```
SELECT '' AS ten, unique1, unique2, stringu1
FROM onek
ORDER BY unique1 OFFSET 990;
```

Note: apache#35975 supports the offset clause by creating a logical node named
`GlobalLimitAndOffset`. In fact, we can avoid that node and use `Offset` instead, which keeps the naming unified.

### Why are the changes needed?
Improve the implementation of the offset clause.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
Existing test cases.

Closes apache#36417 from beliefer/SPARK-28330_followup2.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39159][SQL] Add new Dataset API for Offset

### What changes were proposed in this pull request?
Spark recently added the `Offset` operator.
This PR adds an `offset` API to `Dataset`.

### Why are the changes needed?
An `offset` API is very useful and makes constructing test cases easier.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests.

Closes apache#36519 from beliefer/SPARK-39159.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
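
A short usage sketch of the new API (assuming the `offset(n: Int)` signature this description implies):

```scala
// Skip the first 10 rows, then take the next 5; offset composes with
// orderBy and limit like any other Dataset operator.
val ds = spark.range(1, 100).orderBy("id")
val page = ds.offset(10).limit(5)
page.show()
```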

* fix ut

* update spark version

Co-authored-by: Jiaan Geng <beliefer@163.com>
cloud-fan pushed a commit that referenced this pull request Jun 9, 2022
[SPARK-38997][SPARK-39037][SQL][FOLLOWUP] `PushableColumnWithoutNestedColumn` needs to be translated to a predicate too

### What changes were proposed in this pull request?
#35768 assumed that the expressions in `And`, `Or` and `Not` must be predicates.
#36370 and #36325 supported pushing down expressions in `GROUP BY` and `ORDER BY`, but the children of `And`, `Or` and `Not` can be `FieldReference.column(name)`,
which is not a predicate, so the assert may fail.

### Why are the changes needed?
This PR fixes the bug for `PushableColumnWithoutNestedColumn`.

### Does this PR introduce _any_ user-facing change?
'Yes'.
It makes the push-down framework more correct.

### How was this patch tested?
New tests

Closes #36776 from beliefer/SPARK-38997_SPARK-39037_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jun 9, 2022
chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 13, 2022
chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 15, 2022
chenzhx added a commit to Kyligence/spark that referenced this pull request Jun 15, 2022
[SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic (#481)

* [SPARK-39270][SQL] JDBC dialect supports registering dialect specific functions

### What changes were proposed in this pull request?
The built-in functions in Spark are not the same as those of a JDBC database.
We can give users the chance to register dialect-specific functions.

### Why are the changes needed?
JDBC dialect supports registering dialect specific functions

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests.

Closes apache#36649 from beliefer/SPARK-39270.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
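
A sketch of what registering a dialect-specific function could look like (every name here, including the `functions` override, is an illustrative assumption based on this description rather than the merged API):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction}
import org.apache.spark.sql.types._

// Hypothetical dialect-specific function: character length of a string.
object MyStrLen extends UnboundFunction {
  override def name(): String = "my_strlen"
  override def description(): String = "my_strlen(str) - length of str"
  override def bind(inputType: StructType): BoundFunction =
    new ScalarFunction[Int] {
      override def inputTypes(): Array[DataType] = Array(StringType)
      override def resultType(): DataType = IntegerType
      override def name(): String = "my_strlen"
      override def produceResult(input: InternalRow): Int =
        input.getUTF8String(0).numChars()
    }
}

// Inside a JdbcDialect implementation, expose it so the JDBC catalog can
// resolve it by name (assumed hook):
//   override def functions: Seq[(String, UnboundFunction)] =
//     Seq(MyStrLen.name() -> MyStrLen)
```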

* [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

### What changes were proposed in this pull request?
`JDBCV2Suite` has some test cases in which SQL keywords are not capitalized.
This PR capitalizes the SQL keywords in `JDBCV2Suite`.

### Why are the changes needed?
Capitalize SQL keywords in `JDBCV2Suite`.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update test cases.

### How was this patch tested?
N/A.

Closes apache#36805 from beliefer/SPARK-39413.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] `PushableColumnWithoutNestedColumn` needs to be translated to a predicate too

### What changes were proposed in this pull request?
apache#35768 assumed that the expressions in `And`, `Or` and `Not` must be predicates.
apache#36370 and apache#36325 supported pushing down expressions in `GROUP BY` and `ORDER BY`, but the children of `And`, `Or` and `Not` can be `FieldReference.column(name)`,
which is not a predicate, so the assert may fail.

### Why are the changes needed?
This PR fixes the bug for `PushableColumnWithoutNestedColumn`.

### Does this PR introduce _any_ user-facing change?
'Yes'.
It makes the push-down framework more correct.

### How was this patch tested?
New tests

Closes apache#36776 from beliefer/SPARK-38997_SPARK-39037_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

### What changes were proposed in this pull request?

The main change:
- Add a new method `resultDecimalType` in `BinaryArithmetic`
- Add a new expression `DecimalAddNoOverflowCheck` for internal decimal adds, e.g. `Sum`/`Average`; the differences from `Add` are:
  - `DecimalAddNoOverflowCheck` does not check overflow
  - `DecimalAddNoOverflowCheck` takes `dataType` as an input parameter
- Merge the decimal precision code of `DecimalPrecision` into each arithmetic data type, so every arithmetic operator reports an accurate decimal type, and we can remove the unused expression `PromotePrecision` and related code
- Merge `CheckOverflow` into the arithmetic eval and code-gen code paths, so every arithmetic operator can handle the overflow case at runtime

Merge `PromotePrecision` into `dataType`, for example, `Add`:
```scala
override def resultDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
  val resultScale = max(s1, s2)
  if (allowPrecisionLoss) {
    DecimalType.adjustPrecisionScale(max(p1 - s1, p2 - s2) + resultScale + 1,
      resultScale)
  } else {
    DecimalType.bounded(max(p1 - s1, p2 - s2) + resultScale + 1, resultScale)
  }
}
```

Merge `CheckOverflow`, for example, `Add` eval:
```scala
dataType match {
  case decimalType: DecimalType =>
    val value = numeric.plus(input1, input2)
    checkOverflow(value.asInstanceOf[Decimal], decimalType)
  ...
}
```

Note that `CheckOverflow` is still useful after this PR, e.g. in `RowEncoder`. We can go further in a separate PR.

### Why are the changes needed?

Fix the bug of `TypeCoercion`, for example:
```sql
SELECT CAST(1 AS DECIMAL(28, 2))
UNION ALL
SELECT CAST(1 AS DECIMAL(18, 2)) / CAST(1 AS DECIMAL(18, 2));
```

Relax the decimal precision at runtime, so we do not need a redundant `Cast`.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

Pass existing tests and add some bug-fix tests in `decimalArithmeticOperations.sql`.

Closes apache#36698 from ulysses-you/decimal.

Lead-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* fix ut

Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
yabola pushed a commit to Kyligence/spark that referenced this pull request Jun 21, 2022
leejaywei pushed a commit to Kyligence/spark that referenced this pull request Jul 14, 2022
zheniantoushipashi pushed a commit to Kyligence/spark that referenced this pull request Aug 8, 2022
RolatZhang pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2023