[SPARK-38901][SQL] DS V2 supports push down misc functions #37169

Closed

chenzhx wants to merge 1 commit

Conversation

chenzhx (Contributor) commented Jul 13, 2022

What changes were proposed in this pull request?

Currently, Spark has some misc functions. Please refer to
https://github.com/apache/spark/blob/2f8613f22c0750c00cf1dcfb2f31c431d8dc1be7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L688

These functions are shown below:
`AES_ENCRYPT`,
`AES_DECRYPT`,
`SHA1`,
`SHA2`,
`MD5`,
`CRC32`

Function|PostgreSQL|ClickHouse|H2|MySQL|Oracle|Redshift|Snowflake|DB2|Vertica|Exasol|SqlServer|Yellowbrick|Mariadb|Singlestore
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
`AesEncrypt`|Yes|Yes|Yes|Yes|Yes|No|Yes|Yes|No|No|No|Yes|Yes|Yes
`AesDecrypt`|Yes|Yes|Yes|Yes|Yes|No|Yes|Yes|No|No|No|Yes|Yes|Yes
`Sha1`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes
`Sha2`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes
`Md5`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes
`Crc32`|No|Yes|No|Yes|No|Yes|No|Yes|No|No|No|No|No|Yes

DS V2 should support pushing down these misc functions.
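
For instance, once an H2 JDBC catalog is configured, a query like the following can have the hash predicate evaluated by the database instead of Spark. This is a hedged sketch: the `h2.test.binary1` table (a `name` column plus a binary column `b`) is modeled on this PR's tests, and the hash literal is a placeholder.

```
// Run inside a Spark session where `sql` is in scope (e.g. spark-shell).
val df = sql("SELECT name FROM h2.test.binary1 WHERE " +
  "md5(b) = '4371fe0aa613bcb081543a37d241adcb'")
df.explain()
// After this PR the scan can report something roughly like:
//   PushedFilters: [B IS NOT NULL, MD5(B) = '4371fe0aa613bcb081543a37d241adcb']
```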

Why are the changes needed?

DS V2 supports pushing down misc functions.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Jul 13, 2022
@chenzhx chenzhx force-pushed the misc branch 2 times, most recently from 4815746 to 4eb9229 on July 13, 2022 03:22
beliefer (Contributor)

ping @huaxingao cc @cloud-fan

@@ -1135,6 +1145,52 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
checkAnswer(df8, Seq(Row("alex")))
}

test("scan with filter push-down with misc functions") {
val df1 = sql("SELECT name FROM h2.test.binary1 WHERE " +
nit: 2 spaces indentation
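
For context, here is a hedged reconstruction of roughly how the truncated test above might continue; the predicate, hash literal, and expected row are placeholders, not the PR's exact code:

```
test("scan with filter push-down with misc functions") {
  val df1 = sql("SELECT name FROM h2.test.binary1 WHERE " +
    "md5(b) = '4371fe0aa613bcb081543a37d241adcb'")
  checkFiltersRemoved(df1)
  checkPushedInfo(df1, "PushedFilters: [B IS NOT NULL, " +
    "MD5(B) = '4371fe0aa613bcb081543a37d241adcb']")
  checkAnswer(df1, Seq(Row("jen")))
}
```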

cloud-fan (Contributor) left a comment

LGTM if tests pass

beliefer (Contributor) commented Aug 4, 2022

@chenzhx Please rebase master

cloud-fan (Contributor)

thanks, merging to master!

@cloud-fan cloud-fan closed this in 8a0e299 Aug 5, 2022
chenzhx (Contributor, Author) commented Aug 7, 2022

@cloud-fan @beliefer Thank you for your review.

chenzhx added a commit to chenzhx/spark that referenced this pull request Aug 15, 2022
chenzhx added a commit to Kyligence/spark that referenced this pull request Aug 17, 2022
…aging (Sort with expressions) (#525)

* [SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

### What changes were proposed in this pull request?

Even though the literal value could be on both sides of the filter, e.g. both `a > 1` and `1 < a` are valid, after translating Catalyst Expression to data source filter, we want the literal value on the right side so it's easier for the data source to handle these filters. We do this kind of normalization for V1 Filter. We should have the same behavior for V2 Filter.

Before this PR, for the filters that have literal values on the right side, e.g. `1 > a`, we keep it as is. After this PR, we will normalize it to `a < 1` so the data source doesn't need to check each of the filters (and do the flip).
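
A minimal sketch of that normalization, using a toy predicate model (none of these types are Spark's; mirroring the operator and swapping the sides is the whole trick):

```
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Any) extends Expr
case class Compare(op: String, left: Expr, right: Expr) extends Expr

// If the literal is on the left, mirror the operator and swap the sides.
def normalize(c: Compare): Compare = (c.left, c.right) match {
  case (_: Lit, _: Col) =>
    val mirrored = Map(">" -> "<", "<" -> ">", ">=" -> "<=", "<=" -> ">=", "=" -> "=")(c.op)
    Compare(mirrored, c.right, c.left)
  case _ => c
}

// normalize(Compare(">", Lit(1), Col("a"))) == Compare("<", Col("a"), Lit(1))
```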

### Why are the changes needed?
I think we should follow V1 Filter's behavior and normalize the filters at translation time, so the literal values end up on the right side and the data source doesn't need to inspect every single filter to figure out whether it needs to flip the sides.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?
new test

Closes apache#37197 from huaxingao/flip.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39836][SQL] Simplify V2ExpressionBuilder by extract common method

### What changes were proposed in this pull request?
Currently, `V2ExpressionBuilder` has a lot of similar code; we can extract it into one common method.

Using the common method simplifies the implementation, as sketched below.
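
As a rough illustration (a hypothetical helper, not the exact code in `V2ExpressionBuilder`), the repeated "translate all children, then wrap them in a named V2 expression" pattern can be factored into one method:

```
import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, GeneralScalarExpression}

// Translate every child first; only when all of them succeed, wrap the
// results in a single named scalar expression.
def generateExpression(
    name: String,
    children: Seq[Option[V2Expression]]): Option[V2Expression] = {
  if (children.forall(_.isDefined)) {
    Some(new GeneralScalarExpression(name, children.flatten.toArray))
  } else {
    None
  }
}
```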

### Why are the changes needed?
Simplify `V2ExpressionBuilder` by extracting a common method.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update inner implementation.

### How was this patch tested?
N/A

Closes apache#37249 from beliefer/SPARK-39836.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39858][SQL] Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules

### What changes were proposed in this pull request?
While using `AliasHelper`, I found that some rules inherit it without actually using it.

This PR removes the unnecessary `AliasHelper` or `PredicateHelper` in the following cases:
- The rule inherits `AliasHelper` without using it. In this case, we can remove `AliasHelper` directly.
- The rule inherits `PredicateHelper` without using it. In this case, we can remove `PredicateHelper` directly.
- The rule inherits both `AliasHelper` and `PredicateHelper`. Since `PredicateHelper` already extends `AliasHelper`, we can remove `AliasHelper`.
- The rule inherits both `OperationHelper` and `PredicateHelper`. Since `OperationHelper` already extends `PredicateHelper`, we can remove `PredicateHelper`.
- The rule inherits both `PlanTest` and `PredicateHelper`. Since `PlanTest` already extends `PredicateHelper`, we can remove `PredicateHelper`.
- The rule inherits both `QueryTest` and `PredicateHelper`. Since `QueryTest` already extends `PredicateHelper`, we can remove `PredicateHelper`.

### Why are the changes needed?
Removes unnecessary `AliasHelper` or `PredicateHelper` from some rules.

### Does this PR introduce _any_ user-facing change?
'No'.
Just improve the inner implementation.

### How was this patch tested?
N/A

Closes apache#37272 from beliefer/SPARK-39858.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39784][SQL][FOLLOW-UP] Use BinaryComparison instead of Predicate (if) for type check

### What changes were proposed in this pull request?
Follows up on this [comment](apache#37197 (comment)).

### Why are the changes needed?
code simplification

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test

Closes apache#37278 from huaxingao/followup.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-39909] Organize the check of push down information for JDBCV2Suite

### What changes were proposed in this pull request?
This PR changes the check method from `check(one_large_string)` to `check(small_string1, small_string2, ...)`
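
A minimal sketch of the new shape, assuming this signature (the real helper lives in JDBCV2Suite and checks the explain output):

```
import org.apache.spark.sql.DataFrame

// Assert each expected fragment separately, so a failure pinpoints exactly
// which part of the pushed information is missing.
def checkPushedInfo(df: DataFrame, expectedPlanFragments: String*): Unit = {
  val planText = df.queryExecution.optimizedPlan.toString
  expectedPlanFragments.foreach { fragment =>
    assert(planText.contains(fragment), s"plan did not contain: $fragment")
  }
}

// checkPushedInfo(df, "PushedFilters: [DEPT IS NOT NULL]", "PushedLimit: LIMIT 1")
```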

### Why are the changes needed?
It helps us check the results individually and makes the code clearer.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing tests

Closes apache#37342 from yabola/fix.

Authored-by: chenliang.lu <marssss2929@gmail.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39961][SQL] DS V2 push-down translate Cast if the cast is safe

### What changes were proposed in this pull request?
Currently, DS V2 push-down translates `Cast` only if ANSI mode is enabled.
In fact, if the cast is safe (e.g. casting a number to a string, or an int to a long), we can translate it too.

This PR calls `Cast.canUpCast` so that we can translate `Cast` to a V2 `Cast` safely.

Note: the rule `SimplifyCasts` optimizes away some safe casts, e.g. int to long, so we may not always see the `Cast`.
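
For reference, `Cast.canUpCast` answers exactly this "is the cast lossless?" question:

```
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types.{IntegerType, LongType, StringType}

Cast.canUpCast(IntegerType, LongType)   // true: widening never loses information
Cast.canUpCast(LongType, IntegerType)   // false: narrowing can overflow
Cast.canUpCast(StringType, IntegerType) // false: the parse can fail at runtime
```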

### Why are the changes needed?
Extends the range of cases in which DS V2 pushes down `Cast`.

### Does this PR introduce _any_ user-facing change?
'Yes'.
`Cast` could be pushed down to data source in more cases.

### How was this patch tested?
Test cases updated.

Closes apache#37388 from beliefer/SPARK-39961.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-38901][SQL] DS V2 supports push down misc functions

### What changes were proposed in this pull request?

Currently, Spark has some misc functions. Please refer to
https://github.com/apache/spark/blob/2f8613f22c0750c00cf1dcfb2f31c431d8dc1be7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L688

These functions are shown below:
`AES_ENCRYPT`,
`AES_DECRYPT`,
`SHA1`,
`SHA2`,
`MD5`,
`CRC32`

Function|PostgreSQL|ClickHouse|H2|MySQL|Oracle|Redshift|Snowflake|DB2|Vertica|Exasol|SqlServer|Yellowbrick|Mariadb|Singlestore|
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
`AesEncrypt`|Yes|Yes|Yes|Yes|Yes|No|Yes|Yes|No|No|No|Yes|Yes|Yes|
`AesDecrypt`|Yes|Yes|Yes|Yes|Yes|No|Yes|Yes|No|No|No|Yes|Yes|Yes|
`Sha1`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Sha2`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Md5`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Crc32`|No|Yes|No|Yes|No|Yes|No|Yes|No|No|No|No|No|Yes|

DS V2 should support pushing down these misc functions.

### Why are the changes needed?

DS V2 supports push down misc functions.

### Does this PR introduce _any_ user-facing change?

'No'.
New feature.

### How was this patch tested?

New tests.

Closes apache#37169 from chenzhx/misc.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
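
On the JDBC side, a dialect opts into functions like these by declaring which names it can compile. A hedged sketch: the `isSupportedFunction` hook exists on `JdbcDialect`, but the dialect and function set below are made up for illustration.

```
import org.apache.spark.sql.jdbc.JdbcDialect

// Hypothetical dialect: advertise that the remote database can evaluate
// these misc functions, so Spark may push the V2 expressions down to it.
case object MyDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

  private val supported =
    Set("SHA1", "SHA2", "MD5", "CRC32", "AES_ENCRYPT", "AES_DECRYPT")

  override def isSupportedFunction(funcName: String): Boolean =
    supported.contains(funcName)
}
```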

* [SPARK-39964][SQL] DS V2 pushdown should unify the translate path

### What changes were proposed in this pull request?
Currently, DS V2 pushdown has two translation paths: `DataSourceStrategy.translateAggregate`, used to translate aggregate functions, and `V2ExpressionBuilder`, used to translate other functions and expressions. We can unify them.

After this PR, translation has only one code path, which makes the code easier to write and read.

### Why are the changes needed?
Unifies the translation path for DS V2 pushdown.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update the inner implementation.

### How was this patch tested?
N/A

Closes apache#37391 from beliefer/SPARK-39964.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with expressions)

### What changes were proposed in this pull request?
Currently, DS V2 aggregate push-down cannot work with DS V2 Top N push-down (`ORDER BY col LIMIT m`) or DS V2 Paging push-down (`ORDER BY col LIMIT m OFFSET n`).
If we can push down an aggregate together with Top N or Paging, performance will be better.

This PR only lets an aggregate be pushed down when the ORDER BY expressions are also GROUP BY expressions.

The idea of this PR is:
1. When supplying the expected outputs to `ScanBuilderHolder`, hold a map from the expected outputs to the original expressions (containing the original columns).
2. When trying to push down Top N or Paging, restore the original expressions for `SortOrder` (see the example after this list).
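
For example (the table name and pushed-info fragments below are assumptions modeled on the H2 tests in JDBCV2Suite):

```
val df = sql("SELECT dept, max(salary) FROM h2.test.employee " +
  "GROUP BY dept ORDER BY dept LIMIT 1")
// Expected pushed info, roughly:
//   PushedAggregates: [MAX(SALARY)], PushedGroupByExpressions: [DEPT],
//   PushedTopN: ORDER BY [DEPT ASC NULLS FIRST] LIMIT 1
```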

### Why are the changes needed?
Letting DS V2 aggregate push-down work with Top N or Paging (sort with group expressions) gives users better performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes apache#37320 from beliefer/SPARK-39819_new.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39929][SQL] DS V2 supports push down string  functions(non ANSI)

**What changes were proposed in this pull request?**

Support more commonly used string functions:

BIT_LENGTH
CHAR_LENGTH
CONCAT

Mainstream database support for these functions is shown below.

Function | PostgreSQL | ClickHouse | H2 | MySQL | Oracle | Redshift | Presto | Teradata | Snowflake | DB2 | Vertica | Exasol | SqlServer | Yellowbrick | Impala | Mariadb | Druid | Pig | SQLite | Influxdata | Singlestore | ElasticSearch
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
BIT_LENGTH | Yes | Yes | Yes | Yes | Yes | no | no | no | no | Yes | Yes | Yes | no | Yes | no | Yes | no | no | no | no | no | Yes
CHAR_LENGTH | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes
CONCAT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | no | no | Yes | Yes
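
As a quick illustration (the table is an assumption modeled on the JDBCV2Suite H2 catalog):

```
val df = sql("SELECT name FROM h2.test.employee WHERE bit_length(name) = 40")
// Expected pushed filter, roughly: BIT_LENGTH(NAME) = 40
```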

**Why are the changes needed?**
DS V2 supports push down string functions

**Does this PR introduce any user-facing change?**
'No'.
New feature.

**How was this patch tested?**
New tests.

Closes apache#37427 from zheniantoushipashi/SPARK-39929.

Authored-by: biaobiao.sun <1319027852@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38899][SQL][FOLLOWUP] Fix bug extract datetime in DS V2 pushdown

### What changes were proposed in this pull request?

[SPARK-38899](apache#36663) supports the extract function in the JDBC data source.
But the implementation is incorrect.
This PR adds a test case that fails without the fix.
The test case is shown below.
```
test("scan with filter push-down with date time functions")  {
    val df9 = sql("SELECT name FROM h2.test.datetime WHERE " +
      "dayofyear(date1) > 100 order by dayofyear(date1) limit 1")
    checkFiltersRemoved(df9)
    val expectedPlanFragment9 =
      "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], " +
      "PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
    checkPushedInfo(df9, expectedPlanFragment9)
    checkAnswer(df9, Seq(Row("alex")))
  }
```

The failing test output is shown below.
```
"== Parsed Logical Plan ==
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Sort ['dayofyear('date1) ASC NULLS FIRST], true
      +- 'Project ['name]
         +- 'Filter ('dayofyear('date1) > 100)
            +- 'UnresolvedRelation [h2, test, datetime], [], false

== Analyzed Logical Plan ==
name: string
GlobalLimit 1
+- LocalLimit 1
   +- Project [name#x]
      +- Sort [dayofyear(date1#x) ASC NULLS FIRST], true
         +- Project [name#x, date1#x]
            +- Filter (dayofyear(date1#x) > 100)
               +- SubqueryAlias h2.test.datetime
                  +- RelationV2[NAME#x, DATE1#x, TIME1#x] h2.test.datetime test.datetime

== Optimized Logical Plan ==
Project [name#x]
+- RelationV2[NAME#x] test.datetime

== Physical Plan ==
*(1) Scan org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$145f6181a [NAME#x] PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [org.apache.spark.sql.connector.expressions.Extract3b95fce9 ASC NULLS FIRST] LIMIT 1, ReadSchema: struct<NAME:string>

" did not contain "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
```

### Why are the changes needed?

Fixes an implementation bug.
The cause is that the `Extract` function does not implement the `toString` method used when pushing down to the JDBC data source.
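
A minimal sketch of the kind of fix implied; this standalone model is not Spark's `org.apache.spark.sql.connector.expressions.Extract`, just an illustration of why `toString` matters for the pushed-plan text:

```
// The real failure shows the default Object.toString (ClassName@hash)
// leaking into the plan description; overriding toString yields the
// readable SQL form instead.
case class Extract(field: String, source: String) {
  override def toString: String = s"EXTRACT($field FROM $source)"
}

// Extract("DAY_OF_YEAR", "DATE1").toString == "EXTRACT(DAY_OF_YEAR FROM DATE1)"
```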

### Does this PR introduce _any_ user-facing change?

'No'.
New feature.

### How was this patch tested?

New test case.

Closes apache#37469 from chenzhx/spark-master.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* code update

Signed-off-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: huaxingao <huaxin_gao@apple.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: chenliang.lu <marssss2929@gmail.com>
Co-authored-by: biaobiao.sun <1319027852@qq.com>
chenzhx added a commit to chenzhx/spark that referenced this pull request Sep 15, 2022
…aging (Sort with expressions) (Kyligence#525)

chenzhx added a commit to Kyligence/spark that referenced this pull request Sep 15, 2022
…aging (Sort with expressions) (#525)

leejaywei pushed a commit to Kyligence/spark that referenced this pull request Oct 18, 2022
…aging (Sort with expressions) (#525)

hellozepp pushed a commit to hellozepp/spark that referenced this pull request Aug 10, 2023
…aging (Sort with expressions) (Kyligence#525)

* [SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

Even though the literal value could be on both sides of the filter, e.g. both `a > 1` and `1 < a` are valid, after translating Catalyst Expression to data source filter, we want the literal value on the right side so it's easier for the data source to handle these filters. We do this kind of normalization for V1 Filter. We should have the same behavior for V2 Filter.

Before this PR, for the filters that have literal values on the right side, e.g. `1 > a`, we keep it as is. After this PR, we will normalize it to `a < 1` so the data source doesn't need to check each of the filters (and do the flip).

I think we should follow V1 Filter's behavior, normalize the filters during catalyst Expression to DS Filter translation time to make the literal values on the right side, so later on, data source doesn't need to check every single filter to figure out if it needs to flip the sides.

no

new test

Closes apache#37197 from huaxingao/flip.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39836][SQL] Simplify V2ExpressionBuilder by extract common method

Currently, `V2ExpressionBuilder` has a lot of similar code; we can extract it into one common method.

We can simplify the implementation with the common method.
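A self-contained sketch of the pattern being factored out, with illustrative stand-in types rather than Spark's actual API:
```scala
// The repeated shape in the builder is "translate every child, and build
// the V2 function only if all of them translated successfully" — which
// collapses into one helper.
case class V2Func(name: String, children: Seq[String])

def buildFunc(name: String, translated: Seq[Option[String]]): Option[V2Func] =
  if (translated.forall(_.isDefined)) Some(V2Func(name, translated.flatten))
  else None

// buildFunc("SHA2", Seq(Some("NAME"), Some("256")))
//   => Some(V2Func("SHA2", List("NAME", "256")))
```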

Simplify `V2ExpressionBuilder` by extracting a common method.

'No'.
Just update inner implementation.

N/A

Closes apache#37249 from beliefer/SPARK-39836.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39858][SQL] Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules

While using `AliasHelper`, I found that some rules inherit it without actually using it.

This PR removes unnecessary `AliasHelper` or `PredicateHelper` in the following cases:
- The rule inherits `AliasHelper` without using it. In this case, we can remove `AliasHelper` directly.
- The rule inherits `PredicateHelper` without using it. In this case, we can remove `PredicateHelper` directly.
- The rule inherits both `AliasHelper` and `PredicateHelper`. In fact, `PredicateHelper` already extends `AliasHelper` (see the sketch after this list), so we can remove `AliasHelper`.
- The rule inherits both `OperationHelper` and `PredicateHelper`. In fact, `OperationHelper` already extends `PredicateHelper`, so we can remove `PredicateHelper`.
- The rule inherits both `PlanTest` and `PredicateHelper`. In fact, `PlanTest` already extends `PredicateHelper`, so we can remove `PredicateHelper`.
- The rule inherits both `QueryTest` and `PredicateHelper`. In fact, `QueryTest` already extends `PredicateHelper`, so we can remove `PredicateHelper`.
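A minimal sketch of the subsumption at work (the trait names mirror Spark's, bodies elided):
```scala
// PredicateHelper already brings AliasHelper in transitively, so mixing
// both into a rule is redundant.
trait AliasHelper
trait PredicateHelper extends AliasHelper

class SomeRule extends PredicateHelper // AliasHelper is inherited implicitly
```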

Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules

'No'.
Just improve the inner implementation.

N/A

Closes apache#37272 from beliefer/SPARK-39858.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39784][SQL][FOLLOW-UP] Use BinaryComparison instead of Predicate (if) for type check

Follow-up on this [comment](apache#37197 (comment)).

code simplification

No

Existing test

Closes apache#37278 from huaxingao/followup.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-39909] Organize the check of push down information for JDBCV2Suite

This PR changes the check method from `check(one_large_string)` to `check(small_string1, small_string2, ...)`

It helps us check the results individually and makes the code clearer.
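A hedged sketch of what the varargs-style helper can look like (the signature is illustrative, not the suite's exact method):
```scala
// Asserting each fragment separately pinpoints exactly which piece of
// pushed-down information is missing, instead of diffing one huge string.
def checkPushedInfo(planInfo: String, expectedFragments: String*): Unit =
  expectedFragments.foreach { fragment =>
    assert(planInfo.contains(fragment), s"missing pushed info: $fragment")
  }

// checkPushedInfo(plan,
//   "PushedFilters: [DATE1 IS NOT NULL]",
//   "PushedTopN: ORDER BY [DATE1 ASC NULLS FIRST] LIMIT 1")
```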

no

existing tests

Closes apache#37342 from yabola/fix.

Authored-by: chenliang.lu <marssss2929@gmail.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39961][SQL] DS V2 push-down translate Cast if the cast is safe

Currently, DS V2 push-down translates `Cast` only if ANSI mode is on.
In fact, if the cast is safe (e.g. casting a number to a string, or an int to a long), we can translate it too.

This PR calls `Cast.canUpCast` so that we can translate `Cast` to the V2 `Cast` safely.

Note: the rule `SimplifyCasts` optimizes away some safe casts, e.g. casting an int to a long, so we may not see the `Cast` at all.
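For illustration, `Cast.canUpCast` is the catalyst helper that decides whether a cast can never fail or lose information; a quick check (the surrounding translation code is only described above, not shown):
```scala
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types.{IntegerType, LongType}

val widening  = Cast.canUpCast(IntegerType, LongType) // true: int widens to long
val narrowing = Cast.canUpCast(LongType, IntegerType) // false: may overflow
// the builder translates the Cast only when ansiEnabled or canUpCast holds
```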

Widen the range of cases in which DS V2 pushes down `Cast`.

'Yes'.
`Cast` could be pushed down to data source in more cases.

Test cases updated.

Closes apache#37388 from beliefer/SPARK-39961.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-38901][SQL] DS V2 supports push down misc functions

Currently, Spark has some misc functions. Please refer to
https://github.com/apache/spark/blob/2f8613f22c0750c00cf1dcfb2f31c431d8dc1be7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L688

These functions are shown below:
`AES_ENCRYPT`,
`AES_DECRYPT`,
`SHA1`,
`SHA2`,
`MD5`,
`CRC32`

Function|PostgreSQL|ClickHouse|H2|MySQL|Oracle|Redshift|Snowflake|DB2|Vertica|Exasol|SqlServer|Yellowbrick|Mariadb|Singlestore|
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
`AesEncrypt`|Yes|Yes|Yes|Yes|Yes|No|Yes|Yes|No|No|No|Yes|Yes|Yes|
`AesDecrypt`|Yes|Yes|Yes|Yes|Yes|No|Yes|Yes|No|No|No|Yes|Yes|Yes|
`Sha1`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Sha2`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Md5`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Crc32`|No|Yes|No|Yes|No|Yes|No|Yes|No|No|No|No|No|Yes|

DS V2 should support pushing down these misc functions.
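As a hedged illustration of the dialect side, a SQL builder could compile these functions roughly like this (the method name mirrors Spark's `V2ExpressionSQLBuilder`, but treat the sketch as an assumption, not the exact implementation):
```scala
def visitSQLFunction(funcName: String, inputs: Array[String]): String =
  funcName match {
    case "MD5" | "SHA1" | "SHA2" | "CRC32" | "AES_ENCRYPT" | "AES_DECRYPT" =>
      s"$funcName(${inputs.mkString(", ")})" // e.g. SHA2(NAME, 256)
    case other =>
      throw new UnsupportedOperationException(s"Cannot compile $other")
  }
```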

DS V2 supports pushing down misc functions.

'No'.
New feature.

New tests.

Closes apache#37169 from chenzhx/misc.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39964][SQL] DS V2 pushdown should unify the translate path

Currently, DS V2 pushdown has two translation paths: `DataSourceStrategy.translateAggregate`, used to translate aggregate functions, and `V2ExpressionBuilder`, used to translate other functions and expressions. We can unify them.

After this PR, translation has only one code path, which makes the code easier for developers to write and read.

Unify the translation path for DS V2 pushdown.

'No'.
Just update the inner implementation.

N/A

Closes apache#37391 from beliefer/SPARK-39964.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with expressions)

Currently, DS V2 aggregate push-down cannot work with DS V2 Top N push-down (`ORDER BY col LIMIT m`) or DS V2 Paging push-down (`ORDER BY col LIMIT m OFFSET n`).
If we can push down aggregates together with Top N or Paging, performance will be better.

This PR only lets aggregates be pushed down when the ORDER BY expressions are also GROUP BY expressions.

The ideas of this PR are:
1. When we set the expected outputs of `ScanBuilderHolder`, hold the map from the expected outputs to the origin expressions (which contain the origin columns).
2. When we try to push down Top N or Paging, restore the origin expressions for `SortOrder`, as sketched below.
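A self-contained sketch of the mapping idea with illustrative names (not Spark's actual plumbing):
```scala
// After aggregate push-down the scan exposes synthetic output columns, so
// a SortOrder over them must be rewritten back to the original grouping
// expressions before Top N / Paging can also be pushed down.
val outputToOrigin = Map("group_col_0" -> "DEPT", "agg_func_0" -> "SUM(SALARY)")

def restoreSortColumn(col: String): String = outputToOrigin.getOrElse(col, col)

// ORDER BY group_col_0 LIMIT 1  ==>  ORDER BY DEPT LIMIT 1 (pushable)
// restoreSortColumn("group_col_0") => "DEPT"
```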

Let DS V2 aggregate push-down work with Top N or Paging (sort with group expressions), so users can get better performance.

'No'.
New feature.

New test cases.

Closes apache#37320 from beliefer/SPARK-39819_new.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39929][SQL] DS V2 supports push down string functions (non ANSI)

**What changes were proposed in this pull request?**

Support more commonly used string functions:

BIT_LENGTH
CHAR_LENGTH
CONCAT

The mainstream databases' support for these functions is shown below.

Function | PostgreSQL | ClickHouse | H2 | MySQL | Oracle | Redshift | Presto | Teradata | Snowflake | DB2 | Vertica | Exasol | SqlServer | Yellowbrick | Impala | Mariadb | Druid | Pig | SQLite | Influxdata | Singlestore | ElasticSearch
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
BIT_LENGTH | Yes | Yes | Yes | Yes | Yes | No | No | No | No | Yes | Yes | Yes | No | Yes | No | Yes | No | No | No | No | No | Yes
CHAR_LENGTH | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes
CONCAT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes

**Why are the changes needed?**
DS V2 supports pushing down string functions.
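A hedged usage sketch, assuming an H2-backed catalog named `h2` as in `JDBCV2Suite` (the table and column names are illustrative):
```scala
val df = spark.sql(
  "SELECT name FROM h2.test.employee " +
  "WHERE char_length(name) > 2 AND bit_length(name) < 64")
// With this change, EXPLAIN should list the predicates under PushedFilters
// (e.g. CHAR_LENGTH(NAME) > 2) instead of evaluating them in a Spark Filter.
```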

**Does this PR introduce any user-facing change?**
'No'.
New feature.

**How was this patch tested?**
New tests.

Closes apache#37427 from zheniantoushipashi/SPARK-39929.

Authored-by: biaobiao.sun <1319027852@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38899][SQL][FOLLOWUP] Fix bug extract datetime in DS V2 pushdown

[SPARK-38899](apache#36663) supports the extract function in the JDBC data source,
but the implementation is incorrect.
This PR adds a test case that fails without the fix.
The test case is shown below.
```
test("scan with filter push-down with date time functions")  {
    val df9 = sql("SELECT name FROM h2.test.datetime WHERE " +
      "dayofyear(date1) > 100 order by dayofyear(date1) limit 1")
    checkFiltersRemoved(df9)
    val expectedPlanFragment9 =
      "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], " +
      "PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
    checkPushedInfo(df9, expectedPlanFragment9)
    checkAnswer(df9, Seq(Row("alex")))
  }
```

The failing test output is shown below.
```
"== Parsed Logical Plan ==
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Sort ['dayofyear('date1) ASC NULLS FIRST], true
      +- 'Project ['name]
         +- 'Filter ('dayofyear('date1) > 100)
            +- 'UnresolvedRelation [h2, test, datetime], [], false

== Analyzed Logical Plan ==
name: string
GlobalLimit 1
+- LocalLimit 1
   +- Project [name#x]
      +- Sort [dayofyear(date1#x) ASC NULLS FIRST], true
         +- Project [name#x, date1#x]
            +- Filter (dayofyear(date1#x) > 100)
               +- SubqueryAlias h2.test.datetime
                  +- RelationV2[NAME#x, DATE1#x, TIME1#x] h2.test.datetime test.datetime

== Optimized Logical Plan ==
Project [name#x]
+- RelationV2[NAME#x] test.datetime

== Physical Plan ==
*(1) Scan org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$145f6181a [NAME#x] PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [org.apache.spark.sql.connector.expressions.Extract3b95fce9 ASC NULLS FIRST] LIMIT 1, ReadSchema: struct<NAME:string>

" did not contain "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
```

Fixes an implementation bug: the `Extract` expression does not implement the `toString` method, so when it is pushed down to the JDBC data source the default object representation leaks into the pushed-info string (note the `Extract3b95fce9` in the physical plan above).

'No'.
New feature.

New test case.

Closes apache#37469 from chenzhx/spark-master.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* code update

Signed-off-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: huaxingao <huaxin_gao@apple.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: chenliang.lu <marssss2929@gmail.com>
Co-authored-by: biaobiao.sun <1319027852@qq.com>
zheniantoushipashi added a commit to Kyligence/spark that referenced this pull request Aug 21, 2023
…aging (Sort with expressions) (#525)

RolatZhang pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2023
…aging (Sort with expressions) (#525)
