Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect #40359

Closed
wants to merge 18 commits into from

Conversation

beliefer
Copy link
Contributor

What changes were proposed in this pull request?

Currently, the DS V2 pushdown framework pushed offset as OFFSET n in default and pushed it with limit as LIMIT m OFFSET n. But some built-in dialect doesn't support these syntax. So, when Spark pushdown offset into these databases, them throwing errors.

Why are the changes needed?

Fix the bug that pushdown offset or paging is invalid for some built-in dialect.

Does this PR introduce any user-facing change?

'Yes'.
The bug will be fixed.

How was this patch tested?

New test cases.

@github-actions github-actions bot removed the DOCS label Mar 10, 2023
@beliefer
Copy link
Contributor Author

ping @cloud-fan cc @sadikovi

@@ -410,6 +410,15 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
assert(sorts.isEmpty)
}

private def checkOffsetPushed(df: DataFrame, offset: Option[Int]): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename limitPushed to checkLimitPushed and follow the implementation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it in another PR.

override def getOffsetClause(offset: Integer): String = {
// Oracle doesn't support OFFSET clause.
// We can use rownum > n to skip some rows in the result set.
// Note: rn is an alias of rownum.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every table in oracle has the rn column?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I see the implementation.

options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
val finalSelectStmt = if (limit > 0) {
if (offset > 0) {
s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

SELECT * FROM ($selectStmt) tab WHERE rownum > ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is a bug of Oracle.
If we using rownum directly, the result will incorrect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/31186166/oracle-sql-filtering-by-rownum-not-returning-results-when-it-should

Let's mention the reason as well: the rownum is calculated when the value is returned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we can use the new syntax in oracle 12+, which should be the widely used versions today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we can use the new syntax in oracle 12+, which should be the widely used versions today.

Could we upgrade the version of oracle in another PR if it is widely used.

extends JdbcSQLQueryBuilder(dialect, options) {

override def build(): String = {
if (limit < 1 && offset > 0) {
Copy link
Contributor

@cloud-fan cloud-fan Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to have limit = 0? seems safer to use limit < 0 to indicate no limit, as the default value is -1

Copy link
Contributor Author

@beliefer beliefer Mar 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if limit = 0, Optimizer will convert it to empty relation. But it's OK to use limit < 0 too.

@cloud-fan cloud-fan closed this in aea76ce Mar 13, 2023
@cloud-fan
Copy link
Contributor

thanks, merged to master! can you open a backport PR for 3.4?

@beliefer
Copy link
Contributor Author

thanks, merged to master! can you open a backport PR for 3.4?

Thank you! I will create it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants