New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect #40359
Conversation
…lid for some built-in dialect
ping @cloud-fan cc @sadikovi |
@@ -410,6 +410,15 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu | |||
assert(sorts.isEmpty) | |||
} | |||
|
|||
private def checkOffsetPushed(df: DataFrame, offset: Option[Int]): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename limitPushed
to checkLimitPushed
and follow the implementation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it in another PR.
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
Outdated
Show resolved
Hide resolved
override def getOffsetClause(offset: Integer): String = { | ||
// Oracle doesn't support OFFSET clause. | ||
// We can use rownum > n to skip some rows in the result set. | ||
// Note: rn is an alias of rownum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every table in oracle has the rn
column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, I see the implementation.
options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause" | ||
val finalSelectStmt = if (limit > 0) { | ||
if (offset > 0) { | ||
s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
SELECT * FROM ($selectStmt) tab WHERE rownum > ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is a bug of Oracle.
If we using rownum directly, the result will incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's mention the reason as well: the rownum
is calculated when the value is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we can use the new syntax in oracle 12+, which should be the widely used versions today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we can use the new syntax in oracle 12+, which should be the widely used versions today.
Could we upgrade the version of oracle in another PR if it is widely used.
extends JdbcSQLQueryBuilder(dialect, options) { | ||
|
||
override def build(): String = { | ||
if (limit < 1 && offset > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to have limit = 0? seems safer to use limit < 0
to indicate no limit, as the default value is -1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if limit = 0, Optimizer will convert it to empty relation. But it's OK to use limit < 0
too.
thanks, merged to master! can you open a backport PR for 3.4? |
Thank you! I will create it. |
What changes were proposed in this pull request?
Currently, the DS V2 pushdown framework pushed offset as
OFFSET n
in default and pushed it with limit asLIMIT m OFFSET n
. But some built-in dialect doesn't support these syntax. So, when Spark pushdown offset into these databases, them throwing errors.Why are the changes needed?
Fix the bug that pushdown offset or paging is invalid for some built-in dialect.
Does this PR introduce any user-facing change?
'Yes'.
The bug will be fixed.
How was this patch tested?
New test cases.