New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SQL] Fix parsing JDBCOptions(table=...) containing subquery #2546
[SQL] Fix parsing JDBCOptions(table=...) containing subquery #2546
Conversation
823de3d
to
5682e76
Compare
@@ -91,6 +92,12 @@ void testHandlingJdbcTable() { | |||
|
|||
@Test | |||
void testHandlingJdbcDbTableAsSubQuery() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to create a test testHandlingJdbcDbTableAsSubQueryWithUnknownDialect
? I am not following how testUnknownDialect
verifies a behaviour described within issue description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testUnknownDialect
is just checking that if here dialect
is null
, query still can be parsed using generic dialect:
Lines 105 to 106 in 62276dc
String dialect = extractDialectFromJdbcUrl(relation.jdbcOptions().url()); | |
Optional<SqlMeta> sqlMeta = OpenLineageSql.parse(Collections.singletonList(query), dialect); |
I'm using Clickhouse JDBC driver, and want to be sure that even without Clickhouse dialect support in openlineage-sql, I still can get table names read by Spark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redarding testHandlingJdbcDbTableAsSubQueryWithUnknownDialect
test, I don't think it will make much sense. Both cases table is not set, query=SELECT ...
and table=(SELECT...)
are now handled in the same way regardless the dialect, and covered by tests above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common practice about fixing issues and unit tests is to: (1) write a failing unit test and (2) fix a code to make a failing test pass.
Issue description is
openlineage-spark returns datasets with names like database.(select * from table where filter='value') T, not database.table
Is there any test verifying this?
How does the introduced test testUnknownDialect
relate to issue description and code that has been modified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) write a failing unit test
(2) fix a code to make a failing test pass
I've updated testHandlingJdbcDbTableAsSubQuery
test to cover this. Previously it covered just the same case as testHandlingJdbcQuery
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
5682e76
to
fb46c07
Compare
Problem
馃憢 Thanks for opening a pull request! Please include a brief summary of the problem your change is trying to solve, or bug fix. If your change fixes a bug or you'd like to provide context on why you're making the change, please link the issue as follows:
In the following case:
openlineage-spark
returns datasets with names likedatabase.(select * from table where filter='value') T
, notdatabase.table
. This is becauseJDBCOptions.table
is expected to always return a table name, not subquery.Solution
Please describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a schema change, please describe the schema modification(s) and whether it's a backwards-incompatible or backwards-compatible change, then select one of the following:
If you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports
S3
andGCS
filesystem operations, tested with AWS EMR).If
JDBCOptions.table
contains a subquery, fallback to branch with SQL parsing to extract table names from this query.I've also added a test that
openlineage-spark
can parse query of unknown JDBC dialect.One-line summary:
Fix
openlineage-spark
producing datasets with names likedatabase.(select * from table)
for JDBC sources.Checklist
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project