Refactored DataFrame JDBC API plus DataSource handling #1487
This commit introduces new DataFrame JDBC extension functions with `DataSource` support and removes redundant duplicate utilities. It revises table/schema reading and validation, delegating reusable connection-handling logic to a dedicated utility class. It also refactors the file structure for better organization of DB-related code.
… for consistency and improved readability.
…ed file `readDataFrameSchema.kt`, improving organization and code clarity. Converted several helper functions to `internal` visibility for encapsulation.
Pull Request Overview
This PR refactors the DataFrame JDBC API by adding DataSource support and streamlining the codebase. It extracts common utilities to dedicated files, removes code duplication, and introduces new extension functions for working with DataSource objects.
- Extracted validation utilities and connection management to separate files for better organization
- Added comprehensive DataSource support through new extension functions
- Reorganized metadata classes into the dedicated db package structure
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/validationUtil.kt | New file containing extracted validation utilities and connection management functions |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt | Major refactoring removing duplicate code and adding DataSource extension functions |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/TableMetadata.kt | New file containing extracted TableMetadata class |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/TableColumnMetadata.kt | New file containing extracted TableColumnMetadata class |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/DbConnectionConfig.kt | New file containing extracted DbConnectionConfig class |
Multiple db/*.kt files | Updated import statements to reflect the new package structure |
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.TableColumnMetadata
import org.jetbrains.kotlinx.dataframe.io.TableMetadata
import org.jetbrains.kotlinx.dataframe.io.db.TableColumnMetadata
The Jupyter integration automatically imports the `org.jetbrains.kotlinx.dataframe.io` package. If you want to move these classes, add an import of `org.jetbrains.kotlinx.dataframe.io.db` there.
Does it import only the direct package, without subpackages? In that case, yes, it should be added.
Yes, only the direct package.
…ed methods in tests and main codebase
…ma interface cleanup
…ry limits, and standardize identifier quoting for various database dialects
return false
}

// Check if there are balanced quotes (single and double)
Can quotes be escaped in SQL? If so, we shouldn't count them when they are escaped.
Also, what about backticks?
According to https://learnsql.com/cookbook/how-to-escape-single-quotes-in-sql/, single quotes are usually escaped by writing them twice, as in `INSERT INTO customer (id, customer_name) VALUES (502, 'Lay''s');`. In this case your check works.
However, on MySQL and PostgreSQL you can write `INSERT INTO customer (id, customer_name) VALUES (502, 'Lay\'s');`, in which case this check triggers a false negative: there are 3 single quotes, but the statement is still valid.
Finally, this is how Oracle does it: `INSERT INTO customer (id, customer_name) VALUES (502, q'[Lay's]');`, in which case your check also triggers a false negative.
Similarly, I believe alternating quote types is allowed, so `' hi " there '` is valid, but again a false negative is triggered because the number of double quotes is not even.
I'd remove this check altogether and find a more general solution. There are too many edge cases.
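The edge cases listed above can be reproduced with a small sketch. `hasBalancedQuotes` is a hypothetical stand-in for the check under review, not the code from the PR; it assumes the check simply requires an even count of each quote character.

```kotlin
// Hypothetical stand-in for the reviewed check: a statement passes only if
// single quotes and double quotes each occur an even number of times.
fun hasBalancedQuotes(sql: String): Boolean =
    sql.count { it == '\'' } % 2 == 0 && sql.count { it == '"' } % 2 == 0

fun main() {
    // Standard SQL escaping (quote written twice): 4 single quotes, accepted.
    println(hasBalancedQuotes("INSERT INTO customer (id, customer_name) VALUES (502, 'Lay''s');"))
    // Backslash escaping: 3 single quotes, rejected.
    println(hasBalancedQuotes("INSERT INTO customer (id, customer_name) VALUES (502, 'Lay\\'s');"))
    // Oracle q-quoting: 3 single quotes, rejected.
    println(hasBalancedQuotes("INSERT INTO customer (id, customer_name) VALUES (502, q'[Lay's]');"))
    // A double quote inside a single-quoted literal: 1 double quote, rejected.
    println(hasBalancedQuotes("SELECT ' hi \" there '"))
}
```

Only the first statement passes; the other three are the rejections described in the comment, which is why counting quote characters is not a reliable validity test.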
…es, improve result set processing, and streamline table metadata handling
…mline SQL type handling and improve readability
…abase` utility for improved code reuse and clarity.
* Examples:
* - PostgreSQL: "tableName" or "schema"."table"
* - MySQL: `tableName` or `schema`.`table`
* - MS SQL: [tableName] or [schema].[table]
Escape the `[]`.
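The per-dialect quoting in the KDoc examples can be sketched roughly as follows. `Dialect` and `quoteIdentifier` are hypothetical names chosen for illustration and are not the API introduced by this PR; the sketch also assumes identifiers contain no dots or quote characters of their own.

```kotlin
// Hypothetical dialect enum, for illustration only.
enum class Dialect { POSTGRESQL, MYSQL, MSSQL }

// Quotes each dot-separated part of an identifier in the style of the dialect.
// Assumption: parts contain neither dots nor the dialect's quote characters.
fun quoteIdentifier(name: String, dialect: Dialect): String =
    name.split(".").joinToString(".") { part ->
        when (dialect) {
            Dialect.POSTGRESQL -> "\"$part\""
            Dialect.MYSQL -> "`$part`"
            Dialect.MSSQL -> "[$part]"
        }
    }

fun main() {
    println(quoteIdentifier("schema.table", Dialect.POSTGRESQL)) // "schema"."table"
    println(quoteIdentifier("schema.table", Dialect.MYSQL))      // `schema`.`table`
    println(quoteIdentifier("tableName", Dialect.MSSQL))         // [tableName]
}
```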
…proved post-processing efficiency and reduced copying. Adjusted related functions to accept mutable lists accordingly.
I'm not a fan of the introduction of … Plus, things can easily get confusing when you mix mutable and immutable lists. Just look at:
`postProcessColumnValues(values: MutableList<Any?>, ...): List<Any?>`
How …
You could decide to save some memory (which I think will be negligible) by making the list mutable, but then the function should probably not return anything:
`postProcessColumnValues(values: MutableList<Any?>, ...)`
However, I'd keep it simple and stick with the immutability of DataFrame:
`postProcessColumnValues(values: List<Any?>, ...): List<Any?>`
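The three signatures from this comment can be contrasted in a minimal sketch. The function names and the trimming step are hypothetical stand-ins; the real `postProcessColumnValues` takes further parameters and does different work.

```kotlin
// Illustrative post-processing step (an assumption): trim string values.
fun trimIfString(value: Any?): Any? = if (value is String) value.trim() else value

// Mixed style: mutates its argument AND returns a list, so the caller cannot
// tell from the signature whether the result is a new list or the same one.
fun postProcessMixed(values: MutableList<Any?>): List<Any?> {
    for (i in values.indices) values[i] = trimIfString(values[i])
    return values
}

// In-place style: mutates its argument and returns nothing.
fun postProcessInPlace(values: MutableList<Any?>) {
    for (i in values.indices) values[i] = trimIfString(values[i])
}

// Immutable style: leaves the input untouched and returns a new list.
fun postProcessImmutable(values: List<Any?>): List<Any?> = values.map(::trimIfString)

fun main() {
    val raw: List<Any?> = listOf(" a ", 1, null)
    val cleaned = postProcessImmutable(raw)
    println(raw[0] == " a ")    // true: the input is unchanged
    println(cleaned[0] == "a")  // true: the result is a separate list

    val buffer = mutableListOf<Any?>(" a ", 1, null)
    println(postProcessMixed(buffer) === buffer) // true: same instance returned
}
```

The immutable variant pays for one extra list per call; the in-place variant avoids that copy but makes the mutation part of the contract.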
…dDataFrameFromDatabase`, converted data classes to classes with equality and hashCode implementations, added validation for database interaction methods.
@Jolanrensen regarding mutability vs. immutability for internals: the initial idea was to follow an immutable style for easier maintenance, but after a couple of requests and some discussion, I decided to optimize some obvious places to avoid additional copying before building the final DataFrame. I'm also planning to add a comment for each such case to avoid problems in the future; the same should be done for the metadata handling.
…or consistency and clarity across the DataFrame JDBC API. Updated all usages and tests accordingly.
…qlQuery` for consistency with updated naming conventions.
…SQL table and query schema generation
…treamline SQL query-related methods, and enhance table schema handling
…o `null` for unlimited row fetching. Updated all related methods and documentation for clarity.
…tive; removed redundant exception handling and updated query-building logic.
…Database` to `executeQueryAndBuildDataFrame` across JDBC methods for improved readability and consistency.
…bleColumns` and `buildDataColumn` functions, streamline column post-processing logic, and enhance modularity across schema and SQL utilities.
// TODO: add a special handler for Blob via Streams
} catch (_: Throwable) {
// TODO: expand for all the types like in generateKType function
if (kType.isSupertypeOf(String::class.starProjectedType)) {
Try to avoid `::class`; it's way heavier than `typeOf<String>()`. Also, don't you mean `subType` instead of `superType`?
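The difference between the two checks can be seen in a short sketch. It assumes `kotlin-reflect` is on the classpath (needed for `isSubtypeOf` and `isSupertypeOf`); the helper names are hypothetical and only mirror the condition in the snippet above.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.full.isSubtypeOf
import kotlin.reflect.full.isSupertypeOf
import kotlin.reflect.typeOf

// Mirrors the reviewed condition, written with typeOf<String>() instead of ::class.
fun matchesAsSupertype(kType: KType): Boolean = kType.isSupertypeOf(typeOf<String>())

// The alternative the reviewer asks about.
fun matchesAsSubtype(kType: KType): Boolean = kType.isSubtypeOf(typeOf<String>())

fun main() {
    // Any? is a supertype of String, so the supertype check accepts it.
    println(matchesAsSupertype(typeOf<Any?>())) // true
    // Any? is not a subtype of String, so the subtype check rejects it.
    println(matchesAsSubtype(typeOf<Any?>()))   // false
    // String itself satisfies both checks, because both relations are reflexive.
    println(matchesAsSupertype(typeOf<String>()) && matchesAsSubtype(typeOf<String>())) // true
}
```

Which relation is intended depends on whether the branch should run for any column type that can hold a `String` (supertype) or only for `String` itself (subtype).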
…, streamline `TableMetadata` class with compact constructor and copy method, and remove unused `columnMetadata` parameter from `buildDataColumn`.
Closes #1424 #680 #454 #1266