Skip to content

Conversation

zaleslaw
Copy link
Collaborator

@zaleslaw zaleslaw commented Oct 9, 2025

This commit introduces new DataFrame JDBC extension functions with DataSource support, removing redundant duplicate utilities. It includes revisions to streamline table/schema reading and validation features, delegating reusable connection-handling logic to a dedicated utility class. Also refactor file structure for better organization of DB-related code.

Closes #1424 #680 #454 #1266

This commit introduces new DataFrame JDBC extension functions with `DataSource` support, removing redundant duplicate utilities. It includes revisions to streamline table/schema reading and validation features, delegating reusable connection-handling logic to a dedicated utility class. Also refactor file structure for better organization of DB-related code.
… for consistency and improved readability.
@zaleslaw zaleslaw requested a review from Copilot October 9, 2025 15:03
…ed file `readDataFrameSchema.kt`, improving organization and code clarity. Converted several helper functions to `internal` visibility for encapsulation.
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the DataFrame JDBC API by adding DataSource support and streamlining the codebase. It extracts common utilities to dedicated files, removes code duplication, and introduces new extension functions for working with DataSource objects.

  • Extracted validation utilities and connection management to separate files for better organization
  • Added comprehensive DataSource support through new extension functions
  • Reorganized metadata classes into the dedicated db package structure

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/validationUtil.kt New file containing extracted validation utilities and connection management functions
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt Major refactoring removing duplicate code and adding DataSource extension functions
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/TableMetadata.kt New file containing extracted TableMetadata class
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/TableColumnMetadata.kt New file containing extracted TableColumnMetadata class
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/DbConnectionConfig.kt New file containing extracted DbConnectionConfig class
Multiple db/*.kt files Updated import statements to reflect the new package structure

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.TableColumnMetadata
import org.jetbrains.kotlinx.dataframe.io.TableMetadata
import org.jetbrains.kotlinx.dataframe.io.db.TableColumnMetadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jupyter integration automatically imports org.jetbrains.kotlinx.dataframe.io package
If you want to move, add import org.jetbrains.kotlinx.dataframe.io.db there

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it import only the direct package without subpackages? in that case, yes, it should be

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, only direct package

…ry limits, and standardize identifier quoting for various database dialects
@Jolanrensen Jolanrensen self-requested a review October 13, 2025 10:14
return false
}

// Check if there are balanced quotes (single and double)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can quotes be escaped in SQL? if so, we shouldn't count them when escaped

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, what about backticks?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://learnsql.com/cookbook/how-to-escape-single-quotes-in-sql/ According to here, single quotes are usually escaped by writing them twice,
so INSERT INTO customer (id, customer_name) VALUES (502, 'Lay''s');. In this case your check works.
However, on MySQL and PostgreSQL, you can write INSERT INTO customer (id, customer_name)VALUES (502, 'Lay\'s'); in which case this check will trigger a false negative. There are 3 single quotes but the statement is still valid.

Finally, this is how Oracle does it: INSERT INTO customer (id, customer_name) VALUES (502, q'[Lay's]'); in which case your check will also trigger a false negative.

Similarly, I believe alternating is allowed, so ' hi " there ' is valid, but again a false negative is triggered because there's no even number of double quotes.

I'd remove this check altogether and find a more general solution. There's too many edge cases

…es, improve result set processing, and streamline table metadata handling
…mline SQL type handling and improve readability
…abase` utility for improved code reuse and clarity.
* Examples:
* - PostgreSQL: "tableName" or "schema"."table"
* - MySQL: `tableName` or `schema`.`table`
* - MS SQL: [tableName] or [schema].[table]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escape [],

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image still shows this in the IDE image but it renders correctly, so I'm happy :)

…proved post-processing efficiency and reduced copying. Adjusted related functions to accept mutable lists accordingly.
@Jolanrensen
Copy link
Collaborator

I'm not a fan of the introduction of MutableList in DbType. In many cases, nothing needs to be modified, in which case returning the immutable list is just as efficient, and if values do need to be mapped, they only need to be read once after which they can be garbage collected, so again not much benefit from making it mutable.

Plus, things can easily get confusing when you mix mutable and immutable lists. Just look at:

postProcessColumnValues(values: MutableList<Any?>, ...): List<Any?>

How values will be used is unclear looking at this. Will values inside be overridden? will a new list be returned? will the types change? Very hard to tell.

You could decide to save some memory (which I think will be negligible) by making the list mutable, but then, the function should probably not return anything:

postProcessColumnValues(values: MutableList<Any?>, ...)

However, I'd keep it simple and stick with the immutability of DataFrame:

postProcessColumnValues(values: List<Any?>, ...): List<Any?>

…dDataFrameFromDatabase`, converted data classes to classes with equality and hashCode implementations, added validation for database interaction methods.
@zaleslaw
Copy link
Collaborator Author

@Jolanrensen regarding Mutablity vs Immutability for internals - the initial idea was to follow immutability style for easy-to-mantain approach, but after couple of request and communication, I've decided to optimize some obvious place to avoid additional copying before building a final DataFrame, also I'm planning to provide comment for each such situation to avoid problems in future, the same should be done for the working with metadata

…or consistency and clarity across the DataFrame JDBC API. Updated all usages and tests accordingly.
…qlQuery` for consistency with updated naming conventions.
…treamline SQL query-related methods, and enhance table schema handling
…o `null` for unlimited row fetching. Updated all related methods and documentation for clarity.
…tive; removed redundant exception handling and updated query-building logic.
…Database` to `executeQueryAndBuildDataFrame` across JDBC methods for improved readability and consistency.
…bleColumns` and `buildDataColumn` functions, streamline column post-processing logic, and enhance modularity across schema and SQL utilities.
// TODO: add a special handler for Blob via Streams
} catch (_: Throwable) {
// TODO: expand for all the types like in generateKType function
if (kType.isSupertypeOf(String::class.starProjectedType)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to avoid ::class, it's way heavier than typeOf<String>(). Also, don't you mean subType? instead of superType?

…, streamline `TableMetadata` class with compact constructor and copy method, and remove unused `columnMetadata` parameter from `buildDataColumn`.
@zaleslaw zaleslaw merged commit 77a0d7e into master Oct 16, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add DataSource support for JDBC integration

3 participants