Added JDBC-integration #451

zaleslaw · 2023-09-08T12:47:22Z

Enhanced SQL Database Integration via JDBC API

This PR adds integration with SQL databases through the JDBC API, addressing issues #212 and #124.

Key Features:

The framework can now read data from SQL databases in the following modes:

Table-based Reading: Retrieve data from a specific table by name.
Database Exploration: Read all non-system tables in the database.
ResultSet Access: Read data directly from ResultSet objects.
Complex Query Results: Fetch the results of arbitrary SQL queries, including joins between two or more tables.

Each mode can either use a provided database connection or establish a connection internally from the database configuration details. Every method in these modes also comes in two variants: with and without a result limit.
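As a rough illustration of the "provided connection or internal connection" and "with or without a limit" variants described above, here is a minimal sketch in plain JDBC. `DbConfig`, `selectSql`, and `readRows` are hypothetical names for this example, not the API added by this PR, and the `LIMIT` clause is an assumption that does not hold for every SQL dialect.

```kotlin
import java.sql.Connection
import java.sql.DriverManager

// Hypothetical stand-in for the PR's DatabaseConfiguration (URL, user, password).
data class DbConfig(val url: String, val user: String = "", val password: String = "")

// Builds the SELECT statement; a null limit means "read everything".
// tableName is interpolated directly, so it must come from trusted input.
fun selectSql(tableName: String, limit: Int? = null): String {
    require(limit == null || limit > 0) { "limit must be positive, but was $limit" }
    return if (limit == null) "SELECT * FROM $tableName" else "SELECT * FROM $tableName LIMIT $limit"
}

// Variant 1: the caller owns the connection, so it is not closed here.
fun readRows(connection: Connection, tableName: String, limit: Int? = null): List<Map<String, Any?>> =
    connection.createStatement().use { statement ->
        statement.executeQuery(selectSql(tableName, limit)).use { rs ->
            val meta = rs.metaData
            val rows = mutableListOf<Map<String, Any?>>()
            while (rs.next()) {
                // JDBC column indices start at 1.
                rows += (1..meta.columnCount).associate { i -> meta.getColumnLabel(i) to rs.getObject(i) }
            }
            rows
        }
    }

// Variant 2: the connection is created from the configuration and closed afterwards.
fun readRows(config: DbConfig, tableName: String, limit: Int? = null): List<Map<String, Any?>> =
    DriverManager.getConnection(config.url, config.user, config.password).use { connection ->
        readRows(connection, tableName, limit)
    }
```

Having the configuration-based overload delegate to the connection-based one keeps the reading logic in one place; only the ownership of the connection differs.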

Schema Information:

In addition to data retrieval, this PR makes it possible to obtain schema information for the results: the schema of an individual table, a ResultSet object, or an SQL query, or the schemas of all non-system tables in the database.

Supported Databases:

This implementation currently supports the following databases:

MySQL
MariaDB
SQLite
PostgreSQL
H2

Please note that while PostgreSQL is supported, some of its data types are not fully supported yet.

For details on the new functionality, please consult the documentation.

Note: The plugin support is a work in progress and not yet complete; kindly disregard this aspect for now. Your feedback and contributions are highly valued as we continue refining this feature.

zaleslaw · 2023-09-15T06:14:14Z

@koperagen @Jolanrensen @ermolenkodev please have a look to familiarize yourselves with the code changes.

Jolanrensen · 2023-09-18T12:23:28Z

Awesome job! I'll leave some small comments here and there for now and play around with it a bit :)

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/Jdbc.kt

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt

dataframe-jdbc/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/imdbTest.kt

Jolanrensen · 2023-09-18T12:08:14Z

...ntegrationTest/kotlin/org/jetbrains/dataframe/gradle/SchemaGeneratorPluginIntegrationTest.kt

@@ -372,6 +375,127 @@ class SchemaGeneratorPluginIntegrationTest : AbstractDataFramePluginIntegrationT
        result.task(":build")?.outcome shouldBe TaskOutcome.SUCCESS
    }

+    @Test
+    // TODO: test is broken


Sounds like a dataframe version mismatch

Sorry, what do you mean?

I think the gradle plugin uses the bootstrap version of dataframe, which does not yet include your added functions (like readSqlTable etc.). I remember from working on OpenAPI and adding support to the Gradle plugin that I continuously had to publishToMavenLocal because the Gradle plugin cannot use the source files directly.

@koperagen am I correct?

...frame-gradle-plugin/src/main/kotlin/org/jetbrains/dataframe/gradle/GenerateDataSchemaTask.kt

Jolanrensen · 2023-09-18T12:18:10Z

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt

+ * @param [tableName] the name of the table to read data from.
+ * @return the DataFrame containing the data from the SQL table.
+ */
+public fun DataFrame.Companion.readSqlTable(dbConfig: DatabaseConfiguration, tableName: String): AnyFrame {


Could it be possible to define the DatabaseConfiguration in a little DSL as well, instead of a class instance? like:

DataFrame.readSqlTable("tableName") {
    url = ""
    user = ""
    password = ""
}

Would be a bit more Dataframe-like

A data class is easier to reuse, in my opinion. For example, I experimented with an SQLite database: I read multiple tables with multiple readSqlTable calls. With a data class, I created a single variable. With a DSL it would be more tedious, especially in a notebook, where there are no refactorings (?) such as Extract.

Easier to reuse yes, but not easier to type when calling the function. Maybe we could both support a DatabaseConfiguration and just arguments in the function call?

Again, there are already a lot of functions for now. We have tracked all the ideas here, or I have created separate issues for them.

I want to let users try it out under experimental status so that we can handle more cases.
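The two configuration styles discussed in this thread could coexist. The following is only a sketch of that idea: `DbConfig`, `DbConfigBuilder`, and `describeRead` are hypothetical names, the function body is a placeholder instead of a real database read, and the SQLite URL is an invented example.

```kotlin
// Hypothetical stand-in for the PR's DatabaseConfiguration.
data class DbConfig(val url: String, val user: String = "", val password: String = "")

// Mutable builder used as the receiver of the DSL block.
class DbConfigBuilder {
    var url: String = ""
    var user: String = ""
    var password: String = ""

    fun build(): DbConfig {
        require(url.isNotBlank()) { "url must be set" }
        return DbConfig(url, user, password)
    }
}

// Reusable form: pass a configuration value. The body is a placeholder for the real read.
fun describeRead(tableName: String, config: DbConfig): String =
    "read '$tableName' from ${config.url} as '${config.user}'"

// DSL form: build the configuration inline, then delegate to the overload above.
fun describeRead(tableName: String, configure: DbConfigBuilder.() -> Unit): String =
    describeRead(tableName, DbConfigBuilder().apply(configure).build())

fun main() {
    // One variable reused across several calls, as in the notebook scenario.
    val shared = DbConfig(url = "jdbc:sqlite:chinook.db")
    println(describeRead("employee", shared))
    println(describeRead("tracks", shared))

    // Inline DSL form for a one-off call.
    println(describeRead("tracks") { url = "jdbc:sqlite:chinook.db"; user = "reader" })
}
```

Because the DSL overload only builds a `DbConfig` and delegates, supporting both styles adds one small function per reading method rather than a second implementation.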

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt

Jolanrensen · 2023-09-18T14:24:33Z

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt

+ * @param [dbConfig] the database configuration to connect to the database, including URL, user, and password.
+ * @return a list of [AnyFrame] objects representing the non-system tables from the database.
+ */
+public fun DataFrame.Companion.readAllTables(dbConfig: DatabaseConfiguration): List<AnyFrame> {


Maybe readAllSqlTables? This is defined on the entire DataFrame.Companion scope.

Also, this doesn't return the names of the tables, right? It may be better to return a map.

Another suggestion for the future: better notebook support. We can generate a type-safe wrapper for this map. From the user's perspective it will be:

val tables = DataFrame.readAllTables(...)

Next cell (each table has a schema):

tables.employee
tables.tracks
...

I explored the API after the proposed changes and found that, for meta information, it is better to add separate methods or to use the methods that return a data schema.

When I changed it to a map, it overcomplicated access to the dataframes.

But I also thought it would be great to have a name field for dataframes and one global space of all created dataframes, so that they can be searched by name.

One global space of all created dataframes would break the functional aspect of Kotlin DataFrame. They are immutable and any modification makes a new instance.
We could have a DataFrame Collection of some kind, but that would not be much different than a Map, or be able to name DataFrames.
Naming a DataFrame is not that far-fetched, since essentially, a DataFrame and ColumnGroup are the same and ColumnGroups can be named too. That said, currently variable names are used to refer to a DataFrame. There would be a mismatch when people would write: val dataFrameName = dataFrameOf(...) named "otherDataFrameName".

Maybe in the meantime, you could return a DataFrame containing a "name: ValueColumn" and "df: FrameColumn" column, which, especially in notebooks, would be a clear way for users to access multiple dataframes using a single one.
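To make the "return the table names as well" idea from this thread concrete, here is a small library-independent sketch. `Table` and `readAllNamed` are hypothetical, and a list of row maps stands in for a DataFrame; it is not what the PR implements.

```kotlin
// Stand-in for a DataFrame: a table is just a list of rows here.
typealias Table = List<Map<String, Any?>>

// Reads every table and keeps its name as the map key.
// associateWith preserves the iteration order of tableNames.
fun readAllNamed(tableNames: List<String>, readTable: (String) -> Table): Map<String, Table> =
    tableNames.associateWith(readTable)

fun main() {
    // Fake reader so the example runs without a database.
    val fakeDb = mapOf(
        "employee" to listOf(mapOf<String, Any?>("id" to 1, "name" to "Ann")),
        "tracks" to listOf(mapOf<String, Any?>("id" to 7, "title" to "Intro")),
    )
    val tables = readAllNamed(fakeDb.keys.toList()) { name -> fakeDb.getValue(name) }

    // Tables are looked up by name instead of by position in a list.
    println(tables.keys)
    println(tables.getValue("tracks").first()["title"])
}
```

The same name-to-table pairing could equally be expressed as a two-column frame with a name column and a frame column, as suggested above; the map is simply the shortest way to show it.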

PaulWoitaschek · 2023-09-22T07:58:02Z

Thanks for the initiative!

We are using Trino:
https://trino.io/docs/current/client/jdbc.html

Would it be possible to make the DbType not sealed?
Then in the DatabaseConfiguration we could pass in a custom implementation of the DbType, which would supersede the automatic selection.

zaleslaw · 2023-09-22T09:56:43Z

@PaulWoitaschek thanks for the feedback, I will think about making it not sealed.

PaulWoitaschek · 2023-09-22T10:02:01Z

@zaleslaw

For inspiration, you can also check how Exposed did it:
https://github.com/JetBrains/Exposed/blob/30a9ce0f7d6a98630147bcd8da648d974c6b63c0/documentation-website/Writerside/topics/Frequently-Asked-Questions.md?plain=1#L58
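The request in this thread could look roughly like the sketch below: an open `DbType` base class plus an optional override in the configuration that wins over URL-based detection. All names and signatures here are assumptions for illustration (including the `dbType` field and `resolveDbType`), and type names are simplified to strings; the PR's actual `DbType` may differ.

```kotlin
// Open (non-sealed) base type, so code outside the library can add databases.
abstract class DbType(val jdbcPrefix: String) {
    // Maps a database-specific SQL type name to a Kotlin type name (simplified to strings).
    abstract fun kotlinTypeFor(sqlType: String): String
}

// A built-in type, loosely modelled on the H2 mapping quoted in this review.
object H2 : DbType("h2") {
    override fun kotlinTypeFor(sqlType: String): String = when (sqlType) {
        "CHARACTER", "CHAR" -> "String"
        else -> "Any"
    }
}

// A user-defined type, for example for Trino, living outside the library.
object Trino : DbType("trino") {
    override fun kotlinTypeFor(sqlType: String): String = when (sqlType.lowercase()) {
        "varchar" -> "String"
        "bigint" -> "Long"
        else -> "Any"
    }
}

// Hypothetical configuration with an optional explicit database type.
data class DbConfig(val url: String, val dbType: DbType? = null)

private val builtIn: List<DbType> = listOf(H2)

// An explicit dbType wins; otherwise fall back to detection by URL prefix.
fun resolveDbType(config: DbConfig): DbType =
    config.dbType
        ?: builtIn.firstOrNull { config.url.startsWith("jdbc:${it.jdbcPrefix}:") }
        ?: throw IllegalArgumentException("Unsupported database URL: ${config.url}")
```

With a sealed hierarchy the `Trino` object could not be declared in user code at all, which is the limitation raised above.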

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/MariaDb.kt

koperagen · 2023-09-22T11:38:00Z

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/H2.kt

+
+    override fun toColumnSchema(tableColumnMetadata: TableColumnMetadata): ColumnSchema {
+        return when (tableColumnMetadata.sqlType) {
+            "CHARACTER", "CHAR" -> ColumnSchema.Value(typeOf<String>())


What about nullability? If I understand correctly, all rs.get* methods might return nulls, since SQL tables can store them. Thus, if your table has nulls and you use a generated schema, the code will throw an NPE.

Do you mean a column in which all values are NULL, or a column with just a few NULLs?
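One possible way to address the nullability concern is to consult the JDBC column metadata when choosing the Kotlin type. The sketch below is an assumption about how that could be done, not the PR's implementation; `stringTypeFor` is a hypothetical helper that covers only character columns.

```kotlin
import java.sql.ResultSetMetaData
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Picks a nullable or non-null Kotlin type for a character column,
// based on the value reported by ResultSetMetaData.isNullable(column).
fun stringTypeFor(jdbcNullability: Int): KType =
    when (jdbcNullability) {
        ResultSetMetaData.columnNoNulls -> typeOf<String>()
        // columnNullable and columnNullableUnknown: assume NULLs can occur.
        else -> typeOf<String?>()
    }

fun main() {
    println(stringTypeFor(ResultSetMetaData.columnNoNulls).isMarkedNullable)
    println(stringTypeFor(ResultSetMetaData.columnNullable).isMarkedNullable)
}
```

Treating the "unknown" case as nullable is the conservative choice: the generated schema may be looser than necessary, but reading a NULL will not violate it.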

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/H2.kt

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/Sqlite.kt

dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt

zaleslaw added 30 commits July 17, 2023 13:06

Created a module

d67e25e

Updated a module with the inital integration and test

13c0af2

Added a new complex example for reading with Native SQL Query

b17fdda

Added an implementation for a new complex example for reading with Native SQL Query

56b2484

Added an implementation for a new complex example for reading with Native SQL Query

24b8b2b

Added idea for test

e477a74

Added mariadb4j integration

8a9b496

Attempt with test containers

b58fae6

Added the H2 support for testing database capabilities

b97a6b5

Set up a draft Kotlin logging

65810a2

Started with ImportDataSchema changes

1306bdc

Started with ImportDataSchema changes

b4acfde

Added missed dependencies

926b540

Added simple generation for one table

bcfe733

Finished simple prototype

7b7cb85

Added some minor ideas

b1ffdb1

Fixed bug in the KNB test

d5ab3a8

Fixed bug in the KNB test

fe96d02

Add JDBC support to dataframe gradle plugin

6f8dabf

Added API methods

d6dcc9f

Finished API methods

bdb91fc

Added import data schema annotation support

6bc8c3c

Added import data schema annotation support

81a84de

Added force-classloading for drivers

a7147a3

Refactored jdbc

0ebe177

Updated tests

01c4e48

Support schema generation for SqlQuery

9bc0ebe

Support schema generation for SqlQuery

b647ace

Added experimental methods

24968e5

Added experimental methods

6498cc0

zaleslaw added the enhancement New feature or request label Sep 14, 2023

zaleslaw self-assigned this Sep 14, 2023

zaleslaw linked an issue Sep 14, 2023 that may be closed by this pull request

JDBC Support #212

Closed

zaleslaw marked this pull request as ready for review September 15, 2023 06:13

zaleslaw requested review from Jolanrensen, koperagen and ermolenkodev September 15, 2023 06:13

Jolanrensen reviewed Sep 18, 2023

Jolanrensen reviewed Sep 18, 2023

PaulWoitaschek reviewed Sep 22, 2023

koperagen requested changes Sep 22, 2023

zaleslaw added 10 commits October 2, 2023 16:51

Merge branch 'master' into issue-212

f9499b9

Fixed Review Part 1

4a4e988

Merge remote-tracking branch 'fork/issue-212' into issue-212

af0fc0a

Fixed Review Part 1

4ac9dfd

Rename test files to match Kotlin conventions and refactor tests

f427331

Refactor readAllTables method name to readAllSqlTables

97f68f6

Enhance exception messages and add uniqueness check

3d0906b

Added buildTableMetadata method to DbType and handle JSON type issues

4e22936

Ignore test cases due to configuration issues.

f1763c5

Remove SQL reading features

42dfe6d

zaleslaw merged commit ce45ed4 into Kotlin:master Oct 6, 2023
1 check passed

This was referenced Oct 6, 2023

Support result sets #124

Closed

Add SQL database reading to documentation #464

Merged
