## DuckDB and dataframe-jdbc

[DuckDB](https://duckdb.org/) is, for now, not an officially supported JDBC source for DataFrame.
However, that does not mean it cannot be used with DataFrame :).

For DuckDB, we actually have two ways to support it:

The first one is to use `dataframe-arrow`, as DuckDB has a [bridge to Arrow](https://duckdb.org/docs/stable/clients/java.html#arrow-methods). [Click here for an example of how to do it](https://github.com/Kotlin/dataframe/blob/62b48942b1aef35f939f2d0aff407872028fd177/dataframe-arrow/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/ArrowKtTest.kt#L637). This might work for you, but it is less flexible, as you're forced to use Arrow types.

The second way, which we will show here, is to use `dataframe-jdbc`, as DuckDB has a [JDBC driver](https://duckdb.org/docs/stable/clients/java.html) too.
This is a more direct approach and shows how you can add other JDBC sources as well, so without any further ado, let's get started!

We're adding the DuckDB dependency before adding DataFrame to make sure classloading works well.

In [9]:
USE {
    dependencies {
        implementation(group = "org.duckdb", artifact = "duckdb_jdbc", version = "1.2.2.0")
    }
}

And to make sure we use the latest version of DataFrame, we'll use the `%useLatestDescriptors` magic.

In [10]:
%useLatestDescriptors
%use dataframe

Following the [custom JDBC database documentation](https://kotlin.github.io/dataframe/readsqlfromcustomdatabase.html), it seems we need to create our own `DbType` object.

We will skip the `convertSqlTypeToKType` and `convertSqlTypeToColumnSchemaValue` functions for now. Let's see how the defaults fare.
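
To make the skipped step more concrete, here is a minimal sketch of what such a type mapping could look like. It is plain Kotlin and does not touch the DataFrame API: `sketchKTypeFor` is a hypothetical helper, and the type table is an assumption that would have to be checked against what `ResultSet.getObject` of the DuckDB driver actually returns (the `DATE` entry in particular).

```kotlin
import java.time.LocalDate
import kotlin.reflect.KType
import kotlin.reflect.typeOf

/**
 * Hypothetical helper, not part of DataFrame: maps a SQL type name to the Kotlin type
 * we ASSUME the JDBC driver returns for it. Returns `null` for unknown types,
 * mirroring how `convertSqlTypeToKType` defers to the default implementation.
 */
fun sketchKTypeFor(sqlTypeName: String, isNullable: Boolean): KType? =
    when (sqlTypeName.uppercase()) {
        "INTEGER" -> if (isNullable) typeOf<Int?>() else typeOf<Int>()
        "VARCHAR" -> if (isNullable) typeOf<String?>() else typeOf<String>()
        "DOUBLE" -> if (isNullable) typeOf<Double?>() else typeOf<Double>()
        // Assumption: DATE is read as java.time.LocalDate; verify against the driver.
        "DATE" -> if (isNullable) typeOf<LocalDate?>() else typeOf<LocalDate>()
        else -> null
    }

println(sketchKTypeFor("integer", isNullable = false) == typeOf<Int>()) // true
println(sketchKTypeFor("BLOB", isNullable = true)) // unknown type: prints null
```

Returning `null` for anything not listed keeps the sketch conservative: only types we have explicitly decided on are overridden, and everything else is left to the defaults.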

In [11]:
import org.jetbrains.kotlinx.dataframe.io.db.DbType
import org.jetbrains.kotlinx.dataframe.io.getSchemaForAllSqlTables
import org.jetbrains.kotlinx.dataframe.io.readAllSqlTables
import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import java.sql.ResultSet
import kotlin.reflect.KType

object DuckDb1 : DbType("duckdb") {

    /** the name of the class of the DuckDB JDBC driver */
    override val driverClassName = "org.duckdb.DuckDBDriver"

    /**
     * TODO
     * How a column type from JDBC, [tableColumnMetadata], is read in Java/Kotlin.
     * The returned type must exactly follow [ResultSet.getObject] of your specific database's JDBC driver.
     * Returning `null` defers the implementation to the default one (which may not always be correct).
     */
    override fun convertSqlTypeToKType(tableColumnMetadata: TableColumnMetadata): KType? = null

    /**
     * TODO
     * How a column from JDBC should be represented as a DataFrame (value) column.
     * See [convertSqlTypeToKType].
     */
    override fun convertSqlTypeToColumnSchemaValue(tableColumnMetadata: TableColumnMetadata): ColumnSchema? = null

    /** How to filter out system tables from user-created ones when using [DataFrame.readAllSqlTables] and [DataFrame.getSchemaForAllSqlTables]. */
    override fun isSystemTable(tableMetadata: TableMetadata): Boolean =
        tableMetadata.schemaName?.lowercase()?.contains("information_schema") == true ||
            tableMetadata.schemaName?.lowercase()?.contains("system") == true ||
            tableMetadata.name.lowercase().contains("system_")

    /** How to retrieve the correct table metadata when using [DataFrame.readAllSqlTables] and [DataFrame.getSchemaForAllSqlTables]. */
    override fun buildTableMetadata(tables: ResultSet): TableMetadata =
        TableMetadata(
            name = tables.getString("TABLE_NAME"),
            schemaName = tables.getString("TABLE_SCHEM"),
            catalogue = tables.getString("TABLE_CAT"),
        )
}
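
The `isSystemTable` check above is plain string matching, so it can be tried out without a database. The following is a small sketch under stated assumptions: `TableInfo` is a hypothetical stand-in for DataFrame's `TableMetadata`, `looksLikeSystemTable` copies the same three conditions, and the schema and catalogue names in the sample rows are made up for the demo rather than taken from DuckDB.

```kotlin
/** Hypothetical stand-in for DataFrame's TableMetadata, for illustration only. */
data class TableInfo(val name: String, val schemaName: String?, val catalogue: String?)

/** Same string checks as `DuckDb1.isSystemTable`, applied to the stand-in class. */
fun looksLikeSystemTable(table: TableInfo): Boolean =
    table.schemaName?.lowercase()?.contains("information_schema") == true ||
        table.schemaName?.lowercase()?.contains("system") == true ||
        table.name.lowercase().contains("system_")

// Example metadata rows; the schema and catalogue names are invented for the demo.
val candidates = listOf(
    TableInfo(name = "test_table", schemaName = "main", catalogue = "memory"),
    TableInfo(name = "columns", schemaName = "information_schema", catalogue = "system"),
    TableInfo(name = "system_settings", schemaName = null, catalogue = null),
)

// Keep only the tables that the predicate does not flag as system tables.
val userTables = candidates.filterNot { looksLikeSystemTable(it) }.map { it.name }
println(userTables) // [test_table]
```

Note that the predicate only looks at the schema name and the table name; the catalogue is carried along but never inspected.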

In [12]:
val URL = "jdbc:duckdb:"
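
The bare `jdbc:duckdb:` URL asks the driver for an in-memory database; according to the DuckDB JDBC documentation, appending a file path after the prefix opens a persistent database file instead. A small sketch of both forms follows. `duckDbJdbcUrl` and the file name `demo.duckdb` are hypothetical and only used for illustration.

```kotlin
/**
 * Hypothetical helper: builds a DuckDB JDBC URL.
 * With no file it yields the bare in-memory URL used in this notebook.
 */
fun duckDbJdbcUrl(databaseFile: String? = null): String =
    "jdbc:duckdb:" + databaseFile.orEmpty()

val inMemoryUrl = duckDbJdbcUrl()
val fileUrl = duckDbJdbcUrl("demo.duckdb")

println(inMemoryUrl) // jdbc:duckdb:
println(fileUrl)     // jdbc:duckdb:demo.duckdb
```

Data in the in-memory form is not persisted to disk, which is convenient for a throwaway notebook but worth keeping in mind when reading tables back later.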

In [5]:
import java.sql.DriverManager

DriverManager.getConnection(URL).use { connection ->
    connection.prepareStatement(
        """
        CREATE TABLE IF NOT EXISTS test_table (
            id INTEGER PRIMARY KEY,
            name VARCHAR,
            age INTEGER,
            salary DOUBLE,
            hire_date DATE
        )
        """.trimIndent(),
    ).executeUpdate()

    connection.prepareStatement(
        """
        INSERT INTO test_table (id, name, age, salary, hire_date)
        VALUES
            (1, 'John Doe', 30, 50000.00, '2020-01-15'),
            (2, 'Jane Smith', 28, 55000.00, '2021-03-20'),
            (3, 'Bob Johnson', 35, 65000.00, '2019-11-10'),
            (4, 'Alice Brown', 32, 60000.00, '2020-07-01')
        """.trimIndent(),
    ).executeUpdate()

    DataFrame.readAllSqlTables(connection, dbType = DuckDb1)
//    DataFrame.readSqlTable(connection, "test_table", dbType = DuckDb1)
}

{}

In [6]:
val config = DbConnectionConfig(URL)

DataFrame.readAllSqlTables(config, dbType = DuckDb1)

{}