[FEATURE REQUEST] Add ADBC (Arrow Database Connectivity) Data Source #54603

@tokoko

Description

Add a native ADBC (Arrow Database Connectivity) data source to Spark, similar in spirit to the existing JDBC data source but built on the Arrow-native ADBC API.

ADBC is a database connectivity API standard under the Apache Arrow project. It provides a vendor-neutral, columnar alternative to JDBC/ODBC, designed specifically for analytical workloads. ADBC drivers return result sets as streams of Arrow data rather than row by row, which eliminates expensive row-to-columnar conversions. Since Spark itself is row-based, the effect is not as dramatic, but it is still noticeable.
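
To make the row-to-columnar conversion concrete, here is a minimal sketch of the transposition step that a row-oriented driver forces on a columnar consumer. It is not ADBC code: the standard-library `sqlite3` module stands in for a JDBC-style row cursor, and the helper `rows_to_columns` and the table `t` are hypothetical names used only for illustration.

```python
import sqlite3


def rows_to_columns(cursor):
    """Transpose a row-oriented cursor into per-column Python lists.

    Every value is touched individually, which is the per-row work an
    Arrow-native driver avoids by handing over column buffers directly.
    """
    names = [d[0] for d in cursor.description]
    columns = {name: [] for name in names}
    for row in cursor:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, score REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 0.5), (2, 1.5), (3, 2.5)])

cur = conn.execute("SELECT id, score FROM t ORDER BY id")
columns = rows_to_columns(cur)
print(columns)
```

With an Arrow-native driver the result would already arrive as column batches, so a loop like the one in `rows_to_columns` would not be needed on the consumer side.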

Why (now):

  • There are mature native drivers for PostgreSQL, SQLite, DuckDB, Flight SQL, Snowflake, BigQuery, MySQL, SQL Server, Databricks, and so on. They are also very easy to install (and locate) on a system with the dbc CLI tool.
  • There is now good support for invoking ADBC from Java via JNI bindings to the C++ ADBC driver manager (see blog). This makes it practical to integrate ADBC into Spark's JVM-based architecture. Technically, drivers can be implemented in Java as well, but the quality of the Java implementations is pretty low, so realistically one will almost always use a native driver.
  • ADBC fits well with Spark's columnar read support in Data Source V2. Generating ArrowColumnVectors from ADBC is pretty straightforward. This can benefit external Spark accelerators like Comet and (presumably) Photon.

I have a proof-of-concept implementation at spark-adbc that demonstrates the basic read path, along with some not-so-scientific benchmarks against JDBC. I'm willing to incrementally implement ADBC data source support upstream if there's interest from the community.
