[FEATURE REQUEST] Add ADBC (Arrow Database Connectivity) Data Source

Add a native ADBC (Arrow Database Connectivity) data source to Spark, similar in spirit to the existing JDBC data source but built on the Arrow-native [ADBC](https://arrow.apache.org/adbc/) API.

ADBC is a database connectivity API standard under the Apache Arrow project. It provides a vendor-neutral, columnar alternative to JDBC/ODBC that is designed specifically for analytical workloads. ADBC drivers return result sets as streams of Arrow data rather than row by row, which eliminates the expensive row-to-columnar conversions that columnar consumers otherwise have to perform. Since Spark itself is largely row-based, the effect is less dramatic than in a fully columnar engine, but it is still noticeable.
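To make the row-to-columnar point concrete, here is a minimal Java sketch of the transpose a columnar consumer has to perform when a driver hands back one row at a time, as a JDBC `ResultSet` does. Plain `long[]` arrays stand in for Arrow vectors, and the class and method names are hypothetical; this is not ADBC's or Spark's API. An ADBC driver avoids this step because its Arrow record batches already arrive laid out column by column.

```java
import java.util.Arrays;
import java.util.List;

public class RowToColumnar {
    // Hypothetical helper: transposes row-oriented records (as a row-by-row
    // cursor would yield them) into one array per column, the layout that
    // Arrow record batches already use.
    static long[][] toColumns(List<long[]> rows, int numColumns) {
        long[][] columns = new long[numColumns][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            long[] row = rows.get(r);
            for (int c = 0; c < numColumns; c++) {
                // Every value is copied once, cell by cell.
                columns[c][r] = row[c];
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<long[]> rows = List.of(new long[]{1, 10}, new long[]{2, 20}, new long[]{3, 30});
        long[][] columns = toColumns(rows, 2);
        System.out.println(Arrays.toString(columns[0])); // [1, 2, 3]
        System.out.println(Arrays.toString(columns[1])); // [10, 20, 30]
    }
}
```

The per-cell copy in the inner loop is the work that an Arrow-native driver makes unnecessary on the read path.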

Why (now):
- Mature native drivers exist for PostgreSQL, SQLite, DuckDB, Flight SQL, Snowflake, BigQuery, MySQL, SQL Server, Databricks, and others. They are also easy to install (and locate) on a system with the [dbc](https://columnar.tech/dbc/) CLI tool.
- There is now good support for invoking ADBC from Java via JNI bindings to the C++ ADBC driver manager (see [blog](https://columnar.tech/blog/adbc-java/)). This makes it practical to integrate ADBC into Spark's JVM-based architecture. Drivers can technically be implemented in Java as well, but the quality of the existing Java implementations is fairly low, so realistically one will almost always use a native driver.
- ADBC fits well with Spark's columnar read support in Data Source V2: generating `ArrowColumnVector`s from ADBC results is straightforward. This could also benefit external Spark accelerators such as Comet (and presumably Photon).

I have a proof-of-concept implementation at [spark-adbc](https://github.com/tokoko/spark-adbc) that demonstrates the basic read path, along with some informal (not very scientific) benchmarks against JDBC. I'm willing to implement ADBC data source support upstream incrementally if there's interest from the community.

[FEATURE REQUEST] Add ADBC (Arrow Database Connectivity) Data Source #54603

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FEATURE REQUEST] Add ADBC (Arrow Database Connectivity) Data Source #54603

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions