feat(c/driver/hologres): add Hologres ADBC driver by TimothyDing · Pull Request #4266 · apache/arrow-adbc

TimothyDing · 2026-04-22T23:47:37Z

Summary

Add a new ADBC driver for Hologres, Alibaba Cloud's real-time data warehouse service built on PostgreSQL. This driver enables high-performance columnar data access to Hologres through the standard ADBC interface.

Components

C Driver (c/driver/hologres/) — ~20K lines of new code

HologresDatabase: Connection management with automatic Hologres/PostgreSQL version detection and type resolver initialization
HologresConnection: Full ADBC metadata API (GetInfo, GetObjects, GetTableSchema, GetTableTypes, GetStatistics)
HologresStatement: Query execution, parameterized queries, and two bulk ingestion paths:
- COPY mode (default): Standard PostgreSQL COPY FROM STDIN binary protocol
- Stage mode: Hologres-native stage-based ingestion via Arrow IPC upload with configurable concurrency, batch sizing, and file targeting
ArrowCopyReader: Reads query results via COPY TO STDOUT in Arrow IPC format (arrow or arrow_lz4), bypassing row-by-row binary parsing for significantly better read performance
TupleReader: Reads query results via standard PostgreSQL binary COPY TO STDOUT with nanoarrow-based batch assembly
ON_CONFLICT support: IGNORE (skip conflicts) and UPDATE (upsert) modes for both COPY and Stage ingestion
Automatic application_name tagging (adbc_hologres_<version>) for server-side observability

Hologres-specific data type support:

Standard PostgreSQL types: bool, int2/4/8, float4/8, numeric, text, bytea, date, time, timestamp, timestamptz, interval, uuid
Array types: int2[], int4[], int8[], float4[], float8[], bool[], text[], bytea[]
Extended types: JSON, JSONB (with version byte prefix), CHAR(n), VARCHAR(n), roaringbitmap
Type conversions for Stage mode: timestamptz, large_binary, large_string

Vendored dependency (c/vendor/nanoarrow/) — nanoarrow IPC

Vendored nanoarrow IPC reader/writer and flatcc runtime for Arrow IPC serialization/deserialization, used by both the ArrowCopyReader (reading Arrow IPC from COPY protocol) and StageWriter (serializing Arrow batches for Stage upload)

Python package (python/adbc_driver_hologres/) — ~3.6K lines

adbc_driver_hologres: Python bindings with DBAPI 2.0 support via adbc_driver_manager
Enums: HologresOnConflict, HologresIngestMode, StatementOptions
Integration tests covering COPY and Stage modes across all supported types
ASV benchmark suites for read/write performance profiling

Build system:

CMake integration with ADBC_DRIVER_HOLOGRES option
pkg-config support (adbc-driver-hologres.pc)
Python setuptools with shared library bundling

Key design decisions

Forked from PostgreSQL driver: Core PostgreSQL utilities (postgres_type.h, copy/reader.h, copy/writer.h, etc.) are copied into the Hologres driver rather than shared, to allow independent evolution for Hologres-specific type handling (JSONB version byte, roaringbitmap, etc.)
Default COPY read format is arrow_lz4: Hologres supports native Arrow IPC output in its COPY protocol. The arrow_lz4 format avoids row-by-row binary parsing and leverages LZ4 compression, providing better throughput for analytical queries. The standard binary format remains available by setting the adbc.hologres.copy_format option to binary.
Stage ingestion for large datasets: The Stage writer serializes Arrow batches into IPC format, uploads them via dedicated FixedFE connections with configurable concurrency (default: 4 threads), and commits atomically. This path is optimized for bulk loading scenarios where COPY throughput is insufficient.
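The Stage flow described above (parallel upload, then a single commit) can be sketched roughly as follows. This is only an illustration of the control flow: `upload_one`, `commit`, and `cleanup` are hypothetical callables supplied by the caller, not driver APIs, and only the default of 4 workers comes from the description.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_batches(batches, upload_one, max_workers=4):
    """Upload every batch with at most max_workers uploads in flight.

    upload_one is a caller-supplied callable (hypothetical here); any
    exception it raises propagates so the caller can skip the commit.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces all results and re-raises the first failure.
        return list(pool.map(upload_one, batches))


def ingest(batches, upload_one, commit, cleanup):
    """Upload in parallel, then commit once; always clean up afterwards."""
    try:
        uploaded = upload_batches(batches, upload_one)
        commit(uploaded)  # single commit after all uploads succeeded
        return len(uploaded)
    finally:
        cleanup()  # runs on success and on every error path
```

Committing only after every upload has returned is what makes a partial upload invisible to readers in this sketch; how the driver achieves atomicity on the server side is not shown here.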

Testing

C unit tests (~8.6K lines): Comprehensive coverage for all modules — database, connection, statement, COPY reader/writer, Arrow COPY reader, Stage writer, bind stream, error handling, PostgreSQL type resolver, and utility functions
Python integration tests (~2.2K lines): End-to-end tests covering DBAPI 2.0 compliance, COPY/Stage ingestion for all supported types, ON_CONFLICT modes, and edge cases
Python benchmarks: ASV benchmark suites for read (binary, arrow, arrow_lz4) and write (COPY, Stage) performance at various row counts (1K–10M)

Configuration options

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| `adbc.hologres.copy_format` | `binary`, `arrow`, `arrow_lz4` | `arrow_lz4` | COPY TO STDOUT read format |
| `adbc.hologres.ingest_mode` | `copy`, `stage` | `copy` | Bulk ingestion method |
| `adbc.hologres.use_copy` | `true`, `false` | `true` | Enable COPY optimization for ingestion |
| `adbc.hologres.on_conflict` | `none`, `ignore`, `update` | `none` | Conflict resolution for ingestion |
| `adbc.hologres.batch_size_hint_bytes` | integer | `16777216` (16 MiB) | Target batch size hint for reads |
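
As a rough illustration of the table above, the sketch below validates a dictionary of option strings before they would be handed to a driver. `HOLOGRES_OPTIONS` and `validate_options` are hypothetical names, not part of the driver; only the option keys, allowed values, and defaults are taken from the table.

```python
# Allowed values and defaults, copied from the configuration table.
# An allowed-values entry of None means "digits only".
HOLOGRES_OPTIONS = {
    "adbc.hologres.copy_format": (("binary", "arrow", "arrow_lz4"), "arrow_lz4"),
    "adbc.hologres.ingest_mode": (("copy", "stage"), "copy"),
    "adbc.hologres.use_copy": (("true", "false"), "true"),
    "adbc.hologres.on_conflict": (("none", "ignore", "update"), "none"),
    "adbc.hologres.batch_size_hint_bytes": (None, "16777216"),
}


def validate_options(user_options):
    """Return the defaults overlaid with user_options, rejecting bad input."""
    resolved = {key: default for key, (_, default) in HOLOGRES_OPTIONS.items()}
    for key, value in user_options.items():
        if key not in HOLOGRES_OPTIONS:
            raise KeyError(f"unknown option: {key}")
        allowed, _ = HOLOGRES_OPTIONS[key]
        if allowed is None:
            if not value.isdigit():
                raise ValueError(f"{key} expects an integer, got {value!r}")
        elif value not in allowed:
            raise ValueError(f"{key} must be one of {allowed}, got {value!r}")
        resolved[key] = value
    return resolved
```

For example, `validate_options({"adbc.hologres.ingest_mode": "stage"})` returns all five options with `ingest_mode` set to `stage` and the remaining defaults unchanged.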

Test plan

C unit tests pass: cd build && ctest --test-dir . -R hologres
Python integration tests pass against a live Hologres instance: cd python/adbc_driver_hologres && pytest tests/
Build succeeds with -DADBC_DRIVER_HOLOGRES=ON
Python package installs and connects successfully

Add a new independent ADBC driver for Hologres with stub implementations of Database, Connection, and Statement classes. The driver compiles as a standalone library (adbc_driver_hologres) without modifying the existing PostgreSQL driver. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…river Copy error handling, result helpers, type system, bind stream, result reader, and COPY protocol files from the PostgreSQL driver. These files retain the adbcpq namespace and are compiled as part of the Hologres driver library to avoid modifying the PostgreSQL driver. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ction Add real connection management (PGconn), Hologres version detection via SELECT hg_version(), PostgreSQL type resolver, and MakeFixedFeUri() for Stage mode FixedFE connections. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ueries
Implement the full HologresConnection class with Hologres-specific behavior:
- GetInfo returns Hologres vendor name and parsed version
- Commit/Rollback return NOT_IMPLEMENTED (Hologres is always autocommit)
- SetOption rejects disabling autocommit
- GetObjects/GetTableSchema/GetTableTypes use PG-compatible system catalog queries
- HologresGetObjectsHelper for metadata enumeration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tion and COPY ingest
Complete HologresStatement implementation:
- TupleReader for streaming query results via COPY protocol
- SQL query execution with COPY and PqResultArrayReader paths
- Parameter binding (Bind/BindStream) and ExecuteBind
- ExecuteSchema for schema inference
- Bulk ingest via COPY FROM STDIN with STREAM_MODE and ON_CONFLICT
- CreateBulkTable with CREATE/APPEND/REPLACE/CREATE_APPEND modes
- Hologres-specific options: on_conflict, ingest_mode, batch_size_hint
- OnConflictMode and HologresIngestMethod enums

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implements Hologres internal Stage ingestion pathway:
- StageConnection: thread-safe libpq wrapper for Stage COPY operations
- StageWriter: parallel Arrow IPC upload with FSL→LIST conversion
- ExecuteIngestStage: orchestrates FixedFE + regular FE connections
- Vendor nanoarrow IPC support (flatcc + IPC encoder/decoder)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

70 tests covering BufferQueue thread safety, Stage create/drop/upload, Arrow IPC serialization (int64, string, boolean, date32, binary, list, FSL→LIST conversion with slicing), mock-based ingestion flow, and Hologres option enums. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…iver
Provides adbc-driver-hologres Python package with:
- HologresOnConflict and HologresIngestMode enums
- StatementOptions for driver-specific configuration
- connect() function for low-level ADBC access
- DBAPI 2.0 compatible interface via dbapi module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add 7 new test files covering database, connection, statement, postgres_util, postgres_type, copy reader, and error modules (~170 tests). Tests cover pure functions, option handling, type mapping, and COPY binary parsing without requiring a live database connection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ASV benchmarks for query fetch and bulk ingestion performance, comparing ADBC (COPY/Stage modes with ON_CONFLICT variants) against asyncpg, psycopg2, and DuckDB. Includes vector ingestion benchmarks for high-dimensional FLOAT4[] data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e ON_CONFLICT UPDATE Add 32 integration tests covering connection metadata, queries, COPY/Stage ingestion, ON_CONFLICT modes, batch size hints, and statistics. Tests run against a live Hologres instance via ADBC_HOLOGRES_TEST_URI. Fix a bug in Stage mode where ON_CONFLICT UPDATE was not implemented: InsertFromStage now queries pg_index for primary key columns and generates the proper ON CONFLICT (pk) DO UPDATE SET clause. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
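
The upsert clause mentioned in this commit could be assembled along the following lines. This is a sketch of the clause construction only, under the assumption that the primary key columns have already been fetched (the commit says they come from pg_index); `quote_ident` and `build_on_conflict_update` are illustrative names, not functions from the driver.

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_on_conflict_update(columns, pk_columns):
    """Build an ON CONFLICT ... DO UPDATE clause for the non-key columns."""
    if not pk_columns:
        raise ValueError("ON_CONFLICT UPDATE needs a primary key")
    pk = set(pk_columns)
    targets = ", ".join(quote_ident(c) for c in pk_columns)
    updates = [c for c in columns if c not in pk]
    if not updates:
        # Nothing to update when every column is part of the key.
        return f"ON CONFLICT ({targets}) DO NOTHING"
    assignments = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in updates
    )
    return f"ON CONFLICT ({targets}) DO UPDATE SET {assignments}"
```

With `columns=["id", "name"]` and `pk_columns=["id"]`, the result is `ON CONFLICT ("id") DO UPDATE SET "name" = EXCLUDED."name"`.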

…large_binary, large_string
Hologres Stage/EXTERNAL_FILES does not natively support Arrow timestamp[us, tz=*], large_binary, or large_string types. Add conversion logic that transforms these types before IPC serialization and restores them afterward:
- TIMESTAMPTZ: timestamp[us, tz=*] → date64[ms] (divide microseconds by 1000)
- BYTEA: large_binary → binary (narrow int64 offsets to int32)
- TEXT: large_string → string/utf8 (narrow int64 offsets to int32)

Also add comprehensive integration tests covering temporal, numeric, JSON, binary, list, dictionary, and operational scenarios for both COPY and Stage ingest modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
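
The two conversions in this commit (narrowing 64-bit offsets and scaling microseconds to milliseconds) can be sketched on plain Python values. This is not the driver's C++ code: the function names are hypothetical, and the use of floor division for negative timestamps is this sketch's own choice, since the commit only says "divide microseconds by 1000".

```python
import struct

INT32_MAX = 2**31 - 1


def narrow_offsets(offsets64):
    """Pack large_binary/large_string offsets as little-endian int32 values."""
    # Offsets are non-decreasing, so checking the last one is sufficient.
    if offsets64 and offsets64[-1] > INT32_MAX:
        raise OverflowError("data too large for 32-bit offsets")
    return struct.pack(f"<{len(offsets64)}i", *offsets64)


def micros_to_millis(values):
    """Convert microsecond timestamps to milliseconds, keeping nulls (None)."""
    # Floor division keeps pre-epoch (negative) values on the earlier side.
    return [None if v is None else v // 1000 for v in values]
```

The overflow check matters because a large_binary column whose total payload exceeds 2 GiB cannot be represented with 32-bit offsets at all; such a batch would have to be split first.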

… COPY and Stage modes Add test_ingest_list_float4 for COPY mode and test_stage_list_float_types for Stage mode to cover FLOAT4[] array type round-trip, complementing the existing FLOAT8[] tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The COPY writer was writing JSONB values as plain strings, but the PostgreSQL binary protocol requires a 0x01 version byte prefix for JSONB. This caused "unsupported jsonb version number 123" errors (the first JSON character '{' = 0x7B being read as version byte). Add PostgresCopyJsonbFieldWriter that prepends the 0x01 version byte, and modify MakeCopyFieldWriter to accept PostgresType so it can select the JSONB writer when the target column type is kJsonb. The target table column types are resolved via pg_attribute before entering COPY mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
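
To make the failure concrete: in a binary COPY stream each field is a 4-byte big-endian length followed by the payload, and a jsonb payload starts with a version byte of 1. The sketch below (helper names are illustrative, not the driver's) encodes the same value both ways; without the prefix, the server reads `{` (0x7B, decimal 123) as the version.

```python
import struct

JSONB_VERSION = 1  # version byte expected at the start of a binary jsonb value


def copy_text_field(value):
    """Encode a text-like field: 4-byte big-endian length, then the bytes."""
    data = value.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def copy_jsonb_field(value):
    """Encode a jsonb field: length, version byte, then the JSON text."""
    data = bytes([JSONB_VERSION]) + value.encode("utf-8")
    return struct.pack(">i", len(data)) + data
```

For the input `'{}'`, `copy_text_field` produces a payload whose first byte is 123, while `copy_jsonb_field` produces `b'\x00\x00\x00\x03\x01{}'`.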

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… 77.0.1 setuptools 77.0.0 was yanked from PyPI, causing ASV environment creation to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…on> on connect Automatically inject application_name=adbc_hologres_<version> into the connection URI so Hologres can identify ADBC driver connections in pg_stat_activity. User-specified application_name is preserved. Also fix ADBC_INFO_DRIVER_VERSION to return the actual version instead of "unknown". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
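
A minimal sketch of the injection rule described above, assuming the connection string is in URI form (keyword/value connection strings are not handled here). `with_application_name` is a hypothetical helper, not the driver's function; only the `adbc_hologres_<version>` naming and the "preserve the user's value" rule come from the commit message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_application_name(uri, driver_version):
    """Add application_name to a postgresql:// URI unless already present."""
    parts = urlsplit(uri)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "application_name" for key, _ in params):
        return uri  # a user-specified value is preserved untouched
    params.append(("application_name", f"adbc_hologres_{driver_version}"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Calling it with `"postgresql://h:80/db"` and version `"1.0.0"` yields `postgresql://h:80/db?application_name=adbc_hologres_1.0.0`, while a URI that already carries `application_name=myapp` is returned unchanged.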

- Fix strncmp(sqlstate, "42", 0) always matching (should be length 2)
- Fix SQL injection in Stage PK query by using PQexecParams
- Add null checks for PQescapeIdentifier return values
- Add Hologres >= 4.1 version gate for Stage ingestion mode
- Ensure DropStage cleanup on all error paths after CreateStage
- Remove redundant 3x pg_type query execution in RebuildTypeResolver
- Make open_connections_ atomic to prevent data races

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
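
The first fix is easy to miss: C's `strncmp` with a length of 0 compares no characters and therefore reports equality for any pair of strings. The following Python emulation (an illustration only, not driver code) shows why every SQLSTATE was being treated as class 42.

```python
def strncmp(a, b, n):
    """Mimic C strncmp on Python strings: compare at most n characters."""
    for i in range(n):
        # Reading past the end of a string behaves like hitting the NUL.
        ca = a[i] if i < len(a) else "\0"
        cb = b[i] if i < len(b) else "\0"
        if ca != cb:
            return ord(ca) - ord(cb)
        if ca == "\0":
            break
    return 0  # also reached immediately when n == 0


def is_class_42(sqlstate, n):
    """True when the first n characters of sqlstate match "42"."""
    return strncmp(sqlstate, "42", n) == 0
```

With `n=0`, `is_class_42("08006", 0)` is true even though 08006 is a connection failure; with `n=2` it is false, and `is_class_42("42P01", 2)` is true as intended.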

…aringbitmap in Stage mode Query actual column types from pg_catalog.pg_attribute using format_type() and rebuild the EXTERNAL_FILES AS clause with correct type declarations. For types that EXTERNAL_FILES cannot auto-cast (json, jsonb, roaringbitmap), use explicit SELECT casts instead of SELECT *. Also fix std::atomic<int32_t> copy issue in Database::Release() by adding .load() for variadic printf args. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…4 formats Add ArrowCopyReader that decodes Hologres Arrow IPC streams wrapped in PG binary COPY framing, supporting both uncompressed (arrow) and LZ4-block-compressed (arrow_lz4) formats. Key Hologres compatibility workarounds: bypass flatcc verification for pre-1.0 IPC messages, ignore false LZ4_FRAME body compression declarations in RecordBatch metadata, and add LZ4 block fallback in nanoarrow codecs for implementations that report LZ4_FRAME but send block-compressed data. Includes parameterized ExecuteCopy, CopyFormat statement option, JSONB rejection for arrow formats, and 9 integration tests covering 23 types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
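
For readers unfamiliar with the framing, the sketch below walks a standard PostgreSQL binary COPY stream (signature, flags, header extension, tuples, trailer) and concatenates the field payloads. How Hologres distributes the Arrow IPC bytes across tuples is not stated in this PR, so treating every non-NULL field as a chunk of the IPC stream is an assumption of this sketch; the function names are illustrative, and LZ4 decompression is not covered.

```python
import struct

PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def iter_copy_fields(buf):
    """Yield each field payload (bytes or None) from a binary COPY stream."""
    if not buf.startswith(PGCOPY_SIGNATURE):
        raise ValueError("missing PGCOPY signature")
    pos = len(PGCOPY_SIGNATURE)
    _flags, ext_len = struct.unpack_from(">ii", buf, pos)
    pos += 8 + ext_len  # skip flags, extension length, and extension area
    while True:
        (nfields,) = struct.unpack_from(">h", buf, pos)
        pos += 2
        if nfields == -1:  # file trailer
            return
        for _ in range(nfields):
            (length,) = struct.unpack_from(">i", buf, pos)
            pos += 4
            if length == -1:
                yield None  # NULL field
            else:
                yield buf[pos:pos + length]
                pos += length


def extract_ipc_stream(buf):
    """Concatenate non-NULL payloads; assumes they are Arrow IPC chunks."""
    return b"".join(f for f in iter_copy_fields(buf) if f is not None)
```

The resulting byte string would then be handed to an Arrow IPC stream decoder, which is the step where the compatibility workarounds listed in the commit apply.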

…at benchmarks Add time_pandas_adbc_arrow and time_pandas_adbc_arrow_lz4 benchmarks to HologresBenchmarkBase, enabling side-by-side read performance comparison of binary, arrow, and arrow_lz4 COPY formats across OneColumn and MultiColumn suites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ~120 offline unit tests across all testable modules to achieve 90%+ coverage. New test files: copy_writer_test.cc (55 tests for all COPY writer types), arrow_copy_reader_test.cc (26 tests for IPC decoding, LZ4 decompression, and stream trampolines), bind_stream_test.cc (15 tests for bind/iterate lifecycle). Extend existing tests for statement, connection, database, error, copy reader, and postgres_type modules. Extract shared MockTypeResolver into test_util.h. Fix null pointer dereference in ArrowCopyReader::ReleaseTrampoline when called with self=nullptr. Fix incorrect test expectation for PostgreSQL interval type: Arrow format should be "tin" (interval_month_day_nano), not "tDn" (duration_nanosecond). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… benchmarks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ion, and improve RAII safety
- Fix O(n^2) digit insertion in Decimal COPY writer by using push_back + reverse
- Eliminate per-row heap allocations in numeric serialization via member variable reuse
- Add upfront ArrowBufferReserve in tuple writer and list writer for fewer allocation checks
- Add digits_.reserve() and early-return for special numeric values in COPY reader
- Replace memmove+resize pattern in ArrowCopyReader::Compact()
- Add scope guards to SerializeArrayToIpcBuffer for safer multi-resource cleanup
- Extract GetCurrentSchema() helper to replace 3 duplicate query blocks
- Add PqEscapedString RAII wrapper to replace 10 manual PQescapeIdentifier+PQfreemem sites
- Extract JoinUploadThreads() to replace 12 duplicate thread-join loops in stage_writer
- Consolidate MockTypeResolver, CopyReaderTester, CopyWriterTester into test_util.h
- Unify PG COPY binary signature constant via copy_common.h
- Remove redundant HologresVersion() in favor of VendorVersion()
- Name magic numbers in HologresStageConfig with constexpr constants
- Replace stringstream with string_view loop in ParseTextArray
- Promote is_null_param to BindStream member to avoid per-row vector allocation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
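
The first item (append, then reverse once) can be shown in a few lines. PostgreSQL's binary numeric format stores base-10000 digits, most significant first; the sketch below produces that digit order without inserting at the front of the list. It is an illustration of the technique in Python, not the driver's C++ writer, and the function name is made up for this example.

```python
def to_base10000_digits(value):
    """Split a non-negative integer into base-10000 digits, most significant first."""
    if value < 0:
        raise ValueError("sign is handled separately")
    digits = []
    while value > 0:
        value, digit = divmod(value, 10000)
        digits.append(digit)  # O(1) append instead of O(n) insert at the front
    digits.reverse()  # one O(n) pass restores most-significant-first order
    return digits
```

Inserting each digit at position 0 shifts all existing digits every time, which is where the quadratic cost came from; appending and reversing once does the same work in linear time.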

…s not exist" errors Decouple table creation (DDL) from data generation in ingest benchmark suites by introducing setup_cache(). Previously, setup() generated potentially huge vector data before creating tables — if data generation failed (OOM/memory pressure), tables were never created, and teardown() dropped all tables after each benchmark method, compounding the issue. Now setup_cache() creates all empty tables once upfront, setup() only handles data generation and connections, and teardown() frees memory via gc.collect() instead of dropping tables. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
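
The lifecycle split described in this commit looks roughly like the class below. `setup_cache`, `setup`, and `teardown` are the standard ASV hook names; everything else (the class name, the table naming, the in-memory `tables` set standing in for real DDL) is hypothetical. Note that ASV runs `setup_cache` in a separate process, so a real suite would create the tables on the server; the in-memory set only works when the methods are called by hand, as a way to show the ordering.

```python
import gc


class IngestSuite:
    """Shape of an ASV suite; the table registry stands in for real DDL."""

    params = [1_000, 1_000_000]
    param_names = ["rows"]
    tables = set()  # hypothetical stand-in for tables on the server

    def setup_cache(self):
        # Runs once per suite: create every empty table up front.
        for rows in self.params:
            self.tables.add(f"bench_ingest_{rows}")

    def setup(self, rows):
        # Runs before each benchmark: only data generation happens here.
        self.data = list(range(min(rows, 10)))  # tiny placeholder data

    def time_ingest(self, rows):
        # The table already exists even if an earlier setup() had failed.
        assert f"bench_ingest_{rows}" in self.tables
        return len(self.data)

    def teardown(self, rows):
        # Free memory, but leave the tables in place for the next benchmark.
        self.data = None
        gc.collect()
```

Because table creation no longer depends on data generation succeeding, an out-of-memory failure in `setup()` affects only the benchmark being set up rather than leaving later benchmarks without tables.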

…nd add comprehensive unit tests
Add JSONB support for Arrow/Arrow_LZ4 COPY read path by wrapping queries to cast JSONB columns to TEXT, matching the approach used by Java holo-client. Also add comprehensive unit tests across multiple modules to improve coverage.

Key changes:
- Add BuildJsonbWrapperQuery() to construct wrapper queries that cast JSONB columns to ::text with proper identifier escaping
- Replace JSONB blocking logic in ExecuteQuery with transparent cast wrapping
- Add result_helper_test.cc (22 tests for PqRecord parsing)
- Expand connection_test.cc, copy_test.cc, postgres_type_test.cc, postgres_util_test.cc, and statement_test.cc with additional test cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
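
One plausible shape for such a wrapper query is sketched below. It assumes the result columns and their type names are already known (for example from schema inference); the function names, the `adbc_wrapped` alias, and the exact SQL layout are illustrative guesses, not what `BuildJsonbWrapperQuery()` actually emits.

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_jsonb_wrapper_query(query, columns):
    """Wrap query so jsonb columns are returned as text.

    columns is a list of (name, type_name) pairs describing the result.
    """
    if not any(type_name == "jsonb" for _, type_name in columns):
        return query  # nothing to cast, run the query unchanged
    select_list = []
    for name, type_name in columns:
        ident = quote_ident(name)
        if type_name == "jsonb":
            select_list.append(f"{ident}::text AS {ident}")
        else:
            select_list.append(ident)
    return f"SELECT {', '.join(select_list)} FROM ({query}) AS adbc_wrapped"
```

For `SELECT id, doc FROM t` with columns `id int4` and `doc jsonb`, this yields `SELECT "id", "doc"::text AS "doc" FROM (SELECT id, doc FROM t) AS adbc_wrapped`, so the caller sees the same column names with JSONB delivered as text.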

…row counts Skip TEXT[] at 10M rows in HologresMultiColumnSuite to prevent OOM that kills the setup_cache subprocess and skips all benchmarks. Limit COPY binary vector ingest benchmarks to 1M rows max since COPY builds the entire data stream in memory; Stage mode benchmarks remain at 10M rows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…to arrow_lz4 Benchmarks show arrow_lz4 outperforms binary in nearly all read scenarios (e.g. 10M-row single INT column: 358ms vs 2.72s, ~7.6x faster), while LZ4 compression also reduces network transfer size. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(c/driver/hologres): add Hologres ADBC driver

Add comprehensive documentation covering architecture, development workflow, testing, features, configuration reference, usage examples across Python/Java/C, known limitations, and release process. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

metegenez · 2026-04-23T09:04:42Z

do you use this library internally already? maybe it can help the reviewers if it is a bit battletested.

xborder · 2026-04-23T10:20:12Z

assume this is a question from someone that doesn't know the database. Isn't Hologres PSQL compliant?
Is there anything specific that requires a new driver?

TimothyDing · 2026-04-23T13:32:47Z

> assume this is a question from someone that doesn't know the database. Isn't Hologres PSQL compliant?
> Is there anything specific that requires a new driver?

Hi, @xborder
Good question! Although Hologres is compatible with the PostgreSQL ecosystem, we've implemented many Arrow-related features. For instance, we support Arrow format (and compressed Arrow) during COPY OUT operations, as well as a Snowflake-like stage mode for imports. Since these capabilities are not supported by standard PostgreSQL, I created a dedicated driver based on the PostgreSQL driver code.

TimothyDing · 2026-04-23T13:34:40Z

> do you use this library internally already? maybe it can help the reviewers if it is a bit battletested.

Hi, @metegenez
We do have an official Java SDK, but its integration with the Arrow ecosystem is somewhat limited. I stumbled upon ADBC (an official library) the other day and got really interested! We are definitely looking to embrace the Arrow ecosystem!

metegenez · 2026-04-23T19:09:17Z

> > do you use this library internally already? maybe it can help the reviewers if it is a bit battletested.
>
> Hi, @metegenez We do have an official Java SDK, but its integration with the Arrow ecosystem is somewhat limited. I stumbled upon ADBC (an official library) the other day and got really interested! We are definitely looking to embrace the Arrow ecosystem!

Good to hear. We are doing Arrow work at Huawei, too. Would love to help with the review, but I'm still learning ADBC internals myself. Good luck with the PR!

Btw, is this an open source DB or closed source like GaussDB?

TimothyDing · 2026-04-24T00:13:49Z

> Good to hear. We are doing arrow work at Huawei, too. Would love to help on the review but im still learning ADBC internals myself. Good luck with the PR!
>
> Btw, is this an open source DB or closed source like GaussDB?

Nice to meet you, @metegenez! My email is ding_ye_timo@163.com
Hologres is a data warehousing product from Alibaba Cloud (similar to Snowflake or Apache Doris), and it is a closed-source product. We have a wide range of commercial customers, including Kering, LVMH, and Volkswagen.

TimothyHologres · 2026-04-27T04:15:26Z

@lidavidm Could you help me to review it?

TimothyDing and others added 30 commits April 18, 2026 22:22

bench(python/adbc_driver_hologres): add ASV benchmark configuration

5f3dac6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

build(python/adbc_driver_hologres): bump setuptools requirement to >=…

f61a826

… 77.0.1 setuptools 77.0.0 was yanked from PyPI, causing ASV environment creation to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bench(python/adbc_driver_hologres): add 10M row count to read benchmarks

e775e97

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bench(python/adbc_driver_hologres): add TEXT and TEXT[] types to read…

464e31a

… benchmarks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge pull request #1 from TimothyDing/feature/add_hologres_driver

98cbc44

feat(c/driver/hologres): add Hologres ADBC driver

TimothyDing requested a review from lidavidm as a code owner April 22, 2026 23:47

Merge branch 'apache:main' into main

0baff18

TimothyDing closed this Apr 27, 2026

TimothyDing wants to merge 32 commits into apache:main from TimothyDing:main

4 participants
