VGI (Vector Gateway Interface) is an Apache Arrow–based protocol for extending DuckDB using any language — no C++/C/Rust/Zig and no compilation or linking required.
This repository contains the DuckDB extension (the C++ side of VGI). It loads into
DuckDB and lets you ATTACH a worker — a program written in Python, TypeScript, Go, or
anything that speaks the protocol — and then call the scalar, table, and aggregate
functions that worker exposes as if they were native DuckDB functions. Data moves between
DuckDB and the worker over Apache Arrow IPC, across a subprocess pipe, an HTTP connection,
or a Unix domain socket.
The reference worker SDK is vgi-python — pip install vgi.
Created by Query.Farm.
| Traditional DuckDB extensions | VGI workers |
|---|---|
| C/C++ compilation required | Any language — write a script or ship a binary |
| Tied to a specific DuckDB version | Version independent |
| Complex build / release cycle | Ship a script or executable |
| Runs in-process | Process isolation |
| Single-threaded by default | Parallel pooled workers |
Use cases: call REST APIs from SQL, run ML inference (PyTorch, scikit-learn), process data with Python/pandas/numpy, build custom ETL transforms, or expose external data sources as queryable tables and views.
Install the extension and attach a worker:
-- First time only
INSTALL vgi FROM community;
LOAD vgi;
-- Attach a worker as a catalog
ATTACH 'my_funcs' (TYPE vgi, LOCATION './my_worker.py');
-- Call the functions it exposes
SELECT upper_case(name) FROM users;
SELECT * FROM my_table_function('arg');

A minimal Python worker (using vgi-python):
# my_worker.py
from typing import Annotated

import pyarrow as pa
import pyarrow.compute as pc
from vgi import ScalarFunction, Param, Returns, Worker


class UpperCase(ScalarFunction):
    """Convert string values to uppercase."""

    @classmethod
    def compute(
        cls,
        value: Annotated[pa.StringArray, Param(doc="String value to uppercase")],
    ) -> Annotated[pa.StringArray, Returns()]:
        return pc.utf8_upper(value)


class MyWorker(Worker):
    catalog_name = "my_funcs"
    functions = [UpperCase]


if __name__ == "__main__":
    MyWorker().run()

- Function shapes: scalar, table, table-in-out (streaming), buffered table (see-every-row-before-output), aggregate, and windowed aggregate functions, all defined in the worker and surfaced as native DuckDB functions.
- Full catalog integration: workers expose schemas, tables, views, and functions; DuckDB lazily loads catalog metadata and column statistics, and supports multi-branch (UNION-ALL) tables.
- Multiple transports: LOCATION accepts a bare command (subprocess, pooled per DuckDB process), http(s):// (HTTP), unix:///path/to.sock (AF_UNIX), or launch:<argv> (launcher-managed shared worker).
- Pushdown: projection, filter, ORDER BY + LIMIT, and join-key pushdown to workers that opt in.
- Worker pooling: subprocess workers are pooled and reused across queries, with diagnostics (vgi_worker_subprocess_pool(), vgi_worker_pool_stats()).
- Performance: optional POSIX shared-memory transport for zero-copy batch transfer.
- Auth: per-catalog OAuth / bearer tokens; OIDC identity introspection via vgi_catalog_identity().
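
The LOCATION forms listed above can be read as a simple prefix dispatch. The sketch below illustrates that reading only; `Transport` and `classify_location` are hypothetical names, and the extension's actual parsing rules (case sensitivity, validation, edge cases) may differ.

```python
from enum import Enum


class Transport(Enum):
    """Transport kinds named in the feature list (hypothetical enum)."""

    SUBPROCESS = "subprocess"
    HTTP = "http"
    UNIX_SOCKET = "unix"
    LAUNCHER = "launch"


def classify_location(location: str) -> Transport:
    """Map a LOCATION string to a transport kind by its prefix.

    Assumption: prefixes are matched exactly as written in the README,
    and anything without a recognized prefix is a bare command.
    """
    if location.startswith(("http://", "https://")):
        return Transport.HTTP
    if location.startswith("unix://"):
        return Transport.UNIX_SOCKET
    if location.startswith("launch:"):
        return Transport.LAUNCHER
    return Transport.SUBPROCESS


kinds = [
    classify_location("./my_worker.py"),
    classify_location("https://example.com/worker"),
    classify_location("unix:///path/to.sock"),
    classify_location("launch:python my_worker.py"),
]
```

The example URL and the `launch:` argument string are placeholders chosen for illustration.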
See the docs/ directory for deep dives on
multi-branch tables,
the launcher protocol,
catalog profiling, and more.
This extension uses VCPKG for dependency management and is built with multiple modules,
so set USE_MERGED_VCPKG_MANIFEST=1. The Makefile auto-detects the VCPKG toolchain from
vcpkg/ in the project tree.
git clone --recurse-submodules https://github.com/Query-farm/vgi.git
cd vgi
# Debug build
USE_MERGED_VCPKG_MANIFEST=1 GEN=ninja make debug
# Release build
USE_MERGED_VCPKG_MANIFEST=1 GEN=ninja make release

Installing ccache and ninja is strongly recommended for fast incremental rebuilds.
The build produces:
./build/release/duckdb # DuckDB shell with the extension preloaded
./build/release/test/unittest # test runner
./build/release/extension/vgi/vgi.duckdb_extension # the loadable extension

The test targets cover two transports, subprocess and HTTP; subprocess is the faster default. Tests prefer the release build:
# Subprocess transport (default, faster)
make test_subprocess
# HTTP transport
make test_http
# Both
make test_all

The VGI_TEST_WORKER environment variable controls which worker is used and defaults to
the vgi-python fixture worker. See the project root CLAUDE.md for the full matrix of
test targets, debug builds, and environment variables.
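
For scripted runs, the targets and the environment variable above can be combined in a small helper. This is a sketch under assumptions: `make_test_invocation` is a hypothetical name, and the value format that VGI_TEST_WORKER expects (shown here as a script path) is not specified in this README.

```python
import os

# Make targets taken from the list above, keyed by a short transport name.
TEST_TARGETS = {
    "subprocess": "test_subprocess",
    "http": "test_http",
    "all": "test_all",
}


def make_test_invocation(transport="subprocess", worker=None, base_env=None):
    """Return (argv, env) for one test run without executing anything.

    If `worker` is None, VGI_TEST_WORKER is left untouched so the
    documented default (the vgi-python fixture worker) applies.
    """
    target = TEST_TARGETS[transport]  # raises KeyError for unknown names
    env = dict(os.environ if base_env is None else base_env)
    if worker is not None:
        env["VGI_TEST_WORKER"] = worker  # assumed to accept a worker path
    return ["make", target], env


argv, env = make_test_invocation("http", worker="./my_worker.py", base_env={})
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repository root.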
This project is licensed under the Query Farm Source-Available License, Version 1.0
(Licensor: Query Farm LLC). Non-production use is freely permitted, as is production use —
except for offering a Competing Offering or operating a Commercial Marketplace built on
the Licensed Work, which require a separate commercial license. Each version converts to
Apache 2.0 on its Change Date (the tenth anniversary of that version's first public
release). See LICENSE for the full terms.
For commercial or custom licensing, contact hello@query.farm.