
This readme explains what types of extensions there are in DuckDB and how to build them.

What are DuckDB extensions?

DuckDB extensions are libraries containing additional DuckDB functionality separate from the main codebase. These extensions can provide added functionality to DuckDB that cannot or should not live in the main DuckDB code for various reasons. DuckDB extensions can be built in two ways. Firstly, they can be statically linked into DuckDB's executables (the DuckDB CLI, the unittest binary, the benchmark runner binary, etc.). Doing so will automatically make them available when using these binaries. Secondly, DuckDB has an extension loading mechanism to dynamically load extension binaries.

Extension Types

DuckDB extensions can be divided into different types: in-tree extensions and out-of-tree extensions. These types refer to where the extensions live and who maintains them.

In-tree extensions

In-tree extensions are extensions that live in the main DuckDB repository. These extensions are considered fundamental to DuckDB and/or tie into DuckDB so deeply that changes to DuckDB are expected to regularly break them. We aim to keep the number of in-tree extensions to a minimum and strive to move extensions out-of-tree where possible.

Out-of-tree Extensions (OOTEs)

Out-of-tree extensions live in separate repositories outside the main DuckDB repository. The reasons for moving extensions out-of-tree can vary. Firstly, moving extensions out of the main DuckDB code-base keeps the core DuckDB code smaller and less complex. Secondly, keeping extensions out-of-tree can be useful for licensing reasons.

There are two main types of OOTEs. Firstly, there are the DuckDB Managed OOTEs. These are distributed through the main DuckDB CI. These extensions are signed using DuckDB's signing key and are maintained by the DuckDB team. Some examples are the sqlite_scanner and postgres_scanner extensions. The DuckDB Managed OOTEs are distributed automatically with every release of DuckDB. For the current list of extensions in this category, check out .github/config/out_of_tree_extensions.cmake.

Secondly, there are External OOTEs. Extensions in this category are not tied to the DuckDB CI, but instead their CI/CD runs in their own repository. The maintainer of the external OOTE repo is responsible for testing, distribution and making sure that an up-to-date version of the extension is available. Depending on who maintains the extension, these extensions may or may not be signed.

Building extensions

Under the hood, all types of extensions are built the same way: DuckDB's root CMakeLists.txt file is used as the root CMake file, and the extensions that should be built are passed to it. DuckDB has various methods to configure which extensions to build. Additionally, we can configure for each extension how we want to build it: for example, whether to only build the loadable extension, or to also link the extension into the DuckDB binaries. There are different ways to load extensions in DuckDB with various

Makefile/CMake variables

The simplest way to specify which extensions to build is the DUCKDB_EXTENSIONS variable. To specify which extensions to build when making DuckDB, set this variable to a semicolon-separated list of extension names. For example:

DUCKDB_EXTENSIONS='json;icu' make

The DUCKDB_EXTENSIONS variable is simply passed to the CMake variable BUILD_EXTENSIONS, which can also be set directly:

cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXTENSIONS='parquet;icu;tpch;tpcds;fts;json'

Makefile environment variables

Another way to select an extension for building is with the BUILD_<extension name> variables defined in the root Makefile in this repository. For example, to build the JSON extension, simply run BUILD_JSON=1 make. These Make variables have to be added to the Makefile manually for each extension and are simply syntactic sugar around the DUCKDB_EXTENSIONS variable.
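To illustrate what "syntactic sugar" means here, the following Python sketch folds BUILD_<NAME>=1 flags into a DUCKDB_EXTENSIONS-style list. It is not the root Makefile's actual logic: the function name extensions_from_env, the KNOWN list, and the rule of upper-casing the extension name to form the flag are assumptions made for this example. Only extensions listed in KNOWN are recognised, mirroring the fact that each flag has to be added to the Makefile by hand.

```python
from typing import Iterable, Mapping


def extensions_from_env(env: Mapping[str, str], known: Iterable[str]) -> str:
    """Hypothetical model: fold BUILD_<NAME>=1 flags for explicitly supported
    extensions into a ';'-separated list, on top of DUCKDB_EXTENSIONS."""
    names = [n for n in env.get("DUCKDB_EXTENSIONS", "").split(";") if n]
    for name in known:
        # Each supported extension has its own hand-written flag, e.g. BUILD_JSON.
        if env.get("BUILD_" + name.upper()) == "1" and name not in names:
            names.append(name)
    return ";".join(names)


KNOWN = ["json", "icu"]  # hypothetical set of extensions that have a BUILD_ flag
print(extensions_from_env({"BUILD_JSON": "1"}, KNOWN))  # json
print(extensions_from_env({"DUCKDB_EXTENSIONS": "icu", "BUILD_JSON": "1"}, KNOWN))  # icu;json
```

In this model, BUILD_JSON=1 make and DUCKDB_EXTENSIONS='json' make end up requesting the same extension list.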

Config files

To have more control over how in-tree extensions are built, extension config files should be used. These config files are simply CMake files that are included by DuckDB's CMake build. There are 4 different places that will be searched for config files:

  1. The base configuration extension/extension_config.cmake. The extensions specified here will be built every time DuckDB is built. This configuration is always loaded.
  2. (Optional) The client-specific extension specifications in tools/*/duckdb_extension_config.cmake. These configs specify which extensions are built and linked into each client.
  3. (Optional) The local configuration file extension/extension_config_local.cmake. This is where you would specify extensions you need included in your local/custom/dev build of DuckDB. This file is gitignored and has to be created by the developer.
  4. (Optional) Additional configuration files passed to the DUCKDB_EXTENSION_CONFIGS parameter. This can be used to point DuckDB to config files stored anywhere on the machine.

DuckDB will load these config files in reverse order and ignore subsequent calls to load an extension with the same name. This allows overriding the base configuration of an extension by providing a different configuration in the local config. For example, currently the parquet extension is always statically linked into DuckDB, because of this line in extension/extension_config.cmake:

duckdb_extension_load(parquet)

Now say we want to build DuckDB with our custom parquet extension, and we also don't want to link it statically into DuckDB, but only produce the loadable binary. We can achieve this by creating the extension/extension_config_local.cmake file and adding:

duckdb_extension_load(parquet
    DONT_LINK
    SOURCE_DIR /path/to/my/custom/parquet
)

Now, when we run make, CMake will output:

-- Building extension 'parquet' from '/path/to/my/custom/parquet'
-- Extensions built but not linked: parquet
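The precedence rule described above (config files processed in reverse order, and the first duckdb_extension_load call for a given name wins) can be modelled with a small Python sketch. The configs are plain dicts standing in for the CMake files; nothing here parses real CMake, and the function name resolve is hypothetical.

```python
from typing import Dict, List, Tuple

# One config file = list of (extension name, load options) in call order.
Config = List[Tuple[str, dict]]


def resolve(configs_in_search_order: List[Config]) -> Dict[str, dict]:
    """Process configs in reverse order; keep the first definition of each name."""
    resolved: Dict[str, dict] = {}
    for config in reversed(configs_in_search_order):
        for name, options in config:
            # Subsequent calls for an already-defined name are ignored.
            resolved.setdefault(name, options)
    return resolved


base = [("parquet", {})]  # extension/extension_config.cmake
local = [("parquet", {"DONT_LINK": True,  # extension/extension_config_local.cmake
                      "SOURCE_DIR": "/path/to/my/custom/parquet"})]

# Order as listed in this README: base (1), client (2), local (3), extra (4).
result = resolve([base, [], local, []])
print(result["parquet"])  # {'DONT_LINK': True, 'SOURCE_DIR': '/path/to/my/custom/parquet'}
```

Because the local file is processed before the base file, its parquet definition is the one that survives.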

Using extension config files

The duckdb_extension_load function is used in the configuration files to specify how an extension should be loaded. There are 3 different ways this can be done. For some examples, check out .github/config/*.cmake. These are the configurations used in DuckDB's CI to select which extensions are built.

Automatic loading

The simplest way to load an extension is just passing the extension name. This will automatically try to load the extension. Optionally, the DONT_LINK parameter can be passed to disable linking the extension into DuckDB.

duckdb_extension_load(<extension_name> (DONT_LINK))

This configuration of duckdb_extension_load will search the ./extension and ./extension_external directories for extensions and attempt to load them if possible. Note that the extension_external directory does not exist by default; it should be created and populated with the out-of-tree extensions that should be built. Extensions based on the extension-template should work out of the box with this automatic loading when placed in the extension_external directory.
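The directory search can be pictured with the Python sketch below. It is an approximation, not DuckDB's CMake code: treating any directory named after the extension as a match, checking ./extension before ./extension_external, and stopping at the first hit are all assumptions made for illustration. The function name find_extension and the extension name my_extension are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_extension(root: Path, name: str) -> Optional[Path]:
    """Hypothetical model of automatic loading: look for <root>/<dir>/<name>."""
    for subdir in ("extension", "extension_external"):  # assumed search order
        candidate = root / subdir / name
        if candidate.is_dir():
            return candidate
    return None  # not found, so the extension cannot be loaded automatically


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Simulate an out-of-tree extension placed in extension_external.
    (root / "extension_external" / "my_extension").mkdir(parents=True)
    found = find_extension(root, "my_extension")
    missing = find_extension(root, "not_there")
    print(found.relative_to(root))
```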

Custom path

When extensions are located in a different path, or their project structure differs from that of the extension-template, the SOURCE_DIR and INCLUDE_DIR variables can be used to tell DuckDB how to load the extension:

duckdb_extension_load(<extension_name>
    (DONT_LINK)
    SOURCE_DIR <absolute_path_to_extension_root>
    (INCLUDE_DIR <absolute_path_to_extension_header>)
)

Remote GitHub repo

Building extensions directly from GitHub repositories is also supported. This will download the extension to the current CMake build directory and build it from there:

duckdb_extension_load(postgres_scanner
    (DONT_LINK)
    GIT_URL https://github.com/duckdb/postgres_scanner
    GIT_TAG cd043b49cdc9e0d3752535b8333c9433e1007a48
)

Explicitly disabling extensions

Because you may sometimes want to override extensions set by other configurations, explicitly disabling extensions is also possible using the DONT_BUILD flag. This prevents the extension from being built altogether. For example, to build DuckDB without the parquet extension, which is enabled by default, specify the following in extension/extension_config_local.cmake:

duckdb_extension_load(parquet DONT_BUILD)

Note that this can also be done from the Makefile:

DUCKDB_EXTENSIONS='tpch;json' SKIP_EXTENSIONS=parquet make

results in:

...
-- Building extension 'tpch' from '/Users/sam/Development/duckdb/extensions'
-- Building extension 'json' from '/Users/sam/Development/duckdb/extensions'
-- Extensions linked into DuckDB: tpch, json
-- Extensions explicitly skipped: parquet
...
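The outcome in the log above can be reproduced with a small Python sketch: extensions requested by the configs and by DUCKDB_EXTENSIONS are combined, and anything explicitly skipped is dropped regardless of who requested it. This is an illustrative model only. The function name plan_build is hypothetical, treating SKIP_EXTENSIONS as a semicolon-separated list is an assumption (the example above only shows a single name), and the sketch ignores DONT_LINK, so every built extension is reported as linked.

```python
from typing import List, Tuple


def plan_build(config_extensions: List[str], duckdb_extensions: str,
               skip_extensions: str) -> Tuple[List[str], List[str]]:
    """Hypothetical model: return (extensions to build, extensions skipped)."""
    requested = config_extensions + [n for n in duckdb_extensions.split(";") if n]
    skipped = [n for n in skip_extensions.split(";") if n]
    built: List[str] = []
    for name in requested:
        # An explicit skip wins over any request; duplicates are built once.
        if name not in skipped and name not in built:
            built.append(name)
    return built, skipped


# parquet comes from the base config; tpch and json from DUCKDB_EXTENSIONS.
built, skipped = plan_build(["parquet"], "tpch;json", "parquet")
print("linked:", ", ".join(built))     # linked: tpch, json
print("skipped:", ", ".join(skipped))  # skipped: parquet
```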

VCPKG dependency management

DuckDB extensions can use vcpkg to manage their dependencies. Check out the Extension Template for an example of how to set up vcpkg in extensions.

Building DuckDB with multiple extensions that use vcpkg

To build DuckDB with multiple extensions that all use vcpkg, some extra steps are required. This is because each extension specifies its own vcpkg.json manifest for its dependencies, but vcpkg allows only a single manifest. The workaround is to merge the dependencies from the manifests of all extensions being built. This repo contains a script that performs this merge automatically.
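The idea behind the merge can be shown with a simplified Python sketch. It is not the script shipped in this repository: it only collects the "dependencies" arrays, deduplicates them by package name and keeps the first declaration it sees, whereas a real merge may also have to reconcile features and version constraints. The function names and the example package lists are hypothetical.

```python
import json
from typing import Iterable, List, Union

# vcpkg allows a dependency to be a plain string or an object with a "name".
Dependency = Union[str, dict]


def dep_name(dep: Dependency) -> str:
    return dep if isinstance(dep, str) else dep["name"]


def merge_manifests(manifests: Iterable[dict]) -> dict:
    """Union the dependency lists, keeping the first declaration of each package."""
    merged: List[Dependency] = []
    seen = set()
    for manifest in manifests:
        for dep in manifest.get("dependencies", []):
            name = dep_name(dep)
            if name not in seen:
                seen.add(name)
                merged.append(dep)
    return {"dependencies": merged}


ext1 = {"dependencies": ["openssl", "zlib"]}  # hypothetical extension_1 manifest
ext2 = {"dependencies": ["zlib", "curl"]}     # hypothetical extension_2 manifest
merged = merge_manifests([ext1, ext2])
print(json.dumps(merged))  # {"dependencies": ["openssl", "zlib", "curl"]}
```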

Example build with 2 extensions using vcpkg

For example, let's say we want to create a DuckDB binary that has two extensions statically linked, each of which uses vcpkg. The first step is to add the two extensions to extension/extension_config_local.cmake:

duckdb_extension_load(extension_1
    GIT_URL https://github.com/example/extension_1
    GIT_TAG some_git_hash
)
duckdb_extension_load(extension_2
    GIT_URL https://github.com/example/extension_2
    GIT_TAG some_git_hash
)

Now, to merge the vcpkg.json manifests from these two extensions, run:

make extension_configuration

This will create a merged manifest in ./build/extension_configuration/vcpkg.json.

Next, run:

USE_MERGED_VCPKG_MANIFEST=1 VCPKG_TOOLCHAIN_PATH="/path/to/your/vcpkg/installation" make

which will use the merged manifest to install all required dependencies, build extension_1 and extension_2, build DuckDB, and finally link both extensions into DuckDB.