
Autoloading mechanism for extensions #8732


Merged: 105 commits merged on Sep 1, 2023


samansmink (Contributor)

Basics

This PR introduces a first version of an autoloading mechanism for DuckDB extensions. This can remove the need to run install/load <extension_name> in many use cases.


There are 2 main settings controlling the mechanism: autoload_known_extensions and autoinstall_known_extensions. Both these settings default to true for the following builds:

  • CLI on win/osx/linux
  • Python/R/Node/Java

Note that these defaults are still set behind a build flag, since the default when compiling DuckDB manually should probably be false.
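A minimal sketch of how such a build-flag default could be wired up. The macro names `DUCKDB_EXTENSION_AUTOLOAD_DEFAULT` and `DUCKDB_EXTENSION_AUTOINSTALL_DEFAULT` and the `AutoloadConfig` struct are illustrative assumptions, not names confirmed by this PR: distributed builds would define the macros as true, while a manual build falls back to false.

```cpp
// Sketch only: macro and struct names are hypothetical.
// Distributed builds (CLI, Python, R, Node, Java) would pass e.g.
// -DDUCKDB_EXTENSION_AUTOLOAD_DEFAULT=true to the compiler.
#ifndef DUCKDB_EXTENSION_AUTOLOAD_DEFAULT
#define DUCKDB_EXTENSION_AUTOLOAD_DEFAULT false
#endif
#ifndef DUCKDB_EXTENSION_AUTOINSTALL_DEFAULT
#define DUCKDB_EXTENSION_AUTOINSTALL_DEFAULT false
#endif

// Runtime settings, initialised from the build-time defaults.
struct AutoloadConfig {
    bool autoload_known_extensions = DUCKDB_EXTENSION_AUTOLOAD_DEFAULT;
    bool autoinstall_known_extensions = DUCKDB_EXTENSION_AUTOINSTALL_DEFAULT;
};
```

Keeping the default in the preprocessor means the same source tree yields "off" for hand-built binaries and "on" for release builds, without any runtime detection.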

How to use

With these settings on, DuckDB will now attempt to automatically install and load an extension when a query requires it and it is not loaded yet. So, for example, instead of:

INSTALL httpfs; LOAD httpfs;
INSTALL json; LOAD json;
SELECT * FROM read_json_auto('https://duckdb.org/some_file.json');

you can now just do

SELECT * FROM read_json_auto('https://duckdb.org/some_file.json');

Under the hood, DuckDB will install and load the httpfs and json extensions.

Note that you can also disable autoinstall_known_extensions: then only a single manual install is required, after which all subsequent calls to functions from that extension will automatically load it from ~/.duckdb/extensions.
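The interplay of the two settings can be summarised as a small decision function. This is a hypothetical sketch, not DuckDB's code: `AutoloadAction` and `DecideAutoload` are invented names, and it assumes autoinstall only takes effect when autoloading is enabled.

```cpp
// Hypothetical outcome when a query needs an extension that is not loaded.
enum class AutoloadAction {
    InstallAndLoad,  // not on disk yet: download it, then load it
    LoadOnly,        // already installed (e.g. in ~/.duckdb/extensions): just load it
    Fail             // cannot proceed automatically: raise an error
};

// Assumption: autoinstall is only consulted when autoloading is enabled.
AutoloadAction DecideAutoload(bool autoload_known_extensions,
                              bool autoinstall_known_extensions,
                              bool installed) {
    if (!autoload_known_extensions) {
        return AutoloadAction::Fail;            // no automatic loading at all
    }
    if (installed) {
        return AutoloadAction::LoadOnly;        // a prior INSTALL is enough
    }
    if (autoinstall_known_extensions) {
        return AutoloadAction::InstallAndLoad;  // fetch and load transparently
    }
    return AutoloadAction::Fail;                // a manual INSTALL is needed first
}
```

With autoinstall disabled, the `(true, false, true)` case is the one described above: one manual install, after which every later use loads the extension automatically.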

What can trigger autoloading?

There are many types of queries that can now trigger an autoload/install:

  • Using a type
  • Using a (Table/Scalar/Aggregate/Pragma/Copy) function
  • Using a setting
  • Reading a file with a specific prefix
  • Replacement scans
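To make the trigger list concrete, the sketch below resolves which extensions a query needs from the functions and file paths it uses, mirroring the earlier read_json_auto example. The table `kEntries` and the function `RequiredExtensions` are hypothetical and heavily trimmed. The `read_parquet` and `s3://` entries are illustrative examples, and the real mapping lives in extension_entries.hpp. Requires C++17.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// The kinds of lookups listed above that can trigger an autoload.
enum class TriggerKind { Type, Function, Setting, FilePrefix, ReplacementScan };

// Hypothetical, trimmed lookup table: (kind, name) -> providing extension.
const std::map<std::pair<TriggerKind, std::string>, std::string> kEntries = {
    {{TriggerKind::Function, "read_json_auto"}, "json"},
    {{TriggerKind::Function, "read_parquet"}, "parquet"},
    {{TriggerKind::FilePrefix, "https://"}, "httpfs"},
    {{TriggerKind::FilePrefix, "s3://"}, "httpfs"},
};

// Collect the extensions needed for the functions and file paths a query uses.
std::set<std::string> RequiredExtensions(const std::vector<std::string>& functions,
                                         const std::vector<std::string>& paths) {
    std::set<std::string> result;
    for (const auto& fn : functions) {
        auto it = kEntries.find({TriggerKind::Function, fn});
        if (it != kEntries.end()) {
            result.insert(it->second);
        }
    }
    for (const auto& path : paths) {
        for (const auto& [key, ext] : kEntries) {
            // A file-prefix entry matches when the path starts with that prefix.
            if (key.first == TriggerKind::FilePrefix &&
                path.compare(0, key.second.size(), key.second) == 0) {
                result.insert(ext);
            }
        }
    }
    return result;
}
```

For `read_json_auto('https://duckdb.org/some_file.json')` this yields `{"httpfs", "json"}`, the two extensions the example query needs.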

The mapping from these names to extensions is currently hardcoded in the extension_entries.hpp header file. For example, one of the maps in extension_entries.hpp currently is:

...
static constexpr ExtensionEntry EXTENSION_COPY_FUNCTIONS[] = {
	{"parquet", "parquet"},
	{"json", "json"}
};
...

When a catalog lookup for a copy function fails, DuckDB now checks this map to see whether an extension provides a copy function with that name. Note that this mechanism already existed previously, but was only used to throw more informative errors in some cases.
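A sketch of the fallback lookup against a table shaped like the excerpt above. `ExtensionEntry` here is a simplified stand-in (the real definition in extension_entries.hpp may differ), and `FindExtensionFor` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Simplified stand-in for the entry type used in extension_entries.hpp.
struct ExtensionEntry {
    const char* name;       // catalog entry name, e.g. a COPY format
    const char* extension;  // extension that provides it
};

static constexpr ExtensionEntry EXTENSION_COPY_FUNCTIONS[] = {
    {"parquet", "parquet"},
    {"json", "json"}
};

// Called after a catalog lookup has failed: returns the extension that
// should be autoloaded, or nothing if no known extension provides `name`.
template <std::size_t N>
std::optional<std::string> FindExtensionFor(const ExtensionEntry (&entries)[N],
                                            const std::string& name) {
    for (const auto& entry : entries) {
        if (name == entry.name) {
            return std::string(entry.extension);
        }
    }
    return std::nullopt;
}
```

A linear scan is adequate for tables of this size; the design choice that matters is that a miss returns "unknown" rather than throwing, so the caller can still raise the original catalog error.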

Notable yak shavings

INSTALL x FROM y

Firstly, a new syntax was added that allows specifying the endpoint to install an extension from. This was already possible using set custom_extension_repository='.....'; but can now be simplified to: INSTALL <ext_name> FROM 'http://url.com'
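One plausible way to resolve the repository for an INSTALL is sketched below. The precedence order (FROM clause, then custom_extension_repository, then the default) and the default URL `http://extensions.duckdb.org` are assumptions for illustration. `ResolveRepository` is a hypothetical helper.

```cpp
#include <optional>
#include <string>

// Assumed default repository, used only when nothing else is specified.
const std::string kDefaultRepository = "http://extensions.duckdb.org";

// Assumed precedence: an explicit FROM clause wins, then the
// custom_extension_repository setting, then the default repository.
std::string ResolveRepository(const std::optional<std::string>& from_clause,
                              const std::optional<std::string>& custom_setting) {
    if (from_clause) {
        return *from_clause;
    }
    if (custom_setting && !custom_setting->empty()) {
        return *custom_setting;
    }
    return kDefaultRepository;
}
```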

Installing extensions from local directories

An extension repository can now be either a remote HTTP server or a local directory. This is used for testing, but we can later expand it to support using httpfs for extensions.
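A sketch of how a repository string might be classified and how an extension's location could be built. It assumes that anything not starting with `http://` or `https://` is a local directory, and it uses the layout from the CI test run described in this PR (duckdb_version/duckdb_arch/name.duckdb_extension). Both function names are hypothetical.

```cpp
#include <string>

// Assumption: HTTP(S) URLs are remote; anything else is a local directory.
bool IsRemoteRepository(const std::string& repo) {
    auto starts_with = [&](const std::string& prefix) {
        return repo.compare(0, prefix.size(), prefix) == 0;
    };
    return starts_with("http://") || starts_with("https://");
}

// Build <repo>/<duckdb_version>/<duckdb_arch>/<name>.duckdb_extension.
std::string ExtensionLocation(const std::string& repo, const std::string& version,
                              const std::string& arch, const std::string& name) {
    std::string base = repo;
    if (!base.empty() && base.back() == '/') {
        base.pop_back();  // avoid a double slash after the repository root
    }
    return base + "/" + version + "/" + arch + "/" + name + ".duckdb_extension";
}
```

Because both kinds of repository share one layout, the same path logic serves a web server and a folder on disk; only the fetch step differs.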

Centralized config for builtin extensions

In .github/config/bundled_extensions.cmake we now specify which extensions are bundled. The idea is to have a central config that specifies which extensions are included in the binary distributions. Note that some extensions are built but not linked; they are used when testing the binaries that are to be distributed.

Improved scripts/run_tests_one_by_one.py

This now prints the number of assertions made in each test. This is crucial for confirming that a test you think runs in CI actually runs:

[264/520]: test/sql/copy/s3/url_encode.test (0 assertions)
[265/520]: test/sql/create/create_database.test (3 assertions)
[266/520]: test/sql/create/create_table_compression.test (126 assertions)
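The output format above makes silently skipped tests easy to detect mechanically. This is a hypothetical helper, not part of run_tests_one_by_one.py, and it assumes each line has exactly the shape shown: it collects the tests that report zero assertions.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return the test paths whose line reports "(0 assertions)", i.e. tests
// that were listed but did not actually check anything.
std::vector<std::string> TestsWithoutAssertions(const std::vector<std::string>& lines) {
    // Matches e.g. "[264/520]: test/sql/copy/s3/url_encode.test (0 assertions)"
    static const std::regex pattern(R"(^\[\d+/\d+\]: (\S+) \((\d+) assertions\)$)");
    std::vector<std::string> result;
    for (const auto& line : lines) {
        std::smatch m;
        if (std::regex_match(line, m, pattern) && m[2].str() == "0") {
            result.push_back(m[1].str());
        }
    }
    return result;
}
```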

How is this all tested?

As this is quite an invasive feature, it's important to test it well. To achieve that, we basically run all SQLLogicTests in autoloading mode. An additional test run is launched in .github/actions/build_extensions/action.yml, which does the following:

  • Copy all built extensions to a local folder laid out like an extension repository (./duckdb_version/duckdb_arch/name.duckdb_extension)
  • Rebuild DuckDB without any extensions linked, but with all tests included (including those of the out-of-tree extensions)
  • Create a list of tests containing all SQLLogicTests with a require <> statement, together with some autoloading-specific tests
  • Run the unittester in autoloading mode. In this mode the require statement no longer loads the extension; instead, it only checks whether the extension is autoloadable, and if it is not, the test is skipped
  • Run the tests, printing the number of assertions made in each test
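The require handling in the unittester step can be sketched as follows. `HandleRequire`, `RequireOutcome`, and the `autoloadable` set (standing in for whatever is derived from extension_entries.hpp) are hypothetical. The normal-mode branch assumes a test is skipped when its required extension cannot be loaded.

```cpp
#include <set>
#include <string>

// What the test runner does when it hits "require <extension>".
enum class RequireOutcome { Continue, Skip };

// In autoloading mode the runner must not load the extension itself: it only
// checks whether the extension is autoloadable, and skips the test if not.
RequireOutcome HandleRequire(const std::string& extension, bool autoloading_mode,
                             const std::set<std::string>& autoloadable,
                             bool (*load_extension)(const std::string&)) {
    if (autoloading_mode) {
        return autoloadable.count(extension) ? RequireOutcome::Continue
                                             : RequireOutcome::Skip;
    }
    // Normal mode (assumed): load explicitly and skip the test if that fails.
    return load_extension(extension) ? RequireOutcome::Continue
                                     : RequireOutcome::Skip;
}
```

Not loading anything in autoloading mode is what makes the run meaningful: any statement that passes must have triggered the extension through autoloading.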

TODOs

Adding out of tree extensions

This initial PR only adds the in-tree extensions to this mechanism, but we can now start adding out-of-tree extensions as well. All the infrastructure should be in place to do this easily: once an extension is added to the autoloading entries in extension_entries.hpp and marked as autoloadable, it should immediately be autoloadable. Also, since the autoloading test step in CI also loads the tests for out-of-tree extensions, the extension should automatically be tested as autoloadable by running its tests.

Making ICU and INET autoloadable

These two in-tree extensions are not yet autoloadable. The problem is that to autoload an extension, the extension's init function cannot start a transaction. Instead, it should use the ExtensionUtils register functions, as these use the system transaction, making the registered entries available immediately in the transaction that triggered the autoload. For ICU and INET, however, this is not yet possible.

@samansmink samansmink marked this pull request as ready for review August 31, 2023 10:02
@samansmink samansmink marked this pull request as ready for review September 1, 2023 09:28
@Mytherin Mytherin merged commit 44fec4a into duckdb:main Sep 1, 2023
Mytherin (Collaborator) commented Sep 1, 2023

Thanks!

krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
…ions-2

Autoloading mechanism for extensions
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Sep 5, 2023
- Merge pull request duckdb/duckdb#8732 from samansmink/autoload-extensions-2