Extension Updating #11677

Merged
merged 62 commits into from
May 21, 2024

@samansmink samansmink commented Apr 16, 2024

This PR adds better support for extension updating. It builds on @carlopi's work in #11515.

Features

Extension Repositories

This PR introduces extension repositories. The concept effectively already existed through the custom_extension_repository setting, but it is now made more concrete.

5 repositories have been added here:

static constexpr const char *CORE_REPOSITORY_URL = "http://extensions.duckdb.org";
static constexpr const char *CORE_NIGHTLY_REPOSITORY_URL = "http://nightly-extensions.duckdb.org";
static constexpr const char *COMMUNITY_REPOSITORY_URL = "http://community-extensions.duckdb.org";
static constexpr const char *BUILD_DEBUG_REPOSITORY_PATH = "./build/debug/repository";
static constexpr const char *BUILD_RELEASE_REPOSITORY_PATH = "./build/release/repository";

Repositories are currently only used as an alias for an HTTP endpoint or local path to download extensions from in a structured way. They can be used to select where to install an extension from:

INSTALL aws FROM core_nightly
INSTALL azure FROM core
INSTALL json FROM local_build_debug

Note here that the core repository is the default one. Also note that the community repository does not exist (yet).
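As a rough illustration of how such an alias could be resolved, here is a minimal sketch. It is not the code from this PR: the function name ResolveRepository is hypothetical, the alias names community and local_build_release are assumed by analogy with the three aliases shown above, and passing unknown names through unchanged is also an assumption.

```cpp
#include <string>
#include <unordered_map>

// Sketch only: resolves a repository alias to the URL or path it stands for.
// "community" and "local_build_release" are assumed alias names; the PR text
// only shows core, core_nightly and local_build_debug in use.
std::string ResolveRepository(const std::string &name_or_url) {
	static const std::unordered_map<std::string, std::string> known = {
	    {"core", "http://extensions.duckdb.org"},
	    {"core_nightly", "http://nightly-extensions.duckdb.org"},
	    {"community", "http://community-extensions.duckdb.org"},
	    {"local_build_debug", "./build/debug/repository"},
	    {"local_build_release", "./build/release/repository"},
	};
	if (name_or_url.empty()) {
		// No FROM clause given: core is the default repository
		return known.at("core");
	}
	auto entry = known.find(name_or_url);
	// Assumption: anything that is not a known alias is treated as a custom URL or path
	return entry == known.end() ? name_or_url : entry->second;
}
```

With this sketch, ResolveRepository("core_nightly") yields the nightly URL, and an empty name falls back to the core repository.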

Extension installation metadata

In this PR, we add the concept of ExtensionInstallInfo:

class ExtensionInstallInfo {
public:
	//! How the extension was installed
	ExtensionInstallMode mode;
	//! Full path where the extension was generated from
	string full_path;
	//! (optional) Repository url where the extension came from
	string repository_url;
	//! (optional) Version of the extension
	string version;
};

The ExtensionInstallInfo is created when an extension is installed and is written to ~/.duckdb/extensions/<duck_version>/<platform>/<ext_name>.duckdb_extension.info.

This information is then used during updating and for the duckdb_extensions() table function.
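To make the file layout concrete, the following sketch builds the location of the .info file from its components. The helper name ExtensionInfoPath and the explicit home parameter are hypothetical and only serve the illustration; the actual code in this PR derives the path differently (by appending ".info" to the local extension path).

```cpp
#include <string>

// Hypothetical helper: builds the path of the .info file that sits next to an
// installed extension, following the layout
// <home>/.duckdb/extensions/<duck_version>/<platform>/<ext_name>.duckdb_extension.info
std::string ExtensionInfoPath(const std::string &home, const std::string &duck_version,
                              const std::string &platform, const std::string &ext_name) {
	// The extension binary itself lives at the same path without the ".info" suffix
	std::string extension_path =
	    home + "/.duckdb/extensions/" + duck_version + "/" + platform + "/" + ext_name + ".duckdb_extension";
	return extension_path + ".info";
}
```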

UPDATE EXTENSIONS syntax

The main addition in this PR is the UPDATE EXTENSIONS statement. It updates extensions in a slightly smarter way than the FORCE INSTALL that is currently used to update an extension.

UPDATE EXTENSIONS will go through all currently installed extensions and try to update them, then return a table describing what happened for each extension that it tried to update:

D UPDATE EXTENSIONS;
┌────────────────┬────────────────────────────┬─────────────────────┬──────────────────┬─────────────────┐
│ extension_name │         repository         │    update_result    │ previous_version │ current_version │
│    varchar     │          varchar           │       varchar       │     varchar      │     varchar     │
├────────────────┼────────────────────────────┼─────────────────────┼──────────────────┼─────────────────┤
│ httpfs         │ ./build/release/repository │ NO_UPDATE_AVAILABLE │ f00322b7ca       │ f00322b7ca      │
│ json           │ ./build/release/repository │      UPDATED        │ 1de4405ead       │ f00322b7ca      │
└────────────────┴────────────────────────────┴─────────────────────┴──────────────────┴─────────────────┘

Passing a list of extensions to be explicitly updated is also possible using:

UPDATE EXTENSIONS (json, httpfs);

update_result is the ToString representation of the following enum:

enum class ExtensionUpdateResultTag : uint8_t {
	// Fallback for when installation information is missing
	UNKNOWN = 0,

	// Either a fresh file was downloaded and versions are identical
	NO_UPDATE_AVAILABLE = 1,
	// Only extensions from repositories can be updated
	NOT_A_REPOSITORY = 2,
	// Only known, currently installed extensions can be updated
	NOT_INSTALLED = 3,
	// Statically loaded extensions can not be updated; they are baked into the DuckDB executable
	STATICALLY_LOADED = 4,
	// This means the .info file written during installation was missing or malformed
	MISSING_INSTALL_INFO = 5,

	// The extension was re-downloaded from the repository, but due to a lack of version information
	// it's impossible to tell if the extension is actually updated
	REDOWNLOADED = 254,
	// The version was updated to a new version
	UPDATED = 255,
};

Finally, note that extensions installed directly from a full path are marked as CUSTOM_PATH extensions and are disregarded during updating. duckdb_extensions() prints this distinction to make it clear, and UPDATE EXTENSIONS reports the NOT_A_REPOSITORY update_result for an extension that was installed through a direct path.
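The way these result tags relate to each other can be sketched as a small decision function. This is an illustration under assumptions, not the logic from this PR: the struct InstalledState, the function ClassifyUpdate, the REPOSITORY install mode and the order of the checks are all hypothetical, and the enums are trimmed-down stand-ins for the real ones.

```cpp
#include <cstdint>
#include <string>

// Simplified stand-ins for the enums in this PR (REPOSITORY is an assumed mode name)
enum class InstallMode : uint8_t { UNKNOWN, REPOSITORY, CUSTOM_PATH };
enum class UpdateResult : uint8_t {
	NO_UPDATE_AVAILABLE,
	NOT_A_REPOSITORY,
	NOT_INSTALLED,
	STATICALLY_LOADED,
	MISSING_INSTALL_INFO,
	REDOWNLOADED,
	UPDATED
};

// Hypothetical summary of what is known about one extension before updating
struct InstalledState {
	bool installed;
	bool statically_loaded;
	bool has_install_info;
	InstallMode mode;
	std::string version;
};

// Sketch: picks a result tag given the installed state and the version found in
// the freshly downloaded file. The order of the checks is an assumption.
UpdateResult ClassifyUpdate(const InstalledState &state, const std::string &downloaded_version) {
	if (state.statically_loaded) {
		return UpdateResult::STATICALLY_LOADED;
	}
	if (!state.installed) {
		return UpdateResult::NOT_INSTALLED;
	}
	if (!state.has_install_info) {
		return UpdateResult::MISSING_INSTALL_INFO;
	}
	if (state.mode != InstallMode::REPOSITORY) {
		return UpdateResult::NOT_A_REPOSITORY;
	}
	if (state.version.empty() || downloaded_version.empty()) {
		// Without version information we cannot tell whether anything changed
		return UpdateResult::REDOWNLOADED;
	}
	return state.version == downloaded_version ? UpdateResult::NO_UPDATE_AVAILABLE : UpdateResult::UPDATED;
}
```

In this sketch, an extension installed from a direct path never reaches the version comparison, which matches the NOT_A_REPOSITORY behaviour described above.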

Install specific version of an extension

This PR also adds syntax to install a specific version of an extension:

INSTALL some_extension FROM core_nightly VERSION 'v0.0.1-dev';

This allows multiple versions of an extension to be available at the same time. Note that this is currently not done, nor necessarily desired. It is mostly useful to have as a workaround for when extensions get more complex/mature in the future.

Errors on invalid extensions

With this PR, extension installation has become stricter. The metadata in the extension footer is checked both on installation and on loading. A setting allows bypassing these checks: allow_extensions_metadata_mismatch. Note that when loading an extension, allow_extensions_metadata_mismatch only works when duckdb is started with allow_unsigned_extensions.

Write order of Metadata file

Before this PR, extension installation was atomic: the extensions were written to a tmp file, then copied using the filesystem. With the introduction of the metadata file, this is no longer the case. Therefore, it's important that we properly handle the various ways things can get corrupted. The file write code is as follows:

static void WriteExtensionFiles(FileSystem &fs, const string &temp_path, const string &local_extension_path,
                                void *in_buffer, idx_t file_size, ExtensionInstallInfo &info) {
	// Write extension to tmp file
	WriteExtensionFileToDisk(fs, temp_path, in_buffer, file_size);

	// Write metadata to tmp file
	auto metadata_tmp_path = temp_path + ".info";
	auto metadata_file_path = local_extension_path + ".info";
	WriteExtensionMetadataFileToDisk(fs, metadata_tmp_path, info);

	// First remove the local extension we are about to replace
	if (fs.FileExists(local_extension_path)) {
		fs.RemoveFile(local_extension_path);
	}

	// Then remove the old metadata file
	if (fs.FileExists(metadata_file_path)) {
		fs.RemoveFile(metadata_file_path);
	}

	fs.MoveFile(temp_path, local_extension_path);
	fs.MoveFile(metadata_tmp_path, metadata_file_path);
}

To properly handle corruption caused by crashes in this function, DuckDB should:

  • handle extensions with missing metadata files
  • only read metadata files when there is a corresponding extension file on disk

Note that all these cases are tested in the update_extensions_ci.test described below.
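The two rules above can be sketched as a classification of what is found on disk after a crash. A crash after the old extension file is removed but before the old metadata file is removed leaves an orphaned .info file; a crash between the two MoveFile calls leaves a new extension without metadata. The names OnDiskState and ClassifyOnDisk are hypothetical and not part of this PR.

```cpp
// Hypothetical classification of the files found for one extension
enum class OnDiskState { NOT_INSTALLED, INSTALLED, INSTALLED_WITHOUT_METADATA };

// Sketch: decides how to treat an extension given which of its two files exist
OnDiskState ClassifyOnDisk(bool extension_exists, bool info_exists) {
	if (!extension_exists) {
		// An orphaned .info file is ignored: metadata is only read when the
		// corresponding extension file is present
		return OnDiskState::NOT_INSTALLED;
	}
	// The extension file is usable on its own; missing metadata must be tolerated
	return info_exists ? OnDiskState::INSTALLED : OnDiskState::INSTALLED_WITHOUT_METADATA;
}
```

An extension in the INSTALLED_WITHOUT_METADATA state would plausibly be the one that reports MISSING_INSTALL_INFO when UPDATE EXTENSIONS is run, though that mapping is an assumption here.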

Testing

Extension updating is a little annoying to test, but I've done my best. The crux is tested by test/extension/update_extensions_ci.test, which runs in a separate CI job in the nightly workflow. It should test the mechanism quite extensively, covering:

  • installation using the default http path with a minio-hosted repository
  • installation using various local repositories
  • installing gzipped and non gzipped
  • updating
  • handling missing metadata
  • handling corrupt metadata
  • handling direct installation (without repository)


@Mytherin Mytherin left a comment

Thanks for the fixes! Looks great - some minor comments then this is good to go

extension-updating:
name: Extension updating test
runs-on: ubuntu-20.04
# needs: linux-memory-leaks TODO revert

TODO

@Mytherin Mytherin marked this pull request as ready for review May 17, 2024 22:10
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 18, 2024 15:34
@samansmink samansmink marked this pull request as ready for review May 19, 2024 12:52
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 20, 2024 18:16
@samansmink samansmink marked this pull request as ready for review May 20, 2024 18:16
@Mytherin Mytherin merged commit 0b3576e into duckdb:main May 21, 2024
54 of 65 checks passed
@Mytherin

Thanks!

@samansmink samansmink deleted the install-extension-version-merged branch May 21, 2024 07:59
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request May 21, 2024
Merge pull request duckdb/duckdb#11677 from samansmink/install-extension-version-merged