Add support for attaching multiple DuckDB Databases #5764

Merged on Dec 22, 2022 (107 commits)

Mytherin (Collaborator) commented on Dec 22, 2022

Closes #5048
Closes #1985

This PR adds support for using ATTACH and DETACH to attach multiple DuckDB databases to the same running database instance. This is a major refactor: it does not just add support for reading external databases, but adds full support for operating on multiple databases concurrently, including creating new tables, views and schemas, inserting, updating and deleting data, altering tables, etc.

Example Usage

ATTACH 'new_db.db';
CREATE TABLE new_db.tbl(i INTEGER);
INSERT INTO new_db.tbl SELECT * FROM range(1000);
DETACH new_db;

A list of all attached databases can be obtained using the command SHOW databases.

SHOW databases;
┌─────────┐
│  name   │
│ varchar │
├─────────┤
│ memory  │
│ system  │
│ temp    │
└─────────┘

Object Resolution & Qualified Names

The fully qualified name of every object now contains the catalog in addition to the schema. For example:

ATTACH 'new_db.db';
CREATE SCHEMA new_db.my_schema;
CREATE TABLE new_db.my_schema.my_table(col INTEGER);
SELECT new_db.my_schema.my_table.col FROM new_db.my_schema.my_table;

The catalog search path determines the set of schemas in which objects are looked up by default. The default catalog search path includes the system catalog, the temporary catalog and the initially attached database together with the main schema.

We can change the default database + schema pair using the USE command:

-- need to qualify fully
SELECT * FROM new_db.my_schema.my_table;

-- after changing the default database, we only need to provide the schema
USE new_db;
SELECT * FROM my_schema.my_table;

-- after changing the default database + schema, we don't need to qualify at all
USE new_db.my_schema;
SELECT * FROM my_table;

When providing only a single qualification, the system can interpret this as either a catalog or a schema, as long as there are no conflicts. For example:

ATTACH 'new_db.db';
CREATE SCHEMA my_schema;
-- creates a table in database "new_db"
CREATE TABLE new_db.tbl(i INTEGER);
-- creates a table in schema "my_schema"
CREATE TABLE my_schema.tbl(i INTEGER);

If we create a conflict (i.e. we have both a schema and a catalog with the same name), the system requests that a fully qualified path is used instead:

CREATE SCHEMA new_db;
CREATE TABLE new_db.tbl(i INTEGER);
Error: Binder Error: Ambiguous reference to catalog or schema "new_db" - use a fully qualified path like "memory.new_db"
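
The resolution rule for a single qualifier can be sketched roughly as follows. This is a toy model, not DuckDB's binder: ResolvedQualifier, ResolveQualifier and all of its parameters are hypothetical names introduced for illustration, and the sketch assumes that a qualifier naming a database falls back to a default schema passed in by the caller.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical result of resolving a one-part qualifier such as "new_db" in "new_db.tbl".
struct ResolvedQualifier {
    std::string catalog;
    std::string schema;
};

// Toy resolver (an assumption for illustration, not DuckDB's implementation).
// `catalogs` holds the names of attached databases, `schemas` holds the schema
// names visible in the default catalog.
ResolvedQualifier ResolveQualifier(const std::string &name,
                                   const std::set<std::string> &catalogs,
                                   const std::set<std::string> &schemas,
                                   const std::string &default_catalog,
                                   const std::string &default_schema) {
    const bool is_catalog = catalogs.count(name) > 0;
    const bool is_schema = schemas.count(name) > 0;
    if (is_catalog && is_schema) {
        // Both interpretations are valid, so the caller must qualify fully.
        throw std::runtime_error("Ambiguous reference to catalog or schema \"" + name + "\"");
    }
    if (is_catalog) {
        return {name, default_schema}; // the qualifier names a database
    }
    if (is_schema) {
        return {default_catalog, name}; // the qualifier names a schema
    }
    throw std::runtime_error("Unknown catalog or schema \"" + name + "\"");
}
```

With catalogs {memory, new_db} and schemas {main, my_schema}, the qualifier new_db resolves to a database and my_schema resolves to a schema. Once a schema called new_db also exists, the same call throws.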

Database Manager & Attached Databases

Adding support for attaching multiple databases has several interesting consequences for system design. After this rework, DuckDB now supports having multiple catalogs, multiple storage managers, and multiple active (running) transactions.

This is achieved by moving the Catalog, StorageManager and TransactionManager classes out of the DatabaseInstance class and into a separate AttachedDatabase class. The DatabaseManager class is added to manage the set of currently attached databases.
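
The ownership layout described above might look roughly like this. The class names come from this PR, but the bodies are empty stand-ins and the member and method names (Attach, Detach, Get, catalog, storage, transaction_manager) are assumptions for illustration rather than the actual DuckDB interfaces.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Empty stand-ins for the real classes; only the ownership layout matters here.
struct Catalog {};
struct StorageManager {};
struct TransactionManager {};

// Each attached database owns its own catalog, storage and transaction manager.
struct AttachedDatabase {
    explicit AttachedDatabase(std::string name_p) : name(std::move(name_p)) {}
    std::string name;
    Catalog catalog;
    StorageManager storage;
    TransactionManager transaction_manager;
};

// Tracks the set of currently attached databases by name (hypothetical interface).
class DatabaseManager {
public:
    AttachedDatabase &Attach(const std::string &name) {
        if (databases.count(name) > 0) {
            throw std::runtime_error("Database \"" + name + "\" is already attached");
        }
        std::unique_ptr<AttachedDatabase> db(new AttachedDatabase(name));
        AttachedDatabase &ref = *db;
        databases[name] = std::move(db);
        return ref;
    }
    void Detach(const std::string &name) {
        if (databases.erase(name) == 0) {
            throw std::runtime_error("Database \"" + name + "\" is not attached");
        }
    }
    // Returns nullptr when no database with this name is attached.
    AttachedDatabase *Get(const std::string &name) {
        auto it = databases.find(name);
        return it == databases.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<AttachedDatabase>> databases;
};
```

Because every AttachedDatabase carries its own three components, detaching one database drops its catalog, storage and transaction manager without touching the others.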

Transactions

The Transaction object inside the connection has been replaced with a MetaTransaction, which is responsible for managing the (potentially multiple) active transactions when reading and writing to different attached databases. The actual underlying transactions are started lazily. That is to say, calling BEGIN TRANSACTION no longer begins an actual transaction in a database but only starts a MetaTransaction. When an attached database is referenced, a transaction is started in that attached database. On commit or rollback, all active transactions are ended.

SET immediate_transaction_mode=true can be toggled to change this behavior to eagerly start transactions in all attached databases instead. This is primarily useful for writing tests involving multiple transactions.
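
The difference between lazy and immediate mode can be illustrated with a small toy model. ToyTransaction, ToyMetaTransaction and their methods are hypothetical names, and the sketch only counts which per-database transactions exist; it does not reflect how DuckDB's MetaTransaction is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for a transaction inside one attached database.
struct ToyTransaction {};

// Toy model of a meta transaction (an assumption for illustration only).
class ToyMetaTransaction {
public:
    // In immediate mode a transaction is started in every attached database
    // up front; otherwise nothing is started until a database is referenced.
    ToyMetaTransaction(const std::vector<std::string> &attached, bool immediate) {
        if (immediate) {
            for (const auto &db : attached) {
                transactions.emplace(db, ToyTransaction());
            }
        }
    }
    // Returns the transaction for `db`, starting it on first reference.
    ToyTransaction &GetTransaction(const std::string &db) {
        return transactions[db];
    }
    // Number of underlying transactions that have been started so far.
    std::size_t ActiveCount() const {
        return transactions.size();
    }
    // Commit or rollback ends all underlying transactions.
    void End() {
        transactions.clear();
    }

private:
    std::map<std::string, ToyTransaction> transactions;
};
```

In lazy mode a freshly created ToyMetaTransaction has zero active transactions, and the count grows by one each time a new database is referenced. In immediate mode the count equals the number of attached databases from the start.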

While multiple transactions can be active at a time - the system only supports writing to a single attached database in a single transaction. If you try to write to multiple attached databases in a single transaction the following error will be thrown:

Attempting to write to database "memory" in a transaction that has already modified database "database" - a single transaction can only write to a single attached database.
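
A check of this kind could be sketched as below. SingleWriterCheck, RegisterWrite and Reset are hypothetical names used only for illustration; the sketch tracks the first database written to and rejects writes to any other database until the transaction ends, reusing the wording of the error message above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy single-writer check (an assumption for illustration, not DuckDB code).
class SingleWriterCheck {
public:
    // Records a write to `db`; throws if a different database was already modified.
    void RegisterWrite(const std::string &db) {
        if (!modified_database.empty() && modified_database != db) {
            throw std::runtime_error(
                "Attempting to write to database \"" + db +
                "\" in a transaction that has already modified database \"" +
                modified_database +
                "\" - a single transaction can only write to a single attached database.");
        }
        modified_database = db;
    }
    // Commit or rollback ends the transaction and clears the write target.
    void Reset() {
        modified_database.clear();
    }

private:
    std::string modified_database; // empty until the first write
};
```

Repeated writes to the same database pass the check, reads are not registered at all, and a new transaction (after Reset) may pick a different write target.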

Catalogs & System Functions

As there are now multiple catalogs - the Catalog::GetCatalog(context) function has been removed. It has been replaced by two functions:

// obtains a reference to the *system* catalog
Catalog &GetSystemCatalog(ClientContext &context);
// obtains a reference to the given catalog by name
Catalog &GetCatalog(ClientContext &context, const string &catalog_name);

Similarly, Transaction::Get now requires a catalog to be specified:

Transaction &Get(ClientContext &context, Catalog &catalog);

The system catalog is a database that is always attached on start-up that holds all system data - including system functions, built-in views, etc. The system catalog is special in that it does not have any attached storage - and hence does not support storing tables. Extensions also generally register their new functions in the system catalog.

Temporary Entries

In addition to the system catalog, temporary objects have also been moved from a schema within the main catalog to a separate catalog. Each client creates their own attached in-memory catalog called temp that holds temporary objects.
