
feat: add multiple catalogs functionality to MotherDuck connection #3484

Merged
izeigerman merged 12 commits into SQLMesh:main from naoyak:enable_duckdb_with_motherduck
Dec 18, 2024

Conversation

@naoyak
Contributor

@naoyak naoyak commented Dec 6, 2024

This modifies MotherDuckConnectionConfig so that MotherDuck can be used together with multiple attached catalogs (e.g. postgres, local duckdb, other MotherDuck databases). This way you can run models inside MD and join external attached data without injecting ATTACH statements inside Jinja blocks.

This was done mainly by moving most of the logic from DuckDBConnectionConfig up into the base class BaseDuckDBConnectionConfig, since vanilla DuckDB and MotherDuck are mostly at feature parity at the moment.

So you can configure as follows:

gateways:
  local:
    connection:
      type: motherduck
      # token: {{ env_var('MOTHERDUCK_TOKEN') }}
      catalogs:
        my_db:
          type: motherduck
          path: 'md:my_db'
        local_duckdb:
          type: duckdb
          path: 'local.ddb'
        postgres_db:
          type: postgres
          path: 'dbname=postgres user=postgres password=postgres port=5555 host=0.0.0.0'
default_gateway: local
model_defaults:
  dialect: duckdb
  start: 2024-12-08
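
For readers unfamiliar with attached catalogs, here is a minimal sketch of how a `catalogs` mapping like the one above could translate into DuckDB `ATTACH` statements. The helper `build_attach_statements` is hypothetical and is not SQLMesh's actual implementation; it also assumes that MotherDuck databases are attached by their `md:` path without an alias, while other catalog types are attached under the configured key.

```python
from typing import Dict, List


def build_attach_statements(catalogs: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical helper: turn a catalogs mapping into DuckDB ATTACH statements."""
    statements = []
    for alias, spec in catalogs.items():
        # Escape single quotes so the path is a valid SQL string literal.
        path = spec["path"].replace("'", "''")
        kind = spec.get("type", "duckdb")
        if kind == "motherduck":
            # Assumption: the database name is taken from the md: path, so no alias.
            statements.append(f"ATTACH '{path}'")
        elif kind == "duckdb":
            statements.append(f"ATTACH '{path}' AS {alias}")
        else:
            # Other engines (e.g. postgres) are attached through a DuckDB extension.
            statements.append(f"ATTACH '{path}' AS {alias} (TYPE {kind.upper()})")
    return statements


# Same catalogs as in the YAML example above.
catalogs = {
    "my_db": {"type": "motherduck", "path": "md:my_db"},
    "local_duckdb": {"type": "duckdb", "path": "local.ddb"},
    "postgres_db": {
        "type": "postgres",
        "path": "dbname=postgres user=postgres password=postgres port=5555 host=0.0.0.0",
    },
}

for statement in build_attach_statements(catalogs):
    print(statement)
```

Once the catalogs are attached, a model can reference tables from any of them by a fully qualified name, which is what removes the need for ATTACH statements in Jinja blocks.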

Let me know if this works!

@CLAassistant

CLAassistant commented Dec 6, 2024

CLA assistant check
All committers have signed the CLA.

@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch 3 times, most recently from c04068d to 082b609 Compare December 9, 2024 19:27
@naoyak naoyak changed the title [ENH] enable DuckDB catalog with MotherDuck connection [ENH] add multiple catalogs functionality to MotherDuck connection Dec 9, 2024
@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch 2 times, most recently from 15fcb3f to bf32bf9 Compare December 9, 2024 21:53
@naoyak
Contributor Author

naoyak commented Dec 9, 2024

So far it seems to work except when running a model through a duckdb connection with an attached postgres database.

error traceback

```
Traceback (most recent call last):
  File "/Users/naoya/.pyenv/versions/sqlmesh-dev/bin/sqlmesh", line 8, in <module>
    sys.exit(cli())
  File "/Users/naoya/.pyenv/versions/3.12.7/envs/sqlmesh-dev/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/naoya/.pyenv/versions/3.12.7/envs/sqlmesh-dev/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/naoya/.pyenv/versions/3.12.7/envs/sqlmesh-dev/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/naoya/.pyenv/versions/3.12.7/envs/sqlmesh-dev/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/naoya/.pyenv/versions/3.12.7/envs/sqlmesh-dev/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/naoya/.pyenv/versions/3.12.7/envs/sqlmesh-dev/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/cli/__init__.py", line 31, in wrapper
    return handler(sqlmesh_context, lambda: func(*args, **kwargs))
  File "/Users/naoya/repos/sqlmesh/sqlmesh/cli/__init__.py", line 40, in _default_exception_handler
    return func()
  File "/Users/naoya/repos/sqlmesh/sqlmesh/cli/__init__.py", line 31, in <lambda>
    return handler(sqlmesh_context, lambda: func(*args, **kwargs))
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/analytics/__init__.py", line 82, in wrapper
    return func(*args, **kwargs)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/cli/main.py", line 430, in plan
    context.plan(
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/analytics/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/context.py", line 1136, in plan
    self.console.plan(
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/console.py", line 696, in plan
    self._show_options_after_categorization(
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/console.py", line 791, in _show_options_after_categorization
    self._prompt_backfill(plan_builder, auto_apply, default_catalog)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/console.py", line 947, in _prompt_backfill
    plan_builder.apply()
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/plan/builder.py", line 211, in apply
    self._apply(self.build())
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/context.py", line 1387, in apply
    raise e
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/context.py", line 1378, in apply
    self._apply(plan, circuit_breaker)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/context.py", line 1937, in _apply
    self._scheduler.create_plan_evaluator(self).evaluate(
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/plan/evaluator.py", line 120, in evaluate
    self._push(plan, snapshots, deployability_index_for_creation)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/plan/evaluator.py", line 233, in _push
    self.snapshot_evaluator.create(
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/snapshot/evaluator.py", line 310, in create
    for objs in concurrent_apply_to_values(
  File "/Users/naoya/repos/sqlmesh/sqlmesh/utils/concurrency.py", line 257, in concurrent_apply_to_values
    return [fn(value) for value in values]
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/snapshot/evaluator.py", line 312, in <lambda>
    lambda s: _get_data_objects(s, gateway_by_schema[s]),
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/snapshot/evaluator.py", line 304, in _get_data_objects
    objs = self._get_adapter(gateway).get_data_objects(schema, tables_by_schema[schema])
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/shared.py", line 302, in internal_wrapper
    return func(*list_args, **kwargs)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/base.py", line 1890, in get_data_objects
    obj for batch in batches for obj in self._get_data_objects(schema_name, set(batch))
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/shared.py", line 338, in internal_wrapper
    resp = func(*list_args, **kwargs)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/duckdb.py", line 112, in _get_data_objects
    df = self.fetchdf(query)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/base.py", line 1934, in fetchdf
    df = self._fetch_native_df(query, quote_identifiers=quote_identifiers)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/base.py", line 1927, in _fetch_native_df
    self.execute(query, quote_identifiers=quote_identifiers)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/base.py", line 2060, in execute
    self._execute(sql, **kwargs)
  File "/Users/naoya/repos/sqlmesh/sqlmesh/core/engine_adapter/base.py", line 2066, in _execute
    self.cursor.execute(sql, **kwargs)
duckdb.duckdb.CatalogException: Catalog Error: Table with name tables does not exist!
Did you mean "system.information_schema.tables"?
LINE 1: ...MPORARY' THEN 'table' END AS type FROM information_schema.tables WHERE (table_...
```

@naoyak naoyak changed the title [ENH] add multiple catalogs functionality to MotherDuck connection feat: add multiple catalogs functionality to MotherDuck connection Dec 9, 2024
@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch from bf32bf9 to 392131c Compare December 9, 2024 23:49
@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch 3 times, most recently from 878dbf6 to 520813c Compare December 11, 2024 05:54
@naoyak
Contributor Author

naoyak commented Dec 11, 2024

@izeigerman would you mind taking a look? I ended up basically moving all the logic from DuckDBConnectionConfig into the base config, but wasn't sure whether to enforce that any config using MD should be using MotherDuckConnectionConfig.

@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch from 520813c to 6fec0db Compare December 13, 2024 07:24
@naoyak naoyak requested a review from izeigerman December 13, 2024 19:15
@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch from 2f11d96 to e60bb0d Compare December 17, 2024 06:03
connection_str = f"md:{self.database}"
custom_user_agent_config = {"custom_user_agent": f"SQLMesh/{__version__}"}
if not self.database:
    return {"config": custom_user_agent_config}
Collaborator

Why would the MotherDuck configuration not have a database specified?

Contributor Author

Because the user now has the option of passing catalogs to the MotherDuck config instead of a single database like the duckdb config (i.e. the main feat addition in this PR).
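
To make the behaviour under discussion concrete, here is a rough sketch of the branch in the snippet above, written as a standalone function. Only the no-database branch comes from the reviewed code; the function name `motherduck_connect_kwargs`, the `version` parameter, and the shape of the second return value are assumptions made for illustration, and token handling is left out entirely.

```python
from typing import Any, Dict, Optional


def motherduck_connect_kwargs(database: Optional[str], version: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the connection-kwargs logic in the reviewed snippet."""
    custom_user_agent_config = {"custom_user_agent": f"SQLMesh/{version}"}
    if not database:
        # No single database configured: the user is expected to supply catalogs instead.
        return {"config": custom_user_agent_config}
    # Assumption: with a database set, the md: connection string is passed along as well.
    return {"database": f"md:{database}", "config": custom_user_agent_config}


# "0.0.0" is a placeholder version string, not a real SQLMesh release.
print(motherduck_connect_kwargs(None, "0.0.0"))
print(motherduck_connect_kwargs("my_db", "0.0.0"))
```

With no database, the connection is opened with only the user-agent config, and the attached catalogs determine which databases are available.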

@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch from e60bb0d to 91faba4 Compare December 17, 2024 17:31
@naoyak naoyak force-pushed the enable_duckdb_with_motherduck branch from 91faba4 to 04336ae Compare December 18, 2024 17:56
Collaborator

@izeigerman izeigerman left a comment

Thanks for addressing comments!

@izeigerman izeigerman merged commit 25df941 into SQLMesh:main Dec 18, 2024