-
Notifications
You must be signed in to change notification settings - Fork 467
Description
dlt version
1.8.0
Describe the problem
I want to load data from a DuckDB/ MotherDuck source using either the sql_database or sql_table functions.
Both fail as dlt fails to resolve DuckDBs catalog inspection to correctly scope table and column qualifications.
I am using https://github.com/Mause/duckdb_engine verions 0.15.0.
Expected behavior
dlt is generating SQL which correctly identifies a specified table based on DuckDB's catalog conventions and the pipeline completes without error.
Steps to reproduce
-
Create a DuckDB instance with a synthetic test table as described here: https://duckdb.org/docs/stable/guides/snippets/create_synthetic_data.html
-
Create a pipeline which uses DuckDB as a source like so:
conn_string = f"duckdb:///md:{db_name}{access_token_suffix}"
credentials = create_engine(
conn_string,
connect_args={
'read_only': True,
}
)
pipeline = dlt.pipeline(
pipeline_name=pipeline_name,
destination='postgres',
dataset_name=dataset_name,
progress="log",
)
source1 = sql_database(
credentials,
schema=schema_name, # with our without, same effect
table_names=['foobar'], # or schema_name.foobar or db_name.schema_name.foobar
reflection_level="full",
)
source2 = sql_database(
credentials,
schema=schema_name, # with our without, same effect
reflection_level="full",
).with_resources('foobar')
source3 = sql_table(
credentials,
schema="info",
table="foobar",
)
load_info1 = pipeline.run(source1, write_disposition=write_disposition)
print(load_info1)
load_info2 = pipeline.run(source2, write_disposition=write_disposition)
print(load_info2)
load_info3 = pipeline.run(source3, write_disposition=write_disposition)
print(load_info3)Expected errors to see:
dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage extract when processing package 1742387100.2242408 with exception:
<class 'dlt.extract.exceptions.ResourceExtractionError'>
In processing pipe foo: extraction of resource foo in generator table_rows caused an exception: (duckdb.duckdb.CatalogException) Catalog Error: Table with name foo does not exist!
Did you mean "db_name.schema_name.foo"?
LINE 2: FROM foo
^
[SQL: SELECT foo.id, foo.created_at, foo.bar
FROM foo]
(Background on this error at: https://sqlalche.me/e/20/f405)
When fully qualifying the table name in the functions like db_name.schema_name.table_name, the error returned is:
sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in Engine(duckdb:///foobar): db_name.schema_name.table_name
Operating system
macOS
Runtime environment
Local
Python version
3.12
dlt data source
sql_database
dlt destination
Postgres
Other deployment details
No response
Additional information
No response
Metadata
Metadata
Assignees
Labels
Type
Projects
Status