Query parquet dataset using relational API in Python? #3637
Hi, I'm querying a parquet dataset, which works in principle:
import duckdb
import pyarrow.dataset as ds
con = duckdb.connect()
ds_arrow = ds.dataset(path, format="parquet", partitioning="hive")
### Directly query using SQL
con.execute("SELECT * FROM ds_arrow").fetchone()
# works
Sometimes it would be nice to be able to use the relational API. The following shows my attempts at doing so:
### Try to query using relational API
con.register("ds_table", ds_arrow)
rel = con.table("ds_table")
# RuntimeError: Table does not exist!
con.execute("CREATE VIEW ds_view AS SELECT * FROM ds_table")
rel = con.view("ds_view")
# works
Two questions arise:
Thanks!
Replies: 2 comments
I've just tried this out with con.view() and it looks like a bug. In any case, you can directly create a relation from a Python object with con.from_arrow(ds_arrow).
That works, thanks! Somehow, I must have forgotten to try this...
Alright, thanks for tracking the issue. However, given that