-
Notifications
You must be signed in to change notification settings - Fork 72
Description
Is your feature request related to a problem? Please describe.
Mostly for testing purposes, it would be great if dask-sql would support creating tables of inline data using the VALUES
keyword.
Describe the solution you'd like
In many SQL dialects (including DataFusion, Postgres, and DuckDB), it's possible to construct tables from literal values using the VALUES
keyword. See https://www.postgresql.org/docs/current/queries-values.html.
For example:
SELECT * FROM (VALUES (1, 2), (1, 3)) as tbl(column1, column2)
In the DataFusion CLI, this evaluates to
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1 | 2 |
| 1 | 3 |
+---------+---------+
This isn't currently supported in dask-sql. For example:
from dask_sql import Context
c = Context()
result = c.sql(r"""
SELECT * FROM (VALUES (1, 2), (1, 3)) as tbl(column1, column2)
""")
...
NotImplementedError: No relational conversion for node type Values available (yet).
Describe alternatives you've considered
None
Additional context
I'm in the early stages of adding SQL support to VegaFusion, and I'd like to test SQL dialect generation using self-contained queries that include small inline datasets.
Activity
ayushdg commentedon Jan 31, 2023
Thanks for raising this issue. As mentioned the datafusion planner supports these kind of nodes and it shouldn't be too hard to add an implementation on the dask-sql side.
Is your current plan to use these queries with cpu backed dask dataframes or gpu backed dask-cudf dataframes as well?
The primary reason I'm asking is because it's easier to default to creating cpu backed dask dataframes by default, since we don't have a good api today to allow users specifying gpu tables for inline cases like these.
jonmmease commentedon Feb 1, 2023
For my purposes the CPU backend would be preferable, and since these would necessarily be small datasets, my guess is that this is probably appropriate in general.