
[ENH] Support VALUES query #1013

@jonmmease

Description


Is your feature request related to a problem? Please describe.
Mostly for testing purposes, it would be great if dask-sql supported creating tables of inline data with the VALUES keyword.

Describe the solution you'd like
In many SQL dialects (including DataFusion, Postgres, and DuckDB), it's possible to construct tables from literal values using the VALUES keyword. See https://www.postgresql.org/docs/current/queries-values.html.

For example:

SELECT * FROM (VALUES (1, 2), (1, 3)) as tbl(column1, column2)

In the DataFusion CLI, this evaluates to

+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 2       |
| 1       | 3       |
+---------+---------+
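For reference, the same result can be reproduced with Python's built-in `sqlite3` module, which also supports VALUES. This is only a sketch for comparing expected output, not part of dask-sql, and `run_inline_values` is a hypothetical helper. SQLite does not accept the `AS tbl(column1, column2)` column-alias list in FROM, so this version attaches the column names with an equivalent CTE.

```python
import sqlite3

# SQLite rejects "AS tbl(column1, column2)" after a subquery, so the
# column names are attached through a CTE column list instead.
QUERY = """
WITH tbl(column1, column2) AS (VALUES (1, 2), (1, 3))
SELECT * FROM tbl
ORDER BY column1, column2
"""


def run_inline_values(query: str):
    """Run a query in an in-memory SQLite database; return (column names, rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.execute(query)
        # cursor.description holds one 7-tuple per column; item 0 is the name.
        columns = [desc[0] for desc in cur.description]
        rows = cur.fetchall()
    finally:
        conn.close()
    return columns, rows


columns, rows = run_inline_values(QUERY)
print(columns)  # ['column1', 'column2']
print(rows)     # [(1, 2), (1, 3)]
```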

This isn't currently supported in dask-sql. For example:

from dask_sql import Context
c = Context()
result = c.sql(r"""
SELECT * FROM (VALUES (1, 2), (1, 3)) as tbl(column1, column2)
""")
...
NotImplementedError: No relational conversion for node type Values available (yet).

Describe alternatives you've considered
None

Additional context
I'm in the early stages of adding SQL support to VegaFusion, and I'd like to test SQL dialect generation using self-contained queries that include small inline datasets.

Activity


ayushdg commented on Jan 31, 2023

Collaborator

Thanks for raising this issue. As you mentioned, the DataFusion planner supports these kinds of nodes, and it shouldn't be too hard to add an implementation on the dask-sql side.
Do you plan to use these queries only with CPU-backed Dask DataFrames, or with GPU-backed dask-cudf DataFrames as well?

I'm asking mainly because it's easier to default to creating CPU-backed Dask DataFrames, since we don't currently have a good API that lets users request GPU tables for inline cases like these.
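A minimal sketch of the conversion step, assuming a VALUES node arrives as a list of literal row tuples plus optional column names. The helper `values_to_columns` is hypothetical: it is not part of dask-sql's API and not necessarily how an eventual implementation would work. It checks that every row has the same width and builds a column-oriented dict, falling back to the `column1`, `column2`, ... names that Postgres and DataFusion assign by default.

```python
from typing import Any, Dict, List, Optional, Sequence


def values_to_columns(
    rows: Sequence[Sequence[Any]],
    names: Optional[Sequence[str]] = None,
) -> Dict[str, List[Any]]:
    """Turn the rows of a VALUES list into a column-oriented dict (hypothetical helper)."""
    if not rows:
        raise ValueError("VALUES needs at least one row")
    width = len(rows[0])
    # Every row in a VALUES list must have the same number of expressions.
    for i, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} values, expected {width}")
    if names is None:
        # Default VALUES column names used by Postgres and DataFusion.
        names = [f"column{j + 1}" for j in range(width)]
    elif len(names) != width:
        raise ValueError(f"got {len(names)} column names for {width} columns")
    # Transpose rows into one list per column.
    return {name: [row[j] for row in rows] for j, name in enumerate(names)}


print(values_to_columns([(1, 2), (1, 3)]))
# {'column1': [1, 1], 'column2': [2, 3]}
```

A dict like this could then be passed to `pandas.DataFrame(...)` and `dask.dataframe.from_pandas(df, npartitions=1)` to produce a CPU-backed Dask DataFrame; a single partition is assumed here only because inline VALUES data is small.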

Added the python (Affects Python API) label and removed the needs triage (Awaiting triage by a dask-sql maintainer) label on Jan 31, 2023

jonmmease commented on Feb 1, 2023

Author

For my purposes the CPU backend would be preferable, and since these will necessarily be small datasets, I expect CPU is the appropriate default in general.

A pull request that will close this issue was linked on Feb 9, 2023.

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request), python (Affects Python API)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @jonmmease, @ayushdg


        [ENH] Support VALUES query · Issue #1013 · dask-contrib/dask-sql