feat: Parse idents with `*` as quoted #1516

max-sixty · 2023-01-15T21:56:28Z

I'm not sure this is a good idea. It solves #1498 in a hacky way.

The tradeoff is that:

from `schema.table`

...needs to compile to

SELECT * FROM 'schema'.'table' -- (or just `schema.table`, no backticks, but *not* 'schema.table' with quotes)

But:

from `dir/*.parquet`

...needs to compile to

SELECT * FROM 'dir/*.parquet' -- not 'dir/*'.'parquet'

And:

from schema.table

will be parsed as a namespace schema and a column name table, which then breaks.

So this implements a hack where anything with `*` in it gets compiled to a single ident, which fixes the case in #1498. But the semantics are quite confusing.
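To make the rule concrete, here is a minimal sketch of the behaviour described above. `split_quoted_ident` is a hypothetical helper written for illustration, not the function this PR touches in prql-compiler: the contents of a backtick-quoted ident are split on `.` unless they contain `*`, in which case the whole string is kept as one part.

```rust
/// Hypothetical helper (not the prql-compiler API): decide how the contents
/// of a backtick-quoted ident are broken into parts.
fn split_quoted_ident(raw: &str) -> Vec<String> {
    if raw.contains('*') {
        // Anything with `*` is kept as one opaque ident, e.g. `dir/*.parquet`.
        vec![raw.to_string()]
    } else {
        // Otherwise `.` separates the parts, e.g. `schema.table`.
        raw.split('.').map(str::to_string).collect()
    }
}
```

Under this sketch, `schema.table` yields two parts and `dir/*.parquet` yields one, which is exactly where the confusing semantics come from: the presence of `*` silently changes how `.` is treated.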

I think ideally — but it might be a lot of work — we would instead

- parse the argument in `from` and `join` as namespaces
- then `from schema.table` would be parsed as a single namespace `schema.table`, not a namespace and column
- but `select table.column` would still be parsed as a namespace `table` & column `column`

And then any ident in backticks could be opaque — we wouldn't need to split by ., we'd take from schema.table, not from `schema.table` and produce the correct thing.
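A rough sketch of that proposal, under my own assumptions: `Position`, `IdentToken`, `Resolved` and `resolve` are all hypothetical names, not types from prql-compiler. The idea is that the position of the ident (relation argument vs. expression) decides how it is interpreted, and a backtick-quoted ident is never split.

```rust
/// Hypothetical: where the ident appears in the query.
#[derive(Clone, Copy)]
enum Position {
    /// Argument of `from` or `join`.
    Relation,
    /// Anywhere else, e.g. inside `select`.
    Expression,
}

/// Hypothetical token: the ident text and whether it was in backticks.
struct IdentToken<'a> {
    text: &'a str,
    quoted: bool,
}

#[derive(Debug, PartialEq)]
enum Resolved {
    /// The whole path names a table.
    Table { path: Vec<String> },
    /// The last part is a column, the rest is its namespace.
    Column { namespace: Vec<String>, name: String },
}

fn resolve(tok: &IdentToken, pos: Position) -> Resolved {
    // Backtick-quoted idents are opaque: never split on `.`.
    let parts: Vec<String> = if tok.quoted {
        vec![tok.text.to_string()]
    } else {
        tok.text.split('.').map(str::to_string).collect()
    };
    match pos {
        Position::Relation => Resolved::Table { path: parts },
        Position::Expression => {
            let mut namespace = parts;
            // `split` always yields at least one part, so `pop` succeeds.
            let name = namespace.pop().unwrap_or_default();
            Resolved::Column { namespace, name }
        }
    }
}
```

With this shape, no special case for `*` is needed: a quoted `dir/*.parquet` in `from` resolves to a single-part table path, while an unquoted `table.column` in `select` still resolves to a namespace and a column.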

In the meantime, I'm +0.3 on merging this, since it solves the immediate problem.

Past discussions on this include #822 & #852


aljazerzen · 2023-01-16T08:00:34Z

prql-compiler/src/semantic/context.rs

    fn resolve_ident_wildcard(&mut self, ident: &Ident) -> Result<Ident, String> {
-        if ident.name != "*" {
-            return Err("Unsupported feature: advanced wildcard column matching".to_string());
-        }
-
        let (mod_ident, mod_decl) = {


The intention with this was to do wildcard column matching:

from albums select [a*_id] # this selects artist_id and album_id

We'd need full knowledge of the column for that, so let's leave it for now.
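For illustration only, this is roughly what such matching could look like once the column list is known. Both functions are hypothetical and not part of prql-compiler, and the sketch assumes the pattern contains at most one `*`; any further `*` would be treated literally.

```rust
/// Hypothetical: does `column` match `pattern`, where a single `*`
/// stands for any (possibly empty) run of characters?
fn matches_wildcard(pattern: &str, column: &str) -> bool {
    match pattern.split_once('*') {
        // No wildcard: require an exact match.
        None => pattern == column,
        Some((prefix, suffix)) => {
            // The length check stops prefix and suffix from overlapping.
            column.len() >= prefix.len() + suffix.len()
                && column.starts_with(prefix)
                && column.ends_with(suffix)
        }
    }
}

/// Hypothetical: expand a wildcard against the known columns of a table,
/// preserving their original order.
fn expand_wildcard<'a>(pattern: &str, columns: &[&'a str]) -> Vec<&'a str> {
    columns
        .iter()
        .copied()
        .filter(|c| matches_wildcard(pattern, c))
        .collect()
}
```

The need for the `columns` argument is the point made above: without full knowledge of the table's columns, the compiler has nothing to expand the pattern against.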

But this feature does interfere with this PR

Oh wait, but if you use backticks, wildcard matching shouldn't be used...

To confirm, would the wildcard column matching be a PRQL or DB feature?

PRQL of course. I don't know any DB that supports this.

https://duckdb.org/docs/sql/query_syntax/select.html

SELECT COLUMNS('number\d+') FROM addresses;

!

aljazerzen · 2023-01-16T08:04:14Z

I quite dislike the idea that dir/*.parquet can be an ident, but hey we support backticks, so it should work...

I'm +0.

snth · 2023-01-16T08:13:02Z

I can't comment on the code or the parsing implications, but I'll note that in #286 we discussed possibly adding a `from_file` transform in the future. See for example #286 (comment).

Would that not maybe make the compiler/parser code simpler?

While DuckDB is quite flexible and allows SELECT * FROM 'dir/*.parquet', they also have read_csv and read_parquet functions which can take additional parameters which could be added to from_file.

You could then keep your existing identifier rules for use with from.

from_file could then also error out when compiling to a target other than DuckDB or one that doesn't allow reading csv files.
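A minimal sketch of how that could behave, assuming a hypothetical `compile_from_file` function and `Dialect` enum (neither exists in prql-compiler as far as this thread shows). It emits DuckDB's `read_parquet` or `read_csv` based on the file extension and errors for any other target; extra reader parameters are left out.

```rust
/// Hypothetical target list, just enough for the example.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dialect {
    DuckDb,
    Postgres,
}

/// Hypothetical: compile `from_file <path>` to SQL, or fail for targets
/// that cannot read files directly.
fn compile_from_file(path: &str, dialect: Dialect) -> Result<String, String> {
    if dialect != Dialect::DuckDb {
        return Err(format!("from_file is not supported for target {:?}", dialect));
    }
    // Pick the DuckDB reader function from the file extension.
    let reader = if path.ends_with(".parquet") {
        "read_parquet"
    } else if path.ends_with(".csv") {
        "read_csv"
    } else {
        return Err(format!("cannot infer the file format of {}", path));
    };
    // Escape single quotes so the path is a valid SQL string literal.
    let escaped = path.replace('\'', "''");
    Ok(format!("SELECT * FROM {}('{}')", reader, escaped))
}
```

Because the path is passed as a string rather than an ident, the existing identifier rules for `from` would not need to change.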

max-sixty · 2023-01-16T20:50:02Z

I see from_file as a PRQL function which reads a file and uses its data in a query, like from_text does. Otherwise why not just defer to the DB?

Regardless, we still have a somewhat complex system for parsing idents — I think the middle section in the issue would solve all of these, but be some work to implement.

I'll merge this, and I've opened a new issue for that in #1535.

max-sixty and others added 2 commits January 15, 2023 13:54

feat: Parse idents with * as quoted

8841e68

[pre-commit.ci] auto fixes from pre-commit.com hooks

1832ce7

for more information, see https://pre-commit.ci

max-sixty mentioned this pull request Jan 15, 2023

Can I use file paths as table names for DuckDB? #1498

Closed

aljazerzen reviewed Jan 16, 2023


max-sixty mentioned this pull request Jan 16, 2023

Resolve opaqueness of idents #1535

Closed

.

af75463

max-sixty added 2 commits January 16, 2023 12:50

Merge branch 'main' into ident-parsing

7645090

.

63c7d40

max-sixty enabled auto-merge (squash) January 16, 2023 20:53

max-sixty merged commit e1ad3e2 into PRQL:main Jan 16, 2023

max-sixty deleted the ident-parsing branch January 16, 2023 20:54

snth mentioned this pull request Mar 22, 2023

Offer a format param to from, for DuckDB? #2168

Closed
