Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Parse idents with * as quoted #1516

Merged
merged 5 commits into from
Jan 16, 2023
Merged

Conversation

max-sixty
Copy link
Member

@max-sixty max-sixty commented Jan 15, 2023

I'm not sure this is a good idea. It solves #1498 is a hacky way.

The tradeoff is that:

from `schema.table`

...needs to compile to

SELECT * FROM 'schema'.'table' -- (or just `schema.table`, no backticks, but *not* 'schema.table' with quotes)

But:

from `dir/*.parquet`

...needs to compile to

SELECT * FROM 'dir/*.parquet' -- not 'dir/*'.'parquet'

And:

from schema.table

will be parsed as a namespace schema and a column name table, which then breaks.

So this implements a hack where anything with * in gets compiled to a single ident, and so fixes the case in #1498. But the semantics are quite confusing.


I think ideally — but it might be a lot of work — we would instead

  • parse the argument in from and join as namespaces
  • then from schema.table would be parsed as a single namespace schema.table, not a namespace and column
  • but select table.column would still be parsed as a namespace table & column column

And then any ident in backticks could be opaque — we wouldn't need to split by ., we'd take from schema.table, not from `schema.table` and produce the correct thing.


In the meantime, I'm +0.3 on merging this, since it solves the immediate problem.

Past discussions on this include #822 & #852

Comment on lines 244 to 245
fn resolve_ident_wildcard(&mut self, ident: &Ident) -> Result<Ident, String> {
if ident.name != "*" {
return Err("Unsupported feature: advanced wildcard column matching".to_string());
}

let (mod_ident, mod_decl) = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention with this was to do wildcard column matching:

from albums
select [a*_id]
# this selects artist_id and album_id

We'd need full knowledge of the column for that, so let's leave it for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this feature does interfere with this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, but if you use backticks, wildcard matching shouldn't be used...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, would the wildcard column matching be a PRQL or DB feature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRQL of course. I don't know any DB that support this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://duckdb.org/docs/sql/query_syntax/select.html

SELECT COLUMNS('number\d+') FROM addresses;

!

@aljazerzen
Copy link
Member

aljazerzen commented Jan 16, 2023

I quite dislike the idea that dir/*.parquet can be an ident, but hey we support backticks, so it should work...

I'm +0.

@snth
Copy link
Member

snth commented Jan 16, 2023

I can't comment on the code or the parsing implications but just to say that in #286 we discussed possibly adding a from_file transform in the future. See for example #286 (comment).

Would that not maybe make the compiler/parser code simpler?

While DuckDB is quite flexible and allows SELECT * FROM 'dir/*.parquet', they also have read_csv and read_parquet functions which can take additional parameters which could be added to from_file.

You could then keep your existing identifier rules for use with from.

from_file could then also error out when compiling to a target other than DuckDB or one that doesn't allow reading csv files.

@max-sixty
Copy link
Member Author

I see from_file as a PRQL function which reads a file and uses its data in a query, like from_text does. Otherwise why not just defer to the DB?

Regardless, we still have a somewhat complex system for parsing idents — I think the middle section in the issue would solve all of these, but be some work to implement.

I'll merge this and have opened a new issue for that, in #1535

@max-sixty max-sixty enabled auto-merge (squash) January 16, 2023 20:53
@max-sixty max-sixty merged commit e1ad3e2 into PRQL:main Jan 16, 2023
@max-sixty max-sixty deleted the ident-parsing branch January 16, 2023 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants