Resolve opaqueness of idents #1535

max-sixty · 2023-01-16T20:37:44Z

Edit: Renamed from "Resolve idents based on type"; new proposal below. Original issue follows:

Continuing on from #1516

Currently backticks are required in:

from `schema.table`

...which compiles to

SELECT * FROM 'schema'.'table' -- (or just `schema.table`, no backticks, but *not* 'schema.table' with quotes)

That's because the resolver will try to resolve...

from schema.table

...as a namespace schema and a column name table, which then breaks.

I think ideally we would instead resolve the argument based on its type:

- from & join would resolve to namespaces
- select & derive etc. would resolve to columns

Then:

- from schema.table would be parsed as a single namespace schema.table, not a namespace schema and column table.
- select table.column would still be parsed as a namespace table & column column, because that's typed as a column.

And then:

- any ident in backticks could be opaque; we wouldn't need to split by ., clearing up #1516 (feat: Parse idents with * as quoted)
- we'd take from schema.table, not from `schema.table`, and produce the correct thing
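
As a rough illustration of the proposal above, here is a minimal Python sketch of resolving an ident according to the kind of argument it fills. The function `resolve_ident` and the `kind` values are hypothetical names for this example only; they are not part of the PRQL compiler.

```python
def resolve_ident(ident: str, kind: str) -> dict:
    """Resolve a dotted ident based on the type of argument it fills.

    Hypothetical sketch: `kind` is "namespace" for `from` / `join` arguments
    and "column" for `select` / `derive` arguments.
    """
    if kind == "namespace":
        # The whole ident names a table, so it is not split on `.`.
        return {"namespace": ident, "column": None}
    if kind == "column":
        # Everything before the last `.` is the namespace; the rest is the column.
        namespace, _, column = ident.rpartition(".")
        return {"namespace": namespace or None, "column": column}
    raise ValueError(f"unknown kind: {kind}")


print(resolve_ident("schema.table", "namespace"))
print(resolve_ident("table.column", "column"))
```

Under these assumptions, `schema.table` stays whole in a `from`, while `table.column` in a `select` is still split into a namespace and a column.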


max-sixty · 2023-03-19T16:55:21Z

@aljazerzen as discussed

max-sixty · 2023-03-21T06:47:25Z

We discussed this on the dev call and I said I'd think more about the tradeoffs. I think this issue demonstrates some weaknesses in our current name handling. We have a trilemma; we want:

- The ability to specify an identifier as opaque.
  - This:
    ```
    from `https://example.com/foo.parquet`
    ```
    should definitely not compile to
    ```
    FROM "https://example"."com/foo".parquet
    ```
- To not need to manage database schema hierarchy in PRQL.
  - Ideally we could avoid managing more than a) Table and b) Column.
    - Specifically, all identifiers are $namespace.$column, where $namespace can be opaque.
  - I.e. from foo.bar can be handled as the table foo.bar; select baz.foo.bar can just be the table baz.foo & the column bar.
  - NB: Databases require breaking up the idents when quoted, i.e. SELECT * FROM "foo.bar" will try to query a table named foo.bar, not a schema named foo and a table named bar.
    - So currently we split up the namespace by periods when compiling to SQL.
- Names work similarly wherever they're used.
  - This was the point @aljazerzen made on the dev call: with the approach this issue proposed above, we'd be handling identifiers differently depending on where they were used. from schema.table would not attempt to split schema.table, but select table.column would attempt to look up column in table.

The trilemma is that we can't avoid splitting https://example.com/foo.parquet while still splitting select baz.foo.bar into "baz"."foo"."bar", unless we handle select and from differently, which we don't want to do.
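
To make the conflict concrete, here is a minimal Python sketch of a split-on-periods strategy like the one described above. The helper names (`quote_part`, `split_and_quote`) and the rule for when a part needs quoting are assumptions made for this example, not the compiler's actual code.

```python
import re

# Assumption: parts matching this pattern are safe to emit unquoted.
BARE_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_part(part: str) -> str:
    """Quote one part of an ident only if it is not a plain identifier."""
    if BARE_IDENT.fullmatch(part):
        return part
    return '"' + part.replace('"', '""') + '"'


def split_and_quote(ident: str) -> str:
    """Split on every period, then quote each part separately."""
    return ".".join(quote_part(part) for part in ident.split("."))


print(split_and_quote("project-foo.dataset.table"))
print(split_and_quote("https://example.com/foo.parquet"))
```

The same splitting that produces the wanted `"project-foo".dataset.table` also tears the URL into `"https://example"."com/foo".parquet`, because the splitter cannot tell which periods are separators.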

I think the current state is quite bad, and it's worth adjusting & investing to fix it, even if it means breaking things.

The only option I can think of is that we give up the "To not need to manage database schema hierarchy in PRQL". So, we would:

- Maintain a nested namespace of objects.
  - The biggest open question is whether this is a quagmire. Do we need to start understanding whether bar in from foo.bar.baz | select bar.baz is the same in both references?
- Treat anything within backticks as opaque.
  - So from `https://example.com/foo.parquet` would compile correctly.
- To compile to FROM "project-foo".dataset.table, we'd supply from `project-foo`.dataset.table
  - and not from `project-foo.dataset.table`, as it is now.
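
A minimal Python sketch of this proposal: periods separate parts only outside backticks, and backticked text is kept as one opaque part. The functions `parse_ident` and `to_sql`, and the quoting rule, are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

import re

# Assumption: parts matching this pattern are safe to emit unquoted.
BARE_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_ident(source: str) -> list[str]:
    """Split on `.` outside backticks; backticked text is one opaque part."""
    parts: list[str] = []
    current: list[str] = []
    in_backticks = False
    for ch in source:
        if ch == "`":
            in_backticks = not in_backticks
        elif ch == "." and not in_backticks:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_backticks:
        raise ValueError("unterminated backtick")
    parts.append("".join(current))
    return parts


def to_sql(parts: list[str]) -> str:
    """Quote each part on its own; parts are never split again."""
    def quote(part: str) -> str:
        if BARE_IDENT.fullmatch(part):
            return part
        return '"' + part.replace('"', '""') + '"'

    return ".".join(quote(part) for part in parts)


print(to_sql(parse_ident("`project-foo`.dataset.table")))
print(to_sql(parse_ident("`https://example.com/foo.parquet`")))
```

With this rule, the hierarchy comes from the periods the user writes outside backticks, so the URL survives as a single quoted name while `project-foo` is quoted on its own.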

As long as it's doable, I think this would be an excellent result: a simpler model, with fewer surprises, and much better for things like URLs.

The main internal change is that Namespaces would need to become hierarchical. Any thoughts? We can discuss when @aljazerzen is back.

Part of PRQL#1535, this changes our Idents in PL from `$namespace.$name` to an arbitrary hierarchy.

It's full of `.clone()` & `.into()` -- I've been doing this by replacing one definition and then adding lots of `.into()` until it passes (not the most conceptual work!). Doing it incrementally at least means I can't end up in a quagmire of not knowing why the new version doesn't work; it's easy to revert to the last known good state. We can do another pass to improve the Rust / reduce allocations.

The plan here would be to:

- replace any other usages in the compilation prior to the ident being resolved. Once it's resolved, it doesn't necessarily need a full hierarchy. Possibly we can remove the old `Ident` / rename the new one to `Ident`.
- adjust how the resolution works so we can have arbitrary hierarchies of schema (`a.b.c`). Still some work to think about how this should work (some initial comments in PRQL#1535)
- make backticks fully opaque, so PRQL#1535 works -- then using parquet files will be easy, we won't need to quote schemas, the semantics will be simple & consistent
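
For readers unfamiliar with the idea, here is a minimal sketch of an ident stored as an arbitrary hierarchy of parts rather than a fixed `$namespace.$name` pair. It is written in Python purely for illustration (the compiler itself is Rust), and the field and method names are assumptions for this example, not the actual definitions in the PRQL codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ident:
    """Hypothetical ident: an ordered sequence of name parts."""

    parts: Tuple[str, ...]

    @property
    def name(self) -> str:
        # The last part is the object's own name.
        return self.parts[-1]

    @property
    def path(self) -> Tuple[str, ...]:
        # Everything before the last part is the enclosing hierarchy.
        return self.parts[:-1]

    def starts_with(self, prefix: "Ident") -> bool:
        # True when `prefix` matches the leading parts of this ident.
        return self.parts[: len(prefix.parts)] == prefix.parts


ident = Ident(("a", "b", "c"))
print(ident.path, ident.name, ident.starts_with(Ident(("a", "b"))))
```

Storing the parts as a sequence means depth is not fixed, so `a.b.c` needs no special case compared with `a.b`.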

max-sixty · 2023-03-27T15:47:47Z

I'm closing #2305 but pasting the next steps from that here, as those are still required:

- adjust how the resolution works so we can have arbitrary hierarchies of schema (a.b.c). Still some work to think about how this should work (some initial comments in #1535)
- make backticks fully opaque, so #1535 works -- then using parquet files will be easy, we won't need to quote schemas, the semantics will be simple & consistent

max-sixty added the compiler label Jan 16, 2023

This was referenced Jan 16, 2023

feat: Parse idents with * as quoted #1516

Merged

Can I use file paths as table names for DuckDB? #1498

Closed

aljazerzen added the language-design (Changes to PRQL-the-language) and needs-discussion (Undecided dilemma) labels Jan 17, 2023

aljazerzen removed the needs-discussion (Undecided dilemma) label Feb 13, 2023

eitsupi mentioned this issue Mar 14, 2023

Offer a format param to from, for DuckDB? #2168

Closed

max-sixty added the priority label Mar 19, 2023

max-sixty changed the title from "Resolve idents based on type" to "Resolve opaqueness of idents" Mar 22, 2023

max-sixty mentioned this issue Mar 24, 2023

fix: Ident::starts_with #2303

Merged

max-sixty added a commit to max-sixty/prql that referenced this issue Mar 24, 2023

fix: Ident::starts_with

7430941

I'm not sure this was correct prior; at least it didn't pass these tests, and I couldn't work out why it started with `!`. This is part of the refactoring in PRQL#1535

max-sixty added a commit that referenced this issue Mar 24, 2023

fix: Ident::starts_with (#2303)

7500a4f

I'm not sure this was correct prior; at least it didn't pass these tests, and I couldn't work out why it started with `!`. This is part of the refactoring in #1535

max-sixty mentioned this issue Mar 24, 2023

refactor: Implement Ident as vec #2305

Closed

max-sixty mentioned this issue Mar 31, 2023

internal: Remove unneeded expression #2360

Merged

max-sixty added a commit to max-sixty/prql that referenced this issue Apr 2, 2023

test: Add some unit tests for Module

6af1be7

Trying to understand these as part of PRQL#1535; GPT-4 helped with some tests. More to come

This was referenced Apr 2, 2023

test: Add some unit tests for Module #2367

Merged

internal: Add a docstring & question to Module #2326

Merged

This was referenced Apr 11, 2023

docs: RFC modules #2129

Merged

fix: [Early WIP] Allow Idents to contain namespaces #2431

Closed

max-sixty mentioned this issue May 8, 2023

Query starts with "from starwars.csv" is not working #2554

Closed

aljazerzen mentioned this issue May 11, 2023

feat!: infering modules and treating them as SQL schemas #2563

Merged

aljazerzen closed this as completed in #2563 May 12, 2023
