feat!: require explict database references by aljazerzen · Pull Request #4223 · PRQL/prql

aljazerzen · 2024-02-13T17:40:27Z

This PR removes std.from and renames default_db module to from. This is the effect:

from.x
join from.y (==id)

aljazerzen · 2024-02-13T17:50:47Z

I've tried a few different ways to implement the special case name-resolving behavior of std.from and std.join, but I could not consider any of them as anything else than a hack that works only most of the time.

So I give up of trying to make this behavior work and am instead removing it.

I don't accept this PR to be merged as it has arguably worse aesthetics than what we had before. If it doesn't I don't mind, but I would continue to develop a fork of PRQL as my pet language.

As @max-sixty pointed out the other day, current state of prqlc is such that most things work, but when you get off the beaten track, things start to break with weird error messages. I attribute this to the fact that we have a few "hackish" parts of the codebase, which are necessary for a few features that I consider "inconsistent with the rest of the language".

It is probably possible to make these feature work correctly, but I don't have the motivation to spend enormous amounts of time to do that.

max-sixty · 2024-02-13T19:19:14Z

I don't accept this PR to be merged as it has arguably worse aesthetics than what we had before. If it doesn't I don't mind, but I would continue to develop a fork of PRQL as my pet language.

Can we discuss more?¹ Happy to speak this week on Discord.

You're contributed more to PRQL than anyone, and so your view should have a huge weight. I do think it would be a shame for both PRQL and PRQL-pet-lang — one would lose most of its code momentum, and the other would lose most of its community momentum. Projects really need both to be successful. I accept we've been a bit stuck for a while, without the user momentum we'd expect from the community + code momentum.

I admittedly don't get the from.x. We did initially have no from as an aesthetic choice, but had broad consensus that leading transforms on every line were easier to read. This basically removes that, and mandates a form of default_db.foo throughout.

If we're looking for smaller changes interface changes which achieve the same goal, could we restrict from foo to always be from default_db.foo And then a prefix is required to escape that? (I'm not sure this is great but possibly there are similar things)

As @max-sixty pointed out the other day, current state of prqlc is such that most things work, but when you get off the beaten track, things start to break with weird error messages. I attribute this to the fact that we have a few "hackish" parts of the codebase, which are necessary for a few features that I consider "inconsistent with the rest of the language".

If we think we can't get PRQL to be robust with the current strategy and we need to make bigger changes, there are things we could do to restrict the scope and make it more robust.

For example at the extreme, we could force having a schema in order to compile PRQL. This would be a big change, and would significantly reduce its flexibility, but it would also have the potential to become really robust in the short-medium term. (And it wouldn't completely preclude us from removing this constraint in the future)

FWIW I do think it's a bit extreme to say "if this isn't merged then I'm forking", but I realize we haven't spoken in a while, and I haven't contributed to the compiler itself at all recently, so maybe this is what's necessary to inspire the change... ↩

eitsupi · 2024-02-14T03:44:07Z

Thank you for the enormous amount of work and consideration on these.
IIUC, I feel that it might make sense to remove it because currently from is not doing anything (and may work without it).

However, I feel that there is a better alternative to the database than from.<table_name>.
For example, dummy.<table_name>.

aljazerzen · 2024-02-14T14:47:41Z

Mentioning the word "fork" was probably not very wise of me :D Please understand my frustration when rewriting the name resolution for the 3rd time and hitting a wall again.

Of course we can talk about what to do about the issue of from and how to make it more ergonomic, but at least for now, I will drop the restriction that "the name resolution of an expression depends on the function call that the expression is in".

In practical terms for PRQL, this means that in this example:

let b = read_parquet a

let c = from a

... variable a will use the same name resolution rules regardless of it is used in read_parquet or in from.

Currently this does not hold, as the example above would report a missing error for a in read_parquet, but would infer that a is a table when used in from.

My solution is simple: require the module name in all table references.

In the example above, that would mean this:

let b = read_parquet a

let c = from default_db.a

... because default_db is currently the name of the module that infers all table declarations by default.

(This is the required part, now follows what can easily be changed)

To make this more ergonomic:

I've removed from as it is really a no-op now,
I've also changed name default_db to from, so keep the resemblance to the current aesthetics.

# proposed change
from.my_table
select {a, b}
join from.other_table

But this is not the only way, we could have:

# alternative 1
db.my_table
select {a, b}
join db.other_table

... or ...

# alternative 2
from db.my_table
select {a, b}
join db.other_table

max-sixty · 2024-02-14T21:57:41Z

Please understand my frustration when rewriting the name resolution for the 3rd time and hitting a wall again.

Totally, no stress my end.

And to reiterate — sometimes saying the extreme thing is the way to get change! It's often underrated, particularly when others aren't listening well enough.

In the example above, that would mean this:
let b = read_parquet a

let c = from default_db.a

Yes, good point.

If we really have to choose something in the list above, I would vote for:

# alternative 2
from db.my_table
select {a, b}
join db.other_table

I think the natural question from this is — "but why not just assume the db. unless it's ambiguous, i.e. unless it's an existing variable"?

I think the answer is — "because we want to resolve the a before we know it's part of from", is that right? (And we don't want to assume that in derive x = y, y is default_db.y...)

In thinking through this, I'm less opposed to it than I was. I do feel like we lose some of the beautiful simplicity that drew people in to PRQL. But plausibly prefixing db. to every table name isn't that bad.

Two hail-marys before we make the change:

Might types help here at all? Specifically that would do name resolution after we know types; so that from a would be typed as a table, while derive x = y wouldn't be
Would using .column syntax help, jq-style? I don't think so, because a still needs to be resolved between a table and a declared variable.

I really don't get the from.my_table, I don't think from works well as a namespace...

If we also want to drop from, I'm not strongly opposed to it. But folks do generally seem to like it. I think it partly depends on whether we're a simple language with each line starting with pre-existing transforms, like:

from foo      # nice to have each line starting with a transform
derive (bar = baz + 5)
...etc

...or we're a more complicated language with variables and custom functions, like:

let foo = # some table definition
let bar = # some function

foo | bar     # lines aren't starting with transforms anyway

aljazerzen · 2024-02-15T08:27:47Z

"but why not just assume the db.?"
I think the answer is — "because we want to resolve the a before we know it's part of from", is that right? (And we don't want to assume that in derive x = y, y is default_db.y...)

Exactly, well explained.

Might types help here at all? Specifically that would do name resolution after we know types; so that from a would be typed as a table, while derive x = y wouldn't be

I think you are right, we it would not help much. Ideally, name resolution would happen before and independent of type resolution. It would be weird that adding a type annotation would make some name point to completely different variable.

Would using .column syntax help, jq-style? I don't think so, because a still needs to be resolved between a table and a declared variable.

Yup, it wouldn't solve this issue.

I really don't get the from.my_table, I don't think from works well as a namespace...

Looking at code examples with from.x (two days after I made this PR) it does seem very weird. Using a preposition for a namespace now feels like a crime.

If we also want to drop from, I'm not strongly opposed to it. But folks do generally seem to like it

I'm not opposed to it either. It is unnecessary, but it does look nice:

from db.x
join db.y
into a

from a

... as opposed to:

db.x
join db.y
into a

a

Ok, I'll update this PR to "alternative 2".

aljazerzen · 2024-02-15T08:29:32Z

Another thing: std.from would have a utility of easily adding type annotations. If you start a pipeline with std.from, that would signal that this pipeline operates on relations and not arrays of ints for example.

snth · 2024-02-15T10:43:49Z

I'm in favour of "alternative 2" as well.

"but why not just assume the db.?"

I agree with the name resolution reasons discussed above (although I'm not really qualified to comment on them the explanations do make sense), but more importantly for me (given that I mostly wear the "Advanced User" rather than "Compiler Engineer" hat), I think it improves the user experience as well. In particular it makes it clearer what are database tables/views and what are derived relations. I was going to add an example for this but Aljaz has already provided the perfect one:

from db.x
join db.y
into a

from a

I think this nicely separates derived relation a from database relations x and y.

db is a nice, short prefix that's easy to type and perfectly communicates that this is a database relation that we're dealing with.

I would strongly prefer to keep the from, mostly from an aesthetic point of view but I think there might be a technical reason as well, namely how would aliasing work otherwise?

from x=db.long_legacy_system_table_name
join y=db.other_long_legacy_system_table_name
...

What would that be without from? Maybe

x=db.long_legacy_system_table_name
join y=db.other_long_legacy_system_table_name
...

but then one might think shouldn't that have a let in front of the first line but that would actually change the meaning then and would no longer "pipe" records into the join statement, i.e. the following means something different (and presumably is an error?):

let x=db.long_legacy_system_table_name
join y=db.other_long_legacy_system_table_name
...

Where would the alias go with the from.my_table syntax? Maybe

x=from.my_table
...

but that has the same let problem as the previous example.

snth · 2024-02-15T11:08:34Z

Of course aliasing could be solved with separate let declarations but that seems more clumsy to me, e.g. compare

from x=db.long_legacy_system_table_name
join y=db.other_long_legacy_system_table_name
...

with

let x=db.long_legacy_system_table_name

x
join y=db.other_long_legacy_system_table_name
...

And what about role playing aliases, e.g. manager and employee?

let employee=employees

employee
join manager=employees (employee.manager_id==manager.id)
...

Again, I think the following is much better:

from employee=employees
join manager=employees (employee.manager_id==manager.id)

max-sixty · 2024-02-16T19:10:28Z

I'll help clear this up

We should run pr-nightly for these sorts of PRs, since the bindings will break without their paths being changed; so the tests won't run automatically

max-sixty · 2024-02-16T19:37:20Z

#4241

This reverts commit 3dcace7.

aljazerzen added 3 commits February 16, 2024 12:34

feat!: remove from and rename default_db to from

ebb4d70

rename from->db, add std.from

d172f08

fix a few tests

857de65

aljazerzen force-pushed the feat-no-from branch from 415dd6e to 857de65 Compare February 16, 2024 12:52

more tests

091256a

aljazerzen force-pushed the feat-no-from branch from e03689e to 091256a Compare February 16, 2024 13:29

fix

7ffdab7

aljazerzen changed the title ~~feat!: remove from and rename default_db to from~~ feat!: require explict database references Feb 16, 2024

changelog

035e7ac

aljazerzen enabled auto-merge (squash) February 16, 2024 17:37

aljazerzen merged commit 3dcace7 into main Feb 16, 2024

aljazerzen deleted the feat-no-from branch February 16, 2024 17:48

max-sixty added a commit to max-sixty/prql that referenced this pull request Mar 20, 2024

Revert "feat!: require explict database references (PRQL#4223)"

6df7009

This reverts commit 3dcace7.

prql-bot mentioned this pull request May 11, 2026

docs: drop stale noop mention in append.md #5873

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: require explict database references#4223

feat!: require explict database references#4223
aljazerzen merged 6 commits into
mainfrom
feat-no-from

aljazerzen commented Feb 13, 2024 •

edited

Loading

Uh oh!

aljazerzen commented Feb 13, 2024

Uh oh!

max-sixty commented Feb 13, 2024

Uh oh!

eitsupi commented Feb 14, 2024

Uh oh!

aljazerzen commented Feb 14, 2024

Uh oh!

max-sixty commented Feb 14, 2024

Uh oh!

aljazerzen commented Feb 15, 2024

Uh oh!

aljazerzen commented Feb 15, 2024

Uh oh!

snth commented Feb 15, 2024

Uh oh!

snth commented Feb 15, 2024

Uh oh!

max-sixty commented Feb 16, 2024

Uh oh!

max-sixty commented Feb 16, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

aljazerzen commented Feb 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aljazerzen commented Feb 13, 2024

Uh oh!

max-sixty commented Feb 13, 2024

Footnotes

Uh oh!

eitsupi commented Feb 14, 2024

Uh oh!

aljazerzen commented Feb 14, 2024

Uh oh!

max-sixty commented Feb 14, 2024

Uh oh!

aljazerzen commented Feb 15, 2024

Uh oh!

aljazerzen commented Feb 15, 2024

Uh oh!

snth commented Feb 15, 2024

Uh oh!

snth commented Feb 15, 2024

Uh oh!

max-sixty commented Feb 16, 2024

Uh oh!

max-sixty commented Feb 16, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aljazerzen commented Feb 13, 2024 •

edited

Loading