-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: RFC modules #2129
docs: RFC modules #2129
Conversation
``` | ||
mod my_module { | ||
let bangers = (from tracks | take 10) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Syntax is similar to current discussion on tuples/arrays, but I don't think it will clash on the parser-level. That's because this is on the "level" of the statements and those two are both on the "level" of expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to ask about that. Are modules not also structs where the keys are names and the values are either (sub)modules, relations or functions?
Shouldn't we therefore be consistent in the use of symbols and use []
possibly rather than {}
? Your current proposal reads a lot more naturally ofc but curious about the consistency angle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, semantically modules are not equivalent to structs. They don't have a type, cannot be referred to as a value and have different parsing rules.
So I think that here it's ok to choose the syntax that looks best, instead of seeking consistency with structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also moot if we use just files / #2129 (comment))
web/book/src/internals/modules.md
Outdated
} | ||
``` | ||
|
||
## File loading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This design is heavily plagiarized from how Rust modules work. Thank god for opensource!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Rust is poor choice to draw inspiration from for a module system. In my opinion it is very confusing and one of the worst parts of Rust.
I think the module system should aim to be as simple and predictable as possible.
I think it would be better to look at PHP and it's include
statement which is dead simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to redesign it to come closer to PHP's include()
or Common.js's require()
and I've realized that this is a very close already.
The main distinctions are:
- we have
module hello
declaration instead of a functioninclude("hello.prql")
, - we have more restrictions on importing that make the module structure more "normalized" (see goal 4),
- these restrictions also allow cyclic references between modules,
- child modules don't have to re-import modules that are already imported into parent modules:
# parent.prql module child1 module child2
# child1.prql # no need to re-import child2, as it is already in the module structure let _ = (from child2.some_decl | ...)
web/book/src/internals/modules.md
Outdated
mod default_db { | ||
# all inferred tables and defined CTEs | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I don't remember why this module is needed. Maybe we could get away by having the tables on the top-level.
It would simplify the compiler a bit.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great!
I found this quite insightful: https://matklad.github.io/2021/11/27/notes-on-module-system.html. It recommends some simplifications to the module system:
- No
mod {}
— is it worth the complication, as opposed to just files? - No
mod.rs
— insteadfoo/foo.rs
(it actually has_foo.rs
, but why the leading underscore?) — I like this, it reduces the number ofmod.rs
files
Are you thinking that imports would be implicit? Or would that be a separate discussion?
web/book/src/internals/modules.md
Outdated
When compilation is invoked compiler will search for the root of the module | ||
structure of the project: | ||
|
||
1. attempt to load `./project.prql`. If successful use it as the root, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you see as the difference between a project and a module? If we don't need a distinction, we can avoid having one at the moment, and just use mod
for both...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was for project to represent the root of the module tree.
This may not be needed if we would also have super
(a reference to parent module), which would be nice to have in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does there need to be a "roof of the module"?
Shouldn't one module always be just one file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @max-sixty about not having mod {}
and mod.rs
, but I think just foo.rs
might be enough and not have it's own directory such as foo/foo.rs
.
One module would always be just one file.
Good idea, it does not bring much to the table. We can also add it later without breaking changes.
I also like this. The pain with The underscore is here so it's first alphabetically.
(We can discuss here) I'm not exactly sure what do you mean: Implicit Implicit |
Thinking about it I realize that similar attacks could be possible by injecting Huh. I'll do another pass on the proposal. |
So are you thinking that we'd have
Hmmm, I don't love the underscore aesthetically, given it's the "main" file, but only aesthetically, low weight. |
We don't have to add it now, because you can use longer idents:
I agree, it doesn't look nice. We could do without it, but that:
|
lol, this is technically possible, it just requires doing
OK, so we do have implicit imports — i.e. someone can call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I found this a bit less clear than the other RFC and feel that it could be fleshed out a bit more. I had to read a few sections two or three times to try and ensure that I understood it correctly.
``` | ||
mod my_module { | ||
let bangers = (from tracks | take 10) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to ask about that. Are modules not also structs where the keys are names and the values are either (sub)modules, relations or functions?
Shouldn't we therefore be consistent in the use of symbols and use []
possibly rather than {}
? Your current proposal reads a lot more naturally ofc but curious about the consistency angle.
The documention doesn't mention how to load a module. I expected to find a |
Do we even really needed modules in paths like that? |
It does, but perhaps not clearly enough. You just use
I don't think we can afford that. It would be a ruff arbitrary prohibition that may lead to IDE's overwriting your source files when you rename some module. And we (almost) run into such case in prql-compiler: we have a module with path |
web/book/src/internals/modules.md
Outdated
If an identifier cannot be resolved relative to the current module, it tries to | ||
resolve relative to the parent module. This is repeated, stepping up the module | ||
hierarchy until a match is found or root of the module structure is reached. | ||
|
||
```prql_no_test | ||
module my_playlists { | ||
let decl_1 = ... | ||
|
||
module soundtracks { | ||
let decl_2 = ... | ||
} | ||
|
||
module upbeat_rock { | ||
let decl_3 = ... | ||
|
||
from decl_1 | join soundtracks.decl2 | join decl_3 | ||
} | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this.
Upsides:
- the std module can be defined just in the root module and be accessible
everywhere under namestd
,
Downsides:
- we need to find the root of the module structure,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(edited after reading more)
When does this know when to stop going up the file tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some possible modules-as-inheritance approaches from the implicit shadowing.
But also possible some quite confusing error messages when a parent changes or adds a definition!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does this know when to stop going up the file tree?
Well, at the root? So std
can be defined in the root only.
But also possible some quite confusing error messages when a parent changes or adds a definition!
If we give declarations in child priority over declarations in parent, no change in parent will be able to overshadow the child declaration.
But it could happen that a module would be referring to a grand-parent module and a change in parent would break the child.
grandparent -> decl
parent -> we add decl here
child -> ref to decl now resolves to parent instead of grandparent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does this know when to stop going up the file tree?
Well, at the root? So
std
can be defined in the root only.
This is moot if we have a file that can only be at the root, such as _project.prql
or project.prql
— we can stop there
But otherwise we need some way of knowing when to stop searching upwards, so we don't keep going to /
. We could do that when we no longer find a match of $dir
containing _$dir.prql
...
I also think this "keep searching up" would be fine to push into a later version if it's not trivial to implement — very reasonable to adjust at implementation time. Though we'd need a way to reference a parent module, such as module ..foo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is a tad different thing I was talking about here.
Here I meant name resolution only, which happens after the module structure has been read from files, parsed and resolved.
When deciding what files to load, the compiler should get the path to the root module and then find references to other files which need to be loaded from there.
I've updated the RFC with all the suggestions. I hope I've also made it more readable @snth :D |
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
} | ||
``` | ||
|
||
## Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re all the confusion above — sorry — the example is actually the most anchored thing to discuss from; I should have started here.
My main question is:
Why place the module foo
declarations in the directory modules, rather than have import statements near the code that uses them?
- It doesn't supply that much new information — just that we should use the file that's already there. We're replicating the file structure in declarations that are arguably duplicative
- Instead, the import statement can just look up the file — same as python.
- How do we discriminate between
from foo.bar
wherefoo
is a DB schema vs. a module (assuming I eventually solve Resolve opaqueness of idents #1535)? If we haveuse foo
in the file itself, it's explicit.
So by moving from "declarations in directory modules" to "import statements at the call site", we'd have something like:
# _project.prql
# We can have code here, but there's no need for `module` declarations
# This is identical to below
# employees.prql
let employees = (...)
let salaries = (...)
let departments = (...)
# sales/orders.prql
use ../employees # or some variant of `../` that shows it's a level above
let current_year = (...)
let archived = (...)
let by_employee = (from orders | join employees.employees ...)
WDYT? I hope I'm making more sense now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that it should just directly load a file with the same name instead of look inside a subdirectory.
I am a bit skeptical against having imports from parent directories though (as you mention with ../
). I lean towards that files should only be able to use other files in the same directory and child directories but not in parent directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards that files should only be able to use other files in the same directory and child directories but not in parent directories.
Surely this would be super restrictive — any sort of utils package wouldn't be possible, unless every file was in the same path? Are there any other languages that do this? (Or am I misunderstanding the point?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why place the module foo declarations in the directory modules, rather than have import statements near the code that uses them?
Because with this proposal:
- you only "import"/"declare" a module once and then it's in the module structure where you can use it simply by referencing that name, no
use
ormodule
statements required, - you don't need a
import ../foo
(i.e. import parent) statement, because modules are imported/declared from root down (but you can still reference parent modules after they are imported/declared), - refactoring is simpler, since there is a single import/declaration of a module,
How do we discriminate between from foo.bar where foo is a DB schema vs. a module (assuming I eventually solve
Resolve opaqueness of idents #1535)? If we have use foo in the file itself, it's explicit.
Let's discuss that after this RFC is done. I have an idea how to do it on top of modules and declarations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely this would be super restrictive — any sort of utils package wouldn't be possible, unless every file was in the same path? Are there any other languages that do this? (Or am I misunderstanding the point?)
C# and Java imports using namespaces so nothing can be from a parent directory. PHP lets you include
from any file path including relative paths which may be parent directories. JavaScript (Node.js) lets you import from parent directories. I am not sure about Python, but I don't think it is common. I also worry about the security impact of importing from parent directories.
In the PHP world when using Composer it is common that imports from your own code is in your your_project/src/
directory, and third-party code from a your_project/vendor/
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because with this proposal:
* you only "import"/"declare" a module once and then it's in the module structure where you can use it simply by referencing that name, no `use` or `module` statements required, * you don't need a `import ../foo` (i.e. import parent) statement, because modules are imported/declared from root down (but you can still reference parent modules after they are imported/declared),
The main issue I see is that the action is quite far from the consequence. When we're missing an item in python, it's fairly obvious where the item should be:
from ..utils import read_file
"Couldn't find read_file in utils" — OK great
But in this design, what do we say if something isn't found? There are so many places we might be missing a declaration. The user needs to understand the module system well enough to then go and check each parent in the hierarchy for the missing item.
* refactoring is simpler, since there is a single import/declaration of a module,
Indeed — I can imagine refactorings being quite complicated, because the compiler can no longer tell you "we can't find this item here". I'm fine pushing the conversation re "DB schema vs. a module", but this will further confuse things, since from foo.bar
will also be legitimate SQL even when foo
isn't available as a module.
Or are my fears overblown? I can definitely see that it's an elegant system when everything is ideal — but I can also imagine it being quite confusing with newer users or messier projects.
My secondary — much more minor — concern is that the module
declarations aren't doing that much. IIUC they can only refer to a single file location. Is requiring them (relative to just finding them in the file tree) going to make things clearer?
To put this into the example, I would do something more like python, with a touch of rust, and do:
# _project.prql
# employees.prql
let employees = (...)
let salaries = (...)
let departments = (...)
# sales/_sales.prql
let revenue_by_source = (...)
# sales/orders.prql
use ..employees
let current_year = (...)
let archived = (...)
let by_employee = (from orders | join employees.employees ...)
# sales/projections.prql
use ..orders
let orders_2023 = (from orders.current_year | append orders.archived)
let orders_2024 = (...)
# util.prql
func pretty_print_num col -> (...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point, declarations are far from where they are then used. In essence, the problem with my proposal is that "the module hierarchy" is a whole concept you have to know about as opposed to just knowing how to pull a declaration from one file into another.
I've purposefully made the module system repr in filesystem normalized, so we can have very good hints about where to put your module declarations:
select [sales.utils.my_func my_column]
-----
\___ name utils does not exist
Hint: add `module utils` to `sales/_sales.prql`
module orders
------
\__ cannot load sales/orders.prql or sales/orders/_orders.prql
My point is that knowing about "the module hierarchy" is not a requirement for anyone, because we can provide hints to guide users until it makes sense to them.
My fear is that it will end up having way-too much use
s with the extreme case being:
# _project.prql
# employees.prql
use ..sales
use ..util
let employees = (...)
let salaries = (...)
let departments = (...)
use projections
use orders
use ..employees
use ..utils
# sales/_sales.prql
let revenue_by_source = (...)
# sales/orders.prql
use ..
use ..projections
use ...employees
use ...utils
let current_year = (...)
let archived = (...)
let by_employee = (from orders | join employees.employees ...)
# sales/projections.prql
use ..
use ..orders
use ...employees
use ...utils
let orders_2023 = (from orders.current_year | append orders.archived)
let orders_2024 = (...)
# util.prql
func pretty_print_num col -> (...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've purposefully made the module system repr in filesystem normalized, so we can have very good hints about where to put your module declarations:
select [sales.utils.my_func my_column] ----- \___ name utils does not exist Hint: add `module utils` to `sales/_sales.prql`
I agree this is a very good hint. But given how we would search upwards for a module, if sales
isn't found — i.e. a "root" item, there could be lots of places that it could be.
One option would be to do completely automatic imports — no need for module
, no need for use
. I might even prefer that to only having module
— which seems to ask for more than gives, at least until we had non-file module
s.
My synthesis after thinking about this is:
- Just using a file structure is simpler than offering non-file modules (we agree)
use
statements are explicit, reduce confusion, not that costly- Given the language goals & users, we should favor simplicity. And imports are often pits of confusion.
- I do think your example of the hint is quite good, but I'm still worried about the ambiguity because of the hierarchy search, and whether the hint is so good that the language should infer it itself
WDYT?
Any thoughts from others? @eitsupi & @snth ? (Note — please read the comparative examples carefully, don't make the same mistake I did...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nobody wants to weigh-in here :(
I do think your example of the hint is quite good, but I'm still worried about the ambiguity because of the hierarchy search, and whether the hint is so good that the language should infer it itself
Good point, it may be more confusing than helpful.
One option would be to do completely automatic imports — no need for module, no need for use. I might even prefer that to only having module — which seems to ask for more than gives, at least until we had non-file modules.
That's probably the safest way to go here. It's also the easiest to implement, so would't be wasting much work if we decide to change later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nobody wants to weigh-in here :(
Yeah, one issue with these big discussions is that it's more difficult for folks to weigh in on some smaller but important issues — there are 114 comments — and it's difficult to know how much context is required for weighing in. I don't know a great way around it...
That's probably the safest way to go here. It's also the easiest to implement, so would't be wasting much work if we decide to change later.
Great, very much agree.
I also think that approach is the best way of getting more feedback — merge this with a path forward, and once we start building, folks will raise anything sufficient important...
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
for more information, see https://pre-commit.ci
} | ||
``` | ||
|
||
## Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanillajonathan starting a new thread for the "import parent" discussion:
Surely this would be super restrictive — any sort of utils package wouldn't be possible, unless every file was in the same path? Are there any other languages that do this? (Or am I misunderstanding the point?)
C# and Java imports using namespaces so nothing can be from a parent directory. PHP lets you include from any file path including relative paths which may be parent directories. JavaScript (Node.js) lets you import from parent directories. I am not sure about Python, but I don't think it is common. I also worry about the security impact of importing from parent directories.
In the PHP world when using Composer it is common that imports from your own code is in your your_project/src/ directory, and third-party code from a your_project/vendor/ directory.
Ah, so I think that's where the misunderstanding is — I agree it might be dangerous to allow importing from a parent outside of the package.
Python allows for importing parents within the package — e.g. from ... import utils
, because the path is relative to the current file. IIUC Java allows for import utils
in a file src/foo/bar/baz.java
from a file src/utils.java
, because the path is relative to the package root.
So given this, I agree possibly we should downweight being able to import from a parent outside the package, which might require a definition of what a package is. (Though I'm not sure, let's see how the other discussions shake out; possibly it's a nice feature to be less tethered to the concept of a package)
``` | ||
|
||
## Example | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@max-sixty Another thread for "Opaqueness of idents".
(Bear with me, I need to explain all this to so I can explain how I'd change it)
Right now, when we infer that a table exists, we add its declaration to module default_db
(this declaration is updated when we discover new columns).
Using a made-up syntax:
module default_db {
table foo = ...
}
If a table name is segmented (i.e. my_db.my_schema.foo
), we infer this:
module default_db {
table `my_db.my_schema.foo` = ...
}
... essentially, the table name contains the dots, which is then problematic when actual dots are inserted into the name segments (i.e. my_db.`my.schema`.foo
).
Instead, I propose we infer this:
module default_db {
module my_db {
module my_schema {
table foo = ...
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, very much agree. that's what I'd been trying to do in my branch for #1535 — when it finds a table type of my_schema.foo
, it would create a module my_schema
in the root (though should actually be in default_db
, good point).
My branch is super-messy, but maybe I push it and you can criticize / guide it towards something helpful. I haven't worked on it in a couple of weeks now.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
web/book/src/internals/modules.md
Outdated
If the final declaration in a module is a variable declaration that is named | ||
`main`, then the leading `let main = ` can be omitted and expressed only by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the example — should this instead be "The final statement in a module is implicitly named main
"? In the example there is no variable declaration named main
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may be written poorly, I wanted to say that:
- if the last declaration in a module is a variable declaration,
- and it is named
main
, - it may omit the leading
let main =
.
It can also be named something else, but it needs to have the let x =
in this case.
Another way to say it, would be that:
Last unnamed pipeline in a module is implicitly named `main`
... but this implies that we can have unnamed pipelines elsewhere in a module.
(ps: I've been using term "declaration" as a cover term for variable, function and type declarations. We can switch that to "statement" if you feel it is more appropriate or future-proof)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the last declaration in a module is a variable declaration,
and it is named
main
,it may omit the leading
let main =
.
I think this is the sticking point... it seems like the conclusion is required for the premise, since it can't be named main
if it omits the leading let main =
...
(sorry if I'm misunderstanding — my guess is that this is actually a small point of writing rather than some fundamental disagreement)
Last unnamed pipeline in a module is implicitly named `main`
... but this implies that we can have unnamed pipelines elsewhere in a module.
How about a "An unnamed pipeline as the module's final declaration is implicitly named main
" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see the problem! In my mind, the last pipeline remains named main
even if you remove the leading let main =
. That may be confusing, yes.
How about a "An unnamed pipeline as the module's final declaration is implicitly named main" ?
That is still implying there are unnamed pipelines.
Last variable declaration in a module can omit the leading let main =
and acquire an implicit name main
.
(yes this appears to be a minor point in writing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last variable declaration in a module can omit the leading
let main =
and acquire an implicit namemain
.
Perfect
(nitty: I'd use "final" rather than "last" as it's less ambiguous, though here it's fairly clear given the "in a module" modifier)
I think that we have a consensus on the outline for modules. There are things left to discuss, but we can get to them in separate issues. |
Great work! |
A crude draft on module and multiple queries per file. It's much more concrete than the type system one.