Submodules in different files #13524

Closed
mppf opened this issue Jul 23, 2019 · 34 comments

@mppf
Member

mppf commented Jul 23, 2019

This is a proposal for supporting submodules in different files that was pulled out of
#12923 (comment)

It is an alternative to #10946 and #10909 and is expected to resolve #8470.

The basic idea is that next to a Chapel file such as M.chpl (which contains module M) one can place a directory M/ which contains modules that will be compiled as submodules within M.

Example and Details

Directory Layout:

  main/
    main-module.chpl # Uses M
  M.chpl
  M/
      L.chpl
      L/
         K.chpl

Compilation of Main Module:

chpl main/main-module.chpl M.chpl
  • The compiler would implicitly make the modules in M/ available to code in M.chpl (just as they would be with submodules within M). As a result, M.chpl could have a call like L.foo() which would be allowed in M.chpl even without a use statement. M.chpl would need to include a use statement if it wanted to write the call to L.foo() as foo().
  • However these submodules would not be visible to code that uses M unless M.chpl also included public use L or similar (note that today use L is the same as public use L but that may change).
  • main-module.chpl would not be able to use L or to refer to it unless M.chpl includes public use L, just as it cannot refer to a private submodule.
  • Lastly, the compiler would consider L to be a submodule of M for privacy / scoping purposes. In particular that means that code in L can refer to private things in M.

Example contents of the files:

main/main-module.chpl

use M;
mFunction();

M.chpl

module M {
  proc mFunction() {
    L.lFunction(); // L is implicitly visible in M (but see #13536)
  }
}

M/L.chpl

module L {
  proc lFunction() {
    K.kFunction(); // K is implicitly visible in L (but see #13536)
  }
}

M/L/K.chpl

module K {
  proc kFunction() { }
}
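
The layout rule above can be modeled outside the compiler. The following is a minimal Python sketch, not part of the Chapel compiler: it assumes the rule exactly as proposed (a directory `M/` next to `M.chpl` holds the submodules of `M`), and the helper names `discover_submodules`, `build_example_layout`, and `demo` are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def discover_submodules(module_file: Path, prefix: str = "") -> list[str]:
    """Return dotted module names implied by the proposed directory layout.

    Assumption (from the proposal): a directory named like the module file,
    minus ".chpl", that sits next to it holds that module's submodules.
    """
    name = module_file.stem
    qualified = f"{prefix}.{name}" if prefix else name
    found = [qualified]
    subdir = module_file.with_suffix("")  # M.chpl -> M/
    if subdir.is_dir():
        for child in sorted(subdir.glob("*.chpl")):
            found.extend(discover_submodules(child, qualified))
    return found


def build_example_layout(root: Path) -> Path:
    """Create the M.chpl, M/L.chpl, M/L/K.chpl layout from the example."""
    (root / "M" / "L").mkdir(parents=True)
    for rel in ("M.chpl", "M/L.chpl", "M/L/K.chpl"):
        (root / rel).write_text("// placeholder\n")
    return root / "M.chpl"


def demo() -> list[str]:
    """Build the example layout in a temporary directory and walk it."""
    with tempfile.TemporaryDirectory() as tmp:
        return discover_submodules(build_example_layout(Path(tmp)))


if __name__ == "__main__":
    print(demo())
```

The walk starts from `M.chpl` only, mirroring the command line above, where `M/L.chpl` is never named explicitly.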
@lydia-duncan
Member

Would we allow submodules defined in this way to override their privacy by defining themselves as

public module L { ... }

?

Would it be confusing for the default privacy of a declared module to change based on its location in a directory structure? Or would this be sufficiently distinct for us to reasonably assume that users will understand the difference in behavior?

@mppf
Member Author

mppf commented Jul 23, 2019

Would we allow submodules defined in this way to override their privacy by defining themselves as public module L { }?

I would expect that if the user wanted to make the submodule public, they would public use L within M.chpl.

Would it be confusing for the default privacy of a declared module to change based on its location in a directory structure? Or would this be sufficiently distinct for us to reasonably assume that users will understand the difference in behavior?

I'm not sure it will be different longer term, because I think we'll move towards private-by-default. Also, I don't understand how to make something that is already public become private, but I do understand how to make something private become public, by re-exporting it with public use.

@lydia-duncan
Member

I do understand how to make something private become public, by re-exporting it with public use.

Private symbols are always ignored outside the scope where they would be visible. I didn't do anything to public uses to override that privacy, and modules aren't treated differently than other private symbols.

@mppf
Member Author

mppf commented Jul 23, 2019

@lydia-duncan - is there any way a user can override how public these submodules are? Is the only way for their module declarations to include public/private?

Re-exporting with public use would be a thing we could support, not something we can do now.

@lydia-duncan
Member

I believe the only way is for the declaration to include public/private.

I don't know how I feel about public use overriding private modules - I'll make a separate issue and we can discuss it there :)

@lydia-duncan
Member

I opened #13528

@mppf
Member Author

mppf commented Jul 23, 2019

Would we allow submodules defined in this way to override their privacy by defining themselves as public module L { }?

At least until we have a better strategy, I see no reason not to allow users to do this (and to use the strategy to make the module private or public). Syntactically and conceptually it is reasonable (by analogy to submodules that aren't in separate files). It just looks a little strange (since the public/private refers to M but appears in L.chpl).

@BryantLam

BryantLam commented Jul 24, 2019

Working example from Rust.

# rustc 1.36.0 (a53f9df32 2019-07-03)
.
├── M.rs
├── M
│   └── L.rs
└── MyPackage.rs
// M/L.rs
pub fn fn1() { println!("fn1"); }

// M.rs
pub mod L;

// MyPackage.rs
pub mod M;

fn main() {
    M::L::fn1();
}

@mppf
Member Author

mppf commented Jul 24, 2019

@BryantLam - thanks for clarifying about Rust here and over on #13528 (comment)

We could certainly consider allowing public module L; / private module L; here to control the visibility of L within M. On this topic @bradcray said earlier

That said, I have to admit that I'm not crazy about Michael's counterproposal:

module M {
  private module L;
}

As Chapel stands today, I interpret this as: "I'm defining a private module named L. It has no body / contents" (similar to how extern proc foo(); has no body). Nothing about this statement (as compared to the current form private module L { ... }) suggests to me "look around the file system for something that defines a module named L and inject its contents here." To me, it would be surprising if such a concept did not name a file.

However, the situation in this issue is slightly different. We can make L always visible within M (by nature of being stored in M/L.chpl and so being a submodule). The declaration private module L; could simply change whether L is public or private within M. I think this is an interesting alternative to L.chpl containing private module L { ... } or public module L { ... } because it is clearer that the public/private apply to M.

Another direction we could go with it is to make package privacy an option. However this doesn't really help with the visibility of L itself:

  • if we chose to make the submodules default to package visible, we'd want a way to override that still
  • if we chose to make the submodule default to private, we'd want a way to override that...

I wrote earlier, assuming that M.Detail is a submodule (like L in this issue).

But either way, if there is some module M that wants to also export M.Detail, it would need to public use M.Detail; (likely it would "use only" but IIRC we are thinking about changing that default). If it did private use M.Detail or just used things like M.Detail.someFunction() (with no use of Detail at all), I would not expect that M.Detail would be available to code using M.

That leads me to wonder if it would be good enough to rely on that property to control whether or not Detail is exported at all from M. In that event, functions eligible for export in Detail would be marked public, but they wouldn't necessarily be available if Detail were not exported or if the functions were not included in a public use bringing in symbols in addition to the module name.

This gets to what I think is an interesting clarification. If M/L.chpl makes L a submodule of M, I think we'd want it to default to not making L always visible to users of M. That's kind of like saying private module L { ... } would be the default.

From that point, since M is in charge of the visibility of the symbol L itself (and not all the contained symbols), I think that public import L; can and should make L visible to users of M. (This is different from #13528 because right now I am only talking about such a public import making the symbol L itself visible to users of M - I'm not talking about re-exporting private functions etc.) The point here is that M is in charge of the visibility of L within it, but privacy of functions/types/etc within L should continue to be respected.

We could also consider enabling this behavior for public use L only; but my concern with enabling it for public use L; or public use L only someFunction would be that those statements are trying to make available specific functions within L but not the fact that they are implemented in L itself. If we chose to make public use not make the module symbol itself re-exported, while import did, that seems like something that might be a useful difference between import and use. This seems pretty intuitive to me. (link to the issue about import - #13119).

@mppf
Member Author

mppf commented Jul 24, 2019

If we required one to use or import submodules before accessing them like L.someFunction(), that would give us a place to indicate the visibility of L within M. This option is discussed in #13536.

@mppf
Member Author

mppf commented Jul 24, 2019

At this point, in answer to @lydia-duncan's question about how one would indicate the public/private visibility of a submodule, I think there are currently 3 proposals:

  1. L.chpl indicates its privacy in M, e.g. L.chpl contains public module L { } here
  2. M.chpl can override the privacy of L e.g. with public module L; 1st half in here
  3. M.chpl can override the privacy of L e.g. with public import L; 2nd half in here and related to Should calling functions in a submodule requires a use? #13536

@ben-albrecht
Member

ben-albrecht commented Jul 26, 2019

To clarify, would proposals (2) and (3) remove the need for defining submodules as public vs. private, since the privacy is defined at the point of import/usage? Or can the privacy of the submodules be defined at both the submodule definition and the import line?

If this is still an open question, I'd advocate for privacy to be determined at one point rather than mixing. In that case, I would think of (2) and (3) as defining the privacy rather than overriding the privacy of submodule L.

I don't have a strong preference between the 3 proposals, but I do lean towards (2) and (3) since it potentially eliminates the need for submodules to have a private/public state.

Do any of these proposals generalize better to the case of any submodule rather than just a submodule in a file (the focus of this design issue)? Ideally, users shouldn't have to think about special rules for dealing with submodules in a file.

@lydia-duncan
Member

would proposals (2) and (3) remove the need for defining submodules as public vs. private, since the privacy is defined at the point of import/usage

No. Private still impacts whether or not a symbol is generated in documentation, and that is absolutely not something a use or import should have control over (especially since one use/import could be public and another private for the same module)

@lydia-duncan
Member

Ideally, users shouldn't have to think about special rules for dealing with submodules in a file.

I agree with this

@bradcray
Member

In the OP, what kind of code should I imagine to be within M.chpl and L.chpl (for things to work as expected)?

What should I expect to happen if the contents of M.chpl are this:

M.chpl:

module P {  // Whoops, M.chpl doesn't define a module named M!
  writeln("In P");
}

Would it still consider M/ to define submodules for M given that the filename and directory name match? Or would it expect it not to since the module name and directory name do not?

@mppf
Member Author

mppf commented Aug 14, 2019

In the OP, what kind of code should I imagine to be within M.chpl and L.chpl (for things to work as expected)?

I was imagining that M.chpl would have to contain a module M declaration and that L.chpl would likewise have to contain a module L declaration. However, it might be interesting to consider relaxing that (especially for L.chpl). I would imagine relaxing it would take the form of supporting implicit modules.

I will update the issue description to include some code in the example.

What should I expect to happen if the contents of M.chpl are this:

M.chpl:

module P {  // Whoops, M.chpl doesn't define a module named M!
  writeln("In P");
}

Would it still consider M/ to define submodules for M given that the filename and directory name match? Or would it expect it not to since the module name and directory name do not?

I'd make this case an error. I think such a pattern is more likely to represent confusion (as you were hinting) and generally speaking I think we should get to a style with one module per file (possibly containing submodules). I think for this pattern specifically (submodules in different files) we should enforce that.

I.e. I also would expect this to be an error:

M.chpl:

module M { }
module Q { }

and likewise this

L.chpl:

module P { }

or this

L.chpl:

module L { }
module X { }
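
The error cases above amount to a simple per-file check. Here is a rough Python sketch of it, assuming the rule "exactly one top-level module per file, named after the file" as described in this comment. The scanner is deliberately naive (it strips `//` comments and counts braces, and does not understand string literals or block comments), and `top_level_modules` and `check_file` are hypothetical names, not compiler functions.

```python
from __future__ import annotations

import re

# Matches "module Name {" starting exactly at the scan position.
MODULE_DECL = re.compile(r"(?<!\w)module\s+(\w+)\s*\{")


def top_level_modules(source: str) -> list[str]:
    """Names of modules declared at brace depth 0 (naive scan)."""
    source = re.sub(r"//[^\n]*", "", source)  # drop line comments
    names: list[str] = []
    depth = 0
    pos = 0
    while pos < len(source):
        match = MODULE_DECL.match(source, pos) if depth == 0 else None
        if match:
            names.append(match.group(1))
            depth += 1  # the match consumed the opening brace
            pos = match.end()
            continue
        if source[pos] == "{":
            depth += 1
        elif source[pos] == "}":
            depth -= 1
        pos += 1
    return names


def check_file(filename: str, source: str) -> list[str]:
    """Return error messages; an empty list means the file passes."""
    expected = filename.rsplit("/", 1)[-1].removesuffix(".chpl")
    names = top_level_modules(source)
    errors = []
    if len(names) != 1:
        errors.append(f"{filename}: expected one top-level module, found {names}")
    elif names[0] != expected:
        errors.append(f"{filename}: declares module {names[0]}, expected {expected}")
    return errors


if __name__ == "__main__":
    print(check_file("M.chpl", 'module P { writeln("In P"); }'))
    print(check_file("M.chpl", "module M { }\nmodule Q { }"))
    print(check_file("M/L.chpl", "module L { module Inner { } }"))
```

Nested modules written inline, such as `Inner` in the last call, are not counted, since the rule only concerns top-level declarations.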

@lydia-duncan
Copy link
Member

I'm not in favor of enforcing the rule in Michael's latest comment. Allowing the module name to differ from the file name permits programs that swap different implementations in depending on the situation. I have a vague recollection of several user codes doing this (maybe Brian Dolan's, maybe the HPO program).

Removing that option forces the user to copy their code for the particular implementation into a different file every time they build a different version, instead of just altering which file they include on the compilation command. This seems unnecessary and like it would be annoying to deal with.

@mppf
Member Author

mppf commented Aug 16, 2019

@lydia-duncan - To be clear, I was proposing that rule specifically for modules involved in submodules-in-different-files. That is, M.chpl would have the rule because it has a directory M/ in the same directory. M/L.chpl would have the rule because it is using the submodules-in-different-files strategy to be a submodule of M. I was not proposing that this rule apply globally to all modules (that would be fine with me personally, but I think it's a different issue).

Removing that option

I don't see how using different file names would be possible for submodules-in-different-files even without this rule? Especially not for M/L.chpl since we do not provide M/L.chpl on the command line (if we did, would it be a submodule of M anymore?).

For M.chpl itself there is a different problem - suppose it contained two modules, then are the submodules in M/*.chpl submodules of both?

If it contained just one module, it might be possible for it to have a different name, but I still think that'd be more confusing than useful.

Allowing the module name to differ from the file name permits programs that swap different implementations in depending on the situation.

I think a more reasonable way to achieve this, whether we have the rule above or not, would be to have different directories containing different implementations that use the same file name (and module name).

Modules stored in files with different names confuse the compiler's current ability to find such modules, so they have to be named on the command line. But the thing that bothers me more about them is that they confuse people looking at the source code. Having two directories, say A/ and B/, both containing M.chpl (say), and then selecting between them with -M A or -M B does not have this problem and seems to me to be just as effective.

@lydia-duncan
Member

To be clear, I was proposing that rule specifically for modules involved in submodules-in-different-files.

That's fair, though I think different behavior on this front will be a bit confusing (but that's sort of the nature of the beast).

I don't think this has been covered already, what are your thoughts on the case where the file name doesn't match a directory, but the (top-level?) module name does? E.g.

M/L.chpl
P.chpl:

module M { ... }

or P.chpl:

module Outer {
   module M { ... }
   ...
}

@mppf
Member Author

mppf commented Aug 16, 2019

what are your thoughts on the case where the file name doesn't match a directory, but the (top-level?) module name does?

I think those should be errors.

@bradcray
Member

I'm also not crazy about the proposed "one module per file" and "module names must match their filenames" proposals, at least as language rules that the compiler would create errors for. I definitely could imagine them being reasonable rules of thumb or style suggestions/requirements for a specific context, like a standard library. Specifically, I think there's been practical value in writing Chapel that has multiple modules in a file (e.g., for use with TIO and writing tests, if nothing else). I also tend to think that there shouldn't be too rigid a relationship between filenames and modules, such that I should be able to take your code, type it into a filename of my choosing (or feed it to the compiler via stdin?) and compile it without complaints. As a specific example, TIO users have no control (that I've found) over what filename is used for the source code. I also think it's nice to be able to establish filename conventions like M-1.0.0.chpl and M-2.0.1.chpl or M-blc-hack.chpl to define different versions of a module M without having that choice result in an error. More generally, it seems like an unnecessary redundancy and something you can only get wrong / mess up without obvious benefit (aside from the organizational / clarity one which is why I'm OK with it as a rule of thumb, just not part of Chapel's definition / requirements).

@bradcray
Member

Popping up a level: Stewing on this overall proposal the past week, I think the main things that give me pause about it are how implicit the behavior is and how dependent on the layout of the files in the directory structure it is. For example, if you handed me a printout of the files in question so that I could recreate your work locally, nothing about the printouts says to me that L is considered a submodule of M or that if I were typing all the code into emacs to reproduce the example myself, that I should be sure to put L.chpl into a subdirectory within the same directory as M to make things work properly.

So, off the top of my head, let me think about how I would tweak the proposal, if I were championing it, to try and address these concerns.

I think first, I would have the default "submodule directory name" probably be something more like M.submodules or M-submodules or M-lib or M-mods or M-submods rather than simply M, both to be more descriptive, and in acknowledgement of the fact that chpl M.chpl will try to create a file named M by default which will conflict with that directory (I get that, in this instance, M isn't meant to be a main module compiled on its own, but some library modules we've written have supported the ability to self-test themselves when compiled as the main module, and someone is bound to try and compile the file in that way eventually, wise or not...). But I think the "M/ just isn't very descriptive" concern is probably the bigger concern for me even though it didn't take as many words to say.

I'd probably also have the M in M-submods refer to the module name being processed over the source code's filename. That is, if P.chpl contained module M { ... } I think I'd have the compiler look for M-submodules/ rather than P-submodules/. I think that this is so that, again, I could have multiple copies of a file defining M that all share the same directory of submodules (M-1.0.chpl, M-bugreport.chpl, M-hack.chpl) rather than having to replicate the directory of submodules every time I needed to create a new clone of the M.chpl file to match its filename.

Next, I think I would put some sort of indicator into a module's source code to indicate that I wanted to invoke this "treat-all-modules-defined-within-a-given-specially-named-subdirectory-that-happens-to-live-adjacent-to-this-file-as-submodules-to-this-module" rule, so that if I were reading printouts of the code, I'd have some sort of visual cue that the intention was to invoke this feature to make additional submodules available. This would also save me from surprises if I happened to have a directory of Chapel source that matched my module name which I didn't intend to get parsed and compiled in this way. As a dumb example, if I saw something like this:

M.chpl:

module M {
  inject submodules;
  ...
}

it seems like it would clearly indicate that some submodules were being injected here by the compiler (though I might need to go read the documentation to understand exactly what was happening... but at least I'd have something to go search on when reading unlike in the OP?). Of course, if I was given this as a printout, it still wouldn't tell me what was being injected nor where I should arrange to save the L.chpl file, etc., but since I think that's part of the point of the proposal, I'll try not to get hung up on it for now.

That said, submodules is admittedly an awkward keyword and not really adding any real value here. So what else could I type?

  • inject auto; (similar problems, and we don't currently reserve auto, though maybe we would want to for other reasons...? I couldn't find any existing reserved words that felt "right" as an indicator that something special was getting injected)
  • inject module; (doesn't really scan, English-wise)
  • auto_inject_modules (ugh, underscores in a keyword?)
  • inject _; (vague?)

So then, feeling stuck, my head goes to:

  • inject "M.submodules"; // name the directory to use

or perhaps:

  • inject "M.submodules/"; // indicate more clearly that it's a directory whose contents we're injecting

or even:

  • inject "M.submodules/*"; // to indicate that we want all files from the directory

This approach embeds a directory location in the source code, which I think @mppf objected to on principle in earlier discussions of the include statement. But arguably, so does @mppf's proposal as it is evolving in this issue since it essentially says that the filename and module and directory must all share the same name; so if the module is declared via the preferred style in code using module M { ... }, we've effectively embedded the name of the directory M or M.submodules or whatever in source code as well, just indirectly rather than directly.

So if we ignore that concern momentarily, it seems to me that this approach has other advantages:

  • it makes explicit in the code that we're injecting submodules
  • it scans pretty well in English
  • it permits the directory of submodules to be named whatever we want it to be rather than requiring it to be M, M.submodules, M-submodules or whatever (where you may not like my choice of default, or we may never come up with something I like either).
  • it permits us to inject submodules from multiple directories by naming multiple directory locations in an inject, or using multiple inject statements.

It seems like we might still want some way to say "inject modules from the directory that shares my modulename or my filename (or is based off of it)", but perhaps we can do that using some special param/reserved string symbol name rather than using another keyword (as I was originally trying to do with my auto or module proposals above). I.e., inject __BASE_FILENAME__; inject __MODULENAME__; or whatever identifier you like. Or maybe just inject autodir; where it's understood that autodir means "use the default directory name" (whatever that ends up being).

I would imagine supporting public inject ... and private inject ... as a way to indicate whether the injected submodules were public or private by default (in the event that they didn't declare a visibility on their own module declarations... but what should happen if they did? Override? Complain? I guess my tendencies go towards "complain").

Of course at this point, my head goes to things like "What if, rather than specifying a directory name whose files we should blanket inject as modules, we could name a specific filename instead?" e.g., inject "M-submodules/L.chpl"; That would permit us to be even more precise about what we're injecting into this scope and would make my printout of the source code even clearer about what's happening (of course, it would also embed more specific filenames into the source code, which is why the "whole directory" version remains important).

And then we could even rename inject to something like input or include at which point it would seem less like something weird and new that we were creating from scratch and more like a traditional language construct.

[Note: I promise that in starting into this brainstorming, I honestly didn't set out to try to end up proposing a variant of include—albeit one that can read in a whole directory of files while also providing the include that I've wanted myself—nor was I striving to end up here. I honestly tried to take the proposal as presented, think about what about it gave me cold feet, and work on improving those aspects. So, before dismissing this whole comment based on the fact that I did end up here given that some of you are very opposed to supporting include, let me know where in this train of thought things stop making sense for you and start seeming less valuable / powerful than the OP (assuming you were in favor of it).]

[[I'm honestly truly nervous about hitting the "Comment" button because I feel like anytime I mention include, the quality of the discussion takes a nosedive... I understand that #includes have problems in C/C++ and are open to abuse. But I remain unconvinced that a language-level (non-preprocessor) statement shares those same downsides. And, frankly, I think almost any language feature is open to abuse if you don't use it in a principled way. E.g., imagine the ways I could drop a file into Michael's M/ directory in the OP that would break existing code in surprising ways...]].

@mppf
Member Author

mppf commented Aug 20, 2019

As a specific example, TIO users have no control (that I've found) over what filename is used for the source code.

Right, but TIO users also could not make a directory for submodules-in-different-files.

I think first, I would have the default "submodule directory name" probably be something more like M.submodules or M-submodules or M-lib or M-mods or M-submods rather than simply M, both to be more descriptive, and in acknowledgement of the fact that chpl M.chpl will try to create a file named M by default which will conflict with that directory ... But I think the "M/ just isn't very descriptive" concern is probably the bigger concern for me even though it didn't take as many words to say.

Sure. I think using M.submodules/ instead of M sounds good.

I'd probably also have the M in M-submods refer to the module name being processed over the source code's filename.

That would be fine with me.

inject submodules;

I'm not opposed to some syntax like this to indicate the feature is being used but I think we could come up with better syntax.

inject "M.submodules/";
it permits the directory of submodules to be named whatever we want it to be rather than requiring it to be M, M.submodules, M-submodules or whatever (where you may not like my choice of default, or we may never come up with something I like either).

I'd prefer that the directory hierarchy always matched the module hierarchy when using this feature. I want it to be done in a way that is more understandable than configurable. For that reason, I lean against allowing the name to be changed from M.submodules/ (or whatever we pick). If people want something else in practice, they can use symbolic links.

it permits us to inject submodules from multiple directories by naming multiple directory locations in an inject, or using multiple inject statements.

I don't see this as an advantage. It just makes code using the pattern more configurable and less understandable. I think we should aim for understandability on this feature before configurability.

And then we could even rename inject to something like input or include at which point it would seem less like something weird and new that we were creating from scratch and more like a traditional language construct.

I don't know if this has been proposed before, but if we were certain that we wanted to be able to bring in one submodule at a time in different files (as you are proposing but which I am not certain is the approach we should choose), why wouldn't we simply write it as

// M.chpl
module M {
  module "M.submodules/L.chpl";
}

?

I'm saying that it seems to me that module is a better keyword for it if the file being included will always be treated as a module. (I think being able to include arbitrary structures in other files is one of the main criticisms of include over in #10909).

Anyway, to my eyes, this is more or less the same as

// M.chpl
module M {
  module L;
}

meaning that the compiler should go look for M.submodules/L.chpl (say). The difference between this and the version with the path has to do with preference for consistent/understandable/repeated patterns on the filesystem vs. more configurability IMO.
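
To make the lookup concrete, here is a small Python sketch of how bodiless `module L;` declarations could be mapped to files. It is only an illustration under assumptions taken from this discussion: the directory is named `<module name>.submodules/` and sits next to the parent file, and each submodule lives in `<submodule name>.chpl`. The function name `submodule_files` is hypothetical, and the regular expression is a stand-in for real parsing.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# "module L;" optionally preceded by public/private; no body follows.
BODYLESS_MODULE = re.compile(r"(?<!\w)(?:(public|private)\s+)?module\s+(\w+)\s*;")


def submodule_files(parent_file: str, parent_module: str, source: str) -> dict[str, str]:
    """Map each `module X;` in `source` to the file assumed to define X.

    The directory is derived from the parent *module* name, not the parent
    file name, so M-1.0.chpl and M-hack.chpl can share one M.submodules/.
    """
    subdir = PurePosixPath(parent_file).parent / f"{parent_module}.submodules"
    return {
        name: str(subdir / f"{name}.chpl")
        for _visibility, name in BODYLESS_MODULE.findall(source)
    }


if __name__ == "__main__":
    src = "module M {\n  module L;\n  private module K;\n}\n"
    for name, path in submodule_files("src/M-1.0.chpl", "M", src).items():
        print(name, "->", path)
```

Because the parent lists its submodules explicitly, nothing here needs to scan the directory; the compiler would only open the files it was told about.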

Perhaps it would be productive to discuss configurability vs understandability/consistency as regards directories and files? Certainly we could keep discussing specific proposals but it seems that we need to either choose one of these over the other or else to try to balance them? (Otherwise I feel we are having the same argument over and over again in different issues).

@lydia-duncan
Member

I really like where Brad went with this - I felt like it addressed many of the things that were making me wary as well, though I am sure there are still improvements that can be made to it

@lydia-duncan
Member

I could definitely get behind module L; as an improvement on the proposal

@BryantLam

BryantLam commented Aug 22, 2019

It is paramount to understand my example output #13524 (comment) and why Rust implemented this feature the way it did.

  1. All files in Rust are modules.

    1. These problems do not exist: (1) P is submodule of M -- (2) L and N are submodules of M; no exceptions -- (3a) M and Q are submodules of M; (3b) P is submodule of L; (3c) L and X are submodules of L. Fundamentally, it is simple to learn (you do have to learn it) but there's insignificant complexity to it: a file is a module.
    2. The crate created by cargo new --bin MyPackage is built by passing the crate name to the compiler (cargo build --verbose). The command looks like rustc --crate-name MyPackage ..., which is very similar to chpl --main-module MyPackage.
    3. One level of indentation is removed by eliding the module declaration. This is a minor benefit learned from C++ namespaces. Examples from modern C++ style guides: Google, LLVM. Rust forces this behavior. By eliding the module declaration, you also reduce the amount of typing to declare a module redundantly from the name of the file — should the filename ever change, you don't have to change the module name; they are the same.
    4. For Chapel, I can concede that having the one, true module declaration in the file is an alternative, though I do like espousing the principle of only having one way to do something.
  2. Submodules are in a directory of the parent module's name.

    1. Directory hierarchy makes code organization a clear one-to-one mapping with the module hierarchy.
    2. Sibling modules are in the same directory level.
  3. Modules need to be declared in the parent module.

    1. This is a caveat from No.1 because the module cannot be declared in the same-named module file. But...
    2. Visibility of a submodule is controlled by the parent module.
    3. The compiler does not have to scan the filesystem for submodules. Put another way, all submodules must be declared in the parent module so the compiler knows exactly what files to look for when it does have to go looking. (Aside: Rust visibility used to be default private, so you had to declare public mod Submodule if you wanted your siblings to view its contents. But now that the default visibility is pub(crate), siblings can access your submodules without the explicit declaration being necessary. As a result, Rust had a proposal to remove the requirement for needing to declare the module.)
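
To make the "no filesystem scan" point concrete, here is a minimal Rust sketch of how a compiler could derive the exact set of files to open from declarations alone. The function name `files_to_open`, the `declared` map (a stand-in for what a parser would report), and the `.chpl` extension are illustrative assumptions, not anything an existing compiler exposes.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical helper: starting from the root module's file, list every
/// file the compiler must open, using only the submodule names declared
/// in each parent. `declared` maps a module file to its declared children.
fn files_to_open(root_file: &Path, declared: &BTreeMap<PathBuf, Vec<String>>) -> Vec<PathBuf> {
    let mut out = vec![root_file.to_path_buf()];
    let mut i = 0;
    // Breadth-first walk over the declared hierarchy; no directory listing needed.
    while i < out.len() {
        let file = out[i].clone();
        if let Some(children) = declared.get(&file) {
            // Submodules live in a directory named after the parent: M.chpl -> M/
            let dir = file.with_extension("");
            for child in children {
                out.push(dir.join(format!("{}.chpl", child)));
            }
        }
        i += 1;
    }
    out
}
```

With M.chpl declaring L and M/L.chpl declaring K, this yields M.chpl, M/L.chpl, and M/L/K.chpl, and no directory is ever enumerated.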

These rationales are not unreasonable or incompatible with the Chapel language, compiler, or other tools.

For example, if you handed me a printout of the files in question so that I could recreate your work locally, nothing about the printouts says to me that L is considered a submodule of M ...

All proposals have downsides and (1) this concern can be mitigated with tree and use of top-of-file comments like any well-trained programmer should. Also, most printer drivers will print the filename in the header when given a text file. But (2) this concern is not important to me given its apparent advantages in forcing a packaging convention which is a higher priority than printability. I'll mention more in the next section, No.5 alternative.

I would have the default "submodule directory name" probably be something more like M.submodules or M-submodules or M-lib or M-mods or M-submods rather than simply M, both to be more descriptive, and in acknowledgement of the fact that chpl M.chpl will try to create a file named M by default which will conflict with that directory.

It is unfortunate that the name of the module directory and the default executable name are the same. Ways to mitigate:

  1. Mason does out-of-tree builds. This is my preferred solution and Mason needs to be the final solution.
  2. chpl -o for non-main modules. Still annoying as to what to name the executables.
  3. chpl -o for out-of-tree builds. This is a short-term practical solution.
  4. Name the submodule directory M.submod or similar. This is fine, but ultimately redundant/unnecessary in most situations, especially long-term.
  5. Least of all, another alternative, but it goes into a long digression and I don't like it.
(Click here for another alternative.)

Python has a packaging layout where the module directory includes an __init__.py file that is the actual module itself. Other files in the directory are the traditional submodules. It does have gotchas.

Rust adopted this layout as its first design.

.
├── M
│   ├── L.rs
│   └── mod.rs
└── main.rs

module M is the file M/mod.rs and M/L.rs is still module L as a submodule of M. The problems with this approach, however, are:

  1. This slightly breaks the filesystem-is-module hierarchy because the module M (mod.rs) sits next to all its submodules (L.rs). It is only slightly harder to learn because the file will always be named mod.rs or __init__.py. There is precedent, and it resolves the naming problem by making all modules generate a mod executable, which isn't ideal.

  2. From a productivity standpoint, this layout is annoying to a developer. When you open up a file and your editor calls it mod.rs, how do you know what module it is? (This is especially true if you assume that all files are file-level modules so there's no module definition block.) The primary remedy is to get a smarter editor or IDE. It's not ideal; the practical remedy is to include top-of-file comments as you should. Rust learned from the Python design by providing an incrementally better layout in Submodules in different files #13524 (comment), the core of this issue's proposal.

  3. From the Python side, there are limitations because your module has to be defined in an "all at once" approach (the module and its submodules). They didn't find it appealing for a number of packaging-related challenges and have since adopted PEP 420 Namespace Packages to supplement it. That link has rationale, but more seriously, it has some discussion and objections against it, one of which is about performance and why the Python interpreter slams a filesystem whenever it loads a package.

I consider these two languages (and maybe JavaScript) the most successful at packaging today and they're both moving away from this version.

But I think the "M/ just isn't very descriptive" concern is probably the bigger concern for me even though it didn't take as many words to say.

Foreshadowing from below: Would you rather (a) learn that M/ is where M.chpl/module M keeps its submodules or (b) grep 'module M' src, grep for include, and then finally traverse the target of the include to see where to continue grepping because what you're really looking for is a grandchild of M? I do the latter all the time. I'd much rather look for std/List.chpl.

I'd probably also have the M in M-submods refer to the module name being processed over the source code's filename. That is, if P.chpl contained module M { ... } I think I'd have the compiler look for M-submodules/ rather than P-submodules/.

Continuing the inference, the proposed solution wouldn't just look in P-submodules/; it looks in P-submodules/M-submodules/ for children of M.

I could have multiple copies of a file defining M that all share the same directory of submodules (M-1.0.chpl, M-bugreport.chpl, M-hack.chpl) rather than having to replicate the directory of submodules every time I needed to create a new clone of the M.chpl file to match its filename.

Use a symlink. #10946 (comment)

.
├── M
│   └── L.chpl
├── M-1.0.chpl
├── M-bugreport.chpl
├── M-hack.chpl
├── M.chpl -> M-1.0.chpl
└── MyProject.chpl

(It's neat that tree output has symlinks.)

You could also introduce a new compiler flag: chpl --module-alias M=M-1.0.chpl. This solution isn't that different from today's approach of chpl MyProject.chpl M-1.0.chpl. Or the same solution of a symlink funnily enough.

However, suppose someone really did want to do this, or wanted two sibling modules to share identical submodules. A directory symlink would work, but that's hacky to me. The actual solution to that issue involves re-exporting the symbol with public import.

I think I would put some sort of indicator into a module's source code to indicate that I wanted to invoke this "treat-all-modules-defined-within-a-given-specially-named-subdirectory-that-happens-to-live-adjacent-to-this-file-as-submodules-to-this-module" rule, so that if I were reading printouts of the code, I'd have some sort of visual cue that the intention was to invoke this feature to make additional submodules available. This would also save me from surprises if I happened to have a directory of Chapel source that matched my module name which I didn't intend to get parsed and compiled in this way.

This situation is why forcing users to not learn two ways to do something is better. It so happens that the alternative to this proposal is an include-equivalent ("inject-any") that makes it actually more like learning N ways instead of just two.

On reading printouts, if a tree printout is provided, I don't think it's as hard to lay out the paperwork like the module hierarchy. Again, you do have to learn this is how Chapel structures code, but I don't think it's a difficult ask. In fact, it's trivial when the alternative is to scan all lines of code looking for includes to understand an unfamiliar piece of code. (This is assuming the last variant where the inject can inject any directory.)

[For inject-any], it seems to me that this approach has other advantages:

  • it permits the directory of submodules to be named whatever we want it to be rather than requiring it to be M, M.submodules, M-submodules or whatever (where you may not like my choice of default, or we may never come up with something I like either).

I am vehemently against the arbitrary naming of the injection target. My guiding principle is that learning a coding style became obsolete when a style checker like clang-format can enforce it. Force a naming, whatever it may be, to eliminate this cognitive burden.

  • it permits us to inject submodules from multiple directories by naming multiple directory locations in an inject, or using multiple inject statements.

I would imagine supporting public inject ... and private inject ... ...

And then we could even rename inject to something like input or include at which point it would seem less like something weird and new that we were creating from scratch and more like a traditional language construct.

This is effectively a re-export. Re-exporting (once implemented) solves this issue and is capable of re-exporting only some symbols or renaming the symbols.
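
As a rough analogy for what re-exporting buys, here is a small Rust sketch showing a parent module re-exporting only some symbols and renaming one. It illustrates the general technique in Rust, not Chapel's eventual semantics; the module and function names (`shared`, `m`, `helper`, `detail`) are made up for the example.

```rust
// A module whose contents another module wants to expose selectively.
mod shared {
    pub fn helper() -> &'static str {
        "helper"
    }
    pub fn internal_detail() -> &'static str {
        "detail"
    }
    // Never re-exported below, so it is not reachable through `m`.
    pub fn not_exported() -> &'static str {
        "hidden"
    }
}

mod m {
    // Re-export a single symbol under its own name.
    pub use super::shared::helper;
    // Re-export another symbol under a different name.
    pub use super::shared::internal_detail as detail;
}
```

Code outside `m` can then call `m::helper()` and `m::detail()`, while `m::not_exported` does not resolve.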


There is one assumption in some of my counter arguments: files are modules. The overall rationale is sound because it is true in most situations, but there are some behaviors today that make it not true.

These current behaviors would have to be re-evaluated or changed. (This is for a different issue.)

Prototype modules are the default behavior for file-scope script-like code (no proc main). I don't expect significant impact to these users.

Success in packaging requires success in building, testing, etc. By no means do I consider C/C++ to be successful at packaging, what with tools like gmake, CMake, autotools, SCons, gn, build2, Meson, Spack, and Conan all trying to solve the one problem of how to find code when the organization is arbitrary. These tools usually do something explicitly declarative, which is build-engineer-level work that cargo and pip don't need because the base languages forced a style of code organization.

To focus the discussion, what properties of this design are things you absolutely could not live with?

@bradcray
Member

Both Michael's and Bryant's most recent responses seemed to lean on the importance of hierarchies of modules / directories-within-directories being an important driver of this proposal rather than just a single level of submodules as illustrated in the OP. Is that right? If so, Michael, would you consider extending the issue text to include multiple levels of submodules?

I've also been curious about whether there are any implications about non-.chpl file types in the directory hierarchy in this proposal. Would they be an error? Would the presence of a libfoo.a or foo.o result in the equivalent of a require "libfoo.a"; or require "foo.o"; in the code? Would the presence of a foo.h result in an extern { #include "foo.h" } in the code?

Right, but TIO users also could not make a directory for submodules-in-different-files.

Sorry, I thought you were saying that the compiler would always require a single module per file whose name matched the filename. If I'm understanding you, I now think you're suggesting that the filename==module name rule would only apply in the event of these automatically-injected directories/files—is that right?

if we were certain that we wanted to be able to bring in one submodule at a time in different files (as you are proposing but which I am not certain is the approach we should choose)

In case it wasn't clear, I wasn't counter-proposing that users would have to only bring in a single submodule at a time. They could definitely bring in an entire directory of files as well.

(I can't begin to tell you how often I weep over tree not being available on most of the systems I use... :'( )

@mppf
Member Author

mppf commented Aug 23, 2019

Both Michael's and Bryant's most recent responses seemed to lean on the importance of hierarchies of modules / directories-within-directories being an important driver of this proposal rather than just a single level of submodules as illustrated in the OP. Is that right? If so, Michael, would you consider extending the issue text to include multiple levels of submodules?

Sure, I'll do that in a moment.

I've also been curious about whether there are any implications about non-.chpl file types in the directory hierarchy in this proposal. Would they be an error? Would the presence of a libfoo.a or foo.o result in the equivalent of a require "libfoo.a"; or require "foo.o"; in the code? Would the presence of a foo.h result in an extern { #include "foo.h" } in the code?

I would hope that the subdirectory would be available (e.g. to C compiler header or library paths) when compiling code in that subdirectory but other than that would not expect any of these to happen automatically. We could consider them as future improvements, though.

Right, but TIO users also could not make a directory for submodules-in-different-files.

Sorry, I thought you were saying that the compiler would always require a single module per file whose name matched the filename. If I'm understanding you, I now think you're suggesting that the filename==module name rule would only apply in the event of these automatically-injected directories/files—is that right?

Yes, that's what I'm proposing right now. (I like to talk about going further than that, but I don't want to muddy the waters in this issue. I'm intrigued by @BryantLam's suggestion that the implicit file-level module is always created unless there is a single module declaration matching the file name. But these things are almost certainly for another issue).

@mppf
Member Author

mppf commented Aug 28, 2019

FWIW at the moment I like M/L.chpl more than M.submodules/L.chpl. I know that chpl M.chpl would try to produce an executable named M and that wouldn't be possible with a directory named M/ but AFAIK Rust has the same problem. What do they do? It appears that you simply can't compile a "main" for a module using this strategy (@BryantLam please let me know if I've done something obviously wrong in this example):

# rustc 1.35.0
.
├── M.rs
└── M
    └── L.rs
// M/L.rs
pub fn fn1() { println!("fn1"); }

// M.rs
pub mod L;
fn main() { }
$ rustc M.rs
error[E0583]: file not found for module `L`
 --> M.rs:1:9
  |
1 | pub mod L;
  |         ^
  |
  = help: name the file either L.rs or L/mod.rs inside the directory ""

error: aborting due to previous error

For more information about this error, try `rustc --explain E0583`.

In other words, Rust simply doesn't allow a module with a submodules-subdirectory to be compiled by itself. I suppose it might view these as always being libraries.

I like naming the submodules-directory M/ more than M.submodules enough that I'd prefer we do one of the following:

  • Just don't allow compiling M.chpl as the main module if there is an M/ (as Rust does)
  • Adjust the default executable name and print a warning ("Sorry, I couldn't create an executable M because there is already a directory there. I went ahead and saved your executable in M.run but you might want to supply a -o argument."). Note that this would also help with my repeated personal problem of trying to compile test.chpl in $CHPL_HOME...
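
The second option could look roughly like the following Rust sketch. The function name `default_executable`, the injected `is_dir` check, and the exact warning wording are assumptions for illustration; only the M.run fallback and the -o hint come from the suggestion above.

```rust
use std::path::{Path, PathBuf};

/// Choose where to write the executable for a main module file such as `M.chpl`.
/// `is_dir` is injected so the rule can be exercised without touching the disk;
/// a real driver would pass `|p| p.is_dir()`.
fn default_executable<F>(source: &Path, is_dir: F) -> (PathBuf, Option<String>)
where
    F: Fn(&Path) -> bool,
{
    // `M.chpl` -> `M`, the usual default executable name.
    let preferred = source.with_extension("");
    if !is_dir(&preferred) {
        return (preferred, None);
    }
    // The submodule directory `M/` occupies that name, so fall back to `M.run`.
    let fallback = source.with_extension("run");
    let warning = format!(
        "could not create executable {} because a directory with that name exists; \
         wrote {} instead (consider passing -o)",
        preferred.display(),
        fallback.display()
    );
    (fallback, Some(warning))
}
```

When no M/ directory exists, the function returns the usual name and no warning, so existing workflows would be unaffected.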

However if I'm the only one with this view, I could live with M.submodules.

@BryantLam

BryantLam commented Aug 29, 2019

It appears that you simply can't compile a "main" for a module using this strategy

That's correct, though I attribute it to a technical design decision with the initial Rust modules design that should be corrected. Refer to my previous long-form #13524 (comment) inside the "(Click here for another alternative.)". In lieu of repeating myself, the alternative's design of "the module hierarchy not equaling the directory hierarchy" is occurring in your code and in my original working example from Rust in #13524 (comment) for only the main module.

Specifically, the MyPackage.rs main module is declaring a submodule mod M, but the M.rs file is in the same directory level instead of the MyPackage directory. This is a special case similar to searching for lib.rs (for a library instead of a binary crate) and mod.rs (for modules).

When trying to define a fn main() inside M.rs, the same behavior is expected, so module L is being searched for at the same directory level, which is a mismatch from the proposed Chapel user's expected behavior of looking at M/L.rs.
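
The lookup behavior described here can be summarized in a simplified Rust sketch. It models only the default rule (crate roots and mod.rs files search their own directory, other files search a directory named after themselves) and ignores `#[path]` attributes; the function name `rust_mod_candidates` and the `is_crate_root` flag are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Simplified model of where `mod <name>;` is looked up, given the file
/// that contains the declaration. Returns the two candidate paths.
fn rust_mod_candidates(declaring_file: &Path, name: &str, is_crate_root: bool) -> [PathBuf; 2] {
    let dir = declaring_file.parent().unwrap_or_else(|| Path::new(""));
    let stem = declaring_file
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("");
    // Crate roots (and mod.rs files) own the directory they live in;
    // any other file `M.rs` owns the sibling directory `M/`.
    let base = if is_crate_root || stem == "mod" {
        dir.to_path_buf()
    } else {
        dir.join(stem)
    };
    [
        base.join(format!("{}.rs", name)),
        base.join(name).join("mod.rs"),
    ]
}
```

Compiling M.rs directly makes it the crate root, so `mod L;` is searched as L.rs or L/mod.rs, matching the error message quoted earlier. The same file used as a non-root module would instead search M/L.rs or M/L/mod.rs.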

(This is why I didn't like that original design / the No.5 alternative to begin with. It's confusing.) There are a few ways to resolve this inelegance:

  1. Apply a strict interpretation of the module hierarchy. E.g.:

    $ tree -U .
    .
    ├── MyPackage.chpl
    └── MyPackage
        ├── M.chpl
        └── M
            └── L.chpl
    

    This is not ideal because it increases the directory depth by 1, but is completely valid and remains an option should users prefer this path. It has one benefit in that the main module is usually in the root of the relevant tree.

  2. Do not declare submodule M inside the main module. In other words, the mod M should just be something else, like a use or import statement so that module M.chpl can be a proper sibling module instead of main's submodule.

    • This is my preferred solution and will likely be enabled by Option 1 in Calling functions in a module without any use #13523.
    • There is still the problem of how to find M when doing use or import. I think you refer to this as the chicken-and-egg in Calling functions in a module without any use #13523 (comment).
      • Rust's current approach is to declare the dependency in the Cargo.toml file and the rustc compiler is invoked with the correct module-path-like flags to make it so. One benefit is that you can rename a dependency in the Cargo.toml file without having to change the crate name inside actual source code to e.g., use an alternative implementation.
      • Also, this gets into the bigger design discussion of where to start searching for modules. If I had my way, the compiler would see use M as a hint to only look in the directory of the current module's file to see if there's an M.chpl. Doing anything else such as searching through the module path would require a different mount point from the root of the module hierarchy, such as use /D for other, external packages. It's easy to understand, but likely not palatable for backwards compatibility in Chapel today, so considerations should be made to how module searching is actually performed.
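
A minimal Rust sketch of that search rule, under stated assumptions: a plain `use M` checks only the directory of the current file, while the hypothetical `/D` form searches a list of external package roots. The function name `use_candidates`, the `external_roots` parameter, and the `.chpl` suffix handling are illustrative only and do not describe how chpl resolves modules today.

```rust
use std::path::{Path, PathBuf};

/// Return the files a compiler would check for `use <target>` under the
/// rule sketched above. A leading '/' marks an external package lookup.
fn use_candidates(current_file: &Path, target: &str, external_roots: &[PathBuf]) -> Vec<PathBuf> {
    match target.strip_prefix('/') {
        // `use /D`: look in each external root, never next to the current file.
        Some(name) => external_roots
            .iter()
            .map(|root| root.join(format!("{}.chpl", name)))
            .collect(),
        // `use M`: look only beside the file that contains the statement.
        None => {
            let dir = current_file.parent().unwrap_or_else(|| Path::new(""));
            vec![dir.join(format!("{}.chpl", target))]
        }
    }
}
```

Keeping the two cases disjoint means a local M.chpl can never be shadowed by an external package of the same name, which is the easy-to-understand property argued for above.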

@BryantLam

However if I'm the only one with this view, I could live with M.submodules.

I could live with it too, though I really push that we should name it M. Calling chpl -o isn't that hard to understand versus having to type and see M.submodules or submods everywhere for all time. This is especially true when out-of-tree builds are going to be more common in the long-term.

For reference, Rust gets around the problem of having to "compile test binaries from main" because there's a test infrastructure in cargo. In fact, mason supports out-of-tree building for tests today via the test/ directory. (Rust does support building multiple main modules with a [[bin]] target if truly desired.)

@mppf
Member Author

mppf commented Oct 4, 2019

Maybe we already know this but...

Let's suppose for a moment that we chose to require use/import for submodules (discussed in #13536). Then there would be a statement upon which to put the visibility of the submodule. Excerpting and adjusting the example from the top, where we are using M.chpl and M/ to make a submodule with M/L.chpl:

M.chpl

module M {
  import L; // make L visible here; public/private control its visibility elsewhere
  proc mFunction() {
    L.lFunction();
  }
}

M/L.chpl

module L {
  proc lFunction() { ... }  // declared with parentheses to match the L.lFunction() call in M
}

This seems to make the submodule-in-different-file behavior less implicit.

@mppf
Member Author

mppf commented Feb 5, 2020

Following my thoughts in #14407 (comment) - I am currently thinking that e.g. private use and private module { } have two different meanings of private. The first is strictly about visibility.

The impact to this issue is that I now have a potentially better answer to the question about how to indicate public/private of a submodule and how to indicate if it is visible outside of the module. To recap this question:

At this point, in answer to @lydia-duncan's question about how one would indicate the public/private visibility of a submodule, I think there are currently 3 proposals:

  1. L.chpl indicates its privacy in M, e.g. L.chpl contains public module L { } here
  2. M.chpl can override the privacy of L e.g. with public module L; 1st half in here
  3. M.chpl can override the privacy of L e.g. with public import L; 2nd half in here and related to Should calling functions in a submodule requires a use? #13536

I'm currently thinking that 1) or 2) are reasonable answers. In contrast, for 3), public import L; vs private import L; merely control whether or not L is visible outside of M but do not affect whether or not L itself or its contents are public or private.

@mppf
Member Author

mppf commented Mar 26, 2020

PR #15279 implemented something close to this.
