Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple-defined module symbol has shadowing instead of error #14014

Open
BryantLam opened this issue Sep 5, 2019 · 22 comments
Open

multiple-defined module symbol has shadowing instead of error #14014

BryantLam opened this issue Sep 5, 2019 · 22 comments

Comments

@BryantLam
Copy link

BryantLam commented Sep 5, 2019

Forked from #11262 (comment)

This code prints "42" regardless of ordering for the two use statements in block scope. I think it should be an error since N is multiple-defined.

// 4.chpl
module M {
  module N {
    var x = 100;
  }
}

module N {
  var x = 42;
}

{
  use N only; // It doesn't matter if first or second.
  use M;
  writeln(N.x); // Always prints 42.
}

chpl version 1.20.0 pre-release (8cb22586)

Note that in chpl version 1.19.0, this code always prints "100" instead. [TIO]

Edit: From Brad's explanation in the next post, this code is supposed to always print "100".

@bradcray
Copy link
Member

bradcray commented Sep 5, 2019

The reason for this is due to the current interpretation of use which effectively puts the symbols made available by the use into a scope just outside of the current lexical scope and the module symbols themselves into a scope just outside of that.

Bringing over a comment from #11262:

The use M and use N only; statements drop their contents into the scope just outside of the use (putting M.N and [nothing] into the scope respectively); and they drop their own names into the scope just outside of that (putting top-level M and N there). Sketching that out:

use M;
use N only;
...N.x...
...N.fn()...

results in:

{ // scope making module symbols themselves available
  ...module M...
  ...module N...
  { // scope making module contents available
    ...module [M.]N...
    ...<nothing from N>...
    {  // scope in which the `use` statements appeared
      ...N.x...
      ...N.fn()...
    }
  }
}

So then when the compiler resolves N.whatever, it sees module M.N as being closer to scope than top-level N.

Interestingly, I tried dumping both sets of symbols into the same scope as part of #13930, but this resulted in problems when a module and its symbols shared the same name, which is what made me ask how important it was in #13925. It seemed that this was pretty important/standard practice in other languages which is why I reverted to the traditional interpretation above.

@BryantLam
Copy link
Author

I think this code demonstrates non-intuitive behavior with how module symbols and module contents can confusingly shadow, but I attribute the behavior to use statements and I agree that import statements will intuitively work in almost all cases.

I haven't thought of a way to create a conflict with import+use statements using re-exports #13979 yet..

@bradcray
Copy link
Member

bradcray commented Sep 6, 2019

It's interesting to me to note (if I'm not making any mistakes) that if (a) one were to always utilize use statements in the defensive / "import nothing" mode (i.e., use M only;) and (b) we required sub-modules to be used before they could be referenced like top level modules (i.e., issue #13536) then this program would give the behaviors you want (which arguably is similar to say that import statements will be more intuitive, but also maybe suggests an obvious answer to #13536... and arguably is similar to the one that @mppf was suggesting when both modules were named M.

Specifically:

  use M only;
  use N only;  // can only refer to top-level N due to `use M only;`
  writeln(N.x);  // can only refer to top-level N.x or 42
  use M only;
  use M.N only;  // obviously refers to M.N
  writeln(N.x);  // can only refer to M.N.x or 100 because we haven't `use-d` submodule `N`
  use M only;
  use N only;
  use M.N only;  // error: symbol `N` is multiply defined                                                             
  writeln(N.x);

That takes me more off the fence on the "must submodules be used" issue than I had been before (in favor of requiring them to be used). Now I'm wondering how hard it would be to extend #13930 to apply to submodules as well (I'm guessing "not very", but it's late in the week...).

@bradcray
Copy link
Member

This example continues to haunt me... thanks for contributing it, Bryant. I continue to think it's reasonable to say "imports will handle this better", but it points to some problems with use statements that could potentially be improved upon as well (and I think, should).

First, it haunts me because, though you didn't say it, it points to a potential example of hijacking through submodules. Let's say that my code had started out as a program that used two library modules:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  } 
}

module Main {
  proc main() {
    use M;
    use N;
    foo();
    bar();
  }
}

but then later, the implementer of M adds a new sub-module N:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc foo() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  } 
}

module Main {
  proc main() {
    use M;
    use N;
    foo();  // this now calls the hijacked version
    bar();
  }
}

While I've rationalized above why use N chooses to prefer the submodule over the top-level one and why this is consistent with use being designed as a programming-in-the-small "bring everything in by default" concept, that doesn't make it any less surprising if your code changes out from under you.

This makes me wonder whether, more as Bryant's original intuition was saying, if a use could be resolved to multiple modules in the parent lexical scopes or through symbols made available by use, the programmer should get an "ambiguous modules, but I picked this one" warning or simply an "ambiguous modules" error, forcing them to resolve the conflict by filtering or renaming. This could still arguably be annoying (a change in a library could require a change in your code), but at least wouldn't lead to surprises. It has the downside of continuing to make module symbols more special than other types (which seemed to be a negative to Bryant in the original comment that spawned this issue?), but would treat them somewhat more like function resolution where multiple candidates would be considered if they were available. That said, I might say that resolving a use to an obviously local-to-scope symbol would not result in an ambiguity under assumptions that the module author would be aware of their own module's contents (or should be) and that it isn't a case that could result in hijacking.

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level, say if you wanted to disambiguate the use N to refer to the top-level module rather than the sub-module. The only way to ensure you are referring to the top-level module would be use M only [bar]; to avoid bringing M.N in, or to rename M.N via something like use M only N as MN;. That makes me wonder whether there ought to be some symbolic way to refer to the root / program scope in code.

I'm going to reopen this issue because of the unease it's causing me.

@bradcray bradcray reopened this Sep 10, 2019
@mppf
Copy link
Member

mppf commented Sep 10, 2019

if a use could be resolved to multiple modules in the parent lexical scopes or through symbols made available by use, the programmer should get an "ambiguous modules, but I picked this one" warning or simply an "ambiguous modules" error,

👍

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level,

I think use /N or use ::N or use ^N or use _.N are all brain-stormy ways of writing that but that probably deserves its own issue.

@lydia-duncan
Copy link
Member

lydia-duncan commented Sep 10, 2019

(Edited for outdated comment)

My initial thought was to use the file name as the module prefix to reference the top level N, given that we insert that as a wrapper when there is other top level code. This doesn't work when only modules are defined at the top level of the file, but does suffice when there is a non-module symbol at the top level. E.g. this version of "foo.chpl" compiles:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc baz() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc baz() {
    writeln("Computing baz()");
  } 
}

module Main {
  proc main() {
    use M;
    use foo.N; // prefix with file name
    baz();  // this still calls the original
    bar();
    whee();
  }
}

proc whee() {
  writeln("in whee");
}

But this does not:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc baz() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc baz() {
    writeln("Computing baz()");
  } 
}

module Main {
  proc main() {
    use M;
    use foo.N; // prefix with filename
    baz();
    bar();
    whee();
  }
}

Maybe we should consider always allowing reference to the file name for disambiguation in this way?

@lydia-duncan
Copy link
Member

I agree with the ambiguous module warning idea

@BryantLam
Copy link
Author

simply an "ambiguous modules" error, forcing them to resolve the conflict by filtering or renaming.

👍

This could still arguably be annoying (a change in a library could require a change in your code), but at least wouldn't lead to surprises.

I'm okay with this. It means that a dependency has changed in a meaningful way that would potentially break your code.[1] (I assume it's meaningful since the author made or left the inner module public.)

It has the downside of continuing to make module symbols more special than other types (which seemed to be a negative to Bryant in the original comment that spawned this issue?), but would treat them somewhat more like function resolution where multiple candidates would be considered if they were available.

My initial comments were intended as counterargument to the claim that symbols in Chapel are treated as equally as possible, since I don't think that's reasonable given that each symbol kind has a different design goal (functions can overload, variables can scope-shadow, modules shouldn't scope-shadow[1]).

That said, I might say that resolving a use to an obviously local-to-scope symbol would not result in an ambiguity under assumptions that the module author would be aware of their own module's contents (or should be) and that it isn't a case that could result in hijacking.

Probably true, though at least to me, it's not apparent at all why "100" is printed even after learning the scoping rules. I'd be curious what most programmers would blind guess.

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level,

I think use /N or use ::N or use ^N or use _.N are all brain-stormy ways of writing that but that probably deserves its own issue.

👍 👍 👍 #13915

[1] Once there's a way to specify a precise module you want to use, these conflicts can then only occur with the generic use M/import M form. it might even make sense to get rid of this generic form altogether (and reuse the syntax for one of the specific forms) if it becomes easy enough and idiomatic to precisely specify e.g., use /N for top-level.

@bradcray
Copy link
Member

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level,

I think use /N or use ::N or use ^N or use _.N are all brain-stormy ways of writing that but that probably deserves its own issue.

Another idea that @mppf mentioned on the phone last week (and credited to @BryantLam, though I don't think it's on this issue, and I may not have seen it (yet) in its original context) would be to always require fully-qualified module references in use statements. Thus, given:

module M {
  module N {
  }
}

module N {
}

The code:

  use M;
  use N;

would never get us to module M.N, but only to top-level modules M and N. To get to module M.N, I would have to do:

  use M [only [N]];
  use M.N;

or perhaps just:

  use M.N;

This would obviate the need for a "root scope" qualifier for top-level modules at the cost of always needing to type more to use a submodule.

I find the benefit here attractive (not that I'm opposed to introducing a root scope qualifier, but am not immediately drawn to any of the current proposals), but don't know whether the downsides would be considered attractive or unattractive to others (nor whether they're familiar / burdensome for those coming from other languages).

@BryantLam
Copy link
Author

Another idea that @mppf mentioned on the phone last week (and credited to @BryantLam, though I don't think it's on this issue, and I may not have seen it (yet) in its original context)

It likely came from one of the many Rust links between @mppf and myself starting from #12923 (comment).

... would be to always require fully-qualified module references in use statements.

This is effectively how Rust 1.0 behaves. They've since introduced an explicit way to choose a module from some point in the hierarchy, which #13915 attempts to address.

This is fine, but likely not what we should do. I would caution that absolute paths would break a lot of code in submodules, as submodules would no longer be able to directly use Child anymore and instead would have to start at the top-level with use Project.Parent.Child.

As a consequence, this behavior also makes refactoring efforts via moving directories around also more painful, as not only will all use targets break that point to the moved modules (this is normal for a refactoring effort) but the modules within that directory will subsequently break as per the note above. All use Project.Parent.Child inside the Parent module trying to get at their Child would also change to use Project.SomeNewMiddleTier.Parent.Child despite the Parent and Child not moving relative to each other.

... Unless you meant only top-level use statements would be absolute. In which case, there is precedence for how confusing this can be in Rust. These articles talk about it:

@mppf
Copy link
Member

mppf commented Feb 3, 2022

@bradcray - I know we have made some progress in this area since most of this discussion (resolving #13915 for imports and deciding not to make use always start from top-level in https://github.com/Cray/chapel-private/issues/885 ). Do you think there is more to do here?

@mppf
Copy link
Member

mppf commented Feb 3, 2022

@bradcray - I am trying to understand something about #14014 (comment)

If we have this:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc foo() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  } 
}

module Main {
  proc main() {
    use M;
    use N;
    foo();  // this now calls the hijacked version
    bar();
  }
}

I am not following why use N is finding M.N. In particular I would expect that the process of resolving the use statements would bring in M.N at a distance of two but that the top-level N would already be present at a distance of one (notionally, because it is an enclosing scope - namely the root module). Is the compiler using some sort of "last resort" idea for finding top level modules?

I thought that perhaps it has to do with how the use-statement processing is breaking the tie between the top-level N and the non-top-level N so I tried wrapping your entire program in a module. But when I do that, I get an error about a full path or explicit use being required. I tried wrapping just M and N in a module but that gives me an error about not finding M, which I can't explain right now, either. I am curious if this reminds you of something or if it looks like a bug to you. Anyway what I was hoping to see was - if there are no top level modules - if the behavior is consistent with the top level case.

module GG {

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc foo() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  }
}


}

module Main {
  proc main() {
    use GG;
    use M;
    use N;
    foo();  // this now calls the hijacked version
    bar();
  }
}
program.chpl:24: In function 'main':
program.chpl:27: error: Cannot find module or enum 'N'

@mppf
Copy link
Member

mppf commented Feb 3, 2022

I am not following why use N is finding M.N. In particular I would expect that the process of resolving the use statements would bring in M.N at a distance of two but that the top-level N would already be present at a distance of one (notionally, because it is an enclosing scope - namely the root module).

Oh, wait, it is at a distance of two as well, because one hop goes out of main, and the second hop is for the use M.

But that can't be what is going on because this variation on your program has the same behavior:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc foo() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  }
}

module Main {
  use M;
  use N;
  proc main() {
    foo();  // this now calls the hijacked version
    bar();
  }
}

@mppf
Copy link
Member

mppf commented Feb 3, 2022

I've opened #19167 which asks about the design of shadowing more broadly and would impact the behavior of programs in this issue.

@mppf
Copy link
Member

mppf commented Feb 8, 2022

I've opened #19219 about the two implicit/hidden/shadow scopes for use statements that was brought up in #14014 (comment)

@mppf
Copy link
Member

mppf commented Feb 10, 2022

I just wanted to bring up one thing from #14014 (comment)

This makes me wonder whether, more as Bryant's original intuition was saying, if a use could be resolved to multiple modules in the parent lexical scopes or through symbols made available by use, the programmer should get an "ambiguous modules, but I picked this one" warning or simply an "ambiguous modules" error, forcing them to resolve the conflict by filtering or renaming. This could still arguably be annoying (a change in a library could require a change in your code), but at least wouldn't lead to surprises. It has the downside of continuing to make module symbols more special than other types (which seemed to be a negative to Bryant in the original comment that spawned this issue?), but would treat them somewhat more like function resolution where multiple candidates would be considered if they were available. That said, I might say that resolving a use to an obviously local-to-scope symbol would not result in an ambiguity under assumptions that the module author would be aware of their own module's contents (or should be) and that it isn't a case that could result in hijacking.

I might be splitting hairs but I wouldn't think about it as making module symbols special and different from variables; rather I would think about it as defining how use works (in a slightly different way than what we have today).

@mppf
Copy link
Member

mppf commented Feb 12, 2022

Following up with a conversation I had with @bradcray - I think that the two of us are in agreement that we can make use search more scopes than a regular mention of a variable would in order to give an ambiguity error for the case in the OP and in #14014 (comment); and that this is something special about use rather than something special about module symbols.

I think the remaining question is - when the use gives that ambiguity error, should it give the error only if there are two ambiguous modules (or enums, since we can use enums), or also if there is a variable or function with the same name as a module? I am inclined to make the ambiguity check only consider things that could be used (modules and enums), but I would like to see what @bradcray thinks about that.

One argument for only considering modules/enums is that it keeps the error focused on cases that are actually problematic instead of making it an error that introduces noise - in particular if the "closest" definition of X for use X turns out to be a variable then we will get an error about that anyway; and if the closest definition is a module that shadows a variable then I don't see the need for an error. And, generally speaking, errors like this can make programs no longer compile when a library is updated; so it's in our interest to keep them focused.

@mppf
Copy link
Member

mppf commented Feb 14, 2022

A further related question - would it be OK to resolve the ambiguity with say import M; use M; (which would mean "use the M I just imported") or are such things making it more messy than necessary? Note that this interacts with #19160 - if the import doesn't introduce a shadow scope, maybe it is more reasonable for this pattern to work the way I've described.

@bradcray
Copy link
Member

I am inclined to make the ambiguity check only consider things that could be used (modules and enums)

Thinking about this question first made me think about the following case (which I see now you also brought up in your third paragraph):

module M {
}

module N {
  module M {
  }
}

module Main {
  use N;
  var M: int;
  use M;  // should refer to `var M: int;` and generate an error
}

where I'd expect that we should get an error (e.g., "Cannot use variable M of type int") rather than searching further for other possible modules/enums that they may have meant to refer to. I.e., we only do the "deep search" if the first appropriately-scoped thing we find is not a module/enum? Looks like you agree with this case as well.

So then, if the normal shadowing scoping rules only find a single symbol and it's a module or enum, then we'd search further for other modules/enums. Or possibly "only if it's a module" and/or "only search for other modules," since enums probably aren't capable of hijacking in a way that wouldn't simply cause compilation errors(?). Or am I being naive about that?

But I think that if we find multiple equally-visible symbols without doing the deep search, we'd need to generate an ambiguity warning rather than preferring the module for any reason. E.g., it seems like this case should report an ambiguity?

module P {
  module M {
  }
}

module N {
  var M: int;
}

module Main {
  use P;
  use N;
  use M;  // Ambiguous: `var M: int` and module `P.M` are equally close?
}

And then here's another case that I think would remain an error (probably not up for debate, but as long as I'm playing with these):

module M {
}

module N {
  var M: int;
}

module Main {
  use N;
  use M;  // Error because `var M: int;` is closer?
}

Basically, the case I want to avoid is that use M; will for any reason "prefer" a module M if there's an ambiguity with other non-module/enum symbols. In the same way that we wouldn't break an ambiguity for x in z = x + y; in favor of something that's not a module by arguing that "modules can't be added."

would it be OK to resolve the ambiguity with say import M; use M;

I think if this falls out through normal rules, that'd be fine (if not particularly clear), but I wouldn't do something special to make it do something heroic. For the example in #14014 (comment), if I have to change my code to resolve the ambiguity anyway, I'd just as soon change it to something like use M except N; and/or use M only N as MsN; which seems clearer (?).

@lydia-duncan
Copy link
Member

...

where I'd expect that we should get an error (e.g., "Cannot use variable M of type int") rather than searching further for other possible modules/enums that they may have meant to refer to. I.e., we only do the "deep search" if the first appropriately-scoped thing we find is not a module/enum? Looks like you agree with this case as well.

I also agree with Brad's position here.

So then, if the normal shadowing scoping rules only find a single symbol and it's a module or enum, then we'd search further for other modules/enums. Or possibly "only if it's a module" and/or "only search for other modules," since enums probably aren't capable of hijacking in a way that wouldn't simply cause compilation errors(?). Or am I being naive about that?

I'm not sure I fully understand this. If we've already found a single symbol that works, why are we looking further? Or is it only in certain circumstances to do with the "merged" scopes?

But I think that if we find multiple equally-visible symbols without doing the deep search, we'd need to generate an ambiguity warning rather than preferring the module for any reason. E.g., it seems like this case should report an ambiguity?

I agree with this and the rest of Brad's comment

@mppf
Copy link
Member

mppf commented Feb 14, 2022

Right, I was talking specifically about cases where a variable has the same name but wouldn't be the "closest" one that we pick.

module HasVarM {
  var M: int;
}
module Main {
  module M { }
  use HasVarM;
  use M;  // module M is closer, so we prefer that.
          // Should it be an error anyway, since M is arguably ambiguous?
}

@bradcray
Copy link
Member

I'm not sure I fully understand this. If we've already found a single symbol that works, why are we looking further?

This an approach Michael proposed for avoiding the silent hijacking type of situation brought up in: #14014 (comment) The proposal is to look harder for these types of ambiguities for use statements since they don't start from the top-level and so are subject to this type of hijacking (where the theory is that these ambiguities and situations probably don't come up much in 'real' code so wouldn't get triggered often, but would help prevent against surprises and bad actors).

mppf added a commit that referenced this issue Oct 5, 2023
Resolves Cray/chapel-private#5220
Resolves #19780
Resolves #19782

This PR introduces a warning for cases where symbol shadowing might be
surprising. In particular, the warning is focused on cases where a `use`
statement brings in a symbol that conflicts with a more obviously
available symbol in an outer scope. This warning is motivated by the
example in #14014 and the the hijacking example in
#14014 (comment).
Additionally, in review of PR #19306, it was requested to make a warning
about the case that a `public use` symbol shadows a `private use` one
and this PR implements that request.

The new warning only applies to scope lookups for the innermost match;
that means that the new warning does not apply to functions and calls.

What is the logic for the warning?
* when a symbol is available through a bulk import from a `use`
statement, warn if it's defined in another available scope (another
shadow scope or outer scope) except:
* do not warn for locally defined symbols that shadow a symbol from a
private use (shadow scope 1 or 2)
* do not warn for conflicts between a name brought in by a private use
and the module name brought in by the same private use

The warning is implemented to consider all parent scopes and the names
of toplevel modules, if we are considering a `use`/`import`. Considering
the parent scopes seems natural to me but the toplevel module part is
necessary to give the warning in the case of
#14014 (comment)
. We considered avoiding warning in the case that the toplevel module in
question is in a different file but opted not to do that.

Since the warning is (for the most part) a superset of the warning for
hidden formals added in PR #21614, this PR removes the hidden formal
warning.
* The case where a function is found through a `use` statement but has
the same name as a formal is no longer warned about because the new
warning does not apply to function lookups
(test/functions/iterators/recursive/recursive-iter-findfilesfailure)
* The case of a local variable that shadows something from a `use`
statement that in turn shadows a formal is no longer warned about, by
the design of the new warning
(test/modules/diten/testIfModuleShadowsLocal)

Future Work:
 * consider a similar error for function calls

Reviewed by @DanilaFe and @lydia-duncan - thanks!

- [x] full comm=none testing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants