Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

specify behavior of fromEnvironment #304

Closed
sigmundch opened this issue Apr 8, 2019 · 31 comments
Closed

specify behavior of fromEnvironment #304

sigmundch opened this issue Apr 8, 2019 · 31 comments
Assignees

Comments

@sigmundch
Copy link
Member

We'd like to have a source of truth to define what is required for fromEnvironment constructors and what is the expected behavior we will support.

Since modular compilation was not even in the picture at the time the feature was introduced, we are now finding a lot of inconsistencies across the tools. The recent changes that move constant evaluation to the CFE also highlight the need to define a coherent story going forward.

Some questions that have recently come up:

  • are both const and new supported?
  • when is evaluation supposed to take place (compile-time, link-time, runtime)?
  • if evaluation is at compile-time
    • how is constant consistency maintained across modular builds (are environment definitions stored so they are always consistent at the app level, or are they substituted eagerly and then forgotten?)
@eernstg
Copy link
Member

eernstg commented Apr 9, 2019

Makes sense! I'll label this issue as 'specification' at first. There might be some elements which would justify a language design perspective, but I suspect (and hope) that we can consider it to be more like a clarification. The decision may end up being part of the language specification or the core library documentation, but that can be decided independently of the actual contents, so nothing is implied at this point.

@leafpetersen
Copy link
Member

We'd like to get closure on this. My proposal is the following:

  • are both const and new supported?

Yes

  • when is evaluation supposed to take place (compile-time, link-time, runtime)?

compile-time

  • if evaluation is at compile-time

    • how is constant consistency maintained across modular builds (are environment definitions stored so they are always consistent at the app level, or are they substituted eagerly and then forgotten?)

Consistency is checked at link time. That is, if two separately compiled modules are linked together (whether before runtime, or at runtime) there is a check that they have consistent definitions for fromEnvironment).

@lrhn @munificent @eernstg Are we in agreement on this? Issues I'm not seeing?

@eernstg
Copy link
Member

eernstg commented May 6, 2019

@leafpetersen wrote:

[evaluation occurs at] compile-time ..
Consistency is checked at link time

Making the choice at compile-time and then checking consistency at link time is what I've suggested we should do whenever this came up, for simplicity. So in that sense I agree.

But it is my impression that we are currently using a more complex approach, in particular in the common front end: I believe it has a notion of constant expressions that are unevaluated at compile time. Such expressions are evaluated to such an extent that there is no need for a regular binding environment (so we do not need to look up the declaration of an identifier in order to finish the evaluation), the evaluation can be completed based on nothing more than the values of constant expressions of the form e.fromEnvironment(...).

Presumably, this machinery would not be needed if our tools were able to rely on having the values of objects fromEnvironment at compile-time, which means that we have been aiming at a more flexible model than the one you mention.

@askeksa-google, can you provide more detail on this?

@lrhn
Copy link
Member

lrhn commented May 6, 2019

  • Both const and new supported: I'd be fine with only allowing const, but it is a breaking change.
  • When does evaluation happen: If we are completely consistent, then it should not matter. Our constant expressions are exactly expressions where it doesn't matter when they are evaluated. Since some constant evaluation happens at compile-time, and conditional imports are based on the same environment and definitely happens at compile-time, that would mean that we need to retain the compile-time environment at run-time (at least if there is any new ...fromEnvironment invocation with a dynamic argument). So, in short, I think the answer to this should be decided by the tool needs, not the language/library team. There is no "link time" in the language (and the definition of "compile time" is really just that a program containing a compile-time error should not start running, there is no requirement that constant expressions are evaluated at compile time apart from that, so if a constant expression can provably not fail, it can be evaluated at any time and retain the specified semantics).
  • Consistency: We do need consistency, as if there is only one environment which is used both during compile-time checks and conditional imports, and at run-time. If we retain compile-time constants in the intermediate formats, then I'd be fine with linking checking whether there are conflicts and breaking if there is. It might be sufficient to retain the values that were actually used (no use, no conflict), and only the full environment if a new .. invocation is in the compilation unit. We should not run code that doesn't correspond to a single full program with a single available environment.

Retaining the full environment will increase the files. It will not work well for DDC where you compile files individually and only link them by loading the JS code into the same JS VM. We don't want to do environment checking there. Perhaps they can just retain a sufficiently large hash of the environment and check that (again as long as there is no new operations).

If we disallow new, then I think a lot of the complications get smaller. A hash might be sufficient in all cases, and still only store that in compilation units that actually depend on the environment.

@askeksa-google
Copy link

Some thoughts about supporting new:

  • It is currently not supported in dart2js and DDC, and it seems not obvious how to supply a run-time environment in either case. We could specify it at compile-time (though retaining that in the compilation has complications / overhead), but then const could just as well be used.
  • In JIT mode, const and new are equivalent.
  • Our API documentation states (since dart-lang/sdk@7f7c681) that only const is supported.
  • The VM currently allows the embedder to supply a callback for specifying environment values. Thus, the VM does not have a fixed (and not necessarily finite) environment.
  • The Isolate.spawnUri method has an environment parameter to specify the environment for the new isolate. This feature is currently unimplemented. All isolates in the same VM use the same environment (based on commandline options and the embedder callback).

It seems new is only really relevant for the compile-to-kernel + run-in-vm case. Here, it seems mainly useful if the run-time values can be different from the compile-time values, but this goes against the desire for consistency.

@leafpetersen
Copy link
Member

leafpetersen commented May 6, 2019

@lrhn I don't really understand your reply, I think maybe we're talking past each other?

The key questions as I understand it are the following. Assume the compiler is run with -DPARAM=0 in module A, and the VM is run on the resulting kernel file with -DPARAM=2

  1. Should it be made legal to call the fromEnvironment constructors with new?
  2. Given a String.fromEnvironment call in module A with "PARAM":
    • is it guaranteed to produce 0
    • guaranteed to produce 2
    • does the result depend on whether you call it with "new" vs "const"?
    • It the result tool defined?
  3. Given a second module B, compiled with -DPARAM=1 and linked with module A
    • Is this an error?
    • If not, do invocations in the different modules see different results?

My answers are as follow:

  1. I'm agnostic, but don't see a need for the breaking change on the VM
  2. I think this should produce 0, always (new or const)
  3. I think this should be an error

What are your answers?

Retaining the full environment will increase the files. It will not work well for DDC where you compile files individually and only link them by loading the JS code into the same JS VM. We don't want to do environment checking there.

I'm not sure I see this. Why is the environment large? Most invocations don't pass any -D arguments at all?

@vsmenon Do you have DDC implementation concerns here?

@lrhn
Copy link
Member

lrhn commented May 6, 2019

@leafpetersen
My answers are:
0. I'm agnostic as well. I believe some tools may get an advantage from "no".

  1. A const invocation should produce 0. If we do allow passing flags at run-time, they should be honored and produce 2 if invoked using new. I'd prefer to not allow it and produce 0 for new as well. Not a strong use-case, so I'm not particularly attached to either.
  2. If module B uses the value, I'll make it a compile-time error, precisely so that invocations in different modules do not see different values. If the value isn't used, the program is indistinguishable from one which was provided a consistent environment.

For modular compilation, any environment will be in each module. I also expect dart.library.xyz variables to be represented somehow. Maybe we'll just encode that as a "platform", but I think that's not a future safe design (it requires us to enumerate platforms in the format somehow).

@leafpetersen
Copy link
Member

OK, as far as I can tell we have consensus.

  • Using new produces the same result as using const, always
  • We may not initially have support for using new on all platforms, that's ok
  • Using new or const uses the value that was provided at the point that the source code was processed (that is, compile time if using AOT/snapshot, etc, or runtime if running directly from source)
  • When multiple compilation artifacts are processed ("linked") it is an error if they have inconsistent environments.
    • There is a small question here to be resolved as to exactly what it means to be inconsistent

@leafpetersen
Copy link
Member

2. If module B uses the value, I'll make it a compile-time error, precisely so that invocations in different modules do not see different values. If the value isn't used, the program is indistinguishable from one which was provided a consistent environment.

I'm not sure I think this is a good idea. What does it mean for a value to be used? Mentioned? Evaluated? If the latter, do you mean even at runtime (using "new")? What about short-circuiting in "const"? If the former, can we always tell which ones are mentioned? Certainly doesn't seem like that for "new".

Even ignoring the issue of defining this well, this feels fragile to me. I have a library that's working "fine" in all of my tests (because coverage doesn't hit the path that evaluates a particular fromEnvironment), but as soon as someone uses something that hits the code path with the fromEnvironment they start getting link time errors. I'd rather know up front that something has inconsistent defines.

@lrhn
Copy link
Member

lrhn commented May 8, 2019

It's true that if we allow new, then we can't possibly know which environment declarations are used. We'll have to consider every value used if the code contains a new String.fromEnvironment (including every unspecified name).
For a non-taken branch of a potentially constant evaluation, we know at evaluation point whether the value of the declaration affects run-time semantics. If the branch isn't taken, then it doesn't matter what the value would have been.

I guess I'm saying that I only want to make a declaration conflict into a compile-time error if the conflicting value actually makes a (potential) difference.
Maybe that's a bad idea, and failing early on any compilation discrepancy is the way to go, but dart.library.io is such an environment declaration, so pre-linking artifacts would not be shareable between platforms, even if they are entirely platform independent.

If a compilation artifact only remembers the values that it has actually depended on, then anything that is entirely compilation environment independent can be reused freely (and it doesn't need to keep the extra data around).

@kmillikin
Copy link
Contributor

  • Using new or const uses the value that was provided at the point that the source code was processed (that is, compile time if using AOT/snapshot, etc, or runtime if running directly from source)

I would amend this to say that "using new or const takes the value that was provided at the point that a value was provided". We can leave it up to the tools to decide how an environment is provided and to make their own performance and usability tradeoffs.

If that formulation is too vague we could try something like "using new or const takes the value that was provided at the point that the code was processed for the purpose of evaluating calls to const {bool, int, String}.fromEnvironment". (Please don't require processing the source code for this purpose!)

@leafpetersen
Copy link
Member

@kmillikin I read both of your formulations as permitting the current VM behavior, which I don't think we want. In particular, I think we want:

  • If the user provides a define at any point in the compile step, it is the case that either:
    • an error is thrown because it conflicts with a previously provided define, either in the current or some other module (this error may not show up until link or runtime in the latter case)
    • the value is ignored because the value was already defined in a previous step
    • the value is used for the evaluation of any fromEnvironment calls, new or const, globally in the final program

@lrhn

We'll have to consider every value used if the code contains a new String.fromEnvironment (including every unspecified name).

I don't understand the _unspecified _ name bit here, can you elaborate?

@leafpetersen
Copy link
Member

leafpetersen commented May 15, 2019

[EDIT this is wrong, see correction in next comment]

To address the question @lrhn raises, would the following work?

  • if a module is compiled without defining a value for FOO
    • If constant evaluation results in evaluating a fromEnvironment of FOO to value V
      • then this is considered equivalent to an implicit -DFOO=V for that module, with all of the implied errors as I described above.
    • Otherwise, the module is allowed to be linked with any other module that has any define for FOO
  • At link time, global consistency of the define map is checked

This should allow modules that don't depend on FOO in any way to be compiled once and linked into any program that provides an otherwise consistent definition of FOO.

@leafpetersen
Copy link
Member

I think my suggestion in the previous comment was wrong, it should be as follows:

A symbol may be in the following states in the map:

  • Unconstrained (not defined, and not explicitly UNDEFINED)

  • UNDEFINED, meaning that constants have been evaluated with no define for that symbol

  • DEFINED to V, meaning that a define was given for that symbol with value V

  • if a module is compiled without defining a value for FOO

    • If constant evaluation results in evaluating a fromEnvironment of FOO to value V
      • then the environment must mark FOO as UNDEFINED for that module, and the module can only be linked with other modules that either do not constrain FOO, or which mark it as UNDEFINED
    • Otherwise, the module is allowed to be linked with any other module that has any define for FOO (including UNDEFINED), or no define at all
  • At link time, global consistency of the define map is checked

@leafpetersen
Copy link
Member

Ok, here's my attempt to unify this based on discussion.

Every compilation unit is associated with an environment.

An environment is a mapping from strings to environment entries.

An environment entry is either a value (a string) or a special value UNDEFINED

Any tool that processes a compilation unit is allowed to add new entries to the environment. No tool may change an existing environment entry.

When a fromEnvironment constructor is evaluated (whether const or new), the string argument is looked up in the environment. If it is present, the value is used. If it is not, then an environment entry of UNDEFINED is added for that string. (This last step can be elided when evaluating "new fromEnvironment" constructors if otherwise no further modifications of the environment are permitted at runtime - the presence of the UNDEFINED sentinel is only used to check that inconsistent defines are not used).

When two or more compilation units are linked (whether as part of a compilation step, or at runtime), their environments are checked for consistency, and it is an error if they are not consistent.

Two environments E1 and E2 are consistent iff for every key S in the intersection of their domains E1 and E2 map S to the same environment entry.

Linking is only defined for consistent environments. The result of linking two compilation units with consistent environments E1 and E2 has an environment E3 where the domain of E3 is the union of the domains of E1 and E2, and where E3 maps each key S in its domain to the value given for that key in E1 if present, else the value given in E2.

@lrhn
Copy link
Member

lrhn commented May 16, 2019

I don't understand the _unspecified _ name bit here, can you elaborate?

If a library uses new String.fromEnvironment(dynamicString), then we have to remember the value of all environment variables, so that it won't conflict with another library. That includes even the environment variables that have not been supplied. If this library is compiled with no -Dfoo=... declaration, and another library in the same program is compiled with a -Dfoo=bar declaration, then we have a conflict.

I guess that actually applies to const String.fromEnvironment("foo") too. If the current library defines -Dfoo=bar and another library is compiled with no foo declaration, that is a conflict as well, so there is nothing special here.

(As usual, we can choose to allow conflicts that are not detectable because one of the libraries does not depend on the value - there exists a single consistent definition environment that the entire program could be compiled in to get the same semantics).

@lrhn
Copy link
Member

lrhn commented May 16, 2019

I think the unification of the constant environments is good, but it does not address new String.fromEnvironment(somethingComputed). If you don't know the key string at compile-time, you need to assume every key is in use. (If you do know the string at compile-time, you should just use const).

So, maybe:


An environment is a mapping from key strings, or a special key ALL, to environment entries.

An environment entry is either a value (a string) or a special value UNDEFINED

Any tool that processes a compilation unit is allowed to add new entries to the environment if no ALL key is present. No tool may change an existing environment entry.

When a fromEnvironment constructor is evaluated (whether const or new) with a constant string as argument, the string argument is looked up in the environment. If it is present, the value is used. If it is not, the default value of the constructor invocation is used, and if no ALL key is present, then an environment entry of UNDEFINED is added for that string. (This last step can be elided when evaluating "new fromEnvironment" constructors if otherwise no further modifications of the environment are permitted at runtime - the presence of the UNDEFINED sentinel is only used to check that inconsistent defines are not used).

Whena fromEnvironment constructor is evaluated using new and the argument is not a constant string, if the ALL key is not present in the environment, add an ALL key with an UNDEFINED value.

When two or more compilation units are linked (whether as part of a compilation step, or at runtime), their environments are checked for consistency, and it is an error if they are not consistent.

Two environments E1 and E2 are consistent iff

  • For each key string in E1 with a non-UNDEFINED entry,
    • either E2 has the same entry for that string,
    • or E2 has no entry for that string and no ALL entry
  • and vice versa.

Linking is only defined for consistent environments. The result of linking two compilation units with consistent environments E1 and E2 has an environment E3 where the domain of E3 is the union of the domains of E1 and E2, and where E3 maps each key S in its domain to the value given for that key in E1 if present, else the value given in E2.


I guess this could be expressed as having two kinds of environments: open and closed.
An open environment defaults to unspecified for all entries not explicitly mentioned, and you can have either values or UNDEFINED entries for keys.
A closed environment defaults to UNDEFINED for all keys not explicitly mentioned, and you only need to mention string entries.
You close an environment for a library if it contains any new ...fromEnvironment(unknownString).

@leafpetersen
Copy link
Member

@lrhn

I think the unification of the constant environments is good, but it does not address new String.fromEnvironment(somethingComputed). If you don't know the key string at compile-time, you need to assume every key is in use. (If you do know the string at compile-time, you should just use const).

I don't understand this at all. What are you trying to achieve here? I am trying to achieve the following properties:

  1. const invocations of fromEnvironment can be (but are not required to be) evaluated at compile and/or link time
  2. Every invocation (const or new) of a given fromEnvironment constructor with the same string argument produces the same result
  3. Code that does not have any const fromEnvironment invocations with string S is not prevented from producing compilation artifacts that are linkable with compilation artifacts that were produced with any definition for S

Any new fromEnvironment invocations don't affect the compilation artifacts at all: they are evaluated at runtime.

So to summarize:

I claim that my proposal guarantees that at runtime there is a single global environment, and that the result of evaluating any new fromEnvironment call will produce the same result that any evaluation of a const fromEnvironment call did during compilation of any of the modules.

Do you think my claim is wrong (and if so, what is the counter-example)?

Or are you trying to ensure some other property than what I describe above, and if so what is it?

@lrhn
Copy link
Member

lrhn commented May 19, 2019

I want to achieve two things on top of the property you list:

  • If a library contains String.fromEnvironment("foo"), it will evaluate to the value (or lack of value) that was provided when that library was compiled. If there is no foo definition when compiling the library, and it is linked with another compilation unit which was compiled with -Dfoo=bar, I do not want to allow the first library to get "bar" as result.
  • And I was trying to allow a library compiled with -Dfoo=bar, but which never uses String.fromEnvironment or friends, to be linkable with another library with a different declaration for foo.

The latter may just be too permissive and friendly, and not needed. It would allow compilation artifacts to only retain environment declarations that are actually (potentially) used, instead of retaining the entire environment.
The former seems like something I'd want, but without the latter, it would basically mean that all compilation units must be compiled with the exact same environment. Which is fine, they probably will be.

@leafpetersen
Copy link
Member

@lrhn Note that nothing stops a library from calling a function in a different library which does the String.fromEnvironment('foo'), so if we allow any kind of difference in the environments, a module can always observe it. That is, even if we special case String.fromEnvironment to notice that module A calls it, and use that to lock down the environment, we can't handle the case that module A calls checkFoo from module B instead, which just does the same thing.

I'm fine with just disallowing any differences in the environments - @sigmundch and @kmillikin both seemed to feel that our ability to share compilation artifacts would be low anyway. If we change our mind later, it's a non-breaking change to relax the restrictions.

@leafpetersen
Copy link
Member

Ok, per offline discussion, we will go with the restrictive version that requires environments to be the same. If in the future it proves important to share compilation artifacts, we can consider relaxing this as a non-breaking change. The short version of the spec then is that environments have to be the same when linked. If we're happy requiring that all defines be given before constant evaluation happens, then I think that's enough. The language below allows for tools to add more defines at any point, so long as they are done consistently across all modules, and so long as they are observed to be the same at every invocation. If no defines are allowed to be added after the stage of compilation in which constant evaluation is done, then the UNDEFINED mechanism can be ignored, and environment consistency degenerates to equality.

Every compilation unit is associated with an environment.

An environment is a mapping from strings to environment entries.

An environment entry is either a value (a string) or a special value UNDEFINED.

Any tool that processes a compilation unit is allowed to add new entries to the environment. No tool may change an existing environment entry.

When a fromEnvironment constructor is evaluated (whether const or new), the string argument is looked up in the environment. If it is present, the value is used. If it is not, then an environment entry of UNDEFINED is added for that string. (This last step can be elided if no further modifications to the environment area allowed, such as at runtime - the presence of the UNDEFINED sentinel is only used to check that a define for a given string S is not added after a const fromEnvironment constructor with argument S has been evaluated).

When two or more compilation units are linked (whether as part of a compilation step, or at runtime), their environments are checked for consistency, and it is an error if they are not consistent.

Two environments E1 and E2 are consistent iff for every key S in the union of their domains:

  • either E1 maps S to the same entry as E2
  • or E1 maps S to UNDEFINED and E2 has no entry for S
  • or E2 maps S to UNDEFINED and E1 has no entry for S

Linking is only defined for consistent environments. The result of linking two compilation units with consistent environments E1 and E2 has an environment E3 such that E3 maps every key S in the union of the domains of E1 and E2 to the value given by E1 if any, and otherwise to that given by E2.

@lrhn
Copy link
Member

lrhn commented May 21, 2019

Conditional imports are based on the environment too, so the part of the environment controlling conditional imports must be available before imports can be resolved.
That part only contains defines with names staring with dart.library..
Tools may receive this environment implicitly as a "platform" setting rather than explicit key/value pairs.

We used to allow such names to be overridden by user defines, but we should probably stop doing that and reserve all names starting with dart. as a system environment that cannot be overridden by user flags.

@askeksa-google
Copy link

Shouldn't the consistency rules rather be (to fulfill item 3 in #304 (comment)):

  • either E1 maps S to the same entry as E2
  • or E1 has no entry for S
  • or E2 has no entry for S

That is, if there is no entry for a key S in one environment (i.e. it is neither defined nor evaluated), it OK for S to map to anything in the other environment(s)?

@leafpetersen
Copy link
Member

Shouldn't the consistency rules rather be (to fulfill item 3 in #304 (comment)):

No. My final version here is explicitly giving up on item 3 from the comment you reference. So the proposed version essentially requires that the environments are exactly the same. The only reason it's not actually specified as exact equality is that I'm still accounting for the possibility that tools may provide additional defines at different points in the compilation pipeline, and I want the spec to enforce that the resulting view is consistent. Hence the UNDEFINED mechanism for recording the fact that a given string key has been observed to be in the unset state, and can not be set to something by a later compilation stage.

@leafpetersen
Copy link
Member

@eernstg has this been incorporated into the spec yet?

@lrhn
Copy link
Member

lrhn commented Feb 12, 2021

I'm not sure this belongs in the language specification since it talks about the compilation process. The language specification does not mention compilation at all, it just gives run-time semantics, and some compile-time errors.

I guess it could be possible to say that it's a compile-time error if the string associated with a key in the "Dart environment" is not the same between two calls to a .fromEnvironment/hasEnvironment constructor, but I'd prefer to write it in the platform library documentation that a call to String.fromEnvironment must always behave as if it has the same underlying value or absence of value (and the other functions must agree).

Let me try to write that.

@eernstg
Copy link
Member

eernstg commented Feb 12, 2021

has this been incorporated into the spec yet?

No.

I'm not sure this belongs in the language specification

One part that we usually have in the language specification is whether or not an expression is constant, but String.fromEnvironment, bool.fromEnvironment, and bool.hasEnvironment are declared as constant constructors in those classes, which is sufficient to have them covered (as <constObjectExpression>).

The other property that we'd need to specify in the language specification is if it is an error (run-time or compile-time) to invoke these constructors at run time. But the consensus seems to be that this is not an error.

So it seems to fit with current practice that this is specified in the system library documentation.

@lrhn
Copy link
Member

lrhn commented Feb 12, 2021

Possible documentation update for the methods: https://dart-review.googlesource.com/c/sdk/+/184467

dart-bot pushed a commit to dart-lang/sdk that referenced this issue Feb 17, 2021
…ve consistently.

AAll calls to those constructors must behave as if there is a single consistent
environment backing them.
This will force compilers to reject programs where that wouldn't be the case,
which is possible if doing modular compilation with different settings.

If a library doesn't check the environment, it doesn't matter whether the
environment was declared differently, the requirement is only that when actually checking,
the values must be consistent.

Bug: dart-lang/language#304
Change-Id: Ie52ecc3ea49ed87297fab92e5e7bb6d9f96a495d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/184467
Commit-Queue: Lasse R.H. Nielsen <lrn@google.com>
Reviewed-by: Sigmund Cherem <sigmund@google.com>
Reviewed-by: Leaf Petersen <leafp@google.com>
Reviewed-by: Erik Ernst <eernst@google.com>
@leafpetersen
Copy link
Member

Can we consider this done?

@eernstg
Copy link
Member

eernstg commented Feb 19, 2021

I think so. @lrhn, do you have any further plans in this area?

@lrhn
Copy link
Member

lrhn commented Feb 19, 2021

I think the current text defines the behavior we want: We don't care how you compile, but the combined program must behave as if there is one consistent environment, both compile-time and run-time (which might not match how the VM currently works).
If you can't do that, you'll have to abandon compilation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants