
Escaped reserved words #271

Open
munificent opened this issue Mar 15, 2019 · 15 comments
Labels
feature Proposed language feature that solves one or more problems


@munificent
Member

This is a proposed solution for #270:

Evolving a programming language is always challenging in a number of ways. Users often want new features, but those features need syntax and adding syntax without breaking existing programs is difficult.

In particular, it is impossible to add new reserved words to the language. A reserved word, by definition, cannot be used by users as an identifier. This means that if, say, Dart 2.x turns foo into a reserved word, then any existing program using foo as a variable name, type name, member name, import prefix, etc. breaks with a syntax error.

The typical way Dart and other languages avoid this is by never adding new reserved words. Instead, they add "contextual keywords" or "built-in identifiers". These are identifiers that behave like keywords in some contexts but can be used by users as normal identifiers in other places.

For example, show behaves like a keyword when used after an import or export directive:

import 'foo.dart' show something;

But it can also be used as an identifier:

class Widget {
  void show() { print("I am visible now."); }
}

In addition to not breaking existing programs, this has another advantage:

  • Natural-sounding words can be used without taking them away from users. It would be a shame if a language designed for user interfaces didn't let users create methods named show and hide. It would be annoying if a language frequently used for web apps couldn't use get. It would be really strange if a language that had a Set type in its core library didn't let you name a variable set.

But it carries a number of disadvantages:

  • Syntax highlighting is harder. A tool doesn't know whether to treat await as a keyword unless it tracks whether or not it is currently inside an async function. Simple tools like syntax highlighters rarely carry that context, so writing correct highlighters for Dart is challenging.

    In practice, most highlighters simply always highlight contextual keywords as if they were reserved words. That in turn encourages users to believe these are reserved, which then causes confusion when they stumble into code that uses one as an identifier.

  • Error recovery is harder. Contextual keywords are almost always used for their keyword behavior and rarely as identifiers. When a user inadvertently uses the keyword in a place where it doesn't behave like a keyword, they usually want error messaging that tells them why it's not a keyword. Using await in a function that you forgot to mark async is the classic example.

    But because the keyword could technically be used as an identifier there too, it's hard for tools like IDEs to know what fix to suggest. This means folks working on IDEs spend a lot more effort to deliver decent error messages than they would need to if the keyword were completely reserved.

  • Defining the context such that the keyword's use isn't ambiguous is harder. The nice thing about a reserved word is that you know what it means regardless of where it appears in the user's code. But when the same lexeme can be used as both a keyword and an identifier, the grammar needs to be carefully designed to ensure those two cases don't collide.

    For example, one annoyance of await is that it has low precedence and doesn't chain nicely in method calls. Hard-to-read code like this is common:

    await (await (await foo).bar).baz;

    An obvious solution would be a postfix await syntax:

    foo.await.bar.await.baz.await;

    But this doesn't work since await here appears in a place where it could also be an identifier.

    This makes it harder to evolve the language since any new syntax has to avoid these ambiguous cases.

  • Making semicolons optional is harder. The ambiguity problem becomes acute when we try to make semicolons optional. For that to work gracefully, we need to ignore newlines in places where they obviously aren't meaningful, but "obvious" gets murky around contextual keywords.

    Consider:

    import 'foo.dart'
    hide bar

    Without an explicit semicolon separating the import directive from the next declaration, we have to decide whether to treat this as:

    import 'foo.dart' hide bar;

    Or:

    import 'foo.dart';
    hide bar; // A variable "bar" of type "hide".

    Because hide is a contextual keyword, both are plausible. There are similar ambiguities around Function, await, async, etc.

  • It's just harder to understand. Many programmers don't know "contextual keywords" even exist. They have a simpler mental model that a given name is either completely reserved and owned by the language or not meaningful to the language at all. They read code assuming this mental model, which works correctly most of the time, and then are very confused when they run into places where it breaks down.

    Having contextual keywords increases the cognitive load of the language, especially given that Dart actually has several categories of contextual keywords, each with their own special rules.

In other words, contextual keywords make the language bigger, more confusing, and harder to change. They technically preserve compatibility, but with a high tax.

A Model for Evolving the Language

In the past, most programming languages evolved with a policy of 100% backwards compatibility. That's great for, well, compatibility, but the trade-off is that the language gets monotonically more complex over time.

The increasing complexity means people now avoid C++ completely because it's simply too large for a new user to learn. If you didn't get on the C++ train a decade ago, it's very difficult to catch up. (See 1, 2.)

The other problem is that language features have to be compromised from their ideal form in the name of compatibility. For example, if we wanted to add non-nullable types in a non-breaking way, then we'd have to treat every existing type annotation as nullable, since that's what they mean today.

In order to get a non-nullable type, you'd need some explicit marker like !. But that's the wrong default. Empirical analysis shows something like 90% of variables are non-nullable, so forcing users to opt in to non-nullability on the majority of their types is a strictly worse feature.

To avoid that, Dart, Rust, and other languages are moving to a model where compatibility is preserved through a combination of opting in to new features and migration tooling. Requiring an opt in means existing code continues to work as it does today.

At the point that you opt in, you can also run a tool that changes your existing code to get it to a form that makes the most sense in the context of the new feature. With non-nullable types, that lets us make non-nullable the default, leading to cleaner code post-migration without piles of pointless !. It's even theoretically possible to have migrations that purely remove deprecated features, giving us a way to simplify the language over time by removing functionality that no longer carries its weight.

This model generally works well for syntax changes, but one area where it breaks down is when the migration tooling would change the public API of a library. At that point, a user can't freely opt in to the change because it forces them to break their existing users.

An example that gets to the point of this proposal is reserving a new word. Let's say we want to turn async into a fully reserved word. We could write a tool that finds any existing uses of async as an identifier and rewrites them to something like myAsync. The resulting code no longer has syntax errors. But if those identifiers are public members, any library that imports the migrated one and uses them is broken. In other words, migration isn't encapsulated.

Escaping Reserved Words

This proposal solves that for reserved words by providing a syntax that lets you explicitly use any reserved word (new or old) as an identifier. We borrow a feature from Swift and allow backticks around any reserved word or identifier:

var `for` = "a variable named 'for'";

This provides two main benefits:

  • We can add new reserved words. In order to do so, we require an opt in and then ship a tool that finds any existing uses of the keyword and wraps them in backticks. This gets the code back to its original meaning without changing its public API.

    (Eventually, a library author will probably want to stop using the now-reserved word in their API, but they can do that at their discretion.)

  • We can gracefully interop with other languages and systems that use Dart reserved words as identifiers. When generating Dart APIs that interop with JavaScript, protobufs, JSON, etc. you can provide access to identifiers in those other systems even if they happen to be a reserved word.

We have a general goal of making the language easier to evolve, and this feature would give us one small mechanism to let us evolve the set of keywords in a mechanically-migratable way.

@munificent munificent added the feature Proposed language feature that solves one or more problems label Mar 15, 2019
@leafpetersen
Member

To clarify, is the model that:

  • Uses of the reserved word need to be escaped as well
class A {
  static var `for` = "a variable named 'for'";
}
A.`for`.length;
  • But in un-opted in code, you could still reference the opted-in API without escaping?
library not_opted_in;
import 'opted_in.dart';
A.for.length; // still works

@munificent
Member Author

munificent commented Mar 16, 2019

That's right. There's a level of separation here: this particular proposal is independent of any sort of opt in, migration, or new reserved words. It just says that if you want to, say, name a variable "for", here's the syntax to do it.

That in turn happens to be a nice affordance, because if we want to reserve a new word, we can do so by:

  1. Define an opt in to use the new reserved word as a keyword.
  2. Provide a tool to migrate existing code such that it can be opted in without breaking it or any code that calls it.

This feature makes it possible to implement step 2 because any existing uses of the new keyword can simply be wrapped in backticks and everything continues to work as before.
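Step 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual migration tool: `Token`, its `usedAsIdentifier` flag, and `escapeNewKeyword` are hypothetical names, and the sketch assumes the existing parser has already decided which occurrences of the word are identifier uses (the hard part in practice). It also rejoins tokens with single spaces for simplicity, where a real tool would edit source offsets in place to preserve formatting. The backticks it emits are the syntax proposed in this issue.

```dart
/// Hypothetical, simplified token model. A real tool would get this from the
/// Dart parser; here each token already records whether the pre-migration
/// grammar parsed it as an identifier.
class Token {
  final String lexeme;
  final bool usedAsIdentifier;
  const Token(this.lexeme, {this.usedAsIdentifier = false});
}

/// Wraps every identifier use of [newKeyword] in backticks (the proposed
/// escape syntax) so the code keeps its meaning and public API once the word
/// becomes reserved. All other tokens pass through unchanged.
String escapeNewKeyword(List<Token> tokens, String newKeyword) => tokens
    .map((t) => t.usedAsIdentifier && t.lexeme == newKeyword
        ? '`${t.lexeme}`'
        : t.lexeme)
    .join(' ');

void main() {
  // Tokens for `var async = 1;`, where `async` is used as an identifier.
  final tokens = [
    Token('var'),
    Token('async', usedAsIdentifier: true),
    Token('='),
    Token('1'),
    Token(';'),
  ];
  print(escapeNewKeyword(tokens, 'async')); // var `async` = 1 ;
}
```

Because only identifier uses are escaped, a keyword use such as `Future<void> f() async {}` would be left alone.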

@lrhn
Member

lrhn commented Mar 18, 2019

There is precedent (ES5 and later JavaScript) for allowing unquoted reserved words after a '.'.
We probably won't do that because it would prevent us from doing postfix await as foo.await.

Other options for escaping could be:

var \escaped = o.\escaped;
var `escaped = o.`escaped;
var #escaped = o.#escaped;

The backslash works like JavaScript before the above mentioned feature, where o.\if worked in most implementations (contrary to what the spec actually said, but it was rather vague).
The ` is similar to Scheme-like symbol escapes, but probably not a good match for Dart.
The # is reminiscent of a symbol, which is already a way to mention a source name.
Since the following is always an identifier, there is no problem not having an end delimiter. Using a symbol literal would keep us inside the existing grammar, just allowing identifier | symbolLiteral in some places (we'd obviously have to check that we won't introduce ambiguities that way).

Using something else would keep `-quoted strings available as a future option in the grammar (single-` prefix quoting would still rule that out).

@mit-mit
Member

mit-mit commented Mar 18, 2019

Just my 5 cents: I found ` familiar because of its use in Markdown.

@yjbanov

yjbanov commented Mar 18, 2019

@mit-mit markdown familiarity might actually be an issue. Backticks appearing in code snippets inside dartdocs (which are markdown) could interfere with the syntax.

@yjbanov

yjbanov commented Mar 18, 2019

Having said that, I do like the backticks in code.

@munificent
Member Author

munificent commented Mar 18, 2019

Since the following is always an identifier, there is no problem not having an end delimiter.

A minor point, but if we ever want to use this as a feature to enable interop with other languages that have different identifier lexical rules, having an end delimiter might be useful:

dom.css.`background-color` = red;

Using a symbol literal would keep us inside the existing grammar, just allowing identifier | symbolLiteral in some places (we'd obviously have to check that we won't introduce ambiguities that way).

If you want to use an escaped identifier to call a getter on the implicit this, then using symbol literal syntax collides with using a symbol literal as an expression:

class Foo {
  get #for => "weird, but whatever";
  baz() {
    #for // Symbol literal or getter call on this?
  }
}

@mit-mit markdown familiarity might actually be an issue. Backticks appearing in code snippets inside dartdocs (which are markdown) could interfere with the syntax.

That's a good point, though I imagine cases where you want some inline code in Markdown to reference an escaped Dart identifier are rare. When you do need to, Markdown has a way to escape the backticks inside inline code.

@lrhn
Member

lrhn commented Mar 18, 2019

If we allow "quoted identifiers" that are not just identifiers or reserved words, then we need something more, and an end quote character is a good idea.

If we do that, would we want to do it from the beginning, and allow any string as an "identifier":

int get `something or other` => 42;
void `throw 💩`() => throw "💩";

It seems ridiculous and exploitable, but it would also allow non-ASCII identifiers like

void get `blåbærgrød` => getBerries().boil();

so it might get used.

@Hixie

Hixie commented Mar 19, 2019

I'm not a fan of backticks because several languages (e.g. bash, perl) use it for other purposes.
Backslash seems pretty reasonable, though it doesn't let you use any arbitrary string.
If we go with something like backticks, would escapes be allowed within the sequence?

   var `foo\nbar\`baz` = 2;

@rakudrama
Member

Don't forget that identifiers are used in various doc-comment contexts, so I would suggest finding something that works well with Markdown, i.e. inside backticks and [] references.

Idle question - why is blåbærgrød not already an identifier?

@lrhn
Member

lrhn commented Mar 19, 2019

@rakudrama

Idle question - why is blåbærgrød not already an identifier?

Are you asking how the specification defines an identifier so it doesn't include blåbærgrød, or why we have not yet changed the spec to allow it?

Dart identifiers must be ASCII only. In fact, non-ASCII characters can only occur in Dart code inside strings or in comments.

Some languages, like Java and JavaScript, allow Unicode identifiers. That obviously has a cost for parsing (and security), but allows words in other languages to be used as source code identifiers.
Other languages, like C++, do not. Dart has so far chosen not to go there.
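To make the rule concrete, here is a small sketch of the ASCII-only identifier check combined with a reserved-word check. It assumes the Dart 2 reserved-word list shown and deliberately ignores built-in identifiers (such as `dynamic` or `import`), which carry extra restrictions in some positions; `isUsableIdentifier` is a hypothetical helper, not an SDK API.

```dart
/// Dart 2 reserved words. Built-in identifiers (e.g. `dynamic`, `import`)
/// and contextual keywords (e.g. `async`, `show`, `get`) are intentionally
/// left out, because they can still be used as ordinary variable names.
const reservedWords = {
  'assert', 'break', 'case', 'catch', 'class', 'const', 'continue',
  'default', 'do', 'else', 'enum', 'extends', 'false', 'final', 'finally',
  'for', 'if', 'in', 'is', 'new', 'null', 'rethrow', 'return', 'super',
  'switch', 'this', 'throw', 'true', 'try', 'var', 'void', 'while', 'with',
};

/// ASCII-only identifier shape: a letter, `_`, or `$`, followed by any
/// number of letters, digits, `_`, or `$`.
final identifierPattern = RegExp(r'^[a-zA-Z_$][a-zA-Z0-9_$]*$');

/// Hypothetical helper: true if [name] could be used as a variable name.
bool isUsableIdentifier(String name) =>
    identifierPattern.hasMatch(name) && !reservedWords.contains(name);

void main() {
  for (final name in ['show', 'for', 'list', 'blåbærgrød', r'$tmp']) {
    print('$name: ${isUsableIdentifier(name)}');
  }
  // show: true, for: false, list: true, blåbærgrød: false, $tmp: true
}
```

Under this rule, blåbærgrød fails on the character check alone, while for fails only because it is reserved, which is exactly the case an escape syntax would address.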

@DanTup

DanTup commented Mar 20, 2019

F# uses backticks, but they have to be doubled:

let ``let`` = 75

I'm not sure if that makes markdown escaping easier or harder though 😄

var \escaped = o.\escaped;

I think using backslashes, or only having a marker at the start, is a bit confusing, since it looks like it might just be escaping a single character rather than delimiting the whole word.

Also, I suspect it's not a goal here, but in F#'s double-backticks you can use characters that aren't normally valid, like spaces (and I think other punctuation):

let ``test that some thing happens with the thing``() = ...

Doing that raises other questions (like how to call from un-opted code), but it was a nice feature there to avoid munging descriptions into names_like_this. I'm not sure if it was often used outside of tests though, and Dart has its own way of handling them.

@MisterJimson

Just wanted to chime in and note that C# uses an @ for this and I feel it works pretty well.

    using System;

    public class @class
    {
        public int age;
    }
    class Program
    {
        static void Main(string[] args)
        {
            @class p1 = new @class();
            p1.age = 10;
            Console.WriteLine("Age: "+p1.age);
            Console.WriteLine("Press Enter Key to Exit..");
            Console.ReadLine();
        }
    }

@Dokotela

Thanks so much for working on this! I'm pretty new, but I've enjoyed working in dart so far. I'm working with some complex json currently, and it uses 'class', 'list', 'extends', 'for', and 'assert' as variable names, so I'm looking forward to being able to escape them at some point in the future. In the meantime, any suggestions on working with them?

@munificent
Member Author

In the meantime, any suggestions on working with them?

I would just pick a convention to tweak the name so that it doesn't collide with the reserved word. A trailing underscore would work, but looks funny in Dart where leading underscores are meaningful.

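Note that, unlike in JavaScript, where you can read a JSON field as a property like obj.class, decoded JSON in Dart is a Map whose keys you look up with quoted strings like json['class'], so the keys themselves won't collide with reserved words.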
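One way to apply that convention is to keep the reserved words only in the string keys and map them onto non-reserved field names when decoding. This is a sketch, assuming a hypothetical `Course` type whose JSON uses the keys "class" and "for"; the field names `className` and `forAudience` are just one possible naming choice. It uses `jsonDecode`/`jsonEncode` from `dart:convert` and needs null-safe Dart (2.12 or later) for `required`.

```dart
import 'dart:convert';

/// Hypothetical model for JSON whose keys include the reserved words
/// `class` and `for`. The Dart fields use non-reserved names; the mapping
/// to the original keys lives only in fromJson/toJson.
class Course {
  final String className; // JSON key: "class"
  final String forAudience; // JSON key: "for"

  Course({required this.className, required this.forAudience});

  factory Course.fromJson(Map<String, dynamic> json) => Course(
        className: json['class'] as String,
        forAudience: json['for'] as String,
      );

  Map<String, dynamic> toJson() => {'class': className, 'for': forAudience};
}

void main() {
  final decoded = jsonDecode('{"class": "Intro to Dart", "for": "beginners"}')
      as Map<String, dynamic>;
  final course = Course.fromJson(decoded);
  print(course.className); // Intro to Dart
  print(jsonEncode(course.toJson())); // {"class":"Intro to Dart","for":"beginners"}
}
```

Keeping the renaming inside fromJson/toJson means the wire format stays unchanged; only the Dart-side names differ.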


10 participants