Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: Output let and const for variable declarations, without breaking changes #5377

Open
GeoffreyBooth opened this issue Sep 20, 2021 · 9 comments

Comments

@GeoffreyBooth
Copy link
Collaborator

Migrating this from #5344 (comment):

Now that CoffeeScript 2 outputs modern ES6+ syntax, we can use let and const in our output. Leaving aside the question of supporting block-scoped variables in the CoffeeScript input, (see #4985) there’s no reason that our generated output needs to use var when it could use let or const instead. We could therefore make two improvements in output code readability:

  1. Whenever possible, variable declarations should be placed at the block scope (via let) rather than always at the top of the function scope (the current behavior with var).

  2. Whenever possible, variable declarations should use const. This would be whenever a variable is never reassigned.

The key here is to not cause any breaking changes. So when in doubt, we would keep the current “define at the top of the function scope” behavior. A declaration/first assignment within a loop is a example of such a case. A let x within a loop means that x is declared multiple times (once for each iteration of the loop) which might be a breaking change for existing code.

If you look at this example:

if new Date().getHours() < 9
  breakfast = if new Date().getDay() is 6 then 'donuts' else 'coffee'
  alert "Time to make the #{breakfast}!"
else
  alert 'Time to get some work done.'

breakfast is only used within that if block, but it’s currently getting a var breakfast; line at the top of the output. The output instead could be this:

if (new Date().getHours() < 9) {
  let breakfast;

  breakfast = new Date().getDay() === 6 ? 'donuts' : 'coffee';
  alert(`Time to make the ${breakfast}!`);
} else {
  alert('Time to get some work done.');
}

And then phase 2 would see that breakfast is never reassigned and never referenced before its assignment (read about the let/const “temporal dead zone”) and use const instead:

if (new Date().getHours() < 9) {
  const breakfast = new Date().getDay() === 6 ? 'donuts' : 'coffee';
  alert(`Time to make the ${breakfast}!`);
} else {
  alert('Time to get some work done.');
}

I would try to achieve the two enhancements as separate PRs, probably with the block-scoping first.

@edemaine
Copy link
Contributor

I am looking at implementing this as it helps with TypeScript output (#5307). In TypeScript, var x = 5; implicitly declares x: number, whereas var x; x = 5; implicitly declares x: any. So if we can put declarations with first assignments, we would vastly reduce the need for explicit typing.

However, I am not convinced that there's significant benefit to automatically outputting let or const over var. CoffeeScript's scoping rules are very explicitly function-level, so var feels the most truthful.

So I would make almost the opposite proposal: keep using var, and push the declaration to the first assignment to the variable within the function scope. This is always backward compatible: the var declaration can go anywhere (even in a loop), but the assignment will happen where the var x = ... line appears. I was initially worried about destructuring assignments where only some of the variables are undeclared, but duplicate declarations are allowed, so we can do things like var [x, y] = [1, 2]; var [x, z] = ['foo', 4];. (TypeScript fully understand this, detecting an invalid assignment to x: number in the second line.)

Here's an example to think about:

if doingWell()
  status = 'OK'
else
  status = 'bad'

I'd propose that this outputs the following:

if (doingWell()) {
  var feeling = 'OK';
} else {
  feeling = 'bad';
}

In TypeScript, we get feeling: string. Such a transformation is not possible with let; with let, you'd need a blank let status at the top, which in TypeScript would define feeling: any.

The code above looks a little weird to humans, but I think only because we intuitively think of var behaving like let, whereas it really means "hoist a variable declaration to the top of this function". And it would be so much more helpful for static type checking to ensure that variables get assigned consistent types. (If you assign feeling a string in one case and a number in the other, you'd need to explicitly declare feeling of type string | number, as you would in TypeScript.)

The one case where we can't add var to an assignment is when we're doing deep restructuring, e.g. [x, array[1]] = [1, 2]. This can't be prefixed by var, so we'd need to put var x at the top as usual, or find another assignment to x.

@GeoffreyBooth
Copy link
Collaborator Author

I would make almost the opposite proposal: keep using var, and push the declaration to the first assignment to the variable within the function scope.

The intent of the original proposal was to have CoffeeScript output more idiomatic modern ES code, where people generally use only let and const. There was discussion somewhere that while CoffeeScript maintains all variables as function scoped, using var is more truthful as it reflects the scoping of the original CoffeeScript, and I see the logic in that. I remember doing a branch where I changed the CoffeeScript output from var to let everywhere (but kept the declarations all function-scoped, in the same places as before) and it felt like a step backward, because let implied block scoping where there wasn’t any. Hence the suggestion on this thread, where when we know that a variable is only used within a particular block, we declare it via let or const, choosing const when we know that it never gets reassigned. This would allow more idiomatic output most of the time, without breaking changes and without adding any complexity on the author’s part.

I was initially worried about destructuring assignments where only some of the variables are undeclared, but duplicate declarations are allowed

They may be allowed, but they’re strongly discouraged as they look like a bug, like the author forgot that a variable had already been declared. Banning repeated declarations is one of eslint’s core recommended rules. It also violates the CoffeeScript principle that our JavaScript output should be human readable.

So if we can put declarations with first assignments, we would vastly reduce the need for explicit typing.

I sympathize with this goal. How about as a first step you simply try to get const output working? Something like:

  • Update CoffeeScript’s scope tracking to not just track function scopes but also block scopes.
  • When a variable’s first appearance is an assignment, and its subsequent references are all in the same block or child blocks and it never gets reassigned, output it inline with const where the assignment happens (and exclude it from the list of variables to declare at the top of the function scope in the var line).

This alone would be an improvement over the current output, regardless of anything related to static types; and it would get you a long way toward your goal of output JavaScript that can be typed implicitly. Once this is achieved, the only question would be how to handle the cases written using let in TypeScript, which even in TypeScript often aren’t able to be implied.

To go back to your doingWell example, I feel like this is a reason I see ternaries so often in TypeScript and modern ES (made multiline here because real-life code is often complicated enough that it’s not kept on one line):

const status = doingWell() ?
  'OK' :
  'bad';

I feel like I understand why people do this. I get the desire to use const, to avoid three references to status (the two assignments and a let status; line), and the desire for the typing benefit. But CoffeeScript has the same, even more readable:

status = if doingWell()
  'OK'
else
  'bad'

And this would/should output as a ternary which could use const, which gets an implied type. So really the question becomes, once const is working, is what cases wouldn’t it cover? Hopefully very few, and I’d like to get JSDoc comments working (for function signatures and for variable declarations/assignments, to start with) and hopefully JSDoc could cover those edge cases that const can’t.

@edemaine
Copy link
Contributor

edemaine commented Jan 24, 2022

I'm pretty sure even const output would require a huge modification to the compiler. First, the compiler currently only tracks scopes at the function level, not block level, and it doesn't track all uses of a variable. More crucially, the compiler is currently essentially a single pass of compileNode calls. But when compiling a block A (say, an if clause), we won't have compiled later blocks (e.g. the else clause), so we won't know whether to include a letdeclaration for the variables assigned in A. Instead, we'd need to do a full initial pass to flag variable usage patterns, and then use that duringcompileNode` in a second pass.

I think switching to a two-pass compiler would be a major undertaking. (In fact, I just tried to do it, and ran into all sorts of trouble because so much of what we currently do in compileNode might need to move to the first pass.) There might also be a performance penalty.

I don't see how this extreme level of change makes sense for making output slightly more idiomatic/nice in modern ECMAScript. Even TypeScript supports var, so why not use it?

To your other question, there is lots of code that isn't amenable to const, and all of it would have the same typing problem. Some simple examples:

any = false
all = true
squares = []
for x in list
  any or= x
  all and= x
  squares.push x ** 2

Sure, this could be done with Array.prototype.some, Array.prototype.all, and array comprehensions, but there's lots of more complicated code that needs actual loops and modifies variables in the loop.

By contrast, in a single pass, my current branch generates the following code, which feels pretty idiomatic to me (but with var instead of let), and gets pretty good types for zero effort:

var any = false;  // : boolean
var all = true;  // : boolean
var squares = [];  // : any[]
for (var i = 0, len = list.length; i < len; i++) {
  var x = list[i];
  any || (any = x);
  all && (all = x);
  squares.push(x ** 2);
}

Note that x could only be declared var here.

In fact, TypeScript + any for loop is a good argument for var. We can translate for x in list then f(x) into

for (i = 0, len = list.length; i < len; i++) {
  var x = list[i];
  f(x);
}

By contrast, we can't use let x = list[i] (we'd get very different behavior), and if we put let x at the top, TypeScript won't know that x is only assigned to elements of list.

@GeoffreyBooth
Copy link
Collaborator Author

GeoffreyBooth commented Jan 24, 2022

I thought there was already something similar to the “two pass” approach happening in order to output the var line itself. As the nodes are being assembled, we keep track of all the variables that get created and where (in what scope) and then output the declarations at the top of the scope at the end of the “bottom up” compilation of the nodes within each scope. Within this logic, it tracks if a variable is already in the list of variables to declare (as in, it’s already been added to the list of variable names to output in the var line) and if so, it doesn’t re-add it. That “don’t re-add it” part could be where we also track if a variable has been reassigned, since reassignment and redeclaration are the same thing in CoffeeScript. Variables would be assumed to be not reassigned until discovered otherwise (the second time a previously encountered variable is assigned) which is how we could determine const eligibility without needing a second pass.

Updating CoffeeScript to understand block scopes in addition to function scopes would be a significant refactor, yes, though one that I think wouldn’t be bad to do on its own merits; but it’s only necessary if we want to output the const lines in place, as opposed to the top of the function scope. However we can only move const declarations, or var declarations for that matter, to the top of the scope if they have no side effects. So putting var any = false at the top of the scope is fine, but var any = shouldUseAny() is not.

@edemaine
Copy link
Contributor

edemaine commented Jan 24, 2022

I guess the line between one and two passes is a fuzzy one. By "pass" I meant one depth-first traversal of the parse tree. Currently, Assign::compileNode (for assignments) does both of the following:

  1. Add variable to scope if it isn't there already.
  2. Produce the JavaScript for the assignment.

And Code::compileNode (for functions) does the following:

  1. Recursively compile all its children. (So now the scope knows all the relevant variables.)
  2. Concatenate these, prepended by var declarations for everything in the scope.

Note that, by the time Code can do any analysis of how variables are used, the JavaScript for all assignments has already occurred, which is too late. To do function-level analysis, we'd need to split up the two steps of Assign (and probably all IdentifierLiterals which correspond to using the variable). So there'd be a "static analysis" phase where we recurse through the entire tree, before we recursively call compileNode through the entire tree. This is what I meant by adding a second pass. It's not impossible, but it's a lot of work.

By contrast, detecting that var can happen right here, and telling the scope that it no longer needs a var at the top (when that happens) can be done within the existing recursive structure. That's how my current code (almost done) works, and it seems to give all the TypeScript benefits I want. Perhaps that could be merged first, and this issue could be left for those who want to beautify the output in some cases further?

@GeoffreyBooth
Copy link
Collaborator Author

Perhaps that could be merged first, and this issue could be left for those who want to beautify the output in some cases further?

Sure, we can at least discuss that on its PR (or a new discussion issue). Improving type detection (the var stuff) and generating more idiomatic output (the original intent of this issue) are different enough goals that we shouldn’t conflate them.

@STRd6
Copy link
Contributor

STRd6 commented Nov 10, 2022

A related sub-issue could be to start using let in loop comprehensions:

for a in [1..b]
  console.log a
for (let i = 1, a = i, ref = b; (1 <= ref ? i <= ref : i >= ref); a = 1 <= ref ? ++i : --i) {
  console.log(a);
}

This will keep inaccessible refs confined to as narrow a scope as possible and will reduce the linear growth of ref vars in the whole file.

@Inve1951
Copy link
Contributor

Inve1951 commented Nov 11, 2022

a would still need to be declared in the outer scope and not be hidden by the loop's let.
If the value of a is accessed after loop execution anyways.

@STRd6
Copy link
Contributor

STRd6 commented Nov 11, 2022

@Inve1951 Good point, it should end up like:

var a;
for (let i = a = 1, ref = b; (1 <= ref ? i <= ref : i >= ref); a = 1 <= ref ? ++i : --i) {
  console.log(a);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants