New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid copying properties from the prototype chain. #6748

Merged
merged 2 commits into from Nov 6, 2017

Conversation

@bmeurer
Member

bmeurer commented Nov 5, 2017

I noticed that babylon spends a lot of time in what we call slow mode
for..in when running in Node (on V8), and the reason for that is that
the version distributed on npm is build with loose mode, which turns
methods on the prototype into enumerable properties. Let's look at a
simplified example of the State class from src/tokenizer/state.js:

class State {
  constructor() { this.x = 1; }

  clone() {
    var state = new State();
    for (var key in this) {
      var val = this[key];
      state[key] = val;
    }
    return state;
  }
}

According to the specification the State.prototype.clone method is
non-enumerable. However when transpiling this with loose mode, we get
the following output:

var State = (function() {
  function State() { this.x = 1; }

  State.prototype.clone = function clone() {
    var state = new State();
    for (var key in this) {
      var val = this[key];
      state[key] = val;
    }
    return state;
  }

  return State;
})();

So all of a sudden the State.prototype.clone method is enumerable.
This means that the for..in loop inside of that method enumerates
x and clone for key, whereas originally it was supposed to only
enumerate x. This in turn means that the shape of the result of a
call to clone will be different than the shape of a state that is
created via the State constructor. You can check this in d8 using
the --allow-natives-syntax flag and this simple test driver:

const s = new State;
%DebugPrint(s);
%DebugPrint(s.clone());

Using either the class version or the transpiled version we see:

$ out/Release/d8 --allow-natives-syntax state-original.js
0x2a9d7970d329 <State map = 0x2a9d40b0c751>
0x2a9d7970d3c1 <State map = 0x2a9d40b0c751>
$ out/Release/d8 --allow-natives-syntax state-loose.js
0x3729ee30d1b9 <State map = 0x3729af90c701>
0x3729ee30d251 <State map = 0x3729af90c7a1>

So as you can see, the transpiled version (using loose mode) produces
a different shape for the result of clone, whereas the original
version is fine. This pollutes all sites which use either a state
created from the State constructor or returned from the clone
method. The original one has only the x property in either case,
whereas in the transpiled version the result of clone has properties
x and clone on the instance.

To mitigate this effect, it's best to replace the for..in loops with
a combination of Object.keys and Array.prototype.filter, so that
the loop body only deals with own properties and not with properties
from the prototype chain. This change does exactly that for the two
affected clone functions.

On the babylon test in the web-tooling-benchmark, this improves
the runs/sec by around 5-7% alone.

Properly guard for..in loops with Object#hasOwnProperty.
I noticed that babylon spends a lot of time in what we call *slow mode*
`for..in` when running in Node (on V8), and the reason for that is that
the version distributed on npm is build with *loose mode*, which turns
methods on the prototype into enumerable properties. Let's look at a
simplified example of the `State` class from `src/tokenizer/state.js`:

```js
class State {
  constructor() { this.x = 1; }

  clone() {
    var state = new State();
    for (var key in this) {
      var val = this[key];
      state[key] = val;
    }
    return state;
  }
}
```

According to the specification the `State.prototype.clone` method is
non-enumerable. However when transpiling this with loose mode, we get
the following output:

```js
var State = (function() {
  function State() { this.x = 1; }

  State.prototype.clone = function clone() {
    var state = new State();
    for (var key in this) {
      var val = this[key];
      state[key] = val;
    }
    return state;
  }

  return State;
})();
```

So all of a sudden the `State.prototype.clone` method is enumerable.
This means that the `for..in` loop inside of that method enumerates
`x` and `clone` for `key`, whereas originally it was supposed to only
enumerate `x`. This in turn means that the shape of the result of a
call to `clone` will be different than the shape of a state that is
created via the `State` constructor. You can check this in `d8` using
the `--allow-natives-syntax` flag and this simple test driver:

```js
const s = new State;
%DebugPrint(s);
%DebugPrint(s.clone());
```

Using either the class version or the transpiled version we see:

```
$ out/Release/d8 --allow-natives-syntax state-original.js
0x2a9d7970d329 <State map = 0x2a9d40b0c751>
0x2a9d7970d3c1 <State map = 0x2a9d40b0c751>
$ out/Release/d8 --allow-natives-syntax state-loose.js
0x3729ee30d1b9 <State map = 0x3729af90c701>
0x3729ee30d251 <State map = 0x3729af90c7a1>
```

So as you can see, the transpiled version (using *loose mode*) produces
a different shape for the result of `clone`, whereas the original
version is fine. This pollutes all sites which use either a state
created from the `State` constructor or returned from the `clone`
method. The original one has only the `x` property in either case,
whereas in the transpiled version the result of `clone` has properties
`x` and `clone` on the instance.

To mitigate this effect, it's best to guard the `for..in` loops with
`Object.prototype.hasOwnProperty` calls, such that the actual body of
the loop only deals with own properties and not with properties from the
prototype chain. This change does exactly that for the two affected
`clone` functions.

In addition to the performance hit because of the unnecessary
polymorphism, there's also the performance hit because of the *slow
mode* `for..in` itself, which has to collect the properties from the
instance plus the prototype. Ideally the prototype properties shouldn't
be enumerable to avoid this whole set of problems. I see a couple of
possible solutions:

1. Distribute the original ES2015 version via npm.
2. Don't use loose mode, so that `Object.defineProperty` is used
   instead, correctly passing `enumerable:false`.
3. Globally change loose mode in Babel to generate the correct and
   fast `Object.defineProperty` instead.

I'd personally prefer a combination of 1. and 3. here, but I'm aware
that distributing the ES2015 code might not be an option yet. So the
mitigation of properly guarding the `for..in` here should already help.
But it'd be nice to have a discussion on using `Object.defineProperty`
in general, as I imagine that this could easily bite other applications
as well and this performance cliff is completely unobvious to
developers.
@babel-bot

This comment has been minimized.

Show comment
Hide comment
@babel-bot

babel-bot Nov 5, 2017

Collaborator

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/5675/

Collaborator

babel-bot commented Nov 5, 2017

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/5675/

@babel-bot

This comment has been minimized.

Show comment
Hide comment
@babel-bot

babel-bot Nov 5, 2017

Collaborator

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/5671/

Collaborator

babel-bot commented Nov 5, 2017

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/5671/

@Andarist

This comment has been minimized.

Show comment
Hide comment
@Andarist

Andarist Nov 5, 2017

Member

Is using Object.defineProperty costless or is it preferred here because we are talking about using it on the class' prototype - thus on initialization rather than during the "runtime" of the program?

Member

Andarist commented Nov 5, 2017

Is using Object.defineProperty costless or is it preferred here because we are talking about using it on the class' prototype - thus on initialization rather than during the "runtime" of the program?

@bmeurer

This comment has been minimized.

Show comment
Hide comment
@bmeurer

bmeurer Nov 5, 2017

Member

@Andarist It's usually slower than the direct assignment by itself. But the cost of having enumerable properties on prototypes combined with for..in is pretty high, so Object.defineProperty(prototype, name, {enumerable:false, value: function(){...}}) is very likely going to pay for itself.

Since the assignments to prototype are usually executed only once (in like 99.9% of the cases), the overhead of Object.defineProperty isn't that huge in that case, since the first time you execute an assignment, it will also take the slow-path.

Member

bmeurer commented Nov 5, 2017

@Andarist It's usually slower than the direct assignment by itself. But the cost of having enumerable properties on prototypes combined with for..in is pretty high, so Object.defineProperty(prototype, name, {enumerable:false, value: function(){...}}) is very likely going to pay for itself.

Since the assignments to prototype are usually executed only once (in like 99.9% of the cases), the overhead of Object.defineProperty isn't that huge in that case, since the first time you execute an assignment, it will also take the slow-path.

@azz

This comment has been minimized.

Show comment
Hide comment
@azz

azz Nov 6, 2017

Member

How does

Object.keys(this).forEach(key => {
  // ...
});

compare to

for (const key in this) {
  if (Object.prototype.hasOwnProperty.call(this, key)) {
    // ...
  }
}

in this scenario?

It certainly looks prettier.

Member

azz commented Nov 6, 2017

How does

Object.keys(this).forEach(key => {
  // ...
});

compare to

for (const key in this) {
  if (Object.prototype.hasOwnProperty.call(this, key)) {
    // ...
  }
}

in this scenario?

It certainly looks prettier.

@bmeurer

This comment has been minimized.

Show comment
Hide comment
@bmeurer

bmeurer Nov 6, 2017

Member

Good suggestion. That'll also address the problem. I'd go with a for..of loop instead of the Array#forEach tho. Should I update the PR?

Member

bmeurer commented Nov 6, 2017

Good suggestion. That'll also address the problem. I'd go with a for..of loop instead of the Array#forEach tho. Should I update the PR?

@Andarist

This comment has been minimized.

Show comment
Hide comment
@Andarist

Andarist Nov 6, 2017

Member

I'd go with a for..of loop instead of the Array#forEach tho.

Any particular reason or just personal preference/taste?

Member

Andarist commented Nov 6, 2017

I'd go with a for..of loop instead of the Array#forEach tho.

Any particular reason or just personal preference/taste?

@bmeurer

This comment has been minimized.

Show comment
Hide comment
@bmeurer

bmeurer Nov 6, 2017

Member

Personal preference is Array#forEach, but in the current shipping versions of Node the for..of loop will be faster. But that might not matter here, since the keys array is small. So I'm fine with either.

Member

bmeurer commented Nov 6, 2017

Personal preference is Array#forEach, but in the current shipping versions of Node the for..of loop will be faster. But that might not matter here, since the keys array is small. So I'm fine with either.

@Andarist

This comment has been minimized.

Show comment
Hide comment
@Andarist

Andarist Nov 6, 2017

Member

If the difference is negligible I'd go with the shorter Array#forEach, dunno what others think though.

Member

Andarist commented Nov 6, 2017

If the difference is negligible I'd go with the shorter Array#forEach, dunno what others think though.

@nicolo-ribaudo

This comment has been minimized.

Show comment
Hide comment
@nicolo-ribaudo

nicolo-ribaudo Nov 6, 2017

Member

I'd prefer forEach, since we would transpile the for-of loop anyway.

Member

nicolo-ribaudo commented Nov 6, 2017

I'd prefer forEach, since we would transpile the for-of loop anyway.

@bmeurer

This comment has been minimized.

Show comment
Hide comment
@bmeurer

bmeurer Nov 6, 2017

Member

Done. Pretty cool idea, thanks.

Member

bmeurer commented Nov 6, 2017

Done. Pretty cool idea, thanks.

@azz

azz approved these changes Nov 6, 2017

@Andarist Andarist merged commit de72ce6 into babel:master Nov 6, 2017

4 checks passed

babel/repl REPL preview is available
Details
ci/circleci Your tests passed on CircleCI!
Details
codecov/project 86.88% (target 80%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@bmeurer bmeurer changed the title from Properly guard for..in loops with Object#hasOwnProperty. to Avoid copying properties from the prototype chain. Nov 6, 2017

@bmeurer bmeurer deleted the bmeurer:for-in-guard branch Nov 6, 2017

@bmeurer

This comment has been minimized.

Show comment
Hide comment
@bmeurer

bmeurer Nov 6, 2017

Member

Thanks.

Member

bmeurer commented Nov 6, 2017

Thanks.

@Andarist

This comment has been minimized.

Show comment
Hide comment
@Andarist

Andarist Nov 6, 2017

Member

Thanks to you!

Member

Andarist commented Nov 6, 2017

Thanks to you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment