Add a parseExpression public method #213

jeromew · 2016-11-01T23:01:58Z

Q	A
Bug fix?	no
Breaking change?	no
New feature?	yes
Deprecations?	no
Spec compliancy?	yes
Tests added/pass?	yes
Fixed tickets	#210
License	MIT

This change implements a parseExpression public method that allows to parse a single Expression with performance in mind.

This method's objective is to replace the module is-babylon-expression that stopped working after the migration to rollup since it cannot have access to the parser and inherit from the Parser anymore.

The ability to parse an expression with performance in mind is needed by the templating language jade/pug which allows raw expressions in the templates. I need to parse these expressions in order to assert the expressions and generate an AST tree.

I am looking for a way to replace the now broken is-expression-babylon that would be acceptable for the babylon project and I hope that this PR could maybe be accepted in the project.

Do not hesitate to tell me and give me directions if there anything I can change to the PR (naming, convention, tests) in order for it to be merged.

Thank you

loganfsmyth · 2016-11-02T05:33:19Z

The ability to parse an expression with performance in mind is needed

This particular motivation seems to be making a lot of assumptions about performance with no data to back it up, I'd be quite surprised it it made that much of a difference.

Since I don't want to shut this down outright, can you help me understand the usecase for parsing individual expressions explicitly, where there won't be statement-level syntax in place to allow parsing using the existing statement parser?

I'm definitely hesitant on this, since expressions on their own don't really have a parser target in the JS language grammar, so parsing a standalone expression ends up needing more flags, e.g. is the snippet strict? is it a module or script? Can it have yield? Can it have await?

TimothyGu · 2016-11-02T07:01:11Z

Pug has a syntax that could be exemplified by the following

p #{expression}

where expression is any JavaScript expression. We current use Acorn to check if expression is a valid one or not, and that is one of the biggest improvements of Pug v2 over Jade v1, but Babylon would provide support for next-generation JS, which is our long term goal. Not being able to prevent cases such as the ones listed in #210 (comment) from compiling would be a major no-no for us. (Of course, one can argue that we should introspect and walk the AST to see if expression actually is an Expression, but that seems way too complicated for such a task.)

With regards to performance. Indeed, I would be surprised if direct access to Babylon internals (parseExpression) instead of going through the usual Program > ExpressionStatement > Expression route makes parsing any faster. But, if we were to adopt the "correct" way of traversing the AST after every parse, there would be a nontrivial performance overhead. Furthermore, we are in the process of investigating migrating the entire Pug code generator to babel-generate, and being able to not parse the expression multiple times (once in the lexer during syntax check, once in the Pug code generator to build a Babylon AST) does provide a significant performance improvement.

I understand your hesitation, but having a function that parses expression is not without precedent. For one, Acorn provides a parseExpressionAt that is slightly different from our use case but still very similar. And I think this function will be beneficial to Babylon as well.

jeromew · 2016-11-02T09:41:24Z

To clarify the given example

p(style={color: '#' + c }) #{expression}
p
  = intro ? intro.toLowerCase() : ""
  br
  = paragraph

is converted to something like (simplified)

var html = '<p style="color:'+escape('#' + c)+'">' + escape(expression) + '</p>'
html += '<p> + escape(intro ? intro.toLowerCase() : "") + '<br>' + escape(paragraph) + </p>

and this is why we need to parse the expressions directly.

I admit that I invoked performance without a satisfying benchmark to back it up. A pug template can have many expressions to parse (content, attributes, ..) and a website can be composed of hundreds of templates so I believe that lowering the footprint of the code path over the compilation of a whole website can have an impact.

For what it's worth, i did a small benchmark on my machine

var Benchmark = require('benchmark');
var babylon = require('./')

var suite = new Benchmark.Suite;
var expr = 'a = b';

// add tests
suite.add('parse (expr)', function() {
  babylon.parse('('+expr+')').program.body[0].expression
})
.add('parse expr=expr', function() {
  babylon.parse('expr='+expr).program.body[0].expression.right;
})
.add('parseExpression', function() {
  babylon.parseExpression(expr)
})
// add listeners
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run();

The result of several runs on simple expressions (we have many simple expressions to parse and more occasional complex expression) is roughly stable with a 40% gain in ops/sec for parseExpression.

expr: "a = b"

parse (expr) x 81,714 ops/sec ±5.68% (78 runs sampled)
parse expr=expr x 89,567 ops/sec ±2.72% (81 runs sampled)
parseExpression x 145,622 ops/sec ±1.53% (83 runs sampled)
Fastest is parseExpression

expr: "a + b + c"

parse "(expr)" x 71,259 ops/sec ±7.87% (73 runs sampled)
parse "expr=expr" x 70,723 ops/sec ±3.21% (83 runs sampled)
parseExpression x 116,130 ops/sec ±1.55% (83 runs sampled)
Fastest is parseExpression

expr: !a

parse "(expr)" x 104,670 ops/sec ±3.49% (77 runs sampled)
parse "expr=expr" x 83,771 ops/sec ±3.57% (81 runs sampled)
parseExpression x 183,886 ops/sec ±1.47% (82 runs sampled)
Fastest is parseExpression

expr: a ? a : ""

parse "(expr)" x 50,379 ops/sec ±54.81% (76 runs sampled)
parse "expr=expr" x 64,427 ops/sec ±4.04% (80 runs sampled)
parseExpression x 111,606 ops/sec ±1.55% (87 runs sampled)
Fastest is parseExpression

correct parsing is another aspect as we saw that neither the "(expr)" nor the "expr=expr" trick leads to a correct parser because of edge cases like "a)(" or "=a" that get incorrectly parsed as expressions.

I could add tests if you guide me as to what kind of tests would be needed.
There were a bunch of tests in is-expression-babylon that I could port over if they fit (https://github.com/pugjs/is-expression-babylon/blob/master/test/index.js) but I think the expression parser is already tested.

Thanks for your consideration.

ForbesLindesay · 2016-11-02T15:00:27Z

@jeromew's benchmarks seem pretty compelling to me. My main motivation for wanting this was to fix the hack of having to; add characters to wrap the expression, discount those characters from the line & column positions of the resulting AST, dig into the body of the resulting AST to find the actual expression, guard against corner cases like a)( which otherwise cause havoc.

jeromew · 2016-11-09T20:02:32Z

Is there anything I can do to improve my PR ? Here is the current summary of the pros/cons as I understand them.

babylon has a Parser that allows direct parsing of expressions without parsing at the statement level (cf PR).

pug (ex jade) is a templating language that mixes html and direct javascript expresssions that needs to parse a lot of expressions when templates are compiled (1) to ensure that the templates are correct (2) to give appropriate debug messages with line/col information (3) to directly build an AST tree version of the template.

pros:

it is 40% faster than statement level alternatives
it avoids parsing errors compared to parsing (expr) or e=expr at the statement level when expr has degenerate values like a)( or ==b
it avoids line/col numbering offset that appear when parsing (expr) or e=expr at the statement level
parsing expressions directly is a requested feature by parser users (even if low frequency). cf Parsing just expressions acornjs/acorn#97 , acorn parseExpressionAt function
expression parsing would benefit from the great parsing engine of babylon
it would allow to replace the is-expression-babylon module that was broken when rollup was implemented in babylon

cons:

it adds a new public entry point on babylon
Expressions don't really have a parser target in the JS language grammar

If this PR cannot be polished until it meets merging status, could we consider exporting the Parser or exporting enough entry points that is-expression-babylon could be rebuilt ?

Thanks for your time.

danez · 2016-12-08T17:05:34Z

Ok so I merged #240 and if you rebase you can create a second fixture runner for parseExpression.
As for the folder name maybe expressions or expressionFixtures?
And you need to copy index.js and adjust it for the expressions.

Right now both of the new functions are uncovered, so at least enough tests to cover all lines. (https://codecov.io/gh/babel/babylon/commit/e800d72acce6763ab7db5d1e837ddfd31687b890) Maybe some counter tests (as discussed above) to see that a)( and ==b don't parse.
If you have already more tests in is-expression-babylon for edge cases/regressions which you want to move here, this is also fine. More tests is always fine. :)

Sorry that it took so long.

jeromew · 2016-12-16T09:15:51Z

@danez I squashed my PR, adding tests. Apparently codecov does not update its review.

The tests are in the test/expressions directory. I added an esprima test set based on the expressions test i could find in fixtures/esprima (keeping the LICENCE file I found in this dir). and an is-babylon-expression dir with the tests that were in this module.

You will see I added a new denyLineComment option (defaults to false) to babylon.
This was a trade-off that I measured via benchmarks to keep a feature that was in is-babylon-expression, raising an exception for expressions that finish with a lineComment.
@TimothyGu, note that I reversed the default behavior of is-babylon-expression to avoid having a different default for parse and parseExpression so lineComment: true is not necessary anymore, and all other cases need a denyLineComment: true

Tell me what you think and if we agree on this. I will then add documentation on the README regarding the new option and parseExpression API.

Thanks

danez · 2017-01-02T09:36:32Z

Could you please rebase the commit on current master? The commit it is currently based on seems to be broken, that's why codecov is not updating.

codecov-io · 2017-01-06T08:34:20Z

Current coverage is 97.27% (diff: 100%)

Merging #213 into master will increase coverage by <.01%

@@             master       #213   diff @@
==========================================
  Files            21         21          
  Lines          3999       4004     +5   
  Methods         489        482     -7   
  Messages          0          0          
  Branches       1169       1179    +10   
==========================================
+ Hits           3890       3895     +5   
  Misses           49         49          
  Partials         60         60

Powered by Codecov. Last update 28c467e...3dc7b8d

jeromew · 2017-01-06T08:54:30Z

@danez I just rebased the PR.
coverage information was calculated on https://travis-ci.org/babel/babylon/jobs/189441861
but coverage seems to have a problem updating its view of this PR - https://codecov.io/gh/babel/babylon/pull/213

I think it wants to start its history on the first commit I did that was squashed and replaced. This is probably linked to https://github.com/codecov/support/issues/126 as the coverage page states "Commit was deleted by force push or pull request rebase."

Do you want me to try and open another pull request ? I could create a new branch on my side with identical commits and send it as a PR forcing github + coverage to start a new thread.

danez · 2017-01-07T23:41:04Z

Can you elaborate more on the new option you added? From your comment it sounds that this is added because of performance reasons? Somehow throwing an exception, because it is slower doesn't make sense to me.
In general I'm a little bit opposed to the new option, as we try to be complete ecma-script parser and not a customizable parser where certain (specified) features can be turned off/on.

ForbesLindesay · 2017-01-17T13:52:27Z

@jeromew I don't think we should have denyLineComment in here. We should either implement that as part of isExpression or not at all, even if there is a performance hit. In the long run, we will be using babel-code-gen to generate the output and it won't matter if something has line comments in.

The denyLineComment option doesn't really make sense as part of babylon.

jeromew · 2017-01-17T17:03:37Z

@danez, @ForbesLindesay you are right. I tried to fit the "deny comment" feature that is-babylon-expression had. I agree that it does not seem reasonable that my implemention of that would tend to add a such a morphing option to the parser. I will remove it.

jeromew · 2017-01-17T18:32:21Z

@danez I removed the denyLineComment. Thanks for the feedback.

After further examination, I think that parseExpression users will have the opportunity to detect LineComments by simply looking at the trailingComments property that babel adds on the parsed Node. This is simpler and cleaner.

TimothyGu · 2017-01-26T18:25:46Z

Any timeline for when this can be applied?

jeromew mentioned this pull request Nov 1, 2016

git clone not working pugjs/is-expression-babylon#1

Closed

danez added the Tag: New Feature label Nov 11, 2016

danez mentioned this pull request Nov 28, 2016

Prepare tests for multiple fixture runners. #240

Merged

jeromew force-pushed the master branch 2 times, most recently from 422f8d3 to 25286a5 Compare December 16, 2016 08:46

jeromew force-pushed the master branch from 25286a5 to 304aaaa Compare January 6, 2017 08:34

jeromew force-pushed the master branch from 304aaaa to 1ff1005 Compare January 17, 2017 18:06

Add a parseExpression public method

3dc7b8d

jeromew force-pushed the master branch from 1ff1005 to 3dc7b8d Compare January 17, 2017 18:23

jeromew mentioned this pull request Jan 18, 2017

Adding source endings to tokens pugjs/pug-lexer#76

Open

danez approved these changes Jan 20, 2017

View reviewed changes

danez merged commit 898c4a7 into babel:master Jan 27, 2017

hzoo mentioned this pull request May 26, 2017

Add a parseExpression method #210

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a parseExpression public method #213

Add a parseExpression public method #213

jeromew commented Nov 1, 2016 •

edited

Loading

loganfsmyth commented Nov 2, 2016

TimothyGu commented Nov 2, 2016

jeromew commented Nov 2, 2016 •

edited

Loading

ForbesLindesay commented Nov 2, 2016

jeromew commented Nov 9, 2016 •

edited

Loading

danez commented Dec 8, 2016 •

edited

Loading

jeromew commented Dec 16, 2016 •

edited

Loading

danez commented Jan 2, 2017

codecov-io commented Jan 6, 2017 •

edited

Loading

jeromew commented Jan 6, 2017

danez commented Jan 7, 2017 •

edited

Loading

ForbesLindesay commented Jan 17, 2017

jeromew commented Jan 17, 2017

jeromew commented Jan 17, 2017

TimothyGu commented Jan 26, 2017

Add a parseExpression public method #213

Add a parseExpression public method #213

Conversation

jeromew commented Nov 1, 2016 • edited Loading

loganfsmyth commented Nov 2, 2016

TimothyGu commented Nov 2, 2016

jeromew commented Nov 2, 2016 • edited Loading

ForbesLindesay commented Nov 2, 2016

jeromew commented Nov 9, 2016 • edited Loading

danez commented Dec 8, 2016 • edited Loading

jeromew commented Dec 16, 2016 • edited Loading

danez commented Jan 2, 2017

codecov-io commented Jan 6, 2017 • edited Loading

Current coverage is 97.27% (diff: 100%)

jeromew commented Jan 6, 2017

danez commented Jan 7, 2017 • edited Loading

ForbesLindesay commented Jan 17, 2017

jeromew commented Jan 17, 2017

jeromew commented Jan 17, 2017

TimothyGu commented Jan 26, 2017

jeromew commented Nov 1, 2016 •

edited

Loading

jeromew commented Nov 2, 2016 •

edited

Loading

jeromew commented Nov 9, 2016 •

edited

Loading

danez commented Dec 8, 2016 •

edited

Loading

jeromew commented Dec 16, 2016 •

edited

Loading

codecov-io commented Jan 6, 2017 •

edited

Loading

danez commented Jan 7, 2017 •

edited

Loading