This repository has been archived by the owner. It is now read-only.

Add a parseExpression public method #213

Merged
merged 1 commit into from Jan 27, 2017

Conversation

Projects
None yet
6 participants
@jeromew
Copy link
Contributor

jeromew commented Nov 1, 2016

Q A
Bug fix? no
Breaking change? no
New feature? yes
Deprecations? no
Spec compliancy? yes
Tests added/pass? yes
Fixed tickets #210
License MIT

This change implements a parseExpression public method that allows to parse a single Expression with performance in mind.

This method's objective is to replace the module is-babylon-expression that stopped working after the migration to rollup since it cannot have access to the parser and inherit from the Parser anymore.

The ability to parse an expression with performance in mind is needed by the templating language jade/pug which allows raw expressions in the templates. I need to parse these expressions in order to assert the expressions and generate an AST tree.

I am looking for a way to replace the now broken is-expression-babylon that would be acceptable for the babylon project and I hope that this PR could maybe be accepted in the project.

Do not hesitate to tell me and give me directions if there anything I can change to the PR (naming, convention, tests) in order for it to be merged.

Thank you

@jeromew jeromew referenced this pull request Nov 1, 2016

Closed

git clone not working #1

@loganfsmyth

This comment has been minimized.

Copy link
Member

loganfsmyth commented Nov 2, 2016

The ability to parse an expression with performance in mind is needed

This particular motivation seems to be making a lot of assumptions about performance with no data to back it up, I'd be quite surprised it it made that much of a difference.

Since I don't want to shut this down outright, can you help me understand the usecase for parsing individual expressions explicitly, where there won't be statement-level syntax in place to allow parsing using the existing statement parser?

I'm definitely hesitant on this, since expressions on their own don't really have a parser target in the JS language grammar, so parsing a standalone expression ends up needing more flags, e.g. is the snippet strict? is it a module or script? Can it have yield? Can it have await?

@TimothyGu

This comment has been minimized.

Copy link
Contributor

TimothyGu commented Nov 2, 2016

Pug has a syntax that could be exemplified by the following

p #{expression}

where expression is any JavaScript expression. We current use Acorn to check if expression is a valid one or not, and that is one of the biggest improvements of Pug v2 over Jade v1, but Babylon would provide support for next-generation JS, which is our long term goal. Not being able to prevent cases such as the ones listed in #210 (comment) from compiling would be a major no-no for us. (Of course, one can argue that we should introspect and walk the AST to see if expression actually is an Expression, but that seems way too complicated for such a task.)

With regards to performance. Indeed, I would be surprised if direct access to Babylon internals (parseExpression) instead of going through the usual Program > ExpressionStatement > Expression route makes parsing any faster. But, if we were to adopt the "correct" way of traversing the AST after every parse, there would be a nontrivial performance overhead. Furthermore, we are in the process of investigating migrating the entire Pug code generator to babel-generate, and being able to not parse the expression multiple times (once in the lexer during syntax check, once in the Pug code generator to build a Babylon AST) does provide a significant performance improvement.

I understand your hesitation, but having a function that parses expression is not without precedent. For one, Acorn provides a parseExpressionAt that is slightly different from our use case but still very similar. And I think this function will be beneficial to Babylon as well.

@jeromew

This comment has been minimized.

Copy link
Contributor Author

jeromew commented Nov 2, 2016

To clarify the given example

p(style={color: '#' + c }) #{expression}
p
  = intro ? intro.toLowerCase() : ""
  br
  = paragraph 

is converted to something like (simplified)

var html = '<p style="color:'+escape('#' + c)+'">' + escape(expression) + '</p>'
html += '<p> + escape(intro ? intro.toLowerCase() : "") + '<br>' + escape(paragraph) + </p>

and this is why we need to parse the expressions directly.

I admit that I invoked performance without a satisfying benchmark to back it up. A pug template can have many expressions to parse (content, attributes, ..) and a website can be composed of hundreds of templates so I believe that lowering the footprint of the code path over the compilation of a whole website can have an impact.

For what it's worth, i did a small benchmark on my machine

var Benchmark = require('benchmark');
var babylon = require('./')

var suite = new Benchmark.Suite;
var expr = 'a = b';

// add tests
suite.add('parse (expr)', function() {
  babylon.parse('('+expr+')').program.body[0].expression
})
.add('parse expr=expr', function() {
  babylon.parse('expr='+expr).program.body[0].expression.right;
})
.add('parseExpression', function() {
  babylon.parseExpression(expr)
})
// add listeners
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run();

The result of several runs on simple expressions (we have many simple expressions to parse and more occasional complex expression) is roughly stable with a 40% gain in ops/sec for parseExpression.

expr: "a = b"

parse (expr) x 81,714 ops/sec ±5.68% (78 runs sampled)
parse expr=expr x 89,567 ops/sec ±2.72% (81 runs sampled)
parseExpression x 145,622 ops/sec ±1.53% (83 runs sampled)
Fastest is parseExpression

expr: "a + b + c"

parse "(expr)" x 71,259 ops/sec ±7.87% (73 runs sampled)
parse "expr=expr" x 70,723 ops/sec ±3.21% (83 runs sampled)
parseExpression x 116,130 ops/sec ±1.55% (83 runs sampled)
Fastest is parseExpression

expr: !a

parse "(expr)" x 104,670 ops/sec ±3.49% (77 runs sampled)
parse "expr=expr" x 83,771 ops/sec ±3.57% (81 runs sampled)
parseExpression x 183,886 ops/sec ±1.47% (82 runs sampled)
Fastest is parseExpression

expr: a ? a : ""

parse "(expr)" x 50,379 ops/sec ±54.81% (76 runs sampled)
parse "expr=expr" x 64,427 ops/sec ±4.04% (80 runs sampled)
parseExpression x 111,606 ops/sec ±1.55% (87 runs sampled)
Fastest is parseExpression

correct parsing is another aspect as we saw that neither the "(expr)" nor the "expr=expr" trick leads to a correct parser because of edge cases like "a)(" or "=a" that get incorrectly parsed as expressions.

I could add tests if you guide me as to what kind of tests would be needed.
There were a bunch of tests in is-expression-babylon that I could port over if they fit (https://github.com/pugjs/is-expression-babylon/blob/master/test/index.js) but I think the expression parser is already tested.

Thanks for your consideration.

@ForbesLindesay

This comment has been minimized.

Copy link
Contributor

ForbesLindesay commented Nov 2, 2016

@jeromew's benchmarks seem pretty compelling to me. My main motivation for wanting this was to fix the hack of having to; add characters to wrap the expression, discount those characters from the line & column positions of the resulting AST, dig into the body of the resulting AST to find the actual expression, guard against corner cases like a)( which otherwise cause havoc.

@jeromew

This comment has been minimized.

Copy link
Contributor Author

jeromew commented Nov 9, 2016

Is there anything I can do to improve my PR ? Here is the current summary of the pros/cons as I understand them.

babylon has a Parser that allows direct parsing of expressions without parsing at the statement level (cf PR).

pug (ex jade) is a templating language that mixes html and direct javascript expresssions that needs to parse a lot of expressions when templates are compiled (1) to ensure that the templates are correct (2) to give appropriate debug messages with line/col information (3) to directly build an AST tree version of the template.

pros:

  • it is 40% faster than statement level alternatives
  • it avoids parsing errors compared to parsing (expr) or e=expr at the statement level when expr has degenerate values like a)( or ==b
  • it avoids line/col numbering offset that appear when parsing (expr) or e=expr at the statement level
  • parsing expressions directly is a requested feature by parser users (even if low frequency). cf acornjs/acorn#97 , acorn parseExpressionAt function
  • expression parsing would benefit from the great parsing engine of babylon
  • it would allow to replace the is-expression-babylon module that was broken when rollup was implemented in babylon

cons:

  • it adds a new public entry point on babylon
  • Expressions don't really have a parser target in the JS language grammar

If this PR cannot be polished until it meets merging status, could we consider exporting the Parser or exporting enough entry points that is-expression-babylon could be rebuilt ?

Thanks for your time.

@danez

This comment has been minimized.

Copy link
Member

danez commented Dec 8, 2016

Ok so I merged #240 and if you rebase you can create a second fixture runner for parseExpression.
As for the folder name maybe expressions or expressionFixtures?
And you need to copy index.js and adjust it for the expressions.

Right now both of the new functions are uncovered, so at least enough tests to cover all lines. (https://codecov.io/gh/babel/babylon/commit/e800d72acce6763ab7db5d1e837ddfd31687b890) Maybe some counter tests (as discussed above) to see that a)( and ==b don't parse.
If you have already more tests in is-expression-babylon for edge cases/regressions which you want to move here, this is also fine. More tests is always fine. :)

Sorry that it took so long.

@jeromew jeromew force-pushed the jeromew:master branch 2 times, most recently from 422f8d3 to 25286a5 Dec 16, 2016

@jeromew

This comment has been minimized.

Copy link
Contributor Author

jeromew commented Dec 16, 2016

@danez I squashed my PR, adding tests. Apparently codecov does not update its review.

The tests are in the test/expressions directory. I added an esprima test set based on the expressions test i could find in fixtures/esprima (keeping the LICENCE file I found in this dir). and an is-babylon-expression dir with the tests that were in this module.

You will see I added a new denyLineComment option (defaults to false) to babylon.
This was a trade-off that I measured via benchmarks to keep a feature that was in is-babylon-expression, raising an exception for expressions that finish with a lineComment.
@TimothyGu, note that I reversed the default behavior of is-babylon-expression to avoid having a different default for parse and parseExpression so lineComment: true is not necessary anymore, and all other cases need a denyLineComment: true

Tell me what you think and if we agree on this. I will then add documentation on the README regarding the new option and parseExpression API.

Thanks

@danez

This comment has been minimized.

Copy link
Member

danez commented Jan 2, 2017

Could you please rebase the commit on current master? The commit it is currently based on seems to be broken, that's why codecov is not updating.

@jeromew jeromew force-pushed the jeromew:master branch from 25286a5 to 304aaaa Jan 6, 2017

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jan 6, 2017

Current coverage is 97.27% (diff: 100%)

Merging #213 into master will increase coverage by <.01%

@@             master       #213   diff @@
==========================================
  Files            21         21          
  Lines          3999       4004     +5   
  Methods         489        482     -7   
  Messages          0          0          
  Branches       1169       1179    +10   
==========================================
+ Hits           3890       3895     +5   
  Misses           49         49          
  Partials         60         60          

Powered by Codecov. Last update 28c467e...3dc7b8d

@jeromew

This comment has been minimized.

Copy link
Contributor Author

jeromew commented Jan 6, 2017

@danez I just rebased the PR.
coverage information was calculated on https://travis-ci.org/babel/babylon/jobs/189441861
but coverage seems to have a problem updating its view of this PR - https://codecov.io/gh/babel/babylon/pull/213

I think it wants to start its history on the first commit I did that was squashed and replaced. This is probably linked to https://github.com/codecov/support/issues/126 as the coverage page states "Commit was deleted by force push or pull request rebase."

Do you want me to try and open another pull request ? I could create a new branch on my side with identical commits and send it as a PR forcing github + coverage to start a new thread.

@danez

This comment has been minimized.

Copy link
Member

danez commented Jan 7, 2017

Can you elaborate more on the new option you added? From your comment it sounds that this is added because of performance reasons? Somehow throwing an exception, because it is slower doesn't make sense to me.
In general I'm a little bit opposed to the new option, as we try to be complete ecma-script parser and not a customizable parser where certain (specified) features can be turned off/on.

@ForbesLindesay

This comment has been minimized.

Copy link
Contributor

ForbesLindesay commented Jan 17, 2017

@jeromew I don't think we should have denyLineComment in here. We should either implement that as part of isExpression or not at all, even if there is a performance hit. In the long run, we will be using babel-code-gen to generate the output and it won't matter if something has line comments in.

The denyLineComment option doesn't really make sense as part of babylon.

@jeromew

This comment has been minimized.

Copy link
Contributor Author

jeromew commented Jan 17, 2017

@danez, @ForbesLindesay you are right. I tried to fit the "deny comment" feature that is-babylon-expression had. I agree that it does not seem reasonable that my implemention of that would tend to add a such a morphing option to the parser. I will remove it.

@jeromew jeromew force-pushed the jeromew:master branch from 304aaaa to 1ff1005 Jan 17, 2017

@jeromew jeromew force-pushed the jeromew:master branch from 1ff1005 to 3dc7b8d Jan 17, 2017

@jeromew

This comment has been minimized.

Copy link
Contributor Author

jeromew commented Jan 17, 2017

@danez I removed the denyLineComment. Thanks for the feedback.

After further examination, I think that parseExpression users will have the opportunity to detect LineComments by simply looking at the trailingComments property that babel adds on the parsed Node. This is simpler and cleaner.

@danez

danez approved these changes Jan 20, 2017

@TimothyGu

This comment has been minimized.

Copy link
Contributor

TimothyGu commented Jan 26, 2017

Any timeline for when this can be applied?

@danez danez merged commit 898c4a7 into babel:master Jan 27, 2017

3 checks passed

codecov/patch 100% of diff hit (target 97.27%)
Details
codecov/project 97.27% (+<.01%) compared to 28c467e
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@hzoo hzoo referenced this pull request May 26, 2017

Closed

Add a parseExpression method #210

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.