Javascript not fully parsed #200

Open
bitwiseman opened this Issue Mar 23, 2013 · 8 comments

4 participants

@bitwiseman

This is the source of a lot of bugs. Also, it is feature.

@bitwiseman

js-beatifier processes a provided string into a formatted javascript string. It does this without fully parsing, instead opting for a set of best guesses about the syntax only in so much as it effects the formating. That incomplete parse makes for inaccuracies - bugs.

If we fully parsed the provided javascript, then ran a set of well-understood rules over that output, we would have fewer of these edge-case bugs like #56. Assuming the parser handled invalid javascript and recovered, we could then skip formatting on sections that were invalid.

On the other hand, the incomplete parsing make the code able to run fast and reasonably well, even on invalid javascript. Also, that is how js-beautify is implemented (in more than one language, at that).

Parsing will certainly become more complete over time, as specific bugs are reported and fixed, but it would be a huge undertaking to change this design choice within the current code base, basically a complete rewrite.

@einars

On the bright side, we do have a fairly good test base.

@evocateur

Having delved into the uglifyjs parser, I am strongly in favour of retaining the current string-based token parsing instead of adopting an actual JS parser. It is stupidly complex, and basically unportable (in a maintainable, recognizable way) to Python.

@bitwiseman

@evocateur Just out of curiosity, are you talking about the uglifyjs parser, the uglifyjs2 parser, or both?

@evocateur

Both, actually. Uglify v1 was terribly disorganized, and in that sense v2 is much more readable. It uses some really snappy code generation algorithms, at least as far as I can tell. I'm still pretty poor at understanding AST, but I'm getting better.

I actually spent some time a few months back trying to make Uglify2's "beautify" mode work to my preferences, and I got a tiny bit done, but nothing shippable. I ended up giving up the idea because js-beautify already handles 99% of what I wanted. There wasn't enough flexibility in Uglify2's API at the time, anyway, to justify a whole bunch of effort to get 1% of benefit.

To the point of this thread, I'm more concerned about API portability between Python and JS (if indeed that is to remain a priority for the project). Parsing JS, to me, represents an incompatible step away from the Python script, because it will make some very difficult-to-port AST modification logic (if indeed Python can somehow parse JS itself already).

@bitwiseman

So, what you're saying is Uglify (in fact, any full-fledged js parser) treats the js AST first and foremost, a set of tokens second, and formatting third (if at all). Beautifiers based on that model will, quite reasonably, walk the AST and perform modifications on the AST, but that is often overkill in terms of producing formatted code.

You're not opposed to building up some sort of tree tokens as long as it serves the primary features of js-beautifer.

@evocateur

Correct, I have nothing against ASTs, just a little hesitation on creating an undue maintenance burden between the implementations.

On the other hand, a quick google yielded Python projects like https://github.com/rspivak/slimit that provide JS lexing and parsing, resulting in an AST that could be operated on in the same fashion as the JS AST (provided by, say, uglify). Insofar as our implementations are as identical as possible, this seems like a reasonable compromise.

However, we still don't get JS parsing even with the various Python libraries, so the ability to catch syntax errors in a trivial way (instead of insanely complex AST manipulations) is lost. I'm not exactly broken up about that, as other tools exist (JSHint) to ensure those sorts of errors don't occur.

@rmariuzzo

👍 I would love to see that issue fixed...

@bitwiseman bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Apr 30, 2014
@bitwiseman bitwiseman Pure refactor to fully tokenize before formatting
We've been dealing with limitations from incomplete parsing (#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
76b8478
@bitwiseman bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue May 1, 2014
@bitwiseman bitwiseman Pure refactor to fully tokenize before formatting
We've been dealing with limitations from incomplete parsing (#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
93a5d7d
@bitwiseman bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue May 2, 2014
@bitwiseman bitwiseman Pure refactor to fully tokenize before formatting
We've been dealing with limitations from incomplete parsing (#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
c7c2471
@bitwiseman bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Sep 16, 2014
@bitwiseman bitwiseman Pure refactor to fully tokenize before formatting
We've been dealing with limitations from incomplete parsing (#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
0252529
@bitwiseman bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Sep 16, 2014
@bitwiseman bitwiseman Pure refactor to fully tokenize before formatting
We've been dealing with limitations from incomplete parsing (#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
6c7f8d0
@bitwiseman bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Sep 17, 2014
@bitwiseman bitwiseman Pure refactor to fully tokenize before formatting
We've been dealing with limitations from incomplete parsing (#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
25505cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment