Javascript not fully parsed #200

bitwiseman · 2013-03-23T06:50:18Z

This is the source of a lot of bugs. Also, it is a feature.

bitwiseman · 2013-03-23T07:30:03Z

js-beautify processes a provided string into a formatted JavaScript string. It does this without fully parsing the input. Instead, it relies on a set of best guesses about the syntax, and only insofar as the syntax affects the formatting. That incomplete parse leads to inaccuracies, which show up as bugs.

If we fully parsed the provided JavaScript and then ran a set of well-understood rules over the result, we would have fewer edge-case bugs like #56. Assuming the parser could handle invalid JavaScript and recover, we could then skip formatting on the sections that were invalid.

On the other hand, the incomplete parsing lets the code run fast and reasonably well, even on invalid JavaScript. Also, that is how js-beautify is currently implemented (in more than one language, at that).

Parsing will certainly become more complete over time as specific bugs are reported and fixed, but changing this design choice within the current code base would be a huge undertaking, basically a complete rewrite.

einars · 2013-03-23T07:33:43Z

On the bright side, we do have a fairly good test base.

evocateur · 2013-03-25T18:10:14Z

Having delved into the uglifyjs parser, I am strongly in favour of retaining the current string-based token parsing instead of adopting an actual JS parser. It is stupidly complex, and basically unportable (in a maintainable, recognizable way) to Python.

bitwiseman · 2013-04-01T18:07:12Z

@evocateur Just out of curiosity, are you talking about the uglifyjs parser, the uglifyjs2 parser, or both?

evocateur · 2013-04-01T18:16:40Z

Both, actually. Uglify v1 was terribly disorganized, and in that sense v2 is much more readable. It uses some really snappy code generation algorithms, at least as far as I can tell. I'm still pretty poor at understanding ASTs, but I'm getting better.

I actually spent some time a few months back trying to make Uglify2's "beautify" mode work to my preferences, and I got a tiny bit done, but nothing shippable. I ended up giving up the idea because js-beautify already handles 99% of what I wanted. There wasn't enough flexibility in Uglify2's API at the time, anyway, to justify a whole bunch of effort to get 1% of benefit.

To the point of this thread, I'm more concerned about API portability between Python and JS (if indeed that is to remain a priority for the project). Parsing JS, to me, represents an incompatible step away from the Python script, because it would introduce some very difficult-to-port AST modification logic (assuming Python can even parse JS in the first place).

bitwiseman · 2013-04-01T19:00:54Z

So what you're saying is that Uglify (in fact, any full-fledged JS parser) treats the JS AST first and foremost, a set of tokens second, and formatting third (if at all). Beautifiers based on that model will, quite reasonably, walk the AST and perform modifications on it, but that is often overkill when the goal is simply to produce formatted code.

So you're not opposed to building up some sort of token tree, as long as it serves the primary features of js-beautify.

evocateur · 2013-04-01T20:42:11Z

Correct, I have nothing against ASTs, just a little hesitation on creating an undue maintenance burden between the implementations.

On the other hand, a quick Google search turned up Python projects like https://github.com/rspivak/slimit that provide JS lexing and parsing, producing an AST that could be operated on in the same fashion as a JS-side AST (provided by, say, Uglify). As long as our implementations stay as identical as possible, this seems like a reasonable compromise.

However, we still don't get JS parsing even with the various Python libraries, so the ability to catch syntax errors in a trivial way (instead of insanely complex AST manipulations) is lost. I'm not exactly broken up about that, as other tools exist (JSHint) to ensure those sorts of errors don't occur.

rmariuzzo · 2013-12-10T21:08:32Z

👍 I would love to see that issue fixed...

We've been dealing with limitations from incomplete parsing (beautifier#200) for ages. Instead of parsing fully, this change tokenizes fully before starting formatting. This should open the door to simplifying parsing and formatting rules. For example, to find an object literal we currently have to check repeatedly later on for a matching pattern. With this change, we could determine as a block is opened whether to treat it as an object literal or not.
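To illustrate the object-literal point above, here is a minimal JavaScript sketch. It assumes a toy regex tokenizer and hypothetical helpers named `tokenize`, `classifyBraces`, and `EXPR_BEFORE`; none of these are js-beautify's actual code, whose tokenizer is far more complete. Once the whole input exists as an array of tokens, each `{` can be classified the moment it is reached (here, naively, by the token before it), and its matching `}` can be located right away by scanning forward.

```javascript
// Hypothetical sketch, NOT js-beautify's implementation: tokenize the whole
// input up front, then classify each "{" as soon as it is reached.

// Toy tokenizer: skips whitespace and comments; recognizes identifiers,
// numbers, quoted strings, and single-character punctuation. Regex literals,
// template strings, and multi-character operators ("=>", "===") are NOT handled.
const TOKEN_RE =
  /\s+|\/\/[^\n]*|\/\*[\s\S]*?\*\/|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|\S/g;

function tokenize(src) {
  const tokens = [];
  for (const match of src.matchAll(TOKEN_RE)) {
    const text = match[0];
    // Whitespace and comments carry no meaning for this sketch.
    if (/^\s/.test(text) || text.startsWith("//") || text.startsWith("/*")) continue;
    tokens.push(text);
  }
  return tokens;
}

// Assumed, incomplete heuristic: tokens after which "{" begins an
// expression (object literal) rather than a statement block.
const EXPR_BEFORE = new Set(["=", "(", "[", ",", ":", "?", "return"]);

function classifyBraces(tokens) {
  const result = [];
  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i] !== "{") continue;
    const prev = i > 0 ? tokens[i - 1] : null;
    const kind = prev !== null && EXPR_BEFORE.has(prev) ? "object" : "block";
    // Every token already exists, so the matching "}" can be found now
    // instead of being discovered later while formatting.
    let depth = 0;
    let close = -1;
    for (let j = i; j < tokens.length; j++) {
      if (tokens[j] === "{") {
        depth++;
      } else if (tokens[j] === "}") {
        depth--;
        if (depth === 0) {
          close = j;
          break;
        }
      }
    }
    result.push({ open: i, close, kind });
  }
  return result;
}

// Usage example.
const src = "var a = { b: 1 };\nif (a) { call({ c: 2 }); }";
console.log(classifyBraces(tokenize(src)).map((b) => b.kind));
// logs [ 'object', 'block', 'object' ]
```

The preceding-token rule is deliberately naive: it misclassifies labeled blocks (`foo: { ... }`) and `case x: { ... }` as object literals, which is exactly the kind of edge case a fuller parse would resolve.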

bitwiseman mentioned this issue Mar 23, 2013

issue with semicolon-less javascript #56

Closed

bitwiseman mentioned this issue Apr 1, 2013

Reach agreement on key features #221

Open

bitwiseman mentioned this issue Apr 28, 2013

Format json in line #114

Closed

evocateur mentioned this issue Apr 29, 2013

Add option to extend the beautifier with custom tokenizers/formatters #250

Open

bitwiseman mentioned this issue May 3, 2013

Multiple var Arrays or Objects not indented properly. #256

Closed

bitwiseman mentioned this issue May 27, 2013

New line above function declarations in objects #141

Open

bitwiseman mentioned this issue Jun 6, 2013

Blocks, arrays, and expressions over indented #281

Closed

einars mentioned this issue Jul 12, 2013

throw could be a function #309

Closed

jdavisclark mentioned this issue Jul 23, 2013

Incorrect formating with semicolon-less code jdavisclark/JsFormat#75

Closed

bitwiseman mentioned this issue Aug 7, 2013

Chained code indents break at comment lines #314

Closed

jdavisclark mentioned this issue Sep 3, 2013

Incorrect formating with semicolon-less code #323

Closed

bitwiseman mentioned this issue Sep 10, 2013

Option to preserve or inline "short objects" on a single line #315

Closed

This was referenced Dec 6, 2013

Vertically align (indent) values in object literals #365

Open

Reserved words used as property/function/variable identifiers are formatted incorrectly #368

Closed

jdavisclark mentioned this issue Dec 10, 2013

Property with name ´default´ is formatted in a new line jdavisclark/JsFormat#89

Closed

bitwiseman mentioned this issue Dec 18, 2013

Ignore object formatting option? #370

Closed

evocateur mentioned this issue Jan 14, 2014

Selective ignore using comments (feature request) #384

Closed

bitwiseman mentioned this issue Jan 19, 2014

Support reserved words as property names #378

Closed

bitwiseman mentioned this issue Jan 31, 2014

Option to preserve existing var and object alignment (like preserve_array_indentation) #396

Open

This was referenced Mar 26, 2014

Ending brace missaligned when part of first definition in var line #430

Closed

line wrapping breaks in weird places #438

Closed

Cannot declare object literal properties with unquoted reserved words #440

Closed

bitwiseman mentioned this issue Apr 6, 2014

Support separate indent_size for wrapped parameters #447

Open

evocateur mentioned this issue Jun 25, 2014

Add support for unifying quote style (single- vs. double-quotes) of string literals #483

Open

This was referenced Sep 11, 2014

Unclosed string problem #505

Closed

daisy-chain indentation leads to over-indentation #482

Closed

bitwiseman mentioned this issue Sep 16, 2014

Refactor to fully tokenize before formatting #530

Merged

This was referenced Oct 3, 2014

TypeScript oddly formatted with 1.5.3 #552

Closed

Hybrid space_in_paren option, e.g. "});" rather than "} );" #373

Open

bitwiseman mentioned this issue Dec 9, 2014

Smart Tabs #591

Open

This was referenced Jan 29, 2015

Vertically align (indent) variable assignment #603

Open

Break chained methods option treats property access as chained method #606

Open

This was referenced Feb 9, 2015

align parameters #619

Open

Indentation of function inside if #621

Open

bitwiseman mentioned this issue Feb 2, 2016

Add "collapse-one-line" option for non-collapse brace styles #487

Closed

bitwiseman mentioned this issue Jun 13, 2016

beautify with long-ish variables #954

Open

bitwiseman mentioned this issue Nov 22, 2016

Are there options to force semicolons and braces? #1064

Closed

This was referenced Dec 23, 2016

Allow comment of a "else[ if]" block to be dedented #1069

Open

Wrong indentation, when new line between chained methods #892

Closed

bitwiseman added the language: javascript label Sep 1, 2019
