Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Javascript not fully parsed #200

Open
bitwiseman opened this issue Mar 23, 2013 · 8 comments
Open

Javascript not fully parsed #200

bitwiseman opened this issue Mar 23, 2013 · 8 comments

Comments

@bitwiseman
Copy link
Member

This is the source of a lot of bugs. Also, it is feature.

@bitwiseman
Copy link
Member Author

js-beatifier processes a provided string into a formatted javascript string. It does this without fully parsing, instead opting for a set of best guesses about the syntax only in so much as it effects the formating. That incomplete parse makes for inaccuracies - bugs.

If we fully parsed the provided javascript, then ran a set of well-understood rules over that output, we would have fewer of these edge-case bugs like #56. Assuming the parser handled invalid javascript and recovered, we could then skip formatting on sections that were invalid.

On the other hand, the incomplete parsing make the code able to run fast and reasonably well, even on invalid javascript. Also, that is how js-beautify is implemented (in more than one language, at that).

Parsing will certainly become more complete over time, as specific bugs are reported and fixed, but it would be a huge undertaking to change this design choice within the current code base, basically a complete rewrite.

@einars
Copy link
Contributor

einars commented Mar 23, 2013

On the bright side, we do have a fairly good test base.

@evocateur
Copy link
Contributor

Having delved into the uglifyjs parser, I am strongly in favour of retaining the current string-based token parsing instead of adopting an actual JS parser. It is stupidly complex, and basically unportable (in a maintainable, recognizable way) to Python.

@bitwiseman
Copy link
Member Author

@evocateur Just out of curiosity, are you talking about the uglifyjs parser, the uglifyjs2 parser, or both?

@evocateur
Copy link
Contributor

Both, actually. Uglify v1 was terribly disorganized, and in that sense v2 is much more readable. It uses some really snappy code generation algorithms, at least as far as I can tell. I'm still pretty poor at understanding AST, but I'm getting better.

I actually spent some time a few months back trying to make Uglify2's "beautify" mode work to my preferences, and I got a tiny bit done, but nothing shippable. I ended up giving up the idea because js-beautify already handles 99% of what I wanted. There wasn't enough flexibility in Uglify2's API at the time, anyway, to justify a whole bunch of effort to get 1% of benefit.

To the point of this thread, I'm more concerned about API portability between Python and JS (if indeed that is to remain a priority for the project). Parsing JS, to me, represents an incompatible step away from the Python script, because it will make some very difficult-to-port AST modification logic (if indeed Python can somehow parse JS itself already).

@bitwiseman
Copy link
Member Author

So, what you're saying is Uglify (in fact, any full-fledged js parser) treats the js AST first and foremost, a set of tokens second, and formatting third (if at all). Beautifiers based on that model will, quite reasonably, walk the AST and perform modifications on the AST, but that is often overkill in terms of producing formatted code.

You're not opposed to building up some sort of tree tokens as long as it serves the primary features of js-beautifer.

@evocateur
Copy link
Contributor

Correct, I have nothing against ASTs, just a little hesitation on creating an undue maintenance burden between the implementations.

On the other hand, a quick google yielded Python projects like https://github.com/rspivak/slimit that provide JS lexing and parsing, resulting in an AST that could be operated on in the same fashion as the JS AST (provided by, say, uglify). Insofar as our implementations are as identical as possible, this seems like a reasonable compromise.

However, we still don't get JS parsing even with the various Python libraries, so the ability to catch syntax errors in a trivial way (instead of insanely complex AST manipulations) is lost. I'm not exactly broken up about that, as other tools exist (JSHint) to ensure those sorts of errors don't occur.

@rmariuzzo
Copy link

👍 I would love to see that issue fixed...

bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Apr 30, 2014
We've been dealing with limitations from incomplete parsing (beautifier#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue May 1, 2014
We've been dealing with limitations from incomplete parsing (beautifier#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue May 2, 2014
We've been dealing with limitations from incomplete parsing (beautifier#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Sep 16, 2014
We've been dealing with limitations from incomplete parsing (beautifier#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Sep 16, 2014
We've been dealing with limitations from incomplete parsing (beautifier#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
bitwiseman added a commit to bitwiseman/js-beautify that referenced this issue Sep 17, 2014
We've been dealing with limitations from incomplete parsing (beautifier#200) for ages.

Instead of parsing fully this change tokenizes fully before starting formatting.  This should open the door to simplifying parsing and formatting rules.

For example, to find an object literal currently, we have to check repeatedly later on for a matching pattern.  With this change we could determine as a block is opened whether to treat it as on object-literal or not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants