Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore tokens in RegExp #4

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Ignore tokens in RegExp #4

wants to merge 2 commits into from

Conversation

shuhei
Copy link

@shuhei shuhei commented Feb 15, 2014

Some tokens in RegExp literal caused the following problems:

  • Unclosed quotes mess up the state so that following require() calls are not properly detected. e.g. var pattern = /"/; require("a"); then a is not detected.
  • require() calls in RegExp literal are wrongly detected. e.g. var pattern = /require("a")/; then a is detected.

To fix those issues, this patch treats RegExp literals as strings with slash as quote. To keep supporting shebang that inevitably contains slashes, now it is handled before the parsing starts.

@shuhei
Copy link
Author

shuhei commented Feb 15, 2014

Uh, no. This patch can't distinguish between the start of regexp and the division operator...

@shuhei shuhei closed this Feb 15, 2014
@shuhei shuhei reopened this Feb 15, 2014
@shuhei
Copy link
Author

shuhei commented Feb 15, 2014

Revised the code to naively distinguish between regexp leteral and division operator. Now it assumes it's regexp literal if preceding token is one of (,=:[!&|?{};. But it makes mistakes for some cases like the following:

if (something) /'/.exec(x);
require("a");
return /"/.test(x);
require("b");

Then a or b won't be detected. Please check out a discussion on Stack Overflow](http://stackoverflow.com/questions/5519596/when-parsing-javascript-what-determines-the-meaning-of-a-slash) for more detail.

Should we do more complicated things to support those cases or keep this parser simple and just working for most of the cases?

@glenjamin
Copy link

I was having a go at fixing this, had something working before noticing your open PR.

My first stab also missed the division case entirely - although I did cover escape sequences in regexp literals.

I've pushed my work, including a whole bunch of failing scenarios
https://github.com/glenjamin/mine/blob/regex/test/files/regex.js

I've been using http://www.antlr3.org/grammar/1153976512034/ecmascriptA3.g and http://tomcopeland.blogs.com/EcmaScript.html for reference.

If you use the latter and start at http://tomcopeland.blogs.com/EcmaScript.html#prod68 clicking the left-most non-terminal you end up with the literal regex.
http://tomcopeland.blogs.com/EcmaScript.html#prod29 covers the division case
http://tomcopeland.blogs.com/EcmaScript.html#prod9 and http://tomcopeland.blogs.com/EcmaScript.html#prod59 cover the division assignment case

In theory we should be able to use these grammars to figure out the exact subset needed to reliably extract all the require() calls - or maybe we do actually need everything? If that's the case something based on http://zaach.github.io/jison/docs/ might be more apt.

Anyway, that's my 2¢ after spending the evening trying to figure out why i couldn't browserify some modules!

@dodo dodo mentioned this pull request Mar 8, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants