Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Unterminated Regular Expression Error except when comment on next line #10

Open
jaydiablo opened this Issue Jul 3, 2012 · 3 comments

Comments

Projects
None yet
1 participant

With the latest update to JSMin I was able to get it to minify the d3.js library correctly, thanks again for that. So I decided to do a little work on the PHP port of JSMin to get it up to date with the latest release, however I ran into an inconsistency that I thought I would point out.

Once again, the d3 author(s) are doing something that JSLint doesn't recommend:

// Compute the angular scale factor: from value to radians.
var k = ((typeof endAngle === "function"
    ? endAngle.apply(this, arguments)
    : endAngle) - startAngle)
    / d3.sum(values);

Running JSMin on just this snippet will cause it to spit up the "Unterminated Regular Expression literal." due to the division operator being on its own line. Which made me question why/how it's getting through the whole d3.js file without encountering the same error. It appears to be because the next line in the file is a comment:

// Compute the angular scale factor: from value to radians.
var k = ((typeof endAngle === "function"
    ? endAngle.apply(this, arguments)
    : endAngle) - startAngle)
    / d3.sum(values);

// Optionally sort the data.

I believe that JSMin sees that first character of the comment line as the ending regex token, so doesn't complain in this situation.

As a test I decided to try modifying jsmin.c by changing the way it fetches the next character to use "next()" rather than "get()" so that the comments would be ignored. Changed from this (in the action function):

case 3:
        theB = next();
        if (theB == '/' && (theA == '(' || theA == ',' || theA == '=' ||
                            theA == ':' || theA == '[' || theA == '!' ||
                            theA == '&' || theA == '|' || theA == '?' ||
                            theA == '{' || theA == '}' || theA == ';' ||
                            theA == '\n')) {
            putc(theA, stdout);
            putc(theB, stdout);
            for (;;) {
                theA = get();
                if (theA == '[') {
                    for (;;) {
                        putc(theA, stdout);
                        theA = get();
                        if (theA == ']') {
                            break;
                        }
                        if (theA == '\\') {
                            putc(theA, stdout);
                            theA = get();
                        }
                        if (theA == EOF) {
                            error("Unterminated set in Regular Expression literal.");
                        }
                    }
                } else if (theA == '/') {
                    break;
                } else if (theA =='\\') {
                    putc(theA, stdout);
                    theA = get();
                }
                if (theA == EOF) {
                    error("Unterminated Regular Expression literal.");
                }
                putc(theA, stdout);
            }
            theB = next();
        }
    }

to this:

case 3:
        theB = next();
        if (theB == '/' && (theA == '(' || theA == ',' || theA == '=' ||
                            theA == ':' || theA == '[' || theA == '!' ||
                            theA == '&' || theA == '|' || theA == '?' ||
                            theA == '{' || theA == '}' || theA == ';' ||
                            theA == '\n')) {
            putc(theA, stdout);
            putc(theB, stdout);
            for (;;) {
                // Changing this to next() to test what it does ignoring comments
                //theA = get();
                theA = next();
                if (theA == '[') {
                    for (;;) {
                        putc(theA, stdout);
                        theA = get();
                        if (theA == ']') {
                            break;
                        }
                        if (theA == '\\') {
                            putc(theA, stdout);
                            theA = get();
                        }
                        if (theA == EOF) {
                            error("Unterminated set in Regular Expression literal.");
                        }
                    }
                } else if (theA == '/') {
                    break;
                } else if (theA =='\\') {
                    putc(theA, stdout);
                    theA = get();
                }
                if (theA == EOF) {
                    error("Unterminated Regular Expression literal.");
                }
                putc(theA, stdout);
            }
            theB = next();
        }
    }

Which made JSMin throw an error on this snippet:

// Compute the angular scale factor: from value to radians.
var k = ((typeof endAngle === "function"
    ? endAngle.apply(this, arguments)
    : endAngle) - startAngle)
    / d3.sum(values);

// Optionally sort the data.

Which is more consistent than before, however, this snippet:

// Compute the angular scale factor: from value to radians.
var k = ((typeof endAngle === "function"
    ? endAngle.apply(this, arguments)
    : endAngle) - startAngle)
    / d3.sum(values);

// Optionally sort the data.

var x = y / z;

Will still get through minification without encountering the regular expression error, because it sees that division operator on the last line before it reaches EOF.

Is there something that can be done so that JSMin reliably throws this error, or reliably doesn't? It seems like a tricky situation.

Just realized that in the case of the comment following the division operator, JSMin actually leaves the comment, and moves code that follows up onto the same line as the comment. For example, this snippet:

// Compute the angular scale factor: from value to radians.
var k = ((typeof endAngle === "function"
    ? endAngle.apply(this, arguments)
    : endAngle) - startAngle)
    / d3.sum(values);

// Optionally sort the data.
var index = d3.range(data.length);
if (sort != null) index.sort(sort === d3_layout_pieSortByValue
    ? function(i, j) { return values[j] - values[i]; }
    : function(i, j) { return sort(data[i], data[j]); });

gets minified to:

var k=((typeof endAngle==="function"?endAngle.apply(this,arguments):endAngle)-startAngle)
/ d3.sum(values);

//Optionally sort the data.var index=d3.range(data.length);if(sort!=null)index.sort(sort===d3_layout_pieSortByValue?function(i,j){return values[j]-values[i];}:function(i,j){return sort(data[i],data[j]);});

Leaving a bunch of code in the comment.

Shouldn't JSMin complain if a regex tries to span multiple lines, rather than just EOF? In both webkit and firefox if I define a regex with a newline in it, the browser complains:

var regex = /[0-9]
.*/;

But JSMin doesn't.

If I change the two regex EOF checks from this:

if (theA == EOF) {

to

if (theA == EOF || theA == '\n') {

I can make JSMin reliably complain about that division operator on a new line situation, regardless of what comes later in the file (as long as the "/" it isn't on the same line).

This wouldn't fix d3.js, but at least JSMin wouldn't minify it without spitting up an error (which results in a JS error in the browser).

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment