Fixed _intersects method. #115

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
2 participants
@ghost

ghost commented Mar 16, 2013

I think that _intersects() is wrongly implemented. Here's a simple suite of tests that show the error and a correct implementation:

var _intersects = function (start1, end1, start2, end2) {
    if (start2 >= start1 && start2 < end1) {
        return true;
    }

    return end2 > start1 && end2 < end1;
};

_intersects = function (start1, end1, start2, end2) {
    return start2 <= end1 && end2 >= start1;
};

// disjoint
console.log(false === _intersects(1, 3, 4, 7));
// corner case
console.log(true  === _intersects(1, 4, 4, 7));
// overlapping
console.log(true  === _intersects(1, 5, 4, 7));
// reverse args
console.log(false === _intersects(4, 7, 1, 3));
console.log(true  === _intersects(4, 7, 1, 4));
console.log(true  === _intersects(4, 7, 1, 5));

// equal
console.log(true === _intersects(1, 4, 1, 4));
// inside
console.log(true === _intersects(1, 4, 2, 3));
// left end
console.log(true === _intersects(1, 4, 1, 3));
// right end
console.log(true === _intersects(1, 4, 2, 4));
// reverse args
console.log(true === _intersects(1, 4, 1, 4));
console.log(true === _intersects(2, 3, 1, 4));
console.log(true === _intersects(1, 3, 1, 4));
console.log(true === _intersects(2, 4, 1, 4));
Owner

ccampbell commented Mar 17, 2013

So your implementation definitely looks more correct in terms of what intersect should mean.

I seem to remember doing this intentionally when I first wrote the library to make it so if a new match starts on the same character another match ends on the new one would still be processed. Obviously this is a place where a comment would have helped.

After implementing your change, a large number of the tests in tests/index.html fail. You can try running for yourself by opening that page in a browser.

I believe the reason I made it the way it is now is basically as a safety since javascript regex does not have positive or negative lookbehind support. This means if I have a rule matching = and then I have another rule matching = "some string" (looking for a string preceded by an equal sign) the string one would never get processed if it thinks that it intersects with the = alone. So there is basically a 1 character cushion right now of what is considered intersecting.

There is probably a better solution to this problem like building in fake support in rainbow for positive and negative lookbehind expressions and not including them in the match.

Is there a specific case you have found that is failing given the current implementation?

@ghost

ghost commented Mar 17, 2013

Here's the test that I've been playing with. The 0 in 100/0 is not highlighted:

<!DOCTYPE html>
<html>
    <head>
        <meta charset=utf-8 />
        <title>test</title>
        <script src="js/rainbow.js"></script>
        <script>
            Rainbow.extend('njs', [
                {
                    name: 'js-comment',
                    pattern: /\/\/[\s\S]*?$/gm
                },
                {
                    name: 'js-number',
                    pattern: /[0-9]+/g
                },
                {
                    name: 'js-regexp',
                    pattern: /\/[^\/\n]+\/[igm]{0,3}/g
                }
            ], true);        
        </script>
        <style>
            .js-comment { color: green; }
            .js-number, .js-regexp { color: red; }
        </style>
    </head>
    <body>
        <pre><code data-language="njs">100/0                       // Infinity
-100/0                      // -Infinity
</code></pre>
    </body>
</html>

Also note that with the current implementation:

_intersects(2, 3, 1, 4) === false
_intersects(2, 4, 1, 4) === false

Maybe my patterns are wrong, but something like this fails:

{ } == { }                  // false

The 'false' is highlighted as a keyword, which seems to be connected with the results of _intersects that I've shown above.

@ghost

ghost commented Mar 17, 2013

Moreover, I don't know what to think about the tests, because 18 tests are failing on the master branch. Are you maintaining them?

Owner

ccampbell commented Mar 17, 2013

Yeah, unfortunately most of the tests that are failing are ones that were added via pull requests. Seems like the 13 failures from D were added with #106. I should have checked that before merging that pull request. I will ping @workhorsy to get that resolved.

The other 5 failures are intentional failures to document that they should pass, but others and I have not gotten around to fixing them.

ccampbell referenced this pull request Mar 17, 2013

Merged

D support #106

Owner

ccampbell commented Mar 19, 2013

So this is definitely a tough problem. It seems to crop up mostly due to the javascript regex pattern matching. In your example above a simple fix would be adding a negative lookahead to your regex pattern.

So it would end up being

{
    name: 'js-regexp',
    pattern: /\/[^\/\n]+\/[igm]{0,3}(?!\/)/g
}

This works because you are not allowing / in your regex matching when really \/ is valid in regex. Again it comes back to javascript not having support for lookbehind which is frustrating.

Also if I were to change the regex matching to be {1,3} it would make this issue less likely to come up but then basic regular expressions without modifiers would fail.

I think your idea of changing intersects might be the right approach, the problem is that breaks a bunch of other rules so I will have to look at it more closely.

ccampbell closed this Mar 19, 2013

ccampbell reopened this Mar 19, 2013

Owner

ccampbell commented Mar 19, 2013

Sorry for closing. I tried to type `` and then space bar but instead typed "tab tab space" by accident which is the keyboard shortcut to close and comment on a ticket. Ugh.

@ghost

ghost commented Mar 19, 2013

There seem to be many problems at once. On the one hand, regexps cause trouble. After some Googling, I confirmed my suspicion that it's impossible to write a lexer for JavaScript that would have a proper regexp token:

if (expr) /whatever/.test('text');

Either you do some hacks to the lexer, or the recognition will not be exact and you decide to have some margin of error.

The other thing is that I'm finding it a bit difficult to handle the way Rainbow overrides some rules with others. The more natural approach for me would be to have a proper lexer and an ability to add basic parser rules to be able to have the extra functionality of coloring types, variables, etc.

Owner

ccampbell commented Mar 19, 2013

I'm sorry you are having trouble. I hear what you are saying. Using a lexer/parser is definitely a more reliable way to parse a language. That said it would make the library way bigger (and possibly slower) and would essentially go against the two main design goals (simplicity and size). Rainbow was modeled after textmate's tmlanguage files which I think are pretty intuitive.

If you have other examples of things you are having trouble with I'd be happy to try and help. After all, most of the languages that rainbow highlights I would argue that it does a pretty decent job. There are also plenty of other JavaScript syntax highlighting libraries out there that you could try.

@ghost

ghost commented Mar 19, 2013

Yep, I think you're right - what I like in Rainbow is that it's small.
I think that the source of my trouble is that I wanted to play around with JS grammar from scratch. I think I'll go back and tweak the original one.

As to the issue, I'm not sure what to do with it now. I think you can just close it.
Thanks for your support.

Owner

ccampbell commented Jul 3, 2016

Closing this since it seems to be working more or less as expected

ccampbell closed this Jul 3, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment