-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Fixed greedy matching bug #2032
Conversation
Yeah, I'd strongly prefer @Golmote take a look at this. It also may be worth getting actual measurements on how much this impacts performance. |
I was thinking about adding a benchmark which would compare the local Prism version against the one of Prism's master branch. Suggestion for what language/code? |
DangerJS would work for this: https://danger.systems/js/ Ideally, I'd like to end up on GitHub actions and drop Travis, but it's still in beta. |
That would be awesome but we still have to implement the benchmark ourselves. |
Only one I know offhand is this: https://github.com/bestiejs/benchmark.js |
The code looks OK to me. Greedy matching has always been subject to a performance issue I guess. But I can't think of a way to leverage that without rewriting everything from scratch haha. A benchmark would be nice tho indeed, to get the proper measure of it. 👍 |
I re-implemented this after #1909. This actually increased the performance quite a bit. Pre #1909, it was on average about 1.3x slower (if I remember correctly) than the non-fixed version. The average is now 1.06x slower ranging from 1.12x faster to 1.54x slower. @Golmote, could you please give this another look? Benchmark
A small explanation on why we see this performance range (1.12x faster to 1.54x slower): The basic idea behind this fix is that when a greedy token changes the token stream, it can only affect other tokens within a specific range. Within that range, we then have to rematch and that can extend the range further causing even more rematching (this can be thought of as the changes propagating through the token stream). This explains both the speedup and the slowdown. |
it has been almost a year, what's going on? I need this fix... |
@yairEO The main problem is that there's nobody to review it. This is a fundamental change to the core matching algorithm, so it better correct. I had to reimplement this after #1909, so I was reluctant to merge this even after @Golmote's approval (that was given before #1909). @mAAdhaTTah @Golmote |
This fixes #1492 and all issues caused by it.
Instead of the old rematching algorithm, which only allowed for at most one match, the new one matches until a certain point is reached (called
rematchReach
).The idea is that greedy matches if the change the token stream, won't change the entire token stream but only a small part of it. I call this the reach of the rematching. The idea is to rematch everything within that small part of the token stream.
While being correct, this can also be a lot slower than the old implementation because greedy tokens can expand the current rematching reach which might lead to the whole text being match again and again until the text is matched correctly. So one should keep in mind that greedy now works correctly but also got more expensive.
Notes
I refactored parts of the Core code.
This also fixes another bug greedy matching had. Greedy matching could (in certain edge cases) generate a token stream with adjacent strings. This is a problem because non-greedy matching won't work correctly then.
This was caused by
!strarr[k - 1].greedy
here. I removed it because it doesn't make any sense to me.Because this was the only usage of the
.greedy
property of theToken
class, I removed it entirely.