Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up in the presence of timeouts. #58

Merged
merged 1 commit into from Jan 13, 2023
Merged

Conversation

ghemawat
Copy link
Contributor

@ghemawat ghemawat commented Jan 9, 2023

Currently regexp2 can be quite slow when a MatchTimeout is supplied (a micro-benchmark shows 2300ns compared to 34ns when no timeout is supplied). This slowdown is caused by repeated timeout checks which call time.Now().

The new approach introduces a fast but approximate clock that is just an atomic variable updated by a goroutine once very 10ms. The timeout check can now compare this variable.

Removed "timeout check skip" mechanism since a timeout check is now very cheap.

Added a simple micro-benchmark that compares the speed of searching 100 byte text with and without a timeout.

Performance impact:

  1. A micro-benchmark that looks for an "easy" regexp in a 100 byte string goes from ~2300ns to ~34ns.
  2. Chrome (syntax highlighter) speeds up from ~500ms to ~50ms on a 24KB source file.

@dop251
Copy link
Contributor

dop251 commented Jan 9, 2023

Hi. Sorry to intervene and I may be missing something but would it be more efficient just to do time.AfterFunc() (which doesn't even start a goroutine) that would set the atomic value after the deadline has been reached, rather than waking up every 10ms?

@ghemawat
Copy link
Contributor Author

ghemawat commented Jan 9, 2023

I haven't measured AfterFunc, but it is likely to be slower by a lot (fast matches are only 20-30ns, AfterFunc does allocation, channel creation, priority queue management etc.). Furthermore, the AfterFunc result will have to be canceled when the search finishes early (the common case).

I can measure it if necessary.

@dop251
Copy link
Contributor

dop251 commented Jan 9, 2023

It does an allocation, that's true, but it does not create a channel and does not start a goroutine, and the queue management is done in the runtime, quite efficiently I would hope... It also does not need to be cancelled, if the atomic is set after nobody needs it, it's no big deal.

Overall I think you're probably right, but I'd still measure it just in case...

@dop251
Copy link
Contributor

dop251 commented Jan 9, 2023

BTW, do I get it right that with your solution if I set the deadline to a minute and the search finishes in 20ms, it will still be waking up every 10ms for the remaining almost a minute? I can't see clockEnd being shortened anywhere...

@ghemawat
Copy link
Contributor Author

ghemawat commented Jan 9, 2023

  1. I tried using time.AfterFunc. The time for matching in a 100 byte string jumped from ~45ns to ~360ns (baseline is slightly different than in the commit message since this measurement was done on a different machine).

  2. Yes, if you set a deadline of a minute, the go routine will run for a minute (+ one second). Though note that at most one goroutine runs, and if a caller repeatedly adds timeouts, the existing goroutine will keep running.

@dop251
Copy link
Contributor

dop251 commented Jan 9, 2023

I still think it's rather wasteful, even if it's just one goroutine. Maybe add a means to reset clockEnd when the deadline is no longer needed? It's a simple atomic set, should be too taxing.

@ghemawat
Copy link
Contributor Author

ghemawat commented Jan 9, 2023

I don't understand how to safely reset clockEnd in a simple way. There may be many regexp matches in flight with different deadlines. To safely reset clockEnd, we would need to keep track of all live deadlines (including removal when done). That would add back a priority queue which is one of the main costs in the time.AfterFunc implementation.

@dop251
Copy link
Contributor

dop251 commented Jan 9, 2023

Maybe instead of resetting clockEnd have an atomic usage counter and stop the goroutine if it goes to zero?

@dop251
Copy link
Contributor

dop251 commented Jan 9, 2023

Actually, this counter could replace clockEnd.

@ghemawat
Copy link
Contributor Author

The counter is perhaps a good idea; it could simplify the code. Two things that worry me:

(1) The logic for starting the clock updater will become more complicated (makeDeadline will need to detect when it is the first thing added to a zero counter).

(2) Runners will modify shared state which will make the cache line containing the counter bounce back and forth between cores.

I can try to measure it when I get a chance but I would like to have Doug give his opinion first.

@dlclark
Copy link
Owner

dlclark commented Jan 10, 2023

I'm fine with 1 extra goroutine total if the consumer uses a regex with a timeout. It's a plus if the goroutine goes away after the last timeout expires. We should document that behavior though in case people see it in debug dumps.

@dop251
Copy link
Contributor

dop251 commented Jan 10, 2023

Goroutine in itself is not a problem, the problem is that it is waking up every 10ms doing something rather resource consuming. And, with how the PR stands now, it will continue doing so for the whole duration of the timeout, even if the search finishes earlier.

The change is optimised for a very specific case, but it's no silver bullet: the work is shifted to another goroutine and unless you're constantly running multiple regexes in parallel, the overall impact on performance is most likely going to be negative.

Luckily I'm not using timeouts, but if I were I'd be inclined to fork the project if the change is accepted as is.

@dlclark
Copy link
Owner

dlclark commented Jan 10, 2023

@dop251 I appreciate the perspective and I think it's worth digging into to make sure I understand why this PR would cause you to fork if you used timeouts.

The current timeout code checks the wall-clock time every X iterations of the regex engine. This is both very slow and has a non-instant granularity even with a single regex running.

My understanding is that this PR has the benefit of improving the timeout check performance by ~6000% for the cost of having an extra goroutine and some additional code complexity. If you were using Timeouts then in theory you'd see your regexp2 performance dramatically improve if this PR was merged. It sounds like that's not how you see it so I'd definitely like to understand.

@ghemawat it makes sense to add a new benchmark to confirm the worst-case performance. Theory is nice, but I'd feel more comfortable with some data. How do small, super fast regexes with expanding, long timeouts compare between the 2 versions? I.e. compare the lowest-cost scenario in the old code (not enough cycles in the match to trigger the date check) with the highest-cost version in the new code (lots of "move the last date check" synchronization). If there are other "worst case" scenarios you can imagine feel free to add them as well.

@dop251
Copy link
Contributor

dop251 commented Jan 10, 2023

I'll try to explain where I'm coming from. The timeout functionality is most likely used as a safety measure to protect against 'catastrophic backtracks' (at least this is how I would use it), i.e. the timeouts are never meant to trigger and therefore I would set it to a rather large value (maybe a minute).

Now, imagine there is a normal regex that finishes relatively quickly (let's say 1ms) not triggering the timeout, which will be a typical case. Assume this was a one-off run, i.e. the code is not running any regexes for quite some time after this. After the regex has finished the goroutine will continue to run for the whole minute (and a bit more) waking up every 10ms doing all the stuff that was causing the slowdown in the first place (like getting the current time and scheduling). But what's worse is that in the described scenario this work is done for nothing. So, this:

My understanding is that this PR has the benefit of improving the timeout check performance by ~6000% for the cost of having an extra goroutine and some additional code complexity.

does not hold in this scenario. The cost is not just an extra goroutine and some additional code complexity. The overall performance will suffer. Not the regex itself, but the application in general, because there will be a goroutine running in parallel, wasting the CPU.

To improve things I would consider terminating the goroutine as soon as possible, i.e. when nobody needs it rather than when the last timeout expires. My proposal with the usage counter could help although I share Sanjay's concerns about the cache. It needs to be benchmarked, but I'd argue that even if it results in a slightly worse latency it's still worth doing because the overall performance will be better in the scenario I described above.

Also, I would consider reducing the granularity from 10ms to 50 or maybe even 100, bearing in mind that timeouts are safety measure and not meant to trigger.

@ghemawat
Copy link
Contributor Author

Worst case performance

do small, super fast regexes with expanding, long timeouts compare between the 2 versions

Good idea. It probably won't happen in realistic scenarios, but knowing the worst case is useful. I added a case to the benchmark that increases the timeout on each iteration to force makeDeadline to go into the extendClock path every time. The results:

name                               time/op
ShortSearch/no-timeout-12            45.0ns ± 4%
ShortSearch/fixed-timeout-12         45.6ns ± 7%
ShortSearch/increasing-timeout-12    65.5ns ± 2%

So yes, it does add about 20ns, but still much better than where we started from.

Does that cover the case you were thinking of?

Increasing the clock period

Also, I would consider reducing the granularity from 10ms to 50 or maybe even 100,

I agree with this.

Wakeup cost

I measured the cost of having a goroutine that wakes up every 10ms. Compared to a program that sleeps without doing anything, the periodic wakeup seems to add about 0.9s of CPU for a 60 second run. That's 0.9s for 6000 wakeups which works out to 150µs per wakeup.

Alternative approach

I had another thought based on the observation that all runners are tracked by getRunner/putRunner. We are already paying the cache traffic cost for the shared state used by those two functions.

If mixing concerns in the code is not an issue, we could change getRunner/putRunner to maintain a count of live runners that have a timeout (protected by re.muRun Mutex). The background clock updater would stop if it observes a zero count. This would mean at most one extra time the goroutine wakes up. Though we have to be careful that we don't fall into the following pattern:

match:
  runner created
  goroutine starts
  runner stopped
  goroutine ends
match:
  runner created
  goroutine starts
  runner stopped
  goroutine ends

Goroutine creation will happen at most 1/clockPeriod times per second (e.g., 100x/s with current 10ms clock). This is another reason to increase the clockPeriod.

WDYT?

@ghemawat
Copy link
Contributor Author

I realized a mistake in my suggested alternative approach: re.muRun is per regexp, so it is unsuitable for protecting a count of active deadlines across all runners.

I tried out the counter approach (stored in an atomic variable). Serial costs:

50ns  no timeout
50ns  clockEnd approach (what is currently in the PR)
70ns  counter approach

Parallel costs (one copy of benchmark running on each of the 12 hyperthreads on my machine):

400ns no timeout
400ns clockEnd approach
730ns counter approach

I prefer the clockEnd approach with the clock period increased to 100ms (which will bring the background goroutine overhead to 0.15% when running). But I can live with either approach and will let Doug decide which one he prefers.

@dlclark
Copy link
Owner

dlclark commented Jan 11, 2023

A few thoughts:

  1. I'm fine with changing the sleep to 100ms and keeping the clockEnd approach.
  2. I'd like the detailed Timeout behavior documented in the README.md included in the PR. Be sure to include details on the timer resolution and notes on performance considerations users should be aware of when using Timeouts.
  3. I now realize that the MatchTimeout code isn't well unit tested. @ghemawat are you willing to add some unit tests around the behavior of MatchTimeout? I'm thinking at least some black-box style testing to make sure timeouts are relatively close to the requested wall clock time, using timeouts and not needing them, multiple regex's across goroutines with timeouts, etc.

@dlclark
Copy link
Owner

dlclark commented Jan 11, 2023

Also I should say I really appreciate the thoughtful dialog on this PR @dop251 and @ghemawat.

@ghemawat
Copy link
Contributor Author

Thanks. I will make the changes as suggested (including tests). I won't be able to get to this until Friday though.

@ghemawat
Copy link
Contributor Author

Quick question though: would you prefer me to make further changes in separate commits added to this PR, or do you prefer everything in the PR squashed together in a single commit?

@dlclark
Copy link
Owner

dlclark commented Jan 11, 2023

I prefer single commits (mainly to avoid merge commits), but it's not a requirement.

Currently regexp2 can be quite slow when a MatchTimeout is
supplied (a micro-benchmark shows 3000ns compared to 45ns
when no timeout is supplied). This slowdown is caused by
repeated timeout checks which call time.Now().

The new approach introduces a fast but approximate clock that
is just an atomic variable updated by a goroutine once very
100ms. The new timeout check just compares this variable to
the precomputed deadline.

Removed "timeout check skip" mechanism since a timeout
check is now very cheap.

Added a simple micro-benchmark that compares the speed
of searching 100 byte text with and without a timeout.

Performance impact:
1. A micro-benchmark that looks for an "easy" regexp in a 100
   byte string goes from ~3000ns to ~45ns.
2. Chroma (syntax highlighter) speeds up from ~500ms to ~50ms
   on a 24KB source file.
3. A background CPU load of ~0.15% is present until the end of
   of all match deadlines (even for matches that have finished).
@ghemawat
Copy link
Contributor Author

I pushed a new commit with the requested changes. The new changes are:

  1. README.md has a new section on timeouts.
  2. clockPeriod is now 100ms.
  3. Regexp.MatchTimeout comment has been extended.
  4. TestBacktrack_CatastrophicTimeout has been extended to check several timeout values and check that the timeout does not occur too early or too late.
  5. The commit message has been updated.

@dlclark dlclark merged commit af93f4c into dlclark:master Jan 13, 2023
@dlclark
Copy link
Owner

dlclark commented Jan 13, 2023

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants