template: speedup lexer #1335

jo3-l · 2022-08-27T04:58:53Z

Speed up the template lexer by ~4x (about 2.8x on the lorem_ipsum benchmark, about 4.5x on the others).

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4    2.01µs ± 7%    0.73µs ± 2%  -63.81%  (p=0.000 n=7+8)
Parse/short-4          70.2µs ± 1%    15.2µs ± 1%  -78.39%  (p=0.000 n=7+8)
Parse/medium-4          165µs ± 1%      38µs ± 1%  -77.05%  (p=0.000 n=8+8)
Parse/long-4            517µs ± 1%     117µs ± 5%  -77.31%  (p=0.000 n=8+8)
Parse/very-long-4      1.79ms ± 1%    0.40ms ± 1%  -77.67%  (p=0.000 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.23kB ± 0%    1.16kB ± 0%   -5.56%  (p=0.000 n=8+8)
Parse/short-4          5.48kB ± 0%    5.08kB ± 0%   -7.30%  (p=0.000 n=8+8)
Parse/medium-4         12.7kB ± 0%    12.7kB ± 0%     ~     (all equal)
Parse/long-4           35.3kB ± 0%    34.4kB ± 0%   -2.75%  (p=0.000 n=8+8)
Parse/very-long-4       113kB ± 0%     110kB ± 0%   -2.42%  (p=0.000 n=8+8)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      12.0 ± 0%      10.0 ± 0%  -16.67%  (p=0.000 n=8+8)
Parse/short-4             109 ± 0%        93 ± 0%  -14.68%  (p=0.000 n=8+8)
Parse/long-4              779 ± 0%       705 ± 0%   -9.50%  (p=0.000 n=8+8)
Parse/very-long-4       2.61k ± 0%     2.35k ± 0%   -9.84%  (p=0.000 n=8+8)

Best reviewed commit-by-commit; see associated messages for justification and intermediate benchmark results.

Although this is a large diff, much of it consists of benchmark fixtures, which may be ignored.

Taken from the yagpdb-cc repository. To avoid licensing issues, I only used programs written by me.

Baseline performance:

name                 time/op
Parse/lorem_ipsum-4  2.01µs ± 7%
Parse/short-4        70.2µs ± 1%
Parse/medium-4        165µs ± 1%
Parse/long-4          517µs ± 1%
Parse/very-long-4    1.79ms ± 1%

name                 alloc/op
Parse/lorem_ipsum-4  1.23kB ± 0%
Parse/short-4        5.48kB ± 0%
Parse/medium-4       12.7kB ± 0%
Parse/long-4         35.3kB ± 0%
Parse/very-long-4     113kB ± 0%

name                 allocs/op
Parse/lorem_ipsum-4    12.0 ± 0%
Parse/short-4           109 ± 0%
Parse/medium-4          298 ± 0%
Parse/long-4            779 ± 0%
Parse/very-long-4     2.61k ± 0%

Notably, this includes https://go-review.googlesource.com/c/go/+/421883, which changes the lexer to not spawn a new goroutine.

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4    2.01µs ± 7%    0.73µs ± 2%  -63.56%  (p=0.000 n=7+8)
Parse/short-4          70.2µs ± 1%    16.6µs ± 1%  -76.32%  (p=0.000 n=7+8)
Parse/medium-4          165µs ± 1%      40µs ± 1%  -75.74%  (p=0.000 n=8+8)
Parse/long-4            517µs ± 1%     123µs ± 1%  -76.28%  (p=0.000 n=8+7)
Parse/very-long-4      1.79ms ± 1%    0.42ms ± 1%  -76.81%  (p=0.000 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.23kB ± 0%    1.16kB ± 0%   -5.56%  (p=0.000 n=8+8)
Parse/short-4          5.48kB ± 0%    5.42kB ± 0%   -1.17%  (p=0.000 n=8+8)
Parse/medium-4         12.7kB ± 0%    12.6kB ± 0%   -0.51%  (p=0.000 n=8+8)
Parse/long-4           35.3kB ± 0%    35.3kB ± 0%   -0.17%  (p=0.000 n=8+8)
Parse/very-long-4       113kB ± 0%     113kB ± 0%   -0.05%  (p=0.000 n=8+8)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      12.0 ± 0%      10.0 ± 0%  -16.67%  (p=0.000 n=8+8)
Parse/short-4             109 ± 0%       107 ± 0%   -1.83%  (p=0.000 n=8+8)
Parse/medium-4            298 ± 0%       296 ± 0%   -0.67%  (p=0.000 n=8+8)
Parse/long-4              779 ± 0%       777 ± 0%   -0.26%  (p=0.000 n=8+8)
Parse/very-long-4       2.61k ± 0%     2.61k ± 0%   -0.08%  (p=0.000 n=8+8)

lexer.next is a hot function, as demonstrated by profiling. Most programs will consist of ASCII characters only, which we can optimize for. Ideally DecodeRuneInString would be inlined here and this wouldn't be a problem at all, but that won't be the case until golang/go#31666 is resolved.

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4     733ns ± 2%     733ns ± 2%     ~     (p=0.933 n=8+8)
Parse/short-4          16.6µs ± 1%    15.5µs ± 2%   -6.75%  (p=0.000 n=8+8)
Parse/medium-4         40.1µs ± 1%    38.7µs ± 1%   -3.51%  (p=0.000 n=8+8)
Parse/long-4            123µs ± 1%     115µs ± 1%   -5.86%  (p=0.001 n=7+7)
Parse/very-long-4       416µs ± 1%     396µs ± 1%   -4.70%  (p=0.000 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.16kB ± 0%    1.16kB ± 0%     ~     (all equal)
Parse/short-4          5.42kB ± 0%    5.42kB ± 0%     ~     (all equal)
Parse/medium-4         12.6kB ± 0%    12.6kB ± 0%     ~     (all equal)
Parse/long-4           35.3kB ± 0%    35.3kB ± 0%     ~     (all equal)
Parse/very-long-4       113kB ± 0%     113kB ± 0%     ~     (all equal)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      10.0 ± 0%      10.0 ± 0%     ~     (all equal)
Parse/short-4             107 ± 0%       107 ± 0%     ~     (all equal)
Parse/medium-4            296 ± 0%       296 ± 0%     ~     (all equal)
Parse/long-4              777 ± 0%       777 ± 0%     ~     (all equal)
Parse/very-long-4       2.61k ± 0%     2.61k ± 0%     ~     (all equal)

CommandNode.append and by extension runtime.growslice was showing up more than expected during profiling. Allocate enough space for four arguments up front so we don't need to reallocate as much. Although this doesn't benefit performance much, it does have a clear positive effect on memory usage.

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4     733ns ± 2%     728ns ± 2%     ~     (p=0.315 n=8+8)
Parse/short-4          15.5µs ± 2%    15.2µs ± 1%   -2.12%  (p=0.001 n=8+8)
Parse/medium-4         38.7µs ± 1%    38.0µs ± 1%   -1.99%  (p=0.000 n=8+8)
Parse/long-4            115µs ± 1%     117µs ± 5%     ~     (p=0.281 n=7+8)
Parse/very-long-4       396µs ± 1%     400µs ± 1%   +1.02%  (p=0.002 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.16kB ± 0%    1.16kB ± 0%     ~     (all equal)
Parse/short-4          5.42kB ± 0%    5.08kB ± 0%   -6.20%  (p=0.000 n=8+8)
Parse/medium-4         12.6kB ± 0%    12.7kB ± 0%   +0.51%  (p=0.000 n=8+8)
Parse/long-4           35.3kB ± 0%    34.4kB ± 0%   -2.58%  (p=0.000 n=8+8)
Parse/very-long-4       113kB ± 0%     110kB ± 0%   -2.37%  (p=0.000 n=8+8)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      10.0 ± 0%      10.0 ± 0%     ~     (all equal)
Parse/short-4             107 ± 0%        93 ± 0%  -13.08%  (p=0.000 n=8+8)
Parse/medium-4            296 ± 0%       276 ± 0%   -6.76%  (p=0.000 n=8+8)
Parse/long-4              777 ± 0%       705 ± 0%   -9.27%  (p=0.000 n=8+8)
Parse/very-long-4       2.61k ± 0%     2.35k ± 0%   -9.77%  (p=0.000 n=8+8)

jo3-l added 4 commits August 26, 2022 21:32

ashishjh-bst merged commit 5ec0a82 into botlabs-gg:dev Aug 29, 2022

jo3-l deleted the perf/speedup-template-lexer branch June 29, 2023 23:06
