template: speedup lexer #1335

Merged: 4 commits, Aug 29, 2022

@jo3-l jo3-l commented Aug 27, 2022

Speed up the template lexer by ~4x.

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4    2.01µs ± 7%    0.73µs ± 2%  -63.81%  (p=0.000 n=7+8)
Parse/short-4          70.2µs ± 1%    15.2µs ± 1%  -78.39%  (p=0.000 n=7+8)
Parse/medium-4          165µs ± 1%      38µs ± 1%  -77.05%  (p=0.000 n=8+8)
Parse/long-4            517µs ± 1%     117µs ± 5%  -77.31%  (p=0.000 n=8+8)
Parse/very-long-4      1.79ms ± 1%    0.40ms ± 1%  -77.67%  (p=0.000 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.23kB ± 0%    1.16kB ± 0%   -5.56%  (p=0.000 n=8+8)
Parse/short-4          5.48kB ± 0%    5.08kB ± 0%   -7.30%  (p=0.000 n=8+8)
Parse/medium-4         12.7kB ± 0%    12.7kB ± 0%     ~     (all equal)
Parse/long-4           35.3kB ± 0%    34.4kB ± 0%   -2.75%  (p=0.000 n=8+8)
Parse/very-long-4       113kB ± 0%     110kB ± 0%   -2.42%  (p=0.000 n=8+8)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      12.0 ± 0%      10.0 ± 0%  -16.67%  (p=0.000 n=8+8)
Parse/short-4             109 ± 0%        93 ± 0%  -14.68%  (p=0.000 n=8+8)
Parse/medium-4            298 ± 0%       276 ± 0%   -7.38%  (p=0.000 n=8+8)
Parse/long-4              779 ± 0%       705 ± 0%   -9.50%  (p=0.000 n=8+8)
Parse/very-long-4       2.61k ± 0%     2.35k ± 0%   -9.84%  (p=0.000 n=8+8)

Best reviewed commit-by-commit; see associated messages for justification and intermediate benchmark results.

Note that although this is a large diff, much of it consists of benchmark fixtures, which may be ignored.

The benchmark fixtures are taken from the yagpdb-cc repository. To avoid
licensing issues, I only used programs that I wrote myself.

Baseline performance:

name                 time/op
Parse/lorem_ipsum-4  2.01µs ± 7%
Parse/short-4        70.2µs ± 1%
Parse/medium-4        165µs ± 1%
Parse/long-4          517µs ± 1%
Parse/very-long-4    1.79ms ± 1%

name                 alloc/op
Parse/lorem_ipsum-4  1.23kB ± 0%
Parse/short-4        5.48kB ± 0%
Parse/medium-4       12.7kB ± 0%
Parse/long-4         35.3kB ± 0%
Parse/very-long-4     113kB ± 0%

name                 allocs/op
Parse/lorem_ipsum-4    12.0 ± 0%
Parse/short-4           109 ± 0%
Parse/medium-4          298 ± 0%
Parse/long-4            779 ± 0%
Parse/very-long-4     2.61k ± 0%

Notably, this includes https://go-review.googlesource.com/c/go/+/421883,
which changes the lexer so that it no longer spawns a new goroutine.
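
As a rough illustration of what a goroutine-free lexer looks like, here is a toy sketch. It is not the code from the linked CL: the `token` and `lexer` types and the `lexAll` helper are hypothetical and exist only for this example. Instead of a goroutine sending tokens over a channel, the caller pulls one token per call, so no goroutine is started and no channel synchronization is needed per token.

```go
package main

import "fmt"

// token is a hypothetical lexed item: a run of non-space bytes,
// a run of spaces, or end of input.
type token struct {
	kind string // "word", "space", or "eof"
	val  string
}

// lexer holds only scanning state. It has no channel and starts no
// goroutine; the caller drives it by calling next.
type lexer struct {
	input string
	pos   int
}

// next scans and returns exactly one token per call.
func (l *lexer) next() token {
	if l.pos >= len(l.input) {
		return token{kind: "eof"}
	}
	start := l.pos
	isSpace := l.input[start] == ' '
	// Consume bytes while they belong to the same class as the first one.
	for l.pos < len(l.input) && (l.input[l.pos] == ' ') == isSpace {
		l.pos++
	}
	kind := "word"
	if isSpace {
		kind = "space"
	}
	return token{kind: kind, val: l.input[start:l.pos]}
}

// lexAll pulls tokens until EOF, the way a parser would on demand.
func lexAll(input string) []token {
	l := &lexer{input: input}
	var out []token
	for {
		t := l.next()
		out = append(out, t)
		if t.kind == "eof" {
			return out
		}
	}
}

func main() {
	for _, t := range lexAll("a  bc") {
		fmt.Printf("%s %q\n", t.kind, t.val)
	}
}
```

The trade-off in a design like this is that the lexer must keep any state it needs between calls in its own fields, since it can no longer rely on a suspended goroutine's stack.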

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4    2.01µs ± 7%    0.73µs ± 2%  -63.56%  (p=0.000 n=7+8)
Parse/short-4          70.2µs ± 1%    16.6µs ± 1%  -76.32%  (p=0.000 n=7+8)
Parse/medium-4          165µs ± 1%      40µs ± 1%  -75.74%  (p=0.000 n=8+8)
Parse/long-4            517µs ± 1%     123µs ± 1%  -76.28%  (p=0.000 n=8+7)
Parse/very-long-4      1.79ms ± 1%    0.42ms ± 1%  -76.81%  (p=0.000 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.23kB ± 0%    1.16kB ± 0%   -5.56%  (p=0.000 n=8+8)
Parse/short-4          5.48kB ± 0%    5.42kB ± 0%   -1.17%  (p=0.000 n=8+8)
Parse/medium-4         12.7kB ± 0%    12.6kB ± 0%   -0.51%  (p=0.000 n=8+8)
Parse/long-4           35.3kB ± 0%    35.3kB ± 0%   -0.17%  (p=0.000 n=8+8)
Parse/very-long-4       113kB ± 0%     113kB ± 0%   -0.05%  (p=0.000 n=8+8)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      12.0 ± 0%      10.0 ± 0%  -16.67%  (p=0.000 n=8+8)
Parse/short-4             109 ± 0%       107 ± 0%   -1.83%  (p=0.000 n=8+8)
Parse/medium-4            298 ± 0%       296 ± 0%   -0.67%  (p=0.000 n=8+8)
Parse/long-4              779 ± 0%       777 ± 0%   -0.26%  (p=0.000 n=8+8)
Parse/very-long-4       2.61k ± 0%     2.61k ± 0%   -0.08%  (p=0.000 n=8+8)

Profiling shows that lexer.next is a hot function. Most programs consist
only of ASCII characters, so we can optimize for that case. Ideally
DecodeRuneInString would be inlined here and this wouldn't be a problem at all,
but that won't be the case until golang/go#31666 is resolved.
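
A minimal sketch of an ASCII fast path of this kind, assuming a simplified lexer with only `input` and `pos` fields (the real lexer tracks more state, and the `decodeAll` helper is hypothetical). A byte below `utf8.RuneSelf` is a complete one-byte rune, so it can be returned without calling `utf8.DecodeRuneInString`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const eof = -1

// lexer is a simplified stand-in for the template lexer.
type lexer struct {
	input string
	pos   int
}

// next returns the next rune in the input, or eof when exhausted.
func (l *lexer) next() rune {
	if l.pos >= len(l.input) {
		return eof
	}
	// Fast path: bytes below utf8.RuneSelf (0x80) are single-byte ASCII
	// runes, so the general decoder call can be skipped entirely.
	if c := l.input[l.pos]; c < utf8.RuneSelf {
		l.pos++
		return rune(c)
	}
	// Slow path: decode a multi-byte (or invalid) sequence.
	r, w := utf8.DecodeRuneInString(l.input[l.pos:])
	l.pos += w
	return r
}

// decodeAll collects every rune produced by next until eof.
func decodeAll(s string) []rune {
	l := &lexer{input: s}
	var out []rune
	for r := l.next(); r != eof; r = l.next() {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(string(decodeAll("héllo, 世界")))
}
```

For valid UTF-8 input the fast path yields the same runes as decoding every position with `utf8.DecodeRuneInString`; only the cost for ASCII bytes changes.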

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4     733ns ± 2%     733ns ± 2%    ~     (p=0.933 n=8+8)
Parse/short-4          16.6µs ± 1%    15.5µs ± 2%  -6.75%  (p=0.000 n=8+8)
Parse/medium-4         40.1µs ± 1%    38.7µs ± 1%  -3.51%  (p=0.000 n=8+8)
Parse/long-4            123µs ± 1%     115µs ± 1%  -5.86%  (p=0.001 n=7+7)
Parse/very-long-4       416µs ± 1%     396µs ± 1%  -4.70%  (p=0.000 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.16kB ± 0%    1.16kB ± 0%    ~     (all equal)
Parse/short-4          5.42kB ± 0%    5.42kB ± 0%    ~     (all equal)
Parse/medium-4         12.6kB ± 0%    12.6kB ± 0%    ~     (all equal)
Parse/long-4           35.3kB ± 0%    35.3kB ± 0%    ~     (all equal)
Parse/very-long-4       113kB ± 0%     113kB ± 0%    ~     (all equal)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      10.0 ± 0%      10.0 ± 0%    ~     (all equal)
Parse/short-4             107 ± 0%       107 ± 0%    ~     (all equal)
Parse/medium-4            296 ± 0%       296 ± 0%    ~     (all equal)
Parse/long-4              777 ± 0%       777 ± 0%    ~     (all equal)
Parse/very-long-4       2.61k ± 0%     2.61k ± 0%    ~     (all equal)

CommandNode.append and, by extension, runtime.growslice were showing up
more than expected during profiling. Allocate enough space for four
arguments up front so we don't need to reallocate as often. Although
this has little effect on run time, it has a clear positive effect on
memory usage, most visibly in the number of allocations.
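
The idea can be sketched as follows, assuming a simplified `commandNode` with a plain slice of arguments (the `node` type and `newCommand` constructor here are hypothetical, not the package's actual definitions). Creating the slice with capacity 4 means the first four appends reuse the same backing array instead of growing it step by step.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a parsed argument.
type node string

// commandNode is a simplified stand-in for the parser's command node.
type commandNode struct {
	args []node
}

// newCommand reserves room for four arguments up front, so typical
// commands never trigger a slice reallocation while being built.
func newCommand() *commandNode {
	return &commandNode{args: make([]node, 0, 4)}
}

// append adds one argument to the command.
func (c *commandNode) append(arg node) {
	c.args = append(c.args, arg)
}

func main() {
	c := newCommand()
	before := cap(c.args)
	for _, a := range []node{"printf", `"%d"`, "1"} {
		c.append(a)
	}
	// Capacity is unchanged, so no reallocation happened.
	fmt.Println(len(c.args), cap(c.args) == before)
}
```

Commands with more than four arguments still work; they fall back to the usual growth behavior of `append`.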

name                 old time/op    new time/op    delta
Parse/lorem_ipsum-4     733ns ± 2%     728ns ± 2%     ~     (p=0.315 n=8+8)
Parse/short-4          15.5µs ± 2%    15.2µs ± 1%   -2.12%  (p=0.001 n=8+8)
Parse/medium-4         38.7µs ± 1%    38.0µs ± 1%   -1.99%  (p=0.000 n=8+8)
Parse/long-4            115µs ± 1%     117µs ± 5%     ~     (p=0.281 n=7+8)
Parse/very-long-4       396µs ± 1%     400µs ± 1%   +1.02%  (p=0.002 n=8+8)

name                 old alloc/op   new alloc/op   delta
Parse/lorem_ipsum-4    1.16kB ± 0%    1.16kB ± 0%     ~     (all equal)
Parse/short-4          5.42kB ± 0%    5.08kB ± 0%   -6.20%  (p=0.000 n=8+8)
Parse/medium-4         12.6kB ± 0%    12.7kB ± 0%   +0.51%  (p=0.000 n=8+8)
Parse/long-4           35.3kB ± 0%    34.4kB ± 0%   -2.58%  (p=0.000 n=8+8)
Parse/very-long-4       113kB ± 0%     110kB ± 0%   -2.37%  (p=0.000 n=8+8)

name                 old allocs/op  new allocs/op  delta
Parse/lorem_ipsum-4      10.0 ± 0%      10.0 ± 0%     ~     (all equal)
Parse/short-4             107 ± 0%        93 ± 0%  -13.08%  (p=0.000 n=8+8)
Parse/medium-4            296 ± 0%       276 ± 0%   -6.76%  (p=0.000 n=8+8)
Parse/long-4              777 ± 0%       705 ± 0%   -9.27%  (p=0.000 n=8+8)
Parse/very-long-4       2.61k ± 0%     2.35k ± 0%   -9.77%  (p=0.000 n=8+8)

@ashishjh-bst ashishjh-bst merged commit 5ec0a82 into botlabs-gg:dev Aug 29, 2022
@jo3-l jo3-l deleted the perf/speedup-template-lexer branch June 29, 2023 23:06