Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major speed regression for gron.awk in goawk 1.11.0+ #93

Closed
xonixx opened this issue Jan 23, 2022 · 2 comments · Fixed by #95
Closed

Major speed regression for gron.awk in goawk 1.11.0+ #93

xonixx opened this issue Jan 23, 2022 · 2 comments · Fixed by #95

Comments

@xonixx
Copy link
Contributor

xonixx commented Jan 23, 2022

I know that this is sort of expected as it changes the logic of string-related functions to be more POSIX-compliant.
And I can confirm that adding -b flag makes it as fast as before.

BUT! What is very interesting, GAWK is able to show very close (I would say, the same) speed for with and without -b switch!
Did they find a way to make string functions O(1), not O(N)?
If yes, shell we use the same approach for GoAWK?

If need be I can provide more info and instructions to reproduce the results I've found.

=====================
Benchmarking gawk51 -b ...
24571

real    0m1.150s
user    0m1.148s
sys     0m0.008s

=====================
Benchmarking gawk51 ...
24571

real    0m1.179s
user    0m1.182s
sys     0m0.000s

=====================
Benchmarking mawk134 ...
24571

real    0m0.704s
user    0m0.711s
sys     0m0.000s

=====================
Benchmarking goawk1.13.0 -b ...
24571

real    0m1.597s
user    0m1.882s
sys     0m0.077s

=====================
Benchmarking goawk1.13.0 ...
(HANGS...)

The command I use for benchmark is

time $AWK -f gron.awk test_data/big.json | wc -l

The files gron.awk and test_data/big.json can be found in https://github.com/xonixx/gron.awk repo.

@benhoyt
Copy link
Owner

benhoyt commented Jan 23, 2022

Oooh, this is good to know, thank you! And yes, the kinds of things gron does is basically the worst case for this (lots of string length / substr operations). I'm glad you can use the -b switch as a stop-gap.

I've tested locally, and goawk -b (on my bytecode branch) takes 1.1s, whereas goawk without -b took 8m24s. Yikes!

I guess we're essentially seeing "accidentally quadratic" behavior here (because individual string length operations are O(N) now, and there are N of them)?

Gawk probably determines the string length as its splitting the input. We would have to do that in GoAWK to improve this. It might slow down others things, though, and wouldn't be trivial, which is why I didn't do that at first.

Thanks for the report. Maybe making -b the default (with its current performance characteristics) was a bad idea. I think the gron.awk case shows this is bad enough I should probably change back to bytes as the default -- thoughts?

@benhoyt
Copy link
Owner

benhoyt commented Jan 24, 2022

I'm thinking of just reverting the change from bytes to unicode outright (and hope not many people are depending on interp.Config.Bytes or goawk -b yet -- unlikely, seeing it just came out a few weeks ago). This is annoying but I think it's the right thing -- the O(N) behavior this introduced is obviously not a good thing. And I can't think of a simple way to fix it, so better to revert quickly and then fix slowly later. Thoughts?

benhoyt added a commit that referenced this issue Jan 24, 2022
The reason is because the new O(N) behavior of these functions was just
too problematic, and caused "accidentally quadratic" issues such as:
#93

This reverts commit b7ec795, but
leaves in the new bytes tests that were added.

Fixes #93
benhoyt added a commit that referenced this issue Jan 25, 2022
…ch (#95)

The reason is because the new O(N) behavior of these functions was just
too problematic, and caused "accidentally quadratic" issues such as:
#93

A JSON-processing script that previously took 1 second now took over 8
minutes. That's not tenable. The expectation is clearly that these functions
are O(N).

This commit reverts b7ec795, but
leaves in the new bytes tests that were added.

Fixes #93
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants